transformer_rankers.datasets.preprocess_crr.read_crr_tsv_as_df

transformer_rankers.datasets.preprocess_crr.read_crr_tsv_as_df(path, nrows=-1, add_turn_separator=True)

Transforms a conversation response ranking (CRR) .tsv file into a pandas DataFrame.

Each line of the file holds tab-separated fields in the order label, utterance_1, utterance_2, …, candidate_response. See https://guzpenha.github.io/MANtIS/ for more details. Since we do the negative sampling ourselves, the negative samples stored in the .tsv file are not used: only lines with label = 1 are read.
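To make the line format concrete, here is a minimal sketch of how a single line could be split and filtered. It assumes fields are separated by a literal tab and that the positive label is written as the string "1"; the helper name parse_crr_line and the sample dialogue are hypothetical and not part of transformer_rankers.

```python
def parse_crr_line(line):
    """Hypothetical helper: split one tab-separated CRR line.

    Returns (context_utterances, candidate_response) for label-1 lines,
    or None for negatives, which are skipped because negative sampling
    is done later rather than read from the file.
    """
    fields = line.rstrip("\n").split("\t")
    # First field is the label, last is the candidate, the rest is context.
    label, utterances, candidate = fields[0], fields[1:-1], fields[-1]
    if label != "1":  # assumption: the label is stored as the literal "1"
        return None
    return utterances, candidate


positive = "1\thow do I reset my password?\tuse the account page\tthanks, where is it?\tunder settings\n"
negative = "0\thow do I reset my password?\tuse the account page\tthanks, where is it?\ttry rebooting\n"
print(parse_crr_line(positive))
print(parse_crr_line(negative))
```

The positive line yields a three-utterance context and the candidate "under settings"; the negative line yields None and would simply be skipped.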

Parameters
  • path – str, the path to the .tsv file.

  • nrows – int indicating the number of rows to read from the file.

  • add_turn_separator – bool indicating whether to insert the [TURN_SEP] token into the context every two utterances.

Returns: a pandas DataFrame with two columns, “context” and “response”.
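The parameters and return value can be tied together in a small end-to-end sketch. This is an illustrative re-implementation, not the library's code, and several details are assumptions: nrows counts physical lines in the file (including skipped negatives) with -1 meaning no limit, utterances are joined with single spaces, and [TURN_SEP] is placed after every second utterance but not after the last one. The function name read_crr_tsv_as_df_sketch is hypothetical.

```python
import os
import tempfile

import pandas as pd

TURN_SEP = "[TURN_SEP]"  # token named in the docs; its placement below is assumed


def read_crr_tsv_as_df_sketch(path, nrows=-1, add_turn_separator=True):
    """Hypothetical sketch of reading a CRR .tsv file into a DataFrame."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            # Assumption: nrows counts file lines; -1 means read everything.
            if nrows != -1 and i >= nrows:
                break
            fields = line.rstrip("\n").split("\t")
            label, utterances, response = fields[0], fields[1:-1], fields[-1]
            if label != "1":
                continue  # negatives are sampled later, so skip them here
            parts = []
            for j, utterance in enumerate(utterances, start=1):
                parts.append(utterance)
                # Assumption: separator after every second utterance, not after the last.
                if add_turn_separator and j % 2 == 0 and j < len(utterances):
                    parts.append(TURN_SEP)
            rows.append({"context": " ".join(parts), "response": response})
    return pd.DataFrame(rows, columns=["context", "response"])


lines = [
    "1\thi\thello, how can I help?\tmy laptop won't boot\ttry safe mode\n",
    "0\thi\thello, how can I help?\tmy laptop won't boot\tbuy a new phone\n",
]
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False, encoding="utf-8") as tmp:
    tmp.writelines(lines)
df = read_crr_tsv_as_df_sketch(tmp.name)
df_plain = read_crr_tsv_as_df_sketch(tmp.name, add_turn_separator=False)
os.remove(tmp.name)
print(df)
```

Only the label-1 line survives, so df has a single row whose context reads "hi hello, how can I help? [TURN_SEP] my laptop won't boot". Building a list of dicts and constructing the DataFrame once at the end avoids repeated row appends.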