transformer_rankers.datasets.preprocess_crr.read_crr_tsv_as_df
transformer_rankers.datasets.preprocess_crr.read_crr_tsv_as_df(path, nrows=-1, add_turn_separator=True)
Transforms a conversation response ranking .tsv file into a pandas DataFrame.
Each line has the format label utterance_1 utterance_2 … candidate_response, with fields separated by tabs. See https://guzpenha.github.io/MANtIS/ for more details. Because we do the negative sampling ourselves, negative samples in the .tsv file are skipped and only lines with label = 1 are read.
Parameters:
path – str with the path to the .tsv file.
nrows – int indicating the number of rows to read from the file.
add_turn_separator – whether to add [TURN_SEP] to the context every 2 utterances.
Returns: a pandas DataFrame with two columns, “context” and “response”.
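
The behavior described above can be illustrated with a minimal sketch. This is not the library's implementation: `read_crr_tsv_sketch` is a hypothetical stand-in, and several details are assumptions, namely that nrows=-1 means "read every line", that utterances are joined with single spaces, and that [TURN_SEP] is placed after every second utterance. It reads from a file-like object so the example runs without a file on disk.

```python
import csv
import io

import pandas as pd

TURN_SEP = "[TURN_SEP]"


def read_crr_tsv_sketch(source, nrows=-1, add_turn_separator=True):
    """Hypothetical re-creation of the documented behavior, not the library's code."""
    rows = []
    reader = csv.reader(source, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_number, fields in enumerate(reader):
        # Assumption: nrows=-1 means no limit; otherwise stop after nrows lines.
        if nrows != -1 and line_number >= nrows:
            break
        # Skip malformed lines and negative samples (label other than 1).
        if len(fields) < 3 or fields[0] != "1":
            continue
        utterances, response = fields[1:-1], fields[-1]
        parts = []
        for position, utterance in enumerate(utterances, start=1):
            parts.append(utterance)
            # Assumption: the separator follows every second utterance.
            if add_turn_separator and position % 2 == 0:
                parts.append(TURN_SEP)
        # Assumption: utterances and separators are joined with single spaces.
        rows.append({"context": " ".join(parts), "response": response})
    return pd.DataFrame(rows, columns=["context", "response"])


# Two made-up lines: one positive (label 1) and one negative (label 0).
sample = io.StringIO(
    "1\thi\thello, how can I help?\tmy laptop won't boot\ttry holding the power button\n"
    "0\thi\thello, how can I help?\tmy laptop won't boot\tunrelated negative sample\n"
)
df = read_crr_tsv_sketch(sample)
print(df)
```

Only the line with label 1 ends up in the resulting DataFrame; the last field becomes “response” and the fields between the label and the response form “context”. The exact separator placement and spacing in the real function may differ from this sketch.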