transformer_rankers.datasets.preprocess_crr.transform_dstc8_to_tsv

transformer_rankers.datasets.preprocess_crr.transform_dstc8_to_tsv(path)

Transforms DSTC8 data in JSON format into lines for a conversation response ranking (CRR) TSV file.

See https://github.com/dstc8-track2/NOESIS-II/ for details of the input format. Each output line has the tab-separated format label utterance_1 utterance_2 … candidate_response. Since we do the negative sampling ourselves, we do not take negative candidates from the input data and only keep lines with label = 1 (the correct responses).

Parameters

path (str) – path to the DSTC8 JSON file.

Returns: list of the TSV lines.
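A returned line can be split back into its parts with a small helper like the one below. This is a hypothetical sketch: it assumes each element of the returned list is a single tab-joined string in the label utterance_1 … candidate_response format (the docstring does not state whether elements are strings or lists of fields), and split_crr_line is an illustrative name, not part of transformer_rankers.

```python
from typing import List, Tuple


def split_crr_line(line: str) -> Tuple[int, List[str], str]:
    """Hypothetical helper: split one CRR TSV line into
    (label, context utterances, candidate response)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        # Expect at least a label, one context utterance and a response.
        raise ValueError("expected label, context utterance(s) and response")
    label, *context, response = fields
    return int(label), context, response
```

Because the number of context utterances varies per dialogue, the first field is taken as the label, the last as the candidate response, and everything in between as the context.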