transformer_rankers.datasets.processors

Functions

clariq_processor(data_folder)

Loads the train and dev files downloaded from GitHub and transforms them into a DataFrame with [“query”, “clarifying_question”] columns.

linkso_processor(data_folder)

Extracts the LINKSO files downloaded from the drive and creates DataFrames with [“question”, “similar_question”] columns.

mantis_processor(data_folder)

Extracts the compressed file “drive_file” containing the MANtIS data and preprocesses it into a DataFrame with [“conversational_context”, “response”] columns.

msdialog_processor(data_folder)

Extracts the compressed file “drive_file” containing the MSDialog data and preprocesses it into a DataFrame with [“conversational_context”, “response”] columns.

qqp_processor(data_folder)

Extracts the Quora Question Pairs files downloaded from the drive and creates DataFrames with [“question”, “similar_question”] columns.

trec2020pr_processor(data_folder)

Extracts the downloaded files and processes them into a DataFrame with [“query”, “passage”] columns.

ubuntu_dstc8_processor(data_folder)

Extracts the compressed file “drive_file” containing the Ubuntu DSTC8 data and preprocesses it into a DataFrame with [“conversational_context”, “response”] columns.
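
To illustrate the [“conversational_context”, “response”] layout that the conversational processors describe, here is a minimal sketch of how one dialogue could be unrolled into such rows. It is not the library's implementation: the helper `dialogue_to_rows`, the separator `TURN_SEP`, and the sample dialogue are hypothetical, and the actual processors may choose responses and join utterances differently.

```python
from typing import Dict, List

# Hypothetical separator between utterances; the library's actual choice is not stated here.
TURN_SEP = " [SEP] "


def dialogue_to_rows(utterances: List[str]) -> List[Dict[str, str]]:
    """Unroll one dialogue into (conversational_context, response) rows.

    Every utterance after the first becomes a response; its context is
    all preceding utterances joined by TURN_SEP.
    """
    rows = []
    for i in range(1, len(utterances)):
        rows.append({
            "conversational_context": TURN_SEP.join(utterances[:i]),
            "response": utterances[i],
        })
    return rows


# Made-up dialogue, for illustration only.
dialogue = ["My wifi keeps dropping.", "Which driver are you using?", "iwlwifi."]
rows = dialogue_to_rows(dialogue)
```

The resulting list of dicts can be passed to `pandas.DataFrame(rows)` to obtain a two-column frame with the same column names as above.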