transformer_rankers.datasets.preprocess_sqr.transform_linkso_to_duplicates_dfs

transformer_rankers.datasets.preprocess_sqr.transform_linkso_to_duplicates_dfs(path)[source]

Transforms the LinkSO files into train, valid and test pandas DataFrames with columns [“Q1”, “Q2”], containing only duplicated questions. Because the lists of test and valid ids provided by the authors are identical (https://sites.google.com/view/linkso), the valid set is split in two to obtain separate valid and test sets.

Parameters
  • path – str with the folder containing the java, javascript and python folders, each containing the following files: [<language>_cosidf.txt, <language>_qid2all.txt, <language>_test_qid.txt, <language>_train_qid.txt, <language>_valid_qid.txt]

Returns: (train, valid, test) pandas DataFrames
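
The folder layout expected by path can be checked before calling the function. The following is a minimal sketch, not part of transformer_rankers: the helper name missing_linkso_files and the constants LANGUAGES and SUFFIXES are hypothetical, and it assumes each file sits directly inside its language folder, as the parameter description suggests.

```python
from pathlib import Path

# Language folders and file suffixes taken from the parameter description above.
LANGUAGES = ("java", "javascript", "python")
SUFFIXES = (
    "cosidf.txt",
    "qid2all.txt",
    "test_qid.txt",
    "train_qid.txt",
    "valid_qid.txt",
)


def missing_linkso_files(path):
    """Hypothetical helper: list the expected LinkSO files absent under `path`.

    Assumes the layout <path>/<language>/<language>_<suffix>.
    """
    root = Path(path)
    missing = []
    for language in LANGUAGES:
        for suffix in SUFFIXES:
            candidate = root / language / f"{language}_{suffix}"
            if not candidate.is_file():
                missing.append(candidate)
    return missing
```

If the returned list is empty, the folder should be usable as path, and the call itself would then be train, valid, test = transform_linkso_to_duplicates_dfs(path).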