transformer_rankers.datasets.preprocess_sqr.transform_linkso_to_duplicates_dfs
transformer_rankers.datasets.preprocess_sqr.transform_linkso_to_duplicates_dfs(path)
Transforms the LinkSO files into train, valid and test pandas DataFrames with columns ["Q1", "Q2"], containing only duplicate questions. Since the lists of test and valid ids provided by the authors are identical (https://sites.google.com/view/linkso), the valid set is split into two to obtain separate valid and test sets.
- Parameters
path – str with the folder containing the java, javascript and python folders, each containing the following files: [<language>_cosidf.txt, <language>_qid2all.txt, <language>_test_qid.txt, <language>_train_qid.txt, <language>_valid_qid.txt]
Returns: (train, valid, test) pandas DataFrames
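
The folder layout described under Parameters can be checked before calling the function. The following is a minimal sketch, not part of transformer_rankers: the helper names `expected_linkso_files`, `missing_linkso_files` and `load_linkso_duplicates` are hypothetical, and the assumption that each file sits at `<path>/<language>/<language>_<suffix>` is inferred from the parameter description above.

```python
from pathlib import Path

# Language folders and per-language file suffixes, as listed under Parameters.
LANGUAGES = ("java", "javascript", "python")
FILE_SUFFIXES = (
    "cosidf.txt",
    "qid2all.txt",
    "test_qid.txt",
    "train_qid.txt",
    "valid_qid.txt",
)


def expected_linkso_files(path):
    """Hypothetical helper: list the files the function is documented to need.

    Assumes the layout <path>/<language>/<language>_<suffix>.
    """
    root = Path(path)
    return [
        root / lang / f"{lang}_{suffix}"
        for lang in LANGUAGES
        for suffix in FILE_SUFFIXES
    ]


def missing_linkso_files(path):
    """Hypothetical helper: return the expected files that do not exist."""
    return [p for p in expected_linkso_files(path) if not p.is_file()]


def load_linkso_duplicates(path):
    """Hypothetical wrapper: validate the folder, then call the documented function."""
    missing = missing_linkso_files(path)
    if missing:
        names = ", ".join(str(p) for p in missing)
        raise FileNotFoundError(f"Missing LinkSO files: {names}")
    # Imported lazily so the layout check works without the library installed.
    from transformer_rankers.datasets import preprocess_sqr

    # Documented return order: (train, valid, test).
    return preprocess_sqr.transform_linkso_to_duplicates_dfs(path)
```

With the dataset in place, `train, valid, test = load_linkso_duplicates(path)` would unpack the three DataFrames in the documented return order.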