transformer_rankers.datasets.preprocess_pr.transform_trec2020pr_to_dfs
transformer_rankers.datasets.preprocess_pr.transform_trec2020pr_to_dfs(path)
Transforms the TREC 2020 Passage Ranking files (https://microsoft.github.io/TREC-2020-Deep-Learning/) into train, valid and test DataFrames containing only positive query-passage combinations.
- Parameters
path – str, the path to the TREC folder containing:
  - collection.tar.gz (uncompressed: collection.tsv)
  - queries.tar.gz (uncompressed: queries.train.tsv, queries.dev.tsv)
  - qrels.dev.tsv
  - qrels.train.tsv
- Returns
(train, valid, test) pandas DataFrames
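
To illustrate what "only positive query-passage combinations" means, here is a minimal sketch of the kind of join involved. It is not the library's implementation. It assumes the usual MS MARCO layout (collection.tsv and queries.*.tsv as `id<TAB>text`, qrels as `qid<TAB>0<TAB>pid<TAB>relevance`), uses made-up rows, and sticks to the standard library so it runs without the dataset. The helper names and the output keys `query` and `passage` are hypothetical.

```python
import csv
import io

# Tiny in-memory stand-ins for the TSV files (made-up rows, not real TREC data).
# Assumed layout: id<TAB>text for collection/queries, qid<TAB>0<TAB>pid<TAB>rel for qrels.
COLLECTION_TSV = (
    "0\tParis is the capital of France.\n"
    "1\tThe Nile is a river in Africa.\n"
    "2\tPython is a programming language.\n"
)
QUERIES_TSV = "10\tcapital of france\n11\twhat is python\n"
QRELS_TSV = "10\t0\t0\t1\n11\t0\t2\t1\n"


def read_id_text(tsv_text):
    """Read a two-column TSV (id, text) into a dict keyed by id."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader}


def positive_pairs(collection_tsv, queries_tsv, qrels_tsv):
    """Join qrels with queries and passages, keeping only relevant pairs."""
    passages = read_id_text(collection_tsv)
    queries = read_id_text(queries_tsv)
    pairs = []
    reader = csv.reader(io.StringIO(qrels_tsv), delimiter="\t")
    for qid, _unused, pid, rel in reader:
        # Keep a row only if it is judged relevant and both ids resolve.
        if int(rel) > 0 and qid in queries and pid in passages:
            pairs.append({"query": queries[qid], "passage": passages[pid]})
    return pairs


pairs = positive_pairs(COLLECTION_TSV, QUERIES_TSV, QRELS_TSV)
for pair in pairs:
    print(pair)
```

With the real data, the documented call is `train, valid, test = transform_trec2020pr_to_dfs(path)`, where `path` points to the folder described under Parameters. The column names of the returned DataFrames are not stated here, so inspect them (for example with `train.columns`) before relying on them.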