transformer_rankers.datasets.preprocess_pr.transform_trec2020pr_to_dfs

transformer_rankers.datasets.preprocess_pr.transform_trec2020pr_to_dfs(path)

Transforms the TREC 2020 Passage Ranking files (https://microsoft.github.io/TREC-2020-Deep-Learning/) into train, validation, and test pandas DataFrames containing only positive query-passage combinations.

Parameters

path – str with the path to the TREC folder containing:
- collection.tar.gz (uncompressed: collection.tsv)
- queries.tar.gz (uncompressed: queries.train.tsv, queries.dev.tsv)
- qrels.dev.tsv
- qrels.train.tsv

Returns: (train, valid, test) pandas DataFrames
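
As a rough illustration of what "only positive query-passage combinations" can mean, the sketch below joins a tiny in-memory qrels table with queries and passages using pandas. It is not the library's implementation. The column names (qid, pid, query, passage, iteration, relevance), the sample rows, and the assumed tab-separated layouts of collection.tsv, queries.*.tsv and qrels.*.tsv are all assumptions made for this example; the columns of the DataFrames actually returned by transform_trec2020pr_to_dfs may differ.

```python
import io

import pandas as pd

# Hypothetical, tiny stand-ins for the TREC files (tab-separated, no header).
# Assumed layouts: collection = pid, passage; queries = qid, query;
# qrels = qid, iteration, pid, relevance.
collection_tsv = (
    "0\tParis is the capital of France.\n"
    "1\tThe Nile is a river in Africa.\n"
)
queries_tsv = "10\tcapital of france\n11\tlongest river in africa\n"
qrels_tsv = "10\t0\t0\t1\n11\t0\t1\t1\n"

collection = pd.read_csv(
    io.StringIO(collection_tsv), sep="\t", names=["pid", "passage"]
)
queries = pd.read_csv(io.StringIO(queries_tsv), sep="\t", names=["qid", "query"])
qrels = pd.read_csv(
    io.StringIO(qrels_tsv),
    sep="\t",
    names=["qid", "iteration", "pid", "relevance"],
)

# Keep only judged-relevant pairs, then attach the query and passage text.
positives = (
    qrels[qrels["relevance"] > 0]
    .merge(queries, on="qid")
    .merge(collection, on="pid")[["query", "passage"]]
    .reset_index(drop=True)
)

print(positives)
```

With the real data, the documented call would be train, valid, test = transform_trec2020pr_to_dfs(path), where path points to the folder described under Parameters.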