transformer_rankers.negative_samplers.negative_sampling.BM25NegativeSamplerPyserini
class transformer_rankers.negative_samplers.negative_sampling.BM25NegativeSamplerPyserini(candidates, num_candidates_samples, path_index, sample_data, anserini_folder, set_rm3=False, seed=42)
Bases: object
Sample candidates from a list of candidates using BM25.
The class uses anserini and pyserini, which require Java and an installation of anserini. It first generates the candidates, saving them to files, and then creates the index via anserini's IndexCollection.
- Parameters
candidates – list of str containing the candidates
num_candidates_samples – int containing the number of negative samples for each query.
path_index – str containing the path to create/load the anserini index.
sample_data – int indicating the number of candidates in the index (-1 for all).
anserini_folder – str containing the path to the anserini folder; the binary is expected at <anserini_folder>/target/appassembler/bin/IndexCollection.
set_rm3 – bool indicating whether to use RM3 query expansion.
seed – int with the random seed
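As an illustration of the anserini_folder parameter, the following minimal sketch builds the IndexCollection path described above and checks whether it exists. The helper names index_collection_bin and has_index_collection are hypothetical and are not part of transformer_rankers.

```python
from pathlib import Path


def index_collection_bin(anserini_folder):
    """Hypothetical helper: build the IndexCollection path named in the docs."""
    return Path(anserini_folder) / "target" / "appassembler" / "bin" / "IndexCollection"


def has_index_collection(anserini_folder):
    """Hypothetical helper: True if the IndexCollection binary is present."""
    return index_collection_bin(anserini_folder).is_file()
```

Running such a check before constructing the sampler can surface a missing anserini build early.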
__init__(candidates, num_candidates_samples, path_index, sample_data, anserini_folder, set_rm3=False, seed=42)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(candidates, num_candidates_samples, …) – Initialize self.
sample(query_str, relevant_docs[, max_query_len]) – Samples from a list of candidates using BM25.
sample(query_str, relevant_docs, max_query_len=512)
Samples from a list of candidates using BM25.
If a sampled document matches a relevant document, it is removed and replaced by a randomly sampled candidate.
- Parameters
query_str – the str of the query to be used for BM25
relevant_docs – list with the str of the relevant documents, to avoid sampling them as negative samples.
max_query_len – int containing the maximum number of characters of the query to use as input. (Very long queries will raise a maxClauseCount error from anserini.)
- Returns
In order: the sampled documents, their respective scores, an indicator of whether the negative sampler retrieved a relevant document, and, if so, the position at which it was retrieved.
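The filter-and-resample behaviour described above can be sketched in plain Python. This is not the library's implementation: the function filter_and_refill, its arguments, the placeholder score of 0.0 for random refills, and the -1 sentinel for "not retrieved" are all assumptions made for illustration, and the BM25 ranking is passed in rather than computed with pyserini.

```python
import random


def filter_and_refill(ranked_docs, ranked_scores, candidates, relevant_docs,
                      num_candidates_samples, seed=42):
    """Hypothetical sketch: drop relevant docs from a ranking, refill randomly."""
    rng = random.Random(seed)
    relevant = set(relevant_docs)
    sampled, scores = [], []
    was_relevant_sampled, relevant_rank = False, -1  # -1: assumed sentinel

    # Look only at the top-k hits of the (precomputed) BM25 ranking.
    top_hits = list(zip(ranked_docs, ranked_scores))[:num_candidates_samples]
    for rank, (doc, score) in enumerate(top_hits):
        if doc in relevant:
            # Record the first position at which a relevant doc was retrieved.
            if not was_relevant_sampled:
                was_relevant_sampled, relevant_rank = True, rank
            continue
        sampled.append(doc)
        scores.append(score)

    # Refill the removed slots with random non-relevant, unused candidates.
    pool = [c for c in candidates if c not in relevant and c not in sampled]
    rng.shuffle(pool)
    while len(sampled) < num_candidates_samples and pool:
        sampled.append(pool.pop())
        scores.append(0.0)  # assumed placeholder score for random refills

    return sampled, scores, was_relevant_sampled, relevant_rank
```

For example, with a ranking of d1, d2, d3 where d2 is relevant, the sketch keeps d1 and d3, reports that a relevant document was retrieved at position 1, and fills the third slot with a random remaining candidate.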