gnes.score_fn.chunk module

class gnes.score_fn.chunk.BM25ChunkScoreFn(threshold=0.8, *args, **kwargs)

Bases: gnes.score_fn.base.CombinedScoreFn

score = relevance * idf(q_chunk) * tf(q_chunk) * (k1 + 1) / (tf(q_chunk) + k1 * (1 - b + b * (chunk_in_doc / avg_chunk_in_doc)))

In the BM25 algorithm:

idf(q_chunk) = log(1 + (doc_count - f(q_chunk) + 0.5) / (f(q_chunk) + 0.5))

where f(q_chunk) is the number of docs that contain q_chunk. In our system, this denotes the number of docs appearing in the query results.

In Elasticsearch, the defaults are b = 0.75 and k1 = 1.2.
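The BM25 formula above can be sketched in plain Python. This is a minimal illustration of the documented formula, not the GNES implementation: the function names `bm25_idf` and `bm25_chunk_score` and the parameter name `docs_with_chunk` (standing in for f(q_chunk)) are hypothetical, and the defaults for `k1` and `b` are the Elasticsearch values quoted above.

```python
import math


def bm25_idf(doc_count: int, docs_with_chunk: int) -> float:
    """idf(q_chunk) = log(1 + (doc_count - f + 0.5) / (f + 0.5)).

    docs_with_chunk is a hypothetical name for f(q_chunk).
    """
    return math.log(1 + (doc_count - docs_with_chunk + 0.5) / (docs_with_chunk + 0.5))


def bm25_chunk_score(relevance: float, tf: float, idf: float,
                     chunk_in_doc: int, avg_chunk_in_doc: float,
                     k1: float = 1.2, b: float = 0.75) -> float:
    """Evaluate the documented BM25 chunk score for a single chunk."""
    # Length normalisation: docs with more chunks than average get a larger
    # denominator and therefore a lower score.
    length_norm = k1 * (1 - b + b * (chunk_in_doc / avg_chunk_in_doc))
    return relevance * idf * tf * (k1 + 1) / (tf + length_norm)


idf = bm25_idf(doc_count=10, docs_with_chunk=2)
score = bm25_chunk_score(relevance=0.9, tf=1, idf=idf,
                         chunk_in_doc=10, avg_chunk_in_doc=10)
```

When `chunk_in_doc` equals `avg_chunk_in_doc` and `tf` is 1, the saturation term `(k1 + 1) / (tf + k1)` equals 1, so the score reduces to `relevance * idf`.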

train(*args, **kwargs)

Train the model; needs to be overridden.

class gnes.score_fn.chunk.CoordChunkScoreFn(score_mode='multiply', *args, **kwargs)

Bases: gnes.score_fn.base.CombinedScoreFn

score = relevance * query_coordination

where query_coordination = (number of chunks returned) / (number of chunks in this doc, i.e. the query doc).
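As a rough sketch of the coordination formula above (not the GNES implementation), the score can be computed as follows. The function name `coord_chunk_score` and its parameter names are hypothetical, and rejecting a non-positive chunk count is an assumption made for the example.

```python
def coord_chunk_score(relevance: float, chunks_returned: int,
                      chunks_in_query_doc: int) -> float:
    """score = relevance * (chunks returned / chunks in the query doc)."""
    if chunks_in_query_doc <= 0:
        # Assumption: a query doc without chunks is treated as invalid input.
        raise ValueError("chunks_in_query_doc must be positive")
    query_coordination = chunks_returned / chunks_in_query_doc
    return relevance * query_coordination


# 3 of the 4 chunks in the query doc were returned.
score = coord_chunk_score(relevance=0.8, chunks_returned=3, chunks_in_query_doc=4)
```

The coordination factor scales the relevance down when only part of the query doc's chunks is matched.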

Parameters: score_mode (str) – specifies how the computed scores are combined

train(*args, **kwargs)

Train the model; needs to be overridden.

class gnes.score_fn.chunk.TFIDFChunkScoreFn(threshold=0.8, *args, **kwargs)

Bases: gnes.score_fn.base.CombinedScoreFn

score = relevance * tf(q_chunk) * (idf(q_chunk)**2)

tf(q_chunk) is calculated based on the relevance of the query results:

tf(q_chunk) = number of queried chunks where relevance >= threshold
idf(q_chunk) = log(total_chunks / tf(q_chunk) + 1)
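The TF-IDF variant above can be sketched as below. This is an illustration under stated assumptions rather than the GNES implementation: the name `tfidf_chunk_score` and its parameters are hypothetical, the idf expression is read literally with ordinary operator precedence (`total_chunks / tf + 1`), and returning 0.0 when no result reaches the threshold is an assumption added to avoid dividing by zero.

```python
import math
from typing import Sequence


def tfidf_chunk_score(relevance: float, result_relevances: Sequence[float],
                      total_chunks: int, threshold: float = 0.8) -> float:
    """score = relevance * tf * idf**2, with tf counted from query results."""
    # tf(q_chunk): number of queried chunks whose relevance meets the threshold.
    tf = sum(1 for r in result_relevances if r >= threshold)
    if tf == 0:
        # Assumption: no sufficiently relevant result means a zero score.
        return 0.0
    # idf(q_chunk) read literally as log((total_chunks / tf) + 1).
    idf = math.log(total_chunks / tf + 1)
    return relevance * tf * idf ** 2


score = tfidf_chunk_score(relevance=0.9,
                          result_relevances=[0.9, 0.85, 0.5],
                          total_chunks=6)
```

Here two of the three results pass the default threshold of 0.8, so tf is 2 and idf is log(6 / 2 + 1) = log(4).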

train(*args, **kwargs)

Train the model; needs to be overridden.

class gnes.score_fn.chunk.WeightedChunkOffsetScoreFn(score_mode='multiply', *args, **kwargs)

Bases: gnes.score_fn.base.CombinedScoreFn

score = d_chunk.weight * relevance * offset_divergence * q_chunk.weight

offset_divergence is calculated based on doc_type:

TEXT, VIDEO, and AUDIO: offset is 1-D
IMAGE: offset is 2-D
Parameters: score_mode (str) – specifies how the computed scores are combined

train(*args, **kwargs)

Train the model; needs to be overridden.

class gnes.score_fn.chunk.WeightedChunkScoreFn(score_mode='multiply', *args, **kwargs)

Bases: gnes.score_fn.base.CombinedScoreFn

score = d_chunk.weight * relevance * q_chunk.weight
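A minimal sketch of the weighted formula above, assuming only that each chunk exposes a numeric `weight` attribute. The `Chunk` dataclass and the function `weighted_chunk_score` are hypothetical stand-ins for illustration, not GNES types.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk that carries a weight."""
    weight: float


def weighted_chunk_score(q_chunk: Chunk, d_chunk: Chunk, relevance: float) -> float:
    """score = d_chunk.weight * relevance * q_chunk.weight."""
    return d_chunk.weight * relevance * q_chunk.weight


score = weighted_chunk_score(Chunk(weight=0.5), Chunk(weight=0.5), relevance=0.8)
```

With both weights at 1.0 the score equals the raw relevance, so the weights act purely as multiplicative down- or up-weighting of a match.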

Parameters: score_mode (str) – specifies how the computed scores are combined

train(*args, **kwargs)

Train the model; needs to be overridden.