Feature #1809

Checking the context of the KP matched with "word1.word2" header variant

Added by Nandini Bansal about 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Target version:
Start date: 10/11/2021
Due date:
% Done: 28%
Estimated time: 3.00 h (Total: 10.00 h)

Description

We have generally observed that when a KP matches a "word1.word2" header variant and the KP is equivalent to "word2", the link between the KP and the similar doc is out of context in most places, because these subsection headers are quite specific to the documents they belong to.

For example:
KP = method
header variant1 = class.method
header variant2 = request.method

The KP is linked to both "header variant1" and "header variant2" with high confidence scores, but these links may or may not be relevant, because "header variant2" is very specific to the topic it covers.

To test this idea, we can fetch the sentence from which the KP was extracted and check whether "word1" is present in it. If "word1" is an abbreviation, check for both "word1" and the full form of "word1" in the context sentence.
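A minimal Python sketch of this check, assuming the KP, the matched header and the KP's source sentence are available as plain strings. The function names and the ABBREVIATIONS table are hypothetical and only for illustration; the real pipeline (generate_candidates, token_processing) may tokenize and match differently.

```python
import re
from typing import Dict, Optional

# Hypothetical abbreviation table mapping a short "word1" to its full form.
# In the real pipeline this mapping would come from wherever abbreviations are resolved.
ABBREVIATIONS: Dict[str, str] = {
    "req": "request",
    "dict": "dictionary",
}


def header_prefix(kp: str, header: str) -> Optional[str]:
    """Return "word1" if `header` is a "word1.word2" variant whose last part equals `kp`."""
    prefix, sep, last = header.rpartition(".")
    if sep and prefix and last.lower() == kp.lower():
        return prefix
    return None


def _contains_word(sentence: str, word: str) -> bool:
    # Whole-word, case-insensitive match so that "ast" is not found inside "last".
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, sentence, re.IGNORECASE) is not None


def is_in_context(kp: str, header: str, sentence: str) -> bool:
    """Return False only when `header` is a "word1.word2" variant of `kp` and
    neither word1 nor its (hypothetical) full form appears in the KP's sentence."""
    word1 = header_prefix(kp, header)
    if word1 is None:
        return True  # not a "word1.word2" match, so this check does not apply
    words = [word1]
    full_form = ABBREVIATIONS.get(word1.lower())
    if full_form:
        words.append(full_form)
    return any(_contains_word(sentence, w) for w in words)
```

For the "dump" case, `is_in_context("dump", "ast.dump", "Use pickle.dump to serialize the object to a file.")` returns False, while the same call with "pickle.dump" returns True. With the table above, `is_in_context("method", "req.method", "The request method is GET.")` returns True because the full form "request" is found.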

Added a screenshot of the KP "dump" being matched with high confidence to the "pickle.dump", "ast.dump", etc. subsections, which are out of context because of their specific nature.

This is to be implemented in the generate_candidates function.

It is very important to test this with the C-API book and the Library Reference book.


Subtasks

Feature #1810: Implement the penalty algorithm for KPs matching "word1.word2" header variants in the update_similarity_with_context function (Resolved, 10/11/2021)
Feature #1811: For matching "word1" with the KP, make use of the token_processing function (Resolved, 10/26/2021)
Feature #1812: Analysis for improving the algorithms (Rejected, 10/27/2021, Rohit Choudhary)
Feature #1813: Analysis for improving the algorithms (In Progress, 10/27/2021, Nandini Bansal)
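Subtask #1810 handles these matches with a penalty in update_similarity_with_context. A minimal sketch of that idea, assuming candidates arrive as (header, score) pairs: the function name apply_header_penalty and the factor 0.5 are hypothetical (the value actually used is not recorded in this issue), and abbreviation expansion is left out for brevity.

```python
import re
from typing import List, Tuple

# Hypothetical penalty factor; not taken from the actual implementation.
PENALTY = 0.5


def apply_header_penalty(
    kp: str,
    sentence: str,
    scored: List[Tuple[str, float]],
    penalty: float = PENALTY,
) -> List[Tuple[str, float]]:
    """Scale down the score of every "word1.word2" header matched by `kp`
    whose word1 does not appear (as a whole word) in the KP's sentence."""
    result = []
    for header, score in scored:
        prefix, sep, last = header.rpartition(".")
        if sep and prefix and last.lower() == kp.lower():
            pattern = r"\b" + re.escape(prefix) + r"\b"
            if re.search(pattern, sentence, re.IGNORECASE) is None:
                score *= penalty  # out of context: keep the link, but weaker
        result.append((header, score))
    # Re-rank so that penalized candidates move down the list.
    return sorted(result, key=lambda item: item[1], reverse=True)
```

With `scored = [("pickle.dump", 0.91), ("ast.dump", 0.90), ("dump", 0.60)]` and the sentence "Use pickle.dump to serialize the object to a file.", "ast.dump" drops to about 0.45 and falls below the plain "dump" header. Compared with dropping such candidates outright, a penalty keeps them available when nothing better matches.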
#1

Updated by Nandini Bansal about 3 years ago

  • Status changed from New to In Progress
#2

Updated by Anonymous about 3 years ago

  • Status changed from In Progress to Resolved
