Bug #1755

In partial_header_match, increase the penalty for KPs whose start and end words are the same as the header variant's

Added by Nandini Bansal about 3 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Target version:
Start date:
11/03/2021
Due date:
% Done:

0%

Estimated time:
2.00 h

Description

In some cases the word count of the KP is greater than the word count of the header variant, and the extra word in the KP (the one not shared with the header variant) is a VERB. The POS tags of the KP are to be found using spaCy. We need to increase the penalty for such cases to 0.5, because these KPs read poorly and should not be tagged in the final annotated file.

E.g.
KP | Header Variant
1) modules provide interfaces | module interface
[('modules', 'NOUN'), ('provide', 'VERB'), ('interfaces', 'NOUN')]
[('module', 'NOUN'), ('interface', 'NOUN')]

2) module contains functions | module functions
[('module', 'NOUN'), ('contains', 'VERB'), ('functions', 'NOUN')]
[('module', 'NOUN'), ('functions', 'NOUN')]

3) python validates bytecode | python bytecode
[('python', 'PROPN'), ('validates', 'VERB'), ('bytecode', 'VERB')]
[('python', 'PROPN'), ('bytecode', 'PROPN')]

The KPs are from the Library Reference book. Creating a dummy dataset is recommended.
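
The rule described above could be sketched roughly as follows. This is only an illustration, not the actual partial_header_match code: the names `needs_verb_penalty`, `adjusted_penalty`, and `naive_stem` are hypothetical, the inputs are already POS-tagged (word, POS) pairs like the ones in the examples (in practice they would come from spaCy's `token.text` and `token.pos_`), and the crude plural stripping stands in for proper lemmatization.

```python
from typing import Callable, List, Tuple

# (word, POS) pairs, in the same shape as the examples in this ticket.
Tagged = List[Tuple[str, str]]

VERB_PENALTY = 0.5  # value taken from the ticket description


def naive_stem(word: str) -> str:
    """Hypothetical stand-in for lemmatization: lowercase, drop a trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def needs_verb_penalty(
    kp: Tagged, header: Tagged, norm: Callable[[str], str] = naive_stem
) -> bool:
    """True if the KP is longer than the header variant, shares its first and
    last words, and every word not found in the header variant is a VERB."""
    if not header or len(kp) <= len(header):
        return False
    if norm(kp[0][0]) != norm(header[0][0]):
        return False
    if norm(kp[-1][0]) != norm(header[-1][0]):
        return False
    header_words = {norm(word) for word, _ in header}
    extra = [(word, pos) for word, pos in kp if norm(word) not in header_words]
    return bool(extra) and all(pos == "VERB" for _, pos in extra)


def adjusted_penalty(kp: Tagged, header: Tagged, base_penalty: float) -> float:
    """Assumption: 'increase to 0.5' means the penalty is raised to at least 0.5."""
    if needs_verb_penalty(kp, header):
        return max(base_penalty, VERB_PENALTY)
    return base_penalty


if __name__ == "__main__":
    kp = [("modules", "NOUN"), ("provide", "VERB"), ("interfaces", "NOUN")]
    header = [("module", "NOUN"), ("interface", "NOUN")]
    print(needs_verb_penalty(kp, header), adjusted_penalty(kp, header, 0.2))
```

Comparing normalized words rather than raw strings is what lets "modules provide interfaces" line up with "module interface"; with spaCy available, `token.lemma_` would be a more reliable basis for that comparison than the suffix stripping used here.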

#1

Updated by Nandini Bansal about 3 years ago

  • Description updated
#2

Updated by Nandini Bansal about 3 years ago

  • Assignee set to Anonymous
#3

Updated by Nandini Bansal about 3 years ago

  • Start date changed from 10/14/2021 to 11/03/2021
#4

Updated by Anonymous about 3 years ago

  • Status changed from New to In Progress
#5

Updated by Anonymous about 3 years ago

  • Status changed from In Progress to Resolved
