Bug #1752

In partial_header_match, reduce penalty for some KPs where start and end words are same as header variant

Added by Nandini Bansal about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Target version:
Start date: 10/14/2021
Due date:
% Done: 0%
Estimated time: 2.00 h

Description

In partial_header_match, we have a filter after candidate generation that penalizes KPs because they start and end with the same words as a header variant. However, we have observed that some cases are being penalized unnecessarily.

We need to check the POS tags of the KPs with spaCy in cases where the word count of the header variant is less than the word count of the KP. When the word that differs between the header variant and the KP is tagged "ADJ" and the rest of the words are tagged "NOUN", reduce the penalty for that KP to 0.05.

For example:
1) tkinter standard dialog |%| tkinter dialogs
[('tkinter', 'NOUN'), ('standard', 'ADJ'), ('dialog', 'NOUN')]
[('tkinter', 'NOUN'), ('dialogs', 'NOUN')]

2) ascii lower-case character |%| ascii characters
[('ascii', 'NOUN'), ('lower', 'ADJ'), ('-', 'PUNCT'), ('case', 'NOUN'), ('character', 'NOUN')]
[('ascii', 'NOUN'), ('characters', 'NOUN')]

3) ascii white-space characters |%| ascii characters
[('ascii', 'NOUN'), ('white', 'ADJ'), ('-', 'PUNCT'), ('space', 'NOUN'), ('characters', 'NOUN')]
[('ascii', 'NOUN'), ('characters', 'NOUN')]
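
The check described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual partial_header_match code: the function names, the `_norm` helper, and the `current_penalty` parameter are hypothetical. The sketch takes already-tagged `(text, POS)` pairs, such as those built from spaCy's `token.text` and `token.pos_`, so it runs without a model. It also assumes that "PUNCT" tokens are ignored, that singular and plural forms are matched by a crude trailing-"s" strip (a lemma would be more robust), and that extra nouns such as "case" in example 2 are acceptable as long as at least one extra word is "ADJ".

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (token text, coarse POS tag)

REDUCED_PENALTY = 0.05  # value requested in this issue


def _norm(word: str) -> str:
    """Hypothetical helper: lowercase and strip a trailing 's' (crude plural handling)."""
    word = word.lower()
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def qualifies_for_reduced_penalty(kp: Tagged, header_variant: Tagged) -> bool:
    """Return True if the KP only adds adjective-led modifiers to an all-noun header variant."""
    # Assumption: punctuation tokens (e.g. the hyphen in "lower-case") are ignored.
    kp = [(w, p) for w, p in kp if p != "PUNCT"]
    hv = [(w, p) for w, p in header_variant if p != "PUNCT"]

    # The header variant must be strictly shorter than the KP.
    if len(hv) >= len(kp):
        return False

    hv_words = {_norm(w) for w, _ in hv}
    extra = [p for w, p in kp if _norm(w) not in hv_words]
    shared = [p for w, p in kp if _norm(w) in hv_words]

    # At least one word missing from the header variant must be an adjective,
    # and no extra word may be anything other than ADJ or NOUN.
    if "ADJ" not in extra or any(p not in ("ADJ", "NOUN") for p in extra):
        return False

    # All remaining words, in the KP and in the header variant, must be nouns.
    return all(p == "NOUN" for p in shared) and all(p == "NOUN" for _, p in hv)


def adjusted_penalty(kp: Tagged, header_variant: Tagged, current_penalty: float) -> float:
    """Return 0.05 for qualifying KPs, otherwise the penalty passed in unchanged."""
    if qualifies_for_reduced_penalty(kp, header_variant):
        return REDUCED_PENALTY
    return current_penalty
```

With the tag lists from the three examples above, `qualifies_for_reduced_penalty` returns True in each case. A KP whose extra word is tagged as something other than "ADJ" or "NOUN" keeps whatever penalty the caller passes in.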

#1

Updated by Nandini Bansal about 3 years ago

  • Description updated (diff)
#2

Updated by Nandini Bansal about 3 years ago

  • Assignee set to Rohit Choudhary
#3

Updated by Rohit Choudhary about 3 years ago

  • Status changed from New to In Progress
#4

Updated by Rohit Choudhary about 3 years ago

  • Status changed from In Progress to Resolved
