Bug #1752: In partial_header_match, reduce penalty for some KPs where start and end words are same as header variant - RK-A - Redmine

Bug #1752

In partial_header_match, reduce penalty for some KPs where start and end words are same as header variant

Added by Nandini Bansal over 3 years ago. Updated over 3 years ago.

Status: Resolved

Priority: Normal

Assignee: Rohit Choudhary

Target version:

Start date: 10/14/2021

Due date:

% Done:

Estimated time: 2.00 h

Description

In partial_header_match, a filter that runs after candidate generation penalizes KPs that start and end with the same words as a header variant. However, some of these KPs are being penalized unnecessarily.

We need to check the POS tags of the KPs with spaCy in cases where the header variant has fewer words than the KP. If the word that the KP does not share with the header variant is tagged "ADJ" and the rest of the words are tagged "NOUN", reduce the penalty for that KP to 0.05.

For example (KP |%| header variant, followed by the POS tags of each):
1) tkinter standard dialog |%| tkinter dialogs
[('tkinter', 'NOUN'), ('standard', 'ADJ'), ('dialog', 'NOUN')]
[('tkinter', 'NOUN'), ('dialogs', 'NOUN')]

2) ascii lower-case character |%| ascii characters
[('ascii', 'NOUN'), ('lower', 'ADJ'), ('-', 'PUNCT'), ('case', 'NOUN'), ('character', 'NOUN')]
[('ascii', 'NOUN'), ('characters', 'NOUN')]

3) ascii white-space characters |%| ascii characters
[('ascii', 'NOUN'), ('white', 'ADJ'), ('-', 'PUNCT'), ('space', 'NOUN'), ('characters', 'NOUN')]
[('ascii', 'NOUN'), ('characters', 'NOUN')]
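
The check described above could be sketched roughly as follows. This is only an illustration, not the code in partial_header_match: the function names `qualifies_for_reduced_penalty` and `adjusted_penalty` are hypothetical, the inputs are assumed to be (token, POS) pairs already produced by spaCy in the form listed above, and the start/end-word filter is assumed to have matched already. Ignoring "PUNCT" tokens and allowing a "NOUN" among the unshared words (the second half of "lower-case", "white-space") is one interpretation chosen so that examples 2 and 3 qualify.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (token text, POS tag) pairs, e.g. from spaCy

REDUCED_PENALTY = 0.05  # value requested in this issue


def qualifies_for_reduced_penalty(kp: Tagged, hv: Tagged) -> bool:
    """Return True if the KP matches the ADJ/NOUN pattern described above.

    Assumes the existing filter already found that the KP starts and ends
    with the same words as the header variant.
    """
    # Drop punctuation so "lower-case" counts as the words "lower" and "case".
    kp_words = [(text.lower(), pos) for text, pos in kp if pos != "PUNCT"]
    hv_words = [(text.lower(), pos) for text, pos in hv if pos != "PUNCT"]

    # The header variant must be shorter than the KP.
    if len(hv_words) >= len(kp_words):
        return False

    hv_texts = {text for text, _ in hv_words}
    interior = kp_words[1:-1]

    # Words inside the KP that the header variant does not contain.
    extra = [(text, pos) for text, pos in interior if text not in hv_texts]
    # Everything else: the first and last words plus any shared interior words.
    shared = [kp_words[0], kp_words[-1]] + [
        (text, pos) for text, pos in interior if text in hv_texts
    ]

    has_adjective = any(pos == "ADJ" for _, pos in extra)
    extra_ok = all(pos in ("ADJ", "NOUN") for _, pos in extra)
    shared_ok = all(pos == "NOUN" for _, pos in shared)
    return has_adjective and extra_ok and shared_ok


def adjusted_penalty(kp: Tagged, hv: Tagged, default_penalty: float) -> float:
    """Return the reduced penalty for qualifying KPs, else the default."""
    if qualifies_for_reduced_penalty(kp, hv):
        return REDUCED_PENALTY
    return default_penalty
```

Under these assumptions all three examples above qualify, while a KP whose unshared word is tagged as something other than "ADJ" or "NOUN" (for instance "VERB") keeps whatever penalty the caller passes in as `default_penalty`.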


Updated by Nandini Bansal over 3 years ago

Updated by Rohit Choudhary over 3 years ago
