
Task #1898

Skip the repeating KPs within 50 words according to new scheme - Contd

Added by Nandini Bansal about 3 years ago. Updated almost 3 years ago.

Status: In Progress
Priority: Normal
Target version:
Start date: 11/17/2021
Due date:
% Done: 0%
Estimated time: 3.00 h

Description

Repeated KPs within 50 words should be skipped according to the new scheme:

If multiple instances of the same KP are tagged within the threshold (50 words), e.g. k1, k2, k3 (in that order):
Case 5) If sk1 = sk2 = sk3:
k1, k2 and k3 were previously all getting tagged because their lemmatized forms are the same.
Instead, if the true/original forms of k1, k2 and k3 are the same, we shall not tag all three but only k1.

A hypothetical example: say
k1 = "allocators", k2 = "allocation", k3 = "allocations", and sk1 = sk2 = sk3.

Then only k1 and k2 will be tagged.

k1 and (k2, k3) are originally different, but all three were getting tagged because of their shared lemmatized form.
Note: skn = sim_score of kn.

Changes are to be made in the "check_repetition_multi_occurence" function.
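A minimal Python sketch of the Case 5 rule, for illustration only. Everything in it is an assumption: the `KPOccurrence` record and its fields, the `skip_case5_repeats` name and the shared lemma "alloc" are hypothetical and do not reflect the actual signature of check_repetition_multi_occurence. The ticket's example groups k2 and k3 under one original form, so the sketch assumes each occurrence carries an `original` field in which "allocation" and "allocations" coincide; how that form is derived is not specified in the ticket. It also assumes the 50-word window is measured from an already-tagged occurrence.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class KPOccurrence:
    text: str         # surface text in the document, e.g. "allocations"
    original: str     # hypothetical: the KP's true/original form
    lemma: str        # lemmatized form used for matching
    position: int     # word index of the occurrence in the document
    sim_score: float  # skn in the ticket


def skip_case5_repeats(occurrences: List[KPOccurrence],
                       threshold: int = 50) -> List[KPOccurrence]:
    """Hypothetical sketch of Case 5: an occurrence is skipped when an
    already-tagged occurrence within `threshold` words has the same lemma,
    the same sim_score and the same original form."""
    tagged: List[KPOccurrence] = []
    for occ in sorted(occurrences, key=lambda o: o.position):
        is_repeat = any(
            prev.lemma == occ.lemma
            and math.isclose(prev.sim_score, occ.sim_score)  # sk equality
            and prev.original == occ.original
            and occ.position - prev.position <= threshold
            for prev in tagged
        )
        if not is_repeat:
            tagged.append(occ)
    return tagged


# The ticket's hypothetical example: shared lemma, sk1 = sk2 = sk3.
# The lemma "alloc" and the word positions are made up for illustration.
k1 = KPOccurrence("allocators", "allocator", "alloc", 3, 0.8)
k2 = KPOccurrence("allocation", "allocation", "alloc", 20, 0.8)
k3 = KPOccurrence("allocations", "allocation", "alloc", 41, 0.8)
print([o.text for o in skip_case5_repeats([k1, k2, k3])])
# prints ['allocators', 'allocation']
```

Comparing only against already-tagged occurrences means a KP can be tagged again once its previous tag is more than 50 words back; measuring the window from every earlier occurrence, tagged or not, is an equally plausible reading of the ticket.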
