Task #1898

Updated by Nandini Bansal about 3 years ago

The repeating KPs within 50 words should be skipped as per the new scheme: 

 If multiple instances of the same KP are tagged within the threshold (50 words), e.g. k1, k2, k3 (in that order): 
 Case 5) if sk1 = sk2 = sk3, 
 all of k1, k2, k3 will be tagged, as they were previously, only if they are originally different words that are matched because their lemmatized forms are the same. 
 i.e. if the true/original forms of KPs k1, k2, k3 are the same, we shall not tag all three but k1 only. 

 A hypothetical example to explain this would be, say 
 k1 "allocators", k2 "allocation", k3 "allocations" and sk1 = sk2 = sk3 

 Then k1 and k2 will be tagged, and k3 will be skipped. 

 k1 and (k2, k3) are originally different words but are matched because they share the same lemmatized form. 
 Note: skn = sim_score of kn 

 Changes are to be made in the "check_repetition_multi_occurence" function.
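
 The rule for Case 5 could be sketched roughly as below. This is only an illustration, not the actual body of "check_repetition_multi_occurence": the `Occurrence` record, its field names, the helper name `skip_same_original_repeats`, and the original forms and score assigned in the example are all assumptions made for the sketch. It also assumes that "within 50 words" is inclusive and that a repeat is measured against the last occurrence that was kept.

```python
from dataclasses import dataclass

THRESHOLD_WORDS = 50  # threshold stated in the task


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical record for one tagged KP occurrence."""
    text: str         # surface form found in the document
    original: str     # assumed: true/original KP form before lemmatization
    lemma: str        # lemmatized form used for matching
    sim_score: float  # skn in the task description
    position: int     # word offset of the occurrence in the document


def skip_same_original_repeats(occurrences, threshold=THRESHOLD_WORDS):
    """Keep only the first of equal-score repeats that share an original form.

    Occurrences whose original forms differ are all kept, even when
    their lemmatized forms are identical.
    """
    kept = []
    for occ in sorted(occurrences, key=lambda o: o.position):
        is_repeat = any(
            prev.lemma == occ.lemma
            and prev.sim_score == occ.sim_score      # Case 5: equal scores only
            and prev.original == occ.original        # same true/original form
            and occ.position - prev.position <= threshold
            for prev in kept
        )
        if not is_repeat:
            kept.append(occ)
    return kept


# Example from the task; original forms, lemma, score and positions are made up.
occurrences = [
    Occurrence("allocators", "allocator", "alloc", 0.8, 10),    # k1
    Occurrence("allocation", "allocation", "alloc", 0.8, 25),   # k2
    Occurrence("allocations", "allocation", "alloc", 0.8, 40),  # k3
]
kept = skip_same_original_repeats(occurrences)
print([o.text for o in kept])  # ['allocators', 'allocation']
```

 Only the equal-score case is handled here; the other cases of the scheme are not described in this note and are left out of the sketch.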
