Bug #1958

Updated by Nandini Bansal almost 3 years ago

In most cases, we want to remove a KP only if it exactly matches the document header. In some cases, however, we also want to discard KPs that are substrings of the document header.

 E.g.
 If a document with the header "the python interpreter" includes the KP "interpreter", we would like to discard it.

 If the header is "more on lists", we would not like to discard the KP "list". Some other cases where the KP should not be discarded are as follows:
 (document header -> header variant) 
 1. using lists as stacks -> lists 
 2. nested list comprehension -> list comprehension 
 3. decimal floating point arithmetic -> floating point arithmetic 

 Check whether POS tags can help here. Start testing with the Tutorial book and move to other books gradually.
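
 As a starting point for the POS-tag experiment, here is a minimal sketch of one possible rule. It is only a guess that happens to agree with the examples in this ticket, not a confirmed solution: discard the KP if it exactly matches the header, or if the header (ignoring determiners) consists only of nouns and the KP is its trailing part. The names `should_discard`, `demo_tagger` and `DEMO_TAGS` are hypothetical, and the tags are hand-written Penn Treebank-style tags for the example words only. A real tagger would have to be plugged in and may tag some of these words differently.

```python
from typing import Callable, List, Tuple

# A tagger takes a list of tokens and returns (token, POS tag) pairs.
Tagger = Callable[[List[str]], List[Tuple[str, str]]]

# ASSUMPTION: hand-written tags covering only the words in this ticket.
DEMO_TAGS = {
    "the": "DT", "python": "NN", "interpreter": "NN",
    "more": "JJR", "on": "IN", "lists": "NNS", "list": "NN",
    "using": "VBG", "as": "IN", "stacks": "NNS",
    "nested": "JJ", "comprehension": "NN",
    "decimal": "JJ", "floating": "VBG", "point": "NN", "arithmetic": "NN",
}


def demo_tagger(tokens: List[str]) -> List[Tuple[str, str]]:
    """Stand-in tagger for the demo; unknown words default to NN."""
    return [(tok, DEMO_TAGS.get(tok, "NN")) for tok in tokens]


def should_discard(header: str, kp: str, tagger: Tagger = demo_tagger) -> bool:
    """Hypothetical rule: True if the KP should be dropped for this header."""
    header_tokens = header.lower().split()
    kp_tokens = kp.lower().split()
    if not kp_tokens:
        return False

    # Exact match with the header: always discard.
    if kp_tokens == header_tokens:
        return True

    # Ignore determiners ("the", "a", ...) when inspecting the header.
    tagged = [(tok, tag) for tok, tag in tagger(header_tokens) if tag != "DT"]
    content_tokens = [tok for tok, _ in tagged]

    # The header must be a pure noun sequence, e.g. "python interpreter".
    all_nouns = all(tag.startswith("NN") for _, tag in tagged)

    # The KP must be the trailing tokens of the header.
    is_suffix = (
        len(kp_tokens) <= len(content_tokens)
        and content_tokens[-len(kp_tokens):] == kp_tokens
    )
    return all_nouns and is_suffix


if __name__ == "__main__":
    cases = [
        ("the python interpreter", "interpreter"),
        ("more on lists", "list"),
        ("using lists as stacks", "lists"),
        ("nested list comprehension", "list comprehension"),
        ("decimal floating point arithmetic", "floating point arithmetic"),
    ]
    for header, kp in cases:
        print(f"{header!r} / {kp!r}: discard={should_discard(header, kp)}")
```

 With the hand-written tags, only the "the python interpreter" / "interpreter" pair is discarded; the other four KPs are kept because their headers contain an adjective, verb or preposition. Whether this holds with a real tagger and on other books still needs to be tested.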
