Bug #1958
Updated by Nandini Bansal almost 3 years ago
In most cases, we want to remove a KP only if it is an exact match of the document header. However, there are some cases where we also want to discard KPs that are substrings of the document header.
E.g.
If a document with the header "the python interpreter" includes the KP "interpreter", we would like to discard it.
If the header is "more on lists", we would not like to discard the KP "list". Some other cases where the KP should not be discarded are as follows:
(document header -> header variant)
1. using lists as stacks -> lists
2. nested list comprehension -> list comprehension
3. decimal floating point arithmetic -> floating point arithmetic
Check whether POS tags can help distinguish these cases. Start testing with the Tutorial book and move on to the other books gradually.
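
To make it easier to compare candidate rules (POS-based or otherwise), the examples above can be captured as a small regression table. The snippet below is only a sketch: the names `CASES`, `normalize`, `discard_exact_match`, `discard_substring` and `failing_cases` are hypothetical and not part of the existing code base, and the two rules shown are plain baselines, not the proposed fix. A POS-based rule would be plugged in as another function with the same `(header, kp)` signature.

```python
# Expected outcomes, copied from the examples in this ticket.
# Each entry: (document header, KP, should the KP be discarded?)
CASES = [
    ("the python interpreter", "interpreter", True),
    ("more on lists", "list", False),
    ("using lists as stacks", "lists", False),
    ("nested list comprehension", "list comprehension", False),
    ("decimal floating point arithmetic", "floating point arithmetic", False),
]


def normalize(text):
    """Lowercase and collapse whitespace so comparisons are not layout-sensitive."""
    return " ".join(text.lower().split())


def discard_exact_match(header, kp):
    """Baseline 1: discard only when the KP equals the header."""
    return normalize(header) == normalize(kp)


def discard_substring(header, kp):
    """Baseline 2: discard whenever the KP occurs anywhere inside the header."""
    return normalize(kp) in normalize(header)


def failing_cases(rule, cases=CASES):
    """Return the cases where `rule` disagrees with the expected outcome."""
    return [(h, k, expected) for h, k, expected in cases if rule(h, k) != expected]


if __name__ == "__main__":
    for rule in (discard_exact_match, discard_substring):
        print(rule.__name__)
        for header, kp, expected in failing_cases(rule):
            print(f"  MISMATCH header={header!r} kp={kp!r} expected_discard={expected}")
```

Neither baseline satisfies all of the examples: the exact-match rule keeps "interpreter", and the substring rule discards every KP in the keep list. A new rule is acceptable for these examples once `failing_cases` returns an empty list for it. More cases from the Tutorial book and the other books can be appended to `CASES` as they are found.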