Bug #1616

Some unwanted removals of KPs starting with VBG/IN

Added by Nandini Bansal about 3 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Target version:
Start date: 09/06/2021
Due date:
% Done: 0%
Estimated time: 5.00 h

Description

As per Issues #1514 and #1520, we discarded KPs that start with any of the POS tags "IN", "VBG", or "VB". However, I just noticed that along with the bad ones like "preceding regular expressions", "unforgiving regular expressions", etc., we are also losing some good ones like:

1. defining a function -> defining and using functions
2. defining a function -> defining functions
3. importing standard library modules -> importing from python's standard library
4. while true loop -> while loops

The examples listed above are from the Whirlwind book. I am sure there must be other examples in the Whirlwind, C API, and Tutorial books as well.
Part A: Identify more such cases, if there are any.
Part B: Make changes to ensure that the above cases and similar ones are not skipped. We can leverage the information provided by the POS tags for this. A basic outline of the idea: keep a KP that starts with one of the filtered tags only if its first word matches the first word of a header variant (comparing stemmed forms), and then apply some additional filters based on NOUNs.
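
The outline in Part B could be sketched roughly as follows. This is only an illustration, not the project's actual code: `crude_stem`, `has_noun`, and `keep_kp` are hypothetical names, the suffix stripper is a stand-in for a real stemmer, the tokens are assumed to be already tagged with Penn Treebank tags, and the "must contain a noun after the first word" rule is just one possible NOUN-based filter. It also assumes the arrows in the list above read as "KP -> header variant".

```python
from typing import List, Tuple

# A phrase is a list of (token, Penn Treebank POS tag) pairs, assumed pre-tagged.
Tagged = List[Tuple[str, str]]

# Leading tags that the earlier issues treated as grounds for discarding a KP.
FILTERED_START_TAGS = {"IN", "VBG", "VB"}


def crude_stem(word: str) -> str:
    """Very rough suffix stripper; a stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def has_noun(tagged: Tagged) -> bool:
    """True if any token carries a noun tag (NN, NNS, NNP, NNPS)."""
    return any(tag.startswith("NN") for _, tag in tagged)


def keep_kp(kp: Tagged, header: Tagged) -> bool:
    """Decide whether a KP should survive the leading-tag filter."""
    if not kp:
        return False
    first_word, first_tag = kp[0]
    if first_tag not in FILTERED_START_TAGS:
        return True  # the filter only targets KPs with these leading tags
    if not header:
        return False
    # Rescue the KP only if its first word matches the header's first word
    # (by stem) and the rest of the KP contains at least one noun.
    same_first = crude_stem(first_word) == crude_stem(header[0][0])
    return same_first and has_noun(kp[1:])


good_kp = [("defining", "VBG"), ("a", "DT"), ("function", "NN")]
good_header = [("defining", "VBG"), ("functions", "NNS")]

bad_kp = [("preceding", "VBG"), ("regular", "JJ"), ("expressions", "NNS")]
# Hypothetical header, chosen only to illustrate a first-word mismatch.
bad_header = [("regular", "JJ"), ("expressions", "NNS")]

print(keep_kp(good_kp, good_header))
print(keep_kp(bad_kp, bad_header))
```

With these inputs the first KP is kept and the second is discarded. A real implementation would need a proper stemmer and tagger, and probably stricter noun filters than the single check used here.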

#1

Updated by Nandini Bansal about 3 years ago

  • Estimated time set to 5.00 h
#2

Updated by Rohit Choudhary about 3 years ago

  • Status changed from New to Resolved
#3

Updated by Nandini Bansal about 3 years ago

  • Status changed from Resolved to Closed
