Bug #1644
Testing Change: Modify how header variants are generated by variations_in_common_section_words
0%
Description
The header variants generated by variations_in_common_section_words need to change. The function uses "sect_common_words", a list of the 50 most common words in the dataset. In the current implementation, we check whether a header variant starts or ends with a word in "sect_common_words" and, if so, strip that word from the variant to produce a new header variant.
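To make the current behaviour concrete, here is a minimal sketch of it. It is not the actual code of variations_in_common_section_words. The helper name strip_one_common_edge_word and the small SAMPLE_SECT_COMMON_WORDS set are hypothetical stand-ins, and the sketch assumes headers are split on whitespace, words are compared case-insensitively, and at most one common word is removed from each end to form a single new variant.

```python
from typing import Optional, Set

# Hypothetical stand-in for the real 50-word "sect_common_words" list.
SAMPLE_SECT_COMMON_WORDS: Set[str] = {
    "the", "a", "of", "to", "introduction", "section", "chapter",
}


def strip_one_common_edge_word(header: str, common_words: Set[str]) -> Optional[str]:
    """Assumed current behaviour: drop at most one common word from each end."""
    words = header.split()
    original = " ".join(words)
    # Remove a single leading common word, if present.
    if words and words[0].lower() in common_words:
        words = words[1:]
    # Remove a single trailing common word, if present.
    if words and words[-1].lower() in common_words:
        words = words[:-1]
    stripped = " ".join(words)
    # Only report a new variant if something was removed and something is left.
    if stripped and stripped != original:
        return stripped
    return None
```

With this sketch, "Introduction to the Python Language" becomes "to the Python Language": only "Introduction" is removed, and the leading "to the" remains, which is the limitation this issue targets.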
With the new testing change, instead of checking only one word at the start and one at the end, we need to keep stripping words from each end until we reach a word that is not in "sect_common_words". The remaining string (if any) is then added as a header variant.
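The requested change can be sketched as follows, under the same assumptions as before (whitespace tokenization, case-insensitive matching, punctuation not handled). The helper strip_common_edge_words and SAMPLE_SECT_COMMON_WORDS are hypothetical illustrations, not the project's code. Two indices move inward from each end and stop at the first word that is not common; whatever lies between them is the leftover string.

```python
from typing import Optional, Set

# Hypothetical stand-in for the real 50-word "sect_common_words" list.
SAMPLE_SECT_COMMON_WORDS: Set[str] = {
    "the", "a", "of", "to", "introduction", "section", "chapter",
}


def strip_common_edge_words(header: str, common_words: Set[str]) -> Optional[str]:
    """Strip common words from both ends until a non-common word is reached."""
    words = header.split()
    start, end = 0, len(words)
    # Skip leading common words.
    while start < end and words[start].lower() in common_words:
        start += 1
    # Skip trailing common words (never crossing the start index).
    while end > start and words[end - 1].lower() in common_words:
        end -= 1
    leftover = " ".join(words[start:end])
    # Add a variant only if something was stripped and a non-empty string remains.
    if leftover and (start > 0 or end < len(words)):
        return leftover
    return None
```

Here "Introduction to the Python Language" yields "Python Language", "Overview of the Section" yields "Overview", and a header made up entirely of common words (e.g. "The Chapter") yields no variant, matching "the leftover string (if any)".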
Testing with Whirlwind Book, Tutorial Book & Lib Ref Book