Task #3052
Sophisticated deduplication for getPurpleLinksInfoExtFull API results
Start date: 05/30/2023
Due date:
% Done: 100%
Estimated time:
Description
Today, we already avoid sending exact duplicates. Going forward, also treat pairs like the following as duplicates:
- 'defining a function', 'define function'
- 'threads', 'threading'
- 'migrate', 'migration'
- 'fetch the data', 'fetch data'
- 'socket program', 'socket programming'
- 'installing anaconda', 'install anaconda'
An example URL that contains the above is https://www.youtube.com/watch?v=8O5kX73OkIY&list=PLsyeobzWxl7poL9JTVyndKe62ieoN-MZ3&index=54.
Other than (a) accounting for articles such as 'a' and 'the' between words when comparing, (b) this involves comparing stems. The stem comparison and (c) the plurals work could use a Java NLP library. You could pick the simplest one from the list at https://xperti.io/blogs/java-natural-language-processing/.
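A minimal sketch of the comparison described above, for discussion only. It assumes phrases are compared by a normalized key: lowercase, drop articles, stem each remaining word. The class name PhraseDeduper and its methods are hypothetical, and the stem() method is a deliberately crude hand-written suffix stripper standing in for a real stemmer from whichever Java NLP library gets picked. It is only tuned to the six pairs listed in this task and will mis-stem many other words.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; not part of the existing getPurpleLinksInfoExtFull code.
final class PhraseDeduper {
    private static final Set<String> ARTICLES = Set.of("a", "an", "the");

    // Crude stand-in for a library stemmer (assumption: to be replaced by a real one).
    static String stem(String word) {
        String w = word;
        if (w.length() > 5 && w.endsWith("ing")) {
            w = w.substring(0, w.length() - 3);          // threading -> thread
        } else if (w.length() > 5 && w.endsWith("ion")) {
            w = w.substring(0, w.length() - 3);          // migration -> migrat
        } else if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            w = w.substring(0, w.length() - 1);          // threads -> thread (plurals)
        }
        if (w.length() > 3 && w.endsWith("e")) {
            w = w.substring(0, w.length() - 1);          // migrate -> migrat
        }
        int n = w.length();
        // Collapse a trailing doubled consonant: programm -> program, install -> instal.
        if (n > 2 && w.charAt(n - 1) == w.charAt(n - 2)
                && "aeiou".indexOf(w.charAt(n - 1)) < 0) {
            w = w.substring(0, n - 1);
        }
        return w;
    }

    // Normalized key: lowercase, split on non-alphanumerics, drop articles, stem.
    static String key(String phrase) {
        List<String> stems = new ArrayList<>();
        for (String token : phrase.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || ARTICLES.contains(token)) {
                continue;
            }
            stems.add(stem(token));
        }
        return String.join(" ", stems);
    }

    // Keeps the first phrase seen for each key, preserving input order.
    static List<String> dedupe(List<String> phrases) {
        Map<String, String> firstByKey = new LinkedHashMap<>();
        for (String phrase : phrases) {
            firstByKey.putIfAbsent(key(phrase), phrase);
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "threads", "threading", "fetch the data", "fetch data", "socket program");
        // prints [threads, fetch the data, socket program]
        System.out.println(dedupe(input));
    }
}
```

Keeping the first occurrence per key matches the current exact-duplicate behaviour most closely. Swapping stem() for a library stemmer should only require changing that one method, though the keys it produces would then differ from the ones shown in the comments.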