Replicating the BootCat method to build web corpora from search engines
This post describes a simple, up-to-date way to gather web sources through search engines by adapting the BootCat method, and discusses the method's strengths and weaknesses.
The language detector langid.py has become quite popular. Using the modernized fork py3langid as an example, I show how to maintain and optimize a Python package.