Bits of Language: corpus linguistics, NLP and text analytics

How to make language detection with langid.py faster

The language detector langid.py has become quite popular. Using its modernized fork, py3langid, as an example, I show how to maintain and optimize a Python package.

About Adrien Barbaresi
I'm a research scientist at the Berlin-Brandenburg Academy of Sciences.

Welcome to my academic blog about web corpora, text mining, computational linguistics and digital humanities.

© 2021 Adrien Barbaresi

Content licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where indicated otherwise.