How to make language detection with langid.py faster
A simple multilingual lemmatizer for Python
Evaluation of date extraction tools for Python
Evaluating scraping and text extraction tools for Python
Indexing text with ElasticSearch
Parsing and converting HTML documents to XML format using Python’s lxml
A note on Computational Models of Psycholinguistics
Review of the readability checker DeLite
On global vs. local visualization of readability
“Gerolinguistics” and text comprehension
Microsoft to analyze social networks to determine comprehension level
Amazon’s readability statistics by example
Canadian research on readability in the ’90s
Word lists, word frequency and contextual diversity
Interview with children’s books author Sabine Ludwig
Tendencies in research on readability
A note on Amazon’s text readability stats
Workshop on Complexity in Language – Day 2 (report)
Workshop on Complexity in Language – Day 1 (report)
Simon, Gell-Mann and Lloyd on complex systems
Melanie Mitchell: defining and measuring complexity
Renate Bartsch on linguistic complexity
E. Castello, Text Complexity and Reading Comprehension Tests - Reading Notes
Commented bibliography on readability assessment
Comparison of Features for Automatic Readability Assessment: review
Ad hoc and general-purpose corpus construction from web sources
Collection and indexing of tweets with a geographical focus
Analysis of the German Reddit corpus
Review of the Czech internet corpus
2nd release of the German Political Speeches Corpus
Introducing the German Political Speeches Corpus and Visualization Tool
Replicating the BootCat method to build web corpora from search engines
How to download web pages in parallel and follow politeness rules in Python
An easy way to save time and resources: content-aware URL filtering
Web scraping with R: Text and metadata extraction
Using a rule-based tokenizer for German
Using RSS and Atom feeds to collect web pages with Python
Using sitemaps to crawl websites on the command-line
Validating TEI-XML documents with Python
Extracting the main text content from web pages using Python
A module to extract date information from web pages
Rule-based URL cleaning for text collections
Recipes for several model fitting techniques in R
Data analysis and modeling in R: a crash course
Completing web pages on the fly with JavaScript
Display long texts with CSS, tutorial and example
Crawling a newspaper website to build a corpus
Building a basic specialized crawler