Blind reason, Leibniz and the age of cybernetics
Bibliography and links updates
Philosophy of technology, how things started: a typology
Philosophy of technology: a few resources
Three series of recorded lectures
Commented bibliography on readability assessment
Comparison of Features for Automatic Readability Assessment: review
A short bibliography on Latent Semantic Analysis and Indexing
How to download web pages in parallel and follow politeness rules in Python
An easy way to save time and resources: content-aware URL filtering
Web scraping with R: Text and metadata extraction
Using RSS and Atom feeds to collect web pages with Python
Validating TEI-XML documents with Python
Extracting the main text content from web pages using Python
A module to extract date information from web pages
Indexing text with ElasticSearch
Parsing and converting HTML documents to XML format using Python’s lxml
Rule-based URL cleaning for text collections
Guessing if a URL points to a WordPress blog
Batch file conversion to the same encoding on Linux
Recipes for several model fitting techniques in R
Data analysis and modeling in R: a crash course
Find and delete LaTeX temporary files
Franco-German workshop series on the historical illustrated press
On the creation and use of social media resources
On the interest of social media corpora
Finding viable seed URLs for web corpora
A few links on producing posters using LaTeX
Workshop on Complexity in Language – Day 2 (report)
Replicating the BootCat method to build web corpora from search engines
Two studies on toponyms in literary texts
Collection and indexing of tweets with a geographical focus
Analysis of the German Reddit corpus
Challenges in web corpus construction for low-resource languages
Review of the Czech internet corpus
Batch file conversion to the same encoding on Linux
What is good enough to become part of a web corpus?
Feeding the COW at the FU Berlin
Two open-source corpus-builders for German and French
2nd release of the German Political Speeches Corpus
XML standards for language corpora (review)
Canadian research on readability in the ’90s
Word lists, word frequency and contextual diversity
Parallel work with two taggers
Introducing the German Political Speeches Corpus and Visualization Tool
Quick review of the Falko Project
Building a topic-specific corpus out of two different corpora
An easy way to save time and resources: content-aware URL filtering
Extracting the main text content from web pages using Python
A module to extract date information from web pages
Guessing if a URL points to a WordPress blog
Introducing the Microblog Explorer
Data analysis and modeling in R: a crash course
Ludovic Tanguy on Visual Analysis of Linguistic Data
On global vs. local visualization of readability
Microsoft to analyze social networks to determine comprehension level
Replicating the BootCat method to build web corpora from search engines
How to make language detection with langid.py faster
How to download web pages in parallel and follow politeness rules in Python
Web scraping with Trafilatura just got faster
Using a rule-based tokenizer for German
A simple multilingual lemmatizer for Python
Validating TEI-XML documents with Python
Extracting the main text content from web pages using Python
A module to extract date information from web pages
Parsing and converting HTML documents to XML format using Python’s lxml
Review of the readability checker DeLite
On global vs. local visualization of readability
“Gerolinguistics” and text comprehension
Microsoft to analyze social networks to determine comprehension level
Amazon’s readability statistics by example
Interview with children’s books author Sabine Ludwig
Tendencies in research on readability
A note on Amazon’s text readability stats
Lord Kelvin, Bachelard and Dilbert on Measurement
Renate Bartsch on linguistic complexity
E. Castello, Text Complexity and Reading Comprehension Tests - Reading Notes
Commented bibliography on readability assessment
Comparison of Features for Automatic Readability Assessment: review
Review of the Czech internet corpus
Overview of URL analysis and classification methods
A note on Computational Models of Psycholinguistics
Feeding the COW at the FU Berlin
Ludovic Tanguy on Visual Analysis of Linguistic Data
Review of the readability checker DeLite
On global vs. local visualization of readability
“Gerolinguistics” and text comprehension
Microsoft to analyze social networks to determine comprehension level
XML standards for language corpora (review)
Canadian research on readability in the ’90s
Word lists, word frequency and contextual diversity
Tendencies in research on readability
Introducing the German Political Speeches Corpus and Visualization Tool
“Googleology is bad science”: Anatomy of a web corpus infrastructure
Web scraping with Trafilatura just got faster
Using a rule-based tokenizer for German
Evaluation of date extraction tools for Python
Evaluating scraping and text extraction tools for Python
Validating TEI-XML documents with Python
Batch file conversion to the same encoding on Linux
Two open-source corpus-builders for German and French
Parallel work with two taggers
Crawling a newspaper website to build a corpus
“Googleology is bad science”: Anatomy of a web corpus infrastructure
Replicating the BootCat method to build web corpora from search engines
How to download web pages in parallel and follow politeness rules in Python
Web scraping with Trafilatura just got faster
Web scraping with R: Text and metadata extraction
Using RSS and Atom feeds to collect web pages with Python
Using sitemaps to crawl websites on the command-line
Filtering links to gather texts on the web
Evaluation of date extraction tools for Python
Evaluating scraping and text extraction tools for Python
Validating TEI-XML documents with Python
Extracting the main text content from web pages using Python
A module to extract date information from web pages
Ad hoc and general-purpose corpus construction from web sources
Two studies on toponyms in literary texts
Distant reading and text visualization
Analysis of the German Reddit corpus
Data analysis and modeling in R: a crash course
Ludovic Tanguy on Visual Analysis of Linguistic Data
On global vs. local visualization of readability
Amazon’s readability statistics by example
“Googleology is bad science”: Anatomy of a web corpus infrastructure
Replicating the BootCat method to build web corpora from search engines
Using sitemaps to crawl websites on the command-line
Filtering links to gather texts on the web
Evaluation of date extraction tools for Python
Evaluating scraping and text extraction tools for Python
Extracting the main text content from web pages using Python
A module to extract date information from web pages
On the interest of social media corpora
Collection and indexing of tweets with a geographical focus
Analysis of the German Reddit corpus
Finding viable seed URLs for web corpora
Challenges in web corpus construction for low-resource languages
Review of the Czech internet corpus
“Googleology is bad science”: Anatomy of a web corpus infrastructure
How to download web pages in parallel and follow politeness rules in Python
An easy way to save time and resources: content-aware URL filtering
Web scraping with R: Text and metadata extraction
Using sitemaps to crawl websites on the command-line
Ad hoc and general-purpose corpus construction from web sources
Finding viable seed URLs for web corpora
Challenges in web corpus construction for low-resource languages
Guessing if a URL points to a WordPress blog
Overview of URL analysis and classification methods
Introducing the Microblog Explorer
What is good enough to become part of a web corpus?
Feeding the COW at the FU Berlin
Two open-source corpus-builders for German and French
Crawling a newspaper website to build a corpus
Building a basic specialized crawler
How to make language detection with langid.py faster
How to download web pages in parallel and follow politeness rules in Python
Web scraping with Trafilatura just got faster
Using RSS and Atom feeds to collect web pages with Python
A simple multilingual lemmatizer for Python
Using sitemaps to crawl websites on the command-line
Filtering links to gather texts on the web
Evaluating scraping and text extraction tools for Python
Validating TEI-XML documents with Python
Extracting the main text content from web pages using Python