This post introduces two ways to validate XML documents in Python according the guidelines of the Text Encoding Initiative, using a format commonly known as TEI-XML. The first one takes a shortcut using a library I am working on, while the second one shows an exhaustive way to perform the operation.
Both ground on LXML, an efficient library for processing XML and HTML. The following lines of code will try to parse and validate a document in the same directory as the terminal window or Python console.
Shortcut with the trafilatura library
I am currently using this web scraping …more ...