I recently introduced at the LTC‘13 conference a tool I developed to help performing fast text analysis on web corpora: a one-pass valency-oriented chunker for German.
“It turns out that topological fields together with chunked phrases provide a solid basis for a robust analysis of German sentence structure.” E. W. Hinrichs, “Finite-State Parsing of German”, in Inquiries into Words, Constraints and Contexts, A. Arppe and et al. (eds.), Stanford: CSLI Publications, pp. 35–44, 2005.
Non-finite state parsers provide fine-grained information but they are computationally demanding, so that it can be interesting to see how far a shallow parsing approach is able to go.
The transducer described here consists in a pattern-based matching operation of POS-tags using regular expressions that takes advantage of the characteristics of German grammar. The process aims at finding linguistically relevant phrases with a good precision, which enables in turn an estimation of the actual valency of a given verb.
The chunker reads its input exactly once instead of using cascades, which greatly benefits computational efficiency.
This finite-state chunking approach does not return a tree structure, but rather yields various kinds of linguistic information useful to the language researcher: possible applications include …more ...