Document-driven and data-driven, standoff and inline
First of all, the purpose of the encoding can differ. Richard Eckart distinguishes two main trends: document-driven XML and data-driven XML. While the first uses an “inline approach” and is “usually easily human-readable and meaningful even without the annotations”, the latter is “geared towards machine processing and functions like a database record. […] The order of elements often is meaningless.” (Eckart 2008, p. 3)
In fact, several architectural choices depend on the goal of an XML annotation. The main division is between standoff and inline XML (also spelled stand-off and in-line).
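To make the distinction concrete, here is a minimal sketch in Python. The element names (annotations, mark, s, w) and the offset attributes are invented for illustration and do not follow PAULA, GrAF or any other real schema. The standoff layer leaves the raw text untouched and points into it by character offsets, whereas the inline version wraps the text itself in tags. The sketch assumes that the annotated spans do not overlap.

```python
import xml.etree.ElementTree as ET

# Primary data: standoff annotation never modifies the raw text.
TEXT = "Dogs bark"

# Standoff layer (hypothetical schema): each mark points into TEXT
# by character offsets instead of wrapping the text.
STANDOFF = """<annotations>
  <mark start="0" end="4" pos="NOUN"/>
  <mark start="5" end="9" pos="VERB"/>
</annotations>"""


def append_text(root, last, chunk):
    """Attach unannotated text either to the root or after the last child."""
    if not chunk:
        return
    if last is None:
        root.text = (root.text or "") + chunk
    else:
        last.tail = (last.tail or "") + chunk


def standoff_to_inline(text, standoff_xml):
    """Build an inline document by wrapping each annotated span in <w>.

    Assumes the spans are non-overlapping.
    """
    marks = ET.fromstring(standoff_xml).findall("mark")
    marks.sort(key=lambda m: int(m.get("start")))
    root = ET.Element("s")
    cursor, last = 0, None
    for m in marks:
        start, end = int(m.get("start")), int(m.get("end"))
        append_text(root, last, text[cursor:start])  # text between spans
        last = ET.SubElement(root, "w", pos=m.get("pos"))
        last.text = text[start:end]
        cursor = end
    append_text(root, last, text[cursor:])  # trailing text, if any
    return root


inline = standoff_to_inline(TEXT, STANDOFF)
print(ET.tostring(inline, encoding="unicode"))
# → <s><w pos="NOUN">Dogs</w> <w pos="VERB">bark</w></s>
```

Stripping the tags from the inline result gives back the original text, which is what makes the document-driven form readable without its annotations. The standoff form, by contrast, is only interpretable together with the text it points to.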
The Paula format (“Potsdamer Austauschformat für linguistische Annotation”, ‘Potsdam Interchange Format for Linguistic Annotation’) adopts both approaches. So did Nancy Ide for the ANC Project: a series of tools enables users to convert the data between well-known formats (GrAF standoff, GrAF inline, GATE or UIMA). This versatility is an advantage, since one cannot expect corpus users to change their habits for the sake of a single corpus. As for how standoff and inline annotation compare, Dipper et al. (2007) found that the inline format (with pointers) performs better.