Syntactic Annotation

1. General Issues

Token boundaries

Broadly speaking, one token equates to one matrix sentence (IP-MAT), including an embedded clause if present. But note the following:

  • When two independent finite clauses are conjoined, the two clauses are treated as separate tokens:
(IP-MAT (NP-OB1 (DPDS Dat))     ← first independent clause as a token
        (VVFIN beleueden) 
        (NP-SBJ (PPER se)
                (DIN alle))
        (PP (APPR myt)
            (NP (NA willen)))
)

(IP-MAT (KON und)               ← second independent clause as a token
        (NP-SBJ *con*)      
        (VVFIN scheiden)
        (PP (APPR van)
            (NP (PPER em)))
)
  • Direct speech which constitutes an IP-MAT can sit within a higher IP-MAT introducing the speech. The direct speech matrix clause gets the extended tag -SPE:
(IP-MAT (ADVP (ADV DO))
        (VVFIN sprak)
        (NP-SBJ (PPER he))
        (IP-MAT-SPE (NP-SBJ (PPER ik))    ← IP which is direct speech
                    (PTKNEG ne)
                    (VVFIN bin))
)
  • In contrast to the treatment of direct speech (see above), cases where a citation which is an IP-MAT is introduced by e.g. ‘X writes’ are treated as separate tokens:
(IP-MAT (NP-SBJ (NE Salustius))   ← first token
        (VVFIN scrift)
)

(IP-MAT (NP-SBJ (DDARTA de)       ← second token
                (FM Troyani))
        (VAFIN hebben)
        (NP-OB1 (NE rome))
        (VVPP ghebuwet)
)
  • Chapter and section headings are treated as standalone tokens and are tagged IP-MAT if they constitute an independent finite clause, or otherwise FRAG:

TODO: Tag these as CODE also?

(FRAG (PP (APPR Van)
          (NP (DDARTA dem)
              (NA Borchgherichte)))
)
Herford
  • Places and dates given for the time of writing are also treated as standalone tokens and are tagged FRAG:
(FRAG (FM proximo)
      (FM libro)
      (FM de)
      (FM ciuitate)
      (FM dei)
)
Engelhus
(FRAG (XY 2.000dcclxxx)
      (FM Abbon)
)
Engelhus

 

Token IDs

TODO: add at a later stage

Phrase structure

The annotation is encoded via labelled bracketing and is designed to facilitate efficient searching, rather than to reflect a particular analysis of Middle Low German.

Some key points:

  • Relatively flat trees
  • Multiple branching is possible
  • No VP – the verb and its objects are sisters and immediately dominated by IP
  • No intermediate phrase levels (i.e. bar-levels)
  • NP arguments are distinguished from NP adjuncts at IP-level
  • PP arguments are not distinguished from PP adjuncts
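
Because the annotation is plain labelled bracketing, a tree can be read into a nested structure with very little code. The following is a minimal sketch in Python, not part of the corpus tooling: the function name `parse` and the nested-list representation are assumptions made for illustration, and margin notes introduced by `←` are simply discarded.

```python
import re

def parse(text):
    """Parse one labelled-bracketing tree into nested lists.

    A node becomes [label, daughter, ...]; a word or empty category
    stays a plain string. Margin notes starting with '←' are dropped.
    """
    text = re.sub(r"←.*", "", text)
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            leaf = tokens[pos]
            pos += 1
            return leaf
        pos += 1                      # skip "("
        tree = [tokens[pos]]          # node label
        pos += 1
        while tokens[pos] != ")":
            tree.append(node())
        pos += 1                      # skip ")"
        return tree

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree

example = """
(IP-MAT (KON und)
        (NP-SBJ *con*)      ← elided subject
        (VVFIN scheiden)
        (PP (APPR van)
            (NP (PPER em)))
)
"""
tree = parse(example)
print(tree[0])            # the token-level label
print(len(tree) - 1)      # number of immediate daughters of IP-MAT
```

The flat design of the trees shows up directly in the result: the verb, its subject and the PP are all immediate daughters of the IP-MAT node, with no intervening VP.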

Heads and phrases

In general, heads project a corresponding phrase.

Categories which never project a phrase:

  • verbs (V)
  • determiners (D)
  • particles (PTK)
  • single-word modifiers (see below)
  • interjections (ITJ)

Categories which can project a phrase but can also be immediately dominated by IP:

  • conjunctions (KON)

Phrase types which do not necessarily have a head of the same category:

  • IP – since there is no I tag for verbs
  • NP – which may have a noun (NA) as its head, but can also be headed by a personal pronoun (PPER/PRF), a demonstrative pronoun (DPDS/DPIS) or a proper noun (NE).

A foreign word (tagged FM) may also head a phrase, resulting in ‘exocentricity’:

(PP (FM Ad)             ← FM heads PP
    (NP (FM hebreos))   ← FM heads NP
)
Engelhus

Complements

Complements always project a phrase.

Modifiers

Modifiers are treated as daughters of the phrasal node and sisters to the head.

Note that:

  • A modifier only projects a phrase when it is itself modified:
(NP-OB1 (DDARTA de)
        (ADJA ersten)  ← modifier which is not further modified; does not project a phrase
        (NA troyen)
)
Engelhus
(NP-SBJ (ADJP (DDA sodane)  ← modifier of modifier
              (ADJA edel))  ← modifier which is further modified; projects a phrase
        (NA land)
)

Selectional restrictions

The annotation observes certain selectional restrictions:

  • Prepositions should have exactly one nominal (NP) or clausal (CP) complement.
  • Any NP which is immediately contained by another NP must be a nominal complement (NP-COMP), genitive (NP-POS) or appositive (NP-PRN).
  • All finite clauses must have a subject (NP-SBJ), whether overt or empty.
  • Certain categories can occur a maximum of once per clause (but need not appear at all). This applies to direct objects (NP-OB1), indirect objects (NP-OB2), nominal predicates (NP-PRD) and adjectival predicates (ADJP-PRD).
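
Two of these restrictions lend themselves to a mechanical check: the subject requirement for finite clauses and the once-per-clause limit. The sketch below is illustrative only. The names `parse`, `has_label` and `check` are hypothetical, the nested-list tree representation is an assumption, and the check is deliberately naive (for instance, it would also report a gapped clause that legitimately lacks a subject).

```python
import re

ONCE_PER_CLAUSE = ("NP-OB1", "NP-OB2", "NP-PRD", "ADJP-PRD")
FINITE = ("IP-MAT", "IP-SUB")

def parse(text):
    """Read one bracketed tree into nested lists: [label, daughter, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", re.sub(r"←.*", "", text))
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    return stack[0][0]

def has_label(label, base):
    """True if label is base itself or base plus extended labels or an index."""
    return label == base or label.startswith(base + "-")

def check(tree, problems=None):
    """Collect violations of the two clause-level restrictions, recursively."""
    if problems is None:
        problems = []
    if isinstance(tree, str):
        return problems
    label = tree[0]
    daughters = [d[0] for d in tree[1:] if isinstance(d, list)]
    if has_label(label, "IP"):
        for base in ONCE_PER_CLAUSE:
            n = sum(has_label(d, base) for d in daughters)
            if n > 1:
                problems.append(f"{label}: {n} x {base}")
    if any(has_label(label, f) for f in FINITE):
        if not any(has_label(d, "NP-SBJ") for d in daughters):
            problems.append(f"{label}: no NP-SBJ")
    for d in tree[1:]:
        check(d, problems)
    return problems

ok = parse("(IP-MAT (NP-OB1 (DPDS Dat)) (VVFIN beleueden) "
           "(NP-SBJ (PPER se) (DIN alle)) (PP (APPR myt) (NP (NA willen))))")
no_subject = parse("(IP-MAT (KON und) (VVFIN scheiden) "
                   "(PP (APPR van) (NP (PPER em))))")
print(check(ok))          # []
print(check(no_subject))  # ['IP-MAT: no NP-SBJ']
```

The second tree is the conjoined clause from the token-boundary examples with its empty subject removed, which is exactly the situation that the insertion of \*con\* is meant to prevent.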

 

Sentence fragments (FRAG)

The label FRAG is employed for material which consists of at least two constituents, but which can’t be represented as a full IP.

Some things to note:

  • Fragmentary clauses which stand alone are labelled as FRAG at the token level (i.e. are not tagged as IP-MAT):
(FRAG (NP (DDARTA De)
          (NA hystorie)
          (PP (APPR van) 
              (NP duldicheyt) 
              (NP-POS (DDARTA der)
                      (NA vrowen) 
                      (RRC (VVPP gheheten)
                           (NP-SMP (NE Griseldis))))))
)
  • FRAG can also occur as an immediate daughter of IP-MAT, in the case of (non-clausal) direct speech:
(IP-MAT (FRAG-SPE (PTKANT Ja))  ← non-clausal direct speech
        (VVFIN sprack)
        (NP-SBJ (PPER se))
)
(IP-MAT (ADVP (AVD Do))
        (VVFIN antworde)
        (NP-SBJ (PPER he))
        (NP-OB2 (PPER ene))
        (FRAG-SPE (PTKANT nen)) ← non-clausal direct speech
)
(IP-MAT (ADVP (AVD Do))
        (VVFIN sprak)
        (NP-SBJ (PPER se))
        (FRAG-SPE (NP-SBJ (DPNEGS neman))  ← non-clausal direct speech
                  (NP-VOC (NA here)))
)
  • FRAG is also used for parenthetical clauses which contain a good deal of foreign material (FM):
(IP-MAT (NP-SBJ (DPDS Dit))
        (VMFIN sal)
        (NP-OB2 (ADJA manigher)
                (ADJA armen)
                (NA sielen))
        (VVINF deren)
        (FRAG-PRN (FM id)         ← parenthetical clause consisting of foreign material
                  (FM est)
                  (VVFIN schaden))
)

 

Foreign material (FM)

A foreign word has the POS tag FM:

(PP (APPR van)
    (NP (DDARTA den)
        (FM phariseis))  ← foreign word
) 

Note that this is different to other Penn corpora, where foreign material is tagged as FW.

Unlike in other Penn corpora, there is no phrase-level category LATIN which immediately dominates FM. A word tagged FM can be dominated by any category. There are no restrictions here, even if this results in phrases which, taking the POS tags at face value, are not endocentric. Some examples:

(IP-MAT-SPE (NP-SBJ (PPER wi))
            (VVFIN sint)
            (NP-PRD (DDARTA dat)
                    (NA slecte)
                    (NP-POS (FM abrahe)))  ← FM heads NP-POS
)

(CP-THT (KOUS Wente)
        (IP-SUB (NP-SBJ (PPER du))
                (NP-PRD (DIARTA en)
                        (FM samaritanus))  ← FM heads NP-PRD
                (VVFIN bist))
)
(IP-IMP-SPE (NP-VOC (FM lazare))  ← FM heads NP-VOC
            (VVIMP cum)
            (ADVP (AVD hijr))
            (PTKVZ uor)
)

 

Numerals (XY)

Both Arabic and Roman numerals have the POS-tag XY (which in the HiNTs tagset stands for a ‘non-word’):

(NP-OB1 (NP-POS (DDARTA des)
                (NA volkes)
                (PP (APPR von)
                    (NP (NE Sirien))))
        (XY 100.000)     ← Arabic numeral
)
Engelhus
(FRAG (FM Dominica)
      (XY iiij)      ← Roman numeral
      (FM post)
      (FM aduentum)
)

This contrasts with numbers which are spelled out. These are tagged CARD*:

(NP-TMP (CARDA dre)  ← spelled out number
        (NA iar)
)
Engelhus

 

Direct speech (-SPE)

Generally speaking, a clause or fragment which constitutes direct speech gets the extended label -SPE:

(IP-MAT (ADVP (ADV Do))
        (VVFIN sprak)
        (NP-SBJ (PPER he))
        (IP-MAT-SPE (NP-SBJ (PPER ik))  ← IP-MAT which is direct speech
                    (PTKNEG ne)
                    (VVFIN bin))
)

Some things to note:

  • The first direct speech clause is embedded in the introductory clause, i.e. there is no token break. Subsequent direct speech clauses in a chain of direct speech are treated as separate tokens.

  • If an IP or CP which is direct speech contains a further IP or CP, then only the highest IP or CP gets the -SPE tag (unlike in the HeliPaD):

(IP-MAT-SPE (NP-SBJ (PPER Du))                   ← IP-MAT-SPE
            (VMFIN schalt)
            (NP-OB2 (PPER di))
            (PTKNEG nicht)
            (VVINF wunderen)
            (CP-ADV (KOUS wente)                 ← but not CP-ADV-SPE
                    (IP-SUB (NP-SBJ (PPER ik))   ← and not IP-SUB-SPE
                            (NP-OB2 (PPER di))
                            (VVPP secht)
                            (VAFIN hebbe)))
)

 

Interjections (ITJ)

Interjections have the POS-tag ITJ:

(IP-MAT-SPE (ITJ Ach)            ← interjection
            (NP-VOC (ADJA wise)
                    (ADJA junge)
                    (NA man))
...
)
Zeno

Note:

  • Unlike the standard Penn scheme, we do not make use of INTJP for cases of multi-word interjections. These just attach directly at clause-level:
(IP-MAT-SPE (ITJ jach)
            (ITJ Jach)
            (NP-SBJ (NA Ot))
...
)
Zeno

 

Left-dislocation (-LFD) and resumption (-RSP)

Generally speaking, the extended label for left-dislocation (-LFD) is applied to clausal or phrasal constituents in the left periphery which are overtly resumed by a coreferential phrase (for some specific exceptions to this see below). The coreferential phrase is then tagged with the extended label for resumption (-RSP):

(IP-MAT (ADVP-TMP-LFD (KOUS Do)                            ← Left-dislocated constituent
                      (CP-ADV (WADVP-1 0)
                              (IP-SUB (ADVP-TMP *T*-1)
                                      (NP-SBJ (PPER he))
                                      (NP-OB1 (DPDS dit))
                                      (VVPP gesproken)
                                      (VAFIN hadde))))
        (ADVP-TMP-RSP (AVD Do))                           ← Resumptive constituent 
...
)

This means that, wherever there is a constituent tagged -LFD, there should also be one tagged -RSP.

Left-dislocation and resumption of a subject

Where we have left-dislocation and resumption of a subject, only the resumptive constituent is tagged as a subject so as not to violate the requirement that each IP has exactly one subject.

The left-dislocated constituent is simply tagged as NP-LFD; the resumptive constituent is tagged as NP-SBJ-RSP:

(IP-MAT (NP-LFD (DPOSA Ere)         ← left-dislocated constituent 
                (NA iunkfrowen)
                ...)
        (NP-SBJ-RSP (DPDS de))      ← resumptive constituent 
        (VVFIN wusten)
        (PTKNEG nicht)
        ...
)

Clauses which have more than one -LFD/-RSP pair

Cases can arise where more than one -LFD/-RSP pair occurs in a single clause.

In such cases, the pairs are co-indexed so as to avoid mismatches:

(IP-MAT (NP-LFD-2 (FM Ieu))
        (ADVP-TMP-LFD-3 (KOUS do)
                        (CP-ADV (WADVP-1 0)
                                (IP-SUB (ADVP-TMP *T*-1)
                                        (NP-SBJ (PPER he))
                                        (VVFIN was)
                                        (NP-PRD-XXX (XY xxxii)
                                                    (NA iar)
                                                    (ADJP (ADJD olt))))))
        (ADVP-RSP-3 (AVD do))
        (VVFIN ghewan)
        (NP-SBJ-RSP-2 (PPER he))
        (NP-OB1 (NE Saruth))
)
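
The pairing requirement and the co-indexing convention can be checked together by comparing the indices found on -LFD labels with those found on -RSP labels. The following is a rough sketch under stated assumptions: the function names are hypothetical, the check works on the raw bracketed text of a single token, and it does not verify that each pair sits in the same clause.

```python
import re

def lfd_rsp_indices(token_text):
    """Return the indices found on -LFD labels and on -RSP labels.

    Labels without an index are recorded as None.
    """
    lfd, rsp = [], []
    for label in re.findall(r"\(([^\s()]+)", token_text):
        parts = label.split("-")
        index = parts[-1] if parts[-1].isdigit() else None
        if "LFD" in parts:
            lfd.append(index)
        elif "RSP" in parts:
            rsp.append(index)
    return sorted(lfd, key=str), sorted(rsp, key=str)

def pairs_match(token_text):
    """True if every -LFD has an -RSP with the same index, and vice versa."""
    lfd, rsp = lfd_rsp_indices(token_text)
    return lfd == rsp

token = """
(IP-MAT (NP-LFD-2 (FM Ieu))
        (ADVP-TMP-LFD-3 (KOUS do)
                        (CP-ADV (WADVP-1 0)
                                (IP-SUB (ADVP-TMP *T*-1)
                                        (NP-SBJ (PPER he))
                                        (VVFIN was)
                                        (NP-PRD-XXX (XY xxxii)
                                                    (NA iar)
                                                    (ADJP (ADJD olt))))))
        (ADVP-RSP-3 (AVD do))
        (VVFIN ghewan)
        (NP-SBJ-RSP-2 (PPER he))
        (NP-OB1 (NE Saruth))
)
"""
print(lfd_rsp_indices(token))
print(pairs_match(token))
```

A clause with a single unindexed pair also passes, since both lists then contain one `None`; a -LFD without any -RSP does not.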

Insertion of null resumptives (*-RSP 0)

In certain left-dislocation contexts, a null resumptive element (*-RSP 0) is inserted.

The following contexts call for insertion of a null resumptive:

  • Where there is more than one left-dislocated element and a single resumptive element which could in principle pair with either one. As a default principle, the null resumptive element is inserted immediately after the first left-dislocated constituent. This insertion means that each left-dislocated constituent has a corresponding resumptive element, and the coindexation principle can be applied (see above):
(IP-MAT (CP-ADV-LFD-1 (C-V1 0)
                      (IP-SUB (VVFIN Is)
                              (NP-SBJ-2 (DPDS dat)) 
                              (ADVP (AVKO also)) 
                              (CP-THT-2 (KOUS dat) ...)))
(ADVP-RSP-1 0) 
(CP-ADV-LFD-3 (C-V1 0)
              (IP-SUB (VVFIN steruet) 
                      (NP-SBJ (DDARTA der)
                              (DPIS eyn))))
(ADVP-RSP-3 (AVKO so)) 
(VMFIN sal)
...)
Ruethen
  • Where there are two or more left-dislocated constituents, of which only the last is overtly resumed at clause level. Again, the null resumptive element is inserted immediately after the first left-dislocated constituent by default, and the resulting -LFD/-RSP pairs are coindexed:
(IP-MAT (NP-LFD-1 (CP-FRL (WNP-2 (PTKG S) 
                                 (DWA welich)
                                 (NA voget)) 
                          (IP-SUB (NP-SBJ *T*-2)
                                  (NP-OB1 (DIARTA enen) 
                                          (NA richtere))
                                  (VVFIN set)
                                  (PP (APPR an)
                                      (NP (DPOSA sine)
                                          (NA stat))))))
(NP-RSP-1 0)
(NP-LFD-3 (CP-FRL (WNP-4 (PTKG s)
                         (DPWS waz))
                  (IP-SUB (NP-SBJ *T*-4) 
                          (PP (APPR vor)
                              (NP (DPDS dheme)))
                          (VVPP gelent)
                          (VAFIN wert))))
(NP-SBJ-RSP-3 (DPDS dat)) 
(VMFIN sal)
...)
Braunschweig

 

Appositives and parentheticals (-PRN)

The extended label -PRN is applied to appositive and parenthetical constituents. As such, it may occur with virtually any phrase type.

  • Appositive constituents are contained within the constituent to which they are in apposition:
(NP-SBJ (NE Abraham)
        (NP-PRN (DPOSA ivwe) ← appositive constituent
                (NA vader))
)     
  • Parenthetical clauses are contained within the clause to which they are parenthetical:
(IP-MAT (ADVP (AVD Oeck))
        (VVFIN secht)
        (NP-SBJ (NA sunte)
                (NE gregorius))
        (IP-MAT-PRN (NP-OB1 (DPDS dat)) ← parenthetical clause
                    (VVFIN weet)
                    (NP-SBJ (PPER ic))
                    (ADVP (AVD wal)))
...
)

 

Empty subjects

All finite clauses (IP-MAT, IP-SUB) are required to have a subject, and this is what drives the policy for including empty subjects. Infinitival clauses (IP-INF) are not required to have a subject.

All empty subjects appear as early as possible in the clause.

There are four types of empty subject:

  • \*con\*: subject elision under conjunction
  • \*pro\*: referential null subject
  • \*arb\*: TODO: do we actually use this?
  • \*exp\*: null expletive subject

\*con\*: subject elision under conjunction

(IP-MAT (NP-OB1 (DPDS Dat))     ← first conjunct
        (VVFIN beleueden) 
        (NP-SBJ (PPER se)
                (DIN alle))
        (PP (APPR myt)
            (NP (NA willen)))
)

(IP-MAT (KON und) 
        (NP-SBJ *con*)     ← second conjunct, with elision of subject
        (VVFIN scheiden)
        (PP (APPR van)
            (NP (PPER em)))
)

Note:

  • \*con\* can only be used when the elided subject is coreferential and identical in number with the subject of the preceding IP-MAT.

  • If not, then use \*pro\*:

(IP-MAT (ADVP-TMP (AVD Do))
        (VVFIN brochte)
        (NP-SBJ (DPIS me))
        (PTKVZ wedder)
        (NP-OB1 (DDARTA de)
                (ADJA kostlike)
                (NA kleder))
)

(IP-MAT (KON vnde)
        (NP-SBJ *pro*)       ← *pro* subject
        (VAFIN wart)
        (ADVP (AVD vroliken))
        (VVPP entfangen)
        (PP (APPR van)
            (NP (DIA allen)
                (NA volke)))
)
Griseldis

\*pro\*: referential null subject

An example:

(IP-MAT (KON vnde)
        (NP-SBJ *pro*)       ← *pro* subject
        (VAFIN wart)
        (ADVP (AVD vroliken))
        (VVPP entfangen)
        (PP (APPR van)
            (NP (DIA allen)
                (NA volke)))
)
Griseldis
  • Note: \*pro\* can also be used as a default empty subject where other types of empty subject are not appropriate, e.g. in cases of conjunction reduction where there is a number mismatch and thus \*con\* cannot be used.

\*arb\*

TODO: do we have any instances of this?

\*exp\*: null expletive subject

A null expletive subject (\*exp\*) is inserted in contexts where an overt expletive could be expected, but is not attested (see below).

(IP-MAT (KON Vnde)
        (NP-SBJ *exp*)             ← null expletive
        (PP (ADVP (PAVD hijr))
            (PAVAP vp))
        (VVFIN staet)
        (PP (APPR in)
            (NP (DDARTA der)
                (NA glosen)))
        (VVPP gheschreuen)
)

 

Other empty categories

There are five other empty non-subject categories:

  • \*T\*: traces of wh-movement
  • \*ICH\*: traces of other types of movement
  • Empty complementisers (C 0)
  • Empty conjunctions (KON 0)
  • Empty resumptives (*-RSP 0)

\*T\*: traces of wh-movement

Traces of this type are always co-indexed.

(CONJP (KON vnde)
       (CP-QUE (WNP-1 (DPWS wat))         ← wh-phrase
               (IP-SUB (NP-PRD *T*-1)     ← trace
                       (NP-SBJ (PPER se))
                       (VVFIN is)))
)
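
Since traces are always co-indexed, a simple consistency check is to confirm that every trace index also appears on some node label in the same token. The sketch below is one possible way of doing this on the raw bracketed text; the function name `unbound_traces` is hypothetical, and the check covers both \*T\* and \*ICH\* traces without testing whether the antecedent is of the right category.

```python
import re

def unbound_traces(token_text):
    """Return the indices of *T*/*ICH* traces that no node label carries."""
    trace_ids = set(re.findall(r"\*(?:T|ICH)\*-(\d+)", token_text))
    label_ids = set()
    for label in re.findall(r"\(([^\s()]+)", token_text):
        head, _, tail = label.rpartition("-")
        if head and tail.isdigit():
            label_ids.add(tail)
    return sorted(trace_ids - label_ids)

indexed = """
(CONJP (KON vnde)
       (CP-QUE (WNP-1 (DPWS wat))
               (IP-SUB (NP-PRD *T*-1)
                       (NP-SBJ (PPER se))
                       (VVFIN is)))
)
"""
missing = indexed.replace("WNP-1", "WNP")   # antecedent index removed
print(unbound_traces(indexed))
print(unbound_traces(missing))
```

An empty result means every trace has a same-indexed node somewhere in the token; any index that is returned points to a trace whose antecedent has lost its index or was never indexed.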
          

\*ICH\*: traces of other types of movement (e.g. extraposition)

Traces of this type are also always co-indexed.

(IP-SUB (NP-SBJ  (PPER se))
        (ADJP-PRD (ADJD ledich)
                  (PP *ICH*-1))        ← trace
        (VVFIN was)
        (PP-1 (APPR van)               ← moved constituent
              (NP (ADJA guden)
                  (NA daden)))
)
  • NB: The \*ICH\* is inserted as early as possible in the source constituent if the movement is upward, and as late as possible in the source constituent if the movement is downward.

  • NB: Movements of these kinds are only represented if the movement in question takes the moved constituent out of its source constituent.

Empty complementisers (C 0)

Empty complementisers (C 0) are used in a more restricted way than in the general Penn scheme – see the section on the CP layer below.

Empty conjunctions (KON 0)

Departing from the standard Penn policy on null elements, we insert a null conjunction (KON 0) in contexts where it is motivated, e.g.

  • in lists, where the last two elements are overtly conjoined:
(NP-PRN (NE (NE Abram)
            (KON 0)      ← null conjunction
            (NE Machor)
            (KON vnde)
            (NE Aram))
)

Empty resumptives (*-RSP 0)

This is another departure from the standard Penn policy on null elements (see above).

Expletive constructions

As is standard for Penn-style treebanks, the expletive (whether overt or null) is contained within an NP-SBJ. Note that this is not necessarily a statement on the status of the expletive as a subject.

Overt expletives

Note that overt expletives do not have a special POS tag in the corpus: they are just tagged as an ordinary demonstrative pronoun (DPDS).

Overt expletives occur in four main contexts:

  • With an extraposed clausal subject; expletive is coindexed with the clausal subject.
(IP-MAT (NP-SBJ-1 (DPDS Dit))                   ← expletive
        (VVFIN is)
        (NP-PRD (DDARTA de)
                (ADJA erste)
                (NA punt)
                (PP (APPR van)
                    (NP (NA viuen))))
        (CP-THT-1 (KOUS)                        ← clausal subject
                  (IP-SUB (NP-SBJ (PPER wi))
                          (ADVP (ADJV lancge))
                          (VVFIN menen)
                          ...))
)
  • With an extraposed clausal object; expletive is coindexed with the clausal object.
    (IP-MAT (NP-SBJ *con*)
            (KON vnde)
            (VVFIN betugede)
            (NP-OB1-1 (PPER et))                       ← expletive
            (CP-THT-1 (KOUS wente)                     ← clausal object
                      (IP-SUB (NP-SBJ (DPDS dit))
                              (VVFIN is)
                              (NP-PRD (NP-POS (NA godes))
                                      (NA sone)))))
  • In presentational constructions; the expletive is not co-indexed with the postverbal discourse-new referent and so presentational constructions with an overt expletive cannot be isolated automatically.

TODO: example here

  • In impersonal constructions; the expletive is not coindexed with anything, and so impersonal constructions with an overt expletive are also not easily identifiable.

TODO: example here

Null expletives

Null expletives (\*exp\*) are inserted in these contexts (with the exception of the object expletive type) where an expletive could in principle appear but where it is not attested. The same rules for coindexing as with overt expletives apply.

  • Null expletive with an extraposed clausal subject:
(IP-SUB (NP-SBJ-1 *exp*)                      ← null expletive
        (ADVP (AVD dar))
        (VVPP ghescreuen)
        (VAFIN staet)
        (CP-THT-1 (KOUS Dat)                  ← clausal subject
                  (IP-SUB (NP-SBJ (DPIS men))
                          (PP (APPR in)
                              (NP (CARDA vier)
                                  (NA manieren)
                                  (NA sunde)))
                          (VVFIN begaet)))
)
  • Null expletive in presentational constructions:
(IP-MAT (NP-SBJ *exp*)           ← null expletive
        (ADVP (AVD Vortmer))
        (ADVP (AVD so))
        (VVFIN sint)
        (NP-PRD (CARDA drey)     ← discourse-new referent
                (NA gheRichte))
        (PP (APPR binnen)
            (NP (DDARTA der)
                (NA stat)))
)
  • Null expletive in impersonal constructions:
(IP-MAT (NP-SBJ *exp*)    ← null expletive
        (KON Vnde)
        (PP (ADVP (PAVD hijr))
            (PAVAP vp))
        (VVFIN staet)
        (PP (APPR in)
            (NP (DDARTA der)
                (NA glosen)))
        (VVPP gheschreuen)
)
  • Note that no null expletive is inserted in sentences with a clausal object which lack an expletive in the matrix clause:
(IP-MAT (PP (ADVP (PAVKO dar))
            (PAVAP von))
        (VVFIN screif)
        (NP-SBJ (DPIS me))
        (CP-THT (KOUS dat)                   ← clausal object
                (IP-SUB (NP-SBJ (PPER he))
                        (NP-OB1 (DDARTA dem)
                                (NA himmel))
                        (VVFIN vphelde)
                        (PP (ADVP (PAVD dar))
                            (PAVAP vmme))))
)

 

Conjunction

Phrase-level conjunction

Phrase-level conjunction is applied in cases where any one of the conjuncts consists of more than one word.

Broadly speaking, phrase-level conjunction is represented using CONJP, a phrasal category headed by KON. The only exception is conjoined matrix clauses with coreferential subjects, which are treated as separate tokens (see above).

  • The general structure for phrase-level conjunction is:
(XP1 (XP2 first-conjunct)
     (CONJP (KON conjunction)     ← CONJP is sister to first conjunct
            (XP3 second-conjunct)) ← the second conjunct is the complement of the head KON
)
  • For third and subsequent conjuncts, this general structure is simply extended:
(XP1 (XP2 first-conjunct)
     (CONJP (KON conjunction)     
            (XP3 second-conjunct))
     (CONJP (KON conjunction)
            (XP4 third-conjunct))
)
(NP-OB1 (NP (DDARTA dat)
            (NA wijf))
        (CONJP (KON oft)
               (NP (NE eua)))
)

Word-level conjunction

  • Single-word conjuncts can just conjoin at word level; no CONJP is needed:
(XP (X1 (X2 first-conjunct)
        (KON and)
        (X3 second-conjunct))
)
(ADJP-PRD (ADJD (ADJD seker)
                (KON vnde)
                (ADJD ghewijs))
)
(NP-ADT (DPDS (DPDS dit)
              (KON of)
              (DPDS dat))
)
  • Conjunction of nonfinite verbs:
(VVFIN (VVFIN comen)
       (KON eder)
       (VVFIN quamen)
)
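
The choice between the two structures can be stated as a small procedure: if every conjunct is a single word, conjoin at word level; otherwise wrap each conjunct in a phrase and place each non-initial conjunct inside a CONJP. The following sketch generates the corresponding bracketing. The function `conjoin` and its input format are assumptions made for illustration, and it only covers conjuncts of like category with no shared pre-modifiers.

```python
def leaf(pos, word):
    """Bracket a single word with its POS tag."""
    return f"({pos} {word})"

def conjoin(phrase, conjuncts, conjunctions):
    """Build the bracketing for a conjunction structure.

    phrase:       label of the enclosing phrase, e.g. "NP-OB1"
    conjuncts:    list of conjuncts, each a list of (POS, word) pairs
    conjunctions: the KON words linking them (one fewer than conjuncts)
    """
    if all(len(c) == 1 for c in conjuncts):
        # Word-level: single-word conjuncts conjoin without CONJP.
        pos = conjuncts[0][0][0]
        parts = [leaf(*conjuncts[0][0])]
        for kon, c in zip(conjunctions, conjuncts[1:]):
            parts += [leaf("KON", kon), leaf(*c[0])]
        return f"({phrase} ({pos} {' '.join(parts)}))"

    # Phrase-level: each later conjunct sits inside a CONJP headed by KON.
    base = phrase.split("-")[0]

    def xp(c):
        return f"({base} {' '.join(leaf(p, w) for p, w in c)})"

    parts = [xp(conjuncts[0])]
    for kon, c in zip(conjunctions, conjuncts[1:]):
        parts.append(f"(CONJP {leaf('KON', kon)} {xp(c)})")
    return f"({phrase} {' '.join(parts)})"

word_level = conjoin("ADJP-PRD",
                     [[("ADJD", "seker")], [("ADJD", "ghewijs")]],
                     ["vnde"])
phrase_level = conjoin("NP-OB1",
                       [[("DDARTA", "dat"), ("NA", "wijf")], [("NE", "eua")]],
                       ["oft"])
print(word_level)
print(phrase_level)
```

The two calls reproduce, on one line each, the bracketing of the seker vnde ghewijs and dat wijf oft eua examples above. Note that in the phrase-level case the single-word conjunct eua still projects an NP, because one of the conjuncts has more than one word.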

Inserting empty conjunctions (KON 0)

Note that an empty conjunction (KON 0) is inserted in certain contexts (see above).

Correlative conjunction

Cases of word-level correlative conjunction are treated as a flat structure:

(NP (NA (KON X)
        (NA Y)
        (KON Z)
        (NA Q))
)

In cases of phrase-level correlative conjunction, the first conjunction is annotated as a sister of the first conjunct. The rest of the phrase is treated as per non-correlative phrase-level conjunction.

(NP-OB1 (KON beide)           
        (NP (ADJA hillighe)
            (NA scrift))
        (CONJP (KON vnde)
               (NP (ADJA natuerlike)
                   (NA scrift)))
)

Conjunction with shared pre-modification

For cases of conjunction with shared pre-modifiers, we do not use NX, ADJX etc., as in the general Penn scheme. Rather, we apply the following guidelines:

  • If word-level conjunction can apply, apply it

TODO: insert example

  • Otherwise, attach the pre-modifier to the highest node, and annotate conjunction of a phrase-level category (XP):
        (NP-SBJ (DPOSA dine)                       ← Pre-modifier attached high
                (NP (ADJA grote)                |  ← Structural parallelism
                    (NA vnere))                 |
                (CONJP (KON vnde)               +-- Phrase-level conjunction
                       (NP (ADJA bedreuenne)    |
                           (NA schande))))      /

The rule to attach high by default is used in these cases. As illustrated above, this rule can be overridden in the case of structural parallelism between the two conjuncts. *grote* could in principle be treated as a shared pre-modifier, but to maintain parallelism it is attached inside the first conjunct instead.

Conjunction of unlike categories

Sometimes two unlike categories will be conjoined. In such cases, the category enclosing the conjunction structure is the same as the category of the first conjunct.

(NP-OB1 (NP (CARDA seuen)
            (NA vrowen))
        (CONJP (KON vnd)
               (PP (APPR bouen)
                   (NP (XY xx)
                       (NA sone))))
)
Engelhus

Unlike in the general Penn scheme, we use this policy for both phrase-level and word-level conjunction.

Use of gapping instead of word-level conjunction with separable verb prefixes

In some cases, what is logically word-level conjunction between two verbs cannot be annotated as such because of separable verb prefixes which are annotated as separate words. In such cases, the second (or later) conjunct is annotated as a gap instead:

    (IP-MAT-1 (PP (APPR Van)
                  (NP (DDARTA dessen)
                      (NA ghelde)))
              (VMFIN scholen)
              (NP-SBJ (NP (NE Gherd))
                      (CONJP (KON vnde)
                             (NP (NA vor)
                                 (NE vredeke)
                                 (ADJN vorghenompt))))
              (PTKVZ weder)
              (VVINF gheuen)
              (IP-MAT-PRN=1 (KON vnde)
                            (VVINF betalen))
     )

Not:

    (IP-MAT ...
            (PTKVZ weder)
            (VVINF (VVINF gheuen)
                   (KON vnde)
                   (VVINF betalen))
    )

 

Negation (PTKNEG; AVNEG; DNEG; DPNEGS)

Sentential negation

Sentential negation has the POS tag PTKNEG and is attached at the IP-level:

(IP-MAT (PP (APPR By)
            (NP (PPER my)
                (PTKN seluen)))
        (VVFIN vermach)
        (NP-SBJ (PPER ic))
        (PTKNEG nicht)        ← sentential negation
)

Phrase-level negation

Phrase-level negation can have a range of POS tags.

  • Negative adverbs have the POS tag AVNEG and project an ADVP:
(IP-MAT (KON Vnde)
        (NP-SBJ (PPER hey))
        (PTKNEG ne)            ← sentential negation
        (VMFIN sal)
        (ADVP (AVNEG nummer)   ← negative adverb
              (AVD mer))
        (PP (APPR in)
            (NP (DDARTA den)
                (NA Rayd)))
        (VVINF komen)
)
(IP-MAT-SPE (NP-SBJ (PPER Ick))
            (VAFIN hebbe)
            (NP-OB2 (PPER my))
            (ADVP (AVNEG nye))   ← negative adverb
            ...
)
  • Negative determiners have the POS tag DNEG* and are contained within an NP:
(IP-MAT (NP-OB1 (DNEGA Nene)       ← negative determiner
                (NA kopenschap))
        (VMFIN sulle)
        (NP-SBJ (PPER ghi))
        (VVINF hantieren)
)
  • Negative pronouns have the POS tag DPNEGS and are contained within an NP:
(IP-MAT (NP-OB1 (DPNEGS Nemant))   ← negative pronoun
        (PTKNEG en)                ← sentential negation
        (VMFIN sulle)
        (NP-SBJ (PPER ghi))
        (VVINF doetslaen)
)

 

Possessives

Possessor NPs are treated as complements of N:

(NP-OB1 (DDARTA den)
        (NA namen)
        (NP-POS (NA godes)) ← possessor NP
)

Possessive pronouns are treated as modifiers and do not project a phrase:

(PP (APPR in)
    (NP (DPOSA iuwen) ← possessive pronoun
        (NA munt))
)

 

Separable verb prefixes

These have the tag PTKVZ and attach at the IP level as sisters of their verb:

(IP-MAT (NP-SBJ (DDARTA De)
                (NA viant))
        (VVFIN ghift)
        (PTKVZ wt)
        (NP-OB1 (DDARTA den)
                (ADJA eersten)
                (NA raet))
)

 

Punctuation

Punctuation is given the POS tag $;

The general principle is to attach punctuation as high as is reasonable.

2. Clauses

Clause types

Any complete finite clause (IP-MAT, IP-SUB) must contain at least:

  • a finite verb
  • a subject

All finite subordinate clauses are labelled as a CP, and any CP is labelled for type (see below) and will contain an IP-SUB.

Ambiguous clauses (IP-X)

For finite clauses introduced by wente which are ambiguous and cannot be labelled as either matrix (IP-MAT) or subordinate (IP-SUB), we use a novel label, IP-X.

This applies to wente-clauses which are unambiguously V2:

TODO: parsed example, e.g. wente dit is godes sone

This also applies to wente-clauses where the verb position is hard to diagnose, since there are only two constituents in the clause:

TODO: parsed example, e.g. wente he kam

Note: wente-clauses which are clearly verb-final

TODO: parsed example

(see LREC paper for more details).

Infinitival clauses (IP-INF)

  • are headed by a verb in its infinitive form (VVINF)
  • do not necessarily require a subject, though they may have one

Imperative clauses (IP-IMP)

  • are headed by a verb in its imperative form (VVIMP)
  • do not require a subject

 

Clause extended labels

For IPs:

  • -MAT: matrix clause
  • -SUB: subordinate clause
  • -INF: infinitival clause
  • -IMP: imperative clause

For CPs:

  • -REL: relative clause
  • -FRL: free relative clause
  • -THT: that-clause
  • -ADV: adverbial clause
  • -CMP: comparative clause
  • -QUE: interrogative clause
  • -DEG: degree clause

Additional functional extended labels which an IP or CP may have (added in this order where there are multiple functional labels):

  • -PRN: parenthetical
  • -LFD: left-dislocated
  • -SPE: direct speech
  • -n: an index

 

The CP layer

A CP layer is only postulated in:

  • finite subordinate clauses (adverbial, degree, complement etc.)
  • clauses with wh-movement (direct and indirect wh-questions, relative clauses)

The CP layer is lexicalized either by a complementizer or by an element in SpecCP (e.g. a wh-phrase or a relative pronoun) or in rare cases by both.

The original Penn annotation scheme calls for a complementizer to be present in all CPs (whether overt of empty). In the CHLG, however, we have chosen not to insert empty elements in C as a general rule. There are only three cases where an empty C is inserted:

  • When an empty complementizer alternatives with dat, mostly after verba diecendi (C 0)
  • V1 conditionals and direct questions (C-V1 0)
  • Asyndetic V2 dependent clauses, where the main sign of clause dependence is subjunctive marking on the finite verb (C-SUBJ 0):

When an empty complementizer alternates with dat, mostly after verba dicendi (C 0)

(IP-MAT (NP-SBJ *con*)
        (KON vnde)
        (VVFIN sede)
        (CP-THT (C 0)                ← empty complementiser
                (IP-SUB (NP-SBJ (PPER se))
                        (VMFIN wolde)
                        (NP-TMP-WH (DPWS wat))
                        (VVINF ruwen))
                (CP-ADV (KOUS Also)
                        (IP-SUB (NP-SBJ (PPER se))
                                (ADVP-TMP (AVD nu))
                                (ADJP-PRD (ADJD allene))
                                (VVFIN was))))
)
‘And [she] said (that) she wished to rest a while, since she was now alone’

V1 conditionals and direct questions (C-V1 0)

(IP-MAT (CP-ADV   (C-V1 0)                 ← empty complementiser
                  (IP-SUB (VVFIN Wult)
                          (NP-SBJ (PPER u))
                          (CP-THT (KOUS dat)
                                  (IP-SUB (NP-SBJ (PPER yck))
                                          (VVFIN sterue)))))
        (NP-SBJ (PPER yck))
        (VVFIN sterue)
        (PP (APPR myt)
            (NP (NA willen)))
)
V1 conditional: ‘If you want me to die, I gladly will’
(CP-QUE (C-V1 0)                 ← empty complementiser
        (IP-SUB (VVFIN Meine)
                (NP-SBJ (PPER ghi))
                (CP-THT (KOUS datt)
                ...))
)
Direct question: `Do you think that...'?

Asyndetic V2 dependent clauses, where the main sign of clause dependence is subjunctive marking on the finite verb (C-SUBJ 0)

(IP-MAT (CP-ADV (WADVP-2 0)
                (KOUS wen)
                (IP-SUB (ADVP-TMP *T*-2)
                        (NP-SBJ (PPER ick))
                        (VVFIN vterkese)))
        (CP-ADV (C-SUBJ 0)                    ← empty complementiser
                (IP-SUB (NP-SBJ (PPER se))
                        (VVFIN sy)
                        (NP-PRD (NP-POS (NP (NP-POS (DDARTA des)
                                                    (NA keisers))
                                            (NA vorsten))
                                        (CONJP (KON edder)
                                               (NP (NA herden))))
                                (NA dochter))))
        (ADVP (AVKO so))
        (VVFIN wil)
        (NP-SBJ (PPER ick))
        (CP-THT (CP-THT (KOUS dat)
                        (IP-SUB (NP-SBJ (PPER se))
                                (NP-PRD (DPOSA iuwe)
                                        (ADJA weldige)
                                        (NA vrowe))
                                (VVFIN sy))))
)
`When I choose [a bride], whether she is a daughter of a king’s lord or of a shepherd, I wish for her to be your mighty lady'
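
Since only three empty complementiser labels are licensed (C, C-V1 and C-SUBJ), a token can be screened for any other empty C-type label. The sketch below is illustrative only: `parse`, `empty_heads` and `bad_empty_complementisers` are hypothetical helper names, and the parser assumes a single well-bracketed token with the `←` comments removed.

```python
import re

# The three empty complementiser labels described above.
ALLOWED_EMPTY_C = {"C", "C-V1", "C-SUBJ"}


def parse(text):
    """Parse one bracketed token into nested (label, children) tuples; leaves are strings.

    Raises IndexError if the brackets are not balanced.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return node()


def empty_heads(tree):
    """Yield the label of every node whose only child is the empty element 0."""
    label, children = tree
    if children == ["0"]:
        yield label
    for child in children:
        if isinstance(child, tuple):
            yield from empty_heads(child)


def bad_empty_complementisers(text):
    """Return empty C-type labels other than C, C-V1 and C-SUBJ."""
    return [
        label
        for label in empty_heads(parse(text))
        if label.split("-")[0] == "C" and label not in ALLOWED_EMPTY_C
    ]


question = "(CP-QUE (C-V1 0) (IP-SUB (VVFIN Meine) (NP-SBJ (PPER ghi))))"
print(bad_empty_complementisers(question))
```

Empty operators such as (WADVP-2 0) are ignored, because only labels whose base category is C are inspected.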

 

Adverbial clauses (CP-ADV)

The structure of an adverbial clause (CP-ADV) depends on the type of adverbial subordinator (KOUS) which introduces the clause.

Essentially, three different structures are available (see below for details on each):

  • One structure for adverbial clauses introduced by an adverbial subordinator which is formally identical to an ordinary adverb (e.g. also, dar, do, eer, nu, so).
  • One structure for adverbial clauses introduced by an adverbial subordinator which is not formally identical to an ordinary adverb (e.g. dat, eft, wanner, want(e)).
  • One structure for adverbial clauses introduced by wen.
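
The three-way split can be pictured as a simple lookup. The sketch below is a hypothetical helper, not corpus tooling: the function name and the returned descriptions are invented for illustration, the word lists are taken from the subsections that follow, and spelling variants attested in the texts (e.g. effte, wanneer) are deliberately not normalised.

```python
# Subordinators formally identical to ordinary adverbs (they project an ADVP).
ADVP_PROJECTING = {"also", "dar", "dewile", "do", "eer", "nu", "so"}

# Subordinators treated as overt complementisers (they head the CP-ADV).
COMPLEMENTISERS = {"dat", "eft", "of", "wanner", "want", "wante", "woldat"}


def adverbial_structure(subordinator):
    """Return a short description of the structure used for a given subordinator."""
    form = subordinator.lower()
    if form == "wen":
        return "WADVP inside CP-ADV"
    if form in ADVP_PROJECTING:
        return "ADVP dominating CP-ADV"
    if form in COMPLEMENTISERS:
        return "KOUS heads CP-ADV"
    return "unknown"  # spelling variants would need normalising first


for word in ["Do", "dat", "wen", "effte"]:
    print(word, "->", adverbial_structure(word))
```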

 

Adverbial subordinators (KOUS) which project an ADVP

The following adverbial subordinators are formally identical to ordinary adverbs and project an ADVP which attaches at the IP level:

  • also
  • dar
  • dewile
  • do
  • eer
  • nu
  • so

In such structures, the adverbial subordinator (KOUS) takes a CP-ADV as its complement which is headed by an empty WADVP. The WADVP is then traced into the finite subordinate clause (IP-SUB).

Some examples:

also

(ADVP (KOUS also)
      (CP-ADV (WADVP-1 0)
              (IP-SUB (ADVP *T*-1)
                      (NP-SBJ (DDARTA de)
                              (NA preesters))
                      (VVIMP ghebeedet)))
)

dar

(ADVP-LOC (KOUS dar)
          (CP-ADV (WADVP-1 0)
                  (IP-SUB  (ADVP-LOC *T*-1)
                           (NP-SBJ (PPER he))
                           (VVFIN sprekt)
                            ...))
)

dewile

(ADVP-TMP (KOUS dewile)
          (CP-ADV (WADVP-1 0)
                  (IP-SUB (ADVP-TMP *T*-1)
                          (NP-SBJ (PPER se))
                          ...))
)
  

do

(IP-MAT (ADVP-TMP-LFD (KOUS Do)
                      (CP-ADV (WADVP-1 0)
                              (IP-SUB (ADVP-TMP *T*-1)
                                      (NP-SBJ (PPER he))
                                      (VVFIN quam))))
        (ADVP-TMP-RSP (AVD do))
...
)

eer

(ADVP-TMP (KOUS Eer)
          (CP-ADV (WADVP-1 0)
                  (IP-SUB (ADVP-TMP *T*-1)
                          (NP-SBJ (PPER ic))
                          (NP-OB1 (DPDS dit))
                          (VVFIN vulbrincge)
                          (PP (APPR to)
                              (NP (DPOSA dinen)
                                  (NA eren)))))
)

nu

(IP-MAT (ADVP-TMP (KOUS Nu)
                  (CP-ADV (WADVP-1 0)
                          (IP-SUB (ADVP-TMP *T*-1)
                                  (NP-SBJ (PPER ik))
                                  (NP-OB1 (DPDS den))
                                  (PTKNEG nicht)
                                  (VVINF vorkamen)
                                  (VMFIN mach))))
        (ADVP (AVKO So))
...
)

so

(ADVP (KOUS So)
      (CP-ADV (WADVP-1 0)
              (IP-SUB (ADVP *T*-1)
                      (NP-OB2 (PPER my))
                      ...))
)
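
In each of the examples above, the index on the empty WADVP must match the index on its trace. A minimal sketch of a coindexation check follows; `parse` and `check_traces` are hypothetical names, the parser expects one well-bracketed token with `←` comments removed, and only *T* and *ICH* traces are considered.

```python
import re


def parse(text):
    """Parse one bracketed token into nested (label, children) tuples; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return node()


def check_traces(text):
    """Return (trace indices without antecedent, antecedent indices without trace)."""
    antecedents, traces = set(), set()

    def walk(tree):
        label, children = tree
        indexed = re.search(r"-(\d+)$", label)  # e.g. WADVP-1, NP-POS-3
        if indexed:
            antecedents.add(indexed.group(1))
        for child in children:
            if isinstance(child, tuple):
                walk(child)
            else:
                trace = re.fullmatch(r"\*(?:T|ICH)\*-(\d+)", child)
                if trace:
                    traces.add(trace.group(1))

    walk(parse(text))
    return sorted(traces - antecedents), sorted(antecedents - traces)


good = "(ADVP (KOUS so) (CP-ADV (WADVP-1 0) (IP-SUB (ADVP *T*-1) (NP-OB2 (PPER my)))))"
print(check_traces(good))
```

A token in which the operator is written (WADVP-0) while the trace is *T*-1 would be reported with one unmatched trace index and one unmatched antecedent index.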

Adverbial subordinators (KOUS) which head a CP-ADV

A small set of adverbial subordinators (KOUS) – those which do not have formally identical ordinary adverb counterparts – are treated as overt complementisers, i.e. head the CP, rather than project an ADVP of which CP-ADV is a daughter. This applies specifically to the following adverbial subordinators:

  • dat
  • eft/of
  • wanner
  • want(e)
  • woldat

dat

(IP-MAT (KON Mer)
        (NP-SBJ (PPER se))
        (VVFIN maket)
        (NP-OB1 (DDARTA den)
                (NA menschen))
        (CP-ADV (KOUS dat)
                (IP-SUB (NP-SBJ (PPER he))
                        (VVFIN wert)
                        (NP-OB2 (NA gode))
                        (OA-XXX leet)))
)

effte

(CP-ADV (KOUS effte)
        (IP-SUB (NP-SBJ (PPER wi))
                (PP (ADVP (PAVKO dar))
                    (PAVAP umme))
                (VVFIN bidden)
                (VMFIN mogen))
)

wanneer

(IP-MAT (CP-ADV (KOUS Wanneer)
                (IP-SUB (NP-SBJ (PPER ghi))
                        (PP (APPR mit)
                            (NP (NA vorsate)))
                        (NP-OB1 (NA vnrecht))
                        (VVFIN sweert)))
        (NP-SBJ (DPDS Dat))
        (VVFIN is)
...
)

want(e)

(CP-ADV (KOUS want)
        (IP-SUB (NP-SBJ (PPER he))
                (NP-OB2 (PPER em))
                (PTKNEG nicht)
                (VVFIN vnsaghe)
                (PP (APPR in)
                    (NP (DDARTA dessen)
                        (NA leuen))))
)

woldat


(CP-ADV (KOUS woldat)
        (IP-SUB (NP-SBJ (DDA dusse)
                        (XY x)
                        (NA mestere))
                (NP-OB1 (DIA vele)
                        (NP-POS gudes))
                (VVFIN (VVFIN makeden)
                       (KON vnd)
                       (VVFIN deden)))
)
Engelhus

Adverbial clauses introduced by wen

 

Adverbial clauses introduced by wen have their own structure.

 

wen projects a WADVP which heads the CP-ADV. The WADVP is then traced into the finite subordinate clause.

 

(IP-MAT (CP-ADV (WADVP-1 (AVW Wen))
                (IP-SUB (ADVP-TMP *T*-1)
                        (NP-SBJ (DDARTA de)
                                (NA here))
                        (VVFIN spaserde)))
        (VVFIN sach)
        (NP-SBJ (PPER he))
...
)

That-clauses (CP-THT)

Bare CP-THTs headed by (KOUS dat)

Bare CP-THTs are headed by a (KOUS dat) which takes an IP-SUB as its complement:

(IP-MAT (PP (ADVP (PAVKO dar))
            (PAVAP umm))
        (VVFIN wil)
        (NP-SBJ (PPER ik))
        (CP-THT (KOUS dat)                      ← KOUS as head of the CP-THT
                (IP-SUB (NP-OB2 (PPER my))      ← IP-SUB as complement of KOUS
                        (NP-SBJ (DDARTA dat)
                                (DPIS ander))
                        (ADJP-PRD (ADJD nutte))
                        (VVFIN si)))
)

Bare CP-THTs where (KOUS dat) is absent

When dat is absent, a null complementizer (C 0) is inserted (see also above):

(IP-MAT (NP-SBJ *con*)
        (KON vnde)
        (VVFIN sede)
        (CP-THT (C 0)
                (IP-SUB (NP-SBJ (PPER se))
                        (VMFIN wolde)
                        (NP-TMP-WH (DPWS wat))
                        (VVINF ruwen))
                (CP-ADV (KOUS Also)
                        (IP-SUB (NP-SBJ (PPER se))
                                (ADVP-TMP (AVD nu))
                                (ADJP-PRD (ADJD allene))
                                (VVFIN was))))
)
‘And [she] said (that) she wished to rest a while, since she was now alone’

CP-THTs introduced by a preposition

When the CP-THT is introduced by a preposition, then the CP-THT sits within a PP which is headed by the preposition:

(IP-MAT (PP (APPR vppe)
            (CP-THT (KOUS dat)
                    (IP-SUB (NP-OB2 (PPER di))
                            (NP-SBJ (DPDS dat))
                            (ADVP (AVD noch))
                            (VMFIN mochte)
                            (VVINF bescheen))))
        (ADVP (AVKO so))
....
)
(PP (ADVP (PAVKO Dar))
    (PAVAP umme)
    (CP-THT (KOUS dath)
            (IP-SUB (NP-SBJ (PPER yck))
                    (NP-OB1 (PPER dy))
                    (VAFIN hebbe)
                    (VVPP ghenomenn)))
)  

 

Degree complements (CP-DEG)

(ADJP-PRD (PTKA so)
          (ADJD hochtidlik)
          (CP-DEG (WADVP-1 0)
                  (KOUS dat)
                  (IP-SUB (ADVP *T*-1)
                          (NP-SBJ (DDARTA+NA desgeliken))
                          (ADVP-TMP (ADJV vor))
                          (PTKNEG nicht)
                          (VVPP geseen)
                          (VAFIN was)))
)
(IP-MAT (NP-SBJ (DDARTA De)
                (NA ghiricheit))
        (VVFIN is)
        (NP-PRD (DIARTA een)
                (NA ouel)
                (RRC (PTKA also)
                     (VVPP gheraket)
                     (CP-DEG (KOUS Dat)
                             (IP-SUB (NP-SBJ (PPER se))
                              ...))))
)

 

Comparative clauses (CP-CMP)

Virtually all CP-CMPs should be a sister of a comparative head, e.g.:

  • also
  • gelik (‘like’)
(PP (KOKOM also)
    (CP-CMP (WADVP-1 0)
            (IP-SUB (ADVP *T*-1)
                    (NP-SBJ (PPER wy))
                    (ADVP-TMP (AVD vore))
                    (VAFIN hadden)
                    (VVPP ghedan)))
)

Clauses which could plausibly qualify as a CP-CMP but which lack a comparative head should be annotated as a CP-ADV.

TODO: insert example here

Correlative comparative clauses

(IP-MAT (ADVP (PTKA also)
              (ADJV dicke)
              (CP-CMP (WADVP-1 0)
                      (IP-SUB (ADVP *T*-1)
                              (NP-SBJ (PPER se))
                              (PP (ADVP (PAVKO dar))
                                  (PAVAP mede))
                              (VVPP besen)
                              (VAFIN wert))))
        (ADVP (PTKA also)
              (ADJV dicke))
        (VMFIN schal)
        (NP-SBJ (PPER se))
        (NP-OB1 (DIARTA ene)
                (NA discipliuen))
        (VVINF vntfan)
)

 

Direct and indirect questions (CP-QUE)

Both direct and indirect questions are annotated as a CP-QUE which immediately dominates an IP-SUB.

In direct yes/no-questions, the CP-QUE is headed by an empty complementiser labelled C-V1 (see above):

(CP-QUE (C-V1 0)
        (IP-SUB (VVFIN Meine)
                (NP-SBJ (PPER ghi))
                (CP-THT (KOUS datt)
                ...))
)
Direct question: `Do you think that...'?

In direct wh-questions, the wh-phrase is traced into the IP-SUB in which it belongs:

(CP-QUE (WADVP-1 (AVW Wor))         ← wh-phrase
        (IP-SUB (ADVP-DIR *T*-1)    ← trace of wh-phrase in IP-SUB
                (VMFIN wil)
                (NP-SBJ (DPDS desse))
                (VVINF gan)
                ...)
)

In indirect questions, the wh-phrase is likewise traced into the IP-SUB in which it belongs:

(IP-MAT (KON Sunder)
        (NP-SBJ (DPIS eteswelke))
        (VVFIN spreken)
        (CP-QUE (WADVP-1 (AVW wo))                 ← wh-phrase
                (IP-SUB (ADVP *T*-1)               ← trace of wh-phrase in IP-SUB
                        (VAFIN is)
                        (NP-SBJ (NE christus))
                        (VVPP gecomen)
                        (PP (APPR uan)
                            (NP (NE galilea)))))
)

 

V1 conditionals

V1 conditionals are treated as CP-ADVs with an empty complementiser (C-V1 0) which takes an IP-SUB as its complement (see also above):

(IP-MAT (CP-ADV   (C-V1 0)
                  (IP-SUB (VVFIN Wult)
                          (NP-SBJ (PPER u))
                          (CP-THT (KOUS dat)
                                  (IP-SUB (NP-SBJ (PPER yck))
                                          (VVFIN sterue)))))
        (NP-SBJ (PPER yck))
        (VVFIN sterue)
        (PP (APPR myt)
            (NP (NA willen))))

‘If you want me to die, I gladly will’

 

Relative clauses (CP-REL)

Relative clauses are treated as per the standard Penn scheme:

(NP-OB1 (DDARTA dat)
        (NA got)
        (CP-REL (WNP-1 (DPRELS dat))    ← relative pronoun
                (IP-SUB (NP-OB1 *T*-1)  ← trace in finite subordinate clause
                        (NP-SBJ (PPER he))
                        (VVFIN erft)))
)
Duderstadt2

Relative clauses introduced by pronominal adverbs

See also treatment of pronominal adverbs elsewhere.

(CP-REL (WPP-1 (WADVP (PAVREL Dar))
               (PAVAP na))
        (IP-SUB (PP *T*-1)
                ...)
)
  • Relative clauses can also be introduced by pronominal adverbs which are discontinuous:
(NP-SBJ (DDARTA de)
        (NA (NA busse)
            (KON vnde)
            (NA budel))
        (CP-REL (WADVP-3 (PAVREL-2 dar))
                (IP-SUB (ADVP *T*-3)
                        (NP-SBJ (NP-POS (NA geodes))
                                (NA licham))
                        (PP (PAVREL *ICH*-2)
                            (PAVAP ynne))
                        (VVFIN was)))
) 

 

Free relative clauses (CP-FRL)

CP-FRLs cannot attach directly at the IP-level. The basic category enclosing the free relative is identical with the gap in the free relative.

(IP-MAT (NP-OB1 (DPDS Dat))
        (VVFIN do)
        (NP-SBJ (CP-FRL (WNP-1 (DPRELS we))       ← free relative within NP-SBJ
                        (IP-SUB (NP-SBJ *T*-1)
                                (NP-OB1 (PPER et))
                                (VVFIN wille))))
)
(IP-MAT (ADVP (AVD Aldus))
        (VMFIN moeghe)
        (NP-SBJ (PPER ghi))
        (PP (APPR in)
            (NP (DDARTA dessen)
                (NA boeke)))
        (VVINF soeken)
        (NP-OB1 (CP-FRL (WNP-1 (DPWS wat))        ← free relative within NP-OB1
                        (IP-SUB (NP-SBJ *T*-1)    ← NP gap in free relative
                                (NP-OB2 (PPER iv))
                                (ADVP (ADJV best))
                                (VVFIN gadet))))
)
(IP-MAT (NP-OB1 (DPDS dat))
        (VMFIN willen)
        (NP-SBJ (PPER sze))
        (VVINF vorschulden)
        (ADVP (CP-FRL (WADVP-1 (AVREL wor))            ← free relative within ADVP
                      (IP-SUB  (ADVP *T*-1)            ← gap in free relative
                               (NP-SBJ (PPER sze))
                               (VMFIN (VMFIN konen)
                                      (KON vnd)
                                      (VMFIN moghen)))))
)
(Greifswald)

 

Reduced relative clauses (RRC)

RRCs almost always immediately follow their antecedent. They contain no operator or gap of their own:

(PP (APPR Jn)
    (NP (DIARTA enen) 
        (NA boke)
        (RRC (VVPP ghenoemet)
             (PP (APPR van)
                 (NP (DDARTA den)
                     (ADJA ouersten)
                     (NA gude)))))
)

Note that an RRC is not used when we just have a single postnominal modifier, e.g.:

(NP-OB2 (NE hans)
        (NE wickendorpe)
        (ADJN erscreuen)    ← post-nominal modifier
)

Infinitival clauses (IP-INF)

Not all infinitives (tagged VVINF) project an IP-INF.

Constructions where no IP-INF is projected:

  • Modal auxiliaries + inf
(IP-MAT (KON Vnde)
        (NP-SBJ (PPER et))
        (VMFIN mach)            ← modal auxiliary
        (ADVP (AVD wal))
        (VVINF heten)           ← infinitive
)

Constructions where an IP-INF is projected:

  • beghinnen + to inf
(IP-MAT (KON Vnde)
        (NP-SBJ *con*)
        (VVFIN beghinnen)
        (IP-INF (PTKZU to)
                (VVINF dweelen))
        ...
) 
  • laten + inf
(IP-MAT (NP-SBJ (PPER He))
        (VVFIN leyt)
        (IP-INF (VVINF (VVINF buwen)
                       (KON vnd)
                       (VVINF betern))
                (NP-OB1 (NP (DDARTA den)
                            (NA tempel))
                        (CONJP (KON 0)
                        ...)))
)
Engelhus
  • menen + to inf
(IP-SUB (NP-SBJ (PPER he))
        (VVFIN mende)
        (IP-INF (NP-OB1 (PRF sick))
                (PTKZU to)
                (VVINF verhoghen))
)
  • plegen + to inf
(IP-MAT (PP (APPR ouer)
            (NP (CARDA hundert)
                (NA iaren)))
        (VAFIN plach)
        (NP-SBJ (DPIS men))
        (IP-INF (NP-OB1 (PPER se))
                (APPR to)
                (NA done))
)

Inflected infinitives

The same structure applies to inflected infinitives which are tagged not as VVINF but as a nominal category (NA). To is not tagged as a particle (PTKZU) in such cases but as a preposition (APPR):

(IP-MAT (KON Wante)
        (PP (APPR ouer)
            (NP (CARDA hundert)
                (NA iaren)))
        (VAFIN plach)
        (NP-SBJ (DPIS men))
        (IP-INF (NP-OB1 (PPER se))
                (APPR to)
                (NA done))
)

 

Imperative clauses (IP-IMP)

Imperative clauses are labelled IP-IMP rather than IP-MAT:

(IP-IMP (NP-OB1 (NA (NA Vader)
                    (KON vnde)
                    (NA moder)))
        (VVIMP sulle)
        (NP-SBJ (PPER ghi))
        (VVINF eeren)
)
  • They must include a verb in imperative form (VVIMP).
  • Unlike IP-MATs and IP-SUBs, they do not require a subject.

 

3. Phrases

[up]

Noun phrases (NP)

Extended labels

Argument noun phrases must have one of the following extended labels:

  • -SBJ: subject
  • -OB1: direct or only object
  • -OB2: indirect or second object
  • -PRD: subject complement with certain verbs (e.g. wēsen, wērden, blīven, hēten); also for ADJPs
  • -SMC: object complements with certain verbs (e.g. maken, nomen); also for ADJPs

Non-argument noun phrases can have one of the following extended labels:

  • -ADT: adjunct (also for ADJPs)
  • -LOC: locative
  • -TMP: temporal
  • -VOC: vocative
  • -POS: possessor
  • -COM: bare noun phrase complement of noun or adjective
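
The two lists above can be turned into a simple classifier for NP labels. This is a sketch under stated assumptions: `np_function` is a hypothetical helper, and labels that appear elsewhere in these guidelines but not in the two lists (e.g. -MSR, -PRN) are simply reported as "other".

```python
# Extended labels for argument noun phrases, as listed above.
ARGUMENT = {"SBJ", "OB1", "OB2", "PRD", "SMC"}

# Extended labels for non-argument noun phrases, as listed above.
NON_ARGUMENT = {"ADT", "LOC", "TMP", "VOC", "POS", "COM"}


def np_function(label):
    """Classify an NP label as 'argument', 'non-argument' or 'other'."""
    parts = label.split("-")
    if parts[0] != "NP":
        raise ValueError(f"not an NP label: {label}")
    for extension in parts[1:]:
        if extension in ARGUMENT:
            return "argument"
        if extension in NON_ARGUMENT:
            return "non-argument"
    return "other"  # bare NP, or an extension outside the two lists


for label in ["NP-SBJ", "NP-TMP-WH", "NP-POS-3", "NP"]:
    print(label, "->", np_function(label))
```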

Nominal compounds

  • In noun-noun compounds, both components are tagged as nouns (NA) and are treated as sisters of each other.

TODO: example here

(NP-SBJ (NA somer)
        (NA tyme)
)

Proper nouns

Proper nouns are tagged as NE:

(NP (NE Gwido)         ← personal name
    (PP (APPR von)
        (NP (DDARTA den)
            (NA sulen)))
)
(NP-PRN (DDARTA de)
        (NA hertoge)
        (PP (APPR von)
            (NP (NE bruneswich))) ← place name
)
Duderstadt 1    
  • Note that titles are tagged as ordinary nouns (NA) or adjectives (ADJ*) and are treated as sisters to the proper name:
(NP-SBJ (NA Mester)
        (NE tulius)
)
(NP-SBJ (ADJA Sinte)
        (NE Lucas)
)

Adjectival nouns

There is no special treatment for adjectival nouns; they are just treated as ordinary standalone adjectives (ADJS):

(NP-SBJ (DDARTA de)
        (ADJS besten)
)
(NP-OB1 (DDARTA dat)
        (ADJS beste)
)

Reflexive pronouns

Forms of the reflexive pronoun sik are tagged as PRF:

(IP-MAT (NP-SBJ (PPER He))
        (VVFIN maket)
        (NP-OB1 (PRF sick))
        (ADJP-PRD (ADJD runt)
                  (PP (KOKOM also)
                      (NP (DIARTA een)
                          (NA kloet))))
)

Nominal predicates

Nominal predicates take one of two labels:

  • NP-PRD
  • NP-SMC

NP-PRD

Use of NP-PRD broadly follows the YCOE guidelines: see here

NP-PRD is restricted to subject complements of the following verbs:

  • wēsen `be’
  • wērden `become’
  • blīven `become’
  • hēten `be called’
  • dunken `seem’
  • sprēken `means, equates to’
  • kosten `cost’
  • wegen `weigh’
(IP-MAT-SPE (NP-SBJ (PPER Du))
            (VVFIN bist)                    ← BE
            (NP-PRD (DIARTA eyn)            ← subject complement
                    (NA mester)
                    (PP (APPR to)
                        (NP (NE israhel))))
) 
(IP-MAT ....
        (ADVP (AVKO so))
        (VVFIN wert)                     ← BECOME
        (NP-SBJ (NP-POS (NE Janikels)) 
                (NA dochtersone))
        (NP-PRD (DPOSA vnse)             ← subject complement
                (NA here))
)
(IP-MAT (NP-SBJ (DPOSA sin)
                (NA sone))
        (VVFIN bleff)                   ← BECOME
        (NP-PRD (DIARTA eyn)            ← subject complement
                (NA arue)
                (NP-POS (DDARTA des)
                        (NA landes)))
)
(IP-MAT (KON Vnde)
        (NP-SBJ (PPER et))
        (VMFIN mach)
        (ADVP (AVD wal))  
        (VVINF heten)                   ← BE CALLED
        (NP-PRD (NA spieghel)           ← subject complement
                (NP-POS (DDARTA der)
                        (NA leyen)))
)
(IP-MAT (KON vnd)
        (NP-SBJ *con*)
        (VVFIN heyt)
        (NP-OB1 (PPER ot))
        (NP-SMC (NE Spartagus))
)

(IP-MAT (NP-SBJ (DPDS dat))
        (VVFIN sprekt)            ← MEAN
        (NP-PRD (NA hundeken))    ← subject complement
)
(IP-MAT (NP-SBJ (PPER Ot))
        (VMFIN moste)
        (VVINF kosten)           ← COST
        (NP-PRD (CARDA hundert)  ← subject complement
                (NA mark))
)
(CP-ADV (KOUS dat)
        (IP-SUB (NP-SBJ (DDARTA de)
                        (NA tunne))
                (VMFIN schall)
                (VVINF weghen)              ← WEIGH
                (NP-PRD (CARDA verteyn)     ← subject complement
                        (NA lispunt)))
)

N.B. This also applies to these verbs in the passive:

(IP-SUB (NP-SBJ (PPER ick))
        (NP-PRD (NP-POS (DDA alsodanen)    ← subject complement
                        (NA mans))
                (ADJA eghentlike)
                (NA brud))
        (VAFIN sy)
        (VVPP ghewesen)     ← BE (passive)
)
(IP-MAT (KON vnde)
        (NP-SBJ (DDARTA dat)
                (NA wort))
        (VAFIN is)
        (NP-PRD (NA ulesch))     ← subject complement
        (VVPP gheworden)         ← BECOME (passive)
)
(CONJP (KON vnde) 
       (IP-SUB (NP-SBJ *T*-1)
               (VVFIN is)
               (PP (ADVP (PAVD dar))
                   (PAVAP vmme))
               (VVPP gheheten)               ← BE CALLED (passive)
               (NP-PRD (DIARTA een)          ← subject complement
                       (NA spieghel)
                       (NP-POS (DDARTA der)
                               (NA leyen))))
) 

NP-SMC

NP-SMC is used for small clause nominals construed with the object. It generally replaces the IP-SMC category of the PPCHE. This cuts down on the number of long-distance dependencies that would otherwise arise from syntactic transformations such as passivization and from the freer word order of MLG.

Instead, we use the alternate annotation scheme described here. The first point is that, generally, small clause complements are not treated as a constituent. Their treatment is assimilated to that of predicate nominals.

NP-SMC is used for object complements of the following verbs:

  • maken `make’
(IP-MAT (ADVP (AVKO so))
        (VVFIN makest)          ← MAKE
        (NP-SBJ (PPER du))
        (NP-OB1 (PPER dy)
                (PTKN suluen))
        (NP-SMC (NA gud))       ← object complement
)
  • nomen `name’
(PP (APPR Jn)
    (NP (DIARTA enen)
        (NA boeke)
        (CP-REL (WNP-1 (DPRELS dat))
                (IP-SUB (NP-OB1 *T*-1)
                        (NP-SBJ (DPIS men))
                        (VVFIN noemet)             ← NAME
                        (NP-SMC pastorael))))      ← object complement
)
(IP-MAT (PP (APPR Von)
            (NP (DDA dussen)
                (NE heber)))
        (VVFIN nomet)            ← NAME
        (NP-OB1 (PRF sek))
        (NP-SBJ (DDARTA de)
                (NA joden))
        (NP-SMC (NE Hebreos))    ← object complement
)
  • hēten `call’
(IP-MAT (ADVP (AVD Jocto))
        (PTKNEG ne)
        (VVFIN hete)             ← CALL
        (NP-SBJ (PPER ik))
        (NP-OB1 (PPER iv))
        (PTKNEG nicht)
        (NP-SMC (NA knechte))    ← object complement
)
  • wēgen `weigh up as’

TODO: example

NP-MSR

  • NP-MSR is used for measure phrase modifiers of a noun or adjective:
(ADJP-PRD (NP-MSR (XY cxxx)
                  (NA iar))
          (ADJD olt)
)
(ADVP (NP-MSR (DIARTA eyn)
              (NA veerndel)
              (NP-POS (NA Jares)))
      (AVD tovoren)
)

Genitive noun phrases at clause-level

Four types of genitive can occur at the clause level:

  • Predicative genitives (NP-PRD (NP-POS X))
  • Clausal adjuncts (NP-ADT)
  • Temporal adverbials (NP-TMP)
  • Genitive objects of certain verbs (NP-OB1)

Note that there are no (unmoved) NP-POS daughters of an IP. In some cases, a genitive may raise out of an NP; in this case it can appear as the daughter of an IP, but is always coindexed with an *ICH* trace indicating its base position:

TODO: insert example here

Predicative genitives (NP-PRD (NP-POS X))

Predicative genitives are tagged as NP-POS inside NP-PRD:

    ( (IP-MAT (KON vnde)
              (CP-FRL-1 (WNP-2 (DPWS wat))
                        (IP-SUB (NP-OB1 *T*-2)
                                (NP-SBJ (DDARTA dey)
                                        (NA Rayt))
                                (NP-ADT (DDARTA des)) 
                                (VVFIN nymet)))
              (!!ED!! $.$)
              (NP-POS-3 (CP-FRL *ICH*-1)
                        (DPRELS des))
              (VVFIN is)
              (NP-PRD (NP-POS (DDARTA des)            ← predicative genitive
                              (NA richteres)))
              (NP-SBJ (NP-POS *ICH*-3)
                      (DDARTA dey)
                      (ADJA derde)
                      (NA deyl))
              ($; .)
              (!!ED!! $.$)))
"And whatever the council takes [1 because of this matter], one third [2 of it] is [3 the judge's]"

Clausal adjuncts (NP-ADT)

Clausal adjuncts (usually similar in meaning to an English prepositional phrase such as “about…”) are annotated as NP-ADT:

    ( (IP-MAT (KON vnde)
              (CP-FRL-1 (WNP-2 (DPWS wat))
                        (IP-SUB (NP-OB1 *T*-2)
                                (NP-SBJ (DDARTA dey)
                                        (NA Rayt))
                                (NP-ADT (DDARTA des)) ← NP-ADT
                                (VVFIN nymet)))
              (!!ED!! $.$)
              (NP-POS-3 (CP-FRL *ICH*-1)
                        (DPRELS des))
              (VVFIN is)
              (NP-PRD (NP-POS (DDARTA des)           
                              (NA richteres)))
              (NP-SBJ (NP-POS *ICH*-3)
                      (DDARTA dey)
                      (ADJA derde)
                      (NA deyl))
              ($; .)
              (!!ED!! $.$)))
"And whatever the council takes [1 because of this matter], one third [2 of it] is [3 the judge's]"

Temporal adverbials (NP-TMP)

Genitive NPs which function as temporal adverbials at clause-level are treated as NP-TMP:

TODO: example here

Genitive objects of certain verbs (NP-OB1)

Certain verbs in MLG take a genitive argument as their object. This is particularly common when a verb is negated or has an inherent negative meaning, e.g.:

  • bekennen, e.g. …dat se des bekende myt ghuden willen (Stralsund)
  • beropet, e.g. Swelich man sich sines tvges beropet umber gelt. (Braunschweig)
  • bidden, e.g. Biddet he enis echten dinghis… (Duderstadt1)
  • danken, e.g. und *con* dankende ome sins spels (Engelhus)
  • don, e.g. Swaz de rat tot mit dher stat willen… (Duderstadt1)
  • entberen, e.g. he wolde leuer der kindere enbern (Engelhus)
  • entgan, e.g. …se mogen is bat entgan mit ires eneshant (Braunschweig)
  • gelden, e.g. Swaz so en man eneme gaste ghelden sal… (Duderstadt1)
  • hebben, e.g. …ne mach he sines waren nicht hebben (Braunschweig)
  • hȍden, e.g. vnd *con* hodde der skap (Engelhus)
  • lösen, e.g. Losede vy es oc nicht binnen tuen jaren… (Stralsund)
  • los werden, e.g. dat sine borgere des loftes nummer los worden (Engelhus)
  • loven, e.g. Swaz man vor dren ratmannen louit… (Duderstadt1)
  • melden, e.g. en iunchurowe eder en vrouwe de hemelich iuwes kapitteles melde eynem vromeden personen (Stralsund)
  • plegen, e.g. Swelich borghere nenis rechtis wil pleghen… (Duderstadt1)
  • vorkopen, e.g. Swaz en man vorloft beneden eneme scillinghe… (Duderstadt1)
  • underwinden, e.g. Neman ne mach sich nener inning oder Werkes vnderwinden (Duderstadt1)
  • vorgeten, e.g. dar vmme forgot se sir (Engelhus)
  • vormögen, e.g. Were ok sake dat wy des nicht en vormochten… (Bremen)
  • vorsaken, e.g. Swelich man deme anderen sculdich is und es ime vorsaket… (Duderstadt1)
  • vorwinnen, e.g. Wert he is verwunnen met den screimannen na rechte… (Braunschweig)
  • warden, e.g. vnde darselves scolen se des warden mit voller macht (Stralsund)
  • weigeren, e.g. do Joachim des tins weygerde (Engelhus)
  • wunschen, e.g. dat alle lude sins does wunscheden ane eyn olt vrowe (Engelhus)

 

Complement NPs of nouns and adjectives (NP-COM)

Where a noun or an adjective takes a bare noun phrase complement, the complement is treated as NP-COM:

(ADJP-SMC (ADJA uol)
          (NP-COM (NA (NA gnade)
                      (KON vnde)
                      (NA warheit)))
)
Buxtehuder
  • TODO: list of nouns/adjectives in MLG which take bare noun phrase complements, e.g. `half’ etc.

 

NP-POS

TODO: constrain precisely for which genitives this is used and for which not.

Adjective phrases (ADJP)

Usually, adjectives do not project an ADJP and are just sisters to the noun which they modify.

An ADJP is only projected when:

  • An adjective is not contained within an NP because it is predicative or extraposed.
  • An adjective within an NP is itself modified.
  • An adjective within an NP takes a complement.
(IP-MAT (NP-SBJ (PPER he))
        (VVFIN is)
        (ADJP-PRD (ADJD edel)) ← predicative ADJ projects an ADJP
        ...
)
(ADJP (AVD seer)    ← modifier of ADJ
      (ADJA armode)
)
(ADJP (PP (APPR in)
          (NP (DIA allen)
              (NA dingen)))
      (ADJD loflyk)
)
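
The three conditions for projecting an ADJP can be summarised as a small decision function. This is only a sketch of the rule as stated above: `projects_adjp` and its parameter names are hypothetical, and the annotator still has to decide whether an adjective counts as modified or as taking a complement.

```python
def projects_adjp(inside_np, is_modified=False, has_complement=False):
    """Return True if an adjective should project an ADJP under the rule above."""
    if not inside_np:
        # Predicative or extraposed adjectives always project an ADJP.
        return True
    # Inside an NP, an ADJP is projected only for modified or complemented adjectives.
    return is_modified or has_complement


# A plain attributive adjective is a sister of the noun: no ADJP.
print(projects_adjp(inside_np=True))
# A predicative adjective such as (ADJD edel) projects an ADJP.
print(projects_adjp(inside_np=False))
# An attributive adjective with a modifier such as (AVD seer) projects an ADJP.
print(projects_adjp(inside_np=True, is_modified=True))
```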

Transitive adjectives

Transitive adjectives head an ADJP and take an NP complement:

(ADJP (NP-COM (DIA alles)
              (NA dinges))
      (ADJD luckich)       ← transitive adjective
)
(ADJP (NP-COM (DDARTA de)
              (NA werlt))
      (ADJD half)         ← transitive adjective
)

Extended tags for ADJP

Note that any ADJP at the clause level should have one of the following dashtags:

  • ADJP-PRD
  • ADJP-SMC
    (See under Noun Phrases sections for use of the extended tags -PRD and -SMC)
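
The dashtag requirement above lends itself to a mechanical check. The following is only a sketch: the helper names `parse` and `bad_clause_level_adjps` are hypothetical, and "clause level" is assumed here to mean an immediate daughter of an IP* node.

```python
import re

def parse(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                          # skip ")"
        return [label] + children

    return node()

ALLOWED = {"ADJP-PRD", "ADJP-SMC"}

def bad_clause_level_adjps(tree):
    """Return labels of ADJP daughters of IP* nodes that lack -PRD/-SMC."""
    found = []

    def walk(n):
        label = n[0]
        for child in n[1:]:
            if isinstance(child, list):
                is_adjp = child[0].split("-")[0] == "ADJP"
                if label.startswith("IP") and is_adjp and child[0] not in ALLOWED:
                    found.append(child[0])
                walk(child)

    walk(tree)
    return found

tree = parse("(IP-MAT (NP-SBJ (PPER he)) (VVFIN is) (ADJP (ADJD edel)))")
print(bad_clause_level_adjps(tree))
```

An ADJP inside an NP is not reported, since only daughters of IP* nodes are inspected.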

 

Quantifier phrases (QP)

Usually, quantifiers do not project a QP and are just sisters to the noun which they modify.

A QP is only projected when:

  • A quantifier is not contained within an NP because it is extraposed.
  • A quantifier within an NP is itself modified. TODO: insert example of this
(IP-MAT (NP-SBJ (PPER wi)
                (QP *ICH*-1))
        (VMFIN moten)
        (QP-1 (DIA alle))
        (PP (APPR van)
            (NP (DDARTA den)
                (NA gude)))
        (VVINF leuen)
)

Number phrases (NUMP)

We follow the Penn scheme in using the label NUMP for multi-word numbers, or numbers modified in some way:

(NUMP (CARDN hundert)
      (KON vnde)
      (CARDN dre)
      (KON Vn)
      (CARDN viftich)
)
Buxtehuder
  • Note that the internal structure of NUMP is typically flat.

 

Adverb phrases (ADVP)

Eventually, adverb phrases will usually take one of the following extended labels:

  • -LOC: locative
  • -DIR: directive
  • -TMP: temporal

Usually, ADVPs consist of a single adverb, possibly with a modifier. In other words, adjacent and functionally equivalent adverbs are separated into different ADVPs:

TODO: insert example

Pronominal adverbs

Pronominal adverbs e.g. dahin, daher etc. are treated as structurally similar to darum, davon etc. (see below). The d- element is tagged as PAVKO.

(IP-MAT (ADVP (ADVP (PAVKO dare))
              (AVD hon))
)

Prepositional phrases (PP)

Prepositions can take the following categories as their complement:

  • an NP unmarked for function
  • a CP
  • an ADVP-LOC

Note that pronominal adverbs (e.g. darumme) are treated as PPs:

(PP (ADVP (PAVKO Dar))
    (PAVAP vmme)
)   

Pronominal adverbs can also be discontinuous:

  • The d-element projects an ADVP which is treated as having been fronted out of the PP.

  • The d-element is traced into the PP as the complement of the head preposition (`PAVAP’).

(IP-MAT (ADVP-1 (PAVD Daer)) 
        (NP-OB2 (PPER vns))
        (VAFIN is)
        (PP (ADVP *ICH*-1)
            (PAVAP van))
        (VVPP ghecomen)
        (NP-SBJ (ADJA manich)
                (NA leet))
)

Double prepositions

MLG has a range of `double prepositions’:

  • bet an
  • bet uppe
  • van an
  • wente an
  • wente to
  • wente uppe

In such cases, both prepositions are tagged APPR, are immediately dominated by the PP and are sisters to each other and the NP complement:

(PP (APPR wente)
    (APPR an)
    (NP (NP-POS (NA godes))
        (NA ghebort))
)
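
The flat analysis of double prepositions can be checked automatically. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that the word forms in the tree match the citation forms in the list above once lower-cased, which real spelling variation will often violate.

```python
import re

def parse(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                          # skip ")"
        return [label] + children

    return node()

# The six double prepositions listed in the guidelines.
DOUBLE_PREPS = {
    ("bet", "an"), ("bet", "uppe"), ("van", "an"),
    ("wente", "an"), ("wente", "to"), ("wente", "uppe"),
}

def double_preps(tree):
    """Return (first, second, is_listed) for each PP with two APPR daughters."""
    out = []

    def walk(n):
        kids = [k for k in n[1:] if isinstance(k, list)]
        if n[0].split("-")[0] == "PP":
            apprs = [k[1].lower() for k in kids if k[0] == "APPR"]
            if len(apprs) == 2:
                pair = tuple(apprs)
                out.append(pair + (pair in DOUBLE_PREPS,))
        for k in kids:
            walk(k)

    walk(tree)
    return out

tree = parse("(PP (APPR wente) (APPR an) (NP (NP-POS (NA godes)) (NA ghebort)))")
print(double_preps(tree))
```

Because only immediate APPR daughters of the PP are counted, a second preposition that has been nested inside the NP complement is not picked up as part of a pair.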

Co-occurring prepositions and adpositions

(PP (APPR bet)
    (ADVP (AVD bauen))
    (APPO an)
)
`up to the top'

Wh-phrases (W*P)

There are four types of wh-phrase:

  • WADJP
  • WADVP
  • WNP
  • WPP

A wh-phrase is projected whenever we have a wh-word (AVW).

The internal structure of wh-phrases is in principle identical to that of their non-wh-counterparts (ADJP, ADVP, NP etc.). However, the content of a wh-phrase can often be 0, indicating an `empty’ operator. For 0 wh-phrases, see elsewhere.

WADJP

A WADJP with overt content is common in embedded interrogatives (CP-QUE). The ADJP is traced into the IP-SUB.

(IP-MAT ...
        (CP-QUE (WADJP-1 (AVW wo)
                         (ADJD swaer))
                (IP-SUB (ADJP-PRD *T*-1)
                        (NP-SBJ (DDARTA de)
                                (NA sunde))
                        (PP (APPR vor)
                            (NP (NA gode)))
                        (VVFIN weer)))
)

WADVP

A WADVP with overt content is common in embedded interrogatives (CP-QUE). The ADVP is traced into the IP-SUB:

(IP-MAT (NP-SBJ (DPDS Dit))
        (VVFIN is)
        (CP-QUE (WADVP-1 (AVW waer))
                (IP-SUB  (ADVP *T*-1)
                         (NP-SBJ (NE Adam))
                         (ADVP (AVD erst))
                         (VVPP ghemaket)
                         (VAFIN woerd)))
...
)

A WADVP with overt content is also common in relative clauses (CP-REL). Again, the ADVP is traced into the IP-SUB:

(NP-OB1 (DDARTA den)
        (NA ortsprunck)
        (CP-REL (WADVP-1 (PAVREL dar))
                (IP-SUB  (NP-SBJ (PPER du))
                         (PP (ADVP *T*-1)
                             (PAVAP van))
                         (VVPP ghekamen)
                         (VAFIN byst)))
)

wen projects a WADVP when it introduces an adverbial clause (CP-ADV). The ADVP is traced into the IP-SUB:

(IP-MAT (KON vnde)
        (NP-SBJ (PPER yck))
        (PTKNEG nicht)
        (PP (ADVP (PAVD dar))
            (PAVAP vp))
        (VVPP gedacht)
        (VAFIN hadde)
        (CP-ADV (WADVP-1 (AVW wen))
                (IP-SUB (ADVP-TMP *T*-1)
                        (NP-SBJ (PPER ik))
                        (NP-OB1 (PPER id))
                        (PTKN suluen)
                        (PTKNEG nicht)
                        (VVPP geseen)
                        (VAFIN hadde)))
)

WNP

A WNP with overt content is common in embedded interrogatives (CP-QUE). The WNP is traced into the IP-SUB:

(CP-QUE (WNP-1 (DPWS wath))
        (IP-SUB (NP-SBJ *T*-1)
                (VVFIN duncket)
                (NP-OB2 (PPER dy))
                (PP (APPR van)
                    (NP (DPOSA myner)
                        (NA brud))))
)

A WNP with overt content commonly occurs in relative clauses (CP-REL). The WNP is traced into the IP-SUB:

(NP-OB1 (DPOSA syne)
        (ADJA erste)
        (NA brud)
        (CP-REL (WNP-2 (DPRELS de))
                (IP-SUB (NP-SBJ *T*-2)
                        (NP-OB2 (PPER eme))
                        (PTKNEG nicht)
                        (ADJP-PRD (ADJD edel)
                                  (AVD ghenoch))
                        (PTKNEG en)
                        (VVFIN was)))
)

WPP

A WPP with overt content is common in embedded interrogatives (CP-QUE). The WPP is traced into the IP-SUB:

(CP-QUE (WPP-1 (WADVP (PAVW waer))
               (PAVAP vmme))
        (IP-SUB (PP *T*-1)
                (NP-OB2 (PPER vns))
                (NP-OB1 (DDARTA den)
                        (DPOSA syn))
                (VVPP verleent)
                (VAFIN heft))
)

A WPP with overt content can also occur in relative clauses (CP-REL). The WPP is traced into the IP-SUB:

(CP-REL (WPP-1 (WADVP (PAVREL dar))
               (PAVAP mede))
        (IP-SUB (PP *T*-1)
                (NP-SBJ (PPER ick))
                (VVINF kamen)
                (VMFIN mochte)
                (PP (APPR to)
                    (NP (DPOSA myner)
                        (ADJA begerliken)
                        (NA leue))))
)
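
Since every wh-phrase and fronted constituent above is linked to its trace by a numeric index, a simple consistency check is to confirm that each `*T*-n` or `*ICH*-n` leaf has some node label ending in `-n` within the same tree. The sketch below makes that assumption explicit; the function names are hypothetical, and it checks only that a matching index exists, not that the antecedent is of the right category.

```python
import re

def parse(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                          # skip ")"
        return [label] + children

    return node()

def unbound_traces(tree):
    """Return trace leaves whose index matches no indexed label in the tree."""
    antecedents, traces = set(), []

    def walk(n):
        m = re.search(r"-(\d+)$", n[0])   # e.g. WNP-1, QP-1 (not NP-OB1)
        if m:
            antecedents.add(m.group(1))
        for child in n[1:]:
            if isinstance(child, list):
                walk(child)
            else:
                t = re.fullmatch(r"\*(?:T|ICH)\*-(\d+)", child)
                if t:
                    traces.append((child, t.group(1)))

    walk(tree)
    return [leaf for leaf, idx in traces if idx not in antecedents]

tree = parse("(CP-QUE (WNP-1 (DPWS wath)) (IP-SUB (NP-SBJ *T*-2) (VVFIN duncket)))")
print(unbound_traces(tree))
```

Function tags such as NP-OB1 are not mistaken for indices, because the pattern requires the final hyphen-separated segment to consist of digits only.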

4. Special structures by word

[up]

alene

As a focus particle, alene is just tagged as an ordinary ADJV which projects an ADVP which in turn modifies a noun:

(NP (DDARTA dat)
    (NA gelt)
    (ADVP (ADJV alene))
)

(al)sō

The word (al)sō is assigned one of a variety of POS-tags, in accordance with the following guidelines:

  • AVKO (‘conjunctional adverb’) when it appears at clause-level, in which case it projects an ADVP.
(IP-MAT (ADVP (AVKO Also))          ← tagged AVKO, projects ADVP
        (VAFIN leth)
        (NP-SBJ (PPER he))
        (VVINF beropen)
        ...
)
  • PTKA (‘pre-adjectival/adverbial particle’) when it modifies an adjective or adverb.
(IP-MAT (KON Vnde)
        (NP-SBJ *con*)
        (VVFIN maken)
        (NP-OB1 (DDARTA de)
                (NA memorie))
        (ADJP-SMC (PTKA also)       ← tagged PTKA
                  (ADJD verwoet)    ← modified ADJ
                  (CP-DEG (KOUS Dat)
                          ...))
)
  • PTKG (‘generalising particle’) when it modifies a wh-word.
(CP-FRL (WNP-1 (PTKG so)           ← tagged PTKG
               (DPRELS wat))       ← wh-word
        (IP-SUB (NP-OB1 *T*-1)
                (NP-SBJ (PPER he))
                (VVFIN horet))
)
  • KOKOM when it heads a PP containing a term of comparison.
(IP-MAT (KON Vnde)
        (NP-SBJ *con*)
        (VMFIN scal)
        (VVINF dorren)
        (PP (KOKOM also)          ← tagged KOKOM
            (NP (DDARTA de)
                (NA winlode)))
)
(PP (KOKOM also)              ← tagged KOKOM
    (CP-CMP (WADVP-1 0)
            (IP-SUB (ADVP *T*-1)
                    (NP-SBJ (PPER wy))
                    (ADVP-TMP (AVD vore))
                    (VAFIN hadden)
                    (VVPP ghedan)))
)
  • KOUS when it introduces a manner adverbial clause.
(IP-IMP-SPE (VVIMP maket)
            (ADVP (ADJV recht))
            (NP-OB1 (DDARTA den)
                    (NA wech)
                    (NP-POS (DDARTA des)
                            (NA heren)))
            (ADVP (KOUS also)                   ← tagged KOUS
                  (CP-ADV (WADVP-1 0)
                          (IP-SUB (ADVP *T*-1)
                                  (NP-SBJ (NE ysayas)
                                          (NP-PRN (DDARTA de)
                                                  (NA prophete)))
                                  (VVFIN sprak))))
)

N.B. A CP-ADV may also be headed by alsō when it introduces a reason clause (“as/since she watered the flowers, they grew taller”) or a temporal clause (“as she watered the flowers, she sang”). In these cases, it is assigned the lemma “alsō2” in order to aid the corpus validity checks in distinguishing it from the above kinds of alsō.
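
As a rough illustration of the kind of validity check mentioned above, the following sketch flags any (al)sō token whose POS-tag is not one of the five listed in this section. It is not the corpus's actual validation code: the function names are hypothetical, and the set of spellings in `ALSO_FORMS` is an assumption that would need extending to cover real orthographic variation.

```python
import re

def parse(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                          # skip ")"
        return [label] + children

    return node()

ALSO_TAGS = {"AVKO", "PTKA", "PTKG", "KOKOM", "KOUS"}  # tags listed above
ALSO_FORMS = {"so", "also"}                            # assumed spellings

def bad_also_tags(tree):
    """Return (tag, form) pairs for (al)so tokens with an unlisted tag."""
    bad = []

    def walk(n):
        kids = n[1:]
        if len(kids) == 1 and isinstance(kids[0], str):
            if kids[0].lower() in ALSO_FORMS and n[0] not in ALSO_TAGS:
                bad.append((n[0], kids[0]))
        for k in kids:
            if isinstance(k, list):
                walk(k)

    walk(tree)
    return bad

tree = parse("(IP-MAT (ADVP (AVD Also)) (VAFIN leth) (NP-SBJ (PPER he)))")
print(bad_also_tags(tree))
```

The check only confirms that the tag belongs to the permitted set; deciding which of the five tags is correct in a given context still requires the structural criteria described above.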

amen

amen is tagged as an interjection (INTJ) and attaches at sentence-level:

(FRAG (PP (APPR in)
          (NP (NP-POS (NA godes))
              (NA namen)))
      (INTJ amen)           ← tagged INTJ
)

bēde

The word bēde is assigned one of a variety of POS-tags, according to the following guidelines:

  • DIA/DID/DIN (‘indefinite determiner’) when it functions as a determiner.
(PP (APPR van)
    (NP (DIA beiden)   ← tagged DIA
        (NA saken))
)
(NP-OB1 (PPER vns)
        (DIN beiden)   ← tagged DIN
)
  • KON (‘coordinating conjunction’) when it introduces a correlative conjunction structure.
(NP-OB1 (KON beide)           ← tagged KON
        (NP (ADJA hillighe)
            (NA scrift))
        (CONJP (KON vnde)
               (NP (ADJA natuerlike)
                   (NA scrift)))
)

dat

The word dat is given one of various treatments in accordance with the following guidelines:

  • DPDS when it is a demonstrative pronoun.
(IP-MAT (NP-SBJ (DPDS dat))        ← tagged DPDS
        (VVFIN is)
        (ADJP-PRD (ADJD openbaer))
)
  • DDARTA when it is a determiner.
(NP-OB1 (DDARTA dat)        ← tagged DDARTA
        (ADJA ewighe)
        (NA leuen)
)
  • DPRELS when it functions as a relative pronoun.
(NP-OB1 (DDARTA dat)
        (NA selue)
        (CP-REL (WNP-1 (DPRELS dat))    ← tagged DPRELS
                (IP-SUB (NP-SBJ *T*-1)
                        (PP (APPR in)
                            (NP (DDARTA der)
                                (NA werlde)))
                        (VVFIN is)))
)
  • KOUS when it introduces a subordinate clause.

KOUS dat projects a CP-THT if it introduces a that-complement.

(IP-MAT (NP-SBJ (PPER He))
        (VMFIN wolde)
        (CP-THT (KOUS dat)                  ← tagged KOUS
                (IP-SUB (NP-SBJ (PPER wi))
                        ...))
)

KOUS dat projects a CP-DEG if it introduces a degree complement.

(ADVP (PTKA also)
      (ADJV veren)
      (CP-DEG (KOUS dat)                  ← tagged KOUS
              (IP-SUB (NP-SBJ (PPER se))
                      (PTKNEG nicht)
                      ...))
)

KOUS dat projects a CP-ADV if it introduces a purpose or reason clause with the verb in the subjunctive.

(IP-IMP (ADVP (AVD Nu))
        (VVIMP help)
        (NP-OB1 (PPER my))
        (NP-VOC (ADJA leue)
                (NE ihesus))
        ...
        (CP-ADV (KOUS Dat)                  ← tagged KOUS
                (IP-SUB (NP-SBJ (PPER ic))
                        (ADVP (AVD nv))
                        (VVFIN volghe)
                        (NP-OB1 (DDARTA den)
                                (ADJA rechten)
                                (NA pat))))
)
  • In exceptional cases, dat is tagged KON (coordinating conjunction) when it appears at the beginning of a matrix clause.
    (IP-MAT (KON men)
            (KON dat)              ← tagged KON
            (NP-SBJ (PPER he))
            (VVFIN merkede)
            (NP-OB1 (NP (DPOSA ere)
                        (NA yoget))
                    (CONJP (KON vnde)
                           (NP (NA schonheit)
                               (NP-POS (DPOSA eres)
                                       (NA lyues))))
                    (CONJP (KON Unde)
                           (NP (NA grotheyt)))
                    (CONJP (KON vnde)
                           ...))
     )

dewīle

  • When it is a sentential adverb, dewīle is tagged DDARTA+NA and is treated as heading an NP-TMP.
(IP-MAT (NP-SBJ (NA Her)
                (NE wolter))
        (VAFIN leth)
        (ADVP (AVKO ock))
        (NP-TMP (DDARTA+NA dewile)) ← tagged DDARTA+NA
        (VVINF maken)
        (NP-OB1 (NP (ADJA guldenne)
                    (NA rynghe))
                ...)
)
  • When dewīle introduces a subordinate clause, it is tagged as KOUS and is treated like similar items (e.g. also, do).
(ADVP-TMP (KOUS dewile)        ← tagged KOUS
          (CP-ADV (WADVP-1 0)
                  (IP-SUB (ADVP-TMP *T*-1)
                          (NP-SBJ (PPER se))
                          ...))
)
  

ēnander

Unlike the reflexive pronoun sik which is consistently tagged PRF (‘reflexive personal pronoun’) in the corpus (see below), ēnander is just tagged PPER (‘irreflexive personal pronoun’).

TODO: example

genōch

  • genōch is tagged as a noun (NA) when it looks nominal and projects an NP.
(NP-OB1 (NP-POS (NA gheldes))
        (NA ghenoch)          ← tagged NA
)
  • Otherwise it is tagged as an adverb (AVD).

TODO: example

hen

hen is tagged PTKVZ (‘verbal particle, separable’) and does not project a phrase when it occurs at sentence-level:

(IP-MAT (ADVP (AVD aldus))
        (VVFIN ga)
        (NP-SBJ (PPER ick))
        (PTKVZ hen)         ← tagged PTKVZ
)

item

item has one of two POS-tags according to context:

  • When it prompts subject-verb inversion (V2) it should be tagged as AVKO (‘conjunctional adverb’), in which case it projects an ADVP.

TODO: example here

  • When it does not prompt subject-verb inversion it should be tagged as KON (‘conjunction’).

TODO: example here

  • Note: item is not tagged FM (‘foreign material’), despite the fact that it is of Latin origin.

 

lēf

  • In the construction lēf hebben, the word lēf is tagged as PTKVZ (‘verbal particle, separable’).
(IP-MAT-SPE (NP-OB1 (DPDS dusse))
            (VMFIN scholle)
            (NP-SBJ (PPER gy))
            (PTKVZ leff)         ← tagged PTKVZ
            (VVINF hebben)
)
(Griseldis)
  • In other uses, such as with wēsen, it is an adjective (ADJ*).
(IP-MAT (NP-SBJ (DDARTA Dusse)
                (NA rede))
        (VVFIN weren)
        (NP-OB2 (DDARTA den)
                (NA ridderen))
        (ADJP-PRD (ADJD lef))  ← tagged ADJD
)
(Zeno)

mehr

When mehr is used in a comparative construction with als, it is treated as follows:

(NP-OB1 (DPIS mehr)
        (PP (KOKOM als)
            (NP (OA twelff)
                (CARDA teen)
                (NA molte)))
)

noch

The word noch is given one of various treatments, as follows:

  • AVD when it is a straightforward sentential adverb, in which case it projects an ADVP.
(IP-MAT ...
        (ADVP (AVD Wol))
        (VVFIN bist)
        (NP-SBJ (PPER u))
        (ADVP (AVD noch))         ← tagged AVD and projects an ADVP
        (PP (APPR in)
            (NP (ADJA bloidender)
                (NA tyd)))
)
  • PTKA when it modifies an adjective or adverb, and is sister to the adjective/adverb.
(ADVP (ADVP (PTKA noch)  ← tagged PTKA
            (ADJV meer))
      (AVD allene)
)
  • KON when it functions as a coordinating conjunction (the example here also features correlative coordination, see elsewhere)
(NP-OB1 (KON noch)        ← tagged KON
        (NP (DDARTA den)
            (NA armen))
        (CONJP (KON noch)       ← tagged KON
               (NP (DDARTA den)
                   (NA riken)))
)

sik

The part of speech tag PRF should only be assigned to the lemma sik.

(IP-MAT (ADVP (AVKO Doch))
        (VAFIN hadde)
        (NP-SBJ (PPER se))
        (NP-OB1 (PRF syck))     ← tagged PRF
        (VVPP vorgenomen)
        (ADVP (AVD duldichliken))
        ...
)
(Griseldis)

N.B. Some (but not all) ReN texts also assign PRF when a pronoun other than sik happens to be coreferential with the subject, but in the CHLG these are tagged PPER.
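
When importing ReN-tagged material, the restriction of PRF to the lemma sik can be checked token by token. The sketch below assumes a hypothetical `Token` record with form, POS-tag and lemma fields; the corpus's real storage format may differ, and the lemma values other than sik are illustrative placeholders.

```python
from collections import namedtuple

# Hypothetical token record, not the corpus's actual data model.
Token = namedtuple("Token", "form pos lemma")

def misplaced_prf(tokens):
    """Return tokens tagged PRF whose lemma is not 'sik'."""
    return [t for t in tokens if t.pos == "PRF" and t.lemma != "sik"]

tokens = [
    Token("se", "PPER", "se"),     # illustrative lemma value
    Token("syck", "PRF", "sik"),   # permitted: lemma is sik
    Token("eme", "PRF", "he"),     # constructed error: PPER expected in the CHLG
]
for t in misplaced_prf(tokens):
    print(t.form, t.pos, t.lemma)
```

Tokens reported by such a check would be candidates for retagging as PPER.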

swelich

The item swelich is treated as two separate tokens (s- -welich). S- is tagged as PTKG (‘generalising particle’) and -welich is tagged as DWA:

(WNP (PTKG S)
     (DWA welich)
     (NA voget)
)
(Braunschweig)
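
The split into s- and -welich could be automated during pre-processing roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the spelling pattern is a guess at plausible variants of -welich, not an inventory drawn from the corpus.

```python
import re

# Assumed spelling pattern for s- + welich; extend for attested variants.
SWELICH = re.compile(r"([sS])(w[ei]l[ie]ch\w*)")

def split_swelich(form):
    """Split a swelich form into (tag, form) pairs, or return None."""
    m = SWELICH.fullmatch(form)
    if m is None:
        return None
    return [("PTKG", m.group(1)), ("DWA", m.group(2))]

print(split_swelich("Swelich"))
print(split_swelich("swaer"))
```

Requiring the -welich stem after the initial s- keeps unrelated sw- words such as swaer from being split.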

sülve

The word sülve is generally tagged PTKN.

(NP-SBJ (NA got)
        (PTKN seluen)
)
(PP (APPR By)
    (NP (PPER my)
        (PTKN seluen))
)
  • Note: when tagged as PTKN and occurring at sentence-level, sülve does not project a phrasal category (i.e. is immediately dominated by IP-MAT).
(IP-MAT (NP-SBJ (PPER Wi))
        (VMFIN solden)
        (PTKN seluen)
        (ADVP (ADVP (AVD seer))
              (AVD luttick))
        (VVINF beholden)
)
  • Note: when sülve appears with a definite article (DDARTA), it is tagged as ADJ*.
(NP-SBJ (DDARTA dey)
        (ADJA selue)
        (NA richtere)
)
(NP-OB1 (DDARTA deme)
        (ADJS suluen)
)

sunder

The item sunder is treated in one of various ways, as follows:

  • APPR when it is a preposition (‘ohne’, ‘außer’), in which case it projects a PP.
(IP-MAT (PP (ADVP (PAVKO Dar))
            (PAVAP na))
        (VVFIN schededen)
        (NP-SBJ (PPER se))
        (NP-OB1 (PRF syk))
        (PTKNEG nicht)
        (PP (APPR sunder)    ← tagged APPR
            (NP (ADJA grot)
                (NA leyt)))
)
  • KON when it functions as a coordinating conjunction (‘sondern’, ‘aber’).
(IP-MAT (KON Sunder)        ← tagged KON
        (NP-SBJ (PPER se))
        (VVFIN bleuen)
        (PP (APPR by)
            (NP (PPER eme)))
        (NP-TMP (DDARTA den)
                (NA dach))
)

vnde

The word vnde is tagged as:

  • KON when it functions as a coordinating conjunction.
(IP-MAT (KON vnde)          ← tagged KON
        (NP-SBJ (DDARTA dat)
                (NA wort))
        (VVFIN was)
        (PP (APPR bi)
            (NP (NA gode)))
)
  • AVD when it functions as a sentence adverb with the meaning ‘also’, in which case it projects an ADVP.
(IP-MAT (KON vnd)              ← tagged KON
        (PP (ADVP (PAVD dar))
            (PAVAP na))
        (VVFIN kam)
        (ADVP (AVD vnd))        ← tagged AVD
        (NP-SBJ (NE Jafeth))
)

wane

  • wane should be tagged as a preposition (APPR) when it introduces a noun phrase. The noun phrase is annotated as the complement of wane.
(PP (APPR wane)
    (NP (CARDA ver)
        (NA scilling))
)

TODO: presumably this also applies to dan when it introduces a noun phrase?

von NP wegen

  • This construction is treated as a PP headed by the preposition (APPR) von.

  • wegen is tagged as a noun (NA) which heads an NP which is the complement of the PP headed by von.

  • The NP preceding wegen is tagged as NP-POS and is a complement of the head noun wegen.

(PP (APPR von)
    (NP (NP-POS (DPOSA orer)
                (ADJA eyghen)
                (NA sunde))
        (NA wegen))
)

wente

For finite clauses introduced by wente which are ambiguous and cannot be labelled as either matrix (IP-MAT) or subordinate (IP-SUB), we use a novel label, IP-X.

This applies to wente-clauses which are unambiguously V2:

TODO: parsed example, e.g. wente dit is godes sone

This also applies to wente-clauses where the verb position is hard to diagnose, since there are only two constituents in the clause:

TODO: parsed example, e.g. wente he kam

Note: wente-clauses which are clearly verb-final

TODO: parsed example

(see LREC paper for more details).

weyt (with cardinals)

  • weyt is tagged as an AVD which projects an ADVP.

  • The ADVP is a sister of the cardinals (CARD*).

  • The cardinals and the ADVP headed by weyt all sit within a NUMP (the label used for complex numbers, see elsewhere).

(NP-PRD (NUMP (CARDA seuentich)
              (ADVP (AVD weyt))
              (CARDA seuen))
        (NA weruen)
)