1. General Issues
Token boundaries
Broadly speaking, one token corresponds to one matrix sentence (IP-MAT), including any embedded clauses. Note, however, the following:
- When two independent finite clauses are conjoined, the two clauses are treated as separate tokens:
(IP-MAT (NP-OB1 (DPDS Dat)) ← first independent clause as a token
(VVFIN beleueden)
(NP-SBJ (PPER se)
(DIN alle))
(PP (APPR myt)
(NP (NA willen)))
)
(IP-MAT (KON und) ← second independent clause as a token
(NP-SBJ *con*)
(VVFIN scheiden)
(PP (APPR van)
(NP (PPER em)))
)
- Direct speech which constitutes an IP-MAT can sit within a higher IP-MAT introducing the speech. The direct-speech matrix clause gets the extended tag -SPE:
(IP-MAT (ADVP (AVD Do))
(VVFIN sprak)
(NP-SBJ (PPER he))
(IP-MAT-SPE (NP-SBJ (PPER ik)) ← IP which is direct speech
(PTKNEG ne)
(VVFIN bin))
)
- In contrast to the treatment of direct speech (see above), a citation which is an IP-MAT and is introduced by e.g. ‘X writes’ is treated as a separate token:
(IP-MAT (NP-SBJ (NE Salustius)) ← first token
(VVFIN scrift)
)
(IP-MAT (NP-SBJ (DDARTA de) ← second token
(FM Troyani))
(VAFIN hebben)
(NP-OB1 (NE rome))
(VVPP ghebuwet)
)
- Chapter and section headings are treated as standalone tokens and are tagged IP-MAT if they constitute an independent finite clause, or FRAG otherwise:
TODO: Tag these as CODE also?
(FRAG (PP (APPR Van)
(NP (DDARTA dem)
(NA Borchgherichte)))
)
Herford
- Places and dates given for the time of writing are also treated as standalone tokens and are tagged FRAG:
(FRAG (FM proximo)
(FM libro)
(FM de)
(FM ciuitate)
(FM dei)
)
Engelhus
(FRAG (XY 2.000dcclxxx)
(FM Abbon)
)
Engelhus
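Because every top-level bracketed tree is one token, a file of annotated text can be split into tokens simply by counting brackets. The following is a minimal Python sketch, assuming the plain labelled-bracketing layout used in the examples above; `split_tokens` and `root_label` are hypothetical helpers, not part of any existing corpus tooling:

```python
from __future__ import annotations


def split_tokens(text: str) -> list[str]:
    """Split labelled bracketing into top-level trees, one string per token.

    Text outside any bracket (e.g. source labels such as 'Engelhus') is skipped.
    """
    tokens: list[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # a new top-level tree (token) begins here
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced ')' at offset {i}")
            if depth == 0:
                tokens.append(text[start:i + 1])  # the token is complete
    if depth != 0:
        raise ValueError("unbalanced '(' at end of input")
    return tokens


def root_label(token: str) -> str:
    """Label of the outermost node, e.g. 'IP-MAT' or 'FRAG'."""
    return token[1:].split(None, 1)[0]


sample = """
(IP-MAT (NP-OB1 (DPDS Dat)) (VVFIN beleueden)
        (NP-SBJ (PPER se) (DIN alle)) (PP (APPR myt) (NP (NA willen))))
(IP-MAT (KON und) (NP-SBJ *con*) (VVFIN scheiden) (PP (APPR van) (NP (PPER em))))
(FRAG (XY 2.000dcclxxx) (FM Abbon))
Engelhus
"""

for tok in split_tokens(sample):
    print(root_label(tok))  # IP-MAT, IP-MAT, FRAG (one per line)
```

The two conjoined matrix clauses come out as two tokens, and the dating formula as a third (FRAG) token, mirroring the rules above.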
Token IDs
TODO: add at a later stage
Phrase structure
The annotation is encoded via labelled bracketing and is designed to facilitate efficient searching, rather than to reflect a particular analysis of Middle Low German.
Some key points:
- Relatively flat trees
- Multiple branching is possible
- No VP – the verb and its objects are sisters and immediately dominated by IP
- No intermediate phrase levels (i.e. bar-levels)
- NP arguments are distinguished from NP adjuncts at IP-level
- PP arguments are not distinguished from PP adjuncts
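To make these points concrete, here is a minimal Python sketch (hypothetical helper names, assuming whitespace-separated labelled bracketing as in the examples) that parses a token into a tree and lists the immediate daughters of IP. The verb, the separable prefix and the object come out as sisters, with no VP node in between:

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Iterator, Optional

TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")  # '(', ')' or a label/word


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    word: Optional[str] = None  # terminal material: a word or an empty category


def parse(text: str) -> Node:
    """Parse a single labelled-bracketing tree into Node objects."""
    toks = TOKEN_RE.findall(text)
    pos = 0

    def node() -> Node:
        nonlocal pos
        if toks[pos] != "(":
            raise ValueError(f"expected '(' but found {toks[pos]!r}")
        n = Node(toks[pos + 1])  # the token after '(' is the label
        pos += 2
        while toks[pos] != ")":
            if toks[pos] == "(":
                n.children.append(node())
            else:
                n.word = toks[pos]
                pos += 1
        pos += 1  # consume ')'
        return n

    return node()


def walk(n: Node) -> Iterator[Node]:
    """Yield a node and all of its descendants, depth first."""
    yield n
    for child in n.children:
        yield from walk(child)


tree = parse("""(IP-MAT (NP-SBJ (DDARTA De) (NA viant)) (VVFIN ghift) (PTKVZ wt)
                (NP-OB1 (DDARTA den) (ADJA eersten) (NA raet)))""")
print([c.label for c in tree.children])          # ['NP-SBJ', 'VVFIN', 'PTKVZ', 'NP-OB1']
print(any(n.label == "VP" for n in walk(tree)))  # False
```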
Heads and phrases
In general, heads project a corresponding phrase.
Categories which never project a phrase:
- verbs (V)
- determiners (D)
- particles (PTK)
- single-word modifiers (see below)
- interjections (ITJ)
Categories which can project a phrase but can also be immediately dominated by IP:
- conjunctions (KON)
Phrase types which do not necessarily have a head of the same category:
- IP, since there is no I tag for verbs
- NP, which may have a noun (NA) as its head, but can also be headed by a personal pronoun (PPER/PRF), a demonstrative pronoun (DPDS/DPIS) or a proper noun (NE).
A foreign word (tagged FM) may also head a phrase, resulting in ‘exocentricity’:
(PP (FM Ad) ← FM heads PP
(NP (FM hebreos)) ← FM heads NP
)
Engelhus
Complements
Complements always project a phrase.
Modifiers
Modifiers are treated as daughters of the phrasal node and sisters to the head.
Note that:
- A modifier only projects a phrase when it is itself modified:
(NP-OB1 (DDARTA de)
(ADJA ersten) ← modifier which is not further modified; does not project a phrase
(NA troyen)
)
Engelhus
(NP-SBJ (ADJP (DDA sodane) ← modifier of modifier
(ADJA edel)) ← modifier which is further modified; projects a phrase
(NA land)
)
Selectional restrictions
The annotation observes certain selectional restrictions:
- Prepositions should have exactly one nominal (NP) or clausal (CP) complement.
- Any NP which is immediately contained by another NP must be a nominal complement (NP-COMP), genitive (NP-POS) or appositive (NP-PRN).
- All finite clauses must have a subject (NP-SBJ), whether overt or empty.
- Certain categories can occur at most once per clause (but need not appear at all). This applies to direct objects (NP-OB1), indirect objects (NP-OB2), nominal predicates (NP-PRD) and adjectival predicates (ADJP-PRD).
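These restrictions are mechanical enough to check automatically. The following is a minimal Python sketch, not the project's own validation tooling. It assumes whitespace-separated labelled bracketing, treats only IP-MAT and IP-SUB as finite clauses, skips gapped conjuncts (labels containing `=`), and implements the preposition, subject and once-per-clause rules; the NP-in-NP rule is left out for brevity. All function names are hypothetical:

```python
from __future__ import annotations

import re

TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")
# (category, function) pairs that may occur at most once per clause
ONCE_PER_CLAUSE = [("NP", "OB1"), ("NP", "OB2"), ("NP", "PRD"), ("ADJP", "PRD")]


def parse(text: str):
    """Parse one tree into nested (label, children) tuples; terminals stay strings."""
    toks = TOKEN_RE.findall(text)
    pos = 0

    def node():
        nonlocal pos
        label, pos = toks[pos + 1], pos + 2  # skip '(' and read the label
        kids = []
        while toks[pos] != ")":
            if toks[pos] == "(":
                kids.append(node())
            else:
                kids.append(toks[pos])
                pos += 1
        pos += 1
        return (label, kids)

    return node()


def parts(label: str) -> list[str]:
    """'NP-SBJ-RSP-2' -> ['NP', 'SBJ', 'RSP']: drop indices and '=n' gap markers."""
    return [p for p in label.split("=")[0].split("-") if not p.isdigit()]


def has_function(label: str, cat: str, func: str) -> bool:
    p = parts(label)
    return p[0] == cat and func in p[1:]


def check(tree) -> list[str]:
    """Return the selectional-restriction violations found in one token."""
    problems: list[str] = []

    def visit(node):
        if isinstance(node, str):
            return
        label, kids = node
        sub = [k[0] for k in kids if not isinstance(k, str)]
        p = parts(label)
        # Finite clauses; gapped conjuncts such as IP-MAT-PRN=1 are skipped.
        if p[0] == "IP" and p[1:2] in (["MAT"], ["SUB"]) and "=" not in label:
            n = sum(has_function(s, "NP", "SBJ") for s in sub)
            if n != 1:
                problems.append(f"{label}: {n} subjects")
            for cat, func in ONCE_PER_CLAUSE:
                n = sum(has_function(s, cat, func) for s in sub)
                if n > 1:
                    problems.append(f"{label}: {n} x {cat}-{func}")
        # A PP headed by a preposition (APPR) takes exactly one NP or CP.
        if p[0] == "PP" and "APPR" in sub:
            n = sum(parts(s)[0] in ("NP", "CP") for s in sub)
            if n != 1:
                problems.append(f"{label}: {n} complements of APPR")
        for k in kids:
            visit(k)

    visit(tree)
    return problems


print(check(parse("(IP-MAT (VVFIN sprak))")))  # ['IP-MAT: 0 subjects']
```

Empty subjects such as `(NP-SBJ *con*)` count as subjects here, since only the label is inspected.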
Sentence fragments (FRAG)
The label FRAG is used for material which consists of at least two constituents but which cannot be represented as a full IP.
Some things to note:
- Fragmentary clauses which stand alone are labelled as FRAG at the token level (i.e. they are not tagged as IP-MAT):
(FRAG (NP (DDARTA De)
(NA hystorie)
(PP (APPR van)
(NP (NA duldicheyt)
(NP-POS (DDARTA der)
(NA vrowen)
(RRC (VVPP gheheten)
(NP-SMP (NE Griseldis)))))))
)
- FRAG can also occur as an immediate daughter of IP-MAT, in the case of (non-clausal) direct speech:
(IP-MAT (FRAG-SPE (PTKANT Ja)) ← non-clausal direct speech
(VVFIN sprack)
(NP-SBJ (PPER se))
)
(IP-MAT (ADVP (AVD Do))
(VVFIN antworde)
(NP-SBJ (PPER he))
(NP-OB2 (PPER ene))
(FRAG-SPE (PTKANT nen)) ← non-clausal direct speech
)
(IP-MAT (ADVP (AVD Do))
(VVFIN sprak)
(NP-SBJ (PPER se))
(FRAG-SPE (NP-SBJ (DPNEGS neman)) ← non-clausal direct speech
(NP-VOC (NA here)))
)
- FRAG is also used for parenthetical clauses which contain a good deal of foreign material (FM):
(IP-MAT (NP-SBJ (DPDS Dit))
(VMFIN sal)
(NP-OB2 (ADJA manigher)
(ADJA armen)
(NA sielen))
(VVINF deren)
(FRAG-PRN (FM id) ← parenthetical clause consisting of foreign material
(FM est)
(VVFIN schaden))
)
Foreign material (FM)
A foreign word has the POS tag FM:
(PP (APPR van)
(NP (DDARTA den)
(FM phariseis)) ← foreign word
)
Note that this differs from other Penn corpora, where foreign material is tagged FW.
Also unlike other Penn corpora, there is no phrase-level category LATIN which immediately dominates FM. A word tagged FM can be dominated by any category. There are no restrictions here, even if this results in phrases which are not endocentric when the POS tags are taken at face value. Some examples:
(IP-MAT-SPE (NP-SBJ (PPER wi))
(VVFIN sint)
(NP-PRD (DDARTA dat)
(NA slecte)
(NP-POS (FM abrahe))) ← FM heads NP-POS
)
(CP-THT (KOUS Wente)
(IP-SUB (NP-SBJ (PPER du))
(NP-PRD (DIARTA en)
(FM samaritanus)) ← FM heads NP-PRD
(VVFIN bist))
)
(IP-IMP-SPE (NP-VOC (FM lazare)) ← FM heads NP-VOC
(VVIMP cum)
(ADVP (AVD hijr))
(PTKVZ uor)
)
Numerals (XY)
Both Arabic and Roman numerals have the POS tag XY (which in the HiNTS tagset stands for a ‘non-word’):
(NP-OB1 (NP-POS (DDARTA des)
(NA volkes)
(PP (APPR von)
(NP (NE Sirien))))
(XY 100.000) ← Arabic numeral
)
Engelhus
(FRAG (FM Dominica)
(XY iiij) ← Roman numeral
(FM post)
(FM aduentum)
)
This contrasts with numbers which are spelled out; these are tagged CARD*:
(NP-TMP (CARDA dre) ← spelled out number
(NA iar)
)
Engelhus
Direct speech (-SPE)
Generally speaking, a clause or fragment which constitutes direct speech gets the extended label -SPE:
(IP-MAT (ADVP (AVD Do))
(VVFIN sprak)
(NP-SBJ (PPER he))
(IP-MAT-SPE (NP-SBJ (PPER ik)) ← IP-MAT which is direct speech
(PTKNEG ne)
(VVFIN bin))
)
Some things to note:
- The first direct-speech clause is embedded in the introductory clause, i.e. there is no token break. Subsequent direct-speech clauses in a chain of direct speech are treated as separate tokens.
- If an IP or CP which is direct speech contains a further IP or CP, only the highest IP or CP gets the -SPE tag (unlike in the Helipad):
(IP-MAT-SPE (NP-SBJ (PPER Du)) ← IP-MAT-SPE
(VMFIN schalt)
(NP-OB2 (PPER di))
(PTKNEG nicht)
(VVINF wunderen)
(CP-ADV (KOUS wente) ← but not CP-ADV-SPE
(IP-SUB (NP-SBJ (PPER ik)) ← and not IP-SUB-SPE
(NP-OB2 (PPER di))
(VVPP secht)
(VAFIN hebbe)))
)
Interjections (ITJ)
Interjections have the POS tag ITJ:
(IP-MAT-SPE (ITJ Ach) ← interjection
(NP-VOC (ADJA wise)
(ADJA junge)
(NA man))
...
)
Zeno
Note:
- Unlike the standard Penn scheme, we do not use INTJP for multi-word interjections. These simply attach directly at clause level:
(IP-MAT-SPE (ITJ jach)
(ITJ Jach)
(NP-SBJ (NA Ot))
...
)
Zeno
Left-dislocation (-LFD) and resumption (-RSP)
Generally speaking, the extended label for left-dislocation (-LFD) is applied to clausal or phrasal constituents in the left periphery which are overtly resumed by a coreferential phrase (for some specific exceptions, see below). The coreferential phrase is then tagged with the extended label for resumption (-RSP):
(IP-MAT (ADVP-TMP-LFD (KOUS Do) ← Left-dislocated constituent
(CP-ADV (WADVP-1 0)
(IP-SUB (ADVP-TMP *T*-1)
(NP-SBJ (PPER he))
(NP-OB1 (DPDS dit))
(VVPP gesproken)
(VAFIN hadde))))
(ADVP-TMP-RSP (AVD Do)) ← Resumptive constituent
...
)
This means that wherever there is a constituent tagged -LFD, there should also be one tagged -RSP.
Left-dislocation and resumption of a subject
Where we have left-dislocation and resumption of a subject, only the resumptive constituent is tagged as a subject so as not to violate the requirement that each IP has exactly one subject.
The left-dislocated constituent is simply tagged NP-LFD; the resumptive constituent is tagged NP-SBJ-RSP:
(IP-MAT (NP-LFD (DPOSA Ere) ← left-dislocated constituent
(NA iunkfrowen)
...)
(NP-SBJ-RSP (DPDS de)) ← resumptive constituent
(VVFIN wusten)
(PTKNEG nicht)
...
)
Clauses which have more than one -LFD/-RSP pair
Cases can arise where more than one -LFD/-RSP pair occurs in a single clause.
In such cases, the pairs are co-indexed so as to avoid mismatches:
(IP-MAT (NP-LFD-2 (FM Ieu))
(ADVP-TMP-LFD-3 (KOUS do)
(CP-ADV (WADVP-1 0)
(IP-SUB (ADVP-TMP *T*-1)
(NP-SBJ (PPER he))
(VVFIN was)
(NP-PRD-XXX (XY xxxii)
(NA iar)
(ADJP (ADJD olt))))))
(ADVP-RSP-3 (AVD do))
(VVFIN ghewan)
(NP-SBJ-RSP-2 (PPER he))
(NP-OB1 (NE Saruth))
)
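The pairing and co-indexation requirements can be checked mechanically. A minimal Python sketch, assuming labels are read straight off the bracketing; note that it checks pairs per token rather than per clause, which is a simplification, and the function names are hypothetical:

```python
from __future__ import annotations

import re
from typing import Optional

LABEL_RE = re.compile(r"\(([^\s()]+)")  # the label immediately after each '('
INDEX_RE = re.compile(r"-(\d+)$")       # a trailing co-index such as '-3'


def co_index(label: str) -> Optional[str]:
    m = INDEX_RE.search(label)
    return m.group(1) if m else None


def lfd_rsp_problems(token: str) -> list[str]:
    """Check -LFD/-RSP pairing in one token (per token, not per clause)."""
    labels = LABEL_RE.findall(token)
    lfd = [lab for lab in labels if "-LFD" in lab]
    rsp = [lab for lab in labels if "-RSP" in lab]
    problems = []
    if len(lfd) != len(rsp):
        problems.append(f"{len(lfd)} -LFD but {len(rsp)} -RSP")
    if len(lfd) > 1 or len(rsp) > 1:
        # With more than one pair, every member must carry a matching index.
        li = [co_index(lab) for lab in lfd]
        ri = [co_index(lab) for lab in rsp]
        if None in li or None in ri:
            problems.append("several -LFD/-RSP labels but not all are co-indexed")
        elif sorted(li) != sorted(ri):
            problems.append(f"indices differ: LFD {sorted(li)} vs RSP {sorted(ri)}")
    return problems


example = """(IP-MAT (NP-LFD-2 (FM Ieu)) (ADVP-TMP-LFD-3 (KOUS do) ...)
                    (ADVP-RSP-3 (AVD do)) (VVFIN ghewan)
                    (NP-SBJ-RSP-2 (PPER he)) (NP-OB1 (NE Saruth)))"""
print(lfd_rsp_problems(example))  # []
```

A single unindexed -LFD/-RSP pair passes, since co-indexation is only required when a clause has more than one pair.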
Insertion of null resumptives (*-RSP 0)
In certain left-dislocation contexts, a null resumptive element (*-RSP 0) is inserted.
The following contexts call for insertion of a null resumptive:
- Where there is more than one left-dislocated element and a single resumptive element which could in principle pair with either one. As a default principle, the null resumptive element is inserted immediately after the first left-dislocated constituent. This insertion means that each left-dislocated constituent has a corresponding resumptive element, and the coindexation principle can be applied (see above):
(IP-MAT (CP-ADV-LFD-1 (C-V1 0)
(IP-SUB (VVFIN Is)
(NP-SBJ-2 (DPDS dat))
(ADVP (AVKO also))
(CP-THT-2 (KOUS dat) ...)))
(ADVP-RSP-1 0)
(CP-ADV-LFD-3 (C-V1 0)
(IP-SUB (VVFIN steruet)
(NP-SBJ (DDARTA der)
(DPIS eyn))))
(ADVP-RSP-3 (AVKO so))
(VMFIN sal)
...)
Ruethen
- Where there are two or more left-dislocated constituents, of which only the last is overtly resumed at clause level. Again, the null resumptive element is inserted immediately after the first left-dislocated constituent by default, and the resulting -LFD/-RSP pairs are coindexed:
(IP-MAT (NP-LFD-1 (CP-FRL (WNP-2 (PTKG S)
(DWA welich)
(NA voget))
(IP-SUB (NP-SBJ *T*-2)
(NP-OB1 (DIARTA enen)
(NA richtere))
(VVFIN set)
(PP (APPR an)
(NP (DPOSA sine)
(NA stat))))))
(NP-RSP-1 0)
(NP-LFD-3 (CP-FRL (WNP-4 (PTKG s)
(DPWS waz))
(IP-SUB (NP-SBJ *T*-4)
(PP (APPR vor)
(NP (DPDS dheme)))
(VVPP gelent)
(VAFIN wert))))
(NP-SBJ-RSP-3 (DPDS dat))
(VMFIN sal)
...)
Braunschweig
Appositives and parentheticals (-PRN)
The extended label -PRN is applied to appositive and parenthetical constituents. As such, it may occur with virtually any phrase type.
- Appositive constituents are contained within the constituent to which they are in apposition:
(NP-SBJ (NE Abraham)
(NP-PRN (DPOSA ivwe) ← appositive constituent
(NA vader))
)
- Parenthetical clauses are contained within the clause to which they are parenthetical:
(IP-MAT (ADVP (AVD Oeck))
(VVFIN secht)
(NP-SBJ (NA sunte)
(NE gregorius))
(IP-MAT-PRN (NP-OB1 (DPDS dat)) ← parenthetical clause
(VVFIN weet)
(NP-SBJ (PPER ic))
(ADVP (AVD wal)))
...
)
Empty subjects
All finite clauses (IP-MAT, IP-SUB) are required to have a subject, and this requirement drives the policy for including empty subjects. Infinitival clauses (IP-INF) are not required to have a subject.
All empty subjects appear as early as possible in the clause.
There are four types of empty subject:
- *con*: subject elision under conjunction
- *pro*: referential null subject
- *arb*: TODO: do we actually use this?
- *exp*: null expletive subject
*con*: subject elision under conjunction
(IP-MAT (NP-OB1 (DPDS Dat)) ← first conjunct
(VVFIN beleueden)
(NP-SBJ (PPER se)
(DIN alle))
(PP (APPR myt)
(NP (NA willen)))
)
(IP-MAT (KON und)
(NP-SBJ *con*) ← second conjunct, with elision of subject
(VVFIN scheiden)
(PP (APPR van)
(NP (PPER em)))
)
Note:
- *con* can only be used when the elided subject is coreferential with, and identical in number to, the subject of the preceding IP-MAT.
- If not, use *pro*:
(IP-MAT (ADVP-TMP (AVD Do))
(VVFIN brochte)
(NP-SBJ (DPIS me))
(PTKVZ wedder)
(NP-OB1 (DDARTA de)
(ADJA kostlike)
(NA kleder))
)
(IP-MAT (KON vnde)
(NP-SBJ *pro*) ← *pro* subject
(VAFIN wart)
(ADVP (AVD vroliken))
(VVPP entfangen)
(PP (APPR van)
(NP (DIA allen)
(NA volke)))
)
Griseldis
*pro*: referential null subject
An example:
(IP-MAT (KON vnde)
(NP-SBJ *pro*) ← *pro* subject
(VAFIN wart)
(ADVP (AVD vroliken))
(VVPP entfangen)
(PP (APPR van)
(NP (DIA allen)
(NA volke)))
)
Griseldis
- Note: *pro* can also be used as a default empty subject where other types of empty subject are not appropriate, e.g. in cases of conjunction reduction where there is a number mismatch and *con* therefore cannot be used.
*arb*
TODO: do we have any instances of this?
*exp*: null expletive subject
A null expletive subject (*exp*) is inserted in contexts where an overt expletive could be expected but is not attested (see below).
(IP-MAT (KON Vnde)
(NP-SBJ *exp*) ← null expletive
(PP (ADVP (PAVD hijr))
(PAVAP vp))
(VVFIN staet)
(PP (APPR in)
(NP (DDARTA der)
(NA glosen)))
(VVPP gheschreuen)
)
Other empty categories
There are five other empty non-subject categories:
- *T*: traces of wh-movement
- *ICH*: traces of other types of movement
- Empty complementisers (C 0)
- Empty conjunctions (KON 0)
- Empty resumptives (*-RSP 0)
*T*: traces of wh-movement
Traces of this type are always co-indexed.
(CONJP (KON vnde)
(CP-QUE (WNP-1 (DPWS wat)) ← wh-phrase
(IP-SUB (NP-PRD *T*-1) ← trace
(NP-SBJ (PPER se))
(VVFIN is)))
)
*ICH*: traces of other types of movement (e.g. extraposition)
Traces of this type are also always co-indexed.
(IP-SUB (NP-SBJ (PPER se))
(ADJP-PRD (ADJD ledich)
(PP *ICH*-1)) ← trace
(VVFIN was)
(PP-1 (APPR van) ← moved constituent
(NP (ADJA guden)
(NA daden)))
)
- NB: *ICH* is inserted as early as possible in the source constituent if the movement is upward, and as late as possible in the source constituent if the movement is downward.
- NB: Movements of these kinds are only represented if the movement in question takes the moved constituent out of its source constituent.
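Since both *T* and *ICH* traces are always co-indexed, a trace whose index appears on no node label usually signals an annotation error. A minimal Python sketch (hypothetical function name); it only checks that some label in the token carries the trace's index, not that the antecedent sits in the right position:

```python
from __future__ import annotations

import re

TRACE_RE = re.compile(r"\*(T|ICH)\*-(\d+)")  # e.g. *T*-1, *ICH*-1
LABEL_RE = re.compile(r"\(([^\s()]+)")       # node labels
INDEX_RE = re.compile(r"-(\d+)$")            # trailing co-index on a label


def dangling_traces(token: str) -> list[str]:
    """Return traces whose index is carried by no node label in the token."""
    indexed = set()
    for label in LABEL_RE.findall(token):
        m = INDEX_RE.search(label)
        if m:
            indexed.add(m.group(1))
    return [f"*{kind}*-{n}" for kind, n in TRACE_RE.findall(token) if n not in indexed]


ok = """(IP-SUB (NP-SBJ (PPER se)) (ADJP-PRD (ADJD ledich) (PP *ICH*-1))
         (VVFIN was) (PP-1 (APPR van) (NP (ADJA guden) (NA daden))))"""
print(dangling_traces(ok))  # []
```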
Empty complementisers (C 0)
Empty complementisers (C 0) are used in a more restricted way than in the general Penn scheme; see The CP layer in section 2.
Empty conjunctions (KON 0)
Departing from the standard Penn policy on null elements, we insert a null conjunction (KON 0) in contexts where it is motivated, e.g.:
- in lists, where the last two elements are overtly conjoined:
(NP-PRN (NE (NE Abram)
(KON 0) ← null conjunction
(NE Machor)
(KON vnde)
(NE Aram))
)
Empty resumptives (*-RSP 0)
This is another departure from the standard Penn policy on null elements (see above); the contexts are described under Insertion of null resumptives.
Expletive constructions
As is standard for Penn-style treebanks, the expletive (whether overt or null) is contained within an NP-SBJ. Note that this is not necessarily a statement on the status of the expletive as a subject.
Overt expletives
Note that overt expletives do not have a special POS tag in the corpus: they are tagged as ordinary demonstrative pronouns (DPDS).
Overt expletives occur in four main contexts:
- With an extraposed clausal subject; expletive is coindexed with the clausal subject.
(IP-MAT (NP-SBJ-1 (DPDS Dit)) ← expletive
(VVFIN is)
(NP-PRD (DDARTA de)
(ADJA erste)
(NA punt)
(PP (APPR van)
(NP (NA viuen))))
(CP-THT-1 (KOUS) ← clausal subject
(IP-SUB (NP-SBJ (PPER wi))
(ADVP (ADJV lancge))
(VVFIN menen)
...))
)
- With an extraposed clausal object; expletive is coindexed with the clausal object.
(IP-MAT (NP-SBJ *con*)
(KON vnde)
(VVFIN betugede)
(NP-OB1-1 (PPER et)) ← expletive
(CP-THT-1 (KOUS wente) ← clausal object
(IP-SUB (NP-SBJ (DPDS dit))
(VVFIN is)
(NP-PRD (NP-POS (NA godes))
(NA sone)))))
- In presentational constructions; the expletive is not co-indexed with the postverbal discourse-new referent and so presentational constructions with an overt expletive cannot be isolated automatically.
TODO: example here
- In impersonal constructions; the expletive is not coindexed with anything, and so impersonal constructions with an overt expletive are also not easily identifiable.
TODO: example here
Null expletives
Null expletives (*exp*) are inserted in these same contexts (with the exception of the object-expletive type) where an expletive could in principle appear but is not attested. The same coindexing rules apply as for overt expletives.
- Null expletive with an extraposed clausal subject:
(IP-SUB (NP-SBJ-1 *exp*) ← null expletive
(ADVP (AVD dar))
(VVPP ghescreuen)
(VAFIN staet)
(CP-THT-1 (KOUS Dat) ← clausal subject
(IP-SUB (NP-SBJ (DPIS men))
(PP (APPR in)
(NP (CARDA vier)
(NA manieren)
(NA sunde)))
(VVFIN begaet)))
)
- Null expletive in presentational constructions:
(IP-MAT (NP-SBJ *exp*) ← null expletive
(ADVP (AVD Vortmer))
(ADVP (AVD so))
(VVFIN sint)
(NP-PRD (CARDA drey) ← discourse-new referent
(NA gheRichte))
(PP (APPR binnen)
(NP (DDARTA der)
(NA stat)))
)
- Null expletive in impersonal constructions:
(IP-MAT (NP-SBJ *exp*) ← null expletive
(KON Vnde)
(PP (ADVP (PAVD hijr))
(PAVAP vp))
(VVFIN staet)
(PP (APPR in)
(NP (DDARTA der)
(NA glosen)))
(VVPP gheschreuen)
)
- Note that no null expletive is inserted in sentences with a clausal object which lack an expletive in the matrix clause:
(IP-MAT (PP (ADVP (PAVKO dar))
(PAVAP von))
(VVFIN screif)
(NP-SBJ (DPIS me))
(CP-THT (KOUS dat) ← clausal object
(IP-SUB (NP-SBJ (PPER he))
(NP-OB1 (DDARTA dem)
(NA himmel))
(VVFIN vphelde)
(PP (ADVP (PAVD dar))
(PAVAP vmme))))
)
Conjunction
Phrase-level conjunction
Phrase-level conjunction is applied in cases where any one of the conjuncts consists of more than one word.
Broadly speaking, phrase-level conjunction is represented using CONJP, a phrasal category headed by KON. The only exception is conjoined matrix clauses with coreferential subjects, which are treated as separate tokens (see above).
- The general structure for phrase-level conjunction is:
(XP1 (XP2 first-conjunct)
(CONJP (KON conjunction) ← CONJP is sister to first conjunct
(XP3 second-conjunct)) ← the second conjunct is sister to the head KON
)
- For third and subsequent conjuncts, this general structure is simply extended:
(XP1 (XP2 first-conjunct)
(CONJP (KON conjunction)
(XP3 second-conjunct))
(CONJP (KON conjunction)
(XP4 third-conjunct))
)
(NP-OB1 (NP (DDARTA dat)
(NA wijf))
(CONJP (KON oft)
(NP (NE eua)))
)
Word-level conjunction
- Single-word conjuncts can simply be conjoined at word level; no CONJP is needed:
(XP (X1 (X2 first-conjunct)
(KON and)
(X3 second-conjunct))
)
(ADJP-PRD (ADJD (ADJD seker)
(KON vnde)
(ADJD ghewijs))
)
(NP-ADT (DPDS (DPDS dit)
(KON of)
(DPDS dat))
)
- Conjunction of finite verbs:
(VVFIN (VVFIN comen)
(KON eder)
(VVFIN quamen)
)
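The choice between word-level and phrase-level conjunction follows directly from the number of words in each conjunct. A minimal Python sketch for the two-conjunct case only (lists with KON 0 are not covered); `conjoin` is a hypothetical helper, and the caller supplies the enclosing label, which should share the category of the first conjunct:

```python
from typing import List, Tuple

Conjunct = Tuple[str, List[Tuple[str, str]]]  # (category, [(POS tag, word), ...])


def leaves(words: List[Tuple[str, str]]) -> str:
    return " ".join(f"({tag} {word})" for tag, word in words)


def conjoin(outer: str, first: Conjunct, second: Conjunct, conj: str) -> str:
    """Bracket two conjuncts following the word-level / phrase-level rule."""
    (cat1, words1), (cat2, words2) = first, second
    if len(words1) == 1 and len(words2) == 1:
        # Word-level: both conjuncts are single words, so no CONJP;
        # the coordination node takes the tag of the first conjunct.
        (tag1, w1), (tag2, w2) = words1[0], words2[0]
        return f"({outer} ({tag1} ({tag1} {w1}) (KON {conj}) ({tag2} {w2})))"
    # Phrase-level: at least one conjunct has more than one word.
    return f"({outer} ({cat1} {leaves(words1)}) (CONJP (KON {conj}) ({cat2} {leaves(words2)})))"


print(conjoin("ADJP-PRD", ("ADJP", [("ADJD", "seker")]),
              ("ADJP", [("ADJD", "ghewijs")]), "vnde"))
# (ADJP-PRD (ADJD (ADJD seker) (KON vnde) (ADJD ghewijs)))
print(conjoin("NP-OB1", ("NP", [("DDARTA", "dat"), ("NA", "wijf")]),
              ("NP", [("NE", "eua")]), "oft"))
# (NP-OB1 (NP (DDARTA dat) (NA wijf)) (CONJP (KON oft) (NP (NE eua))))
```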
Inserting empty conjunctions (KON 0)
Note that an empty conjunction (KON 0) is inserted in certain contexts (see above).
Correlative conjunction
Cases of word-level correlative conjunction are treated as a flat structure:
(NP (NA (KON X)
(NA Y)
(KON Z)
(NA Q))
)
In cases of phrase-level correlative conjunction, the first conjunction is annotated as a sister of the first conjunct. The rest of the phrase is treated as per non-correlative phrase-level conjunction.
(NP-OB1 (KON beide)
(NP (ADJA hillighe)
(NA scrift))
(CONJP (KON vnde)
(NP (ADJA natuerlike)
(NA scrift)))
)
Conjunction with shared pre-modification
For cases of conjunction with shared pre-modifiers, we do not use NX, ADJX etc. as in the general Penn scheme. Instead, we follow these guidelines:
- If word-level conjunction can apply, apply it
TODO: insert example
- Otherwise, attach the pre-modifier to the highest node, and annotate conjunction of a phrase-level category (XP):
(NP-SBJ (DPOSA dine) ← Pre-modifier attached high
(NP (ADJA grote) | ← Structural parallelism
(NA vnere)) |
(CONJP (KON vnde) +-- Phrase-level conjunction
(NP (ADJA bedreuenne) |
(NA schande)))) /
The rule to attach high by default is used in these cases. As illustrated above, this rule can be overridden in the case of structural parallelism between the two conjuncts. *grote* could in principle be treated as a shared pre-modifier, but to maintain parallelism it is attached inside the first conjunct instead.
Conjunction of unlike categories
Sometimes two unlike categories are conjoined. In such cases, the category enclosing the conjunction structure is the same as the category of the first conjunct.
(NP-OB1 (NP (CARDA seuen)
(NA vrowen))
(CONJP (KON vnd)
(PP (APPR bouen)
(NP (XY xx)
(NA sone))))
)
Engelhus
Unlike in the general Penn scheme, we use this policy for both phrase-level and word-level conjunction.
Use of gapping instead of word-level conjunction with separable verb prefixes
In some cases, what is logically word-level conjunction between two verbs cannot be annotated as such because of separable verb prefixes which are annotated as separate words. In such cases, the second (or later) conjunct is annotated as a gap instead:
(IP-MAT-1 (PP (APPR Van)
(NP (DDARTA dessen)
(NA ghelde)))
(VMFIN scholen)
(NP-SBJ (NP (NE Gherd))
(CONJP (KON vnde)
(NP (NA vor)
(NE vredeke)
(ADJN vorghenompt))))
(PTKVZ weder)
(VVINF gheuen)
(IP-MAT-PRN=1 (KON vnde)
(VVINF betalen))
)
Not:
(IP-MAT ...
(PTKVZ weder)
(VVINF (VVINF gheuen)
(KON vnde)
(VVINF betalen))
)
Negation (PTKNEG; AVNEG; DNEG; DPNEGS)
Sentential negation
Sentential negation has the POS tag PTKNEG and is attached at IP level:
(IP-MAT (PP (APPR By)
(NP (PPER my)
(PTKN seluen)))
(VVFIN vermach)
(NP-SBJ (PPER ic))
(PTKNEG nicht) ← sentential negation
)
Phrase-level negation
Phrase-level negation can have a range of POS tags.
- Negative adverbs have the POS tag AVNEG and project an ADVP:
(IP-MAT (KON Vnde)
(NP-SBJ (PPER hey))
(PTKNEG ne) ← sentential negation
(VMFIN sal)
(ADVP (AVNEG nummer) ← negative adverb
(AVD mer))
(PP (APPR in)
(NP (DDARTA den)
(NA Rayd)))
(VVINF komen)
)
(IP-MAT-SPE (NP-SBJ (PPER Ick))
(VAFIN hebbe)
(NP-OB2 (PPER my))
(ADVP (AVNEG nye)) ← negative adverb
...
)
- Negative determiners have the POS tag DNEG* and are contained within an NP:
(IP-MAT (NP-OB1 (DNEGA Nene) ← negative determiner
(NA kopenschap))
(VMFIN sulle)
(NP-SBJ (PPER ghi))
(VVINF hantieren)
)
- Negative pronouns have the POS tag DPNEGS and are contained within an NP:
(IP-MAT (NP-OB1 (DPNEGS Nemant)) ← negative pronoun
(PTKNEG en) ← sentential negation
(VMFIN sulle)
(NP-SBJ (PPER ghi))
(VVINF doetslaen)
)
Possessives
Possessor NPs are treated as complements of N:
(NP-OB1 (DDARTA den)
(NA namen)
(NP-POS (NA godes)) ← possessor NP
)
Possessive pronouns are treated as modifiers and do not project a phrase:
(PP (APPR in)
(NP (DPOSA iuwen) ← possessive pronoun
(NA munt))
)
Separable verb prefixes
These have the tag PTKVZ and attach at IP level as sisters of their verb:
(IP-MAT (NP-SBJ (DDARTA De)
(NA viant))
(VVFIN ghift)
(PTKVZ wt)
(NP-OB1 (DDARTA den)
(ADJA eersten)
(NA raet))
)
Punctuation
Punctuation is given the POS tag $;
The general principle is to attach punctuation as high as is reasonable.
2. Clauses
Clause types
Any complete finite clause (IP-MAT, IP-SUB) must contain at least:
- a finite verb
- a subject
All finite subordinate clauses are labelled as a CP, and every CP is labelled for type (see below) and contains an IP-SUB.
Ambiguous clauses (IP-X)
For finite clauses introduced by wente which are ambiguous and cannot be labelled as either matrix (IP-MAT) or subordinate (IP-SUB), we use a novel label, IP-X.
This applies to wente-clauses which are unambiguously V2:
TODO: parsed example, e.g. wente dit is godes sone
This also applies to wente-clauses where the verb position is hard to diagnose, since there are only two constituents in the clause:
TODO: parsed example, e.g. wente he kam
Note: wente-clauses which are clearly verb-final
TODO: parsed example
(see LREC paper for more details).
Infinitival clauses (IP-INF)
- are headed by a verb in its infinitive form (VVINF)
- do not necessarily require a subject, though they may have one
Imperative clauses (IP-IMP)
- are headed by a verb in its imperative form (VVIMP)
- do not require a subject
Clause extended labels
For IPs:
- -MAT: matrix clause
- -SUB: subordinate clause
- -INF: infinitival clause
- -IMP: imperative clause
For CPs:
- -REL: relative clause
- -FRL: free relative clause
- -THT: that-clause
- -ADV: adverbial clause
- -CMP: comparative clause
- -QUE: interrogative clause
- -DEG: degree clause
Additional functional extended labels which an IP or CP may have (added in this order where there are multiple functional labels):
- -PRN: parenthetical
- -LFD: left-dislocated
- -SPE: direct speech
- -n: an index
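The fixed order of functional labels lends itself to a small label builder. A minimal Python sketch, assuming the clause types and functional labels listed above (plus IP-X from Ambiguous clauses); `clause_label` is a hypothetical helper:

```python
from typing import Iterable, Optional

CLAUSE_TYPES = {
    "IP": {"MAT", "SUB", "INF", "IMP", "X"},  # X: ambiguous wente-clauses (IP-X)
    "CP": {"REL", "FRL", "THT", "ADV", "CMP", "QUE", "DEG"},
}
FUNCTIONAL_ORDER = ("PRN", "LFD", "SPE")  # the index (-n) always comes last


def clause_label(category: str, clause_type: str,
                 functions: Iterable[str] = (), index: Optional[int] = None) -> str:
    """Assemble an IP/CP label as CATEGORY-TYPE[-PRN][-LFD][-SPE][-n]."""
    if clause_type not in CLAUSE_TYPES.get(category, set()):
        raise ValueError(f"unknown clause label {category}-{clause_type}")
    funcs = set(functions)
    unknown = funcs - set(FUNCTIONAL_ORDER)
    if unknown:
        raise ValueError(f"unknown functional label(s): {sorted(unknown)}")
    parts = [category, clause_type] + [f for f in FUNCTIONAL_ORDER if f in funcs]
    if index is not None:
        parts.append(str(index))
    return "-".join(parts)


print(clause_label("CP", "ADV", ["LFD"], index=1))  # CP-ADV-LFD-1
print(clause_label("IP", "MAT", ["SPE", "PRN"]))    # IP-MAT-PRN-SPE
```

Whatever order the functional labels are passed in, the output follows the -PRN, -LFD, -SPE, -n order.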
The CP layer
A CP layer is only postulated in:
- finite subordinate clauses (adverbial, degree, complement etc.)
- clauses with wh-movement (direct and indirect wh-questions, relative clauses)
The CP layer is lexicalized either by a complementizer or by an element in SpecCP (e.g. a wh-phrase or a relative pronoun) or in rare cases by both.
The original Penn annotation scheme calls for a complementizer to be present in all CPs (whether overt or empty). In the CHLG, however, we have chosen not to insert empty elements in C as a general rule. There are only three cases where an empty C is inserted:
- When an empty complementizer alternates with dat, mostly after verba dicendi (C 0)
- V1 conditionals and direct questions (C-V1 0)
- Asyndetic V2 dependent clauses, where the main sign of clause dependence is subjunctive marking on the finite verb (C-SUBJ 0)
When an empty complementizer alternates with dat, mostly after verba dicendi (C 0)
(IP-MAT (NP-SBJ *con*)
(KON vnde)
(VVFIN sede)
(CP-THT (C 0) ← empty complementiser
(IP-SUB (NP-SBJ (PPER se))
(VMFIN wolde)
(NP-TMP-WH (DPWS wat))
(VVINF ruwen))
(CP-ADV (KOUS Also)
(IP-SUB (NP-SBJ (PPER se))
(ADVP-TMP (AVD nu))
(ADJP-PRD (ADJD allene))
(VVFIN was))))
)
‘And [she] said (that) she wished to rest a while, since she was now alone’
V1 conditionals and direct questions (C-V1 0)
(IP-MAT (CP-ADV (C-V1 0) ← empty complementiser
(IP-SUB (VVFIN Wult)
(NP-SBJ (PPER u))
(CP-THT (KOUS dat)
(IP-SUB (NP-SBJ (PPER yck))
(VVFIN sterue)))))
(NP-SBJ (PPER yck))
(VVFIN sterue)
(PP (APPR myt)
(NP (NA willen)))
)
V1 conditional: ‘If you want me to die, I gladly will’
(CP-QUE (C-V1 0) ← empty complementiser
(IP-SUB (VVFIN Meine)
(NP-SBJ (PPER ghi))
(CP-THT (KOUS datt)
...))
)
Direct question: ‘Do you think that...?’
Asyndetic V2 dependent clauses, where the main sign of clause dependence is subjunctive marking on the finite verb (C-SUBJ 0)
(IP-MAT (CP-ADV (WADVP-2 0)
(KOUS wen)
(IP-SUB (ADVP-TMP *T*-2)
(NP-SBJ (PPER ick))
(VVFIN vterkese)))
(CP-ADV (C-SUBJ 0) ← empty complementiser
(IP-SUB (NP-SBJ (PPER se))
(VVFIN sy)
(NP-PRD (NP-POS (NP (NP-POS (DDARTA des)
(NA keisers))
(NA vorsten))
(CONJP (KON edder)
(NP (NA herden))))
(NA dochter))))
(ADVP (AVKO so))
(VVFIN wil)
(NP-SBJ (PPER ick))
(CP-THT (KOUS dat)
(IP-SUB (NP-SBJ (PPER se))
(NP-PRD (DPOSA iuwe)
(ADJA weldige)
(NA vrowe))
(VVFIN sy)))
)
‘When I choose [a bride], whether she is a daughter of a king’s lord or of a shepherd, I wish for her to be your mighty lady’
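Since only these three empty complementisers are permitted, their labels can be checked mechanically. A minimal Python sketch, assuming an empty complementiser always appears as `(LABEL 0)` with a label beginning `C`; the function name is hypothetical, and the context descriptions are paraphrased from the list above, so the check cannot confirm that the syntactic context actually matches:

```python
from __future__ import annotations

import re
from typing import Optional

# The only empty complementisers the guidelines allow, with their contexts.
EMPTY_C = {
    "C": "alternates with dat, mostly after verba dicendi",
    "C-V1": "V1 conditional or direct question",
    "C-SUBJ": "asyndetic V2 dependent clause with subjunctive verb",
}
EMPTY_C_RE = re.compile(r"\((C(?:-[A-Z0-9]+)*)\s+0\)")  # (C 0), (C-V1 0), ...


def empty_complementisers(token: str) -> list[tuple[str, Optional[str]]]:
    """Return (label, context) per empty C; context is None if the label is not allowed."""
    return [(lab, EMPTY_C.get(lab)) for lab in EMPTY_C_RE.findall(token)]


t = "(IP-MAT (CP-ADV (C-V1 0) (IP-SUB (VVFIN Wult) (NP-SBJ (PPER u)))) (NP-SBJ (PPER yck)) (VVFIN sterue))"
print(empty_complementisers(t))  # [('C-V1', 'V1 conditional or direct question')]
```

Labels such as CP-ADV, CONJP or CARDA are not matched, because the pattern requires the label to be exactly C or C followed by hyphenated extensions.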
Adverbial clauses (CP-ADV)
The structure of an adverbial clause (CP-ADV) depends on the type of adverbial subordinator (KOUS) which introduces the clause.
Essentially, three different structures are available (see below for details on each):
- One structure for adverbial clauses introduced by an adverbial subordinator which is formally identical to an ordinary adverb (e.g. also, dar, do, eer, nu, so).
- One structure for adverbial clauses introduced by an adverbial subordinator which is not formally identical to an ordinary adverb (e.g. dat, eft, wanner, want(e)).
- One structure for adverbial clauses introduced by wen.
Adverbial subordinators (KOUS) which project an ADVP
The following adverbial subordinators are formally identical to ordinary adverbs and project an ADVP which attaches at the IP level:
- also
- dar
- dewile
- do
- eer
- nu
- so
In such structures, the adverbial subordinator (KOUS) takes as its complement a CP-ADV which is headed by an empty WADVP. The WADVP is then traced into the finite subordinate clause (IP-SUB).
Some examples:
also
(ADVP (KOUS also)
(CP-ADV (WADVP-1 0)
(IP-SUB (ADVP *T*-1)
(NP-SBJ (DDARTA de)
(NA preesters))
(VVIMP ghebeedet)))
)
dar
(ADVP-LOC (KOUS dar)
(CP-ADV (WADVP-1 0)
(IP-SUB (ADVP-LOC *T*-1)
(NP-SBJ (PPER he))
(VVFIN sprekt)
...))
)
dewile
(ADVP-TMP (KOUS dewile)
(CP-ADV (WADVP-1 0)
(IP-SUB (ADVP-TMP *T*-1)
(NP-SBJ (PPER se))
...))
)
do
(IP-MAT (ADVP-TMP-LFD (KOUS Do)
(CP-ADV (WADVP-1 0)
(IP-SUB (ADVP-TMP *T*-1)
(NP-SBJ (PPER he))
(VVFIN quam))))
(ADVP-TMP-RSP (AVD do))
...
)
eer
(ADVP-TMP (KOUS Eer)
(CP-ADV (WADVP-1 0)
(IP-SUB (ADVP-TMP *T*-1)
(NP-SBJ (PPER ic))
(NP-OB1 (DPDS dit))
(VVFIN vulbrincge)
(PP (APPR to)
(NP (DPOSA dinen)
(NA eren)))))
)
nu
(IP-MAT (ADVP-TMP (KOUS Nu)
(CP-ADV (WADVP-1 0)
(IP-SUB (ADVP-TMP *T*-1)
(NP-SBJ (PPER ik))
(NP-OB1 (DPDS den))
(PTKNEG nicht)
(VVINF vorkamen)
(VMFIN mach))))
(ADVP (AVKO So))
...
)
so
(ADVP (KOUS So)
(CP-ADV (WADVP-1 0)
(IP-SUB (ADVP *T*-1)
(NP-OB2 (PPER my))
...))
)
Adverbial subordinators (KOUS) which head a CP-ADV
A small set of adverbial subordinators (KOUS), namely those which do not have formally identical ordinary adverb counterparts, are treated as overt complementisers, i.e. they head the CP rather than projecting an ADVP of which the CP-ADV is a daughter. This applies specifically to the following adverbial subordinators:
- dat
- eft/of
- wanner
- want(e)
- woldat
dat
(IP-MAT (KON Mer)
(NP-SBJ (PPER se))
(VVFIN maket)
(NP-OB1 (DDARTA den)
(NA menschen))
(CP-ADV (KOUS dat)
(IP-SUB (NP-SBJ (PPER he))
(VVFIN wert)
(NP-OB2 (NA gode))
(OA-XXX leet)))
)
effte
(CP-ADV (KOUS effte)
(IP-SUB (NP-SBJ (PPER wi))
(PP (ADVP (PAVKO dar))
(PAVAP umme))
(VVFIN bidden)
(VMFIN mogen))
)
wanneer
(IP-MAT (CP-ADV (KOUS Wanneer)
(IP-SUB (NP-SBJ (PPER ghi))
(PP (APPR mit)
(NP (NA vorsate)))
(NP-OB1 (NA vnrecht))
(VVFIN sweert)))
(NP-SBJ (DPDS Dat))
(VVFIN is)
...
)
want(e)
(CP-ADV (KOUS want)
(IP-SUB (NP-SBJ (PPER he))
(NP-OB2 (PPER em))
(PTKNEG nicht)
(VVFIN vnsaghe)
(PP (APPR in)
(NP (DDARTA dessen)
(NA leuen))))
)
woldat
(CP-ADV (KOUS woldat)
(IP-SUB (NP-SBJ (DDA dusse)
(XY x)
(NA mestere))
(NP-OB1 (DIA vele)
(NP-POS gudes))
(VVFIN (VVFIN makeden)
(KON vnd)
(VVFIN deden)))
)
Engelhus
Adverbial clauses introduced by wen
Adverbial clauses introduced by wen have their own structure.
wen projects a WADVP which heads the CP-ADV. The WADVP is then traced into the finite subordinate clause.
(IP-MAT (CP-ADV (WADVP-1 (AVW Wen))
(IP-SUB (ADVP-TMP *T*-1)
(NP-SBJ (DDARTA de)
(NA here))
(VVFIN spaserde)))
(VVFIN sach)
(NP-SBJ (PPER he))
...
)
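The three structures in this section amount to a lookup from subordinator to bracketing skeleton. A minimal Python sketch: the word lists are copied from this section (including the spellings effte and wanneer used in the examples), the function name is hypothetical, and the skeletons mostly omit the function labels (-TMP, -LOC, ...) which annotators add case by case:

```python
# Subordinators formally identical to ordinary adverbs: KOUS projects an ADVP.
ADVP_TYPE = {"also", "dar", "dewile", "do", "eer", "nu", "so"}
# Subordinators without an adverb counterpart: KOUS heads the CP-ADV.
CP_HEAD_TYPE = {"dat", "eft", "effte", "of", "wanner", "wanneer", "want", "wante", "woldat"}


def adverbial_clause_skeleton(subordinator: str) -> str:
    """Return a bracketing skeleton for a CP-ADV introduced by `subordinator`."""
    key = subordinator.lower()
    if key in ADVP_TYPE:
        # KOUS projects an ADVP; an empty WADVP is traced into the IP-SUB.
        return (f"(ADVP (KOUS {subordinator}) (CP-ADV (WADVP-1 0) "
                "(IP-SUB (ADVP *T*-1) ...)))")
    if key in CP_HEAD_TYPE:
        # KOUS is an overt complementiser heading the CP-ADV.
        return f"(CP-ADV (KOUS {subordinator}) (IP-SUB ...))"
    if key == "wen":
        # wen itself projects the WADVP, which is traced into the IP-SUB.
        return (f"(CP-ADV (WADVP-1 (AVW {subordinator})) "
                "(IP-SUB (ADVP-TMP *T*-1) ...))")
    raise ValueError(f"no structure listed for {subordinator!r}")


print(adverbial_clause_skeleton("want"))  # (CP-ADV (KOUS want) (IP-SUB ...))
```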
That-clauses (CP-THT)
Bare CP-THTs headed by (KOUS dat)
Bare CP-THTs are headed by a (KOUS dat) which takes an IP-SUB as its complement:
(IP-MAT (PP (ADVP (PAVKO dar))
(PAVAP umm))
(VVFIN wil)
(NP-SBJ (PPER ik))
(CP-THT (KOUS dat) ← KOUS as head of the CP-THT
(IP-SUB (NP-OB2 (PPER my)) ← IP-SUB as complement of KOUS
(NP-SBJ (DDARTA dat)
(DPIS ander))
(ADJP-PRD (ADJD nutte))
(VVFIN si)))
)
Bare CP-THTs where (KOUS dat) is absent
When dat is absent, a null complementizer (C 0) is inserted (see also above):
(IP-MAT (NP-SBJ *con*)
(KON vnde)
(VVFIN sede)
(CP-THT (C 0)
(IP-SUB (NP-SBJ (PPER se))
(VMFIN wolde)
(NP-TMP-WH (DPWS wat))
(VVINF ruwen))
(CP-ADV (KOUS Also)
(IP-SUB (NP-SBJ (PPER se))
(ADVP-TMP (AVD nu))
(ADJP-PRD (ADJD allene))
(VVFIN was))))
)
‘And [she] said (that) she wished to rest a while, since she was now alone’
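An overt dat and the null complementiser occupy the same position, so one query can recover the head of every CP-THT. The following is a minimal sketch under the same assumptions as before: plain bracketing input, with parse and tht_heads as hypothetical helpers. It uses simplified versions of the two examples above.

```python
import re

def parse(text):
    """Parse one labelled bracketing into (label, children); leaves are plain strings."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                        # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                        # consume ")"
        return (label, children)

    return node()

def walk(tree):
    """Yield every phrasal node in depth-first order."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from walk(child)

def tht_heads(tree):
    """Hypothetical helper: head of every CP-THT, i.e. the KOUS form or '0' for a null C."""
    heads = []
    for label, children in walk(tree):
        if label.startswith("CP-THT") and isinstance(children[0], tuple):
            head_label, head_children = children[0]
            if head_label in ("KOUS", "C"):
                heads.append(head_children[0])
    return heads

with_dat = parse("""
(IP-MAT (PP (ADVP (PAVKO dar)) (PAVAP umm)) (VVFIN wil) (NP-SBJ (PPER ik))
        (CP-THT (KOUS dat)
                (IP-SUB (NP-OB2 (PPER my)) (NP-SBJ (DDARTA dat) (DPIS ander))
                        (ADJP-PRD (ADJD nutte)) (VVFIN si))))
""")
without_dat = parse("""
(IP-MAT (NP-SBJ *con*) (KON vnde) (VVFIN sede)
        (CP-THT (C 0)
                (IP-SUB (NP-SBJ (PPER se)) (VMFIN wolde)
                        (NP-TMP-WH (DPWS wat)) (VVINF ruwen))))
""")
print(tht_heads(with_dat), tht_heads(without_dat))   # ['dat'] ['0']
```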
CP-THTs introduced by a preposition
When the CP-THT is introduced by a preposition, it sits within a PP headed by that preposition:
(IP-MAT (PP (APPR vppe)
(CP-THT (KOUS dat)
(IP-SUB (NP-OB2 (PPER di))
(NP-SBJ (DPDS dat))
(ADVP (AVD noch))
(VMFIN mochte)
(VVINF bescheen))))
(ADVP (AVKO so))
...
)
(PP (ADVP (PAVKO Dar))
(PAVAP umme)
(CP-THT (KOUS dath)
(IP-SUB (NP-SBJ (PPER yck))
(NP-OB1 (PPER dy))
(VAFIN hebbe)
(VVPP ghenomenn)))
)
Degree complements (CP-DEG)
(ADJP-PRD (PTKA so)
(ADJD hochtidlik)
(CP-DEG (WADVP-1 0)
(KOUS dat)
(IP-SUB (ADVP *T*-1)
(NP-SBJ (DDARTA+NA desgeliken))
(ADVP-TMP (ADJV vor))
(PTKNEG nicht)
(VVPP geseen)
(VAFIN was)))
)
(IP-MAT (NP-SBJ (DDARTA De)
(NA ghiricheit))
(VVFIN is)
(NP-PRD (DIARTA een)
(NA ouel)
(RRC (PTKA also)
(VVPP gheraket)
(CP-DEG (KOUS Dat)
(IP-SUB (NP-SBJ (PPER se))
...))))
)
Comparative clauses (CP-CMP)
Virtually all CP-CMPs should be sisters of a comparative head, e.g.:
- also
- gelik (‘like’)
(PP (KOKOM also)
(CP-CMP (WADVP-1 0)
(IP-SUB (ADVP *T*-1)
(NP-SBJ (PPER wy))
(ADVP-TMP (AVD vore))
(VAFIN hadden)
(VVPP ghedan)))
)
Clauses which could plausibly qualify as a CP-CMP but which lack a comparative head should be annotated as a CP-ADV.
TODO: insert example here
Correlative comparative clauses
(IP-MAT (ADVP (PTKA also)
(ADJV dicke)
(CP-CMP (WADVP-1 0)
(IP-SUB (ADVP *T*-1)
(NP-SBJ (PPER se))
(PP (ADVP (PAVKO dar))
(PAVAP mede))
(VVPP besen)
(VAFIN wert))))
(ADVP (PTKA also)
(ADJV dicke))
(VMFIN schal)
(NP-SBJ (PPER se))
(NP-OB1 (DIARTA ene)
(NA discipliuen))
(VVINF vntfan)
)
Direct and indirect questions (CP-QUE)
Both direct and indirect questions are annotated as a CP-QUE which immediately dominates an IP-SUB.
In direct yes/no-questions, the CP-QUE is headed by an empty complementiser labelled C-V1 (see above):
(CP-QUE (C-V1 0)
(IP-SUB (VVFIN Meine)
(NP-SBJ (PPER ghi))
(CP-THT (KOUS datt)
...))
)
Direct question: ‘Do you think that ...?’
In direct wh-questions, the wh-phrase is traced into the IP-SUB to which it belongs:
(CP-QUE (WADVP-1 (AVW Wor)) ← wh-phrase
(IP-SUB (ADVP-DIR *T*-1) ← trace of wh-phrase in IP-SUB
(VMFIN wil)
(NP-SBJ (DPDS desse))
(VVINF gan)
...)
)
In indirect questions, the wh-phrase is likewise traced into the IP-SUB to which it belongs:
(IP-MAT (KON Sunder)
(NP-SBJ (DPIS eteswelke))
(VVFIN spreken)
(CP-QUE (WADVP-1 (AVW wo)) ← wh-phrase
(IP-SUB (ADVP *T*-1) ← trace of wh-phrase in IP-SUB
(VAFIN is)
(NP-SBJ (NE christus))
(VVPP gecomen)
(PP (APPR uan)
(NP (NE galilea)))))
)
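Direct yes/no-questions are headed by C-V1, and wh-questions begin with a fronted W*P. The first daughter of a CP-QUE is therefore enough to tell the two apart. The hypothetical helper question_type below encodes only these two documented patterns and returns 'other' for anything else. The parser is our own minimal assumption.

```python
import re

def parse(text):
    """Parse one labelled bracketing into (label, children); leaves are plain strings."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                        # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                        # consume ")"
        return (label, children)

    return node()

WH_PHRASES = {"WADJP", "WADVP", "WNP", "WPP"}

def question_type(cp_que):
    """Hypothetical helper: classify a CP-QUE by its first daughter."""
    label, children = cp_que
    if not label.startswith("CP-QUE"):
        raise ValueError(f"expected a CP-QUE, got {label}")
    first = children[0][0]
    if first == "C-V1":
        return "yes/no"                 # empty complementiser of a direct yes/no-question
    if first.split("-")[0] in WH_PHRASES:
        return "wh"                     # fronted wh-phrase, coindexed with a *T* trace
    return "other"

yes_no = parse("(CP-QUE (C-V1 0) (IP-SUB (VVFIN Meine) (NP-SBJ (PPER ghi))))")
wh = parse("(CP-QUE (WADVP-1 (AVW wo)) (IP-SUB (ADVP *T*-1) (VAFIN is)"
           " (NP-SBJ (NE christus)) (VVPP gecomen)))")
print(question_type(yes_no), question_type(wh))   # yes/no wh
```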
V1 conditionals
V1 conditionals are treated as CP-ADVs with an empty complementiser (C-V1 0), which takes an IP-SUB as its complement (see also above):
(IP-MAT (CP-ADV (C-V1 0)
(IP-SUB (VVFIN Wult)
(NP-SBJ (PPER u))
(CP-THT (KOUS dat)
(IP-SUB (NP-SBJ (PPER yck))
(VVFIN sterue)))))
(NP-SBJ (PPER yck))
(VVFIN sterue)
(PP (APPR myt)
(NP (NA willen))))
‘If you want me to die, I gladly will’
Relative clauses (CP-REL)
Relative clauses follow the standard Penn scheme:
(NP-OB1 (DDARTA dat)
(NA got)
(CP-REL (WNP-1 (DPRELS dat)) ← relative pronoun
(IP-SUB (NP-OB1 *T*-1) ← trace in finite subordinate clause
(NP-SBJ (PPER he))
(VVFIN erft)))
)
Duderstadt2
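Every *T* trace should share its index with an antecedent, such as the WNP-1 above. The following sketch lists trace indices that have no matching node. The names parse and unresolved_traces are illustrative and are not corpus tools. The check also covers the *ICH* traces used for discontinuous pronominal adverbs and raised genitives. It only confirms that some coindexed node exists; it does not check that the antecedent is in the correct position.

```python
import re

def parse(text):
    """Parse one labelled bracketing into (label, children); leaves are plain strings."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                        # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                        # consume ")"
        return (label, children)

    return node()

def walk(tree):
    """Yield every phrasal node in depth-first order."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from walk(child)

def unresolved_traces(tree):
    """Hypothetical helper: indices of *T*-n / *ICH*-n traces with no node labelled X-n."""
    antecedents, traces = set(), set()
    for label, children in walk(tree):
        index = re.search(r"-(\d+)$", label)          # e.g. WNP-1, PAVREL-2
        if index:
            antecedents.add(index.group(1))
        for child in children:
            if isinstance(child, str):
                trace = re.fullmatch(r"\*(?:T|ICH)\*-(\d+)", child)
                if trace:
                    traces.add(trace.group(1))
    return sorted(traces - antecedents)

relative = parse("""
(NP-OB1 (DDARTA dat) (NA got)
        (CP-REL (WNP-1 (DPRELS dat))
                (IP-SUB (NP-OB1 *T*-1) (NP-SBJ (PPER he)) (VVFIN erft))))
""")
dangling = parse("(CP-ADV (WADVP 0) (IP-SUB (ADVP-TMP *T*-1) (NP-SBJ (PPER se))))")
print(unresolved_traces(relative), unresolved_traces(dangling))   # [] ['1']
```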
Relative clauses introduced by pronominal adverbs
See also treatment of pronominal adverbs elsewhere.
(CP-REL (WPP-1 (WADVP (PAVREL Dar))
(PAVAP na))
(IP-SUB (PP *T*-1)
...)
)
- Relative clauses can also be introduced by pronominal adverbs which are discontinuous:
(NP-SBJ (DDARTA de)
(NA (NA busse)
(KON vnde)
(NA budel))
(CP-REL (WADVP-3 (PAVREL-2 dar))
(IP-SUB (ADVP *T*-3)
(NP-SBJ (NP-POS (NA geodes))
(NA licham))
(PP (PAVREL *ICH*-2)
(PAVAP ynne))
(VVFIN was)))
)
Free relative clauses (CP-FRL)
CP-FRLs cannot attach directly at the IP level. Instead, the free relative is enclosed in a phrase whose category is identical to that of the gap inside the free relative (e.g. an NP for an NP gap, an ADVP for an ADVP gap).
(IP-MAT (NP-OB1 (DPDS Dat))
(VVFIN do)
(NP-SBJ (CP-FRL (WNP-1 (DPRELS we)) ← free relative within NP-SBJ
(IP-SUB (NP-SBJ *T*-1)
(NP-OB1 (PPER et))
(VVFIN wille))))
)
(IP-MAT (ADVP (AVD Aldus))
(VMFIN moeghe)
(NP-SBJ (PPER ghi))
(PP (APPR in)
(NP (DDARTA dessen)
(NA boeke)))
(VVINF soeken)
(NP-OB1 (CP-FRL (WNP-1 (DPWS wat)) ← free relative within NP-OB1
(IP-SUB (NP-SBJ *T*-1) ← NP gap in free relative
(NP-OB2 (PPER iv))
(ADVP (ADJV best))
(VVFIN gadet))))
)
(IP-MAT (NP-OB1 (DPDS dat))
(VMFIN willen)
(NP-SBJ (PPER sze))
(VVINF vorschulden)
(ADVP (CP-FRL (WADVP-1 (AVREL wor)) ← free relative within ADVP
(IP-SUB (ADVP *T*-1) ← gap in free relative
(NP-SBJ (PPER sze))
(VMFIN (VMFIN konen)
(KON vnd)
(VMFIN moghen)))))
)
(Greifswald)
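The ban on attaching a CP-FRL directly under an IP can be checked mechanically. The following is a minimal sketch, and free_relatives_at_ip is a hypothetical helper. It rests on one assumption: an indexed CP-FRL such as CP-FRL-1 is acceptable under an IP. Such a node has been extracted and is coindexed with an *ICH* trace, as in the predicative genitive example under Noun phrases.

```python
import re

def parse(text):
    """Parse one labelled bracketing into (label, children); leaves are plain strings."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                        # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                        # consume ")"
        return (label, children)

    return node()

def walk(tree):
    """Yield every phrasal node in depth-first order."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from walk(child)

def free_relatives_at_ip(tree):
    """Hypothetical helper: unindexed CP-FRL nodes that are immediate daughters of an IP."""
    offending = []
    for label, children in walk(tree):
        if label.startswith("IP"):
            for child in children:
                # Exact match: an indexed CP-FRL-n is assumed to be extracted and licensed.
                if isinstance(child, tuple) and child[0] == "CP-FRL":
                    offending.append(child)
    return offending

tree = parse("""
(IP-MAT (NP-OB1 (DPDS Dat)) (VVFIN do)
        (NP-SBJ (CP-FRL (WNP-1 (DPRELS we))
                        (IP-SUB (NP-SBJ *T*-1) (NP-OB1 (PPER et)) (VVFIN wille)))))
""")
print(free_relatives_at_ip(tree))   # []
```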
Reduced relative clauses (RRC)
RRCs almost always immediately follow their antecedent. They contain no operator or gap of their own:
(PP (APPR Jn)
(NP (DIARTA enen)
(NA boke)
(RRC (VVPP ghenoemet)
(PP (APPR van)
(NP (DDARTA den)
(ADJA ouersten)
(NA gude)))))
)
Note that an RRC is not used when there is just a single post-nominal modifier, e.g.:
(NP-OB2 (NE hans)
(NE wickendorpe)
(ADJN erscreuen) ← post-nominal modifier
)
Infinitival clauses (IP-INF)
Not all infinitives (tagged VVINF) project an IP-INF.
Constructions where no IP-INF is projected:
- Modal auxiliaries + inf
(IP-MAT (KON Vnde)
(NP-SBJ (PPER et))
(VMFIN mach) ← modal auxiliary
(ADVP (AVD wal))
(VVINF heten) ← infinitive
)
Constructions where an IP-INF is projected:
- beghinnen + to inf
(IP-MAT (KON Vnde)
(NP-SBJ *con*)
(VVFIN beghinnen)
(IP-INF (PTKZU to)
(VVINF dweelen))
...
)
- laten + inf
(IP-MAT (NP-SBJ (PPER He))
(VVFIN leyt)
(IP-INF (VVINF (VVINF buwen)
(KON vnd)
(VVINF betern))
(NP-OB1 (NP (DDARTA den)
(NA tempel))
(CONJP (KON 0)
...)))
)
Engelhus
- menen + to inf
(IP-SUB (NP-SBJ (PPER he))
(VVFIN mende)
(IP-INF (NP-OB1 (PRF sick))
(PTKZU to)
(VVINF verhoghen))
)
- plegen + to inf
(IP-MAT (PP (APPR ouer)
(NP (CARDA hundert)
(NA iaren)))
(VAFIN plach)
(NP-SBJ (DPIS men))
(IP-INF (NP-OB1 (PPER se))
(APPR to)
(NA done))
)
Inflected infinitives
The same structure applies to inflected infinitives, which are tagged not as VVINF but as a nominal category (NA). In such cases, to is tagged not as a particle (PTKZU) but as a preposition (APPR):
(IP-MAT (KON Wante)
(PP (APPR ouer)
(NP (CARDA hundert)
(NA iaren)))
(VAFIN plach)
(NP-SBJ (DPIS men))
(IP-INF (NP-OB1 (PPER se))
(APPR to)
(NA done))
)
Imperative clauses (IP-IMP)
Imperative clauses are labelled IP-IMP rather than IP-MAT:
(IP-IMP (NP-OB1 (NA (NA Vader)
(KON vnde)
(NA moder)))
(VVIMP sulle)
(NP-SBJ (PPER ghi))
(VVINF eeren)
)
- They must include a verb in imperative form (VVIMP).
- Unlike IP-MATs and IP-SUBs, they do not require a subject.
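These two properties translate into a simple check: a VVIMP daughter is required, but a missing NP-SBJ is not an error. The following is a minimal sketch with a hypothetical helper imperative_problems. It assumes plain bracketing input and flat clauses, so the verb is an immediate daughter of the IP.

```python
import re

def parse(text):
    """Parse one labelled bracketing into (label, children); leaves are plain strings."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                        # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                        # consume ")"
        return (label, children)

    return node()

def imperative_problems(ip):
    """Hypothetical helper: rule violations for an imperative clause ([] means it passes)."""
    label, children = ip
    problems = []
    if not label.startswith("IP-IMP"):
        problems.append(f"label is {label}, not IP-IMP")
    daughters = [child[0] for child in children if isinstance(child, tuple)]
    if "VVIMP" not in daughters:
        problems.append("no VVIMP daughter")
    # Deliberately no NP-SBJ check: unlike IP-MAT and IP-SUB, IP-IMP may lack a subject.
    return problems

print(imperative_problems(parse("(IP-IMP (VVIMP help) (NP-OB1 (PPER my)))")))   # []
```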
3. Phrases
Noun phrases (NP)
Extended labels
Argument noun phrases must have one of the following extended labels:
- -SBJ: subject
- -OB1: direct or only object
- -OB2: indirect or second object
- -PRD: subject complement with certain verbs (e.g. wēsen, wērden, blīven, hēten); also used for ADJPs
- -SMC: object complement with certain verbs (e.g. maken, nomen); also used for ADJPs
Non-argument noun phrases can have one of the following extended labels:
- -ADT: adjunct (also used for ADJPs)
- -LOC: locative
- -TMP: temporal
- -VOC: vocative
- -POS: possessor
- -COM: bare noun phrase complement of a noun or adjective
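The two lists can be encoded as lookup sets, for example to group NPs by function in a corpus search. In this sketch, np_function and its return values are our own. Numeric movement indices such as the -3 in NP-POS-3 are ignored. Labels that carry none of the listed tags are returned as 'other'. This covers a bare NP inside a PP, as well as NPs with tags such as -PRN and -MSR.

```python
# Extended labels copied from the two lists above.
ARGUMENT_TAGS = {"SBJ", "OB1", "OB2", "PRD", "SMC"}
NON_ARGUMENT_TAGS = {"ADT", "LOC", "TMP", "VOC", "POS", "COM"}

def np_function(label):
    """Hypothetical helper: classify an NP label as 'argument', 'non-argument' or 'other'."""
    base, *extensions = label.split("-")
    if base != "NP":
        raise ValueError(f"not an NP label: {label}")
    tags = {tag for tag in extensions if not tag.isdigit()}   # drop indices like -3
    if tags & ARGUMENT_TAGS:
        return "argument"
    if tags & NON_ARGUMENT_TAGS:
        return "non-argument"
    return "other"

print([np_function(l) for l in ["NP-SBJ", "NP-POS-3", "NP"]])
# ['argument', 'non-argument', 'other']
```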
Nominal compounds
- In noun-noun compounds, both components are tagged as nouns (NA) and are treated as sisters of each other.
TODO: example here
(NP-SBJ (NA somer)
(NA tyme)
)
Proper nouns
Proper nouns are tagged as NE:
(NP (NE Gwido) ← personal name
(PP (APPR von)
(NP (DDARTA den)
(NA sulen)))
)
(NP-PRN (DDARTA de)
(NA hertoge)
(PP (APPR von)
(NP (NE bruneswich))) ← place name
)
Duderstadt 1
- Note that titles are tagged as ordinary nouns (NA) or adjectives (ADJ*) and are treated as sisters of the proper name:
(NP-SBJ (NA Mester)
(NE tulius)
)
(NP-SBJ (ADJA Sinte)
(NE Lucas)
)
Adjectival nouns
There is no special treatment for adjectival nouns; they are simply treated as ordinary standalone adjectives (ADJS):
(NP-SBJ (DDARTA de)
(ADJS besten)
)
(NP-OB1 (DDARTA dat)
(ADJS beste)
)
Reflexive pronouns
Forms of the reflexive pronoun sik are tagged as PRF:
(IP-MAT (NP-SBJ (PPER He))
(VVFIN maket)
(NP-OB1 (PRF sick))
(ADJP-PRD (ADJD runt)
(PP (KOKOM also)
(NP (DIARTA een)
(NA kloet))))
)
Nominal predicates
Nominal predicates take one of two labels:
- NP-PRD
- NP-SMC
NP-PRD
Use of NP-PRD broadly follows the YCOE guidelines.
NP-PRD is restricted to subject complements of the following verbs:
- wēsen ‘be’
- wērden ‘become’
- blīven ‘become’
- hēten ‘be called’
- dunken ‘seem’
- sprēken ‘mean, equate to’
- kosten ‘cost’
- wegen ‘weigh’
(IP-MAT-SPE (NP-SBJ (PPER Du))
(VVFIN bist) ← BE
(NP-PRD (DIARTA eyn) ← subject complement
(NA mester)
(PP (APPR to)
(NP (NE israhel))))
)
(IP-MAT ...
(ADVP (AVKO so))
(VVFIN wert) ← BECOME
(NP-SBJ (NP-POS (NE Janikels))
(NA dochtersone))
(NP-PRD (DPOSA vnse) ← subject complement
(NA here))
)
(IP-MAT (NP-SBJ (DPOSA sin)
(NA sone))
(VVFIN bleff) ← BECOME
(NP-PRD (DIARTA eyn) ← subject complement
(NA arue)
(NP-POS (DDARTA des)
(NA landes)))
)
(IP-MAT (KON Vnde)
(NP-SBJ (PPER et))
(VMFIN mach)
(ADVP (AVD wal))
(VVINF heten) ← BE CALLED
(NP-PRD (NA spieghel) ← subject complement
(NP-POS (DDARTA der)
(NA leyen)))
)
(IP-MAT (KON vnd)
(NP-SBJ *con*)
(VVFIN heyt)
(NP-OB1 (PPER ot))
(NP-SMC (NE Spartagus))
)
(IP-MAT (NP-SBJ (DPDS dat))
(VVFIN sprekt) ← MEAN
(NP-PRD (NA hundeken)) ← subject complement
)
(IP-MAT (NP-SBJ (PPER Ot))
(VMFIN moste)
(VVINF kosten) ← COST
(NP-PRD (CARDA hundert) ← subject complement
(NA mark))
)
(CP-ADV (KOUS dat)
(IP-SUB (NP-SBJ (DDARTA de)
(NA tunne))
(VMFIN schall)
(VVINF weghen) ← WEIGH
(NP-PRD (CARDA verteyn) ← subject complement
(NA lispunt)))
)
N.B. This also applies to these verbs in their participial forms (perfect and passive):
(IP-SUB (NP-SBJ (PPER ick))
(NP-PRD (NP-POS (DDA alsodanen)) ← subject complement
(NA mans)
(ADJA eghentlike)
(NA brud))
(VAFIN sy)
(VVPP ghewesen) ← BE (passive)
)
(IP-MAT (KON vnde)
(NP-SBJ (DDARTA dat)
(NA wort))
(VAFIN is)
(NP-PRD (NA ulesch)) ← subject complement
(VVPP gheworden) ← BECOME (passive)
)
(CONJP (KON vnde)
(IP-SUB (NP-SBJ *T*-1)
(VVFIN is)
(PP (ADVP (PAVD dar))
(PAVAP vmme))
(VVPP gheheten) ← BE CALLED (passive)
(NP-PRD (DIARTA een) ← subject complement
(NA spieghel)
(NP-POS (DDARTA der)
(NA leyen))))
)
NP-SMC
NP-SMC is used for small-clause nominals construed with the object. We have generally replaced the IP-SMC category of the PPCHE with this label. This reduces the number of long-distance dependencies that arise from syntactic transformations such as passivization and from the freer word order of MLG.
Under this alternative scheme, small-clause complements are generally not treated as constituents. Instead, they are treated in the same way as predicate nominals.
NP-SMC is used for object complements of the following verbs:
- maken ‘make’
(IP-MAT (ADVP (AVKO so))
(VVFIN makest) ← MAKE
(NP-SBJ (PPER du))
(NP-OB1 (PPER dy)
(PTKN suluen))
(NP-SMC (NA gud)) ← object complement
)
- nomen ‘name’
(PP (APPR Jn)
(NP (DIARTA enen)
(NA boeke)
(CP-REL (WNP-1 (DPRELS dat))
(IP-SUB (NP-OB1 *T*-1)
(NP-SBJ (DPIS men))
(VVFIN noemet) ← NAME
(NP-SMC pastorael)))) ← object complement
)
(IP-MAT (PP (APPR Von)
(NP (DDA dussen)
(NE heber)))
(VVFIN nomet) ← NAME
(NP-OB1 (PRF sek))
(NP-SBJ (DDARTA de)
(NA joden))
(NP-SMC (NE Hebreos)) ← object complement
)
- hēten ‘call’
(IP-MAT (ADVP (AVD Jocto))
(PTKNEG ne)
(VVFIN hete) ← CALL
(NP-SBJ (PPER ik))
(NP-OB1 (PPER iv))
(PTKNEG nicht)
(NP-SMC (NA knechte)) ← object complement
)
- wēgen ‘weigh up as’
TODO: example
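hēten occurs in both verb lists, so the choice between NP-PRD and NP-SMC depends on whether the clause also has a direct object. Compare heyt with NP-OB1 ot and NP-SMC Spartagus among the NP-PRD examples. The hypothetical helper below is only a heuristic. Its lemma sets are copied from the lists above and are not exhaustive, and it does not inspect the tree itself.

```python
# Lemma sets copied from the NP-PRD and NP-SMC verb lists (not exhaustive).
PRD_VERBS = {"wēsen", "wērden", "blīven", "hēten", "dunken", "sprēken", "kosten", "wegen"}
SMC_VERBS = {"maken", "nomen", "hēten", "wēgen"}

def predicate_label(lemma, has_ob1):
    """Hypothetical helper: suggest NP-SMC / NP-PRD for a nominal predicate, else None."""
    if has_ob1 and lemma in SMC_VERBS:
        return "NP-SMC"     # object complement, construed with the NP-OB1
    if lemma in PRD_VERBS:
        return "NP-PRD"     # subject complement (also with participial forms)
    return None

print(predicate_label("hēten", has_ob1=True), predicate_label("hēten", has_ob1=False))
# NP-SMC NP-PRD
```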
NP-MSR
NP-MSR is used for measure-phrase modifiers of a noun, adjective or adverb:
(ADJP-PRD (NP-MSR (XY cxxx)
(NA iar))
(ADJD olt)
)
(ADVP (NP-MSR (DIARTA eyn)
(NA veerndel)
(NP-POS (NA Jares)))
(AVD tovoren)
)
Genitive noun phrases at clause-level
Four types of genitive can occur at the clause level:
- Predicative genitives (NP-PRD (NP-POS X))
- Clausal adjuncts (NP-ADT)
- Temporal adverbials (NP-TMP)
- Genitive objects of certain verbs (NP-OB1)
Note that there are no (unmoved) NP-POS daughters of an IP. A genitive may, however, raise out of an NP. In that case it can appear as the daughter of an IP, but it is always coindexed with an *ICH* trace indicating its base position:
TODO: insert example here
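This constraint can be checked directly. An NP-POS that is an immediate daughter of an IP is acceptable only if it carries a movement index. In the predicative genitive example that follows, for instance, NP-POS-3 is coindexed with *ICH*-3. The following is a minimal sketch; stray_possessors is a hypothetical helper.

```python
import re

def parse(text):
    """Parse one labelled bracketing into (label, children); leaves are plain strings."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                        # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                        # consume ")"
        return (label, children)

    return node()

def walk(tree):
    """Yield every phrasal node in depth-first order."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from walk(child)

def stray_possessors(tree):
    """Hypothetical helper: NP-POS nodes directly under an IP without a movement index."""
    stray = []
    for label, children in walk(tree):
        if label.startswith("IP"):
            for child in children:
                # An indexed NP-POS-n has raised and is licensed by a coindexed *ICH*-n.
                if isinstance(child, tuple) and child[0] == "NP-POS":
                    stray.append(child)
    return stray

raised = parse("""
(IP-MAT (NP-POS-3 (DPRELS des)) (VVFIN is)
        (NP-SBJ (NP-POS *ICH*-3) (DDARTA dey) (ADJA derde) (NA deyl)))
""")
print(len(stray_possessors(raised)))   # 0
```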
Predicative genitives (NP-PRD (NP-POS X))
Predicative genitives are tagged as NP-POS inside NP-PRD:
( (IP-MAT (KON vnde)
(CP-FRL-1 (WNP-2 (DPWS wat))
(IP-SUB (NP-OB1 *T*-2)
(NP-SBJ (DDARTA dey)
(NA Rayt))
(NP-ADT (DDARTA des))
(VVFIN nymet)))
(!!ED!! $.$)
(NP-POS-3 (CP-FRL *ICH*-1)
(DPRELS des))
(VVFIN is)
(NP-PRD (NP-POS (DDARTA des) ← predicative genitive
(NA richteres)))
(NP-SBJ (NP-POS *ICH*-3)
(DDARTA dey)
(ADJA derde)
(NA deyl))
($; .)
(!!ED!! $.$)))
"And whatever the council takes [₁ because of this matter], one third [₂ of it] is [₃ the judge's]"
Clausal adjuncts (NP-ADT)
Clausal adjuncts (usually similar in meaning to an English prepositional phrase such as “about…”) are annotated as NP-ADT:
( (IP-MAT (KON vnde)
(CP-FRL-1 (WNP-2 (DPWS wat))
(IP-SUB (NP-OB1 *T*-2)
(NP-SBJ (DDARTA dey)
(NA Rayt))
(NP-ADT (DDARTA des)) ← NP-ADT
(VVFIN nymet)))
(!!ED!! $.$)
(NP-POS-3 (CP-FRL *ICH*-1)
(DPRELS des))
(VVFIN is)
(NP-PRD (NP-POS (DDARTA des)
(NA richteres)))
(NP-SBJ (NP-POS *ICH*-3)
(DDARTA dey)
(ADJA derde)
(NA deyl))
($; .)
(!!ED!! $.$)))
"And whatever the council takes [₁ because of this matter], one third [₂ of it] is [₃ the judge's]"
Temporal adverbials (NP-TMP)
Genitive NPs which function as temporal adverbials at clause level are annotated as NP-TMP:
TODO: example here
Genitive objects of certain verbs (NP-OB1)
Certain verbs in MLG take a genitive argument as their object. This is particularly common when a verb is negated or has an inherent negative meaning, e.g.:
- bekennen, e.g. …dat se des bekende myt ghuden willen (Stralsund)
- beropet, e.g. Swelich man sich sines tvges beropet umber gelt. (Braunschweig)
- bidden, e.g. Biddet he enis echten dinghis… (Duderstadt1)
- danken, e.g. und *con* dankende ome sins spels (Engelhus)
- don, e.g. Swaz de rat tot mit dher stat willen… (Duderstadt1)
- entberen, e.g. he wolde leuer der kindere enbern (Engelhus)
- entgan, e.g. …se mogen is bat entgan mit ires eneshant (Braunschweig)
- gelden, e.g. Swaz so en man eneme gaste ghelden sal… (Duderstadt1)
- hebben, e.g. …ne mach he sines waren nicht hebben (Braunschweig)
- hȍden, e.g. vnd *con* hodde der skap (Engelhus)
- lösen, e.g. Losede vy es oc nicht binnen tuen jaren… (Stralsund)
- los werden, e.g. dat sine borgere des loftes nummer los worden (Engelhus)
- loven, e.g. Swaz man vor dren ratmannen louit… (Duderstadt1)
- melden, e.g. en iunchurowe eder en vrouwe de hemelich iuwes kapitteles melde eynem vromeden personen (Stralsund)
- plegen, e.g. Swelich borghere nenis rechtis wil pleghen… (Duderstadt1)
- vorkopen, e.g. Swaz en man vorloft beneden eneme scillinghe… (Duderstadt1)
- underwinden, e.g. Neman ne mach sich nener inning oder Werkes vnderwinden (Duderstadt1)
- vorgeten, e.g. dar vmme forgot se sir (Engelhus)
- vormögen, e.g. Were ok sake dat wy des nicht en vormochten… (Bremen)
- vorsaken, e.g. Swelich man deme anderen sculdich is und es ime vorsaket… (Duderstadt1)
- vorwinnen, e.g. Wert he is verwunnen met den screimannen na rechte… (Braunschweig)
- warden, e.g. vnde darselves scolen se des warden mit voller macht (Stralsund)
- weigeren, e.g. do Joachim des tins weygerde (Engelhus)
- wunschen, e.g. dat alle lude sins does wunscheden ane eyn olt vrowe (Engelhus)
Complement NPs of nouns and adjectives (NP-COM)
Where a noun or an adjective takes a bare noun phrase complement, the complement is annotated as NP-COM:
(ADJP-SMC (ADJA uol)
(NP-COM (NA (NA gnade)
(KON vnde)
(NA warheit)))
)
Buxtehuder
- TODO: list of nouns/adjectives in MLG which take bare noun phrase complements, e.g. ‘half’ etc.
NP-POS
TODO: constrain precisely for which genitives this is used and for which not.
Adjective phrases (ADJP)
Usually, adjectives do not project an ADJP and are simply sisters of the noun which they modify.
An ADJP is only projected when:
- An adjective is not contained within an NP because it is predicative or extraposed.
- An adjective within an NP is itself modified.
- An adjective within an NP takes a complement.
(IP-MAT (NP-SBJ (PPER he))
(VVFIN is)
(ADJP-PRD (ADJD edel)) ← predicative ADJ projects an ADJP
...
)
(ADJP (AVD seer) ← modifier of ADJ
(ADJA armode)
)
(ADJP (PP (APPR in)
(NP (DIA allen)
(NA dingen)))
(ADJD loflyk)
)
Transitive adjectives
Transitive adjectives head an ADJP and take an NP complement:
(ADJP (NP-COM (DIA alles)
(NA dinges))
(ADJD luckich) ← transitive adjective
)
(ADJP (NP-COM (DDARTA de)
(NA werlt))
(ADJD half) ← transitive adjective
)
Extended tags for ADJP
Note that any ADJP at the clause level should have one of the following dashtags:
- ADJP-PRD
- ADJP-SMC
(See the Noun phrases section for the use of the extended tags -PRD and -SMC.)
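A clause-level check for these dashtags might look as follows. This is a sketch under the same plain-bracketing assumption, and untagged_clausal_adjps is our own name. It accepts only the two dashtags listed above. If ADJP-ADT, which appears among the NP extended labels, is also meant to be allowed at clause level, add ADT to the set.

```python
import re

def parse(text):
    """Parse one labelled bracketing into (label, children); leaves are plain strings."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                        # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                        # consume ")"
        return (label, children)

    return node()

def walk(tree):
    """Yield every phrasal node in depth-first order."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from walk(child)

ALLOWED = {"PRD", "SMC"}

def untagged_clausal_adjps(tree):
    """Hypothetical helper: labels of ADJPs directly under an IP that lack -PRD or -SMC."""
    offending = []
    for label, children in walk(tree):
        if label.startswith("IP"):
            for child in children:
                if isinstance(child, tuple) and child[0].split("-")[0] == "ADJP":
                    if not ALLOWED & set(child[0].split("-")[1:]):
                        offending.append(child[0])
    return offending

print(untagged_clausal_adjps(parse("(IP-MAT (NP-SBJ (PPER he)) (VVFIN is) (ADJP (ADJD edel)))")))
# ['ADJP']
```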
Quantifier phrases (QP)
Usually, quantifiers do not project a QP and are simply sisters of the noun which they modify.
A QP is only projected when:
- A quantifier is not contained within an NP because it is extraposed.
- A quantifier within an NP is itself modified. TODO: insert example of this
(IP-MAT (NP-SBJ (PPER wi)
(QP *ICH*-1))
(VMFIN moten)
(QP-1 (DIA alle)
(PP (APPR van)
(NP (DDARTA den)
(NA gude))))
(VVINF leuen)
)
Number phrases (NUMP)
We follow the Penn scheme in using the label NUMP for multi-word numbers or for numbers modified in some way:
(NUMP (CARDN hundert)
(KON vnde)
(CARDN dre)
(KON Vn)
(CARDN viftich)
)
Buxtehuder
- Note that the internal structure of NUMP is typically flat.
Adverb phrases (ADVP)
Eventually, adverb phrases will usually take one of the following extended labels:
- -LOC: locative
- -DIR: directional
- -TMP: temporal
Usually, ADVPs consist of a single adverb, possibly with a modifier. Accordingly, adjacent and functionally equivalent adverbs are placed in separate ADVPs:
TODO: insert example
Pronominal adverbs
Pronominal adverbs such as dahin and daher are treated as structurally similar to darum, davon etc. (see below). The d-element is tagged as PAVKO.
(IP-MAT (ADVP (ADVP (PAVKO dare))
(AVD hon))
)
Prepositional phrases (PP
)
Prepositions can take the following categories as their complement:
- an NP unmarked for function
- a CP
- an ADVP-LOC
Note that pronominal adverbs (e.g. darumme) are treated as PPs:
(PP (ADVP (PAVKO Dar))
(PAVAP vmme)
)
Pronominal adverbs can also be discontinuous:
- The d-element projects an ADVP, which is treated as having been fronted out of the PP.
- The d-element is traced (with *ICH*) into the PP as the complement of the head preposition (PAVAP).
(IP-MAT (ADVP-1 (PAVD Daer))
(NP-OB2 (PPER vns))
(VAFIN is)
(PP (ADVP *ICH*-1)
(PAVAP van))
(VVPP ghecomen)
(NP-SBJ (ADJA manich)
(NA leet))
)
Double prepositions
MLG has a range of ‘double prepositions’:
- bet an
- bet uppe
- van an
- wente an
- wente to
- wente uppe
In such cases, both prepositions are tagged APPR, are immediately dominated by the PP and are sisters of each other and of the NP complement:
(PP (APPR wente)
(APPR an)
(NP (NP-POS (NA godes))
(NA ghebort))
)
Co-occurring prepositions and adpositions
(PP (APPR bet)
(ADVP (AVD bauen))
(APPO an)
)
‘up to the top’
Wh-phrases (W*P)
There are four types of wh-phrase:
- WADJP
- WADVP
- WNP
- WPP
A wh-phrase is projected whenever we have a wh-word (AVW).
The internal structure of wh-phrases is in principle identical to that of their non-wh counterparts (ADJP, ADVP, NP etc.). However, the content of a wh-phrase can often be 0, indicating an ‘empty’ operator. For 0 wh-phrases, see elsewhere.
WADJP
A WADJP
with overt content is common in embedded interrogatives (CP-QUE
). The ADJP
is traced into the IP-SUB
.
(IP-MAT ...
(CP-QUE (WADJP-1 (AVW wo)
(ADJD swaer))
(IP-SUB (ADJP-PRD *T*-1)
(NP-SBJ (DDARTA de)
(NA sunde))
(PP (APPR vor)
(NP (NA gode)))
(VVFIN weer)))
)
WADVP
A WADVP with overt content is common in embedded interrogatives (CP-QUE). The ADVP is traced into the IP-SUB:
(IP-MAT (NP-SBJ (DPDS Dit))
(VVFIN is)
(CP-QUE (WADVP-1 (AVW waer))
(IP-SUB (ADVP *T*-1)
(NP-SBJ (NE Adam))
(ADVP (AVD erst))
(VVPP ghemaket)
(VAFIN woerd)))
...
)
A WADVP with overt content is also common in relative clauses (CP-REL). Again, the ADVP is traced into the IP-SUB:
(NP-OB1 (DDARTA den)
(NA ortsprunck)
(CP-REL (WADVP-1 (PAVREL dar))
(IP-SUB (NP-SBJ (PPER du))
(PP (ADVP *T*-1)
(PAVAP van))
(VVPP ghekamen)
(VAFIN byst)))
)
wen projects a WADVP when it introduces an adverbial clause (CP-ADV). The ADVP is traced into the IP-SUB:
(IP-MAT (KON vnde)
(NP-SBJ (PPER yck))
(PTKNEG nicht)
(PP (ADVP (PAVD dar))
(PAVAP vp))
(VVPP gedacht)
(VAFIN hadde)
(CP-ADV (WADVP-1 (AVW wen))
(IP-SUB (ADVP-TMP *T*-1)
(NP-SBJ (PPER ik))
(NP-OB1 (PPER id))
(PTKN suluen)
(PTKNEG nicht)
(VVPP geseen)
(VAFIN hadde)))
)
WNP
A WNP with overt content is common in embedded interrogatives (CP-QUE). The WNP is traced into the IP-SUB:
(CP-QUE (WNP-1 (DPWS wath))
(IP-SUB (NP-SBJ *T*-1)
(VVFIN duncket)
(NP-OB2 (PPER dy))
(PP (APPR van)
(NP (DPOSA myner)
(NA brud))))
)
A WNP with overt content commonly occurs in relative clauses (CP-REL). The WNP is traced into the IP-SUB:
(NP-OB1 (DPOSA syne)
(ADJA erste)
(NA brud)
(CP-REL (WNP-2 (DPRELS de))
(IP-SUB (NP-SBJ *T*-2)
(NP-OB2 (PPER eme))
(PTKNEG nicht)
(ADJP-PRD (ADJD edel)
(AVD ghenoch))
(PTKNEG en)
(VVFIN was)))
)
WPP
A WPP with overt content is common in embedded interrogatives (CP-QUE). The WPP is traced into the IP-SUB:
(CP-QUE (WPP-1 (WADVP (PAVW waer))
(PAVAP vmme))
(IP-SUB (PP *T*-1)
(NP-OB2 (PPER vns))
(NP-OB1 (DDARTA den)
(DPOSA syn))
(VVPP verleent)
(VAFIN heft))
)
A WPP with overt content can also occur in relative clauses (CP-REL). The WPP is traced into the IP-SUB:
(CP-REL (WPP-1 (WADVP (PAVREL dar))
(PAVAP mede))
(IP-SUB (PP *T*-1)
(NP-SBJ (PPER ick))
(VVINF kamen)
(VMFIN mochte)
(PP (APPR to)
(NP (DPOSA myner)
(ADJA begerliken)
(NA leue))))
)
4. Special structures by word
alene
As a focus particle, alene is simply tagged as an ordinary ADJV, which projects an ADVP that in turn modifies a noun:
(NP (DDARTA dat)
(NA gelt)
(ADVP (ADJV alene))
)
(al)sō
The word (al)sō is assigned one of a variety of POS-tags, in accordance with the following guidelines:
AVKO
(‘conjunctional adverb’) when it appears at clause level, in which case it projects an ADVP.
(IP-MAT (ADVP (AVKO Also)) ← tagged AVKO, projects ADVP
(VAFIN leth)
(NP-SBJ (PPER he))
(VVINF beropen)
...
)
PTKA
(‘pre-adjectival/adverbial particle’) when it modifies an adjective or adverb.
(IP-MAT (KON Vnde)
(NP-SBJ *con*)
(VVFIN maken)
(NP-OB1 (DDARTA de)
(NA memorie))
(ADJP-SMC (PTKA also) ← tagged PTKA
(ADJD verwoet) ← modified ADJ
(CP-DEG (KOUS Dat)
...))
)
PTKG
(‘generalising particle’) when it modifies a wh-word.
(CP-FRL (WNP-1 (PTKG so) ← tagged PTKG
(DPRELS wat)) ← wh-word
(IP-SUB (NP-OB1 *T*-1)
(NP-SBJ (PPER he))
(VVFIN horet))
)
KOKOM
when it heads a PP containing a term of comparison.
(IP-MAT (KON Vnde)
(NP-SBJ *con*)
(VMFIN scal)
(VVINF dorren)
(PP (KOKOM also) ← tagged KOKOM
(NP (DDARTA de)
(NA winlode)))
)
(PP (KOKOM also) ← tagged KOKOM
(CP-CMP (WADVP-1 0)
(IP-SUB (ADVP *T*-1)
(NP-SBJ (PPER wy))
(ADVP-TMP (AVD vore))
(VAFIN hadden)
(VVPP ghedan)))
)
KOUS
when it introduces a manner adverbial clause.
(IP-IMP-SPE (VVIMP maket)
(ADVP (ADJV recht))
(NP-OB1 (DDARTA den)
(NA wech)
(NP-POS (DDARTA des)
(NA heren)))
(ADVP (KOUS also) ← tagged KOUS
(CP-ADV (WADVP-1 0)
(IP-SUB (ADVP *T*-1)
(NP-SBJ (NE ysayas)
(NP-PRN (DDARTA de)
(NA prophete)))
(VVFIN sprak))))
)
N.B. A CP-ADV may also be headed by alsō when it introduces a reason clause (“as/since she watered the flowers, they grew taller”) or a temporal clause (“as she watered the flowers, she sang”). In these cases, it is assigned the lemma “alsō2” so that the corpus validity checks can distinguish it from the kinds of alsō described above.
amen
amen is tagged as an interjection (INTJ) and attaches at sentence level:
(FRAG (PP (APPR in)
(NP (NP-POS (NA godes))
(NA namen)))
(INTJ amen) ← tagged INTJ
)
bēde
The word bēde is assigned one of a variety of POS-tags, according to the following guidelines:
DIA/DID/DIN
(‘indefinite determiner’) when it functions as a determiner.
(PP (APPR van)
(NP (DIA beiden) ← tagged DIA
(NA saken))
)
(NP-OB1 (PPER vns)
(DIN beiden) ← tagged DIN
)
KON
(‘coordinating conjunction’) when it introduces a correlative conjunction structure.
(NP-OB1 (KON beide) ← tagged KON
(NP (ADJA hillighe)
(NA scrift))
(CONJP (KON vnde)
(NP (ADJA natuerlike)
(NA scrift)))
)
dat
The word dat is given one of various treatments in accordance with the following guidelines:
DPDS
when it is a demonstrative pronoun.
(IP-MAT (NP-SBJ (DPDS dat)) ← tagged DPDS
(VVFIN is)
(ADJP-PRD (ADJD openbaer))
)
DDARTA
when it is a determiner.
(NP-OB1 (DDARTA dat) ← tagged DDARTA
(ADJA ewighe)
(NA leuen)
)
DPRELS
when it functions as a relative pronoun.
(NP-OB1 (DDARTA dat)
(ADJS selue)
(CP-REL (WNP-1 (DPRELS dat)) ← tagged DPRELS
(IP-SUB (NP-SBJ *T*-1)
(PP (APPR in)
(NP (DDARTA der)
(NA werlde)))
(VVFIN is)))
)
KOUS
when it introduces a subordinate clause.
- KOUS dat projects a CP-THT if it introduces a that-complement.
(IP-MAT (NP-SBJ (PPER He))
(VMFIN wolde)
(CP-THT (KOUS dat) ← tagged KOUS
(IP-SUB (NP-SBJ (PPER wi))
...))
)
- KOUS dat projects a CP-DEG if it introduces a degree complement.
(ADVP (PTKA also)
(ADJV veren)
(CP-DEG (KOUS dat) ← tagged KOUS
(IP-SUB (NP-SBJ (PPER se))
(PTKNEG nicht)
...))
)
- KOUS dat projects a CP-ADV if it introduces a purpose or reason clause with the verb in the subjunctive.
(IP-IMP (ADVP (AVD Nu))
(VVIMP help)
(NP-OB1 (PPER my))
(NP-VOC (ADJA leue)
(NE ihesus))
...
(CP-ADV (KOUS Dat) ← tagged KOUS
(IP-SUB (NP-SBJ (PPER ic))
(ADVP (AVD nv))
(VVFIN volghe)
(NP-OB1 (DDARTA den)
(ADJA rechten)
(NA pat))))
)
- In exceptional cases, dat is tagged KON (coordinating conjunction) when it appears at the beginning of a matrix clause.
(IP-MAT (KON men)
(KON dat) ← tagged KON
(NP-SBJ (PPER he))
(VVFIN merkede)
(NP-OB1 (NP (DPOSA ere)
(NA yoget))
(CONJP (KON vnde)
(NP (NA schonheit)
(NP-POS (DPOSA eres)
(NA lyues))))
(CONJP (KON Unde)
(NP (NA grotheyt)))
(CONJP (KON vnde)
...))
)
dewīle
- When it is a sentential adverb, dewīle is tagged DDARTA+NA and is treated as heading an NP-TMP.
(IP-MAT (NP-SBJ (NA Her)
(NE wolter))
(VAFIN leth)
(ADVP (AVKO ock))
(NP-TMP (DDARTA+NA dewile)) ← tagged DDARTA+NA
(VVINF maken)
(NP-OB1 (NP (ADJA guldenne)
(NA rynghe))
...)
)
- When dewīle introduces a subordinate clause, it is tagged as KOUS and is treated like similar items (e.g. also, do).
(ADVP-TMP (KOUS dewile) ← tagged KOUS
(CP-ADV (WADVP-1 0)
(IP-SUB (ADVP-TMP *T*-1)
(NP-SBJ (PPER se))
...))
)
ēnander
Unlike the reflexive pronoun sik, which is consistently tagged PRF (‘reflexive personal pronoun’) in the corpus (see below), ēnander is simply tagged PPER (‘irreflexive personal pronoun’).
TODO: example
genōch
- genōch is tagged as a noun (NA) when it looks nominal, in which case it projects an NP.
(NP-OB1 (NP-POS (NA gheldes))
(NA ghenoch) ← tagged NA
)
- Otherwise it is tagged as an adverb (AVD).
TODO: example
hen
hen is tagged PTKVZ (‘verbal particle, separable’) and does not project a phrase when it occurs at sentence level:
(IP-MAT (ADVP (AVD aldus))
(VVFIN ga)
(NP-SBJ (PPER ick))
(PTKVZ hen) ← tagged PTKVZ
)
item
item has one of two POS-tags according to context:
- When it prompts subject-verb inversion (V2), it should be tagged as AVKO (‘conjunctional adverb’), in which case it projects an ADVP.
TODO: example here
- When it does not prompt subject-verb inversion, it should be tagged as KON (‘conjunction’).
TODO: example here
- Note: item is not tagged FM (‘foreign material’), despite being of Latin origin.
lēf
- In the construction lēf hebben, the word lēf is tagged as PTKVZ (‘verbal particle, separable’).
(IP-MAT-SPE (NP-OB1 (DPDS dusse))
(VMFIN scholle)
(NP-SBJ (PPER gy))
(PTKVZ leff) ← tagged PTKVZ
(VVINF hebben)
)
(Griseldis)
- In other uses, such as with wēsen, it is an adjective (ADJ*).
(IP-MAT (NP-SBJ (DDARTA Dusse)
(NA rede))
(VVFIN weren)
(NP-OB2 (DDARTA den)
(NA ridderen))
(ADJP-PRD (ADJD lef)) ← tagged ADJD
)
(Zeno)
mehr
When mehr is used in a comparative construction with als, it is treated as follows:
(NP-OB1 (DPIS mehr)
(PP (KOKOM als)
(NP (OA twelff)
(CARDA teen)
(NA molte)))
)
noch
The word noch is given one of various treatments, as follows:
AVD
when it is a straightforward sentential adverb, in which case it projects an ADVP.
(IP-MAT ...
(ADVP (AVD Wol))
(VVFIN bist)
(NP-SBJ (PPER u))
(ADVP (AVD noch)) ← tagged AVD and projects an ADVP
(PP (APPR in)
(NP (ADJA bloidender)
(NA tyd)))
)
PTKA
when it modifies an adjective or adverb, and is sister to the adjective/adverb.
(ADVP (ADVP (PTKA noch) ← tagged PTKA
(ADJV meer))
(AVD allene)
)
KON
when it functions as a coordinating conjunction (the example here also features correlative coordination, see elsewhere)
(NP-OB1 (KON noch) ← tagged KON
(NP (DDARTA den)
(NA armen))
(CONJP (KON noch) ← tagged KON
(NP (DDARTA den)
(NA riken)))
)
sik
The part-of-speech tag PRF should only be assigned to the lemma sik.
(IP-MAT (ADVP (AVKO Doch))
(VAFIN hadde)
(NP-SBJ (PPER se))
(NP-OB1 (PRF syck)) ← tagged PRF
(VVPP vorgenomen)
(ADVP (AVD duldichliken))
...
)
(Griseldis)
N.B. Some (but not all) ReN texts also assign PRF when a pronoun other than sik happens to be coreferential with the subject, but in the CHLG these are tagged PPER.
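If each token is available as a (form, tag, lemma) triple, this restriction becomes a one-line filter. The triple format and the function misassigned_prf are assumptions about how the data is stored. The lemma values in the example are illustrative only.

```python
def misassigned_prf(tokens):
    """Hypothetical helper: (form, lemma) pairs tagged PRF whose lemma is not sik.

    tokens: iterable of (form, tag, lemma) triples (an assumed storage format).
    """
    return [(form, lemma) for form, tag, lemma in tokens
            if tag == "PRF" and lemma != "sik"]

# Illustrative tokens: the second shows a ReN-style PRF that the CHLG would tag PPER.
tokens = [("syck", "PRF", "sik"), ("my", "PRF", "ik")]
print(misassigned_prf(tokens))   # [('my', 'ik')]
```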
swelich
The item swelich is treated as two separate tokens (s- and -welich). s- is tagged as PTKG (‘generalising particle’) and -welich is tagged as DWA:
(WNP (PTKG S)
(DWA welich)
(NA voget)
)
(Braunschweig)
sülve
The word sülve is generally tagged PTKN.
(NP-SBJ (NA got)
(PTKN seluen)
)
(PP (APPR By)
(NP (PPER my)
(PTKN seluen))
)
- Note: when tagged as PTKN and occurring at sentence level, sülve does not project a phrasal category (i.e. it is immediately dominated by IP-MAT).
(IP-MAT (NP-SBJ (PPER Wi))
(VMFIN solden)
(PTKN seluen)
(ADVP (ADVP (AVD seer))
(AVD luttick))
(VVINF beholden)
)
- Note: when sülve appears with a definite article (DDARTA), it is tagged as ADJ*.
(NP-SBJ (DDARTA dey)
(ADJA selue)
(NA richtere)
)
(NP-OB1 (DDARTA deme)
(ADJS suluen)
)
sunder
The item sunder is treated in one of various ways, as follows:
APPR
when it is a preposition (‘without’, ‘except’), in which case it projects a PP.
(IP-MAT (PP (ADVP (PAVKO Dar))
(PAVAP na))
(VVFIN schededen)
(NP-SBJ (PPER se))
(NP-OB1 (PRF syk))
(PTKNEG nicht)
(PP (APPR sunder) ← tagged APPR
(NP (ADJA grot)
(NA leyt)))
)
KON
when it functions as a coordinating conjunction (‘but (rather)’, ‘but’).
(IP-MAT (KON Sunder) ← tagged KON
(NP-SBJ (PPER se))
(VVFIN bleuen)
(PP (APPR by)
(NP (PPER eme)))
(NP-TMP (DDARTA den)
(NA dach))
)
vnde
The word vnde is tagged as:
KON
when it functions as a coordinating conjunction.
(IP-MAT (KON vnde) ← tagged KON
(NP-SBJ (DDARTA dat)
(NA wort))
(VVFIN was)
(PP (APPR bi)
(NP (NA gode)))
)
AVD
when it functions as a sentence adverb with the meaning ‘also’, in which case it projects an ADVP.
(IP-MAT (KON vnd) ← tagged KON
(PP (ADVP (PAVD dar))
(PAVAP na))
(VVFIN kam)
(ADVP (AVD vnd)) ← tagged AVD
(NP-SBJ (NE Jafeth))
)
wane
- wane should be tagged as a preposition (APPR) when it introduces a noun phrase. The noun phrase is annotated as the complement of wane.
(PP (APPR wane)
(NP (CARDA ver)
(NA scilling))
)
TODO: presumably this also applies to dan when it introduces a noun phrase?
von NP wegen
- This construction is treated as a PP headed by the preposition (APPR) von.
- wegen is tagged as a noun (NA) which heads an NP; this NP is the complement of the PP headed by von.
- The NP preceding wegen is tagged as NP-POS and is a complement of the head noun wegen.
(PP (APPR von)
(NP (NP-POS (DPOSA orer)
(ADJA eyghen)
(NA sunde))
(NA wegen))
)
wente
For finite clauses introduced by wente which are ambiguous and cannot be labelled as either matrix (IP-MAT) or subordinate (IP-SUB), we use a novel label, IP-X.
This applies to wente-clauses which are unambiguously V2:
TODO: parsed example, e.g. wente dit is godes sone
This also applies to wente-clauses where the verb position is hard to diagnose, since there are only two constituents in the clause:
TODO: parsed example, e.g. wente he kam
Note: wente-clauses which are clearly verb-final
TODO: parsed example
(see LREC paper for more details).
weyt (with cardinals)
- weyt is tagged as an AVD which projects an ADVP.
- The ADVP is a sister of the cardinals (CARD*).
- The cardinals and the ADVP headed by weyt all sit within a NUMP (the label used for complex numbers, see elsewhere).
(NP-PRD (NUMP (CARDA seuentich)
(ADVP (AVD weyt))
(CARDA seuen))
(NA weruen)
)