Reified LMNL

Introduction

This document defines the structure of a reified LMNL layer. The definitions in this document can be used in other documents, for example API or syntax definitions, that need to refer to the mapping between a syntax and a LMNL data model.

When a LMNL document has been serialised, the serialised form of the LMNL document can become a LMNL document itself. For example, if the LMNL document described in the example in the LMNL data model were serialised using XML, it would look approximately like:

<date day-of-week="Friday">
  <year>2002</year>-<month name="August">08</month>-<day>23</day>
</date>

Note that it is not possible to serialise this LMNL document accurately using XML: it contains an annotation that itself has annotations, so it falls outside the tree subset of LMNL.

This document's content would be the sequence of characters that make up the XML document: { '<', 'd', 'a', 't', 'e', ' ', ..., 't', 'e', '>' }.

Layers could then be defined over this sequence of characters in order to extract information from the XML document.

One such layer could be a layer that represents the LMNL data model of the document. This layer is termed the reified LMNL layer. To be recognised as holding information about a LMNL document, such a layer must contain a sequence of ranges with particular names and particular annotations. In essence, these names and annotations reflect the LMNL data model, plus a certain amount of syntactic information.

A reified LMNL layer contains all the information required to reconstruct a LMNL document without reference to any base. Since the base is irrelevant, the actual values of the ranges' start and length are not important; only the relationships between the ranges in this layer matter.

The ranges held by the reified LMNL layer, their annotations, the constraints on their relationships with other ranges, and their mapping on to the non-reified LMNL data model, are described below.

Last modified 11 Oct 2002 by Jeni Tennison.

Notation

For brevity, this document refers to ranges and annotations in the form “[name] range” or “[name] annotation”. In this notation, “name” is the qualified name of the range or annotation, which can be resolved to provide its expanded name.

A qualified name with no prefix indicates an expanded name whose namespace name is the empty string (in other words, the name is in no namespace). A qualified name with the prefix rl indicates an expanded name whose namespace name is the reified LMNL namespace. The reified LMNL namespace is:

http://www.lmnl.org/namespace/reified

A qualified name with the prefix syn indicates an expanded name whose namespace name is the LMNL syntax namespace. The LMNL syntax namespace is:

http://www.lmnl.org/namespace/syntax
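The prefix conventions above amount to a small lookup table. The following Python sketch resolves the qualified names used in this document into (namespace name, local part) pairs; the `PREFIXES` table and the `resolve` function are hypothetical illustrations, not part of any LMNL API.

```python
# Hypothetical table of the prefixes used in this document.
PREFIXES = {
    "rl": "http://www.lmnl.org/namespace/reified",
    "syn": "http://www.lmnl.org/namespace/syntax",
}


def resolve(qname: str) -> tuple:
    """Return (namespace name, local part) for a qualified name such as "rl:range".

    A name without a prefix is in no namespace, so its namespace name is "".
    An unknown prefix raises KeyError.
    """
    prefix, sep, local = qname.partition(":")
    if not sep:
        return ("", qname)
    return (PREFIXES[prefix], local)


print(resolve("rl:document"))  # ('http://www.lmnl.org/namespace/reified', 'document')
print(resolve("characters"))   # ('', 'characters')
```

The pair order (namespace name first, then local part) matches the way names are written in the example reified layer at the end of this document.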

Within the reified LMNL layer, all annotations have simple values, which are just text layers with no overlays. In other words, none of the annotations in the reified LMNL layer have structured values, though some of them do have annotations of their own.

Document Range

The [rl:document] range is a range that represents the document as a whole. The [rl:document] range must enclose all other ranges in the reified LMNL layer. The closest enclosed ranges of the [rl:document] range can be [rl:range], [rl:text], [syn:entity] or [syn:comment] ranges.

Document Content

The content of the document is a sequence of characters constructed by concatenating the values of the [characters] annotations of those [rl:text] ranges in the reified LMNL layer that are not within a [rl:annotation] range.
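The rule above can be sketched in Python. This is a minimal illustration under stated assumptions: `RRange`, `is_within` and `document_content` are hypothetical names, annotations are held as a plain dictionary of simple values, and the ranges are assumed to be listed in document order with each enclosing range listed before the ranges it encloses (so that ranges with identical start and length can still be told apart).

```python
from dataclasses import dataclass, field


@dataclass
class RRange:
    """Hypothetical in-memory form of one range in a reified LMNL layer."""
    name: str                                        # qualified name, e.g. "rl:text"
    start: int
    length: int
    annotations: dict = field(default_factory=dict)  # simple annotation values only

    @property
    def end(self) -> int:
        return self.start + self.length


def is_within(layer: list, inner: int, outer: int) -> bool:
    """True if layer[inner] lies inside layer[outer].

    Assumes document order, with an enclosing range listed before the ranges
    it encloses; this breaks ties between ranges with identical extents.
    """
    a, b = layer[inner], layer[outer]
    return outer < inner and b.start <= a.start and a.end <= b.end


def document_content(layer: list) -> str:
    """Concatenate the [characters] of the [rl:text] ranges outside every [rl:annotation]."""
    annotations = [j for j, r in enumerate(layer) if r.name == "rl:annotation"]
    return "".join(
        r.annotations["characters"]
        for i, r in enumerate(layer)
        if r.name == "rl:text" and not any(is_within(layer, i, j) for j in annotations)
    )


# The first few ranges of the example layer at the end of this document.
layer = [
    RRange("rl:document", 0, 97),
    RRange("rl:range", 0, 97, {"name": "date"}),
    RRange("rl:annotation", 6, 20, {"name": "day-of-week"}),
    RRange("rl:value", 19, 6),
    RRange("rl:text", 19, 6, {"characters": "Friday"}),
    RRange("rl:value", 27, 63),
    RRange("rl:range", 27, 17, {"name": "year"}),
    RRange("rl:value", 33, 4),
    RRange("rl:text", 33, 4, {"characters": "2002"}),
    RRange("rl:text", 44, 1, {"characters": "-"}),
]
print(document_content(layer))  # 2002-
```

The characters "Friday" are excluded because their [rl:text] range lies within the [rl:annotation] range for the day-of-week annotation.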

Overlays

Working out the overlays of a document or an annotation value involves grouping together the contained ranges of the [rl:document] or [rl:annotation] range representing the document or annotation.

The contained ranges of the document represented by a [rl:document] range are the [rl:range] ranges that are not within any [rl:annotation] range.

The contained ranges of the value of the annotation represented by a [rl:annotation] range are the [rl:range] ranges that are:

  1. within the [rl:annotation] range and
  2. not within any [rl:annotation] range that is within this [rl:annotation] range
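The two conditions above decide which [rl:range] ranges belong to the document and which belong to a particular annotation value. A minimal Python sketch, assuming the same hypothetical list-in-document-order representation (the `RRange` class, its `label` field and `contained_ranges` are illustrative names only):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RRange:
    """Hypothetical in-memory form of one range in a reified LMNL layer."""
    name: str        # qualified name, e.g. "rl:range"
    start: int
    length: int
    label: str = ""  # illustrative label, used only to make the output readable

    @property
    def end(self) -> int:
        return self.start + self.length


def is_within(layer, inner, outer):
    """Assumes document order, with enclosing ranges listed before enclosed ones."""
    a, b = layer[inner], layer[outer]
    return outer < inner and b.start <= a.start and a.end <= b.end


def contained_ranges(layer, scope: Optional[int] = None):
    """Indices of the [rl:range] ranges contained by the document (scope=None)
    or by the value of the annotation whose [rl:annotation] range is layer[scope]."""
    annotations = [j for j, r in enumerate(layer) if r.name == "rl:annotation"]
    result = []
    for i, r in enumerate(layer):
        if r.name != "rl:range":
            continue
        enclosing = [j for j in annotations if is_within(layer, i, j)]
        if scope is None:
            if not enclosing:  # not within any [rl:annotation] range
                result.append(i)
        elif scope in enclosing:
            # Skip ranges inside an annotation that is itself within this one.
            if not any(is_within(layer, j, scope) for j in enclosing):
                result.append(i)
    return result


layer = [
    RRange("rl:document", 0, 60),
    RRange("rl:range", 0, 60, "outer"),
    RRange("rl:annotation", 5, 30, "note"),
    RRange("rl:value", 10, 20),
    RRange("rl:range", 10, 15, "b"),
    RRange("rl:annotation", 12, 10, "nested"),
    RRange("rl:value", 15, 5),
    RRange("rl:range", 15, 5, "c"),
    RRange("rl:value", 40, 10),
    RRange("rl:range", 40, 5, "d"),
]
for scope in (None, 2, 5):
    print(scope, [layer[i].label for i in contained_ranges(layer, scope)])
```

Here range "c" belongs to the value of the "nested" annotation, not to the value of "note", even though it lies within both [rl:annotation] ranges.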

Partition the contained ranges into groups according to the value of their [owner-layer] annotation, if they have one; this value serves as the identifier of the layer. The group of contained ranges that don't have an [owner-layer] annotation forms the default layer of the document or annotation.

The ranges represented by the [rl:range] ranges in each group form the content of a layer. The [base] annotations on the [owner-layer] annotations of all the [rl:range] ranges in a particular group must have equal values; this value identifies the layer's base. If the [base] annotations have the special value "#text" then the layer's base is the document itself (or the value of the annotation if you're looking at ranges within a [rl:annotation] range). If the [base] annotations have the special value "#default" then the layer's base is the default layer of the document (or annotation value). Otherwise, the value must be the identifier of a layer in the document (or annotation value); it's an error if no such layer exists. The default layer's base is the document (or annotation value) itself.

The layers constructed by the previous step can now be arranged into bases and overlays. The overlays of the document (or annotation value) are the default layer and the layers constructed from the contained ranges whose [owner-layer] annotation's [base] annotation has the value "#text". Similarly, the overlays of each of those layers are the layers constructed from the contained ranges whose [owner-layer] annotation's [base] annotation has a value equal to that layer's identifier (or, for the default layer, the value "#default").
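The grouping and arrangement steps can be sketched as follows. This is a hypothetical illustration: each contained range is reduced to a (label, owner-layer value, base value) tuple, "#default" is used as the identifier of the default layer, and the sketch assumes that no [owner-layer] annotation itself uses the value "#default".

```python
def build_layers(ranges):
    """Group contained ranges into layers and arrange them into bases and overlays.

    ranges: (label, owner_layer, base) tuples, one per contained [rl:range] range;
    owner_layer and base are None when the range has no [owner-layer] annotation.
    Returns (content, overlays): content maps a layer identifier to its range labels,
    and overlays maps a base ("#text" for the text itself) to the layers over it.
    """
    content = {"#default": []}
    base_of = {"#default": "#text"}  # the default layer's base is the text itself
    for label, owner, base in ranges:
        layer_id = "#default" if owner is None else owner
        content.setdefault(layer_id, []).append(label)
        # Every range in a group must name the same base.
        if owner is not None and base_of.setdefault(owner, base) != base:
            raise ValueError(f"layer {owner!r}: [base] annotations differ")
    overlays = {"#text": []}
    for layer_id, base in base_of.items():
        if base not in ("#text", "#default") and base not in content:
            raise ValueError(f"layer {layer_id!r}: no layer is identified by {base!r}")
        overlays.setdefault(base, []).append(layer_id)
    return content, overlays


content, overlays = build_layers([
    ("r1", None, None),          # no [owner-layer]: default layer
    ("r2", "words", "#text"),
    ("r3", "words", "#text"),
    ("r4", "phrases", "words"),  # a layer whose base is the "words" layer
])
print(content)   # {'#default': ['r1'], 'words': ['r2', 'r3'], 'phrases': ['r4']}
print(overlays)  # {'#text': ['#default', 'words'], 'words': ['phrases']}
```

The layer identifiers "words" and "phrases" are invented for the example; in a reified LMNL layer they would be the values of the [owner-layer] annotations.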

Range Ranges

A [rl:range] range is a range that represents a range within a layer in a document. The closest enclosing range of a [rl:range] range must be a [rl:document] or [rl:value] range. The closest enclosed ranges of the [rl:range] range may include a single [rl:value] range and may include any number of [rl:annotation] ranges before or after that [rl:value] range. A [rl:range] range can overlap [rl:value] ranges and other [rl:range] ranges but the overlap must only occur within the [rl:value] range that it encloses.

Owner Layer Annotation

A [rl:range] range may have an [owner-layer] annotation, which identifies the layer to which the range represented by the [rl:range] range belongs. The [owner-layer] annotation must have a [base] annotation, which specifies the base of that layer. The Overlays section, under the description of the [rl:document] range, explains how the layers are constructed; the same method applies within [rl:annotation] ranges.

Name Annotation

Each [rl:range] and [rl:annotation] range can have a [name] annotation, which represents an expanded name used as the name of the range represented by the [rl:range] range or as the name of the annotation represented by the [rl:annotation] range. [rl:range] ranges representing anonymous ranges must not have [name] annotations; all other [rl:range] and [rl:annotation] ranges must have a [name] annotation.

The [name] annotation's value holds the local part of the expanded name. The [name] annotation may have its own [namespace] annotation, whose value is the namespace name of the expanded name represented by the [name] annotation. If the [name] annotation does not have a [namespace] annotation then the namespace name of the expanded name represented by the [name] annotation is the empty string.

The [namespace] annotation may have a [syn:prefix] annotation, which holds the preferred prefix for the namespace.
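A minimal Python sketch of reading a [name] annotation, assuming a hypothetical nested-dictionary form: {"value": local part, "namespace": {"value": namespace name, "syn:prefix": preferred prefix}}. The names `ExpandedName`, `read_name` and `preferred_qname`, and the namespace "http://example.org/dates", are invented for illustration.

```python
from typing import NamedTuple, Optional


class ExpandedName(NamedTuple):
    namespace: str  # "" means the name is in no namespace
    local: str


def read_name(name: dict) -> ExpandedName:
    """Expanded name held by a [name] annotation (hypothetical dict form)."""
    ns: Optional[dict] = name.get("namespace")
    return ExpandedName(ns["value"] if ns else "", name["value"])


def preferred_qname(name: dict) -> Optional[str]:
    """Qualified name using the [syn:prefix], or None if no preferred prefix is given."""
    ns = name.get("namespace")
    if ns is None:
        return name["value"]  # no namespace, so no prefix is needed
    prefix = ns.get("syn:prefix")
    return f"{prefix}:{name['value']}" if prefix else None


plain = {"value": "date"}
qualified = {"value": "date",
             "namespace": {"value": "http://example.org/dates", "syn:prefix": "d"}}
print(read_name(plain))            # ExpandedName(namespace='', local='date')
print(preferred_qname(qualified))  # d:date
```

Returning None when a namespace has no preferred prefix avoids producing an unprefixed name that would wrongly suggest the name is in no namespace.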

Range Identifier

Each [rl:range] range may have a [syn:id] annotation, which holds the name that should preferably be used as the identifier for disambiguating tags.

Range Start and Length

The start and length of the range represented by a [rl:range] range are derived from the position of the [rl:range] range amongst the other ranges in the document.

If the [rl:range] range's [owner-layer] annotation's [base] annotation has the special value "#text", or if the [rl:range] range doesn't have an [owner-layer] annotation, then the start of the range represented by the [rl:range] range is based on the [rl:text] ranges that:

  1. have the same closest enclosing [rl:annotation] range as the [rl:range] range or, if the [rl:range] range is not within a [rl:annotation] range, are also not within a [rl:annotation] range and
  2. precede the [rl:range] range

The start of the range represented by the [rl:range] range is the sum of the number of characters in the [characters] annotations of these [rl:text] ranges. The length of the range represented by the [rl:range] range is equal to the sum of the number of characters in the [characters] annotations of the [rl:text] ranges that are within the [rl:range] range and not within any [rl:annotation] ranges that are also within the [rl:range] range.
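The calculation above can be sketched in Python and checked against the example reified layer at the end of this document. This is a hypothetical sketch: the helper names are illustrative, ranges are assumed to be listed in document order with enclosing ranges first, and "precede" is read here as "end at or before the start of the [rl:range] range".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RRange:
    """Hypothetical in-memory form of one range in a reified LMNL layer."""
    name: str                                        # qualified name, e.g. "rl:text"
    start: int
    length: int
    annotations: dict = field(default_factory=dict)  # simple annotation values only

    @property
    def end(self) -> int:
        return self.start + self.length


def is_within(layer, inner, outer):
    """Assumes document order, with enclosing ranges listed before enclosed ones."""
    a, b = layer[inner], layer[outer]
    return outer < inner and b.start <= a.start and a.end <= b.end


def closest_annotation(layer, i) -> Optional[int]:
    """Index of the innermost [rl:annotation] range enclosing layer[i], or None."""
    found = [j for j, r in enumerate(layer)
             if r.name == "rl:annotation" and is_within(layer, i, j)]
    return max(found, default=None)


def n_chars(r: RRange) -> int:
    return len(r.annotations["characters"])


def text_position(layer, i):
    """(start, length) of the range represented by layer[i], an [rl:range] range
    whose base is the text ("#text" base, or no [owner-layer] annotation)."""
    r, scope = layer[i], closest_annotation(layer, i)
    # Start: [rl:text] ranges in the same annotation scope that end before r starts.
    start = sum(n_chars(t) for k, t in enumerate(layer)
                if t.name == "rl:text" and t.end <= r.start
                and closest_annotation(layer, k) == scope)
    # Length: [rl:text] ranges inside r, except those inside r's own annotations.
    inner = [j for j, a in enumerate(layer)
             if a.name == "rl:annotation" and is_within(layer, j, i)]
    length = sum(n_chars(t) for k, t in enumerate(layer)
                 if t.name == "rl:text" and is_within(layer, k, i)
                 and not any(is_within(layer, k, j) for j in inner))
    return start, length


def text(start, length, chars):
    return RRange("rl:text", start, length, {"characters": chars})


def value(start, length):
    return RRange("rl:value", start, length)


# The ranges of the example layer at the end of this document (only [name] kept).
layer = [
    RRange("rl:document", 0, 97), RRange("rl:range", 0, 97, {"name": "date"}),
    RRange("rl:annotation", 6, 20), value(19, 6), text(19, 6, "Friday"),
    value(27, 63), RRange("rl:range", 27, 17, {"name": "year"}),
    value(33, 4), text(33, 4, "2002"), text(44, 1, "-"),
    RRange("rl:range", 45, 31, {"name": "month"}), RRange("rl:annotation", 52, 13),
    value(58, 6), text(58, 6, "August"), value(66, 2), text(66, 2, "08"),
    text(76, 1, "-"),
    RRange("rl:range", 77, 13, {"name": "day"}), value(82, 2), text(82, 2, "23"),
]
for i in (1, 6, 10, 17):
    print(layer[i].annotations["name"], text_position(layer, i))
```

The results are positions within the document content "2002-08-23": date (0, 10), year (0, 4), month (5, 2) and day (8, 2). The characters "Friday" and "August" do not count because they belong to annotation values.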

If the [rl:range] range's [owner-layer] annotation's [base] annotation does not have the special value "#text", then the start of the range represented by the [rl:range] range is the number of [rl:range] ranges:

  1. that have the same closest enclosing [rl:annotation] range as the [rl:range] range or, if the [rl:range] range is not within a [rl:annotation] range, are also not within a [rl:annotation] range,
  2. that precede the [rl:range] range, and
  3. that belong to the base layer: if the [rl:range] range's [owner-layer] annotation's [base] annotation has the special value "#default", those that don't have an [owner-layer] annotation; otherwise, those whose [owner-layer] annotation's value is equal to that [base] annotation's value.

The length of the range represented by the [rl:range] range is equal to the number of [rl:range] ranges in that same base layer (those whose [owner-layer] annotation's value equals the [base] annotation's value or, if the [base] annotation has the special value "#default", those with no [owner-layer] annotation) that are within the [rl:range] range and not within any [rl:annotation] ranges that are also within the [rl:range] range.
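For a range whose base is another layer, start and length count ranges of the base layer rather than characters. The sketch below is hypothetical: the `RRange` fields `owner_layer` and `base` stand for the values of the [owner-layer] annotation and its [base] annotation, the "words" and "phrases" layers are invented, and "precede" is again read as "end at or before the start of the range".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RRange:
    """Hypothetical in-memory form of a range in a reified LMNL layer."""
    name: str
    start: int
    length: int
    owner_layer: Optional[str] = None  # value of the [owner-layer] annotation, if any
    base: Optional[str] = None         # value of that annotation's [base] annotation

    @property
    def end(self) -> int:
        return self.start + self.length


def is_within(layer, inner, outer):
    """Assumes document order, with enclosing ranges listed before enclosed ones."""
    a, b = layer[inner], layer[outer]
    return outer < inner and b.start <= a.start and a.end <= b.end


def closest_annotation(layer, i):
    found = [j for j, r in enumerate(layer)
             if r.name == "rl:annotation" and is_within(layer, i, j)]
    return max(found, default=None)


def in_base_layer(candidate, base):
    """Does candidate belong to the layer identified by base?"""
    if base == "#default":
        return candidate.owner_layer is None
    return candidate.owner_layer == base


def layer_position(layer, i):
    """(start, length) for an [rl:range] range whose base is another layer."""
    r = layer[i]
    if r.base in (None, "#text"):
        raise ValueError("this range is positioned by [rl:text] ranges instead")
    scope = closest_annotation(layer, i)
    members = [k for k, b in enumerate(layer)
               if k != i and b.name == "rl:range" and in_base_layer(b, r.base)]
    # Start: same-scope ranges of the base layer that end before r starts.
    start = sum(1 for k in members
                if layer[k].end <= r.start and closest_annotation(layer, k) == scope)
    # Length: base-layer ranges inside r, except those inside r's own annotations.
    inner = [j for j, a in enumerate(layer)
             if a.name == "rl:annotation" and is_within(layer, j, i)]
    length = sum(1 for k in members
                 if is_within(layer, k, i)
                 and not any(is_within(layer, k, j) for j in inner))
    return start, length


# Hypothetical fragment: a "phrases" layer over a "words" layer over the text.
# [rl:value] and [rl:text] ranges are omitted for brevity.
layer = [
    RRange("rl:document", 0, 60),
    RRange("rl:range", 0, 10, "words", "#text"),     # word 0
    RRange("rl:range", 12, 40, "phrases", "words"),  # phrase over words 1 and 2
    RRange("rl:range", 14, 10, "words", "#text"),    # word 1
    RRange("rl:range", 30, 10, "words", "#text"),    # word 2
]
print(layer_position(layer, 2))  # (1, 2)
```

The phrase starts after one word and spans two words, so within the "words" layer it has start 1 and length 2.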

Annotations

The annotations of the range represented by a [rl:range] range are the annotations represented by the closest enclosed [rl:annotation] ranges within the [rl:range] range.

Value Ranges

A [rl:value] range is a range that does not represent anything itself but is used to delimit the value of a range represented by a [rl:range] range or an annotation represented by a [rl:annotation] range. The closest enclosing range of a [rl:value] range must be a [rl:range] or [rl:annotation] range. The closest enclosed ranges of the [rl:value] range can be [rl:text], [syn:entity], [syn:comment] or [rl:range] ranges, though a [rl:value] range does not necessarily enclose any ranges. A [rl:value] range can overlap [rl:range] ranges and other [rl:value] ranges, but it must not overlap any other ranges.

Annotation Ranges

A [rl:annotation] range is a range that represents an annotation on a range or on an annotation. The closest enclosing range of a [rl:annotation] range must be a [rl:range] or [rl:annotation] range. The closest enclosed ranges of the [rl:annotation] range may include a single [rl:value] range and may include any number of [rl:annotation] ranges around the [rl:value] range. A [rl:annotation] range must not overlap any other range.

Annotation Values

The content of the value of the annotation represented by the [rl:annotation] range is a sequence of characters. These characters are gathered by concatenating the values of the [characters] annotations of the [rl:text] ranges within the [rl:annotation] range that are not within another [rl:annotation] range that is within this one.
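Because a text range inside a nested annotation has that nested annotation as its closest enclosing [rl:annotation] range, the rule above amounts to selecting the [rl:text] ranges whose closest enclosing [rl:annotation] range is this one. A hypothetical Python sketch, using the same assumed list-in-document-order representation; the "para", "note", "ref" and "href" names are invented for the example:

```python
from dataclasses import dataclass, field


@dataclass
class RRange:
    """Hypothetical in-memory form of one range in a reified LMNL layer."""
    name: str
    start: int
    length: int
    annotations: dict = field(default_factory=dict)

    @property
    def end(self) -> int:
        return self.start + self.length


def is_within(layer, inner, outer):
    """Assumes document order, with enclosing ranges listed before enclosed ones."""
    a, b = layer[inner], layer[outer]
    return outer < inner and b.start <= a.start and a.end <= b.end


def closest_annotation(layer, i):
    """Index of the innermost [rl:annotation] range enclosing layer[i], or None."""
    found = [j for j, r in enumerate(layer)
             if r.name == "rl:annotation" and is_within(layer, i, j)]
    return max(found, default=None)


def annotation_value_text(layer, a):
    """Characters of the value of the annotation represented by layer[a]."""
    return "".join(r.annotations["characters"]
                   for i, r in enumerate(layer)
                   if r.name == "rl:text" and closest_annotation(layer, i) == a)


# Hypothetical annotation whose value contains a range with its own annotation.
layer = [
    RRange("rl:document", 0, 50),
    RRange("rl:range", 0, 50, {"name": "para"}),
    RRange("rl:annotation", 2, 30, {"name": "note"}),
    RRange("rl:value", 8, 20),
    RRange("rl:text", 8, 4, {"characters": "See "}),
    RRange("rl:range", 12, 15, {"name": "ref"}),
    RRange("rl:annotation", 13, 8, {"name": "href"}),
    RRange("rl:value", 15, 4),
    RRange("rl:text", 15, 3, {"characters": "#p2"}),
    RRange("rl:value", 22, 4),
    RRange("rl:text", 22, 4, {"characters": "p. 2"}),
]
print(annotation_value_text(layer, 2))  # See p. 2
print(annotation_value_text(layer, 6))  # #p2
```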

Annotation Value overlays

The method of working out the overlays of the value of the annotation represented by the [rl:annotation] range is described in the Overlays section above. In short, it involves grouping the contained [rl:range] ranges according to the value of their [owner-layer] annotations.

Name Annotation

Each [rl:annotation] range must have a [name] annotation, as described above. The [name] annotation on a [rl:annotation] range determines the name of the annotation represented by the [rl:annotation] range.

Annotation Owner

The owner of the annotation represented by a [rl:annotation] range is the range or annotation represented by the closest enclosing range (which will be either a [rl:range] or [rl:annotation] range).

Annotations

The annotations of the annotation represented by a [rl:annotation] range are the annotations represented by its closest enclosed [rl:annotation] ranges, in order.
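Owners and annotations are two directions of the same closest-enclosing relationship. A hypothetical Python sketch (the helper names and the `label` field are illustrative, and ranges are assumed to be in document order with enclosing ranges first), using part of the example layer at the end of this document:

```python
from dataclasses import dataclass


@dataclass
class RRange:
    """Hypothetical in-memory form of one range in a reified LMNL layer."""
    name: str
    start: int
    length: int
    label: str = ""  # illustrative label for readable output

    @property
    def end(self) -> int:
        return self.start + self.length


def is_within(layer, inner, outer):
    """Assumes document order, with enclosing ranges listed before enclosed ones."""
    a, b = layer[inner], layer[outer]
    return outer < inner and b.start <= a.start and a.end <= b.end


def owner_of(layer, a):
    """Index of the closest enclosing [rl:range] or [rl:annotation] range of layer[a]."""
    found = [j for j, r in enumerate(layer)
             if r.name in ("rl:range", "rl:annotation") and is_within(layer, a, j)]
    return max(found, default=None)


def annotations_of(layer, i):
    """Indices of the annotations on layer[i], in document order."""
    return [j for j, r in enumerate(layer)
            if r.name == "rl:annotation" and owner_of(layer, j) == i]


layer = [
    RRange("rl:document", 0, 97),
    RRange("rl:range", 0, 97, "date"),
    RRange("rl:annotation", 6, 20, "day-of-week"),
    RRange("rl:value", 19, 6),
    RRange("rl:text", 19, 6),
    RRange("rl:value", 27, 63),
    RRange("rl:range", 45, 31, "month"),
    RRange("rl:annotation", 52, 13, "name"),
    RRange("rl:value", 58, 6),
    RRange("rl:text", 58, 6),
]
for j in annotations_of(layer, 1) + annotations_of(layer, 6):
    print(layer[j].label, "is owned by", layer[owner_of(layer, j)].label)
```

The "name" annotation is owned by the month range rather than the date range because the month range is its closest enclosing [rl:range] range.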

Text Ranges

A [rl:text] range is a range that represents a sequence of characters in a text layer. [rl:text] ranges must not enclose or overlap any other range. The closest enclosing range of a [rl:text] range must be a [syn:entity], [rl:value] or [rl:document] range.

If a [rl:text] range is within a [rl:annotation] range then the characters that it represents are part of the value of the annotation represented by the [rl:text] range's closest enclosing [rl:annotation] range. If the [rl:text] range is not within a [rl:annotation] range then the characters it represents are part of the content of the document.

Characters Annotation

A [rl:text] range has a single [characters] annotation whose value holds the sequence of characters represented by the [rl:text] range.

Entity Ranges

A [syn:entity] range is a range that indicates where an entity is used in a physical document. This information is not passed through into the data model. A [syn:entity] range may enclose any number of [rl:text] or other [syn:entity] ranges but must not overlap any range. The closest enclosing range of a [syn:entity] range must be a [syn:entity], [rl:value] or [rl:document] range.

Entity Name

Each [syn:entity] range must have a [name] annotation which holds the name of the entity represented by the [syn:entity] range.

Note that this name is not an expanded name, and therefore the [name] annotation on the [syn:entity] range does not have a [namespace] annotation, unlike those on [rl:range] or [rl:annotation] ranges.

Entity Value

The ranges within the [syn:entity] range represent the value of the entity represented by the [syn:entity] range.

Comment Ranges

A [syn:comment] range is a range that indicates where a comment is used in a physical document. This information is not passed through into the data model. A [syn:comment] range must not enclose or overlap any range. The closest enclosing range of a [syn:comment] range must be a [rl:value] or [rl:document] range.

Comment Value

Each [syn:comment] range must have a [value] annotation whose value is the text of the comment.
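The constraints given above on the closest enclosing range of each kind of range can be collected into a table and checked mechanically. The sketch below is a hypothetical validator for those rules only; it does not check the overlap rules or the permitted closest enclosed ranges, and it assumes ranges are listed in document order with enclosing ranges first.

```python
from dataclasses import dataclass


@dataclass
class RRange:
    """Hypothetical in-memory form of one range in a reified LMNL layer."""
    name: str
    start: int
    length: int

    @property
    def end(self) -> int:
        return self.start + self.length


def is_within(layer, inner, outer):
    """Assumes document order, with enclosing ranges listed before enclosed ones."""
    a, b = layer[inner], layer[outer]
    return outer < inner and b.start <= a.start and a.end <= b.end


# Permitted closest enclosing range for each kind of range, as described above.
ALLOWED_PARENT = {
    "rl:range": {"rl:document", "rl:value"},
    "rl:value": {"rl:range", "rl:annotation"},
    "rl:annotation": {"rl:range", "rl:annotation"},
    "rl:text": {"syn:entity", "rl:value", "rl:document"},
    "syn:entity": {"syn:entity", "rl:value", "rl:document"},
    "syn:comment": {"rl:value", "rl:document"},
}


def check_parents(layer):
    """Return a list of violations of the closest-enclosing-range rules."""
    problems = []
    for i, r in enumerate(layer):
        if i == 0:
            if r.name != "rl:document":
                problems.append("the first range is not an [rl:document] range")
            continue
        enclosing = [j for j in range(len(layer)) if is_within(layer, i, j)]
        if not enclosing:
            problems.append(f"range {i} ({r.name}) is not inside the [rl:document] range")
            continue
        parent = layer[max(enclosing)]  # innermost enclosing range
        if parent.name not in ALLOWED_PARENT.get(r.name, set()):
            problems.append(f"range {i} ({r.name}) is directly inside {parent.name}")
    return problems


good = [
    RRange("rl:document", 0, 97), RRange("rl:range", 0, 97),
    RRange("rl:annotation", 6, 20), RRange("rl:value", 19, 6),
    RRange("rl:text", 19, 6), RRange("rl:value", 27, 63),
    RRange("rl:range", 27, 17), RRange("rl:value", 33, 4),
    RRange("rl:text", 33, 4), RRange("rl:text", 44, 1),
]
bad = [RRange("rl:document", 0, 10), RRange("rl:value", 0, 5)]
print(check_parents(good))  # []
print(check_parents(bad))   # ['range 1 (rl:value) is directly inside rl:document']
```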

Example Reified LMNL Layer

The following example shows a possible reified LMNL layer for an XML representation of the example in the data model spec. The XML representation is:

<date day-of-week="Friday">
  <year>2002</year>-<month name="August">08</month>-<day>23</day>
</date>

There are multiple possible reified LMNL layers that could be generated from this XML document, varying in the precise values of the start and length properties of the ranges that it contains.

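A reified LMNL layer for this document follows. Its start and length values assume that the XML is written on a single line, without the line break and indentation shown above: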

layer l1 { base: {},
           content: { r1, r2, r3, r4, r5, r6, r7, r8, r9, r10,
                      r11, r12, r13, r14, r15, r16, r17, r18, r19, r20 }, overlays: {} }

range r1 { name: { "http://www.lmnl.org/namespace/reified", "document" }, owner layer: l1,
           start: 0, length: 97, annotations: {} }

range r2 { name: { "http://www.lmnl.org/namespace/reified", "range" }, owner layer: l1,
           start: 0, length: 97, annotations: { a1, a2 } }

annotation a1 { name: { "", "owner-layer" }, owner: r2,
                value: "layer1", annotations: { a3 } }

annotation a3 { name: { "", "base" }, owner: a1,
                value: "#text", annotations: {} }

annotation a2 { name: { "", "name" }, owner: r2,
                value: "date", annotations: {} }

range r3 { name: { "http://www.lmnl.org/namespace/reified", "annotation" }, owner layer: l1,
           start: 6, length: 20, annotations: { a4 } }

annotation a4 { name: { "", "name" }, owner: r3,
                value: "day-of-week", annotations: {} }

range r4 { name: { "http://www.lmnl.org/namespace/reified", "value" }, owner layer: l1,
           start: 19, length: 6, annotations: {} }

range r5 { name: { "http://www.lmnl.org/namespace/reified", "text" }, owner layer: l1,
           start: 19, length: 6, annotations: { a5 } }

annotation a5 { name: { "", "characters" }, owner: r5,
                value: "Friday", annotations: {} }

range r6 { name: { "http://www.lmnl.org/namespace/reified", "value" }, owner layer: l1,
           start: 27, length: 63, annotations: {} }

range r7 { name: { "http://www.lmnl.org/namespace/reified", "range" }, owner layer: l1,
           start: 27, length: 17, annotations: { a6, a8 } }

annotation a6 { name: { "", "owner-layer" }, owner: r7,
                value: "layer1", annotations: { a7 } }

annotation a7 { name: { "", "base" }, owner: a6,
                value: "#text", annotations: {} }

annotation a8 { name: { "", "name" }, owner: r7,
                value: "year", annotations: {} }

range r8 { name: { "http://www.lmnl.org/namespace/reified", "value" }, owner layer: l1,
           start: 33, length: 4, annotations: {} }

range r9 { name: { "http://www.lmnl.org/namespace/reified", "text" }, owner layer: l1,
           start: 33, length: 4, annotations: { a9 } }

annotation a9 { name: { "", "characters" }, owner: r9,
                value: "2002", annotations: {} }

range r10 { name: { "http://www.lmnl.org/namespace/reified", "text" }, owner layer: l1,
            start: 44, length: 1, annotations: { a10 } }

annotation a10 { name: { "", "characters" }, owner: r10,
                 value: "-", annotations: {} }

range r11 { name: { "http://www.lmnl.org/namespace/reified", "range" }, owner layer: l1,
            start: 45, length: 31, annotations: { a11, a12 } }

annotation a11 { name: { "", "owner-layer" }, owner: r11,
                 value: "layer1", annotations: { a13 } }

annotation a13 { name: { "", "base" }, owner: a11,
                 value: "#text", annotations: {} }

annotation a12 { name: { "", "name" }, owner: r11,
                 value: "month", annotations: {} }

range r12 { name: { "http://www.lmnl.org/namespace/reified", "annotation" }, owner layer: l1,
            start: 52, length: 13, annotations: { a14 } }

annotation a14 { name: { "", "name" }, owner: r12,
                 value: "name", annotations: {} }

range r13 { name: { "http://www.lmnl.org/namespace/reified", "value" }, owner layer: l1,
            start: 58, length: 6, annotations: {} }

range r14 { name: { "http://www.lmnl.org/namespace/reified", "text" }, owner layer: l1,
            start: 58, length: 6, annotations: { a15 } }

annotation a15 { name: { "", "characters" }, owner: r14,
                 value: "August", annotations: {} }

range r15 { name: { "http://www.lmnl.org/namespace/reified", "value" }, owner layer: l1,
            start: 66, length: 2, annotations: {} }

range r16 { name: { "http://www.lmnl.org/namespace/reified", "text" }, owner layer: l1,
            start: 66, length: 2, annotations: { a16 } }

annotation a16 { name: { "", "characters" }, owner: r16,
                 value: "08", annotations: {} }

range r17 { name: { "http://www.lmnl.org/namespace/reified", "text" }, owner layer: l1,
            start: 76, length: 1, annotations: { a17 } }

annotation a17 { name: { "", "characters" }, owner: r17,
                 value: "-", annotations: {} }

range r18 { name: { "http://www.lmnl.org/namespace/reified", "range" }, owner layer: l1,
            start: 77, length: 13, annotations: { a18, a19 } }

annotation a18 { name: { "", "owner-layer" }, owner: r18,
                 value: "layer1", annotations: { a20 } }

annotation a20 { name: { "", "base" }, owner: a18,
                 value: "#text", annotations: {} }

annotation a19 { name: { "", "name" }, owner: r18,
                 value: "day", annotations: {} }

range r19 { name: { "http://www.lmnl.org/namespace/reified", "value" }, owner layer: l1,
            start: 82, length: 2, annotations: {} }

range r20 { name: { "http://www.lmnl.org/namespace/reified", "text" }, owner layer: l1,
            start: 82, length: 2, annotations: { a21 } }

annotation a21 { name: { "", "characters" }, owner: r20,
                 value: "23", annotations: {} }