This document defines the structure of a reified LMNL layer. The definitions in this document can be used in other documents, for example API or syntax definitions, that need to refer to the mapping between a syntax and a LMNL data model.
When a LMNL document has been serialised, the serialised form of the LMNL document can become a LMNL document itself. For example, if the LMNL document described in the example in the LMNL data model were serialised using XML, it would look approximately like:
<date day-of-week="Friday"> <year>2002</year>-<month name="August">08</month>-<day>23</day> </date>
Note that it is not possible to accurately serialise this LMNL document using XML, since it is not part of the tree subset of LMNL as it contains an annotation that itself has annotations.
This document's content would be the sequence of characters that make up the XML document: { '<', 'd', 'a', 't', 'e', ' ', ..., 't', 'e', '>' }.
Layers could then be defined over this sequence of characters in order to extract information from the XML document.
One such layer could be a layer that represents the LMNL data model of the document. This layer is termed the reified LMNL layer. To be recognised as holding information about a LMNL document, such a layer must contain a sequence of ranges with particular names and particular annotations. In essence, these names and annotations reflect the LMNL data model, plus a certain amount of syntactic information.
A reified LMNL layer contains all the information required to reconstruct a LMNL document without reference to any base. Since the base is irrelevant, the actual values of the ranges' start and length are not important; the only thing that is important is the relationships between the ranges in this layer.
The ranges held by the reified LMNL layer, their annotations, the constraints on their relationships with other ranges, and their mapping on to the non-reified LMNL data model, are described below.
Last modified 11 Oct 2002 by Jeni Tennison.
For brevity, this document will refer to ranges and annotations in the form “[name] range” or “[name] annotation”. In this notation, the “name” specified is the qualified name for the range or annotation, which can be resolved to provide the name of the range or annotation.
A qualified name with no prefix indicates an expanded name whose namespace name is the empty string (in other words, it is in no namespace). A qualified name with the prefix rl indicates an expanded name whose namespace name is the reified LMNL namespace.
The reified LMNL namespace is:
http://www.lmnl.org/namespace/reified
A qualified name with the prefix syn indicates an expanded name whose namespace name is the LMNL syntax namespace. The LMNL syntax namespace is:
http://www.lmnl.org/namespace/syntax
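As an illustration of this notation, the following minimal Python sketch resolves a qualified name to an expanded name. The function name `expand` and the tuple representation of an expanded name are assumptions made for this example; only the two namespace names come from this document.

```python
# Namespace names as given in this document.
REIFIED_NS = "http://www.lmnl.org/namespace/reified"
SYNTAX_NS = "http://www.lmnl.org/namespace/syntax"

# Prefix bindings used by the notation; the empty prefix means "no namespace".
PREFIXES = {"": "", "rl": REIFIED_NS, "syn": SYNTAX_NS}


def expand(qualified_name: str) -> tuple[str, str]:
    """Resolve a qualified name such as 'rl:range' to (namespace name, local part).

    Hypothetical helper: the spec defines the mapping, not this function.
    """
    # rpartition yields ('', '', name) when there is no colon, so prefix is "".
    prefix, _, local = qualified_name.rpartition(":")
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return (PREFIXES[prefix], local)
```

For example, `expand("owner-layer")` gives an expanded name with an empty namespace name, while `expand("rl:range")` gives one in the reified LMNL namespace.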
Within the reified LMNL layer, all annotations have simple values, which are just text layers with no overlays. In other words, none of the annotations in the reified LMNL layer have structured values, though some of them do have annotations of their own.
The [rl:document]
range is a range that represents
the document as a whole. The [rl:document] range must
enclose all other ranges in the
reified LMNL layer. The
closest enclosed
ranges of the [rl:document] range can be
[rl:range], [rl:text], [syn:entity] or
[syn:comment] ranges.
The content of
the document is a sequence of characters constructed by concatenating
the values of the
[characters] annotations
of those [rl:text] ranges in the
reified LMNL layer that are not
within a [rl:annotation] range.
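The rule above can be sketched in Python as follows. This is only an illustration: the `Range` class, the `within` test based on start and length, and the use of a plain dictionary for annotation values are simplifying assumptions, not part of the data model. The sample ranges use the positions from the example at the end of this document.

```python
from dataclasses import dataclass, field


@dataclass
class Range:
    """Hypothetical stand-in for a range in the reified LMNL layer."""
    name: str                      # qualified name, e.g. "rl:text"
    start: int
    length: int
    annotations: dict = field(default_factory=dict)  # annotation name -> simple value


def within(inner: Range, outer: Range) -> bool:
    """Assumed containment test: inner lies inside outer's extent."""
    return (inner is not outer
            and outer.start <= inner.start
            and inner.start + inner.length <= outer.start + outer.length)


def document_content(ranges: list[Range]) -> str:
    """Concatenate [characters] of the [rl:text] ranges not within any [rl:annotation] range."""
    annotations = [r for r in ranges if r.name == "rl:annotation"]
    texts = [r for r in ranges
             if r.name == "rl:text" and not any(within(r, a) for a in annotations)]
    texts.sort(key=lambda r: r.start)  # document order
    return "".join(t.annotations["characters"] for t in texts)


ranges = [
    Range("rl:annotation", 6, 20),
    Range("rl:text", 19, 6, {"characters": "Friday"}),
    Range("rl:text", 33, 4, {"characters": "2002"}),
    Range("rl:text", 44, 1, {"characters": "-"}),
    Range("rl:annotation", 52, 13),
    Range("rl:text", 58, 6, {"characters": "August"}),
    Range("rl:text", 66, 2, {"characters": "08"}),
    Range("rl:text", 76, 1, {"characters": "-"}),
    Range("rl:text", 82, 2, {"characters": "23"}),
]
print(document_content(ranges))  # 2002-08-23
```

The text of the two annotation values ("Friday" and "August") is left out because those [rl:text] ranges are within [rl:annotation] ranges.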
Working out the
overlays of a document or an
annotation value involves
grouping together the contained ranges of the
[rl:document] or
[rl:annotation] range
representing the document or annotation.
The contained ranges of the
document represented by a
[rl:document] range are the
[rl:range] ranges that are not
within any
[rl:annotation] range.
The contained ranges of the value of the annotation represented by a [rl:annotation] range are the [rl:range] ranges that are within the [rl:annotation] range and not within any [rl:annotation] range that is within this [rl:annotation] range.
Partition the contained ranges into groups based on the
value of their
[owner-layer]
annotations if they have one; the layer can be identified through this
value. The group of contained ranges that don't have an
[owner-layer] annotation forms the default layer of the document or
annotation.
The ranges represented by
the [rl:range] ranges in each group form the
content of a
layer. It must be the case that the
[base] annotations on the
[owner-layer] annotations of all the [rl:range]
ranges in a particular group have equal values; this identifies the
layer's base. If the
[base] annotations have the special value "#text"
then the layer's base is the document itself (or the
value of the
annotation if you're looking at
ranges within a [rl:annotation]
range). If the [base] annotations have the special value
"#default" then the layer's base is the
default layer of the document (or
annotation value). Otherwise, it's an error if no layer in the document (or
annotation) has the identifier specified by the [base]
annotations. The default layer's base is
the document (or annotation value) itself.
The layers constructed by
the previous step can now be arranged into
bases and
overlays. The overlays of the
document (or annotation value) are the default
layer and those constructed from the contained ranges whose
[owner-layer]
annotation's [base]
annotation has the value "#text".
Similarly, the overlays of each of those layers are the layers constructed from the
contained ranges of the document whose [owner-layer] annotation's
[base] annotation has a value equal to that layer's identifier.
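The grouping and arranging steps above might be sketched as follows. This is a minimal illustration, assuming that each contained range has already been reduced to a tuple of a range identifier, its [owner-layer] value (or None) and that annotation's [base] value (or None); the function name and the return shape are hypothetical.

```python
DEFAULT = "#default"  # key used here for the default layer
TEXT = "#text"        # stands for the document (or annotation value) itself


def build_layers(contained):
    """Group contained ranges into layers and arrange the layers by base.

    contained: iterable of (range_id, owner_layer or None, base or None).
    Returns (members, bases, overlays), all plain dicts.
    """
    members = {DEFAULT: []}
    bases = {DEFAULT: TEXT}  # the default layer's base is the document itself
    for range_id, owner_layer, base in contained:
        layer = owner_layer if owner_layer is not None else DEFAULT
        members.setdefault(layer, []).append(range_id)
        if owner_layer is not None:
            # All ranges in one group must agree on the [base] value.
            if bases.setdefault(layer, base) != base:
                raise ValueError(f"conflicting [base] values for layer {layer!r}")
    for layer, base in bases.items():
        # A [base] other than "#text" must identify an existing layer.
        if base != TEXT and base not in members:
            raise ValueError(f"layer {layer!r} names unknown base {base!r}")
    overlays = {}
    for layer, base in bases.items():
        overlays.setdefault(base, []).append(layer)
    return members, bases, overlays


members, bases, overlays = build_layers([
    ("r1", None, None),
    ("r2", "layer1", "#text"),
    ("r3", "layer2", "layer1"),
    ("r4", "layer1", "#text"),
])
```

Here `overlays["#text"]` lists the overlays of the document itself (the default layer and `layer1`), and `overlays["layer1"]` lists the layers built over `layer1`.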
A [rl:range] range is a
range that represents a range within a
layer in a document. The
closest enclosing
range of a [rl:range] range must be a
[rl:document] or
[rl:value] range. The
closest enclosed
ranges of the [rl:range] range may include a single
[rl:value] range and may include any number of
[rl:annotation] ranges before or after that
[rl:value] range. A [rl:range] range can
overlap [rl:value]
ranges and other [rl:range] ranges but the overlap must only occur
within the [rl:value] range that it encloses.
A [rl:range] range
may have a [owner-layer]
annotation, which identifies the layer to which the
range represented by the
[rl:range] range belongs. The [owner-layer]
annotation must have a [base]
annotation which specifies the base of that layer. See the
descriptions of the [rl:document] and
[rl:annotation] ranges to see
how the layers are constructed.
Each [rl:range] and
[rl:annotation] range can have
a [name] annotation, which
represents an expanded name
used as the name of the
range represented by the
[rl:range] range or as the
name of the
annotation represented by the
[rl:annotation]. [rl:range] ranges representing
anonymous ranges must not
have [name] annotations; all other [rl:range] and
[rl:annotation] ranges must have a [name]
annotation.
The [name] annotation's value holds the local part of the expanded name. The [name] annotation may have its own [namespace] annotation, whose value is the namespace name of the expanded name represented by the [name] annotation. If the [name] annotation does not have a [namespace] annotation then the namespace name of the expanded name represented by the [name] annotation is the empty string.
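As a small illustration of this rule, the following Python sketch derives an expanded name from a [name] annotation. The `Annotation` class and the function name are assumptions made for the example; an expanded name is modelled as a (namespace name, local part) pair.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical stand-in for an annotation with a simple value."""
    value: str
    annotations: dict = field(default_factory=dict)  # annotation name -> Annotation


def expanded_name(name_annotation: Annotation) -> tuple[str, str]:
    """Return (namespace name, local part) for a [name] annotation."""
    namespace = name_annotation.annotations.get("namespace")
    # No [namespace] annotation means the namespace name is the empty string.
    return (namespace.value if namespace is not None else "", name_annotation.value)


plain = Annotation("date")
qualified = Annotation("date", {"namespace": Annotation("http://example.org/ns")})
```

In this sketch `expanded_name(plain)` has an empty namespace name, while `expanded_name(qualified)` carries the (made-up) namespace name `http://example.org/ns`.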
The [namespace]
annotation may have a [syn:prefix] annotation, which holds
the preferred prefix for the namespace.
Each [rl:range] range may have a [syn:id] annotation, which holds the preferred identifier for disambiguating tags.
The start and
length of the
range represented by a
[rl:range] range are derived from
the position of the [rl:range] range amongst the other ranges in
the document.
If the [rl:range] range's [owner-layer] annotation's [base] annotation has the special value "#text", or if the [rl:range] range doesn't have a [owner-layer] annotation, then the start of the range represented by the [rl:range] range is based on the [rl:text] ranges that are within the same [rl:annotation] range as the [rl:range] range (or, if the [rl:range] range is not within a [rl:annotation] range, are also not within a [rl:annotation] range) and that precede the [rl:range] range.
The start of the
range represented by the
[rl:range] range is the sum of the
number of characters in the
[characters] annotations
of these [rl:text] ranges. The
length of the range represented
by the [rl:range] range is equal to the sum of the number of
characters in the [characters] annotations of the
[rl:text] ranges that are within the [rl:range] range
and not within any [rl:annotation] ranges that are also within the
[rl:range] range.
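The calculation for ranges whose base is the text can be sketched as follows. This is an illustration under stated assumptions: the `Range` class, the positional `within` test, the reading of "same [rl:annotation] range" as the innermost enclosing one, and the treatment of a [rl:text] range as preceding when it ends at or before the start of the [rl:range] range are all choices made for this example. The sample positions come from the example at the end of this document.

```python
from dataclasses import dataclass, field


@dataclass
class Range:
    """Hypothetical stand-in for a range in the reified LMNL layer."""
    name: str
    start: int
    length: int
    annotations: dict = field(default_factory=dict)


def within(inner, outer):
    """Assumed containment test based on the ranges' extents."""
    return (inner is not outer
            and outer.start <= inner.start
            and inner.start + inner.length <= outer.start + outer.length)


def enclosing_annotation(r, ranges):
    """Innermost [rl:annotation] range that r is within, or None."""
    candidates = [a for a in ranges if a.name == "rl:annotation" and within(r, a)]
    return min(candidates, key=lambda a: a.length, default=None)


def start_and_length(target, ranges):
    """Start and length, in characters, of the range represented by target."""
    scope = enclosing_annotation(target, ranges)
    texts = [t for t in ranges
             if t.name == "rl:text" and enclosing_annotation(t, ranges) is scope]

    def size(t):
        return len(t.annotations["characters"])

    start = sum(size(t) for t in texts if t.start + t.length <= target.start)
    length = sum(size(t) for t in texts if within(t, target))
    return start, length


year = Range("rl:range", 27, 17)
month = Range("rl:range", 45, 31)
day = Range("rl:range", 77, 13)
ranges = [
    Range("rl:annotation", 6, 20),
    Range("rl:text", 19, 6, {"characters": "Friday"}),
    year,
    Range("rl:text", 33, 4, {"characters": "2002"}),
    Range("rl:text", 44, 1, {"characters": "-"}),
    month,
    Range("rl:annotation", 52, 13),
    Range("rl:text", 58, 6, {"characters": "August"}),
    Range("rl:text", 66, 2, {"characters": "08"}),
    Range("rl:text", 76, 1, {"characters": "-"}),
    day,
    Range("rl:text", 82, 2, {"characters": "23"}),
]
print(start_and_length(month, ranges))  # (5, 2)
```

With the content "2002-08-23", the month range starts at offset 5 and covers the two characters "08"; the text "August" is ignored because it is within an [rl:annotation] range inside the month range.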
If the [rl:range] range's [owner-layer] annotation's [base] annotation does not have the special value "#text", then the start of the range represented by the [rl:range] range is the number of [rl:range] ranges that are within the same [rl:annotation] range as the [rl:range] range (or, if the [rl:range] range is not within a [rl:annotation] range, are also not within a [rl:annotation] range), that precede the [rl:range] range, and whose [owner-layer] annotation value is equal to the [rl:range] range's [owner-layer] annotation's [base] annotation's value or, if the [rl:range] range doesn't have a [owner-layer] annotation, has the special value "#default".
The length of the
range represented by the [rl:range] range is equal to the number
of [rl:range] ranges whose [owner-layer] annotation
is equal to the [base] annotation's value or, if the
[rl:range] doesn't have a [owner-layer] annotation,
is equal to the special value "#default", that are
within the [rl:range]
range and not within any [rl:annotation] ranges that are also
within the [rl:range] range.
The annotations of the
range represented by a
[rl:range] range are the
annotations represented by the
closest enclosed
[rl:annotation] ranges within the [rl:range] range.
A [rl:value] range
is a range that does not represent
anything itself but is used to delimit the value of a range represented by a
[rl:range] range or an
annotation represented by a
[rl:annotation] range. The
closest enclosing
range of a [rl:value] range must be a
[rl:range] or [rl:annotation] range. The
closest enclosed
ranges of the [rl:value] range can be
[rl:text], [syn:entity],
[syn:comment] or
[rl:range] ranges, though a [rl:value] range does not
necessarily enclose any ranges. A
[rl:value] range can overlap [rl:range] ranges and
other [rl:value] ranges, but it must not overlap any other ranges.
A [rl:annotation]
range is a range that represents
an annotation on a range or on an
annotation. The closest
enclosing range of a [rl:annotation] range must be a
[rl:range] or
[rl:annotation] range. The
closest enclosed
ranges of the [rl:annotation] range may include a single
[rl:value] range and may include any number of
[rl:annotation] ranges around the [rl:value] range. A
[rl:annotation] range must not overlap any other range.
The content of
the value of the
annotation represented by the
[rl:annotation] range is a
sequence of characters. These
characters are gathered by concatenating the
values of the
[characters] annotations
of the [rl:text] ranges
within the
[rl:annotation] range that are not within another
[rl:annotation] range that is
within this one.
The method of working out the
overlays of the
value of the
annotation represented by the
[rl:annotation] range is
described above. Basically it involves grouping the
[rl:range] ranges based on their
[owner-layer]
annotation's value.
Each [rl:annotation] range must have a [name] annotation, which is described above. The [name] annotation on a [rl:annotation] range determines the name of the annotation represented by the [rl:annotation] range.
The owner of
the annotation represented by a
[rl:annotation] range is the
range or annotation represented by the
closest enclosing
range (which will be either a [rl:range] or
[rl:annotation] range).
The annotations of the
annotation represented by a
[rl:annotation] range are the
annotations represented by its
closest enclosed
[rl:annotation] ranges, in order.
A [rl:text] range is a
range that represents a sequence of
characters in a
text layer. [rl:text]
ranges must not enclose or
overlap any other range. The
closest enclosing
range of a [rl:text] range must be a
[syn:entity],
[rl:value] or
[rl:document] range.
If a [rl:text] range is
within a [rl:annotation] range then the
characters that it represents are
part of the value of the
annotation represented by the
[rl:text] range's
closest enclosing
[rl:annotation] range. If the [rl:text] range is
not within a [rl:annotation] range then the
characters it represents are part of the
content of the
document.
A [rl:text]
range has a single [characters] annotation whose
value holds the sequence of
characters represented by the
[rl:text] range.
A [syn:entity] range is a range that indicates where an entity is used in a physical document. This information is not passed through into the data model. A [syn:entity] range may enclose any number of [rl:text] or other [syn:entity] ranges but must not overlap any range. The closest enclosing range of a [syn:entity] range must be a [syn:entity], [rl:value] or [rl:document] range.
Each [syn:entity]
range must have a [name]
annotation which holds the name of
the entity represented by the
[syn:entity] range.
Note that this name is not an expanded name, and therefore the [name] annotation on the [syn:entity] range does not have a [namespace] annotation, unlike those on [rl:range] or [rl:annotation] ranges.
The ranges
within the [syn:entity] range represent the
value of the entity represented by the [syn:entity]
range.
A [syn:comment] range is a range that indicates where a comment is used in a physical document. This information is not passed through into the data model. A [syn:comment] range must not enclose or overlap any range. The closest enclosing range of a [syn:comment] range must be a [rl:value] or [rl:document] range.
Each [syn:comment]
range must have a [value]
annotation whose value
is the text of the comment.
The following example shows a possible reified LMNL layer for an XML representation of the example in the data model spec. The XML representation is:
<date day-of-week="Friday"> <year>2002</year>-<month name="August">08</month>-<day>23</day> </date>
Multiple reified LMNL layers could be generated from this XML document, differing only in the precise values of the start and length properties of the ranges that they contain.
A reified LMNL layer for this document is:
layer l1 { base: {}, content: { r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, r13, r14, r15, r16, r17, r18, r19 }, overlays: {} }
range r1 { name: { "http://www.lmnl.org/namespace/reified", "document" }, owner layer: l1, start: 0, length: 97, annotations: {} }
range r2 { name: { "http://www.lmnl.org/namespace/reified", "range" }, owner layer: l1, start: 0, length: 97, annotations: { a1, a2 } }
annotation a1 { name: { "", "owner-layer" }, owner: r2, value: "layer1", annotations: { a3 } }
annotation a3 { name: { "", "base" }, owner: a1, value: "#text", annotations: {} }
annotation a2 { name: { "", "name" }, owner: r2, value: "date", annotations: {} }
range r2 { name: { "http://www.lmnl.org/namespace/reified", "annotation" }, owner layer: l1, start: 6, length: 20, annotations: { a4 } }
annotation a4 { name: { "", "name" }, owner: r2, value: "day-of-week", annotations: {} }
range r3 { name: { "http://www.lmnl.org/namespace/reified", "value" }, owner layer: l1, start: 19, length: 6, annotations: {} }
range r4 { name: { "http://www.lmnl.org/namespace/reified", "text" }, owner layer: l1, start: 19, length: 6, annotations: { a5 } }
annotation a5 { name: { "", "characters" }, owner: r4, value: "Friday", annotations: {} }
range r5 { name: { "http://www.lmnl.org/namespace/reified", "value" }, owner layer: l1, start: 27, length: 63, annotations: {} }
range r6 { name: { "http://www.lmnl.org/namespace/reified", "range" }, owner layer: l1, start: 27, length: 17, annotations: { a6, a8 } }
annotation a6 { name: { "", "owner-layer" }, owner: r6, value: "layer1", annotations: { a7 } }
annotation a7 { name: { "", "base" }, owner: a6, value: "#text", annotations: {} }
annotation a8 { name: { "", "name" }, owner: r6, value: "year", annotations: {} }
range r7 { name: { "http://www.lmnl.org/namespace/reified", "value" }, owner layer: l1, start: 33, length: 4, annotations: {} }
range r8 { name: { "http://www.lmnl.org/namespace/reified", "text" }, owner layer: l1, start: 33, length: 4, annotations: { a9 } }
annotation a9 { name: { "", "characters" }, owner: r8, value: "2002", annotations: {} }
range r9 { name: { "http://www.lmnl.org/namespace/reified", "text" }, owner layer: l1, start: 44, length: 1, annotations: { a10 } }
annotation a10 { name: { "", "characters" }, owner: r9, value: "-", annotations: {} }
range r10 { name: { "http://www.lmnl.org/namespace/reified", "range" }, owner layer: l1, start: 45, length: 31, annotations: { a11, a12 } }
annotation a11 { name: { "", "owner-layer" }, owner: r10, value: "layer1", annotations: { a13 } }
annotation a13 { name: { "", "base" }, owner: a11, value: "#text", annotations: {} }
annotation a12 { name: { "", "name" }, owner: r10, value: "month", annotations: {} }
range r11 { name: { "http://www.lmnl.org/namespace/reified", "annotation" }, owner layer: l1, start: 52, length: 13, annotations: { a14 } }
annotation a14 { name: { "", "name" }, owner: r11, value: "name", annotations: {} }
range r12 { name: { "http://www.lmnl.org/namespace/reified", "value" }, owner layer: l1, start: 58, length: 6, annotations: {} }
range r13 { name: { "http://www.lmnl.org/namespace/reified", "text" }, owner layer: l1, start: 58, length: 6, annotations: { a15 } }
annotation a15 { name: { "", "characters" }, owner: r13, value: "August", annotations: {} }
range r14 { name: { "http://www.lmnl.org/namespace/reified", "value" }, owner layer: l1, start: 66, length: 2, annotations: {} }
range r15 { name: { "http://www.lmnl.org/namespace/reified", "text" }, owner layer: l1, start: 66, length: 2, annotations: { a16 } }
annotation a16 { name: { "", "characters" }, owner: r15, value: "08", annotations: {} }
range r16 { name: { "http://www.lmnl.org/namespace/reified", "text" }, owner layer: l1, start: 76, length: 1, annotations: { a17 } }
annotation a17 { name: { "", "characters" }, owner: r16, value: "-", annotations: {} }
range r17 { name: { "http://www.lmnl.org/namespace/reified", "range" }, owner layer: l1, start: 77, length: 13, annotations: { a18, a19 } }
annotation a18 { name: { "", "owner-layer" }, owner: r17, value: "layer1", annotations: { a20 } }
annotation a20 { name: { "", "base" }, owner: a18, value: "#text", annotations: {} }
annotation a19 { name: { "", "name" }, owner: r17, value: "day", annotations: {} }
range r18 { name: { "http://www.lmnl.org/namespace/reified", "value" }, owner layer: l1, start: 82, length: 2, annotations: {} }
range r19 { name: { "http://www.lmnl.org/namespace/reified", "text" }, owner layer: l1, start: 82, length: 2, annotations: { a21 } }
annotation a21 { name: { "", "characters" }, owner: r19, value: "23", annotations: {} }
© 2002 by the authors and LMNL.org. All rights reserved.