This tutorial introduces you to LMNL, the Layered Markup and Annotation Language. We've littered this tutorial with references to the data model, syntax and reified LMNL documents, so that when you're ready, you can move on to those more formal documents, but this should act as your introduction to LMNL.
The brief overview is that LMNL documents contain character data which is marked up using named and occasionally overlapping ranges. Ranges can have annotations, which can themselves be annotated and can have structured content. To support authoring, especially collaborative authoring, markup is namespaced and divided into layers, which might reflect different views on the text.
Last modified 11 Oct 2002 by Jeni Tennison.
It's traditional to start a tutorial with a “Hello World!” example. So to kick us off, let's look at the “Hello World!” LMNL document:
Hello World!
Yep, that's it. Not particularly hard, is it? In fact any text document is a legal LMNL document as long as it follows two rules: it doesn't contain any of the markup-significant characters ([, { or &), and it's encoded in UTF-8 or UTF-16.
The next couple of sections show you what to do if you want to
break either of these rules: how to escape the markup-significant characters
using the predefined entities, and how to use an encoding other than
UTF-8 or UTF-16 in your document.
So what happens if you want to include a [,
{ or & in your document? What if you wanted to
have the document say “Hello World, & Welcome!”? These
characters are special in LMNL because they indicate markup. If
you want to use them as just characters within some text, you have to
escape them so that a parser reading the document doesn't
misinterpret them as markup.
You can escape the [, { and
& characters using the predefined entities
&lsqb;, &lcub; and &amp;. So
the “Hello World, & Welcome!” document would look like:
Hello World, &amp; Welcome!
There are seven predefined entities in LMNL, but
&lsqb;, &lcub; and &amp; are
the only ones that you have to use in the normal run of things
(it doesn't hurt if you use the others, but there's no need to). The predefined
entities are:
&amp; (&)
&lsqb; ([)
&rsqb; (])
&lcub; ({)
&rcub; (})
&apos; (')
&quot; (")
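Escaping is mechanical enough to sketch in a few lines of Python. The helper names below are hypothetical, and the entity names assume the predefined set &amp;, &lsqb;, &rsqb;, &lcub;, &rcub;, &apos; and &quot;. Only [, { and & are escaped on the way in, while all seven names are recognised on the way out:

```python
import re

# Assumed names for LMNL's seven predefined entities.
PREDEFINED = {
    "amp": "&",
    "lsqb": "[",
    "rsqb": "]",
    "lcub": "{",
    "rcub": "}",
    "apos": "'",
    "quot": '"',
}

# Only these three characters have to be escaped in ordinary text.
MUST_ESCAPE = {"&": "&amp;", "[": "&lsqb;", "{": "&lcub;"}

ENTITY_REF = re.compile(r"&([A-Za-z]+);")


def escape_text(text: str) -> str:
    """Escape the markup-significant characters in a piece of plain text."""
    # One pass over the characters, so an '&' produced by a replacement
    # is never escaped a second time.
    return "".join(MUST_ESCAPE.get(ch, ch) for ch in text)


def unescape_predefined(text: str) -> str:
    """Replace references to predefined entities; leave any other reference alone."""
    return ENTITY_REF.sub(lambda m: PREDEFINED.get(m.group(1), m.group(0)), text)


print(escape_text("Hello World, & Welcome!"))  # Hello World, &amp; Welcome!
```

Building the output character by character avoids the usual trap of chaining `str.replace` calls, where the & introduced by replacing [ would then be escaped again.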
We'll see later that you can also define your own entities if you want to, to give shorthands for characters that are hard to type.
While we're looking at characters, we'll just quickly mention
that you can use character
references in LMNL as well, if you need to insert a Unicode
character that isn't supported by the encoding that you're using, or if it's
simply easier to type that way! Character references are the same as in XML.
For example, you can use &#x429; or &#1065;
to include a “CYRILLIC CAPITAL LETTER SHCHA” (Щ) in your text.
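Because character references follow XML, resolving them is just a matter of reading the code point in hexadecimal or decimal. A minimal sketch (the function name is ours, not part of any LMNL API):

```python
import re

# Hexadecimal (&#x429;) and decimal (&#1065;) references, as in XML.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def resolve_char_refs(text: str) -> str:
    """Replace each character reference with the character it denotes."""
    def to_char(m: "re.Match[str]") -> str:
        hex_digits, dec_digits = m.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code_point)

    return CHAR_REF.sub(to_char, text)


print(resolve_char_refs("&#x429; and &#1065;"))  # Щ and Щ
```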
If you want to use an encoding other than UTF-8 or
UTF-16, or if you simply want to state for the record that the
document is a LMNL document, then you should include a
LMNL declaration right
at the top of the file, before anything else (even spaces). A basic LMNL
declaration looks like:
[!lmnl version="0.1"]
By the way, that's a legal LMNL document as well; a LMNL document doesn't have to have any content.
The version declaration specifies the version of LMNL being used in the document. We're on version 0.1 at the moment because we haven't finished drafting LMNL yet.
If you want to specify the encoding of the document (the way in
which the characters in the document are written, as bytes, on disk) then you
can use an encoding declaration. For example, to say that our “Hello
World!” document has been saved using ISO-8859-1 (the
encoding that's used by many Western text editors) you can use:
[!lmnl version="0.1" encoding="ISO-8859-1"]
Hello World!
Note that you must specify the version of LMNL that you're using if you include a LMNL declaration; you can't just specify the encoding of the LMNL document without specifying the version.
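The rule that a declaration must come first and must carry a version can be checked on the raw bytes before decoding, since the declaration itself is plain ASCII. This is a sketch under stated assumptions: attribute values are double-quoted as in the examples above, the function name is hypothetical, and only ASCII-compatible encodings such as UTF-8 and ISO-8859-1 are handled (a UTF-16 file would need its byte order mark dealt with first):

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: [!lmnl followed by name="value" pairs, then ].
DECL = re.compile(rb'\[!lmnl((?:\s+[a-z]+="[^"]*")*)\s*\]')
ATTR = re.compile(rb'([a-z]+)="([^"]*)"')


def read_declaration(data: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Return (version, encoding) from a document's LMNL declaration.

    (None, None) means there is no declaration, so the document must be
    UTF-8 or UTF-16. An encoding of None means none was declared.
    """
    m = DECL.match(data)  # match() only looks at offset 0: nothing may precede it
    if m is None:
        return None, None
    attrs = {k.decode("ascii"): v.decode("ascii") for k, v in ATTR.findall(m.group(1))}
    if "version" not in attrs:
        raise ValueError("a LMNL declaration must give the version")
    return attrs["version"], attrs.get("encoding")


print(read_declaration(b'[!lmnl version="0.1" encoding="ISO-8859-1"]\nHello World!'))
```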
Having a text document be recognised as a LMNL document is all very well, but the point of a markup language is that you mark up things — you indicate the structure of a document by embedding tags, markers in the normal text of the document.
To illustrate mark up, we're going to need a document that's a bit more sophisticated than the simple “Hello World!” example, so we'll use this extract from http://www.zeta.org.au/~annskea/Trickstr.htm:
Trickster has never been restricted to one society. In European countries he
appears in the guise of Jester or Fool, and his roots in the human psyche are
deep. Alan Garner has collected Trickster stories from many countries in his
book The Guizer and he writes:
If we take the elements from which our emotions are built
and give them separate names such as Mother, Hero, Father,
King, Child, Queen, the element that I think marks most of
us is that of the Fool. It is where our humanity lies. For
the Fool is the advocate of uncertainty: he is at once
creator and destroyer, bringer of help and harm. He draws
a boundary for chaos, so that we can make sense of the
rest. He is the shadow that shapes the light. Psychology
calls him Trickster. I have called him Guizer.
Guizer is the proper word for an actor in a mumming play.
He is comical, grotesque, stupid, cunning, ambiguous. He
is sometimes part animal, and always part something else.
The something else is what is so special. He is the
dawning godhead in Man.
Using LMNL, we can mark up any contiguous sequence of characters
within this text as a range.
Ranges within text are indicated by tags which mark the start and end of the
range. For example, we can mark up the extract above as a
[paragraph] range and an [extract] range:
[paragraph}
Trickster has never been restricted to one society. In European countries he
appears in the guise of Jester or Fool, and his roots in the human psyche are
deep. Alan Garner has collected Trickster stories from many countries in his
book The Guizer and he writes:
{paragraph]
[extract}
If we take the elements from which our emotions are built and give them
separate names such as Mother, Hero, Father, King, Child, Queen, the element
that I think marks most of us is that of the Fool. It is where our humanity
lies. For the Fool is the advocate of uncertainty: he is at once creator and
destroyer, bringer of help and harm. He draws a boundary for chaos, so that we
can make sense of the rest. He is the shadow that shapes the light. Psychology
calls him Trickster. I have called him Guizer. Guizer is the proper word for an
actor in a mumming play. He is comical, grotesque, stupid, cunning, ambiguous.
He is sometimes part animal, and always part something else. The something else
is what is so special. He is the dawning godhead in Man.
{extract]
Usually, the start tag looks like [name} while the
end tag looks like {name]. Within a document, every start tag must
have a matching end tag and vice versa. You can also have empty tags that mark
points within the text. Empty tags look like [name]. And you can
have tags without names that indicate anonymous ranges. Empty
tags and anonymous ranges really come into their own when you start adding
annotations (as we'll see later).
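To make the three tag shapes concrete, here is a small regular-expression tokenizer. It is only a sketch with a hypothetical function name: it assumes names never contain brackets or braces, and it ignores annotations, comments and declarations entirely:

```python
import re
from typing import List, Tuple

# One alternative per tag shape; an empty name marks an anonymous range.
TAG = re.compile(
    r"\[(?P<start>[^\[\]{}]*)\}"   # [name}  start tag
    r"|\{(?P<end>[^\[\]{}]*)\]"    # {name]  end tag
    r"|\[(?P<empty>[^\[\]{}]*)\]"  # [name]  empty tag
)


def tokenize(doc: str) -> List[Tuple[str, str]]:
    """Split a document into ('text' | 'start' | 'end' | 'empty', value) tokens."""
    tokens = []
    pos = 0
    for m in TAG.finditer(doc):
        if m.start() > pos:
            tokens.append(("text", doc[pos:m.start()]))
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    if pos < len(doc):
        tokens.append(("text", doc[pos:]))
    return tokens


print(tokenize("[p}Hello{p] [pb]"))
# [('start', 'p'), ('text', 'Hello'), ('end', 'p'), ('text', ' '), ('empty', 'pb')]
```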
If you're familiar with SGML and XML, you're probably thinking
“OK, but this is exactly what SGML and XML do. What's so
different?” What's different in LMNL is that tags indicate ranges
rather than elements, and, unlike elements, ranges can overlap each other. For
example, if I wanted to mark up the section of the text that refers to and
quotes from the book “The Guizer”, I could do so despite the
fact that this range runs across the boundary between the [paragraph] and
[extract] ranges:
[paragraph}
Trickster has never been restricted to one society. In European countries he
appears in the guise of Jester or Fool, and his roots in the human psyche are
deep. [reference}Alan Garner has collected Trickster stories from many
countries in his book The Guizer and he writes:
{paragraph]
[extract}
If we take the elements from which our emotions are built and give them
separate names such as Mother, Hero, Father, King, Child, Queen, the element
that I think marks most of us is that of the Fool. It is where our humanity
lies. For the Fool is the advocate of uncertainty: he is at once creator and
destroyer, bringer of help and harm. He draws a boundary for chaos, so that we
can make sense of the rest. He is the shadow that shapes the light. Psychology
calls him Trickster. I have called him Guizer. Guizer is the proper word for an
actor in a mumming play. He is comical, grotesque, stupid, cunning, ambiguous.
He is sometimes part animal, and always part something else. The something else
is what is so special. He is the dawning godhead in Man.{reference]
{extract]
Enabling ranges to overlap is incredibly useful. It's often very hard to squeeze a document's structure into a neat tree, for example if you're including comments, marking up insertions and deletions or marking up text that has multiple structures such as the Bible (chapters and verses vs. sections and paragraphs). This isn't to say that tree structures are useless — of course they're incredibly useful, not least because they're easy to process — but they don't meet everyone's requirements.
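One way to see what overlap buys you is to test whether a set of ranges could be squeezed into a tree at all. In this sketch (function names hypothetical), each range is a (name, start, end) triple of character offsets; the offsets are made up to mirror the Trickster example, where [reference] starts inside [paragraph] and ends with [extract]:

```python
from typing import List, Tuple

Range = Tuple[str, int, int]  # (name, start offset, end offset); end is exclusive


def overlaps(a: Range, b: Range) -> bool:
    """True if a and b share characters but neither contains the other."""
    _, s1, e1 = a
    _, s2, e2 = b
    share = s1 < e2 and s2 < e1
    nested = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
    return share and not nested


def is_tree(ranges: List[Range]) -> bool:
    """True if no two ranges overlap, i.e. they could be nested XML elements."""
    return not any(
        overlaps(ranges[i], ranges[j])
        for i in range(len(ranges))
        for j in range(i + 1, len(ranges))
    )


# Made-up offsets mirroring the Trickster example.
paragraph = ("paragraph", 0, 100)
extract = ("extract", 100, 300)
reference = ("reference", 60, 300)

print(is_tree([paragraph, extract]))             # True
print(is_tree([paragraph, extract, reference]))  # False
```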
If you're particularly alert, you may have noticed that there's a potential problem with allowing overlapping ranges when two ranges with the same name overlap each other. For example:
[keyphrase}overlapping [keyphrase}ranges{keyphrase] with
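identifiers{keyphrase]
Given this piece of LMNL, we can see that there are two
[keyphrase]s, but it's not clear whether they're supposed to be nested (“ranges” inside “overlapping ranges with identifiers”) or overlapping (“overlapping ranges” and “ranges with identifiers”).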
To overcome this problem, you can assign an
identifier to a start tag, in
which case it can only match an end tag with the same identifier. For example,
to mark up the two [keyphrase]s “overlapping ranges”
and “ranges with identifiers”, you could use:
[keyphrase=key1}overlapping [keyphrase}ranges{keyphrase=key1] with
identifiers{keyphrase]
The values of the identifiers aren't important; they're not carried through into the data model, so applications can't use them for linking, for example. They also don't have to be unique within the document.
Because an identifier explicitly says which start tag an end tag matches, it's not actually necessary to have the name of the range in the end tag. You can use the shorthand:
[keyphrase=key1}overlapping [keyphrase}ranges{=key1] with
identifiers{keyphrase]
This gives the same two ranges as above.
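The matching rules for identifiers can be expressed as a stack search: an end tag closes the most recent unclosed start tag that has the same identifier and, unless the end tag leaves its name out, the same name. The sketch below uses a hypothetical function name, assumes the whole document is a single scope, and doesn't handle annotations, empty tags or layers:

```python
import re
from typing import List, Tuple

# Start tags [name} or [name=id}; end tags {name], {name=id] or {=id].
# Annotations, empty tags and layers are not handled in this sketch.
TAG = re.compile(
    r"\[(?P<s>[^\[\]{}=]*)(?:=(?P<sid>[^\[\]{}]*))?\}"
    r"|\{(?P<e>[^\[\]{}=]*)(?:=(?P<eid>[^\[\]{}]*))?\]"
)


def parse_ranges(doc: str) -> List[Tuple[str, str]]:
    """Return (name, covered text) for each range, in the order the ranges close."""
    pieces: List[str] = []   # text between tags
    length = 0               # characters of text seen so far
    open_tags = []           # stack of (name, identifier or None, start offset)
    spans = []               # (name, start, end)
    pos = 0
    for m in TAG.finditer(doc):
        pieces.append(doc[pos:m.start()])
        length += len(pieces[-1])
        pos = m.end()
        if m.group("s") is not None:  # a start tag
            open_tags.append((m.group("s"), m.group("sid"), length))
            continue
        name, ident = m.group("e"), m.group("eid")
        # Search from the most recent start tag: identifiers must agree, and
        # names must agree unless the end tag leaves its name out ({=id]).
        for i in range(len(open_tags) - 1, -1, -1):
            s_name, s_id, start = open_tags[i]
            if s_id == ident and (name == s_name or (name == "" and ident is not None)):
                del open_tags[i]
                spans.append((s_name, start, length))
                break
        else:
            raise ValueError(f"no start tag matches {m.group(0)!r}")
    pieces.append(doc[pos:])
    if open_tags:
        raise ValueError(f"unclosed start tags: {open_tags!r}")
    text = "".join(pieces)
    return [(n, text[s:e]) for n, s, e in spans]


print(parse_ranges("[keyphrase=key1}overlapping [keyphrase}ranges{=key1] with identifiers{keyphrase]"))
```

Searching from the most recent start tag is also what produces nesting when identifiers are missing, or when the same identifier is reused on ranges with the same name.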
If you don't use an identifier, or if the same identifier is used
twice on ranges with the same name in the same scope, then an application
will interpret the document as if the ranges were nested inside each other. For
example, both the following documents give the same pair of
[keyphrase] ranges — “overlapping ranges with
identifiers” and “ranges”:
[keyphrase}overlapping [keyphrase}ranges{keyphrase] with
identifiers{keyphrase]
[keyphrase=key1}overlapping [keyphrase=key1}ranges{=key1] with
identifiers{keyphrase=key1]
As you've seen, LMNL enables you to label ranges of text within a
document — give them a name. LMNL also allows you to annotate them
— add meta-information to a range. For example, we can label our document as an
extract, and add a [href] annotation that points to the page where
we got it:
[extract [href}http://www.zeta.org.au/~annskea/Trickstr.htm{href]}
[paragraph}
Trickster has never been restricted to one society. In European countries he
appears in the guise of Jester or Fool, and his roots in the human psyche are
deep. [reference}Alan Garner has collected Trickster stories from many
countries in his book The Guizer and he writes:
{paragraph]
[extract}
If we take the elements from which our emotions are built and give them
separate names such as Mother, Hero, Father, King, Child, Queen, the element
that I think marks most of us is that of the Fool. It is where our humanity
lies. For the Fool is the advocate of uncertainty: he is at once creator and
destroyer, bringer of help and harm. He draws a boundary for chaos, so that we
can make sense of the rest. He is the shadow that shapes the light. Psychology
calls him Trickster. I have called him Guizer. Guizer is the proper word for an
actor in a mumming play. He is comical, grotesque, stupid, cunning, ambiguous.
He is sometimes part animal, and always part something else. The something else
is what is so special. He is the dawning godhead in Man.{reference]
{extract]
{extract]
As you can see, an annotation can go within a
start tag, and it's delimited with start and end tags. Annotations look a lot
like ranges, but they have the important feature that, unlike ranges, they
can't overlap. Because annotations are guaranteed not to overlap, you can use
{] as a shorthand for an annotation's end tag if you want. For
example:
[extract [href}http://www.zeta.org.au/~annskea/Trickstr.htm{]}
[paragraph}
Trickster has never been restricted to one society. In European countries he
appears in the guise of Jester or Fool, and his roots in the human psyche are
deep. [reference}Alan Garner has collected Trickster stories from many
countries in his book The Guizer and he writes:
{paragraph]
[extract}
If we take the elements from which our emotions are built and give them
separate names such as Mother, Hero, Father, King, Child, Queen, the element
that I think marks most of us is that of the Fool. It is where our humanity
lies. For the Fool is the advocate of uncertainty: he is at once creator and
destroyer, bringer of help and harm. He draws a boundary for chaos, so that we
can make sense of the rest. He is the shadow that shapes the light. Psychology
calls him Trickster. I have called him Guizer. Guizer is the proper word for an
actor in a mumming play. He is comical, grotesque, stupid, cunning, ambiguous.
He is sometimes part animal, and always part something else. The something else
is what is so special. He is the dawning godhead in Man.{reference]
{extract]
{extract]
Whether or not you use the shorthand for annotation end tags is up to you; it can make it easier to type annotations, especially when they have simple values, but if you have complex annotations it can make it harder to keep track of where you are.
Annotations don't have to go in the start tag of a range; you can
also put them in an end tag if that's more appropriate. For example, it
sometimes feels more natural to put a citation or a comment
after the thing that you're citing or commenting on. In this
next example, the [reference] range has a [cite]
annotation in its end tag:
[extract [href}http://www.zeta.org.au/~annskea/Trickstr.htm{]}
[paragraph}
Trickster has never been restricted to one society. In European countries he
appears in the guise of Jester or Fool, and his roots in the human psyche are
deep. [reference}Alan Garner has collected Trickster stories from many
countries in his book The Guizer and he writes:
{paragraph]
[extract}
If we take the elements from which our emotions are built and give them
separate names such as Mother, Hero, Father, King, Child, Queen, the element
that I think marks most of us is that of the Fool. It is where our humanity
lies. For the Fool is the advocate of uncertainty: he is at once creator and
destroyer, bringer of help and harm. He draws a boundary for chaos, so that we
can make sense of the rest. He is the shadow that shapes the light. Psychology
calls him Trickster. I have called him Guizer. Guizer is the proper word for an
actor in a mumming play. He is comical, grotesque, stupid, cunning, ambiguous.
He is sometimes part animal, and always part something else. The something else
is what is so special. He is the dawning godhead in Man.
{reference [cite}Garner, A., The Guizer: A Book of Fools, London, Hamish Hamilton, 1975, p.9.{]]
{extract]
{extract]
A tag can contain as many annotations as you like, and they don't
necessarily have to have different names (unlike attributes in SGML or XML).
The following example adds a [book] range with an
[ISBN] and a number of [buy] annotations that list
URLs from which you can buy the book:
[extract [href}http://www.zeta.org.au/~annskea/Trickstr.htm{]}
[paragraph}
Trickster has never been restricted to one society. In European countries he
appears in the guise of Jester or Fool, and his roots in the human psyche are
deep. [reference}Alan Garner has collected Trickster stories from many
countries in his book
[book [ISBN}0241892228{]
[buy}http://www.allbookstores.com/book/compare/0241892228{]
[buy}http://www.abebooks.com/{]
[buy}http://www.bookfinder.com/{]}The Guizer{book] and he writes:
{paragraph]
[extract}
If we take the elements from which our emotions are built and give them
separate names such as Mother, Hero, Father, King, Child, Queen, the element
that I think marks most of us is that of the Fool. It is where our humanity
lies. For the Fool is the advocate of uncertainty: he is at once creator and
destroyer, bringer of help and harm. He draws a boundary for chaos, so that we
can make sense of the rest. He is the shadow that shapes the light. Psychology
calls him Trickster. I have called him Guizer. Guizer is the proper word for an
actor in a mumming play. He is comical, grotesque, stupid, cunning, ambiguous.
He is sometimes part animal, and always part something else. The something else
is what is so special. He is the dawning godhead in Man.
{reference [cite}Garner, A., The Guizer: A Book of Fools, London, Hamish Hamilton, 1975, p.9.{]]
{extract]
{extract]
Unlike attributes in elements, the order of annotations is preserved. Note that that doesn't necessarily mean the order matters — whether the annotations have to appear in a particular order or not depends on the markup language.
Another major difference between attributes in SGML and XML and annotations in LMNL is that annotations can themselves have structure. You can put ranges inside annotations; you can put annotations on annotations (and annotations on annotations on ranges inside annotations and so on). There is no limit. This means that the only decision you have to make when deciding whether to use an annotation or a range to hold a piece of information is whether it is “content” or “metadata”.
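In data-model terms, an annotation is a named, ordered and possibly repeated child of a range or of another annotation. Here is a minimal Python model with hypothetical class names, where annotation content is simplified to plain strings (real annotation content can itself hold ranges). It holds the [book] range from the earlier example, plus an [href] annotation that is itself annotated with a [title]:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    name: str
    content: str = ""  # simplified: real annotation content can contain ranges
    annotations: List["Annotation"] = field(default_factory=list)  # order kept


@dataclass
class Range:
    name: str
    start: int  # character offsets into the text (illustrative only)
    end: int
    annotations: List[Annotation] = field(default_factory=list)  # order kept, names may repeat


def find_all(annotations: List[Annotation], name: str) -> List[Annotation]:
    """Every annotation with the given name, at any depth, in document order."""
    found: List[Annotation] = []
    for a in annotations:
        if a.name == name:
            found.append(a)
        found.extend(find_all(a.annotations, name))
    return found


# The offsets are made up; the annotation values come from the examples.
book = Range("book", 120, 130, [
    Annotation("ISBN", "0241892228"),
    Annotation("buy", "http://www.allbookstores.com/book/compare/0241892228"),
    Annotation("buy", "http://www.abebooks.com/"),
    Annotation("buy", "http://www.bookfinder.com/"),
])
extract = Range("extract", 0, 900, [
    Annotation("href", "http://www.zeta.org.au/~annskea/Trickstr.htm",
               [Annotation("title", "Ted Hughes and Crow")]),
])

print([a.content for a in find_all(book.annotations, "buy")])
```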
In this next example, the [href] annotation has
itself been annotated with a [title] annotation, giving the title
of the referenced page and the value of the [cite] annotation has
been marked up with ranges that indicate the different parts of the citation:
[extract
[href
[title}Ted Hughes and Crow{]
}http://www.zeta.org.au/~annskea/Trickstr.htm{]}
[paragraph}
Trickster has never been restricted to one society. In European countries he
appears in the guise of Jester or Fool, and his roots in the human psyche are
deep. [reference}Alan Garner has collected Trickster stories from many
countries in his book
[book [ISBN}0241892228{]
[buy}http://www.allbookstores.com/book/compare/0241892228{]
[buy}http://www.abebooks.com/{]
[buy}http://www.bookfinder.com/{]}The Guizer{book] and he writes:
{paragraph]
[extract}
If we take the elements from which our emotions are built and give them
separate names such as Mother, Hero, Father, King, Child, Queen, the element
that I think marks most of us is that of the Fool. It is where our humanity
lies. For the Fool is the advocate of uncertainty: he is at once creator and
destroyer, bringer of help and harm. He draws a boundary for chaos, so that we
can make sense of the rest. He is the shadow that shapes the light. Psychology
calls him Trickster. I have called him Guizer. Guizer is the proper word for an
actor in a mumming play. He is comical, grotesque, stupid, cunning, ambiguous.
He is sometimes part animal, and always part something else. The something else
is what is so special. He is the dawning godhead in Man.
{reference
[cite}[author}Garner, A.{author], [title}The Guizer: A Book of Fools{title],
London, [publisher}Hamish Hamilton{publisher], [year}1975{year],
p.[page}9{page].{cite]]
{extract]
{extract]
Really, the above is all you need to know in order to use LMNL quite happily, so it might be a good idea to stop now and try to use what you've learned. The rest of the tutorial discusses some of the more esoteric aspects of LMNL.
Like any good language, LMNL has a syntax for including comments within the text. Comments don't get passed on to applications, so they're useful for commenting out bits of LMNL that you want to ignore. A comment looks like:
[!-- This is a comment --]
Comments can go pretty much anywhere within a document, including within start tags and end tags.
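Since [ always has to be escaped in ordinary text, any literal [!-- in a document must start a comment, so stripping comments can be done with a single non-greedy pattern. A sketch with a hypothetical function name, assuming comments don't nest:

```python
import re

# Non-greedy, so two comments in one document are removed separately;
# DOTALL lets a comment span several lines. Assumes comments don't nest.
COMMENT = re.compile(r"\[!--.*?--\]", re.DOTALL)


def strip_comments(doc: str) -> str:
    """Remove [!-- ... --] comments, which aren't passed on to applications."""
    return COMMENT.sub("", doc)


print(strip_comments("Hello [!-- greeting --]World!"))  # Hello World!
```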
We introduced you to the predefined entities at the beginning of
this tutorial, and hinted that you could create your own as well. Well, guess
what, you can! You can declare entities with an
entity declaration
anywhere you like in your LMNL document, including within start and end tags,
as long as it's before the first use of that entity. For example, to declare an
entity called nbsp so that you can easily insert non-breaking spaces in
your document, you could use:
[!lmnl version="0.1"]
[!entity nbsp="&#xA0;"]
Hello World!
Entities in LMNL are quite restricted compared to entities in XML, however. They're roughly the same as internal entities, but (and this is important) they can't contain markup. The point of entities in LMNL is to provide names for characters to save you from having to remember their Unicode code point or the precise sequence of keys you have to type to get them, not as a general include mechanism. We'll eventually be layering stuff on top of LMNL to provide inclusions...
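Because an entity has to be declared before its first use, references can be expanded in a single pass through the document. In this sketch the function name is hypothetical, the entity-name syntax is an assumption borrowed from XML, the predefined names are assumed to be those listed earlier, and declared values are substituted as-is (any character references inside them would be resolved in a separate pass):

```python
import re

# Assumed names for the predefined entities.
PREDEFINED = {"amp": "&", "lsqb": "[", "rsqb": "]", "lcub": "{", "rcub": "}",
              "apos": "'", "quot": '"'}

# An entity declaration or an entity reference (XML-like name syntax assumed).
# Character references (&#...;) never match because '#' can't start a name.
TOKEN = re.compile(
    r'\[!entity\s+(?P<decl>[A-Za-z][\w.-]*)\s*=\s*"(?P<value>[^"]*)"\s*\]'
    r"|&(?P<ref>[A-Za-z][\w.-]*);"
)


def expand_entities(doc: str) -> str:
    """Drop entity declarations and expand references, in document order."""
    entities = dict(PREDEFINED)
    out = []
    pos = 0
    for m in TOKEN.finditer(doc):
        out.append(doc[pos:m.start()])
        pos = m.end()
        if m.group("decl"):
            # Declarations add no text; redeclaration isn't checked here.
            entities[m.group("decl")] = m.group("value")
        elif m.group("ref") in entities:
            out.append(entities[m.group("ref")])
        else:
            raise ValueError(f"&{m.group('ref')}; is used before it is declared")
    out.append(doc[pos:])
    return "".join(out)


print(repr(expand_entities('[!entity nbsp="\u00a0"]Hello&nbsp;World!')))
```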
Anyway, back to entities. It would be a real pain if you had to
declare every single entity that you wanted to use in your document, so there's
a quick and easy way to borrow entities from other documents: an
entities
declaration. For example, if html.lmnl contained a
bunch of entity declarations (including one for &nbsp;), I
could reuse them in my document by importing them as follows:
[!lmnl version="0.1"]
[!entities href="html.lmnl"]
Hello World!
The html.lmnl document might look like:
[!lmnl version="0.1"]
This document contains declarations of the following entities for
you to use in your own documents:
[!entity nbsp = "&#xA0;"]
[entity [name}nbsp{] [char} {] }non-breaking space character{entity]
[!entity iexcl = "¡"]
[entity [name}iexcl{] [char}¡{]}inverted exclamation mark{entity]
[!entity cent = "¢"]
[entity [name}cent{] [char}¢{] }cent sign{entity]
[!entity pound = "£"]
[entity [name}pound{] [char}£{]}pound sign{entity]
...
html.lmnl is a document in its own right, with its
own content. When you use an entities declaration, you pull in the entity
declarations from that document, regardless of the content. Documents (such as
html.lmnl) that exist purely to define bunches of entities can
have content that describes the entities they declare (as in the example above)
or can have no content at all if they want.
You can't override entities by redeclaring them with a different value after they've already been declared. It's probably a bad idea to use an entity for any text that you might want to override anyway, so this shouldn't be too much of a restriction.
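Importing entities with an entities declaration amounts to collecting every [!entity] declaration in the other document, whatever its content, and merging them into your own. The function names are hypothetical, and the sketch treats redeclaring an entity with a different value as an error while allowing an identical redeclaration; both are assumptions, since the rule above only says that you can't override an entity:

```python
import re
from typing import Dict

# XML-like entity-name syntax assumed; spaces around '=' are allowed.
DECL = re.compile(r'\[!entity\s+([A-Za-z][\w.-]*)\s*=\s*"([^"]*)"\s*\]')


def merge_entities(current: Dict[str, str], imported: Dict[str, str]) -> Dict[str, str]:
    """Add imported declarations; a clash with a different value is an error."""
    merged = dict(current)
    for name, value in imported.items():
        if name in merged and merged[name] != value:
            raise ValueError(f"entity {name!r} is already declared as {merged[name]!r}")
        merged[name] = value
    return merged


def declarations_in(doc: str) -> Dict[str, str]:
    """Collect every entity declaration in a document, ignoring its content."""
    found: Dict[str, str] = {}
    for name, value in DECL.findall(doc):
        found = merge_entities(found, {name: value})
    return found


html = ('[!lmnl version="0.1"]\n'
        '[!entity nbsp = "\u00a0"]\n'
        '[entity [name}nbsp{] [char}\u00a0{] }non-breaking space character{entity]\n'
        '[!entity iexcl = "\u00a1"]\n')
print(sorted(declarations_in(html)))  # ['iexcl', 'nbsp']
```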
If you're used to XML, you're probably wondering whether LMNL uses namespaces. The answer is that it does. Namespaces are built into LMNL; when we talk about the name of a range or an annotation, we're really referring to its expanded name (a pair of a namespace and a local name), and the names that we use in tags are actually qualified names that are resolved into these expanded names.
Namespace declarations in LMNL are quite different in effect from those in XML, however.
First, namespace declarations can appear anywhere within a LMNL document, including within start tags and inside annotations, and they have a scope that extends from that point on in the document. Once you've associated a prefix with a namespace, that's it — you can't change the prefix for the namespace, nor the namespace for the prefix. This guarantees that all LMNL documents are “sane” (see http://www.flightlab.com/~joe/sgml/sanity.txt for the definition of namespace sanity).
Second, there isn't such a thing as the “default namespace” in LMNL: if a name has a prefix then it's in a namespace, if it hasn't then it's not, and this applies to both ranges and annotations.
Third, prefixes are only significant within LMNL syntax: LMNL applications usually don't have access to what prefix was used on a particular range. Importantly, this means that if you want to include qualified names in content then you have to use another method (for example an annotation) to associate prefixes with namespaces.
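Those three rules translate into a small amount of bookkeeping: a prefix binding never changes once it's made, and a name without a prefix is in no namespace. The class below is a hypothetical sketch of that resolution step, not the API of any LMNL processor; repeating an identical binding is allowed here, which is an assumption:

```python
from typing import Dict, Optional, Tuple

ExpandedName = Tuple[Optional[str], str]  # (namespace URI or None, local name)


class NamespaceBindings:
    """Prefix bindings that, once made, hold for the rest of the document."""

    def __init__(self) -> None:
        self._uri_for: Dict[str, str] = {}
        self._prefix_for: Dict[str, str] = {}

    def declare(self, prefix: str, uri: str) -> None:
        # Neither the prefix nor the namespace may be rebound to something else.
        if self._uri_for.get(prefix, uri) != uri or self._prefix_for.get(uri, prefix) != prefix:
            raise ValueError(f"cannot bind {prefix!r} to {uri!r}: conflicts with an earlier declaration")
        self._uri_for[prefix] = uri
        self._prefix_for[uri] = prefix

    def expand(self, qname: str) -> ExpandedName:
        # No default namespace: an unprefixed name is in no namespace.
        if ":" not in qname:
            return (None, qname)
        prefix, local = qname.split(":", 1)
        if prefix not in self._uri_for:
            raise ValueError(f"prefix {prefix!r} used before it was declared")
        return (self._uri_for[prefix], local)


ns = NamespaceBindings()
ns.declare("bib", "http://www.example.com/bibliographic")
ns.declare("p", "http://www.example.com/paper")
print(ns.expand("bib:reference"), ns.expand("href"))
```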
So here's our example again. This time the
[reference] range and the ranges held within its
[cite] annotation are in the namespace
http://www.example.com/bibliographic (associated with the prefix
bib) and the other ranges are in the namespace
http://www.example.com/paper (associated with the prefix
p). The annotations are all in no namespace aside from the
[ISBN] annotation, which is in the bibliographic namespace:
[!lmnl version="0.1"]
[!ns bib="http://www.example.com/bibliographic"]
[!ns p="http://www.example.com/paper"]
[p:extract
[href
[title}Ted Hughes and Crow{]
}http://www.zeta.org.au/~annskea/Trickstr.htm{]}
[p:paragraph}
Trickster has never been restricted to one society. In European countries he
appears in the guise of Jester or Fool, and his roots in the human psyche are
deep. [bib:reference}Alan Garner has collected Trickster stories from many
countries in his book
[p:book [bib:ISBN}0241892228{]
[buy}http://www.allbookstores.com/book/compare/0241892228{]
[buy}http://www.abebooks.com/{]
[buy}http://www.bookfinder.com/{]}The Guizer{p:book] and he writes:
{p:paragraph]
[p:extract}
If we take the elements from which our emotions are built and give them
separate names such as Mother, Hero, Father, King, Child, Queen, the element
that I think marks most of us is that of the Fool. It is where our humanity
lies. For the Fool is the advocate of uncertainty: he is at once creator and
destroyer, bringer of help and harm. He draws a boundary for chaos, so that we
can make sense of the rest. He is the shadow that shapes the light. Psychology
calls him Trickster. I have called him Guizer. Guizer is the proper word for an
actor in a mumming play. He is comical, grotesque, stupid, cunning, ambiguous.
He is sometimes part animal, and always part something else. The something else
is what is so special. He is the dawning godhead in Man.
{bib:reference
[cite}[bib:author}Garner, A.{bib:author], [bib:title}The Guizer: A Book of Fools{bib:title],
London, [bib:publisher}Hamish Hamilton{bib:publisher], [bib:year}1975{bib:year],
p.[bib:page}9{bib:page].{cite]]
{p:extract]
{p:extract]
And here's the same document again, this time with the namespace declaration for the bibliographic namespace in a different place:
[!lmnl version="0.1"]
[!ns p="http://www.example.com/paper"]
[p:extract
[href
[title}Ted Hughes and Crow{]
}http://www.zeta.org.au/~annskea/Trickstr.htm{]}
[p:paragraph}
Trickster has never been restricted to one society. In European countries he
appears in the guise of Jester or Fool, and his roots in the human psyche are
deep.
[!ns bib="http://www.example.com/bibliographic"]
[bib:reference}Alan Garner has collected Trickster stories from many
countries in his book
[p:book [bib:ISBN}0241892228{]
[buy}http://www.allbookstores.com/book/compare/0241892228{]
[buy}http://www.abebooks.com/{]
[buy}http://www.bookfinder.com/{]}The Guizer{p:book] and he writes:
{p:paragraph]
[p:extract}
If we take the elements from which our emotions are built and give them
separate names such as Mother, Hero, Father, King, Child, Queen, the element
that I think marks most of us is that of the Fool. It is where our humanity
lies. For the Fool is the advocate of uncertainty: he is at once creator and
destroyer, bringer of help and harm. He draws a boundary for chaos, so that we
can make sense of the rest. He is the shadow that shapes the light. Psychology
calls him Trickster. I have called him Guizer. Guizer is the proper word for an
actor in a mumming play. He is comical, grotesque, stupid, cunning, ambiguous.
He is sometimes part animal, and always part something else. The something else
is what is so special. He is the dawning godhead in Man.
{bib:reference
[cite}[bib:author}Garner, A.{bib:author], [bib:title}The Guizer: A Book of Fools{bib:title],
London, [bib:publisher}Hamish Hamilton{bib:publisher], [bib:year}1975{bib:year],
p.[bib:page}9{bib:page].{cite]]
{p:extract]
{p:extract]
Since namespace declarations are always scoped to “the whole of the rest of the document”, no matter where they appear, it's good practice to put them all at the top of the document. They're allowed to appear anywhere so that it's easy for streaming applications to serialise LMNL documents.
So far the documents that we've looked at have been what are known as flat documents. They consist purely of text and a single set of ranges over that text. In LMNL terms, they have two layers: a text layer and a layer containing ranges that range over the characters in the text layer. Diagrammatically, it might look like:
Figure: Flat LMNL
There are two kinds of extensions that we can make to flat LMNL.
First, we can add other layers that contain ranges that range over the text layer. This is especially useful because the different layers can provide different views of the same document without the ranges they contain interfering with each other. For example, another person could add their own markup to label the year, month and day differently:
Figure: LMNL with Two Layers over the Text
Second, we can add other layers that contain ranges that range
over the ranges in another layer. This is mainly used by
applications that derive structure based on the presence of ranges in a layer.
For example, an application might detect the fact that [year],
[month] and [day] ranges occur in sequence, and from
that deduce a [date] range:
Figure: LMNL with Ranges over Ranges
In the rest of this section we'll see how to represent these constructions using LMNL syntax.
If you want to create anything other than a flat document in
LMNL, you have to declare the layers that the document holds using
layer declarations. A layer
declaration specifies a name for the layer and indicates the base that the
ranges held within the layer range over. For example, to declare a layer called
type that contains ranges that range over the ranges in the layer
called lexical, you can use:
[!layer name="type" base="lexical"]
The base attribute can take two special values:
#text and #default. The value #text
means that the layer ranges over the text within the document. For example, to
say that in our document the lexical layer contains ranges
that range over the text in the document, and the type layer
contains ranges that range over the lexical layer, we can use:
[!lmnl version="0.1"]
[!layer name="lexical" base="#text"]
[!layer name="type" base="lexical"]
2002-09-12
The special #default value in the base attribute
means that the layer ranges over the default layer — a layer that
contains all the ranges that aren't explicitly associated with another layer.
The default layer is a convenience because it means you can add layers to a
flat document without having to change the tags that it contains.
Like namespace declarations, layer declarations can go practically anywhere in the document (including within tags) and have a scope that spans from that point on. The only thing that actually limits where you put them is that they have to come before the first range that belongs to that layer and before any layer declarations that refer to it as a base. As with namespace declarations, it's a good idea to put all the layer declarations at the start of the document, but they're allowed anywhere because that makes LMNL documents easier to stream.
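The ordering constraints on layer declarations can be checked as the declarations are read. This sketch (class and method names are hypothetical) records each layer's base, rejects a base or a range layer that hasn't been declared yet, and follows the chain of bases back to #text or #default. Refusing to redeclare a layer is an extra assumption, made here so that the chain of bases can never loop:

```python
from typing import Dict, List, Optional

SPECIAL_BASES = ("#text", "#default")


class Layers:
    """Layer declarations, checked in the order they are read."""

    def __init__(self) -> None:
        self.base_of: Dict[str, str] = {}

    def declare(self, name: str, base: str) -> None:
        # The base must be special or a layer that has already been declared.
        if base not in SPECIAL_BASES and base not in self.base_of:
            raise ValueError(f"layer {name!r}: base {base!r} is not declared yet")
        if name in self.base_of:  # assumption: no redeclaring
            raise ValueError(f"layer {name!r} is already declared")
        self.base_of[name] = base

    def check_range(self, layer: Optional[str]) -> None:
        """A range may only name a layer declared earlier; None means the default layer."""
        if layer is not None and layer not in self.base_of:
            raise ValueError(f"range uses layer {layer!r} before it is declared")

    def chain(self, name: str) -> List[str]:
        """Follow the bases from a layer until reaching #text or #default."""
        path = [name]
        while path[-1] not in SPECIAL_BASES:
            path.append(self.base_of[path[-1]])
        return path


layers = Layers()
layers.declare("lexical", "#text")
layers.declare("type", "lexical")
print(layers.chain("type"))  # ['type', 'lexical', '#text']
```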
How do you know which ranges belong to which layers? Well, you
have to associate ranges with the layers that they belong to using a
layer identifier. Here's an
example where the default layer contains the ranges [year],
[month] and [day], the type layer
contains the [date] range which ranges over these three ranges,
and the fr layer contains [an], [mois]
and [jour] ranges:
[!lmnl version="0.1"]
[!layer name="fr" base="#text"]
[!layer name="type" base="#default"]
[date~type
}[an~fr}[year}2002{year]{an~fr
]-[mois~fr}[month}09{month]{mois~fr
]-[jour~fr}[day}12{day]{jour~fr]{date~type]
In case you're wondering, within a start or end tag the range
identifier comes before the layer identifier, so you might have
[keyphrase=key1~jt}...{=key1~jt].
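Here's one way to pull a tag name apart into its three pieces, as a sketch. The regular expression is our own approximation: it follows the order described above (name, then = and the range identifier, then ~ and the layer identifier) but doesn't try to reproduce LMNL's actual rules for which characters a name may contain.

```python
import re
from typing import NamedTuple, Optional

# Approximate pattern (an assumption, not the LMNL grammar): optional name
# (possibly prefixed, e.g. rl:range), then "=id", then "~layer", in that order.
TAG_NAME = re.compile(
    r"^(?P<name>[\w.:-]*)(?:=(?P<id>[\w.-]+))?(?:~(?P<layer>[\w.-]+))?$"
)


class TagName(NamedTuple):
    name: str
    range_id: Optional[str]
    layer: Optional[str]  # None means the range belongs to the default layer


def parse_tag_name(text: str) -> TagName:
    """Split the text of a tag name into name, range identifier and layer."""
    match = TAG_NAME.match(text)
    if match is None:
        raise ValueError(f"not a valid tag name: {text!r}")
    return TagName(match["name"], match["id"], match["layer"])


print(parse_tag_name("keyphrase=key1~jt"))
# TagName(name='keyphrase', range_id='key1', layer='jt')
print(parse_tag_name("year"))
# TagName(name='year', range_id=None, layer=None)
```

Putting the identifiers in the wrong order, as in keyphrase~jt=key1, fails to match and raises ValueError.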
Now you've got your head around layers, it's time to look more closely at the details of how physical LMNL documents get processed. We've described the syntax of LMNL here in terms of how it gets interpreted as a basic LMNL data model. The trouble is that this way of interpreting LMNL documents leads to some unintuitive results. Take the following example:
[graphic [src}crow.gif{]}A crow{graphic]
We can interpret this piece of LMNL as a [graphic]
range that ranges over the text “A crow”. Diagrammatically, it
would look like:
![LMNL of a [graphic]
Range](crow-example.jpg)
LMNL of a [graphic]
Range
Now let's say we add a [link] range to the same
layer:
[link}[graphic [src}crow.gif{]}A crow{graphic]{link [href}crow.xml{]]
The layer now contains two ranges — a [link] range
and a [graphic] range, both of which span the same set of
characters:

LMNL with Two Ranges over the
Same Span
But here's where our problems start, because this same picture of two ranges spanning over the same set of characters could be generated from four different LMNL documents:
[link}[graphic [src}crow.gif{]}A crow{graphic]{link [href}crow.xml{]]
[link}[graphic [src}crow.gif{]}A crow{link [href}crow.xml{]]{graphic]
[graphic [src}crow.gif{]}[link}A crow{graphic]{link [href}crow.xml{]]
[graphic [src}crow.gif{]}[link}A crow{link [href}crow.xml{]]{graphic]
An author, though, is likely to mean something different by each
of these markup possibilities. In the first case, they might mean that the
graphic crow.gif (or the text “A crow” if the
graphic can't be shown) should be linked to crow.xml. In the last
case, they might mean that should the graphic not be displayable, an
application should show the text “A crow” with a link to
crow.xml around it.
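You can see why the picture is ambiguous by computing, for each of the four documents, what the basic data model keeps: each range's name plus the character offsets where it starts and ends. In this sketch we've reduced each document by hand to a list of start, end and text events, leaving out the annotations (which don't affect the spans), and we assume that no two open ranges share a name.

```python
# ("start", name), ("end", name) or ("text", characters)
Event = tuple[str, str]


def ranges(events: list[Event]) -> set[tuple[str, int, int]]:
    """Compute (name, start, end) character offsets for each range."""
    offset = 0
    open_at: dict[str, int] = {}  # name -> offset where the range started
    spans: set[tuple[str, int, int]] = set()
    for kind, value in events:
        if kind == "text":
            offset += len(value)
        elif kind == "start":
            open_at[value] = offset
        else:  # "end"
            spans.add((value, open_at.pop(value), offset))
    return spans


# The four crow documents, in the order listed above.
documents = [
    [("start", "link"), ("start", "graphic"), ("text", "A crow"), ("end", "graphic"), ("end", "link")],
    [("start", "link"), ("start", "graphic"), ("text", "A crow"), ("end", "link"), ("end", "graphic")],
    [("start", "graphic"), ("start", "link"), ("text", "A crow"), ("end", "graphic"), ("end", "link")],
    [("start", "graphic"), ("start", "link"), ("text", "A crow"), ("end", "link"), ("end", "graphic")],
]

print(sorted(ranges(documents[0])))  # [('graphic', 0, 6), ('link', 0, 6)]
print(len({frozenset(ranges(doc)) for doc in documents}))  # 1
```

All four documents collapse to the same two triples; the only thing that distinguishes them is the order of the tags at each end of the text, which is exactly what the triples throw away.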
So, what to do? There is an argument that says that if the author
intended the [link] to be over the [graphic] rather
than the text “A crow” then the [link] and the
[graphic] should be on different layers:
![LMNL with [link]
Ranging over [graphic]](two-layer-crow-example.jpg)
LMNL with [link]
Ranging over [graphic]
But layering and containment are two different things — just because a range contains another range doesn't mean that it should be on a different layer from that range — so this isn't a valid solution.
Instead, we have reified LMNL. In reified LMNL, the text layer of the document is the LMNL syntax itself:

LMNL Document used as Text
Layer
An application then builds layers on top of this text layer, pulling out the important features of the LMNL syntax. For example, it could build a syntax layer as in the following:

Syntactic Ranges over LMNL
Document Text Layer
Eventually, the application comes to what's known as the
reified LMNL
layer. The reified LMNL layer contains a set of ranges in the
reified LMNL namespace of http://www.lmnl.org/namespace/reified:
[rl:document], [rl:range],
[rl:annotation], [rl:value] and
[rl:text]. The following diagram shows the reified LMNL layer for
the document we're looking at:

Reified LMNL
Layer
There's nothing to say that the processor has to go through the intermediate syntax layer before getting to the reified LMNL layer. There can be as many or as few layers between the LMNL text and the reified LMNL layer as is useful for an application.
Don't worry if you can't make out all the lines. In this example, the reified LMNL layer forms a nice tree structure. If we were to write out the reified LMNL layer in LMNL, it would look something like:
[!lmnl version="0.1"]
[!ns rl="http://www.lmnl.org/namespace/reified"]
[rl:document
}[rl:range [name}link{]
}[rl:value
}[rl:range [name}graphic{]
}[rl:annotation [name}src{]
}[rl:value
}[rl:text [characters}crow.gif{]
]{rl:value
]{rl:annotation
][rl:value
}[rl:text [characters}A crow{]
]{rl:value
]{rl:range
]{rl:value
][rl:annotation [name}href{]
}[rl:value
}[rl:text [characters}crow.xml{]
]{rl:value
]{rl:annotation
]{rl:range
]{rl:document]
The important thing about this layer is that it contains all the
relevant information that you would normally get from the LMNL document — the
character data, the range and annotation markup and so on —
and maintains the relationships between the ranges. In this
example, if it needs to, an application can tell that the
[graphic] range is inside the [link]
range (rather than the other way round) because the reified
[rl:range] that represents the [graphic] range is
inside the reified [rl:range] that represents the
[link] range.
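As a sketch of how an application might use that nesting, here's a tiny Python model of the reified layer for the crow document. The Node class and its kind, attrs and children fields are hypothetical: each node stands for one [rl:...] range, with its name or characters annotation flattened into attrs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    kind: str  # e.g. "rl:range", "rl:value", "rl:text"
    attrs: dict[str, str] = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)


def find_path(node: Node, range_name: str) -> Optional[list[Node]]:
    """Return the nodes from `node` down to the rl:range named `range_name`."""
    if node.kind == "rl:range" and node.attrs.get("name") == range_name:
        return [node]
    for child in node.children:
        path = find_path(child, range_name)
        if path is not None:
            return [node] + path
    return None


def range_inside(root: Node, inner: str, outer: str) -> bool:
    """True if the reified range `inner` is nested within the reified range `outer`."""
    path = find_path(root, inner)
    if path is None:
        return False
    return any(n.kind == "rl:range" and n.attrs.get("name") == outer for n in path[:-1])


def text(chars: str) -> Node:
    return Node("rl:text", {"characters": chars})


def value(*kids: Node) -> Node:
    return Node("rl:value", children=list(kids))


# Mirrors the reified layer written out in LMNL above.
document = Node("rl:document", children=[
    Node("rl:range", {"name": "link"}, [
        value(Node("rl:range", {"name": "graphic"}, [
            Node("rl:annotation", {"name": "src"}, [value(text("crow.gif"))]),
            value(text("A crow")),
        ])),
        Node("rl:annotation", {"name": "href"}, [value(text("crow.xml"))]),
    ]),
])

print(range_inside(document, "graphic", "link"))  # True
print(range_inside(document, "link", "graphic"))  # False
```

For the fourth crow document, the same check would give the opposite answers, because there the reified [link] range would sit inside the reified [graphic] range.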
The reified LMNL layer also retains other occasionally useful information such as the prefix you've used for a particular namespace or the name you've used for a particular layer. These things are meaningless at the “pure” level of the LMNL data model, but people retain an attachment to meaningful prefixes and names, so they're not just thrown away.
You don't really need to know any of the details of the reified LMNL layer unless you're implementing a LMNL processor; if you're doing that, you'll have to dive into the real spec.
© 2002 by the authors and LMNL.org. All rights reserved.