Longform Markup Language

A markup and templating language for producing HTML and XML document fragments.

Language Intro

Longform is an easy-to-read markup and templating language that outputs HTML and XML. A Longform document can be parsed into a complete document in the output format, or into fragments that an application uses as a source of markup when generating a document or manipulating the DOM in a browser environment.

Example

header::
  hgroup::
    h1:: Longform Markup Language
    p::
      A markup and templating language for producing
      <b class=keyword>HTML</b> and <b class=keyword>
      XML</b> document fragments.

Result

<header>
  <hgroup>
    <h1>Longform Markup Language</h1>
    <p>
      A markup and templating language
      for producing <b class="keyword">
      HTML</b> and <b class="keyword">XML
      </b> document fragments.
    </p>
  </hgroup>
</header>

Unlike Markdown, which excels at marking up article content but cannot express more recent additions to the HTML spec without falling back to HTML, the Longform syntax places no constraints on the markup that can be produced in the output format. As a result, Longform works well with custom HTML elements and with elements and attributes that might be added to HTML in the future. Because it can express elements such as <nav> or <head>, it can also be used for static content markup in regions of a website beyond the main content.

Longform also supports directives which alter how a block of Longform or plain HTML / XML is processed. This specification will formalize directives which may be used in a browser environment using a minimal Longform parser.

For example, in a Longform template the directives @allow-elements and @allow-attributes can be used to allow user-defined markup in a template while filtering out any markup that is not allowed. User input written in the Longform syntax has the filtering rules applied by the Longform parser, while embedded HTML or XML content requires an external sanitizer such as DOMPurify to be configured to hook into the parser and apply the rules to the user-defined content.

@template
##card
section.card.note-#{position}::
  header::
    h3:: #{title}
  @allow-attributes:: lang dir
  @allow-elements:: details[open name] summary h4 p strong em a[href target]
  div.card-content::
    ##{content}

Finally, the work-in-progress Longform parser is small and fast. It is currently 2.6kb when minified and gzipped, and supports most of the features intended for the browser environment. The parser is likely to grow but is unlikely to approach the size of Commonmark at 47.6kb or Marked at 12kb. It is fast because it builds the resulting HTML fragment strings line by line, instead of using a two-step process of constructing an abstract syntax tree and then forming the output markup from it.

Fragments

The root fragment

The root fragment is an optional fragment that has no Longform identifier. There can be only one root fragment in a Longform document.

The root element of the root fragment has no whitespace preceding it. The fragment has no Longform identifier, but the root element can have an HTML id defined on it.

After a root fragment is found while parsing a Longform document, all other fragments without a Longform identifier are ignored when rendering the output markup. The root fragment is often preceded by the @doctype or @xml directive, which adds an HTML doctype or XML declaration to the start of the output markup.

Example

@doctype:: html
html::
  head::
    title:: Example Root Fragment
  body::
    h1:: Example Root Fragment

Result

<!doctype html>
<html>
  <head>
    <title>Example Root Fragment</title>
  </head>
  <body>
    <h1>Example Root Fragment</h1>
  </body>
</html>

Embedded fragment identifiers

Fragments can have their Longform identifier embedded into the output markup as an HTML or XML id. Embedded ids should be unique within both the target document and the Longform document. If a Longform fragment references a fragment with an embedded identifier, any other attempts to reference the embedded fragment are ignored by the Longform parser. Fragments with embedded identifiers are also not exported for direct reference by external software if they have been referenced in the Longform document.

#embedded-id
section::
  p:: A fragment that can be referenced by its <a href=#embedded-id>identifier</a>

Result

<section id="embedded-id">
  <p>
    A fragment that can be referenced by its <a href="#embedded-id">identifier</a>
  </p>
</section>

Bare fragments

Bare fragments have ids that can be referenced many times within the same Longform document and are always outputted to be used by external software. The given id is written to the output markup as a data attribute instead of an id.

##alert-something-went-wrong
dialog[open].error::
  p::
    strong:: Something went wrong!
  form[method=dialog]::
    button:: Close

Result

<dialog data-lf="alert-something-went-wrong" open class="error">
  <p>
    <strong>Something went wrong!</strong>
  </p>
  <form method="dialog">
    <button>Close</button>
  </form>
</dialog>

Range fragments

Range fragments can have more than one top-level element; these are output as siblings. Every top-level element of the range fragment carries the same data attribute.

#head-details [
  title:: The range fragment
  meta::
    [name=description]
    [content=Demonstrating the range fragment]
]

Result

<title data-lf="head-details">The range fragment</title>
<meta data-lf="head-details" name="description" content="Demonstrating the range fragment" />
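The difference between the identifier styles can be sketched in a few lines of Python. This is an illustration only, not the parser's implementation: `opening_tags` and its `kind` argument are hypothetical names, and the `data-lf` attribute name is taken from the results shown above.

```python
from html import escape


def opening_tags(names, fragment_id, kind):
    """Return the opening tag for each top-level element of a fragment.

    kind is "embedded" (identifier written as id) or anything else,
    such as "bare" or "range" (identifier written as data-lf).
    """
    attr = "id" if kind == "embedded" else "data-lf"
    value = escape(fragment_id, quote=True)
    # A range fragment repeats the same data attribute on every sibling.
    return [f'<{name} {attr}="{value}">' for name in names]


embedded = opening_tags(["section"], "embedded-id", "embedded")
ranged = opening_tags(["title", "meta"], "head-details", "range")
```

An embedded fragment yields a single element carrying an `id`, while a range fragment yields one `data-lf` attribute per sibling.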

Text fragments

Text fragments do not include any elements. Programs using Longform output can use text fragments in locations where elements are not allowed such as HTML attributes. Text fragments are particularly useful where Longform is being used as a master document for all translated copy for a webpage.

#aria-label "
  Create a recipe
"

Result

Create a recipe

Whitespace

Whitespace is meaningful in Longform. Any markup indented two spaces further than an element is output as a child of that element.

The exceptions are native markup of the output language, text content, and the content of preformatted blocks.
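The indentation rule lends itself to a single pass over the input with a stack of open elements. The Python sketch below is a simplified illustration, not the Longform parser: `render` is a hypothetical name, it only understands bare `tag::` lines and plain text lines, and it ignores attributes, directives, and preformatted blocks.

```python
def render(source: str) -> str:
    """Convert bare `tag::` lines into nested markup, one line at a time."""
    out = []
    stack = []  # (indent, tag) pairs for elements that are still open
    for line in source.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        # Close every open element that is not an ancestor of this line.
        while stack and stack[-1][0] >= indent:
            out.append(f"</{stack.pop()[1]}>")
        text = line.strip()
        if text.endswith("::"):
            tag = text[:-2]
            out.append(f"<{tag}>")
            stack.append((indent, tag))
        else:
            # Plain text belongs to the innermost open element.
            out.append(text)
    while stack:
        out.append(f"</{stack.pop()[1]}>")
    return "".join(out)


markup = render("header::\n  hgroup::\n    h1::\n      Title\n  p::\n    Body")
```

Because each line only needs the stack of currently open elements, output can be appended as the input is read, without building a tree first.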

Declaring elements

Element tags

A sole element tag can be outputted using the element name followed by two colons ::.

div::

Element attributes

Element attributes are declared after the tag and are wrapped in square brackets [].

div[data-foo=bar][aria-describedby=#baz]::

Alternatively attributes can follow directly after the element tag with 1 level of indentation.

div::
  [data-foo=bar]
  [aria-describedby=#baz]

If an attribute is declared multiple times, its values are concatenated into a single value. This behaviour does not apply to the element's id when it is defined using the attribute syntax. Classes are concatenated with a space separating them.

meta::
  [name=description]
  [content=Lorem ipsum dolor sit amet, consectetur adipiscing elit.]
  [content=Quisque a sem et nisl mollis porttitor et sit amet neque.]
  [content=Maecenas suscipit nulla ac suscipit imperdiet. Quisque]
  [content=odio nisi, semper non dui quis, feugiat faucibus ipsum.]

Element output ids

The element's output markup id can be declared on the line before the tag, at the same indentation level, with a hash symbol (#) prefixing the id.

This form of giving an element an id also gives it a meaningful Longform identifier.

#element-id
div::

Alternatively, the id can follow the tag name, before the closing colons, again with a hash prefixing it. Unlike the form where the id is declared on the preceding line, this form does not give the element a Longform identifier.

div#element-id::

And finally the id can be declared using the attribute syntax.

div[id=element-id]::
<!-- or -->
div::
  [id=element-id]

If an element has an id declared more than once, only the first declaration is used.

Element classes

Classes can be defined following the element's tag declaration, with a period (.) prefixing each class.

div#element-id.class-1.class-2.class-3::

Alternatively classes can be defined using the attribute syntax on the lines following the tag definition.

div#element-id::
  [class=class-1]
  [class=class-2 class-3]
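The single-line declaration forms for ids, classes, and attributes can be pulled apart with a small tokenizer. The Python sketch below is an assumption-laden illustration rather than the parser's grammar: `parse_declaration` and the two regular expressions are hypothetical, and only the "first id wins" and "classes joined by a space" rules come from the text above.

```python
import re

# Hypothetical grammar: a tag name followed by any mix of #id, .class and [attr].
DECLARATION = re.compile(
    r"^(?P<tag>[A-Za-z][\w-]*)"
    r"(?P<rest>(?:#[\w-]+|\.[\w-]+|\[[^\]]*\])*)::$"
)
TOKEN = re.compile(r"#[\w-]+|\.[\w-]+|\[[^\]]*\]")


def parse_declaration(line):
    match = DECLARATION.match(line.strip())
    if match is None:
        raise ValueError(f"not an element declaration: {line!r}")
    element_id, classes, attrs = None, [], {}
    for token in TOKEN.findall(match["rest"]):
        if token[0] == "#":
            element_id = element_id or token[1:]  # only the first id is used
        elif token[0] == ".":
            classes.append(token[1:])
        else:
            name, _, value = token[1:-1].partition("=")
            if name == "id":
                element_id = element_id or value
            elif name == "class":
                classes.extend(value.split())
            else:
                attrs[name] = value
    return {
        "tag": match["tag"],
        "id": element_id,
        "classes": " ".join(classes),  # classes are joined with a space
        "attrs": attrs,
    }


parsed = parse_declaration("div#element-id.class-1[class=class-2 class-3]::")
```

Attributes without a value, such as `[open]`, are stored with an empty string in this sketch.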

Element text and native markup content

Elements can have text and native markup following the tag declaration with a space between the text and the double colons of the element definition.

div:: Text content with <em>some</em> native markup.

Alternatively, the text and native markup can follow the tag declaration with one extra indentation level.

div::
  Text content with <em>some</em> native markup.

Chained elements

Elements can be chained to create many elements in one line.

menu::
  li::a[href=/section1]::b:: Section 1
  li::a[href=/section2]::b:: Section 2
  li::a[href=/section3]::b:: Section 3
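One way to read a chained line is to peel declarations off the front until the remaining text no longer starts with one. The Python sketch below assumes that `::` never appears inside a tag name and that trailing text is separated from the last `::` by a space; `expand_chain` and `LINK` are hypothetical names.

```python
import re

# A declaration: a tag name plus any #id, .class or [attr] parts, ending in "::".
LINK = re.compile(r"([A-Za-z](?:[\w#.-]|\[[^\]]*\])*)::")


def expand_chain(line):
    """Split a chained line into its element declarations and trailing text."""
    rest = line.strip()
    chain = []
    while (match := LINK.match(rest)):
        chain.append(match.group(1))
        rest = rest[match.end():]
    return chain, rest.strip()


chain, text = expand_chain("  li::a[href=/section1]::b:: Section 1")
```

Each entry in `chain` becomes the parent of the next, and `text` becomes the content of the innermost element.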

Preformatted blocks

Longform does not assign special meaning to any HTML tags, so to retain formatting, content can be wrapped in curly braces to create a preformatted block.

Escaped preformatted block

When an element is followed by a single curly brace, its content is HTML escaped and its whitespace is preserved at the children's indent level.

pre::
  code:: {
    div::
      <p>
        This content will preserve its formatting including
        the parent <code>div::</code>
      </p>
  }

Result

<pre><code>
  div::
    &lt;p&gt;
      This content will preserve its formatting including
      the parent &lt;code&gt;div::&lt;/code&gt;
    &lt;/p&gt;
</code></pre>

Un-escaped preformatted text

When an element is followed by two curly braces, its formatting and content are kept intact.

script:: {{
  console.log('Hello, World!');
}}

style:: {{
  div {
    color: red;
  }
}}

Result

<script>
  console.log('Hello, World!');
</script>
<style>
  div {
    color: red;
  }
</style>
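The two block styles differ only in whether the content is HTML escaped. A minimal Python sketch of that distinction follows; `preformatted` is a hypothetical helper, and removing the common leading indentation with `textwrap.dedent` is an assumption about how "preserved at the children's indent level" could be approximated.

```python
import textwrap
from html import escape


def preformatted(block: str, escaped: bool) -> str:
    """Return the body of a { ... } (escaped) or {{ ... }} (raw) block."""
    # Drop the shared indentation but keep relative indentation intact.
    body = textwrap.dedent(block).strip("\n")
    return escape(body, quote=False) if escaped else body


escaped_body = preformatted("    <p>\n      hi & bye\n    </p>\n", escaped=True)
raw_body = preformatted("  console.log('Hello, World!');\n", escaped=False)
```

With `escaped=True` the markup characters become entities, which is what allows Longform or HTML source to be shown inside a `<pre><code>` element.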

Comments

Comments can be written outside of a fragment using two hyphens (--). Within a fragment, comments in the output markup language's syntax can be used, but they are written to the output along with all other native markup.

Embedding fragments

Fragments can be embedded into other fragments as part of the Longform parsing allowing a full document to be created from many fragments. Fragments are embedded using the syntax #[fragment-id].

@root
@doctype:: html
html[lang=en]::
  body:: 
    header::
      menu::
        li::a[href=#section1]:: Section 1
        li::a[href=#section2]:: Section 2
    main::
      #[section1]
      #[section2]

#section1
section::
  ...

#section2
section::
  ...
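Resolving `#[fragment-id]` references can be thought of as a recursive substitution that re-indents the referenced fragment to the depth of the reference. The Python sketch below illustrates the idea under simplifying assumptions: `embed` is a hypothetical function, fragments are held as lists of lines in a dictionary, and the single-reference rule for embedded identifiers is not enforced.

```python
import re

REFERENCE = re.compile(r"^(?P<indent>\s*)#\[(?P<id>[\w-]+)\]\s*$")


def embed(lines, fragments):
    """Replace each `#[id]` line with the lines of the referenced fragment."""
    out = []
    for line in lines:
        match = REFERENCE.match(line)
        if match is None:
            out.append(line)
            continue
        # Nested references are resolved first, then indented to match
        # the position of the reference in the parent fragment.
        for inner in embed(fragments[match["id"]], fragments):
            out.append(match["indent"] + inner)
    return out


fragments = {"section1": ["section::", "  p:: One"]}
document = embed(["body::", "  #[section1]"], fragments)
```

An unknown identifier raises `KeyError` here, and circular references would recurse without end; a real parser would need to decide how to handle both.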

Templating

Longform templates accept only key-value pairs as input, with string values. A template is intended to be passed to the client, where it might be rendered using client-side state. It is assumed that the client has its own, optimized means of evaluating conditionals and iterating, so that functionality is left to the scripting language to keep the templating logic lean.

@template
div::
  h3:: Recipe step #{position}
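Single hash expansion amounts to string substitution over key-value pairs. The Python sketch below is illustrative only: `expand` is a hypothetical function, HTML escaping of the values is an assumption (the specification only calls the double hash form "non-escaped"), and leaving unknown variables untouched is likewise a choice made for this example.

```python
import re
from html import escape

# Matches #{name} but not the double hash form ##{name}.
VARIABLE = re.compile(r"(?<!#)#\{(\w+)\}")


def expand(template: str, values: dict) -> str:
    def replace(match):
        value = values.get(match.group(1))
        if value is None:
            return match.group(0)  # unknown variable: leave the placeholder
        return escape(value)       # assumption: single hash values are escaped
    return VARIABLE.sub(replace, template)


line = expand("h3:: Recipe step #{position}", {"position": "3"})
```

Iteration and conditionals stay in the calling code, which would call `expand` once per item with a different dictionary.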

Templated markup

Arbitrary Longform and HTML strings can be inserted into a template using the double hash variable expansion form, ##{var}. See Content sanitization for the rules on sanitizing untrusted input inserted this way.

@template
@allow-elements:: strong em
div::
  ##{markup}
  

Content sanitization

A Longform block can have sanitization rules applied to its content using the @allow-elements, @allow-attributes, @allow-data-attributes, and @allow-all directives. These directives are designed to work well with the SanitizerConfig Web API, but for now a sanitizer library must be shipped with the parser to apply the directive rules to any content.

Sanitizer rules apply when expanding variables in a templating context using the double hash expansion syntax ##{var} and in situations where @patchable is used.

Sanitizer defaults

Longform cannot sanitize raw HTML without a sanitizer library to parse the HTML input, and it cannot differentiate between text and HTML markup. If the double hash template expansion is used where no sanitizer is configured, parser implementations SHOULD ignore the variable expansion. Parsers MAY support an option to bypass this document-level default.

A document that specifies no rules allowing elements, attributes, or data attributes also SHOULD ignore all input using the double hash variable expansion. Again, a parser might offer options equivalent to the sanitizer directives to bypass this default behaviour.
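These defaults reduce to two guard conditions in front of the sanitizer call. The Python sketch below shows that shape only: `expand_markup`, the `sanitizer` callable, and the `rules` dictionary are hypothetical stand-ins, not the API of any real sanitizer library.

```python
def expand_markup(value, sanitizer=None, rules=None):
    """Expand a ##{var} value, or drop it when it cannot be sanitized.

    sanitizer: a callable taking (value, rules); hypothetical interface.
    rules: the allow rules in scope; empty or None means nothing is allowed.
    """
    if sanitizer is None:
        return ""  # no sanitizer configured: the expansion is ignored
    if not rules:
        return ""  # no allow rules in scope: the expansion is ignored
    return sanitizer(value, rules)


def passthrough(value, rules):
    # Stub used only to make the example runnable; it performs no filtering.
    return value


kept = expand_markup("<strong>Hi</strong>", passthrough, {"elements": ["strong"]})
dropped = expand_markup("<strong>Hi</strong>")
```

A parser option that bypasses the defaults would correspond to skipping one or both guards.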

Element specific sanitizer rules

An element can have rules applied that allow arbitrary markup to be added to the document. Either @allow-elements or @allow-all must be used for any elements to persist.

@template
#embedding-content
section::
  header::
    h2:: #{header}
  @allow-elements:: strong em a[href target]
  @allow-attributes:: class
  div::
    ##{markup}
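The arguments of @allow-elements map naturally onto a dictionary from element name to the attributes allowed on that element. The Python sketch below shows one way to read them; `parse_allow_elements` is a hypothetical helper and the resulting shape is an assumption, not the configuration format of any particular sanitizer.

```python
import re

# An element name, optionally followed by [attr attr ...].
ENTRY = re.compile(r"([\w-]+)(?:\[([^\]]*)\])?")


def parse_allow_elements(arguments: str) -> dict:
    """Map each allowed element to the list of attributes allowed on it."""
    return {name: attrs.split() for name, attrs in ENTRY.findall(arguments)}


allowed = parse_allow_elements("strong em a[href target]")
# allowed == {"strong": [], "em": [], "a": ["href", "target"]}
```

Attributes listed by @allow-attributes would then be merged in for every element, since that directive is not tied to a specific element.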

Sanitization rules are inherited by child elements, so the following Longform produces the same result.

@template
@allow-elements:: strong em a[href target]
@allow-attributes:: class
#embedding-content
section::
  header::
    h2:: #{header}
  div::
    ##{markup}

Global settings

Settings can be configured document wide using the @global directive from the top level of the document.

@global::
  @allow-elements:: strong em a[href target]
  @allow-attributes:: class
  
@template
#template1
section::
  ##{markup}

@template
#template2
section::
  ##{markup}
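How global settings combine with rules declared closer to an expansion can be sketched as layered dictionaries. The Python example below assumes that a closer directive replaces the global directive of the same name rather than merging with it; `effective_rules` is a hypothetical name and the specification only says closer directives "typically override" global ones.

```python
def effective_rules(global_rules: dict, scopes: list) -> dict:
    """Combine @global rules with rules from enclosing scopes.

    scopes is ordered from outermost to innermost, so the directive
    closest to the expansion is applied last and takes precedence.
    """
    rules = dict(global_rules)
    for scope in scopes:
        rules.update(scope)
    return rules


document_rules = {"allow-elements": "strong em a[href target]", "allow-attributes": "class"}
in_scope = effective_rules(document_rules, [{"allow-elements": "p"}])
```

In this sketch both templates in the example above resolve to the global rules, because neither declares directives of its own.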

Directives

@url

Sets the URL of the Longform document. An HTTP GET request to the @url with the Accept header set to text/longform should produce the same document, unless it has since been modified.

@url:: https://example.com/blog/article-1

@patchable

Asserts to the client that the document can be patched using an HTTP PATCH request with the Content-Type header set to text/longform. The @patchable directive should be ignored unless the @url directive is also used.
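The two requests implied by @url and @patchable can be sketched with Python's standard library. The example only constructs the request objects and sends nothing; `fetch_request` and `patch_request` are hypothetical helpers, and sending Longform text as the PATCH body is an assumption, since the payload format is not defined here.

```python
from urllib.request import Request


def fetch_request(url: str) -> Request:
    """Build a GET request asking for the Longform source of a document."""
    return Request(url, headers={"Accept": "text/longform"}, method="GET")


def patch_request(url: str, longform: str) -> Request:
    """Build a PATCH request carrying edited Longform (assumed body format)."""
    return Request(
        url,
        data=longform.encode("utf-8"),
        headers={"Content-Type": "text/longform"},
        method="PATCH",
    )


get = fetch_request("https://example.com/blog/article-1")
patch = patch_request("https://example.com/blog/article-1", "#edit-me\ndiv::\n  Updated.")
```

Passing either object to `urllib.request.urlopen` would perform the request against a server that supports these media types.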

@url:: http://example.com/blog/longform-1
@patchable

@doctype

Inserts a doctype declaration at the beginning of a fragment.

@doctype:: html
html[lang=en]::
  head::
    ...
  body::
    ...

Result

<!doctype html>
<html lang="en">
  <head>...</head>
  <body>...</body>
</html>

@xml

Inserts an XML declaration at the beginning of a fragment.

@xml:: version="1.0" encoding="UTF-8"
html::
  [xmlns=http://www.w3.org/HTML/1998/html4]
  [xmlns:xdc=http://www.xml.com/books]
  body::
    ...

Result

<?xml version="1.0" encoding="UTF-8"?>
<html
  xmlns="http://www.w3.org/HTML/1998/html4"
  xmlns:xdc="http://www.xml.com/books"
>
  <body>
    ...
  </body>
</html>

@template

Marks a fragment as a client-side template. When a fragment is a template, the Longform parser skips formatting it and outputs it separately from the processed fragments so that it can be passed through to the client. Client-side logic can then pass the template to a dedicated Longform template parser, which returns the HTML output.

@template
#button-text "
  Add new #{entityName}
"

@editable

Marks the children of an element as editable in a patch request. The element cannot be within a template and must have a Longform id set on it.

@editable
#edit-me
div::
  This content is editable.

@global

Applies directive rules to an entire Longform document. Directives applied before or within a fragment will typically override globally set rules.

@url:: https://example.com/pages/article-1
@patchable
@global::
  @allow-elements:: h4 p strong em b i small hr br

@allow-elements

In a template, this directive tells the parser which elements can be rendered when applying non-escaped variable expansion within its scope. If used in a patchable document, client-side editors should limit which elements can be edited in the editable element. The directive's rules should also be used to sanitize or reject input when merging edits from an HTTP PATCH request into the document on the server.

Attributes can be allowed on specific elements by listing them in square brackets directly after the element in the directive's arguments.

If neither this directive nor @allow-all is used, all elements should be filtered or rejected when editing or applying variable expansion.

This directive can be applied in a @global directive block to set the default rules for a document. The directive applied closest to a template variable expansion or editable element takes precedence.

@editable
@allow-elements:: a[href target] p strong em
div::
  Edit me!

@allow-attributes

In a template, this directive tells the parser which attributes can be rendered when applying non-escaped variable expansion within its scope. If used in a patchable document, client-side editors can limit which attributes can be added to the markup. The directive's rules should be used to sanitize or reject input when merging edits from an HTTP PATCH request into the document on the server.

If neither this directive nor @allow-all is used, all attributes should be filtered or rejected when editing or applying variable expansion.

This directive can be applied in a @global directive block to set the default rules for a document. The directive applied closest to a template variable expansion or editable element takes precedence.

@editable
@allow-attributes:: id name class
@allow-elements:: a[href target] p strong em form label[for] input button[type]
div::
  Edit me...

@allow-data-attributes

In a template, this directive tells the parser to allow rendering of data-attributes when applying non-escaped variable expansion within its scope. If used in a patchable document, client-side editors can allow data-attributes to be added to the markup.

If neither this directive nor @allow-all is used, all data-attributes should be filtered or rejected when editing or applying variable expansion.

This directive can be applied in a @global directive block to set the default rules for a document. The directive applied closest to a template variable expansion or editable element takes precedence.

@editable
@allow-data-attributes
@allow-attributes:: id name class
@allow-elements:: a[href target] p strong em form label[for] input button[type]
div::
  Edit me...

@allow-all

This directive instructs the parser to allow all elements, attributes and data-attributes when performing non-escaped variable expansion or when editing a patchable document.

This directive can be applied in a @global directive block to set the default rules for a document. The directive applied closest to a template variable expansion or editable element takes precedence.

@editable
@allow-all
body::
  Edit me...