Longform Markup Language

Language Intro

Longform is an easy-to-read markup and templating language that outputs HTML and XML. A Longform document can be parsed into a complete document in the output format, or into fragments that an application uses as a source of markup when generating a document or manipulating the DOM in a browser environment.

Example

header::
  hgroup::
    h1:: Longform Markup Language
    p::
      A markup and templating language for producing
      <b class=keyword>HTML</b> and <b class=keyword>
      XML</b> document fragments.

Result

<header>
  <hgroup>
    <h1>Longform Markup Language</h1>
    <p>
      A markup and templating language
      for producing <b class="keyword">
      HTML</b> and <b class="keyword">XML
      </b> document fragments.
    </p>
  </hgroup>
</header>

Unlike Markdown, which excels at marking up article content but cannot express more recent additions to the HTML spec without falling back to HTML, the Longform syntax places no constraints on the markup of the output format. As a result, Longform works well with custom HTML elements, and with elements and attributes that might be added to HTML in the future. Because it can express elements such as <nav> or <head>, it can also be used for static content markup in regions of a website beyond the main content.

Longform also supports directives which alter how a block of Longform or plain HTML / XML is processed. This specification will formalize directives which may be used in a browser environment using a minimal Longform parser.

For example, in a Longform template the directives @allow-elements and @allow-attributes can be used to allow user-defined markup in a template while filtering out any disallowed markup. User input written in the Longform syntax has the filtering rules applied by the Longform parser, while embedded HTML or XML content requires an external sanitizer, such as DOMPurify, to be configured to hook into the parser and apply the rules to the user-defined content.

@template
##card
section.card.note-#{position}::
  header::
    h3:: #{title}
  @allow-attributes:: lang dir
  @allow-elements:: details[open name] summary h4 p strong em a[href target]
  div.card-content::
    ##{content}

Finally, the work-in-progress Longform parser is small and fast. It is currently 2.6kb when minified and gzipped, and supports most of the features intended for the browser environment. The parser is likely to grow but is unlikely to come near the size of CommonMark at 47.6kb or Marked at 12kb. The parser is also fast because it builds the resulting HTML fragment strings line by line, instead of using a two-step process of constructing an abstract syntax tree and then forming the valid output markup.

Fragments

Longform's primary design goal is to output fragments of HTML into a form that can be merged with other sources and rendered into a complete document by another program, and to do so in such a way that a client side runtime can extract the fragments from rendered DOM and re-use those fragments in a client application without transporting data in another form.

To achieve this, depending on the fragment kind, a Longform parser will embed HTML ids or data attributes in a fragment and export additional meta information that can be embedded into the rendered document for the client runtime to extract the fragments from the rendered DOM.

Note that not all fragment kinds can be transported once in the rendered document without duplication. Text fragments allow the client to present messages where HTML markup is not supported. For example, if a client were to dynamically render an aria-label value, any included HTML markup would be presented or read to the user as text. Because no HTML markup wraps the content, there is no predictable and straightforward method for annotating the text fragment so that the runtime can extract it from the DOM.

However, allowing text only fragments makes Longform a suitable format for transporting all translatable text and markup in the language specified by the client. If all textual content is placed in a Longform document for a given language, and merged into non-textual content to make a complete webpage, only the Longform document will require translation to support other languages; even if the page has lots of textual content embedded in interactive content rendered by the runtime.

`The root fragment`

The root fragment is an optional fragment that has no Longform identifier. There can be only one root fragment in a Longform document.

The root element of the root fragment has no whitespace preceding it. The fragment has no Longform identifier, but the root element can have an HTML identifier defined on it.

After a root fragment is found while parsing a Longform document, all other fragments that do not have Longform identifiers assigned are ignored when rendering the output markup. The root fragment is often prefixed with the @doctype or @xml directive, which adds an HTML doctype or XML declaration to the start of the output markup.

`Example`

@doctype:: html
html::
  head::
    title:: Example Root Fragment
  body::
    h1:: Example Root Fragment

`Result`

<!doctype html>
<html>
  <head>
    <title>Example Root Fragment</title>
  </head>
  <body>
    <h1>Example Root Fragment</h1>
  </body>
</html>

`Embedded fragment identifiers`

Fragments can have their Longform identifier embedded into the output markup as an HTML or XML id. Embedded ids should be unique to both the target document and the Longform document. If a Longform fragment references a fragment with an embedded identifier, any other attempts to reference the embedded fragment will be ignored by the Longform parser. Fragments with embedded identifiers will also not be exported for direct reference by external software if they have been referenced in the Longform document.

#embedded-id
section::
  p:: A fragment that can be referenced by its <a href=#embedded-id>identifier</a>

Result

<section id="embedded-id">
  <p>
    A fragment that can be referenced by its <a href="#embedded-id">identifier</a>
  </p>
</section>

`Bare fragments`

Bare fragments have ids that can be referenced many times within the same Longform document and are always output for use by external software. The given id is written to the output markup as a data attribute instead of an id.

##alert-something-went-wrong
dialog[open].error::
  p::
    strong:: Something went wrong!
  form[method=dialog]::
    button:: Close

Result

<dialog data-lf="alert-something-went-wrong" open class="error">
  <p>
    <strong>Something went wrong!</strong>
  </p>
  <form method="dialog">
    <button>Close</button>
  </form>
</dialog>

`Range fragments`

Range fragments can have more than one top-most element, and these are output as siblings. All top-most elements of the range fragment have the same data attribute.

#head-details [
  title:: The range fragment
  meta::
    [name=description]
    [content=Demonstrating the range fragment]
]

Result

<title data-lf="head-details">The range fragment</title>
<meta data-lf="head-details" name="description" content="Demonstrating the range fragment" />
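
The way a client runtime might find bare and range fragments in rendered markup can be sketched as follows. This is a minimal illustration, not the Longform runtime: it assumes the fragments are marked with the data-lf attribute shown in the results above, and the names FragmentIndex and index_fragments are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class FragmentIndex(HTMLParser):
    """Records which elements carry each data-lf identifier (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fragments = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        ident = dict(attrs).get("data-lf")
        if ident is not None:
            self.fragments[ident].append(tag)


def index_fragments(markup: str) -> dict:
    """Map each data-lf identifier to the tag names that carry it, in order."""
    parser = FragmentIndex()
    parser.feed(markup)
    parser.close()
    return dict(parser.fragments)
```

A range fragment shows up as one identifier mapped to several sibling elements, while a bare fragment maps to a single element.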

`Text fragments`

Text fragments do not include any elements. Programs using Longform output can use text fragments in locations where elements are not allowed such as HTML attributes. Text fragments are particularly useful where Longform is being used as a master document for all translated copy for a webpage.

#aria-label "
  Create a recipe
"

Result

Create a recipe

Whitespace

Whitespace is meaningful in Longform. Any markup indented two spaces further than an element is output as a child of that element.

The exceptions are text and native markup of the output language, and the content of preformatted blocks.
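
The indentation rule, combined with the line-by-line approach described in the introduction, can be sketched as follows. This is a simplified illustration under stated assumptions, not the Longform parser: it only understands `tag::` and `tag:: text` lines, passes every other line through untouched, and the names ELEMENT and render are hypothetical.

```python
import re

# Assumed, simplified grammar: a tag name, two colons, then optional inline text.
ELEMENT = re.compile(r"^([A-Za-z][\w-]*)::(?: (.*))?$")


def render(source: str) -> str:
    """Convert a tiny Longform subset to HTML one line at a time, without an AST."""
    out = []    # output lines, appended in document order
    stack = []  # (indent, tag) for every element that is still open

    def close_to(indent: int) -> None:
        # Close every open element declared at this depth or deeper.
        while stack and stack[-1][0] >= indent:
            depth, tag = stack.pop()
            out.append(" " * depth + f"</{tag}>")

    for line in source.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        close_to(indent)
        match = ELEMENT.match(line.strip())
        if match is None:
            out.append(line)  # text or native markup passes through as written
        elif match.group(2) is not None:
            tag, text = match.groups()
            out.append(" " * indent + f"<{tag}>{text}</{tag}>")
        else:
            tag = match.group(1)
            out.append(" " * indent + f"<{tag}>")
            stack.append((indent, tag))
    close_to(0)
    return "\n".join(out)
```

The stack of open elements is the only state carried between lines, which is what lets the output be built in a single pass.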

Declaring elements

Element tags

A sole element tag can be output using the element name followed by two colons ::.

div::

Element attributes

Element attributes are declared after the tag and are wrapped in square brackets [].

div[data-foo=bar][aria-describedby=#baz]::

Alternatively, attributes can follow directly after the element tag with one level of indentation.

div::
  [data-foo=bar]
  [aria-describedby=#baz]

If an attribute is declared multiple times, its values are concatenated into a single value. This behaviour does not apply to the element's id if it is defined using the attribute syntax. Classes are concatenated with a space separating them.

meta::
  [name=description]
  [content=Lorem ipsum dolor sit amet, consectetur adipiscing elit.]
  [content=Quisque a sem et nisl mollis porttitor et sit amet neque.]
  [content=Maecenas suscipit nulla ac suscipit imperdiet. Quisque]
  [content=odio nisi, semper non dui quis, feugiat faucibus ipsum.]
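
The merging rule for repeated attributes can be sketched as follows. This is an illustration only: the function name merge_attributes is hypothetical, and joining non-class values with a single space is an assumption, since the text only states the separator for classes. The id rule follows the statement elsewhere in this document that only the first id declaration is used.

```python
def merge_attributes(pairs):
    """Merge (name, value) pairs in declaration order into one value per name."""
    merged = {}
    for name, value in pairs:
        if name not in merged:
            merged[name] = value
        elif name == "id":
            continue  # only the first id declaration is used
        else:
            # Assumption: a single space separates concatenated values.
            merged[name] = merged[name] + " " + value
    return merged
```

Called with the pairs `("class", "class-1")` and `("class", "class-2 class-3")`, the sketch produces one class value, `class-1 class-2 class-3`.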

Element output ids

An element's output id can be declared on the line before the tag, at the same indentation level, with a hash # prefixing the id.

This form of giving an element an id also gives it a meaningful Longform identifier.

#element-id
div::

Alternatively, the id can follow the tag name, before the closing colons, again with a hash prefixing it. Unlike the form where the id is declared on the preceding line, this form does not give the element a Longform identifier.

div#element-id::

Finally, the id can be declared using the attribute syntax.

div[id=element-id]::
<!-- or -->
div::
  [id=element-id]

If an element has an id declared for it twice, only the first declaration is used.

Element classes

Classes can be defined following the element's tag declaration with a period . prefixing each class.

div#element-id.class-1.class-2.class-3::

Alternatively, classes can be defined using the attribute syntax on the lines following the tag definition.

div#element-id::
  [class=class-1]
  [class=class-2 class-3]
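
The shorthand forms for ids, classes and attributes can be read from a declaration line with a couple of regular expressions. The sketch below is illustrative rather than the parser's actual grammar: the character set accepted for names is an assumption, and parse_declaration, HEAD and TOKEN are hypothetical names.

```python
import re

# Assumed shape: tag name, then any mix of #id, .class and [attribute] tokens, then ::
HEAD = re.compile(r"^([A-Za-z][\w-]*)((?:#[\w-]+|\.[\w-]+|\[[^\]]*\])*)::")
TOKEN = re.compile(r"#[\w-]+|\.[\w-]+|\[[^\]]*\]")


def parse_declaration(line: str):
    """Split an element declaration into tag, id, classes and attribute pairs."""
    match = HEAD.match(line.strip())
    if match is None:
        return None  # not an element declaration
    tag, rest = match.groups()
    element = {"tag": tag, "id": None, "classes": [], "attrs": []}
    for token in TOKEN.findall(rest):
        if token.startswith("#"):
            if element["id"] is None:  # only the first id declaration is used
                element["id"] = token[1:]
        elif token.startswith("."):
            element["classes"].append(token[1:])
        else:
            # Strip the brackets; an attribute without "=" gets an empty value.
            name, _, value = token[1:-1].partition("=")
            element["attrs"].append((name, value))
    return element
```

Because each bracketed attribute is consumed as one token, a hash inside a value such as `[aria-describedby=#baz]` is not mistaken for an id.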

Element text and native markup content

Elements can have text and native markup following the tag declaration, with a space between the double colons of the element definition and the text.

div:: Text content with <em>some</em> native markup.

Alternatively, the text and native markup can follow the tag declaration with one extra indentation level.

div::
  Text content with <em>some</em> native markup.

Chained elements

Chained elements have not been implemented by the Longform parser.

Elements can be chained to create many elements in one line.

menu::
  li::a[href=/section1]::b:: Section 1
  li::a[href=/section2]::b:: Section 2
  li::a[href=/section3]::b:: Section 3
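
Since chained elements are not yet implemented, the following is only one possible reading of the syntax, written as a hedged sketch. It assumes the text after the final double colon is the content of the innermost element and that neither text nor attribute values contain `::`. The names open_tag and expand_chain are hypothetical.

```python
import re

# Matches [name] or [name=value] attribute shorthands.
ATTR = re.compile(r"\[([^\]=]+)(?:=([^\]]*))?\]")


def open_tag(declaration: str):
    """Return the tag name and its rendered start tag for one chain link."""
    tag = re.match(r"[A-Za-z][\w-]*", declaration).group(0)
    attrs = "".join(
        f' {name}="{value}"' if value else f" {name}"
        for name, value in ATTR.findall(declaration)
    )
    return tag, f"<{tag}{attrs}>"


def expand_chain(line: str) -> str:
    """Nest each chained element inside the previous one on a single line."""
    *declarations, text = line.strip().split("::")
    opening, closing = "", ""
    for declaration in declarations:
        tag, start = open_tag(declaration)
        opening += start
        closing = f"</{tag}>" + closing  # close in reverse order of opening
    return opening + text.strip() + closing
```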

Preformatted blocks

Longform does not assign special meaning to any HTML tags, so to retain formatting, content can be wrapped in curly braces to create a preformatted block.

Escaped preformatted block

When an element is followed by a single curly brace, its content is HTML escaped and its whitespace is preserved at the children's indent level.

pre::
  code:: {
    div::
      <p>
        This content will preserve its formatting including
        the parent <code>div::</code>
      </p>
  }

Result

<pre><code>
  div::
    <p>
      This content will preserve its formatting including
      the parent <code>div::</code>
    </p>
</code></pre>
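
The two steps described for an escaped block, preserving relative whitespace and escaping the content, can be sketched with the Python standard library. This is an assumption-laden illustration: it takes "HTML escaped" to mean replacing `&`, `<` and `>` with entities, it removes the common leading indentation rather than re-indenting to a particular level, and escape_block is a hypothetical name.

```python
import html
import textwrap


def escape_block(block: str) -> str:
    """Strip the shared indentation from a block, then escape &, < and >."""
    # dedent keeps the relative indentation between lines intact.
    return html.escape(textwrap.dedent(block), quote=False)
```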

Un-escaped preformatted text

When an element is followed by two curly braces, its formatting and content are kept intact.

script:: {{
  console.log('Hello, World!');
}}

style:: {{
  div {
    color: red;
  }
}}

Result

<script>
  console.log('Hello, World!');
</script>
<style>
  div {
    color: red;
  }
</style>

Comments

Comments can be written outside of a fragment using two hyphens --. Within a fragment, comments in the output markup language's syntax can be used, but they will be written to the output alongside all other text in the native output markup.

Embedding fragments

Fragments can be embedded into other fragments as part of Longform parsing, allowing a full document to be created from many fragments. Fragments are embedded using the syntax #[fragment-id].

@root
@doctype:: html
html[lang=en]::
  body:: 
    header::
      menu::
        li::a[href=#section1]:: Section 1
        li::a[href=#section2]:: Section 2
    body::
      #[section1]
      #[section2]

#section1
section::
  ...

#section2
section::
  ...
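
Fragment embedding can be thought of as replacing each `#[fragment-id]` line with the referenced fragment's lines at the reference's indentation. The sketch below illustrates that idea only. It assumes the fragments have already been collected into a dictionary keyed by id, it does not resolve references nested inside embedded fragments, and dropping unknown ids is an assumption rather than specified behaviour. The names REFERENCE and embed are hypothetical.

```python
import re

# A line that holds nothing but an embed reference, with its indentation.
REFERENCE = re.compile(r"^(\s*)#\[([\w-]+)\]\s*$")


def embed(source: str, fragments: dict) -> str:
    """Replace each #[id] line with the fragment body, re-indented to match."""
    out = []
    for line in source.splitlines():
        match = REFERENCE.match(line)
        if match is None:
            out.append(line)
            continue
        indent, ident = match.groups()
        body = fragments.get(ident)
        if body is None:
            continue  # assumption: references to unknown ids are dropped
        out.extend(indent + body_line for body_line in body.splitlines())
    return "\n".join(out)
```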

Templating

Longform templates are intended for client side templating, where JavaScript may be creating new markup and may format local values into the Longform markup. Future versions of this spec will include pre-processing rules for generating Longform documents using server side state and logic within the document.

When a Longform document is processed, any templates are provided as strings intended to be re-processed by a separate template processing function, which knows how to expand the variables in the template.

Longform templates only accept key-value pairs as input, with the values being strings. The template is intended to be passed to the client, where it might be rendered using client side state. It is assumed that the client has its own means to perform conditional statements and iteration, and is optimized to do so, so that functionality is left to the scripting language to keep the templating logic lean.

@template
div::
  h3:: Recipe step #{position}
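
Single-hash variable expansion can be sketched as a string substitution over the template. This is a minimal illustration, not the template processor: it assumes values are HTML escaped (the double-hash form is described as the non-escaped one), that a missing key expands to an empty string, and that variable names consist of word characters and hyphens. The names VARIABLE and expand are hypothetical.

```python
import html
import re

# #{name}, but not when preceded by another hash (that is the ##{name} form).
VARIABLE = re.compile(r"(?<!#)#\{([\w-]+)\}")


def expand(template: str, values: dict) -> str:
    """Expand #{name} variables from a dict of string values."""

    def substitute(match):
        # Assumption: unknown variables expand to nothing.
        value = values.get(match.group(1), "")
        return html.escape(value)

    return VARIABLE.sub(substitute, template)
```

Looping and conditionals stay in the calling code, which calls `expand` once per item with that item's values.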

Templated markup

Templated markup has not been implemented in the Longform parser.

Arbitrary Longform and HTML strings can be inserted in a template using the double hash variable expansion form. See content sanitization for rules on sanitizing untrusted input using this form.

@template
@allow-elements:: strong em
div::
  ##{markup}

Content sanitization

Content sanitization has not been implemented in the Longform parser.

A Longform block can have sanitization rules applied to its content using the @allow-elements, @allow-attributes, @allow-data-attributes and @allow-all directives. These directives are designed to work well with the SanitizerConfig Web API, but for now a sanitizer library must be shipped with the parser to apply the directive rules to any content.

Sanitizer rules apply when expanding variables in a templating context using the double hash expansion syntax ##{var}, and in situations where @patchable is used.

Sanitizer defaults

Longform cannot sanitize raw HTML without a sanitizer library parsing the HTML input, and it cannot differentiate between text and HTML markup. If the double hash template expansion is used where no sanitizer is configured, the variable expansion SHOULD be ignored by parser implementations. Parsers MAY support an option to bypass this default document level behaviour.

A document that specifies no rules allowing elements, attributes or data attributes also SHOULD ignore all input using the double hash variable expansion. Again, a parser might allow options equivalent to the sanitizer directives to bypass this default behaviour.
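
The default described above, ignoring double-hash expansion unless a sanitizer is configured, can be sketched as follows. This is an illustration of the rule and not the parser's API: the sanitizer is modelled as any function from string to string, and the names expand_markup, sanitizer and allow_unsanitized are hypothetical.

```python
import re
from typing import Callable, Optional

MARKUP_VARIABLE = re.compile(r"##\{([\w-]+)\}")


def expand_markup(
    template: str,
    values: dict,
    sanitizer: Optional[Callable[[str], str]] = None,
    allow_unsanitized: bool = False,
) -> str:
    """Expand ##{name} variables, dropping them when no sanitizer is configured."""

    def substitute(match):
        raw = values.get(match.group(1), "")
        if sanitizer is not None:
            return sanitizer(raw)
        # Without a sanitizer the expansion is ignored unless the caller opts out.
        return raw if allow_unsanitized else ""

    return MARKUP_VARIABLE.sub(substitute, template)
```

The `allow_unsanitized` flag stands in for the optional bypass that parsers MAY support.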

Element specific sanitizer rules

An element can have rules applied that allow arbitrary markup to be added to the document. Either @allow-elements or @allow-all must be used to allow any elements to persist.

@template
#embedding-content
section::
  header::
    h2:: #{header}
  @allow-elements:: strong em a[href target]
  @allow-attributes:: class
  div::
    ##{markup}

Sanitization rules are inherited by child elements, so the following Longform would produce the same results.

@template
@allow-elements:: strong em a[href target]
@allow-attributes:: class
#embedding-content
section::
  header::
    h2:: #{header}
  div::
    ##{markup}

Global settings

Settings can be configured document wide using the @global directive at the top level of the document.

@global::
  @allow-elements:: strong em a[href target]
  @allow-attributes:: class

@template
#template1
section::
  ##{markup}

@template
#template2
section::
  ##{markup}
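
How global, inherited and local rules might combine can be sketched by layering scopes from outermost to innermost, so that the directive declared closest to the expansion wins. This is a simplified model under the assumption that a closer directive replaces, rather than extends, the same directive from an outer scope. The name effective_rules is hypothetical.

```python
def effective_rules(*scopes: dict) -> dict:
    """Resolve directive rules; scopes run from @global down to the closest element."""
    resolved = {}
    for scope in scopes:
        # A later (closer) scope overrides the same directive from an outer scope.
        resolved.update(scope)
    return resolved
```

For a template that declares only its own @allow-elements, the result keeps the global @allow-attributes and uses the template's element list.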

Directives

@url

Sets the URL of the Longform document. An HTTP GET request to the @url using the Accept header text/longform should produce the same document unless it has since been modified.

@url:: https://example.com/blog/article-1

@patchable

Asserts to the client that the document can be patched using an HTTP PATCH request with the Content-Type header text/longform. The @patchable directive should be ignored unless the @url directive is used.

@url:: http://example.com/blog/longform-1
@patchable
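
The two requests implied by @url and @patchable can be sketched with the Python standard library. The sketch only builds the request objects and does not send them. The format of the PATCH body is not defined in this document, so sending Longform source as the body is an assumption, and the names fetch_request and patch_request are hypothetical.

```python
from urllib.request import Request

LONGFORM_TYPE = "text/longform"


def fetch_request(url: str) -> Request:
    """Build a GET request asking for the Longform form of the document."""
    return Request(url, headers={"Accept": LONGFORM_TYPE}, method="GET")


def patch_request(url: str, longform_source: str) -> Request:
    """Build a PATCH request carrying Longform source (assumed body format)."""
    return Request(
        url,
        data=longform_source.encode("utf-8"),
        headers={"Content-Type": LONGFORM_TYPE},
        method="PATCH",
    )
```

Either request could then be passed to `urllib.request.urlopen` against a server that supports the media type.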

@doctype

Inserts a doctype declaration at the beginning of a fragment.

@doctype:: html
html[lang=en]::
  head::
    ...
  body::
    ...

Result

<!doctype html>
<html lang="en">
  <head>...</head>
  <body>...</body>
</html>

@xml

Inserts an XML declaration at the beginning of a fragment.

@xml:: version="1.0" encoding="UTF-8"
html::
  [xmlns=http://www.w3.org/HTML/1998/html4]
  [xmlns:xdc=http://www.xml.com/books]
  body::
    ...

Result

<?xml version="1.0" encoding="UTF-8"?>
<html
  xmlns="http://www.w3.org/HTML/1998/html4"
  xmlns:xdc="http://www.xml.com/books"
>
  <body>
    ...
  </body>
</html>

@template

Marks a fragment as a client side template. When a fragment is a template, the Longform parser skips formatting the fragment and outputs it separately from the processed fragments so that it can be passed through to the client. Client side logic can then pass the template into a special Longform template parser and have the HTML output returned.

@template
#button-text "
  Add new #{entityName}
"

@editable

Marks the children of an element as editable in a patch request. The element cannot be within a template and must have a Longform id set on it.

@editable
#edit-me
div::
  This content is editable.

@global

Applies directive rules to an entire Longform document. Directives applied before or within a fragment will typically override globally set rules.

@url:: https://example.com/pages/article-1
@patchable
@global::
  @allow-elements:: h4 p strong em b i small hr br

@allow-elements

In a template, this directive tells the parser which elements can be rendered when applying non-escaped variable expansion within its scope. If used in a patchable document, client side editors should limit which elements can be edited in the editable element. The directive's rules should also be used to sanitize or reject input when merging edits from an HTTP PATCH request into the document on the server.

Attributes can be allowed on specific elements by listing them in square brackets directly after the element in the directive's arguments.

If neither this directive nor @allow-all is used, all elements should be filtered or rejected when editing or applying variable expansion.

This directive can be applied in a @global directive block to set the default rules for a document. The directive applied closest to a template variable expansion or editable element takes precedence.

@editable
@allow-elements:: a[href target] p strong em
div::
  Edit me!
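
The argument syntax of @allow-elements, a space separated list of elements with optional bracketed attributes, can be read into a lookup table as sketched below. This is an illustration, not the parser's implementation: it assumes the bracket follows the element name with no space in between, and parse_allow_elements and ENTRY are hypothetical names.

```python
import re

# An element name, optionally followed directly by [attr attr ...].
ENTRY = re.compile(r"([A-Za-z][\w-]*)(?:\[([^\]]*)\])?")


def parse_allow_elements(arguments: str) -> dict:
    """Map each allowed element to the attributes allowed on that element."""
    return {
        element: attributes.split()
        for element, attributes in ENTRY.findall(arguments)
    }
```

A table of this shape could be translated into the configuration of whichever sanitizer library is shipped with the parser.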

@allow-attributes

In a template, this directive tells the parser which attributes can be rendered when applying non-escaped variable expansion within its scope. If used in a patchable document, client side editors can limit which attributes can be added to the markup. The directive's rules should be used to sanitize or reject input when merging edits from an HTTP PATCH request into the document on the server.

If neither this directive nor @allow-all is used, all attributes should be filtered or rejected when editing or applying variable expansion.

This directive can be applied in a @global directive block to set the default rules for a document. The directive applied closest to a template variable expansion or editable element takes precedence.

@editable
@allow-attributes:: id name class
@allow-elements:: a[href target] p strong em form label[for] input button[submit]
div::
  Edit me...

@allow-data-attributes

In a template, this directive tells the parser to allow rendering of data-attributes when applying non-escaped variable expansion within its scope. If used in a patchable document, client side editors can allow data-attributes to be added to the markup.

If neither this directive nor @allow-all is used, all data-attributes should be filtered or rejected when editing or applying variable expansion.

This directive can be applied in a @global directive block to set the default rules for a document. The directive applied closest to a template variable expansion or editable element takes precedence.

@editable
@allow-data-attributes
@allow-attributes:: id name class
@allow-elements:: a[href target] p strong em form label[for] input button[submit]
div::
  Edit me...

@allow-all

This directive tells the parser to allow all elements, attributes and data-attributes when performing non-escaped variable expansion or when editing a patchable document.

This directive can be applied in a @global directive block to set the default rules for a document. The directive applied closest to a template variable expansion or editable element takes precedence.

@editable
@allow-all
body::
  Edit me...