lux is a text-processing tool, like sed. Its job is to take a line of input, transform it according to a rule we specify, and write the result to the output. A rule has two parts: an input template and an output template.
For example, this expression parses time.
The snippet's layout is as follows: at the top is the control panel, with play and reset buttons and a menu. Directly below it is the input template area.
Below that are the input area and the pre-rendered output.
These snippets are interactive. Change the time in the input and press the ▶️ button.
The structure of the JSON output will be discussed later.
Literals
The simplest parser is the literal parser. To make a literal, surround the desired string with a pair of single or double quotes.
Change the input to something else and see how the program rejects it.
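The accept/reject behavior can be made concrete with a minimal Python model of a literal parser. This is a sketch, not lux's implementation. The names `literal` and `accepts` are hypothetical, and the model assumes a line is accepted only when the parser consumes all of it.

```python
from typing import Callable, Optional

# A parser takes the whole line and a start position. It returns the position
# just past what it matched, or None if it rejects the input there.
Parser = Callable[[str, int], Optional[int]]


def literal(s: str) -> Parser:
    """Model of a quoted literal such as "hello": match s exactly."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def accepts(p: Parser, line: str) -> bool:
    """Assumption: a line is accepted only if p consumes all of it."""
    return p(line, 0) == len(line)


hello = literal("hello")
print(accepts(hello, "hello"))   # True
print(accepts(hello, "help"))    # False
print(accepts(hello, "hello!"))  # False: "!" is left unconsumed
```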
Built-ins
To parse data not known in advance, we use built-in primitives. There are three built-ins: any, alpha, and digit. To use a built-in parser, simply write its name.
The any parser matches a single UTF-8 character.
The alpha and digit parsers match a single letter and a single digit, respectively. In this snippet, replace any with alpha or digit and inspect the output.
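The built-ins can be modeled the same way, as single-character parsers. In this hypothetical Python sketch, `any_` has a trailing underscore so that it does not shadow Python's `any`. The sketch makes three assumptions that lux may not share: a UTF-8 character corresponds to one Unicode code point, `alpha` accepts any Unicode letter (via `str.isalpha`), and `digit` accepts only ASCII 0 to 9.

```python
from typing import Callable, Optional

Parser = Callable[[str, int], Optional[int]]


def char_if(pred: Callable[[str], bool]) -> Parser:
    """Match exactly one character that satisfies pred."""
    def parse(text: str, pos: int) -> Optional[int]:
        if pos < len(text) and pred(text[pos]):
            return pos + 1
        return None
    return parse


# Python strings are indexed by code point; we assume that is what lux
# means by a single UTF-8 character.
any_ = char_if(lambda c: True)
alpha = char_if(str.isalpha)                # assumed: any Unicode letter
digit = char_if(lambda c: "0" <= c <= "9")  # assumed: ASCII digits only


def accepts(p: Parser, line: str) -> bool:
    """Assumption: the parser must consume the whole line."""
    return p(line, 0) == len(line)


for line in ["x", "7", "?", "xy"]:
    print(repr(line), accepts(any_, line), accepts(alpha, line), accepts(digit, line))
```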
Sequencing
With , (comma), we can sequence parsers. The compound parser succeeds only if both parsers succeed.
Modifiers
In the previous example, we fixed the number of digits in hours and minutes to two. But what if we want to allow a variable number of something? For such cases, we have modifiers. There are four of them in total.
The + (plus) modifier runs the parser one or more times.
The * (star) modifier runs the parser zero or more times.
The ? (question mark) modifier runs the parser once or not at all.
The final modifier is repeat, and it looks like this: {n-m}. It runs the modified parser n to m times.
For the common case of {n-n}, a shorthand exists: {n}.
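Sequencing and the four modifiers compose naturally as combinators. The Python sketch below models them under assumptions and is not lux's implementation. The names `seq`, `repeat`, `plus`, `star`, `optional`, and `exactly` are hypothetical, and repetition is assumed to be greedy with no backtracking (PEG style).

```python
from typing import Callable, Optional

Parser = Callable[[str, int], Optional[int]]


def literal(s: str) -> Parser:
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None


def digit(text: str, pos: int) -> Optional[int]:
    return pos + 1 if pos < len(text) and "0" <= text[pos] <= "9" else None


def seq(*ps: Parser) -> Parser:
    """p1, p2, ...: run the parsers one after another; fail if any fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in ps:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def repeat(p: Parser, n: int, m: Optional[int] = None) -> Parser:
    """p{n-m}: run p greedily at least n and at most m times (m=None: no limit)."""
    def parse(text: str, pos: int) -> Optional[int]:
        count = 0
        while m is None or count < m:
            nxt = p(text, pos)
            # Stop on failure, or on an empty match that would repeat forever.
            if nxt is None or nxt == pos:
                break
            pos, count = nxt, count + 1
        return pos if count >= n else None
    return parse


def plus(p: Parser) -> Parser:      # p+
    return repeat(p, 1)


def star(p: Parser) -> Parser:      # p*
    return repeat(p, 0)


def optional(p: Parser) -> Parser:  # p?
    return repeat(p, 0, 1)


def exactly(p: Parser, n: int) -> Parser:  # p{n}, shorthand for p{n-n}
    return repeat(p, n, n)


def accepts(p: Parser, line: str) -> bool:
    return p(line, 0) == len(line)


time = seq(exactly(digit, 2), literal(":"), exactly(digit, 2))
offset = seq(optional(literal("-")), time)  # e.g. a UTC offset
number = plus(digit)

print(accepts(time, "12:30"), accepts(time, "1:30"))        # True False
print(accepts(offset, "-05:00"), accepts(offset, "05:00"))  # True True
print(accepts(number, "2024"), accepts(number, ""))         # True False
```

Because `repeat` is greedy in this model, a sequence like `digit+` followed by `digit` can never succeed: the first parser consumes every digit. The section does not say whether lux backtracks in such cases.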
Here, an optional - (minus) is allowed before the time (imagine we're parsing a UTC time zone offset).
Alternation
Sometimes we want the parser to accept either one option or another. For example, to parse a binary digit, we need a parser that accepts only 0 or 1.
This behavior is achieved with | (pipe). Parsers connected with | are tried in order until one succeeds.
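Ordered alternation can be sketched in the same style. In this hypothetical Python model, `alt` tries each parser from the same starting position and commits to the first one that succeeds. The model assumes lux does not revisit an earlier choice when a later part of the input fails, which the description above does not settle.

```python
from typing import Callable, Optional

Parser = Callable[[str, int], Optional[int]]


def literal(s: str) -> Parser:
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None


def alt(*ps: Parser) -> Parser:
    """p1 | p2 | ...: try each parser at the same position, in order,
    and return the result of the first one that succeeds."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in ps:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse


def accepts(p: Parser, line: str) -> bool:
    return p(line, 0) == len(line)


bit = alt(literal("0"), literal("1"))
print([accepts(bit, s) for s in ["0", "1", "2", "01"]])  # [True, True, False, False]

# Order matters in this model: the first success is final.
print(accepts(alt(literal("a"), literal("ab")), "ab"))  # False
print(accepts(alt(literal("ab"), literal("a")), "ab"))  # True
```

Under that assumption, `'a' | 'ab'` rejects the line `ab`. The first alternative succeeds and leaves `b` unconsumed.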
Parentheses
Match data
So far, every parser we have introduced could only validate input: it either accepted or rejected it. Parsing, however, implies some kind of structured output that can be used further down the line.
The structured output that lux yields on a successful parse is called a match. A match contains the piece of source it was parsed from and, optionally, match data. Match data comes in three kinds.
Array
Perhaps the simplest kind of match data is the array. Arrays contain homogeneous matches.
To construct a parser that yields an array, we surround it with square brackets. At runtime, when the parser succeeds, its output is wrapped in a singleton array.
Choose infer type in the snippet menu and press ▶️. Note that this parser has type [u], unlike the parsers we constructed before, all of which had type u. In types, u stands for unit: a match without data. For brevity, we will say "the type of a parser" instead of "the type of the value yielded by the parser".
In the output pane, we can see how a match with data is encoded in JSON: the field s contains the aforementioned source, and the field x contains the match data when it is present.
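The Python sketch below models a match as its source slice plus optional data and encodes it with the `s`/`x` layout described above. It is not lux's implementation. Encoding each array element as a nested match object (`{"s": ...}`) is an assumption based on arrays containing matches.

```python
import json
from typing import Callable, Optional, Tuple

# A parser returns (end, data) on success or None on failure.
# data is None for a match without data (type u).
Result = Optional[Tuple[int, object]]
Parser = Callable[[str, int], Result]


def digit(text: str, pos: int) -> Result:
    if pos < len(text) and "0" <= text[pos] <= "9":
        return pos + 1, None
    return None


def match_json(text: str, start: int, end: int, data: object) -> dict:
    """Encode a match: s holds the matched source, x the data if any."""
    m = {"s": text[start:end]}
    if data is not None:
        m["x"] = data
    return m


def array(p: Parser) -> Parser:
    """[p]: on success, wrap the inner match in a singleton array."""
    def parse(text: str, pos: int) -> Result:
        r = p(text, pos)
        if r is None:
            return None
        end, data = r
        return end, [match_json(text, pos, end, data)]
    return parse


def run(p: Parser, line: str) -> Optional[str]:
    """Parse the whole line and return its JSON encoding, or None."""
    r = p(line, 0)
    if r is None or r[0] != len(line):
        return None
    end, data = r
    return json.dumps(match_json(line, 0, end, data))


print(run(digit, "7"))         # {"s": "7"}
print(run(array(digit), "7"))  # {"s": "7", "x": [{"s": "7"}]}
print(run(digit, "a"))         # None
```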
Records
Very often, the data we parse is heterogeneous. For that case, lux has records.
As with arrays, records have a syntax for wrapping a parser: {i:p}. At runtime, the composite parser adds the label i to the value yielded by the inner parser. Note that labels are tracked in types.
Also, lux allows writing {i} instead of {i:i}.
And, of course, records can be nested. See how, again, types mirror the structure of our output.
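Records extend the same model. In this hypothetical Python sketch, `record(label, p)` plays the role of `{label:p}`. Two details are assumptions: that `seq` merges the fields of sequenced records into a single record (this section does not say how lux combines them), and the exact JSON layout.

```python
import json
from typing import Callable, Optional, Tuple

Result = Optional[Tuple[int, object]]
Parser = Callable[[str, int], Result]


def match_json(text: str, start: int, end: int, data: object) -> dict:
    m = {"s": text[start:end]}
    if data is not None:
        m["x"] = data
    return m


def literal(s: str) -> Parser:
    def parse(text: str, pos: int) -> Result:
        return (pos + len(s), None) if text.startswith(s, pos) else None
    return parse


def two_digits(text: str, pos: int) -> Result:
    chunk = text[pos:pos + 2]
    if len(chunk) == 2 and all("0" <= c <= "9" for c in chunk):
        return pos + 2, None
    return None


def record(label: str, p: Parser) -> Parser:
    """{label:p}: put the inner match under label in a one-field record."""
    def parse(text: str, pos: int) -> Result:
        r = p(text, pos)
        if r is None:
            return None
        end, data = r
        return end, {label: match_json(text, pos, end, data)}
    return parse


def seq(*ps: Parser) -> Parser:
    """p1, p2, ...: assumed to merge record fields; other data is ignored."""
    def parse(text: str, pos: int) -> Result:
        fields = {}
        for p in ps:
            r = p(text, pos)
            if r is None:
                return None
            pos, data = r
            if isinstance(data, dict):
                fields.update(data)
        return pos, (fields or None)
    return parse


def run(p: Parser, line: str) -> Optional[dict]:
    r = p(line, 0)
    if r is None or r[0] != len(line):
        return None
    return match_json(line, 0, r[0], r[1])


time = seq(record("h", two_digits), literal(":"), record("m", two_digits))
nested = record("time", time)
print(json.dumps(run(time, "12:30")))
print(json.dumps(run(nested, "12:30")))
```

As in the text, the nesting of `record` calls is mirrored directly in the nesting of the output.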
Variants
Often, when alternating between parsers, we want the result to record which parser succeeded. This is achieved with variants. Like a record, a variant adds a label to the inner value: <i:p>. We call this label-type pair a case. A match with variant data contains exactly one case, unlike a record, which contains all of its fields. The type of a variant indicates which cases it might contain.
To construct a variant with multiple cases, we use alternation. The compound parser's type is the union of all the cases.
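A variant can be modeled as a parser that tags its result with a single case and remembers which cases it may produce. This Python sketch is hypothetical: `P`, `variant`, and `alt` are illustrative names, and the `(label, source, data)` tuple is not lux's output format. The `cases` set is only a stand-in for the union type that lux infers.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Tuple

# A parser returns (end, data) on success or None on failure.
Result = Optional[Tuple[int, object]]


@dataclass(frozen=True)
class P:
    run: Callable[[str, int], Result]
    cases: FrozenSet[str] = frozenset()  # variant cases this parser may yield


def literal(s: str) -> P:
    def run(text: str, pos: int) -> Result:
        return (pos + len(s), None) if text.startswith(s, pos) else None
    return P(run)


def variant(label: str, p: P) -> P:
    """<label:p>: tag the inner result with exactly one case."""
    def run(text: str, pos: int) -> Result:
        r = p.run(text, pos)
        if r is None:
            return None
        end, data = r
        return end, (label, text[pos:end], data)
    return P(run, frozenset({label}))


def alt(*ps: P) -> P:
    """p1 | p2 | ...: the first success wins; possible cases are the union."""
    def run(text: str, pos: int) -> Result:
        for p in ps:
            r = p.run(text, pos)
            if r is not None:
                return r
        return None
    return P(run, frozenset().union(*(p.cases for p in ps)))


def parse(p: P, line: str) -> object:
    r = p.run(line, 0)
    return None if r is None or r[0] != len(line) else r[1]


answer = alt(
    variant("yes", alt(literal("yes"), literal("y"))),
    variant("no", alt(literal("no"), literal("n"))),
)
print(sorted(answer.cases))    # ['no', 'yes']
print(parse(answer, "y"))      # ('yes', 'y', None)
print(parse(answer, "no"))     # ('no', 'no', None)
print(parse(answer, "maybe"))  # None
```

Each individual result carries exactly one case, yet `answer.cases` lists both possible cases. This is because `variant` contributes one label and `alt` takes the union of its alternatives' labels.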
Examples
Together, records and variants form a powerful system that resembles the algebraic data types found in many modern languages. Armed with these tools, let's parse some real-world formats.