How to write your own JSON parser using functional TypeScript & fp‑ts ‑ part I

JSON is an omnipresent format that can be read by every mainstream language in the wild. It is well-specified and simple-yet-complex enough to make writing a parser a challenging and fun test of our programming skill.

In the first post of this two part series, we lay down the basic components for building a general string parser. In the second post, we will focus more on the actual JSON format specification and parsing every part of it. In the end, we will have a fully functional JSON parser (think of the native JSON.parse function).

We will use TypeScript and the fp-ts library, and basic knowledge of functional concepts (function composition, immutability) and data types (Option, Either) is expected. That said, the code should be understandable even if you are not familiar with seemingly scary stuff like functors, monoids or monads.

The beauty of functional programming is that it allows us to build complex things from simple ones, and that is exactly how we are going to build our JSON parser - by combining simple things.

Shaping the parser

First we’ll give our parser some shape by defining some types. Every parser needs an input and ours is no exception.

-- CODE language-js --

Defining parser type

With <inline-code>input<inline-code> present, we can define a type representing the parser - <inline-code>Parser<A><inline-code>. It's a function that, when given some input, returns a parsed value of type A and the rest of the input. Parsing can fail, so we’ll wrap return value in Either type and define <inline-code>ParserError<inline-code> to represent all the stuff that can go wrong during parsing.

-- CODE language-js --

The <inline-code>advance<inline-code> function will read one character from the <inline-code>input<inline-code> and return it with the rest of the <inline-code>input<inline-code>. The return value is wrapped inside Option in order to represent the state when there are no more chars to read (<inline-code>none<inline-code> is returned in that case).

-- CODE language-js --

Note that <inline-code>input<inline-code> is immutable — we are creating a new instance every time we advance the position.

Building the first parser

Now we have everything in place to build our first parser — one capable of no less than parsing a specific character!

We will read a single character from <inline-code>input<inline-code> using an advanced function and check to see if it matches the expected character.

-- CODE language-js --

Time to see if our parser behaves correctly.

Checking functionality

First, we'll create a helper for displaying parser results as a human-readable string. We handle both possible outcomes, failure and success (represented by Either type's left & right value).

-- CODE language-js --

Works like a charm! Notice that our parser is returning a string value. If we want it to return a number, we can transform parsers the same way we transform values inside an array using the map function.

-- CODE language-js --

Changing value types

We have defined a map function that can be easily used inside a pipe function. Implementation was simple — we just applied function f to a successfully parsed value.

Using map, we can transform our <inline-code>Parser<string><inline-code> into <inline-code>Parser<number><inline-code>.

-- CODE language-js --

Combining multiple parsers into sequence

Parsing a simple char was fun, but how about combining multiple parsers into one that could parse the whole word?

Let's start simple by combining two parsers into one. The product function does nothing more than apply the first parser, applying the second one and combining their results into one tuple.

-- CODE language-js --

We can generalize this concept further to create a function sequence that can combine any number of <inline-code>Parser<A><inline-code> instances. Notice that we are turning the type inside out — we provide the function with <inline-code>ReadonlyArray<Parser<A>>><inline-code>, and get <inline-code>Parser<ReadonlyArray<A>>><inline-code> as a result.

In order to implement the sequence, we also need a parser that always succeeds as it will be used as a starting value for reducing the array of parsers.

Do you wonder why we decided that an empty sequence yields a successful result? A sequence of parsers succeeds when all parsers succeed, and vice versa. Our parser fails when a parser fails. Obviously, this will never happen for an empty sequence, so by combining empty sequences we get the parser that always succeeds.

-- CODE language-js --

Now we have a parser that executes all character parsers one by one in the sequence and returns an array of parsed characters. But, it would be nice to have a parser that concatenates all characters and returns the whole word. We will solve it by using a map and concatenating all chars in the resulting array.

-- CODE language-js --

We often come across a situation where we want our parser to choose from multiple options. For those cases, we need a function that combines multiple parsers into one in such a way that it succeeds when one of the already-passed parsers succeeds.

Let's implement it and call it oneOf!

Combining parsers with multiple options

In contrast to sequence, the oneOf function yields a failing parser when it passes an empty list of possible parsers. The reason is simple: it must be true that there is a parser that succeeds, but this condition is not met for an empty list.

-- CODE language-js --

Now, one last function combining parsers - many. It will run a given parser repeatedly until it fails, and will succeed when parsing succeeds at least once.

-- CODE language-js --

Now we have the basic building blocks to create more complex parsers. As promised, next time we will dig deeper into the JSON specifications, try to represent it in TypesScript, and parse it using our own parser.

Stay tuned...and functional!

Honza Trtík
Doing stuff. Occasional functional inquisition.