Nick Mudge Ignition Consulting & Development

This post refers to the parsing library from chapter 8 of the book Programming in Haskell.

To combine multiple parsers into a single parser you can use the sequence operator >>= or do notation.

The nice thing about the sequence operator is that it is explicit about how parsers are combined: given the definition of the operator, you can work out step by step what the combined parser does.

The nice thing about do notation is that it is more concise and elegant. It is syntactic sugar for the sequence operator, so both forms mean the same thing.

item is a parser that takes a string and returns a list containing a single pair: the first component is the first character of the string and the second component is the rest of the string. If the string is empty, item returns the empty list.

Using the sequence operator, here is how to combine two item parsers into one parser. The first component of its result is a string made from the first two characters of the input, and the second component is the rest of the input:
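To make the description of item concrete, here is a minimal sketch of how the parser type, parse, and item might be defined. It assumes a newtype wrapper around a function from String to a list of results, with a constructor named P; the book's library may differ in detail.

```haskell
-- Assumed representation: a parser wraps a function from an input string
-- to a list of (result, remaining input) pairs. An empty list means failure.
newtype Parser a = P (String -> [(a, String)])

-- Apply a parser to an input string by unwrapping the function.
parse :: Parser a -> String -> [(a, String)]
parse (P p) inp = p inp

-- Consume the first character; fail on empty input.
item :: Parser Char
item = P $ \inp -> case inp of
  []     -> []
  (x:xs) -> [(x, xs)]
```

With these definitions, parse item "hello" gives [('h',"ello")] and parse item "" gives [].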

twoChar = item >>= \v1 -> item >>= \v2 -> return (v1:v2:[])

parse is just a function that applies a parser to an input string. Here's an example of applying twoChar:

> parse twoChar "hello"
[("he","llo")]

A couple of things to notice: the syntax \ -> denotes a lambda, or anonymous function. The return function does not consume any input. It takes a value, here the string built from v1 and v2, and makes it the result of the parser, leaving the remaining input untouched. If the input is shorter than two characters, one of the item parsers fails, so twoChar returns the empty list [], which denotes failure.
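The behaviour of return and of failure follows from how the sequence operator is defined. The sketch below is one way to write it, assuming a newtype-based Parser with a constructor P; the Functor and Applicative instances are there only because current GHC versions require them for a Monad instance, and the exact definitions in the book's library may differ.

```haskell
newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) inp = p inp

item :: Parser Char
item = P $ \inp -> case inp of
  []     -> []
  (x:xs) -> [(x, xs)]

instance Functor Parser where
  fmap f p = P $ \inp -> [(f v, out) | (v, out) <- parse p inp]

instance Applicative Parser where
  -- pure (and therefore return) succeeds with v and consumes no input.
  pure v = P $ \inp -> [(v, inp)]
  pf <*> px = P $ \inp ->
    [(f v, out') | (f, out) <- parse pf inp, (v, out') <- parse px out]

instance Monad Parser where
  -- Run p first. If it fails, the list is empty and so is the result.
  -- Otherwise pass each result value to f and run the parser that f
  -- returns on the remaining input.
  p >>= f = P $ \inp -> concat [parse (f v) out | (v, out) <- parse p inp]

twoChar :: Parser String
twoChar = item >>= \v1 -> item >>= \v2 -> return (v1:v2:[])
```

Under these definitions, parse twoChar "hello" gives [("he","llo")], while parse twoChar "h" gives [] because the second item has nothing left to consume.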

Here's twoChar defined with do notation:

twoChar = do v1 <- item
             v2 <- item
             return (v1:v2:[])

That's cleaner. Check out this definition:

twoChar = do v1 <- item
             do v2 <- item
                return (v1:v2:[])
              +++ return (v1:[])

+++ is a choice operator: it tries the parser on its left and, only if that fails, applies the parser on its right to the same input. In this case it says that if trying to parse a second character fails (because the string is one character long), then just return the first character and the rest of the string. The function works for any input string that is one character long or longer. Notice the nested do notation. Since do notation is just syntactic sugar for the sequence operator, how do you write the equivalent of the nested do notation using the sequence operator?

twoChar = item >>= \v1 -> (item >>= \v2 -> return (v1:v2:[])) +++ return (v1:[])

But I think it is clearer in this case to use prefix notation, like this:

twoChar = item >>= \v1 -> (+++) (item >>= \v2 -> return (v1:v2:[])) (return (v1:[]))

For kicks, here's do notation combined with prefix notation:

thingy = do v1 <- item
            (+++)
              (do v2 <- item
                  return (v1:v2:[]))
              (return (v1:[]))
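All of the one-or-two-character definitions above depend on +++. Here is a minimal, self-contained sketch of how it might be defined, together with three of the variants so they can be compared. The Parser representation and instances are assumptions, and the names viaDo, viaBind, and viaPrefix are made up for this example.

```haskell
newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) inp = p inp

item :: Parser Char
item = P $ \inp -> case inp of
  []     -> []
  (x:xs) -> [(x, xs)]

instance Functor Parser where
  fmap f p = P $ \inp -> [(f v, out) | (v, out) <- parse p inp]

instance Applicative Parser where
  pure v = P $ \inp -> [(v, inp)]
  pf <*> px = P $ \inp ->
    [(f v, out') | (f, out) <- parse pf inp, (v, out') <- parse px out]

instance Monad Parser where
  p >>= f = P $ \inp -> concat [parse (f v) out | (v, out) <- parse p inp]

-- Choice: try p; if it fails, run q on the same, unconsumed input.
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = P $ \inp -> case parse p inp of
  []  -> parse q inp
  res -> res

-- Nested do notation. The +++ line is indented further than v1 (so it
-- continues the second statement) but less than v2 (so it ends the inner do).
viaDo :: Parser String
viaDo = do v1 <- item
           do v2 <- item
              return (v1:v2:[])
            +++ return (v1:[])

-- The same parser written with the sequence operator, infix +++.
viaBind :: Parser String
viaBind = item >>= \v1 -> (item >>= \v2 -> return (v1:v2:[])) +++ return (v1:[])

-- The same parser with +++ in prefix position.
viaPrefix :: Parser String
viaPrefix = item >>= \v1 -> (+++) (item >>= \v2 -> return (v1:v2:[])) (return (v1:[]))
```

With these definitions each variant gives [("he","llo")] for "hello", [("h","")] for "h", and [] for the empty string.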
