What's the Point of Applicative?

Scala Programming

Lots of Scala developers know what a Monad is. A smaller number are aware what the definition of an Applicative Functor. It’s tempting to think of Applicative as a “worse Monad”. This is not exactly wrong, but it doesn’t provide us with any insight into what Applicatives are good for.

In this blog post, I’ll attempt to answer that question by looking at an example where requiring an Applicative, rather than a Monad gives us very different capabilities.

Writing a Command-Line Argument Parser

A fairly common programming problem is writing a parser for command line arguments. That is, a function that takes a list of strings, such as --firstname Joe --surname Warren --tall and converts this into some kind of domain object, such as:

case class Person(tall: Boolean, firstname: String, surname: String)
val joe = Person(true, "Joe", "Warren")

Let’s write a simple interface to a command line parser Fair warning, this is going to be simple demonstration code, rather than a fully featured parser.

We’ll start by defining an error type, because it’s ~~more stylish~~ clearer than littering String everywhere.

case class ParseFailure(error: String)

And now, let’s define an interface.

/* Making the class sealed prevents other users from creating their `Parser` subclasses
 * Our aim is to provide simple building blocks
 * that can be combined into a more complex parser
 * not to provide an interface that they have to implement
 *
 * The class is parameterized with a type `A`
 * this is the type of object produced by a successful parse.
 */
sealed abstract class Parser[A] {
  // defining a type alias saves us some typing
  type PartialParseResult = Either[ParseFailure, (List[String], A)]

  // Each parser can attempt to partially parse a block of command line arguments
  // This will either return a `Left[ParseFailure]` if the arguments are invalid
  // or it will "consume" some of the arguments, returning a `Right[(List[String], A)]`
  // containing both a reduced list of arguments, and a value of type `A`
  //
  // We can mark this protected, because it isn't part of the external dsl
  protected def partialParse(args: List[String]): PartialParseResult

  // we parse a list of arguments by calling `partialParse`,
  // and then validating that there are no arguments left unconsumed
  // every Parser should implement `parse` in the same way, so we can mark this `final`
  final def parse(args: List[String]): Either[ParseFailure, A] =
    partialParse(args) flatMap {
      case (List(), result) => Right(result)
      case (remainingArgs, _) =>
        Left(ParseFailure(
          s"Unrecognised Arguments:${remainingArgs.mkString("\t\n", "\t\n", "")}"
        ))
    }
}

Now let’s write a “concrete” parser. We want this to be relatively simple, in this case, a boolean “flag” that will return true or false depending on whether or not the flag is present.

By implementing this as an anonymous class, we prevent the user from pattern matching on the specific implementation of a Parser. This makes our Parser type “opaque”; external code can’t look inside at the implementation details. Creating the anonymous class via the apply method on a Flag object, gives us some nice syntactic sugar, and lets us write Flag("tall") as if we were constructing an instance of a case class.

object Flag {
  def apply(name: String): Parser[Boolean] = new Parser[Boolean] {
    private val flag = s"--$name"

    /* a flag is either present, or it is absent
     * both valid states, so `partialParse can't fail
     * and always returns a `Right` with a tuple
     * The first value returned is the arguments list with the flag (if present) removed
     * The second value is a boolean; whether the flag was passed
     */
    override protected def partialParse(args: List[String]): PartialParseResult =
      Right(args.filter(_ != flag), args.contains(flag))
  }
}

In a similar vein, let’s add support for a (mandatory) String argument. This is going to require that an argument is passed, such as --firstname Joe, and it’s going to have the type Parser[String]. Unlike the Flag Parser, it is possible for this parser to fail, since either the value, or the entire argument could be missing.

object StringArg {
  def apply(name: String): Parser[String] = new Parser[String] {
    private val flag = s"--$name"

    override protected def partialParse(args: List[String]): PartialParseResult = {
      args.span(_ != flag) match {
        case (_, List()) =>
          Left(ParseFailure(s"Missing argument $flag"))
        case (_, List(_)) =>
          Left(ParseFailure(s"Expected a value after $flag"))
        case (prefix, _ :: value :: suffix) =>
          Right((prefix ++ suffix, value))
      }
    }
  }
}

Now, we have some simple building blocks, we need a way to compose them. We’re going to do this by defining a Monad instance for our Parser type.

The world contains enough Monad tutorials, so I’m going to assume that you already know what a Monad is. But for completeness, I will include a (potentially inaccurate and definitely unhelpful) definition.

Monad is a typeclass where instances have both a pure method, which takes a value, and returns that value “within” an instance of the monad (in our case pure[A](a: A): Parser[A]), and a flatMapmethod, which takes an instance of the monad of one type, and a function from that type, to an instance of the monad of a second type, and returns an instance of the monad of the second type (in our case flatMap[A, B](fa: Parser[A])(f: A => Parser[B]): Parser[B]), and also where instances must obey the “Monad Laws”: “Left Identity”, “Right Identity” and “Associativity”.

In my defence, I did say upfront that it would be unhelpful.

We’re going to use the Cats Monad implementation.

If you’re following along at home, the required import is:

import cats.Monad

object Parser {
  implicit def monadForParser: Monad[Parser] = new Monad[Parser] {
    // pure creates a parser that always produces the same value
    // no matter what the input is
    override def pure[A](x: A): Parser[A] = new Parser[A] {
      override protected def partialParse(args: List[String]): PartialParseResult =
        Right((args, x))
    }

    /* flatMap runs the first parser
     * then calls the provided function on the output to create a new parser
     * and runs that on the arguments that weren't consumed by the first parser
     * this implementation uses the `Either` Monad
     * to compose the potential errors from the two parsers
     */
    override def flatMap[A, B](fa: Parser[A])(f: A => Parser[B]): Parser[B] =
      new Parser[B] {
        override protected def partialParse(args: List[String]): PartialParseResult =
          for {
            (nextArgs, a) <- fa.partialParse(args)
            result <- f(a).partialParse(nextArgs)
          } yield result
      }

    /* cats Monad requires we write tailRecM for reasons of stack safety
     * Because this is example code, lets more or less ignore it
     * and go with a stock implementation
     */
    override def tailRecM[A, B](a: A)(f: A => Parser[Either[A, B]]): Parser[B] =
      flatMap(f(a)) {
        case Right(b)    => pure(b)
        case Left(nextA) => tailRecM(nextA)(f)
      }
  }
}

Now let’s see our parser in action, and write a parser for the Person case class mentioned above.

case class Person(isTall: Boolean, firstname: String, surname: String)

This is going to require another set of imports, as we’re using some cats syntactic sugar:

import cats.syntax.all._

We can use a “for comprehension” to provide an aesthetically pleasing way to “Monadically compose” our simple parsers

  val personParser: Parser[Person] = for {
    tall <- Flag("tall")
    firstname <- StringArg("firstname")
    surname <- StringArg("surname")
  } yield Person(tall, firstname, surname)

And we can write some tests to validate that this works as expected:

import org.scalatest.freespec.AnyFreeSpec
import org.scalatest.matchers.should.Matchers

class ParserSpec extends AnyFreeSpec with Matchers {
  "A Monadic Parser" - {
    "Should be able to parse argument lists" - {
      "basic clean parse" in {
        val args = "--tall --firstname John --surname Doe".split(' ').toList
        personParser.parse(args) should be(
          Right(Person(true, "John", "Doe"))
        )
      }
      "flags are optional" in {
        val args = "--firstname John --surname Doe".split(' ').toList
        personParser.parse(args) should be(
          Right(Person(false, "John", "Doe"))
        )
      }
      "arguments are not optional" in {
        val args = "--tall --surname Doe".split(' ').toList
        personParser.parse(args) should be(
          Left(ParseFailure("Missing argument --firstname"))
        )
      }
    }
  }
}

Monadic Shortcomings

A common thing to want a command line parser to do (on top of parsing arguments) is to print a simple “usage” summary of the arguments that are expected. Let’s try to extend our Parser interface in order to support this.

If you’re following along, at this point we’re more or less starting from scratch (in a separate package). The only thing we share is the error type, ParseFailure.

The Parser interface is more or less unchanged, except for the addition of a “usage” method.

sealed abstract class PrintingParser[A] {
  type PartialParseResult = Either[ParseFailure, (List[String], A)]

  protected def partialParse(args: List[String]): PartialParseResult

  final def parse(args: List[String]): Either[ParseFailure, A] =
    partialParse(args) flatMap {
      case (List(), result) => Right(result)
      case (remainingArgs, _) =>
        Left(ParseFailure(
          s"Unrecognised Arguments:${remainingArgs.mkString("\t\n", "\t\n", "")}"))
    }

  def usage: String
}

Flag and StringOption are also more or less unchanged, other than by the addition of simple usage strings.

object Flag {
  def apply(name: String): PrintingParser[Boolean] = new PrintingParser[Boolean] {
    private val flag = s"--$name"

    override protected def partialParse(args: List[String]): PartialParseResult = {
      Right(args.filter(_ != flag), args.contains(flag))
    }

    override def usage: String = s"[$flag]"
  }
}

object StringOption {
  def apply(name: String): PrintingParser[String] = new PrintingParser[String] {
    private val flag = s"--$name"

    override protected def partialParse(args: List[String]): PartialParseResult = {
      args.span(_ != flag) match {
        case (_, List()) => 
          Left(ParseFailure(s"Missing argument $flag"))
        case (_, List(_)) =>
          Left(ParseFailure(s"Expected a value after $flag"))
        case (prefix, _ :: value :: suffix) =>
          Right((prefix ++ suffix, value))
      }
    }

    override def usage: String = s"--$name <value>"
  }
}

Now, let’s try to define a Monad for our PrintingParser.

Defining pure is easy enough, a pure Parser doesn’t require any arguments, so we can return an empty usage string, and leave the rest of the implementation unchanged

  override def pure[A](x: A): PrintingParser[A] = new PrintingParser[A] {
    override protected def partialParse(args: List[String]): PartialParseResult =
      Right((args, x))
    override def usage: String = ""
  }

However, when we try to define flatMap we run into some difficulties with the usage string

  override def flatMap[A, B]
    (fa: PrintingParser[A])
    (f: A => PrintingParser[B]):
    PrintingParser[B] =
    new PrintingParser[B] {
      override protected def partialParse(args: List[String]): PartialParseResult = for {
        (nextArgs, a) <- fa.partialParse(args)
        result <- f(a).partialParse(nextArgs)
        } yield result
      override def usage: String = ???
    }

Ideally, we’d like for this to include the usage strings from both the first and second parsers.

Unfortunately:

We have a fa: PrintingParser[A] (the first parser); it gives us the first half of the usage string 😀
We don’t have a PrintingParser[B] (the second parser), just a function f: A => PrintingParser[B] 😞
So we need a value of type A in order to get a PrintingParser[B] 🤔
We can’t get an A out of our PrintingParser[A] without running it on some arguments, and we want to be able to use usage without supplying an arguments list 🤬

Fortunately:

This is where Applicative comes to the rescue 🦸‍♀️

Applicative is a typeclass like Monad, except instead of requiring flatMap, it requires the (weaker) ap. (“Weaker” in this case means that we can build ap out of pure and flatMap, but not the other way round. Because of this, every Monad can be used as an Applicative, but not vice versa.)

ap has the signature (when applied to Parser) def ap[A, B](ff: PrintingParser[A => B])(fa: PrintingParser[A]): PrintingParser[B]. Compared to flatMap, in ap, the function sits “inside” the Parser, so we have direct access to both Parsers when composing them, and can look at both usage strings.

import cats.Applicative

object PrintingParser {
  implicit def applicativeForPrintingParser: Applicative[PrintingParser] =
    new Applicative[PrintingParser] {
      override def pure[A](x: A): PrintingParser[A] = new PrintingParser[A] {
        override protected def partialParse(args: List[String]): PartialParseResult =
          Right((args, x))
        override def usage: String = ""
      }

      override def ap[A, B]
        (ff: PrintingParser[A => B])
        (fa: PrintingParser[A]):
        PrintingParser[B]
        = new PrintingParser[B] {
          override protected def partialParse(args: List[String]): PartialParseResult =
            for {
              (nextArgs, a) <- fa.partialParse(args)
              (nextNextArgs, function) <- ff.partialParse(nextArgs)
            } yield (nextNextArgs, function(a))

          /* If we were to join both usage strings together with a space
           * this would break the Monad identity laws
           * Adding no space if either string is empty makes the Applicative lawful 
           * (to the best of my understanding)
           */
          override def usage: String = (ff.usage, fa.usage) match {
            case ("", "")   => ""
            case ("", snd)  => snd
            case (fst, "")  => fst
            case (fst, snd) => s"$fst $snd"
        }
      }
    }
}

Because Applicative doesn’t provide flatMap, the code to compose our parsers together looks a little different.

import cats.syntax.all._

case class Person(tall: Boolean, firstname: String, surname: String)

val personParser = (
    Flag("tall"),
    StringOption("firstname"),
    StringOption("surname")
  ).mapN(Person)

mapN from cats, lets us compose a tuple full of applicatives of the same kind together.

I think it’s up for debate whether or not the Applicative or Monadic style of Person parser is more readable. But one thing that I do find noteworthy, is that Scala contains a fair amount of syntactic sugar for working with Monads, in the form of for comprehensions. By comparison, Applicatives don’t have any syntactic sugar, and are build using the “standard” syntax of the Scala language, despite this, people have been able to define operators that allow for an elegant style when working with Applicatives.

The same statement could be made about Haskell, and “do notation” (Haskell syntactic sugar similar to a for comprehension) vs the <$> and <*> opperators.

Applicative Shortcomings

The downside to working with Applicative rather than Monad, is that it does limit the kinds of Parser that it’s possible to write.

With a Monadic parser, it’s possible to use a result returned early in the parse, in order to decide which of two different parsers to use. This may be best explained by example, let’s throw away our Person case class, and make a more complex Algebraic Data Type (ADT):

sealed trait Person {
  def firstname: String
}
case class PersonWithoutSurname(firstname: String) extends Person
case class PersonWithSurname(firstname: String, surname: String) extends Person

In this ADT, a Person always has a firstname and may have a surname. This could possibly be represented by two different styles of command line argument

--has-surname --firstname John --surname Doe
--firstname John

And we could implement a Parser for this like so:

val personParser = for {
  hasSurname <- Flag("has-surname")
  firstname <- StringArg("firstname")
  person: Person <- if (hasSurname) {
    StringArg("surname").map( surname =>
      PersonWithSurname(firstname, surname)
    )
  } else {
    PersonWithoutSurname(firstname).pure[Parser]
  }
} yield person

It is possible to make optional arguments without requiring a Monad, (for example, using typeclasses such as Alternative, which I won’t cover here).

With that said, the general problem of wanting to vary the Parser used as a function of the values produced earlier on cannot be solved without a Monad.

There is a trade off between Applicative which makes it possible to define classes with more functionality (such as Parsers with usage strings), vs Monad, which gives us more freedom in how we compose our classes together.

This trade off still applies in languages without Monads, Applicatives and other Higher Kinded Types (HKTs). It’s equally possible to define a Parser such as this one in a dynamically typed language such as Python. But it would not be possible to provide both an automatic usage string, and flatMap style composition as part of the same interface.

One of my favourite things about statically typed functional programming, is that it makes this kind of trade off easy to spot.

Appendix

If you want to run the code in this blogpost, it may be helpful to see the build.sbt with the versions that were used to write it:

addCompilerPlugin("com.olegpy" %% "better-monadic-for" % "0.3.1")

lazy val root = (project in file("."))
  .settings(
    name := "parsing-blogpost",
    libraryDependencies += "org.typelevel" %% "cats-core" % "2.1.0",
    libraryDependencies += "org.scalatest" %% "scalatest" % "3.1.1" % "test"
  )

The code for this blogpost is available on BitBucket, here.

If you liked this blogpost, and can read Haskell, you may be interested in the optparse-applicative library. While optparse-applicative contains vastly more functionality than the parser discussed here, it did provide the inspiration for this post.