In the previous instalments of this series, we’ve introduced two core components of the programs as values paradigm: datatypes that represent effects, and functions that combine instances of those datatypes in order to construct programs by describing control flow explicitly.
Today we will formalise and clarify this structure, by introducing the concept of an algebra.
Note: In an unfortunate abuse of notation, the word “algebra” is used with a few different meanings in the FP world. See the Appendix for more details on the terminology.
An algebra is a structure consisting of four components:
A single, concrete type
T
, which is sometimes called the carrier of the algebra. Depending on the algebra, it can have different shapes:V
,F[A]
,G[E, R]
, etc. The carrier type is the datatype that represents the particular effect its algebra encodes.A set of functions which return values of the carrier type
T
, called introduction forms. These functions can all take different inputs, but what characterises them as introduction forms is that none of their inputs is of the carrier typeT
, i.e. they construct or introduceT
. Introduction forms are the primitive programs of their algebra, the starting points from which we can build more complex logic.A set of functions which return values of the carrier type
T
, called combinators. These functions can all take different inputs, but at least one of those inputs needs to be of the carrier typeT
. In other words, a combinator takes one or more programs inT
, and returns another program inT
.
Combinators are the essential building blocks of programs as values we’ve been talking about so far, they take smaller programs and construct bigger programs, often by explicitly describing some form of control flow. Most of the logic in your code will be expressed through combinators.A set of functions called elimination forms. They can all take different inputs, but one of those inputs needs to be of the carrier type
T
. Furthermore, their output needs to be a different type fromT
, i.e. they eliminateT
.
Informally, elimination forms encode the concept of “running” a program written in a specific algebra after we’re done constructing it. The concept of running is however expressed as a translation from the carrier typeT
to another type.
In some cases, algebras also have a fifth component: laws, i.e. equalities between programs written in the algebra. We won’t be talking much about laws in this post, but they will be more prominent later.
So to recap, to build a program in a given algebra we start from the
introduction forms, and use combinators to express all the logic we
need. Once we’re done, we run it by translating it to another type via
one of the elimination forms. Another way to look at it is that each
algebra describes a mini language: we write programs in that
language, and then translate them to another language, all the way to
the most general language of all, which is, as we will see, IO
.
We’re now ready to look at some concrete examples.
The Doc algebra
For our first example, we’re going to look at paiges, a pretty printing library that lets you generate text that needs to be well formatted, with paragraph wrapping at different line lengths. The entire library is centred around a document algebra, so let’s look at its components in more detail.
The carrier type of the algebra is called Doc
, which means we
represent documents as programs of type Doc
, like val myDoc: Doc
.
It might sound strange that we’re referring to a static artifact such
as a document as a “program”, but you will see that the mindset we’re
working in doesn’t change that much once we move to dynamic behavior
with types such as cats.effect.IO
or cats.parse.Parser
.
The introduction forms are the initial building blocks, as per their
definition they create values of type Doc
without requiring any
value of type Doc
as input.
Here are a few:
Doc.text: String => Doc
Doc.char: Char => Doc
Doc.spaces: Int => Doc
Doc.comma: Doc
Doc.line: Doc
... // and more
Note that I’m showing them as functions to make the types clearer, but they are generally methods in Scala, so:
val char: Char => Doc
// is more typically written as
def char(c: Char): Doc
also, they tend to appear in the companion object of the carrier type, but don’t use that as a strict rule, sometimes combinators are there too.
Introduction forms have shape (A, B, ..) => Doc
, so you might be
wondering where do primitive values such as comma
or line
fit. You
can think of them as introduction forms from Unit
to the carrier:
val comma: Doc
// can be thought as
val comma: () => Doc
Let’s now look at combinators, functions that take one or more values
of type Doc
and return a value of type Doc
. Combinators will
express the lion’s share of our logic, so much so that libraries based
on algebras are often called combinator libraries. Here are a few:
indent: (Doc, Int) => Doc
line: (Doc, Doc) => Doc
bracketBy: (Doc, Doc, Doc, Int) => Doc
aligned: Doc => Doc
stack: Iterable[Doc] => Doc
... // and more
Again, I wrote them as standalone functions to make the types clearer,
in reality they are written as instance methods, which obscures the
fact that at least one of their inputs (this
) is of type Doc
:
sealed trait Doc {
def indent(i: Int): Doc
def line(that: Doc): Doc
def bracketBy(left: Doc, right: Doc, indent: Int = 2): Doc
def aligned: Doc
...
object Doc {
def stack(ds: Iterable[Doc]): Doc
...
Doc.stack
is a notable exception, it cannot be an instance method on
Doc
because it operates on a collection of docs. The same
intuition about combinators applies though: stack
combines programs
in the Doc
algebra into a bigger program in the Doc
algebra.
Finally, eliminations forms translate Doc
programs to another type:
isEmpty: Doc => Boolean
maxWidth: Doc => Int
render: (Doc, Int) => String
with the usual caveat that they are defined as instance methods:
sealed trait Doc {
def isEmpty: Boolean
def maxWidth: Int
def render(width: Int): String
Equipped with this knowledge, we can now write Doc
programs:
import org.typelevel.paiges.Doc
val openBracket: Doc = Doc.char('[')
val closeBracket: Doc = Doc.char(']')
val north: Doc = Doc.text("NORTH")
val east: Doc = Doc.text("EAST")
val south: Doc = Doc.text("SOUTH")
val west: Doc = Doc.text("WEST")
val directions: Doc =
Doc
.stack(List(north, east, south, west))
.bracketBy(openBracket, closeBracket)
In the above, I’ve separated usages of introductions forms from combinators for extra clarity, but remember that referential transparency holds, so you can inline definitions or abstract them out freely. We can keep composing to build more complex programs:
val output: Doc =
Doc
.text("Directions:")
.line(directions.indent(2))
.bracketBy(Doc.space, Doc.space)
and once we’re done, we can “run” our Doc
program by translating it
to String
via the render
elimination form:
output.render(width = 30)
// res0: String = """
// Directions:
// [ NORTH EAST SOUTH WEST ]
// """
output.render(width = 15)
// res1: String = """
// Directions:
// [
// NORTH
// EAST
// SOUTH
// WEST
// ]
// """
The Out algebra
Let’s now look at a made up algebra we’re going to call Out
, whose
only capability is to print Strings to stdout. It should provide a
first, simple example of encoding behaviour, which is more in line
with how we think about the word “program”.
Here’s how it looks like:
// Carrier type
Out
// Introduction forms
print: String => Out
// Combinators
andThen: (Out, Out) => Out
// Elimination forms
run: Out => IO[Unit]
except we’re going to assume it’s written idiomatically, with
andThen
as an instance method and print
as a method in the
companion object of Out
.
Let’s write a program with it:
object Logic {
val printNewline: Out = Out.print("\n")
val helloWorld: Out =
Out
.print("Hello, world!")
.andThen(printNewLine)
.andThen(Out.print("I'm a program!"))
}
object Main extends IOApp.Simple {
def run: IO[Unit] = Logic.helloWorld.run
}
Main.main(Array())
// Hello, world!
// I'm a program!
So the Out
algebra lets us create simple imperative programs that
are akin to:
print("tea")
print("coffee")
except we’re in programs as values, so we can easily write compositional code as programs that manipulate other programs. Here’s a basic example, we will see much more interesting ones once we work with algebras that encode complex control flow:
def repeat(p: Out, n: Int): Out =
if n <= 0 Out.print("")
else p.andThen(repeat(p, n - 1))
It’s significant that we can write Out
programs without any
knowledge of its internal structure: we didn’t do any pattern matching
on the concrete case classes that form Out
, but only used the
operations defined on it, or, in other words, its algebra. This style of
algebraic thinking is very valuable, because it scales from
datatypes with very simple internal structure, such as Option
, all
the way to highly sophisticated datatypes such as IO
or
fs2.Stream
.
Finally, note that there really is no difference in mechanics nor
mindset between imperative-looking algebras such as Out
, and
algebras that encode static data such as Doc
. In fact, we can
literally implement Out
with Doc
:
type Out = Doc
def print(s: String): Out = Doc.text(s)
def andThen(this: Out, that: Out): Out = this + that
def run(p: Out): IO[Unit] = IO.println(p.render(Int.MaxValue))
Conclusion
In this article, we’ve talked about algebras, the fundamental structure at the heart of programs as values. Next time is when things really get interesting, as we’re going to enrich our algebras with the ability to handle outputs. See you then!
Appendix on terminology
Terminology can be a source of confusion, since it’s used loosely and inconsistently and mostly reflects historical connections between fields and communities.
“Algebra” is a particularly overloaded term: the meaning we used in the article, which also appears in “Boolean algebra” and “Algebraic Data Types”, originates from a branch of Maths called Universal Algebra. It’s defined as:
- A set
A
, called the carrier. - Some finitary operations on
A
: functions that take tuples of elements ofA
and return an element ofA
.
which we called “carrier” and “combinators” respectively. The terms “introduction forms” and “elimination forms” come from logic via type theory instead.
Universal Algebra also introduces the concept of a signature, which defines the set of typed operation symbols without specifying the functions that would be the actual operations. Signatures correspond roughly to the notion of “interface” in programming, and are often encoded in Scala via typeclasses.
The issue is that FP terminology uses the word “algebra” to also mean “signature”, for example in the phrase “algebras and interpreters” which roughly corresponds to “abstractions and implementations” (the usage of the word “interpreter” comes from the theory of embedded domain specific languages, or eDSLs).