Whenever I write “language” I implicitly mean “programming langauge”.

Dependent Types

Metaprogramming

Metaprogramming is the design of programs that generate code meant to be used at the same of a similarly high level of abstraction as the metaprogram’s language. For example, the popular languages Python and Javascript each provide a built-in function called eval that takes as input a string and then tries to interpret that string as a piece of code in the respective language (here, “interpret” refers to immediately executing the code a high level rather than first compiling that code to a low-level representation, usually C or Assembly).

Metaprogramming is the ultimate abstraction, since (almost) any pattern of code at all can be generated by a metaprogram of the same or smaller size (asymptotically, at the very least). However, metaprogramming can also yield some severe side-effects.

  1. Generated code is not directly visible while programming, so there is an additional layer of obfuscation between programming and feedback.
  2. Metaprograms usually only have limited access to static information, which inferred by the compiler, about the programs to be generated. On top of this, metaprograms are not usually checked to make sure they generate valid programs. Python and Javascript’s eval function know that their argument is a string and nothing more!

For example, Template Haskell provides a monad Q :: * -> * which is the type of computations that produce Haskell code. Here is a metaprogram that, at compile-time, generates a function that sums n arguments. Without metaprogramming, there is no basic functionality for variable-arity functions in Haskell.

addN :: Int -> Q Exp
addN n = do
  xs <- replicateM n (newName "x")
  lamE (varP <$> xs) (foldM (\e v -> [| $e + $v |]) [| 0 |] xs)

n :: Int
n = $(addN 2) 1 2
-- n == 1 + 2
-- n == 3

Embedded Languages

The definition of a language consists of its syntax and semantics. The syntax of a language is the specification for well-formed programs in the language. The semantics of a language is a mapping from it’s syntax to computational behaviors.

Usually, the semantics of a language are represented as a function from the syntax of a higher-level language to the syntax of a lower-level langauge. For example, a compiled language (e.g. C++, Haskell) has a corresponding compiler which encodes semantics with a function, compile, from the language syntax to assembly code. Then the assembly code is compiled into executable bytecode. And finally hte executable bytecode is “compiled” into basic computational operations. Beyond this is an descent into the physical implementation of the computer, and further into the physical processes that make up the physical execution.

Metaprogramming slighly confuses this setup, however. A metaprogram is a program that can actually refer to the syntax of itself by representing the program as a string, and it can refer to the semantics of itself by calling the eval function. So, the hierarchy described in the previous section can have cycles.

Even without metaprogramming though, we can interface directly with this hierarchy via an embedded language. An embedded language is a language where its syntax and semantics are defined entirely within a higher-level language. A compiled language is almost like an embedded language within its compiler’s language, but it is not because the compiler’s language is not a (usually) a lower-level language than the compiled language. So, embedded languages are sort of like reverse compiled languages.

As a very simple example, here is a way of embedding the untyped lambda calculus within Haskell:

data Term = Var String | Lam String Term | App Term Term
  deriving (Show)

type Ctx = String -> Maybe Term

eval :: Ctx -> Term -> Maybe Term
eval ctx (Var x) = ctx x
eval ctx (Lam x b) = Just (Lam x b)
eval ctx (App (Lam x b) a) = eval (\y -> if x == y then b else ctx y) b
eval ctx (App _ _) = Nothing

example1 :: Maybe Term
example1 = eval (\_ -> Nothing) (App (Lam "x" (Var "x")) (Var "y"))
-- example1 = eval ((λ x . x) y)
-- example1 == Just (Var "y")

example2 :: Maybe Term
example2 = eval (\_ -> Nothing) (App (Var "x") (Var "y"))
-- example2 = eval (x y)
-- example2 == Nothing

example3 :: Maybe Term
example3 = eval (\_ -> Nothing) (App (Lam "x" (Var "y")) (Var "z"))
-- example3 = eval ((λ x . y) z)
-- example3 == Nothing

The datatype Term defined the syntax of the language, and the function eval defined the syntax of the language. Note that, since eval returns a Maybe Term, it can be considered partial on the entire syntax domain. That is, some syntactically-valid programs are not meaningful i.e. they don’t have semantics. For example, example2 tries to apply a non-lambda, and example3 refers to an unbound variable.

So, what’s the point of embedded languages? Two interesting use cases are the following:

  1. A language (or, at least, parts of it) can be embedded within itself to facilitate metaprogramming capabilities. Rather than just attempt to parse unstrucutred strings like JavaScript and Python’s eval, languages like Haskell (via Template Haskell) and C++ (via templates) offer a metaprogramming interface for manipulate structured representations of the language’s syntax. This makes writing and reasoning about metaprograms somewhat easier.
  2. A language of interest can be embedded within a proof assistant language, such as Agda or Coq, in order for the programmer to write proofs about the syntax and semantics of programs in the embedded language. This is useful for verifying properties of important programs in the embedded language, such as compilers.

In the following section, I will introduce another use case, mesaprogramming, which is a combination of these two use cases.

Mesaprogramming

The usual technique of metaprogramming (“above-programming”) is to provide a special interface for invoking code generation (e.g. templates as a built-in or library feature). The main issue with this is that it requires a new metaprogramming language to be built on top of the base language, which comes with complicated language features and usually little to none of the safety features of the original language. This is a serious issue because writing correct metaprograms is very difficult, especially when their output is not easily accessible.

Mesaprogramming (“below-programming”) is an alternative technique that can be implemented in any language, and is most powerful in proof assistant languages. Rather than write a metaprogramming language on top of the base language, the technique of mesaprogramming is to define an embedded mesaprogramming language inside of the base language as a formalization of a fragment of the base language (defining the entire language within itself comes with many issues: TODO).

For example, variable arity functions in Haskell:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeFamilies #-}

data Nat = Zero | Suc Nat

data SNat :: Nat -> * where
  SZero :: SNat Zero
  SSuc :: SNat n -> SNat (Suc n)

n0 :: SNat Zero
n0 = SZero

n1 :: SNat (Suc Zero)
n1 = SSuc SZero

n2 :: SNat (Suc (Suc Zero))
n2 = SSuc (SSuc SZero)

n3 :: SNat (Suc (Suc (Suc Zero)))
n3 = SSuc (SSuc (SSuc SZero))

type family EndoN (a :: *) (n :: Nat) :: * where
  EndoN a Zero = a
  EndoN a (Suc n) = a -> EndoN a n

-- given a (singleton for) natural number `n`, is an `n`-ary function that adds its inputs
addN :: SNat n -> EndoN Int n
addN SZero = 0
addN (SSuc SZero) = \x -> x
addN (SSuc (SSuc n)) = \x y -> addN (SSuc n) (x + y)

For example, TODO: example of metaprogramming in python