Mesaprogramming
Published: 2021-08-24
Tags: computics
Abstract
In general, metaprogramming is implemented in a type-unsafe way i.e. it ignores type information in generated code. This is often considered satisfactory since type-checking is still performed at compile-time, so code generation cannot introduce new runtime errors that would be caught at compile-time by type-checking the generated code. However, a type-safe approach offers a much more robust way to provide expressive metaprogramming capabilities in a way that naturally parallels the use of the base language. For example. this approach extends naturally to safe tactics.
Table of Contents
Whenever I write "language" I implicitly mean "programming langauge".
Dependent Types
Metaprogramming
Metaprogramming is the design of programs that generate code
meant to be used at the same of a similarly high level of abstraction as
the metaprogram's language. For example, the popular languages Python
and Javascript each provide a built-in function called eval
that takes as input a string and then tries to interpret that
string as a piece of code in the respective language (here, "interpret"
refers to immediately executing the code a high level rather than first
compiling that code to a low-level representation, usually C or
Assembly).
Metaprogramming is the ultimate abstraction, since (almost) any pattern of code at all can be generated by a metaprogram of the same or smaller size (asymptotically, at the very least). However, metaprogramming can also yield some severe side-effects.
- Generated code is not directly visible while programming, so there is an additional layer of obfuscation between programming and feedback.
- Metaprograms usually only have limited access to static information,
which inferred by the compiler, about the programs to be generated. On
top of this, metaprograms are not usually checked to make sure they
generate valid programs. Python and Javascript's
eval
function know that their argument is a string and nothing more!
For example, Template Haskell provides a monad
Q :: * -> *
which is the type of computations that
produce Haskell code. Here is a metaprogram that, at compile-time,
generates a function that sums n
arguments. Without
metaprogramming, there is no basic functionality for variable-arity
functions in Haskell.
addN :: Int -> Q Exp
= do
addN n <- replicateM n (newName "x")
xs <$> xs) (foldM (\e v -> [| $e + $v |]) [| 0 |] xs)
lamE (varP
n :: Int
= $(addN 2) 1 2
n -- n == 1 + 2
-- n == 3
Embedded Languages
The definition of a language consists of its syntax and semantics. The syntax of a language is the specification for well-formed programs in the language. The semantics of a language is a mapping from it's syntax to computational behaviors.
Usually, the semantics of a language are represented as a function
from the syntax of a higher-level language to the syntax of a
lower-level langauge. For example, a compiled language (e.g. C++,
Haskell) has a corresponding compiler which encodes semantics with a
function, compile
, from the language syntax to assembly
code. Then the assembly code is compiled into executable bytecode. And
finally hte executable bytecode is "compiled" into basic computational
operations. Beyond this is an descent into the physical implementation
of the computer, and further into the physical processes that make up
the physical execution.
Metaprogramming slighly confuses this setup, however. A metaprogram
is a program that can actually refer to the syntax of itself by
representing the program as a string, and it can refer to the semantics
of itself by calling the eval
function. So, the hierarchy
described in the previous section can have cycles.
Even without metaprogramming though, we can interface directly with this hierarchy via an embedded language. An embedded language is a language where its syntax and semantics are defined entirely within a higher-level language. A compiled language is almost like an embedded language within its compiler's language, but it is not because the compiler's language is not a (usually) a lower-level language than the compiled language. So, embedded languages are sort of like reverse compiled languages.
As a very simple example, here is a way of embedding the untyped lambda calculus within Haskell:
data Term = Var String | Lam String Term | App Term Term
deriving (Show)
type Ctx = String -> Maybe Term
eval :: Ctx -> Term -> Maybe Term
Var x) = ctx x
eval ctx (Lam x b) = Just (Lam x b)
eval ctx (App (Lam x b) a) = eval (\y -> if x == y then b else ctx y) b
eval ctx (App _ _) = Nothing
eval ctx (
example1 :: Maybe Term
= eval (\_ -> Nothing) (App (Lam "x" (Var "x")) (Var "y"))
example1 -- example1 = eval ((λ x . x) y)
-- example1 == Just (Var "y")
example2 :: Maybe Term
= eval (\_ -> Nothing) (App (Var "x") (Var "y"))
example2 -- example2 = eval (x y)
-- example2 == Nothing
example3 :: Maybe Term
= eval (\_ -> Nothing) (App (Lam "x" (Var "y")) (Var "z"))
example3 -- example3 = eval ((λ x . y) z)
-- example3 == Nothing
The datatype Term
defined the syntax of the language,
and the function eval
defined the syntax of the language.
Note that, since eval
returns a Maybe Term
, it
can be considered partial on the entire syntax domain. That is, some
syntactically-valid programs are not meaningful i.e. they don't have
semantics. For example, example2
tries to apply a
non-lambda, and example3
refers to an unbound variable.
So, what's the point of embedded languages? Two interesting use cases are the following:
- A language (or, at least, parts of it) can be embedded within itself
to facilitate metaprogramming capabilities. Rather than just attempt to
parse unstrucutred strings like JavaScript and Python's
eval
, languages like Haskell (viaTemplate Haskell
https://wiki.haskell.org/Template_HaskellPlaceholder description for https://wiki.haskell.org/Template_Haskell) and C++ (via
templates
https://docs.microsoft.com/en-us/cpp/cpp/templates-cppPlaceholder description for https://docs.microsoft.com/en-us/cpp/cpp/templates-cpp) offer a metaprogramming interface for manipulate structured representations of the language's syntax. This makes writing and reasoning about metaprograms somewhat easier.
- A language of interest can be embedded within a proof assistant
language, such as Agda or Coq, in order for the programmer to write
proofs about the syntax and semantics of programs in the embedded
language. This is useful for verifying properties of important programs
in the embedded language, such as
compilers
https://compcert.orgPlaceholder description for https://compcert.org.
In the following section, I will introduce another use case, mesaprogramming, which is a combination of these two use cases.
Mesaprogramming
The usual technique of metaprogramming ("above-programming") is to provide a special interface for invoking code generation (e.g. templates as a built-in or library feature). The main issue with this is that it requires a new metaprogramming language to be built on top of the base language, which comes with complicated language features and usually little to none of the safety features of the original language. This is a serious issue because writing correct metaprograms is very difficult, especially when their output is not easily accessible.
Mesaprogramming ("below-programming") is an alternative technique that can be implemented in any language, and is most powerful in proof assistant languages. Rather than write a metaprogramming language on top of the base language, the technique of mesaprogramming is to define an embedded mesaprogramming language inside of the base language as a formalization of a fragment of the base language (defining the entire language within itself comes with many issues: TODO).
For example, variable arity functions in Haskell:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeFamilies #-}
data Nat = Zero | Suc Nat
data SNat :: Nat -> * where
SZero :: SNat Zero
SSuc :: SNat n -> SNat (Suc n)
n0 :: SNat Zero
= SZero
n0
n1 :: SNat (Suc Zero)
= SSuc SZero
n1
n2 :: SNat (Suc (Suc Zero))
= SSuc (SSuc SZero)
n2
n3 :: SNat (Suc (Suc (Suc Zero)))
= SSuc (SSuc (SSuc SZero))
n3
type family EndoN (a :: *) (n :: Nat) :: * where
EndoN a Zero = a
EndoN a (Suc n) = a -> EndoN a n
-- given a (singleton for) natural number `n`, is an `n`-ary function that adds its inputs
addN :: SNat n -> EndoN Int n
SZero = 0
addN SSuc SZero) = \x -> x
addN (SSuc (SSuc n)) = \x y -> addN (SSuc n) (x + y) addN (
For example, TODO: example of metaprogramming in python