Multi-Stage Programming
Evaluating Programs
In the previous section,
we saw how to construct and compose code fragments using code quasiquotes.
It is possible to evaluate or execute the resulting code at runtime.
One way of doing this is to use the run
function:
scala> val cde = { val a = code"2" ; code"($a + $a) * 2" }
cde: IR.ClosedCode[Int] = code"(2).+(2).*(2)"
scala> cde.run
res0: Int = 8
This Squid functionality uses Java runtime reflection
to interpret the constructed program and return its value.
In order to avoid the overhead associated with interpretation,
it is also possible to compile the code at runtime
before executing the resulting Java bytecode,
which will be as fast executing normally-compiled code.
This is done via the compile
method:
scala> cde.compile
res1: Int = 8
Note: Squid can also transform and optimize programs at compile time, which avoids the overhead of runtime compilation (and avoids having the Scala compiler as a runtime dependency!). This possibility is explained in the Static Optimizers section of the Squid reference on transformers.
Staging The Power Function
The features we have seen allow what is commonly known as multi-stage programming (MSP), whereby we separate the execution of a program into distinct code-generation phases or stages. Each stage executes some parts of the program that are known at this stage, and delays the execution of the rest by emitting code for it. The last stage consists in a program much simpler and more efficient than the original, where most abstractions have typically been removed.
The canonical example of MSP is that of the power function.
We start from a normal, inefficient implementation of the power function for Double
precision numbers,
which proceeds by recursion:
scala> def power(n: Int, x: Double): Double = {
| if (n > 0) power(n-1, x) * x
| else 1.0
| }
power: (n: Int, x: Double)Double
scala> power(2, 3)
res2: Double = 9.0
And then, by simply adding staging annotations (in the form of Code
types and code
quasiquotes)
to indicate what computations should be delayed
as opposed to executed immediately (i.e., in the “current stage”),
we construct a code generator ― given any program fragment of type Double
,
we construct a sequence of multiplications of that number:
scala> def power(n: Int, x: ClosedCode[Double]): ClosedCode[Double] = {
| if (n > 0) code"${power(n-1, x)} * $x"
| else code"1.0"
| }
power: (n: Int, x: IR.Predef.ClosedCode[Double])IR.Predef.ClosedCode[Double]
scala> val pgrm = power(3, code"0.5")
pgrm: IR.Predef.ClosedCode[Double] = code"1.0.*(0.5).*(0.5).*(0.5)"
scala> pgrm.compile
res3: Double = 0.125
The example above is not very convincing,
because generating the code of pgrm
and then compiling it will certainly be slower
than if we had called the original (unstaged) power function!
In fact, the whole point of MSP is to generate and compile pieces of code that can be reused many times,
in order to amortize the cost of runtime compilation.
What we really want here is to use power
to generate partially-evaluated functions
that efficiently compute the power of any number, given a fixed exponent.
Generating Specialized Power Implementations
For any given exponent n
, for any given exponent n
we’d like to generate the code of a function
(x: Double) => x * x * ... [n times] ... * x
.
Let us try to do that naively for exponent 3
:
scala> val pow3 = code"(x: Double) => ${ power(3, code"x") }"
<console>:17: error: Embedding Error: Quoted expression does not type check: not found: value x
val pow3 = code"(x: Double) => ${ power(3, code"x") }"
^
It does not work, because x
is undefined in code"x"
,
even though it looks like we’re binding it in the outer quote.
This is a limitation of string interpolation macros in Scala.
To fix this, we need to escape the inner insertion and quotation, as follows
(due to macro limitations,
this code will not work if you defined power
in the REPL,
but it will work in a normal Scala file):
val pow3 = code"""(x: Double) => $${ power(3, code"x") }"""
Notice the use of triple-quotation """
to escape the inner quotation,
and the double-dollar sign $$
to escape the inner insertion.
On the other hand, no escapes are needed in the alternative quasicode syntax (which also works better in the REPL):
import IR.Quasicodes._
scala> val pow3 = code{ (x: Double) => ${ power(3, code{x}) } }
<console>:20: error: type mismatch;
found : IR.Code[Double,x.type]
required: IR.Predef.ClosedCode[Double]
(which expands to) IR.Code[Double,Any]
val pow3 = code{ (x: Double) => ${ power(3, code{x}) } }
^
Our code still does not work, though!
The reason is given in the error message:
our power
function expects to receive an argument of type ClosedCode[Double]
,
but we pass it a different type, Code[Double,x.type]
…
This was to be expected, since the term code"x"
is obviously not “closed”
— it contains a free variable x
, which is only bound in the outer quote.
If code"x"
was to be assigned type ClosedCode[Int]
,
Squid’s type system would become unsound as
we could easily crash the program,
for example by calling code"x".run
or by storing code"x"
in a mutable variable
and reusing it later, outside of the outer quote where x
is bound.
Thankfully, the fix is easy:
we need to make our power
function parametric in the context that it accepts:
def power[C](n: Int, x: Code[Double,C]): Code[Double,C] = {
if (n > 0) code"${power(n-1, x)} * $x"
else code"1.0"
}
We can now construct our pow3
function:
scala> val pow3 = code{ (x: Double) => ${ power(3, code{x}) } }
pow3: IR.ClosedCode[Double => Double] = code"((x_0: scala.Double) => 1.0.*(x_0).*(x_0).*(x_0))"
Dealing with Context Types
In the previous subsection,
we saw an open term of type Code[Double,x.type]
,
where the free variable x
was bound in some outer quote.
The singleton x.type
is used as a “phantom type”
to indicate a dependency on the corresponding variable.
But what happens if there are several free variables in the term?
In general, a context type is made of an intersection of context requirements,
where a context requirement can be an abstract type or type parameter C
or a specific variable requirement like x.type
.
For example, Code[Double, C & x.type & y.type]
is the type of a code fragment whose context requirement is C
augmented
with the requirements for local variables x
and y
,
where &
denotes type intersections
(equivalent to A with B
in legacy Scala syntax).
The &
notation can be imported from squid.utils
.
Here is an example where the type Code[Double, C & x.type & y.type]
comes up:
import squid.utils.&
def foo[C](ls: Code[List[Int],C]) = code {
${ls}.map(_.toString).fold("")((x, y) => ${
val res: Code[String, C & x.type & y.type] = code{x + ", " + y}
code{ println("Folding: " + y); ${res} }
})
}
Calling foo(code"List(1,2,3)")
will result in a code fragment
that maps and folds List(1,2,3)
,
which can be verified using the term equivalence method =~=
as below:
scala> foo(code"List(1,2,3)") =~= code"""
| List(1,2,3).map(_.toString).fold("")((x, y) => {
| println("Folding: " + y); x + ", " + y })"""
res4: Boolean = true
The Code
class is contravariant in its Ctx
type argument,
so that a term with a context C
can be used as a term with a more specific context C' <: C
.
For example, we have:
ClosedCode[Int] = Code[Int,Any] <: Code[Int, v.type] <: Code[Int, v.type & w.type] <: OpenCode[Int] = Code[Int,Nothing]
Note:
The current Scala compiler has an arbitrary limitation that prevents users
from writing x.type
if the type of x
is not a subtype of AnyRef
,
for example if we have x: Int
.
To work around this, Squid provides a singleton.scope
macro utility,
which is used as follows:
import squid.utils.typing.singleton.scope
scala> code{ (x: Int) => ${ val c: Code[Int, scope.x] = code{x + 1}; c } }
res5: IR.ClosedCode[Int => Int] = code"((x_0: scala.Int) => x_0.+(1))"
Last Words on MSP
Beyond the simplistic power function examples, MSP becomes truly useful when we start applying it to the construction of non-trivial, highly-polymorphic programs for which manual specialization would be too tedious or unfeasible to maintain.
Because it is based on runtime program generation and compilation, MSP allows the program generation process itself to depend on runtime input values, which enables entirely new patterns of performance-oriented software engineering.
In conclusion, MSP allows programmers to design abstract yet performant programs, sparing them the burden of having to write and maintain lots of boilerplate and repetitive code. Programmers can leverage generative techniques instead, with solid static guarantees that the generated code will be well-formed (well-typed and well-scoped).
Epilogue: Don’t Want to Track Contexts Statically?
Tracking contexts in the type system provides complete scope-safety for Squid metaprograms,
but can make metaprogramming more verbose and difficult.
Squid also supports a style of programming where one only manipulates OpenCode
fragments,
and where dynamic checks are used to convert OpenCode
to ClosedCode
when needed.
For example, below we redefine the power
function in this style:
def pow(n: Int)(v: OpenCode[Double]): OpenCode[Double] = {
var cde = if (n == 0) code"1.0" else v
for (i <- 1 until n) cde = code"$cde * $v"
cde
}
def mkPow(n: Int) = code"(x: Double) => ${pow(n) _}(x)"
In the definition of mkPow
above, we used automatic function lifting,
which converts an OpenCode[A] => OpenCode[B]
to an OpenCode[A => B]
on insertion into a program fragment.
We can then try to close
the result of mkPow(5)
at runtime, returning an Option[ClosedCode[Double => Double]]
:
scala> val pow5 = mkPow(5).close
pow5: Option[IR.ClosedCode[Double => Double]] = Some(code"((x_0: scala.Double) => x_0.*(x_0).*(x_0).*(x_0).*(x_0))")
scala> val power5 = pow5.get.compile
power5: Double => Double = __wrapper$5$9da5fc18fedb41aeab33d4a83f700f69.__wrapper$5$9da5fc18fedb41aeab33d4a83f700f69$$$Lambda$6239/393317394@24f747a4
scala> power5(1.5)
res6: Double = 7.59375
Naturally, trying to close an open term will fail and return None
.
If you are interested in learning a more advanced technique for manipulating free variables and open terms, read on. Otherwise, you can stop reading now!
Advanced Topic: First-Class Variable Symbols
In addition to often being unsafe in several contexts (see our paper [3]), traditional approaches to MSP such as MetaML and Dotty’s new quoted terms do not provide a way to explicitly manipulate variable bindings.
In Squid, not only is scope safety ensured statically using the type system, but we can manipulate variables explicitly (which will become especially useful when we start matching code fragments in the next section). This is the topic of this section.
The Variable Type
Squid allows the manipulation of first-class variable symbols,
of type Variable[T]
.
Each Variable
instance is unique,
and can be used to hygienically bind and reference named values in the constructed program.
For example, the following openCde
fragment contains an unbound reference to variable v
:
scala> val v = Variable[Int]
v: IR.Variable[Int] = Variable[Int](v@17e3a378)
scala> val openCde = code"println($v + 1)"
openCde: IR.Code[Unit,v.Ctx] = code"scala.Predef.println(v@17e3a378.+(1))"
Notice the type of openCde
, as printed in the REPL:
it is no longer a ClosedCode
, but a Code
type with a context!
This is explained in the next section.
We can now bind the v
reference in openCde
to a value definition,
and run the resulting code:
scala> val closedCde = code"val $v = 123; $openCde"
closedCde: IR.ClosedCode[Unit] =
code"""{
val v_0 = 123;
scala.Predef.println(v_0.+(1))
}"""
scala> closedCde.run
124
Note: By default, the name of a first-class variable
(as it will appear in the generated program) is inferred
from the name of the variable it is assigned to
(like “v
” above).
It is also possible to pass a name explicitly
with syntax Variable[T]("myName")
.
However,
variable names have no impact on the semantics of program manipulation,
and are just there to help debugging.
For example, two distinct variables with the same name will not be mixed up:
scala> val v0, v1 = Variable[Int]("myName") // v0 and v1 distinct but share same name
v0: IR.Variable[Int] = Variable[Int](myName@557e4c48)
v1: IR.Variable[Int] = Variable[Int](myName@7324dc07)
scala> code"($v0: Int) => ($v1: Int) => $v0 + $v1"
res8: IR.ClosedCode[Int => (Int => Int)] = code"((myName_0: scala.Int) => ((myName_1: scala.Int) => myName_0.+(myName_1)))"
Notice that the code generated by Squid in the above example contains distinct binders for the two distinct variables we used, and that there is no unintended variable capture (shadowing). This exemplifies a property of code manipulation systems known as hygiene.
Contexts and Open Terms
Open terms are terms that contain unbound variable references (i.e., free variables).
Squid quasiquotes disallow the use of undeclared variables;
for example, code"x + 1"
is illegal, because x
is unbound.
However, unbound references to first-class variable may be inserted in code fragments,
which results in open terms, like openCde
as defined in the previous section.
Squid is a scope-safe metaprogramming framework, in that it statically keeps track of open terms, making sure that they can never be mistaken for closed ones. For example, we cannot mistakenly try to evaluate an open term:
scala> openCde.run
<console>:23: error: Cannot prove that v.Ctx =:= Any.
openCde.run
^
In order to keep track of what free variables are contained in a term,
Squid code fragments have type Code[T,C]
,
where the second type parameter C
represents the context requirement of the term.
Context requirements are expressed as type intersections
where each component represents one set of context requirement.
In Scala, an empty type intersection corresponds to type Any
;
indeed, remember
that ClosedCode[T]
is a type synonym equivalent to Code[T,Any]
.
You can read it as: a ClosedCode
term is valid to use in any context.
The context requirement associated with a variable symbol v
is v.Ctx
,
where Ctx
is an abstract type member of v
(a path-dependent types),
which means that each instance of Variable
contains its own unique Ctx
type,
that cannot be mixed up with the Ctx
from an other instance.
The Code
class is contravariant in its Ctx
type argument,
so that a term with a context C
can be used as a term with a more specific context C' <: C
.
For example, we have:
ClosedCode[Int] = Code[Int,Any] <: Code[Int, v.Ctx] <: Code[Int, v.Ctx & v1.Ctx] <: OpenCode[Int] = Code[Int,Nothing]
where A & B
is the type intersection of A
and B
(which is equivalent to A with B
in legacy Scala syntax).
Free Variable Substitution
One can replace all the free occurrences of a variable v
in an open term t
by using the subs
syntax below:
scala> val t = code"$v0 + $v1 + 1"
t: IR.Code[Int,v1.Ctx with v0.Ctx] = code"myName@557e4c48.+(myName@7324dc07).+(1)"
scala> val s = t.subs(v0) ~> code"42"
s: IR.Code[Int,v1.Ctx] = code"(42).+(myName@7324dc07).+(1)"
“Renaming” a free variable v0
to v1
is achieved by
substituting free occurrences of v0
with code references to v1
, as in:
scala> t.subs(v0) ~> v1.toCode
res10: IR.Code[Int,v1.Ctx] = code"myName@7324dc07.+(myName@7324dc07).+(1)"
Note: syntax v.toCode
converts a variable v
of type Variable[T]
to a code fragment of type Code[T]
.
It is equivalent to code"$v"
.
These operations will turn out to be crucial in the Code Rewritings section of this tutorial.
Back to the Power Function
We saw that the power
function defined above, when partially applied, yields a code generator which
takes a program of type Double
and returns a program of the same type:
scala> val p3 = power(3, _ : ClosedCode[Double])
p3: IR.Predef.ClosedCode[Double] => IR.Predef.Code[Double,Any] = $$Lambda$6277/2050269171@561544a0
scala> p3(code"2.0")
res11: IR.Predef.Code[Double,Any] = code"1.0.*(2.0).*(2.0).*(2.0)"
What we would now like to have is a term of type ClosedCode[Double => Double]
,
that we can compile and execute efficiently.
To do this, we have to pass a variable reference to power
instead of a closed term,
and to allow for this, we must make power
polymorphic
in the context its base
parameter (the body of the function does not need to change):
def power[C](n: Int, x: Code[Double,C]): Code[Double,C] = {
if (n > 0) code"${power(n-1, x)} * $x"
else code"1.0"
}
We can now construct function, say for exponent 5
:
scala> val pow5 = { val x = Variable[Double]; code"{ $x => ${power(5, x.toCode)} }" }
pow5: IR.ClosedCode[Double => Double] = code"((x_0: scala.Double) => 1.0.*(x_0).*(x_0).*(x_0).*(x_0).*(x_0))"
Squid actually has some syntax sugar to allow writing the above as:
val pow5 = code"(x: Double) => ${(x:Variable[Double]) => power(5, x.toCode)}"
(It would be easy to perform some rewriting after the fact to remove the useless 1.0 *
from the generated code,
or even to partially evaluate it away automatically using an online transformer
(like in this example).
For more details on this subject, see the
documentation on transformers.)
We can now generate on the fly efficient code for calculating the 5th power of any Double
:
scala> val power5 = pow5.compile
power5: Double => (Double => Double) = __wrapper$6$1fa75d9a912841cf9cc36cd3294daaf5.__wrapper$6$1fa75d9a912841cf9cc36cd3294daaf5$$$Lambda$6303/1672054098@22c98c0
scala> power5(1.5)
res12: Double => Double = __wrapper$6$1fa75d9a912841cf9cc36cd3294daaf5.__wrapper$6$1fa75d9a912841cf9cc36cd3294daaf5$$$Lambda$6304/1902345300@191832f2
The last line will execute the function power5
,
which is really just as fast as if we had written out:
val power5 = (x: Double) => 1.0 * x * x * x * x * x