Nemerle type-safe macros

1. Intro

You can think of macros as a system of compile-time transformations and automatic code generation driven by a set of rules. They can be used to automate manipulations performed on similar data types and fragments of code, to add syntax shortcuts to the language, and to optimize computations or make them safer by moving them from run time to compile time.

The idea of performing simple inline operations on code comes from preprocessor macros, which many languages (especially C and C++) have had since the early days of compiler design. We follow them in the direction of much more powerful, and at the same time more secure (type-safe), solutions such as template meta-programming for Haskell (Template Haskell).

2. Key features

3. What exactly is a macro?

Basically, every macro is a function that takes one or more fragments of code as parameters and returns some other code. At the highest level it doesn't matter whether these fragments are type definitions, function calls or just a sequence of assignments. The most important fact is that they are not ordinary objects (e.g. instances of defined types, such as integers), but the compiler's internal representation of code (i.e. syntax trees).

These functions are defined in a program just like any other functions. They are written in ordinary Nemerle syntax, and the only difference is the structure of the data they operate on (we provide special constructs for decomposing and generating syntax trees).

Macros, once defined, can be used to process some part of the code. This is done by calling them with blocks of code as parameters. In most cases such a call is indistinguishable from an ordinary function call, so a programmer using macros won't be confused by unfamiliar syntax. The main concept of our design is to make the use of macros as transparent as possible. From the user's point of view, it is not important whether particular parameters are passed to an ordinary function or to one that processes them at compile time and inserts some new code in their place.

4. Defining a new macro

Writing a macro is as simple as writing an ordinary function, except that it is preceded by the keyword macro. This tells the compiler how to use the defined method (i.e. to run it at compile time wherever it is called).

Macros can take zero (if we just want to generate new code) or more parameters. Each parameter is an element of the language grammar, so its type is limited to the set of defined syntax objects. The same holds for the return value of a macro.

Example:

macro generate_expression ()
{
  compute_some_expression ();
}

This example macro doesn't take any parameters and is used in code by simply writing generate_expression ();. The most important point is the difference between generate_expression and compute_some_expression: the former is a function executed by the compiler at compile time, while the latter is just an ordinary function that must return the syntax tree of an expression (which generate_expression then returns, and which is inserted into the program's code).

5. Operating on syntax trees

The definition of the function compute_some_expression might look like this:

compute_some_expression () : Expr 
{
  if (debug_on) 
    <[ System.Console.WriteLine ("Hello, I'm debug message") ]>
  else
    <[ () ]>
}

The example above shows a macro that conditionally inlines an expression printing a message. It's not very useful yet, but it introduces the idea of compile-time computation as well as some new syntax used only when writing macros. Here we have used the <[ ... ]> constructor to build the syntax tree of an expression (e.g. '()').

5.1. Quotation operator

<[ ... ]> is used both to construct and to decompose syntax trees. These operations are similar to quoting code. Simply put, everything written inside <[ ... ]> stands for its own syntax tree. It can be any valid Nemerle code, so the programmer doesn't have to learn the compiler's internal representation of syntax trees.

macro print_date (at_compile_time)
{                   
  match (at_compile_time) {
    | <[ true ]> => print_compile_time ()
    | _ => <[ WriteLine (DateTime.Now.ToString ()) ]>
  }
}

Quotation alone allows only constant expressions, which is insufficient for most tasks. For example, to write the function print_compile_time we must be able to create an expression based on a value known at compile time. In the next sections we introduce the rest of the macro syntax for operating on general syntax trees.

5.2. Matching subexpressions

When we want to decompose a larger piece of code (or, more precisely, its syntax tree), we must bind its smaller parts to variables. Then we can process them recursively or simply use them in any way we like to construct the result.

We can operate on entire subexpressions by writing $( ... ) or $ID inside the quotation operator <[ ... ]>. This binds the value of ID, or of the parenthesized expression, to the part of the syntax tree found at the corresponding position in the quotation.

macro for (init, cond, change, body)
{
  <[ 
    $init;
    def loop () : void {
      if ($cond) { $body; $change; loop() } 
      else ()
    };
    loop ()
  ]>
}

The above macro defines a function for, which is similar to the loop known from C. It can be used like this:

for (mutable i <- 0, i < 10, i <- i + 1, printf ("%d", i))

Later we show how to extend the language syntax so that for looks exactly as it does in C.
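
Splices work the same way on the pattern side, where $ID binds a subexpression instead of inserting one. The following is only a sketch in the notation of this section: the macro name swap_operands is hypothetical, and it assumes that a binary operator can be written directly inside a quotation pattern.

```nemerle
// Hypothetical macro: rewrites a - b into b - a at compile time.
macro swap_operands (e)
{
  match (e) {
    // $a and $b are bound to the syntax trees of the two operands
    | <[ $a - $b ]> => <[ $b - $a ]>
    // any other expression is returned unchanged
    | _ => e
  }
}
```

Under these assumptions, swap_operands (x - 1) would be replaced by the code 1 - x at compile time, and any argument that is not a subtraction would be left as it is.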

5.3. Base elements of grammar

Sometimes quoted expressions contain literals (strings, integers, etc.) and we want to operate on their values, not on their syntax trees. This is possible because they are constant expressions, so their run-time value is already known at compile time.

Let's consider the previously used function print_compile_time.

print_compile_time () : Expr
{                   
  <[ System.Console.WriteLine ($(DateTime.Now.ToString () : string)) ]>
}

Here we see a new extension of the splicing syntax, in which we create the syntax tree of a string literal from a known value. This is done by adding : string inside the $(...) construct. One can think of it as enforcing the type of the spliced expression to be a literal (similar to ordinary Nemerle type enforcement), but as a matter of fact something more is happening here: a real value is lifted to its representation as the syntax tree of a literal.

Other types of literals (int, bool, float, char) are treated in the same way. This notation can also be used in pattern matching, which lets us match constant values in expressions.
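
As an illustration of both directions (matching a literal in a pattern and lifting a computed value back into code), here is a sketch in the same notation. The macro name twice is hypothetical and not part of any library.

```nemerle
// Hypothetical macro: doubles its argument, folding the constant case at compile time.
macro twice (e)
{
  match (e) {
    // e is an integer literal: n is bound to its value (an int, not a syntax tree),
    // so n * 2 is computed by the compiler and lifted back to a literal
    | <[ $(n : int) ]> => <[ $(n * 2 : int) ]>
    // otherwise generate code that adds at run time (note: e appears, and is evaluated, twice)
    | _ => <[ $e + $e ]>
  }
}
```

With this definition, twice (21) would expand to the literal 42, while twice (f ()) would expand to f () + f ().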

There is also a similar scheme for splicing and matching variables with a given name: $(v : var) denotes the variable whose name is equal to the value of v (which is of type string).
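
A small sketch of the var form, written as an ordinary helper function in the style of compute_some_expression. The name increment_variable is hypothetical, and the assignment operator <- follows the for example above.

```nemerle
// Hypothetical helper: builds the syntax tree of  NAME <- NAME + 1
// for the variable whose name is given as a string.
increment_variable (name : string) : Expr
{
  <[ $(name : var) <- $(name : var) + 1 ]>
}
```

Called from a macro as increment_variable ("i"), this would yield the syntax tree of i <- i + 1.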