8 Memory Management Via Regions
8.1 Introduction
C gives programmers complete control over how memory is managed. An
expert programmer can exploit this to write very fast programs.
However, bugs that creep into memory-management code can cause
crashes and are notoriously hard to debug.
Languages like Java and ML use garbage collectors instead of leaving
memory management in the hands of ordinary programmers. This makes
memory management much safer, since the garbage collector is written
by experts and is used, and therefore debugged, by every
program. However, removing memory management from the control of the
applications programmer can make for slower programs.
Safety is the main goal of Cyclone, so we provide a garbage collector.
But, like C, we also want to give programmers as much control over
memory management as possible, without sacrificing safety. Cyclone's
region system is a way to give programmers more explicit control over
memory management.
In Cyclone, objects are placed into regions. A region is simply an
area of memory that is allocated and deallocated all at once (but not for
our two special regions; see below). So to deallocate an object, you
deallocate its region, and when you deallocate a region, you deallocate all
of the objects in the region. Regions are sometimes called ``arenas'' or
``zones.''
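To make the "all at once" idea concrete, here is a minimal sketch in plain C (not Cyclone) of a fixed-capacity arena. The names arena, arena_init, arena_alloc, and arena_release are hypothetical and chosen only for illustration; Cyclone's actual runtime is not shown here and may be organized quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Used only to pick a rounding unit that satisfies common alignments. */
typedef union { long long ll; long double ld; void *p; } arena_align_t;

/* Hypothetical model of a region: one backing buffer, bump allocation,
   and a single release that frees every object at once. */
struct arena {
    unsigned char *base;  /* start of the backing buffer */
    size_t used;          /* bytes handed out so far */
    size_t cap;           /* total bytes available */
};

/* Returns 1 on success, 0 if the backing buffer cannot be allocated. */
int arena_init(struct arena *a, size_t cap) {
    assert(a != NULL);
    a->base = malloc(cap);
    a->used = 0;
    a->cap = (a->base != NULL) ? cap : 0;
    return a->base != NULL;
}

/* Bump-allocates n bytes at an aligned offset; returns NULL when full. */
void *arena_alloc(struct arena *a, size_t n) {
    assert(a != NULL);
    if (a->base == NULL) return NULL;
    size_t align = sizeof(arena_align_t);
    size_t start = (a->used + align - 1) / align * align;
    if (start > a->cap || n > a->cap - start) return NULL;
    a->used = start + n;
    return a->base + start;
}

/* There is no per-object free: releasing the arena frees everything. */
void arena_release(struct arena *a) {
    assert(a != NULL);
    free(a->base);
    a->base = NULL;
    a->used = 0;
    a->cap = 0;
}
```

The sketch deliberately has no way to free a single object, which mirrors the rule that an object is deallocated only when its region is.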
Cyclone has five kinds of regions:
- Block regions
- As in C, local variables are allocated on the
runtime stack; the stack grows when a block is entered, and it
shrinks when the block exits. We call the area on the stack
allocated for the local variables of a block the block region
of the block. A block region has a fixed size---it is just large
enough to hold the locals of the block, and no more objects can be
placed into it. The region is deallocated when the block containing
the declarations of the local variables finishes executing. With
respect to regions, the parameters of a function are considered
locals---when a function is called, its actual parameters are placed
in the same block region as the variables declared at the start of
the function.
- Growable regions
- Cyclone also has growable regions,
which are regions that you can add objects to over time. You create
a growable region in Cyclone with a statement,
region identifier; statement
This declares and allocates a new growable region, named
identifier, and executes statement. After
statement finishes executing, the region is deallocated.
Within statement, objects can be added to the region, as we
will explain below.
Typically, statement is a compound statement:
{ region identifier;
statement1
...
statementn
}
- The heap
- Cyclone has a special region called the heap.
There is only one heap, and it is never deallocated. New objects
can be added to the heap at any time (the heap can grow). Cyclone
uses a garbage collector to automatically remove objects from the
heap when they are no longer needed. You can think of garbage
collection as an optimization that tries to keep the size of the
heap small. (Alternatively, you can avoid garbage collection
altogether by specifying the -nogc flag when building the
executable.)
- Dynamic regions
- Block and growable regions obey a strictly
last-in-first-out (LIFO) lifetime discipline. This is often
convenient for storing temporary data, but sometimes, the lifetime
of data cannot be statically determined. Such data can be allocated
in a dynamic region. A dynamic region supports deallocation
at (essentially) any program point. However, before the data in a
dynamic region may be accessed, the dynamic region must be opened.
The open operation fails by throwing an exception if the dynamic
region has already been freed. Note that each data access within
a dynamic region does not require a check. Rather, you can open
a given dynamic region once, access the data many times with no
additional cost, and then exit the scope of the open. Thus,
dynamic regions amortize the cost of checking whether or not data
are still live and localize failure points.
- The unique region
- Cyclone has another special region called the
unique region. The unique region type is `U, meaning that
its pointers (so-called unique pointers) look like,
e.g. int *`U. The unique region is like the heap in that it can
grow arbitrarily, it is never deallocated en masse, and it
uses garbage collection to free unreachable memory. In addition,
individual objects inside the unique region can be freed
explicitly using ufree. For freeing objects to be sound, we
impose restrictions on how pointers into the unique region can be used.
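The "open once, access many times" behavior described for dynamic regions can be modeled in plain C as follows. This is only a sketch under assumptions: the names dyn_region, dyn_create, dyn_open, dyn_free, and dyn_sum are hypothetical, the region holds a single int array, and a failed open returns NULL where Cyclone would throw an exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a dynamic region that holds one int array. */
struct dyn_region {
    bool live;    /* false once the region has been freed */
    int *data;
    size_t len;
};

/* Returns 1 on success, 0 on allocation failure. */
int dyn_create(struct dyn_region *d, size_t len) {
    assert(d != NULL);
    d->data = calloc(len, sizeof *d->data);
    d->live = (d->data != NULL);
    d->len = d->live ? len : 0;
    return d->live;
}

/* Frees the region at an arbitrary point; later opens will fail. */
void dyn_free(struct dyn_region *d) {
    assert(d != NULL);
    free(d->data);
    d->data = NULL;
    d->len = 0;
    d->live = false;
}

/* The single liveness check: NULL if the region was already freed. */
int *dyn_open(struct dyn_region *d) {
    assert(d != NULL);
    return d->live ? d->data : NULL;
}

/* Opens once, then reads every element with no further checks. */
bool dyn_sum(struct dyn_region *d, long *out) {
    assert(out != NULL);
    int *p = dyn_open(d);
    if (p == NULL) return false;   /* the only failure point */
    long s = 0;
    for (size_t i = 0; i < d->len; i++) s += p[i];
    *out = s;
    return true;
}
```

The liveness test happens in one place per open, so the loop body carries no per-access check, which is the amortization the text refers to.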
Objects outside of the heap and unique region live until their region is
deallocated; there is no way to free such an object earlier. Objects in the
heap or unique region can be garbage collected once they are unreachable
(i.e., they cannot be reached by traversing pointers) from the program's
variables. Objects in other regions always appear reachable to the garbage
collector (so everything reachable from them appears reachable as well).
Cyclone forbids following dangling pointers. This restriction is part of
the type system: it's a compile-time error if a dangling pointer (a pointer
into a deallocated region or to a deallocated object) might be followed.
There are no run-time checks of the form, ``is this pointing into a live
region?'' As explained below, each pointer type has a region and objects of
the type may only point into that region.
8.2 Allocation
You can create a new object on the heap using one of a few kinds of
expression:
- new expr evaluates expr, places the
result into the heap, and returns a pointer to the result. It is
roughly equivalent to
t @ temp = malloc(sizeof(t)); // where t is the type of expr
*temp = expr;
For example, new 17 allocates space for an integer on the
heap, initializes it to 17, and returns a pointer to the space. For
another example, if we have declared
struct Pair { int x; int y; };
then new Pair(7,9) allocates space for two integers on the
heap, initializes the first to 7 and the second to 9, and returns a
pointer to the first.
- new array-initializer allocates space for an
array, initializes it according to array-initializer, and
returns a pointer to the first element. For example,
let x = new { 3, 4, 5 };
declares a new array containing 3, 4, and 5, and initializes
x to point to the first element. More interestingly,
new { for identifier < expr1 : expr2 }
is roughly equivalent to
unsigned int sz = expr1;
t @ temp = malloc(sz * sizeof(t)); // where t is the type of expr2
for (int identifier = 0; identifier < sz; identifier++)
temp[identifier] = expr2;
That is,
expr1
is evaluated first to get the size of the new array,
the array is allocated, and each element of the array is
initialized by the result of evaluating
expr2.
expr2 may use identifier, which
holds the index of the element currently being initialized.
For example, this function returns an array containing the first
n positive even numbers:
int *@fat n_evens(int n) {
return new {for next < n : 2*(next+1)};
}
Note that:
- expr1 is evaluated exactly once, while expr2 is evaluated expr1 times.
- expr1 might evaluate to 0.
- expr1 might evaluate to a negative number.
If so, it is implicitly converted to a very large unsigned
integer; the allocation is likely to fail due to insufficient
memory. Currently, this causes a crash.
- Currently, for array initializers are the only way to
create an object whose size depends on run-time data.
- malloc(sizeof(type)). Returns a @notnull
pointer to an uninitialized value of type type.
- malloc(n*sizeof(type)) or
malloc(sizeof(type)*n). The type must be a bits-only
type (i.e., cannot contain pointers, tagged unions, zero-terminated
values, etc.) If n is a compile-time constant expression,
returns a @thin pointer with @numelts(n). If
n is not a compile-time constant, returns
a @fat pointer to the sequence of
n uninitialized values.
- calloc(n,sizeof(type)). Similar to
the malloc case above, but returns memory that is zeroed. Therefore,
calloc supports types that are bits-only or zero-terminated.
- malloc(e) where e is an expression not of one
of the above forms. If e is constant, returns a
char *@numelts(e)@nozeroterm; otherwise,
returns a char *@fat@nozeroterm.
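The array comprehension in the n_evens example above, new {for next < n : 2*(next+1)}, can be modeled in plain C along the lines of the "roughly equivalent" expansion. This is a sketch, not what the Cyclone compiler emits: the name n_evens_c and the out-parameter len are assumptions, and unlike the Cyclone form it rejects a non-positive or overflowing size up front instead of converting a negative size to a very large unsigned value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns a malloc'd array holding the first n positive even numbers and
   stores its length in *len. Returns NULL (with *len == 0) if n <= 0,
   if 2*n would overflow int, or if the allocation fails.
   The caller frees the result. */
int *n_evens_c(int n, size_t *len) {
    assert(len != NULL);
    *len = 0;
    if (n <= 0 || n > INT_MAX / 2) return NULL;   /* guard bad sizes */
    size_t sz = (size_t)n;                        /* size evaluated once */
    if (sz > SIZE_MAX / sizeof(int)) return NULL;
    int *temp = malloc(sz * sizeof *temp);
    if (temp == NULL) return NULL;
    for (size_t next = 0; next < sz; next++)
        temp[next] = 2 * ((int)next + 1);         /* body runs sz times */
    *len = sz;
    return temp;
}
```

The explicit len result plays the role of the bounds that a Cyclone @fat pointer carries with it.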
Unique pointers can be allocated just as with the heap, but the context must
make clear that a unique pointer is desired. For example, in the following
the variable temp is allocated in the heap:
t * temp = malloc(sizeof(t));
Modifying it slightly, we allocate into the unique region instead:
t *`U temp = malloc(sizeof(t));
t * temp2 = (t *`U)malloc(sizeof(t));
Unfortunately, our type inference system for allocation is overly simple, so
you can't do something like:
t * temp = malloc(sizeof(t));
ufree(temp);
In an ideal world, the fact that temp was passed to ufree
would signal that it is a unique pointer, rather than a heap pointer.
Objects can be created in a dynamic region using the following analogous
expressions.
- rnew(identifier) expr
- rnew(identifier) array-initializer
- rmalloc(identifier,sizeof(type))
- rmalloc(identifier,n*sizeof(type))
- rmalloc(identifier,sizeof(type)*n)
- rmalloc(identifier,e)
- rcalloc(identifier,n,sizeof(type))
Note that new, malloc, calloc,
rnew, rmalloc and rcalloc are keywords.
The Cyclone library has global variables Core::heap_region and
Core::unique_region which are handles for the heap and unique
regions, respectively. So, for example, new expr can be
replaced with rnew(heap_region) expr. We also define a macro
unew(expr) that expands to rnew(unique_region) expr.
The only way to create an object in a stack region is declaring it as
a local variable. Cyclone does not currently support salloc;
use a growable region instead.
8.3 Common Uses
Although the type system associated with regions is complicated, there are
some simple common idioms. If you understand these idioms, you should be
able to easily write programs using regions, and port many legacy C programs
to Cyclone. The next subsection will explain the usage patterns of unique
pointers, since they are substantially more restrictive than other pointers.
Remember that every pointer points into a region, and although the
pointer can be updated, it must always point into that same region (or
a region known to outlive that region). The region that the pointer
points to is indicated in its type, but omitted regions are filled in
by the compiler according to context.
When regions are omitted from pointer types in function bodies, the
compiler tries to infer the region. However, it can sometimes be too
``eager'' and end up rejecting code. For example, in
void f1(int x) {
int * y = new 42;
y = &x;
}
the compiler uses y's initializer to decide that y's type is
int * `H. Hence the assignment is illegal, because the parameter's
region (called `f1) does not outlive the heap. On the other
hand, this function type-checks:
void f2(int x) {
int * y = &x;
y = new 42;
}
because y's type is inferred to be int * `f2 and the
assignment makes y point into a region that outlives `f2. We
can fix our first function by being more explicit:
void f1(int x) {
int *`f1 y = new 42;
y = &x;
}
Function bodies are the only places where the compiler tries to infer
the region by how a pointer is used. In function prototypes, type
declarations, and top-level global declarations, the rules for the
meaning of omitted region annotations are fixed. This is necessary
for separate compilation: we often have no information other than the
prototype or declaration.
In the absence of region annotations, function-parameter pointers are
assumed to point into any possible region. Hence, given
void f(int * x, int * y);
we could call f with two stack pointers, a dynamic-region pointer and
a heap-pointer, etc. Hence this type is the ``most useful'' type from
the caller's perspective. But the callee's body (f) may not
type-check with this type. For example, x cannot be assigned to a
heap-pointer variable because we do not know that x points into the heap. If
this is necessary, we must give x the type int *`H. Other
times, we may not care what region x and y are in so long as they are
the same region. Again, our prototype for f does not indicate
this, but we could rewrite it as
void f(int *`r x, int *`r y);
Finally, we may need to refer to the region for x or y in the function
body. If we omit the names (relying on the compiler to make up
names), then we obviously won't be able to do so.
Formally, omitted regions in function parameters are filled in by
fresh region names and the function is ``region polymorphic'' over
these names (as well as all explicit regions).
In the absence of region annotations, function-return pointers are
assumed to point into the heap. Hence the following function will not
type-check:
int * f(int * x) { return x; }
Both of these functions will type-check:
int * f(int *`H x) { return x; }
int *`r f(int *`r x) {return x; }
The second one is more useful because it can be called with any
region.
In type declarations (including typedef for now) and
top-level variables, omitted region annotations are assumed to point
into the heap. In the future, the meaning of typedef may
depend on where the typedef is used. In the meantime, this
code will type-check because it is equivalent to the first of the two
type-checking functions in the previous example:
typedef int * foo_t;
foo_t f(foo_t x) { return x; }
If you want to write a function that creates new objects in a region
determined by the caller, your function should take a region handle as one
of its arguments. The type of a handle is
region_t<`r>, where `r is the region information
associated with pointers into the region. For example, this function
allocates a pair of integers into the region whose handle is r:
$(int,int)*`r f(region_t<`r> r, int x, int y) {
return rnew(r) $(x,y);
}
Notice that we used the same `r for the handle and the return
type. We could have also passed the object back through a pointer
parameter like this:
void f2(region_t<`r> r,int x,int y,$(int,int)*`r *`s p){
*p = rnew(r) $(x,y);
}
Notice that we have been careful to indicate that the region where
*p lives (corresponding to `s) may be different from
the region for which r is the handle (corresponding to
`r). Here's how to use f2:
{ region rgn;
$(int,int) *`rgn x = NULL;
f2(rgn,3,4,&x);
}
The `s and `rgn in our example are unnecessary
because they would be inferred.
typedef, struct, and datatype
declarations can all be parameterized by regions,
just as they can be parameterized by types. For example, here is part
of the list library.
struct List<`a,`r>{`a hd; struct List<`a,`r> *`r tl;};
typedef struct List<`a,`r> *`r list_t<`a,`r>;
// return a fresh copy of the list in r2
list_t<`a,`r2> rcopy(region_t<`r2> r2, list_t<`a> x) {
list_t result, prev;
if (x == NULL) return NULL;
result = rnew(r2) List{.hd=x->hd,.tl=NULL};
prev = result;
for (x=x->tl; x != NULL; x=x->tl) {
prev->tl = rnew(r2) List(x->hd,NULL);
prev = prev->tl;
}
return result;
}
list_t<`a> copy(list_t<`a> x) {
return rcopy(heap_region, x);
}
// Return the length of a list.
int length(list_t x) {
int i = 0;
while (x != NULL) {
++i;
x = x->tl;
}
return i;
}
The type list_t<type,rgn> describes
pointers to lists whose elements have type type and whose
``spines'' are in rgn.
The functions are interesting for what they don't say.
Specifically, when types and regions are omitted from a type
instantiation, the compiler uses rules similar to those used for
omitted regions on pointer types. More explicit versions of the
functions would look like this:
list_t<`a,`r2> rcopy(region_t<`r2> r2, list_t<`a,`r1> x) {
list_t<`a,`r2> result, prev;
...
}
list_t<`a,`H> copy(list_t<`a,`r> x) { ... }
int length(list_t<`a,`r> x) { ... }
8.4 Dynamic Regions
To be filled in, but see the tutorial for hints.
8.5 Using Unique Pointers
Note: Unique pointers are still under development and are
likely to change substantially by the next release. Therefore, we
discourage their use for now.
The main benefit of regions is also their drawback: to free data you must
free an entire region. This implies that to amortize the cost of creating a
region, one needs to allocate into it many times. Furthermore, the objects
allocated in a region should be mostly in use until the region is freed, or
else the region wastes memory that the program no longer uses.
For the cases in which neither situation holds, we can use the unique region
which allows unique pointers to be freed individually. To prevent dangling
pointers, a static analysis ensures that no unique pointer is aliased (i.e.,
the object is, in fact, uniquely pointed to) at the time it is freed;
however, we allow controlled forms of aliasing up until that point. In
particular, we have a primitive alias that allows pointers to be
aliased within the surrounding code block, and we use the syntax
a :=: b to allow two unique pointers a and b to be atomically
swapped. Careful use of the swap operator allows us to store unique
pointers in objects that are not themselves uniquely pointed to. Finally,
to properly deal with polymorphism, particularly when performing allocation,
we introduce new kinds for describing regions. In practice, all of these
mechanisms are necessary for writing useful and reusable code.
8.5.1 Simple Unique Pointers
Having a unique pointer ensures the object pointed to is not reachable by
any other means. When pointers are first allocated, e.g. using
malloc, they are unique. Such pointers are allowed to be
read through (that is, dereferenced or indexed) but not copied, as
the following example shows:
char c, *@fat`U buf = calloc(MAXBUF,sizeof(char));
int i = 0;
while ((c = getchar()) > 0 && !isspace(c)) {
buf[i++] = c;
}
printf("%s",buf);
ufree(buf);
This piece of code reads characters from stdin until a word is
formed, and then prints that word. Because the process of storing to the
buffer does not copy its unique pointer, it can be safely freed.
When a unique pointer is copied, e.g. when passed as a parameter to a
function or stored in a datastructure, we say it has been consumed.
We ensure that consumed pointers are not read through or copied via a
dataflow analysis. When a consumed pointer is assigned to, very often it
can be unconsumed, making it accessible again. Here is a simple
example that initializes a datastructure with unique pointers:
1 struct pair { int *`U x; int *`U y; } p;
2 int *`U x = new 1; // initializes x
3 p.x = x; // consumes x
4 x = new 2; // unconsumes x
5 p.y = x; // consumes x
If an attempt were made to read through or copy x between lines 3 and 4 or
after line 5, the flow analysis would reject the code, as in
int *`U x = new 1; // initializes x
p.x = x; // consumes x
p.y = x; // rejected! x has been consumed already
Note that if you fail to free a unique pointer, it will eventually be
garbage collected.
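Cyclone tracks consumption statically, at compile time. As a rough run-time analogue only, the struct pair example above can be written in plain C with a "move and null" helper, so that a consumed variable visibly holds nothing until it is assigned again. The names take and new_int are hypothetical, and nothing here performs the dataflow analysis that Cyclone does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct pair { int *x; int *y; };

/* Hypothetical helper: heap-allocates an int holding v (NULL on failure). */
int *new_int(int v) {
    int *p = malloc(sizeof *p);
    if (p != NULL) *p = v;
    return p;
}

/* Models "consuming" *src: ownership moves to the result and the source
   is left NULL, so it cannot be read through until reassigned. */
int *take(int **src) {
    assert(src != NULL);
    int *p = *src;
    *src = NULL;
    return p;
}
```

With these helpers, p.x = take(&x) corresponds to line 3 of the example, x = new_int(2) to line 4, and p.y = take(&x) to line 5. The difference is that misuse of a consumed x shows up as a NULL pointer at run time in C, whereas Cyclone rejects it at compile time.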
8.5.2 Aliasing Unique Pointers
Programmers often write code that aliases values temporarily, e.g. by
storing them in loop iterator variables or by passing them to functions.
Such reasonable uses would be severely hampered by ``no alias'' restrictions
on unique pointers. To address this problem, we introduce a primitive
called alias that permits temporary aliasing of a unique pointer,
provided that no aliases are live when the block completes.
Here is a simple example:
char *@fat`U dst, *@fat`U src = ...
{ alias <`r> x = (char *@fat`r)src; // consumes src
memcpy(dst,x,numelts(x)); }
// src unconsumed
...
ufree(src);
The alias primitive introduces a fresh region variable `r,
and aliases src with the variable x which is cast to point
into `r. This operation consumes src for the duration of
the surrounding block, and allows x to be freely aliased. As such,
we can pass x to the memcpy function, and when the block
exits, we unconsume and can therefore ultimately free src.
Intuitively, the alias operation is sound because we cast a unique
pointer to instead point into a fresh region, for which there is no
possibility of either creating new values or storing existing values into
escaping data structures. As such we cannot create aliases. However, we
must take care when aliasing data having recursive type. For example, the
following code is unsound:
void foo(list_t<`a,`U> l) {
alias <`r> x = (list_t<`a,`r>)l;
x->tl = x; // unsound: creates alias!
}
In this case, the alias effectively created many values in the
fresh region `r: one for each element of the list. This allows
storing an alias in an element reachable from the original expression
l, so that when the block is exited, this alias escapes.
To prevent this, we only allow ``deep'' aliasing when the aliased pointers
are immutable. For example, if we have a list structure whose tail pointers
are const, call it clist_t, we rule out the above code
because the assignment to x->tl would be forbidden. Here is an
example implementation of the length function using deep aliasing:
int length(list_t<int,`U> l) {
alias <`r> x = (clist_t<int,`r>)l;
int len = 0;
while (x != NULL) {
len++;
x = x->tl;
}
return len;
}
Note that this function is not that useful, since it will consume the list
l. Instead, we would rather that the function itself take a
clist_t pointer, and have the caller perform the alias, so that
the unique list could be unconsumed after calling length. It is on
our to-do list to change the standard libraries to be more
``unique-friendly'' in this way.
8.5.3 Nested Unique Pointers
You can also store unique pointers into other datastructures that could be
themselves allocated in some region. For example, the above example
function length took in a list whose ``spine'' was allocated in the
unique region. Therefore, each tl pointer in the list is a nested
pointer. Nested unique pointers cannot be read directly. In particular,
the following code is illegal:
void f(list_t<int,`U> l) {
l = l->tl;
}
This code is disallowed because the unique pointer l->tl is nested.
We make this restriction to keep the invariant that if some unique pointer
l is unconsumed, then all unique pointers that it points to are
also unconsumed. The assignment above would violate this invariant. We
maintain this invariant to both simplify the flow analysis, and also to
allow unique pointers to be pointed to by non-unique pointers. In
particular, a non-unique pointer can always be considered unconsumed, which
implies that any unique pointers it points to must always be unconsumed as
well.
To allow nested unique pointers to be read and used, we provide a swap
operator, having syntax :=:. In particular, the code a :=: b will swap the contents of a and b. We can use
this to swap out a nested unique pointer, and replace it with a different
one; we will often swap in NULL, since this is a unique pointer that is
always unconsumed. This allows us to write the length function on a unique
list without using the alias primitive (as we did above), freeing
each list element as we go:
int length_unique(list_t<int,`U> l) {
int len = 0;
list_t<int,`U> x;
while (l != NULL) {
x = NULL;
len++;
x :=: l->tl;
Core::ufree(l);
l = x;
}
return len;
}
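For readers more familiar with C, the same swap-then-free loop can be sketched in plain C. The helper swap_nodes stands in for the :=: operator and cons is a hypothetical list builder; neither is part of Cyclone, and C does nothing to enforce uniqueness here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node { int hd; struct node *tl; };

/* Plain C stand-in for `a :=: b` on node pointers. */
static void swap_nodes(struct node **a, struct node **b) {
    assert(a != NULL && b != NULL);
    struct node *t = *a;
    *a = *b;
    *b = t;
}

/* Hypothetical helper: allocates one cell in front of tl (NULL on failure). */
struct node *cons(int hd, struct node *tl) {
    struct node *n = malloc(sizeof *n);
    if (n != NULL) { n->hd = hd; n->tl = tl; }
    return n;
}

/* Counts the cells of l, freeing each cell as it goes. */
int length_and_free(struct node *l) {
    int len = 0;
    while (l != NULL) {
        struct node *x = NULL;
        len++;
        swap_nodes(&x, &l->tl);  /* x now owns the tail; l->tl is NULL */
        free(l);                 /* l no longer points to anything live */
        l = x;
    }
    return len;
}
```

After the swap, the cell being freed holds NULL in its tl field, so at no point do two live pointers refer to the same tail.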
It is often useful to have a non-unique datastructure contain unique
pointers. For example, you could have a normal queue whose elements are
unique pointers. This way you can freely add and remove elements from the
queue, and then free the unique pointers when the elements are removed. One
caveat is that we currently do not support instantiating polymorphic value
variables (i.e. non-region variables) with unique pointers, because this
could result in aliasing. For example, the following code is not allowed:
list_t<int @`U> l = new List(new 1,NULL);
We can fix this problem by extending the current type system, and plan to do
so in the near future. In the meantime, you need to create separate,
non-polymorphic versions of library utilities. For example, you would have
to create your own version of list that can hold int @`U pointers.
Note that by implementing the swap operation atomically, unique pointers
would be thread-safe as well. We implemented things this way looking ahead
to when Cyclone will have threads.
8.5.4 Polymorphic Region Allocation
As described in Section 8.3, we can write functions that
take as arguments a region handle to allocate into. For example, we wrote a
function rcopy that copies a list into some region `r2.
However, we didn't provide the full story that accounts for the unique
region. In particular, consider the following function:
$(int @`r, int @`r) make_pair(region_t<`r> rgn) {
int @x = rnew (rgn) 1;
return $(x, x);
}
This function will return a pair of pointers to the same object. If we pass
in something other than the unique region, this function will behave
properly:
$(int @,int@) pair = make_pair(heap_region);
However, things can go badly wrong if we pass in the unique region instead:
$(int @`U,int @`U) pair = make_pair(unique_region);
ufree(pair[0]);
int x = *pair[1]; // error! dereferences freed pointer
The problem is that make_pair creates an alias; if we pass in the
unique region for rgn, we can free one of these aliases (e.g. the
pointer via the first element of the pair), but then dereference the other
(i.e. via the second pair element).
To prevent this behavior, we have to classify the two different kinds of
regions that we support: aliasable regions, whose pointers can be freely
aliased, and unique regions, whose pointers cannot be aliased but can be
freed. To do this, we define kinds R for aliasable regions and UR for
unique ones. We can then classify a polymorphic region variable with the
proper kind. This allows us to change the make_pair function as
follows:
$(int @`r, int @`r) make_pair(region_t<`r::R> rgn) {
int @x = rnew (rgn) 1;
return $(x, x);
}
Now we have explicitly specified that `r must be an aliasable
region (in fact, when not specified, this is the default). As such, the
illegal code above will not typecheck because we are attempting to
instantiate a unique region (having kind UR) for an aliasable one, which is
disallowed.
For generality, we introduce a third region kind TR (which stands for ``top
region''); TR is a ``super-kind'' of R and UR, meaning that types having R or
UR kind can be used in places expecting types of TR kind. This also means
that pointers into a TR-kinded region can neither be aliased nor freed,
since we might instantiate either the unique region (whose pointers cannot
be aliased) or an aliasable region (whose pointers cannot be freed) in place
of the TR-kinded variable.
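The relationship among the three kinds can be summarized as a small table of checks. The following plain C sketch is a model for illustration only: the enum and the function names kind_accepts, may_alias, and may_free are assumptions, and it encodes just the rules stated in this section (a TR-kinded variable may be instantiated with any region, an R- or UR-kinded one only with its own kind).

```c
#include <assert.h>
#include <stdbool.h>

enum region_kind { KIND_R, KIND_UR, KIND_TR };

/* May a region of kind `actual` instantiate a region variable that was
   declared with kind `declared`? */
bool kind_accepts(enum region_kind declared, enum region_kind actual) {
    assert((int)declared <= (int)KIND_TR && (int)actual <= (int)KIND_TR);
    return declared == actual || declared == KIND_TR;
}

/* Pointers into a region of kind k may be freely aliased only for R. */
bool may_alias(enum region_kind k) { return k == KIND_R; }

/* Pointers into a region of kind k may be individually freed only for UR. */
bool may_free(enum region_kind k) { return k == KIND_UR; }
```

In this model the rejected make_pair(unique_region) call corresponds to kind_accepts(KIND_R, KIND_UR) being false, and the restrictions on TR correspond to both may_alias(KIND_TR) and may_free(KIND_TR) being false.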
We can now generalize the rcopy example above:
struct List<`a,`r::TR>{`a hd; struct List<`a,`r> *`r tl;};
typedef struct List<`a,`r> *`r list_t<`a,`r>;
// return a fresh copy of the list in r2
list_t<`a,`r2> rcopy(region_t<`r2::TR> r2, list_t<`a> x) {
if (x == NULL) return NULL;
else {
list_t rest = rcopy(r2,x->tl);
return rnew(r2) List{.hd=x->hd,.tl=rest};
}
}
list_t<`a> copy(list_t<`a> x) {
return rcopy(heap_region, x);
}
We have made three key changes to the prior version of rcopy:
- The definition of List has been generalized so that its
`r region variable now has kind TR. This implies that lists can
point into any region, whether unique or aliasable.
- The region handle r2 now has kind TR, rather than the default
R. This means that we can pass in any region handle, and thus copy a list
into any kind of region.
- We have made rcopy's implementation recursive. This was
necessary to avoid creating aliases to the newly created list. In
particular, if we were to have used a prev pointer as in the
version from Section 8.3, we would have two pointers to
the last-copied element: the tl field of the element before it in
the list, and the current iterator variable prev. The use of
recursion allows us to iterate to the end of the list and construct it
back to front, in which no aliases are required. The cost is we need to
do extra stack allocation. This example illustrates that it is sometimes
difficult to program using no-alias pointers. This is why, in cases other
than allocation, we would prefer to use the alias construct to
allow temporary aliasing.
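The back-to-front construction used by the recursive rcopy can be sketched in plain C. This is an illustrative model only: it allocates with malloc rather than into a Cyclone region, the names rcopy_c and free_list and the ok flag are assumptions, and C does not check the no-alias property that motivates the shape of the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct cell { int hd; struct cell *tl; };

/* Frees every cell of a malloc'd list. */
void free_list(struct cell *l) {
    while (l != NULL) {
        struct cell *next = l->tl;
        free(l);
        l = next;
    }
}

/* Copies x back to front. Each new cell is referenced by exactly one
   pointer at a time: first `rest`, then the tl field of the cell built
   in front of it. Sets *ok to 0 and returns NULL if allocation fails. */
struct cell *rcopy_c(const struct cell *x, int *ok) {
    assert(ok != NULL);
    if (x == NULL) { *ok = 1; return NULL; }
    struct cell *rest = rcopy_c(x->tl, ok);   /* copy the tail first */
    if (!*ok) return NULL;                    /* callee already cleaned up */
    struct cell *c = malloc(sizeof *c);
    if (c == NULL) { free_list(rest); *ok = 0; return NULL; }
    c->hd = x->hd;
    c->tl = rest;                             /* hand `rest` over to c */
    return c;
}
```

As in the Cyclone version, the price of avoiding a prev pointer is one stack frame per list element.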
8.6 Type-Checking Regions
Because of recursive functions, there can be any number of live
regions at run time. The compiler uses the following general strategy to
ensure that only pointers into live regions are dereferenced:
- Use compile-time region names. Syntactically these are
just type variables, but they are used differently.
- Decorate each pointer type and handle type with one region name.
- Decorate each program point with a (finite) set of region names.
We call the set the point's capability.
- To dereference a pointer (via *, ->, or
subscript), the pointer's type's region name must be in the program
point's capability. Similarly, to use a handle for allocation, the
handle type's region name must be in the capability.
- Enforce a type system such that the following is impossible: A
program point P's capability contains a region name `r that
decorates a pointer (or handle) expression expr that, at
run time, points into a region that has been deallocated and the
operation at P dereferences expr.
This strategy is probably too vague to make sense at this point, but
it may help to refer back to it as we explain specific aspects of the
type system.
Note that in the rest of the documentation (and in common parlance) we
abuse the word ``region'' to refer both to region names and to
run-time collections of objects. Similarly, we confuse a block of
declarations, its region-name, and the run-time space allocated for
the block. (With loops and recursive functions, ``the space
allocated'' for the block is really any number of distinct regions.)
But in the rest of this section, we painstakingly distinguish
region names, regions, etc.
8.6.1 Region Names
Given a function, we associate a distinct region name with each
program point that creates a region, as follows:
- If a block (blocks create stack regions) has label L,
then the region-name for the block is `L.
- If a block has no label, the compiler makes up a unique
region-name for the block.
- In region r <`foo> s, the region-name for the construct
is `foo.
- In region r s, the region-name for the construct is
`r.
The region name for the heap is `H, and the region name for the
unique region is `U. Region names associated with program points
within a function should be distinct from each other, distinct from any
region names appearing in the function's prototype, and should not be
`H or `U. (So you cannot use H as a label name.)
Because the function's return type cannot mention a region name for a block
or region-construct in the function, it is impossible to return a pointer to
deallocated storage.
In region r <`r> s and region r s, the type of
r is region_t<`r>. In other words, the handle is
decorated with the region name for the construct. Pointer types'
region names are explicit, although you generally rely on inference to
put in the correct one for you.
8.6.2 Capabilities
In the absence of explicit effects (see below), the capability for a
program point includes exactly:
- `H and `U
- The effect corresponding to the function's prototype. Briefly,
any region name in the prototype (or inserted by the compiler due to
an omission) is in the corresponding effect. Furthermore, for each
type variable `a that appears (or is inserted),
``regions(`a)'' is in the corresponding effect. This latter
effect roughly means, ``I don't know what `a is, but if you
instantiate with a type mentioning some regions, then add those
regions to the effect of the instantiated prototype.'' This is
necessary for safely type-checking calls that include function pointers.
- The region names for the blocks and ``region r s''
statements that contain the program point
For each dereference or allocation operation, we simply check that the
region name for the type of the object is in the capability. It takes
extremely tricky code (such as existential region names) to make the
check fail.
8.6.3 Assignment and Outlives
A pointer type's region name is part of the type. If e1 and
e2 are pointers, then e1 = e2 is well-typed only if
the region name for e2's type ``outlives'' the region name
for e1's type. By outlives, we intuitively mean the region
corresponding to one region name will be deallocated after the region
corresponding to the other region name. The rules for outlives are as
follows:
For handles, if `r is a region name, there is at most one
value of type region_t<`r> (there are 0 if `r is a
block's name), so there is little use in creating variables of type
region_t<`r>.
8.6.4 Type Declarations
A struct, typedef, or datatype
declaration may be parameterized by any number of region names. The region
names are placed in the list of type parameters. They must be followed by
their kind -- i.e. either ``::R'', ``::UR'', or
``::TR'' -- except for typedef declarations (where the
region name appears in the underlying type). For example, given
struct List<`a,`r::TR>{`a hd; struct List<`a,`r> *`r tl;};
the type struct List<int,`H> is for a list of ints in the heap.
Notice that all of the ``cons cells'' of the List will be in
the same region (the type of the tl field uses the same
region name `r that is used to instantiate the recursive
instance of struct List<`a,`r>). However, we could instantiate
`a with a pointer type that has a different region name, as long as
that region has kind R.
datatype declarations must also be
instantiated with an additional region name. An object of type
datatype `r Foo
is treated (capability-wise) as a pointer with region name
`r. If the region name is omitted from a use of a
datatype declaration, it is implicitly `H.
8.6.5 Function Calls
If a function parameter or result has type int *`r or
region_t<`r>, the function is polymorphic over the region name
`r. That is, the caller can instantiate `r with any
region in the caller's current capability as long as the region has
the correct kind. This instantiation is usually implicit, so the caller just
calls the function and the compiler uses the types of the actual arguments
to infer the instantiation of the region names (just like it infers the
instantiation of type variables).
The callee is checked knowing nothing about `r except that it is in
its capability (plus whatever can be determined from explicit outlives
assumptions), and that it has the given kind. For example, it will be
impossible to assign a parameter of type int*`r to a global
variable. Why? Because the global would have to have a type that allowed
it to point into any region. There is no such type because we could never
safely follow such a pointer (since it could point into a deallocated
region).
8.6.6 Explicit and Default Effects
If you are not using existential types, you now know everything you
need to know about Cyclone regions and memory management. Even if you
are using these types and functions over them (such as the closure
library in the Cyclone library), you probably don't need to know much more
than ``provide a region that the hidden types outlive''.
The problem with existential types is that when you ``unpack'' the
type, you no longer know that the regions into which the fields point
are allocated. We are sound because the corresponding region names
are not in the capability, but this makes the fields unusable. To
make them usable, we do not hide the capability needed to use them.
Instead, we use a region bound that is not existentially
bound.
If the contents of existential packages contain only heap pointers,
then `H is a fine choice for a region bound.
These issues are discussed in
Section 12.