# Language Reference Manual

## Contents

## Abstract

Thorin is an intermediate representation based on continuation-passing style (CPS). For an introduction to CPS please refer to Wikipedia. This article also suggests further reading about CPS. Please refer to A Graph-Based Higher-Order Intermediate Representation for a more scientific introduction to Thorin.

## Definitions

There exist three ways to define values in Thorin, called definitions:

- Functions are definitions,
- their parameters are definitions and
- Primops are definitions.

Primops are a way to express simple acyclic calculations without using continuations. Primops induce an acyclic, functional data-flow graph which allows for simple code analyses and transformations. Primops are not explicitly scheduled in functions. Instead, they are "floating", i.e., their data dependencies determine certain constraints where a primop may be scheduled.

## Type System

### Primitive Types

#### Integer Types

Thorin supports *signed* (`s`

) and *unsigned* (`u`

) two-complement integer types of the bit widths 8, 16, 32 and 64.
Additionally, each type comes in two flavors:
Precise (`p`

) types using wrap-around arithmetic and so-called quick (`q`

) types which will yield `bottom`

if an overflow or underflow occurs.
Consequently, the following types are available:

`ps8`

`ps16`

`ps32`

`ps64`

`pu8`

`pu16`

`pu32`

`pu64`

`qs8`

`qs16`

`qs32`

`qs64`

`qu8`

`qu16`

`qu32`

`qu64`

Additionally, there is a boolean type:

`bool`

#### Floating-point Types

Thorin knows two floating-point types - each of them in a *precise* (`p`

) and a *quick* (`q`

) flavor:

`pf32`

(`float`

in many C implementations)`pf64`

(`double`

in many C implementations)`qf32`

(`float`

with`-ffast-math`

in many C implementations)`qf64`

(`double`

with`-ffast-math`

in many C implementations)

#### Compound Types

#### Tuple Types

Two tuple types `(t1, ..., tn)`

which are syntactically equivalent, are considered equal.

#### Struct Types

TODO

#### Function Types

As function never return in CPS, function types do not posses a return type.
The type `fn(t1, ... tn)`

denotes the type of a function taking arguments of types `t1`

, …, `tn`

.

### Memory-Related Types

#### Frame type

The type of a stack frame `frame`

is created by `enter`

or destroyed by escaping a function.

#### Memory monad type

Thorin uses a memory monad to capture changes in memory and side effects.
The type of this monad is `mem`

.

#### Pointer type

A pointer of type `T`

is referred to as `T*`

.

### Generics

Thorin features impredicative polymorphism based on System F.
Programmers not familiar with System F can think of function templates in C++ while type checking is performed on the templated version of a function (as opposed to type checking just each template instantiation as in C++).
Generic types (or just *generics*) are Thorin’s vehicle to express this parametric polymorphism.

## Functions and Parameters

The main ingredient of all Thorin programs are functions. In a classical compiler framework, there exist basic blocks and (usually first-order) functions. In Thorin, both concepts are represented by functions and yet Thorin’s functions are more powerful than classic first-order representations: Thorin allows higher-order functions, i.e., you can pass functions as parameters to other functions.

Each lambda consists of a head and a body:

```
name(param_1: type_1, ..., param_n: type_n): pi(type_1, ..., type_n)
target(arg_1, ..., arg_m)
```

### Function Head

The head is essentially just a parameter list `param_1`

, …, `param_n`

of type `type_1`

, …, `type_n`

.
The type of this function is then `fn(type_1, ..., type_n)`

.

### Function Body

The body of a function just consists of an invocation of `target`

while passing `arg_1`

, …, `arg_m`

as arguments.
The arguments `arg_1`

, …, `arg_n`

may be arbitrary definitions.

The `target`

may either be

- another function,
- a parameter (of function type) or
- a
`select cond, t, f`

where`cond`

is of type`bool`

and`t`

and`f`

are again functions of type`fn()`

. Note that neither`t`

nor`f`

are allowed to be parameters.

The latter construct allows to create branches. For example, the following Impala program

```
fn main(i: int) -> int {
let a: int;
if (i == 0) {
a = 23;
} else {
a = 42;
}
a
}
```

translates to the following Thorin program:

```
main(i: qs32, return: fn(qs32))
u1 cond = cmp_eq i, qs32 0
fn() to = select cond, if_then, if_else
to()
if_then()
next(23)
if_else()
next(42)
next(a: qs32)
return(a)
```

As already mentioned, primops are not explicitly scheduled in Thorin.
Due to data dependencies of primops, several possible schedules are implied.
In order to make the Thorin output textual representable, AnyDSL decides on a schedule.
This means, that a code placement algorithm places all primops into functions.
The output we can see at the top is a scheduled version of the actual program representation.
For this reason, it looks as if Thorin explicitly knows that `u1 cond = cmp_eq i, qs32 0`

must be within `main`

.
But again: This is not true.
Only the code placement algorithm decided on a schedule.
This is what we see.

## Primops

Thorin’s primops are by design very similar to LLVM’s instructions. The reason for this is twofold: First, AnyDSL uses LLVM under the hood to generate code. Similar design allows us to simplify our backend. Second, LLVM is a well-designed compiler IR. Many thoughts and ideas back LLVM. Hence, it is reasonable to build on top of this design.

### Literals

Literals are modelled by nullary primops, i.e., primops taking no further arguments.

#### PrimLit

Primitive literals exist for all primitive types.

#### Bottom

A value which is not a value. Sounds contradictory but useful for data-flow analysis.

### Arithmetic Primops

All arithmetic primops work on two definitions of the same *primitive* type. The same type is also the result type.

#### add

```
add a, b
```

Addition: `a + b`

#### sub

```
sub a, b
```

Subtraction: `a + b`

#### mul

```
mul a, b
```

Multiplication: `a + b`

#### div

```
div a, b
```

Division: `a / b`

.
Dividing by zero creates a `bottom`

value.

#### rem

```
urem a, b
```

Remainder (aka modulo): `a % b`

.
Dividing by zero creates a `bottom`

value.

#### and

```
and a, b
```

Bitwise AND: `a & b`

#### or

```
or a, b
```

Bitwise OR: `a | b`

#### xor

```
xor a, b
```

Bitwise XOR: `a ^ b`

#### shl

```
shl a, b
```

Shifts bits in `a`

by `b`

positions to the left: `a << b`

.
The least significant `b`

bits are filled with `0`

. If `b`

is equal to or larger then the number of bits of `a`

, the result will be `bottom`

.

#### shr

```
shr a, b
```

Shifts bits in `a`

by `b`

positions to the right: `a >> b`

.
The most significant `b`

bits are filled with `0`

in the case of unsigned operands or with `1`

in the case of signed operands.
If `b`

is equal to or larger then the number of bits of `a`

, the result will be `bottom`

.

### Comparisons

All comparisons work on two definitions of the same primitive type.
The result type is always `bool`

.
Comparisons between precise floats are always ordered; comparisons between quick floats are always unordered.

```
eq a, b
```

Yields `1`

if the operands are equal, `0`

otherwise.

#### ne

```
ne a, b
```

Yields `1`

if the operands are unequal, `0`

otherwise.

#### gt

```
gt a, b
```

Yields `1`

if `a`

is greater than `b`

.

#### ge

```
ge a, b
```

Yields `1`

if `a`

is greater than or equal to `b`

.

#### lt

```
lt a, b
```

Yields `1`

if `a`

is less than `b`

.

#### le

```
le a, b
```

Yields `1`

if `a`

is less than or equal to `b`

.

### Conversion Primops

#### cast

```
cast a to T
```

Converts `a`

to T.
Both the type of `a`

and `T`

must be primitive or pointer types.

#### bitcast

```
bitcast a to T
```

Converts `a`

to T by reinterpreting the bits of `a`

.
No computation is performed.
Both the type of `a`

and `T`

must have the same bit width.

### Aggregate-Related Primops

#### tuple

```
tuple v1, ..., vn
```

Constructs a tuple with values `v1`

till `vn`

.
The result will be of type `(typeof(v1), ..., typeof(vn))`

.
It is allowed to construct the empty tuple.

#### extract

```
extract t, i
```

Extracts from `t`

, which must be of tuple, array or vector type, the (i-1)th value.
The argument `i`

must be a PrimLit of integer type and it must hold 0 <= `i`

< num_elems_of(t).
The result type is the type of the (i-1)th element in `t`

.

#### insert

```
insert t, i, v
```

Constructs a new value of the same type as `t`

(which must be of sigma, array or vector type) by inserting `v`

into the (i-1)th position in `t`

.
The argument `i`

must be a PrimLit of integer type and it must hold 0 <= `i`

< num_elems_of(t).
Moreover, the type of `v`

must be equal to the type of the (i-1)th element in `t`

.

### Memory-Related Primops

The following primops deal with memory accesses.
Thereby, we assume that `p`

is of pointer, `m`

of monad and `f`

of frame type.

#### load

```
load m, p
```

Loads from monad `m`

the value pointed to by `p`

: `*p`

.
The new monad value and the loaded value is yielded.
Thus, the type of a `load`

is `(mem, typeof(*p))`

.

#### store

```
store m, p, v
```

Stores `v`

at the location pointed to by `p`

in the monad `m`

.
It must hold that `typeof(*p) = typeof(v)`

.
A new monad is returned.

#### enter

```
enter m
```

Creates a stack frame.
A pair of a new monad and a frame is returned.
Thus, the type of a `enter`

is `(mem, frame)`

.

#### slot

```
slot f, i of T
```

Creates a stack slot of number `i`

in frame `f`

of type `T`

.
A pointer to this slot (of type `T*`

) is returned.
The purpose of index `i`

is to make stack slots unique.
Dereferencing a pointer pointing to a slot which has been destroyed yields undefined behavior.