Thorin 1.9.0
The Higher ORder INtermediate representation
Loading...
Searching...
No Matches
Language Reference

Notation

We use the following notation:

Notation Meaning
a literally the terminal token a
[ab] matches a or b
[a-c] matches a - c
a* zero or more repetitions of "a"
a+ one or more repetitions of "a"
a? "a" is optional
a , ... , a ,-separated list of zero or more "a" elements; may contain a trailing , at the end of the list

Lexical Structure

Thorin files are UTF-8 encoded and lexed from left to right.

Note
The maximal munch strategy resolves any ambiguities in the lexical rules below. For Example, <<< is lexed as << and <.

Terminals

The actual terminals are specified in the following tables.

Primary Terminals

The grammatical rules will directly reference these primary terminals. Secondary terminals are ASCII-only tokens that represent the same lexical element as its corresponding primary token. For example, the lexer doesn't care, if you use or .bot. Both tokens are identified as .

Primary Terminals Secondary Terminals Comment
( ) [ ] { } delimiters
« » << >> < > UTF-8 delimiters
λ Π -> .bot .top * .lm |~| further UTF-8 tokens
= , ; . # : % @ further tokens
<eof> marks the end of the file

In addition you can use , , , and as an alternative for , , «, and ».

Keywords

In addition the following keywords are terminals:

Terminal Comment
.module starts a module
.import imports another Thorin file
.plugin like .import and additionally loads the compiler plugin
.ax axiom
.let let declaration
.con continuation declaration
.fun function declaration
.lam lambda declaration
.ret ret expression
.cn continuation expression
.fn function expression
.lm lambda expression
.Pi Pi declaration
.Sigma Sigma declaration
.extern marks function as external
.ins thorin::Insert expression
.insert alias for .ins
.Nat thorin::Nat
.Idx thorin::Idx
.Type thorin::Type
.Univ thorin::Univ
.ff alias for 0₂
.tt alias for 1₂
Terminal Alias Terminal Alias
.tt ff 0₂ 1₂ .Bool .Idx .i1
.i1 2 .I1 .Idx .i1
.i8 0x100 .I8 .Idx .i8
.i16 0x1'0000 .I16 .Idx .i16
.i32 0x1'0000'0000 .I32 .Idx .i32
.i64 0 .I64 .Idx .i64

All keywords start with a . to prevent name clashes with identifiers.

Other Terminals

The following terminals comprise more complicated patterns:

Terminal Regular Expression Comment
𝖨 sym identifier
A % sym . sym (. sym)? Annex name
L dec+ unsigned decimal literal
L 0b bin+ unsigned binary literal
L 0o oct+ unsigned octal literal
L 0x hex+ unsigned hexadecimal literal
L sign dec+ signed decimal literal
L sign 0b bin+ signed binary literal
L sign 0o oct+ signed octal literal
L sign 0x hex+ signed hexadecimal literal
L sign? dec+ eE sign dec+ floating-point literal
L sign? dec+ . dec* (eE sign dec+)? floating-point literal
L sign? dec* . dec+ (eE sign dec+)? floating-point literal
L sign? 0x hex+ pP sign dec+ floating-point hexadecimal literal
L sign? 0x hex+ . hex* pP sign dec+ floating-point hexadecimal literal
L sign? 0x hex* . hex+ pP sign dec+ floating-point hexadecimal literal
Xn dec+ sub+n index literal of type .Idx n
Xn dec+ _ dec+n index literal of type .Idx n
C '(a | esc)' character literal; a ∊ ASCII ∖ {\, '}
S "(a | esc)*" string literal; a ∊ ASCII ∖ {\, "}

The previous table resorts to the following definitions as shorthand:

Name Definition Comment
0b 0 [ bB ] prefix for binary literals
0o 0 [ oO ] prefix for octal literals
0x 0 [ xX ] prefix for hexadecimal literals
bin [ 01 ] binary digit
oct [ 0-7 ] octal digit
dec [ 0-9 ] decimal digit
sub [ - ] subscript digit (always decimal)
hex [ 0-9a-fA-F ] hexadecimal digit
eE [ e E ] exponent in floating point literals
pP [ p P ] exponent in floating point hexadecimal literals
sign [ + - ]
sym [ _a-zA-Z ][ ._0-9a-zA-Z ]* symbol
esc [ \'\"\0\a\b\f\n\r\t\v ] escape sequences

So, sym refers to the shorthand rule while 𝖨 refers to the terminal that is identical to sym. However, the terminal A also uses the shorthand rule sym.

Comments

In addition, the following comments are available:

  • /* ... */: multi-line comment
  • //: single-line comment
  • ///: single-line comment for Markdown output:
    • Single-line /// xxx comments will put xxx directly into the Markdown output. You can put an optional space after the /// that will be elided in the Markdown output.
    • Everything else will be put verbatim within a fenced code block.

Grammar

Thorin's grammar is defined as a context-free grammar that consists of the terminals defined above as well as the nonterminals and productions defined below. The start symbol is "m" (module).

The follwing tables summarizes the main nonterminals:

Nonterminal Meaning
m module
d declaration
p ()-style pattern
b []-style pattern
e expression

The following tables comprise all production rules:

Module

LHS RHS Comment Thorin Class
m dep* d* module World
dep .import S ; import
dep .plugin S ; plugin

Declarations

LHS RHS Comment Thorin Class
d .let (p | A) = e ; let -
d .lam n (.? p)+ (: ecodom)? ( = d* e)? ; lambda declarations Lam
d .con n (.? p)+ ( = d* e)? ; continuation declarations Lam
d .fun n (.? p)+ (: eret)? ( = d* e)? ; function declarations Lam
d .Pi n (: etype)? (= e)? ; Pi declaration Pi
d .Sigma n (: etype )? (, Larity)? (= b[ ])? ; sigma declaration Sigma
d .ax A : etype (( sa , ... , sa ))?
(, 𝖨normalizer)? (, Lcurry)? (, Ltrip)? ;
axiom Axiom
n 𝖨 | A identifier or annex name fe::Sym/Annex
sa 𝖨 (= 𝖨 , ... , 𝖨)? subtag with aliases
Note
An elided type of a .Pi or .Sigma declaration defaults to *.

Patterns

Patterns allow you to decompose a value into its components like in Standard ML or other functional languages.

LHS RHS Comment
p `? 𝖨 (: etype )? identifier ()-pattern
p (`? 𝖨 ::)? ( d* (p | g) , ... , d* (p | g) ) ()-()-tuple patterns
p (`? 𝖨 ::)? b[ ] []-()-tuple pattern
b (`? 𝖨 :)? etype identifier []-pattern
b (`? 𝖨 ::)? b[ ] []-[]-tuple pattern
b[ ] [ d* (b | g) , ... , d* (b | g) ] []-tuple patterns
g 𝖨+ : e group

()-style vs []-style

There are

  • p: parenthesis-style patterns (()-style), and
  • b: bracket-style patterns ([]-style) .

The main difference is that

  • (a, b, c) means (a: ?, b: ?, c: ?) whereas
  • [a, b, c] means [_: a, _: c, _: d] while
  • (a: A, b: B, c: C) is the same as [a: A, b: B, C: C].

You can switch from a ()-style pattern to a []-pattern but not vice versa. For this reason there is no rule for a ()-[]-pattern.

Groups

Tuple patterns allow for groups:

  • (a b c: .Nat, d e: .Bool) means (a: .Nat, b: .Nat, c: .Nat, d: .Bool, e: .Bool).
  • [a b c: .Nat, d e: .Bool] means [a: .Nat, b: .Nat, c: .Nat, d: .Bool, e: .Bool].

You can introduce an optional name for the whole tuple pattern:

.let abc::(a, b, c) = (1, 2, 3);

This will bind

  • a to 1,
  • b to 2,
  • c to 3, and
  • abc to (1, 2, 3).

Here is another example:

Π.Tas::[T: *, as: .Nat][%mem.M, %mem.Ptr Tas] → [%mem.M, T]

Rebind

Finally, you can put a ` in front of an identifier of a ()-style pattern to (potentially) rebind a name to a different value. This is particularly useful, when dealing with memory:

.let (`mem, ptr) = %mem.alloc (I32, 0) mem;
.let `mem = %mem.store (mem, ptr, 23:I32);
.let (`mem, val) = %mem.load (mem, ptr);

Expressions

Kinds & Builtin Types

LHS RHS Comment Thorin Class
e .Univ universe: type of a type level Univ
e .Type e type of level e Type
e * alias for .Type (0:.Univ) Type
e alias for .Type (1:.Univ) Type
e .Nat natural number Nat
e .Idx builtin of type .Nat → * Idx
e .Bool alias for .Bool Idx

Literals & Co.

LHS RHS Comment Thorin Class
e L (: etype)? literal Lit
e Xn literal of type .Idx n Lit
e .ff alias for 0_2 Lit
e .tt alias for 1_2 Lit
e C character literal of type .I8 Lit
e S string tuple of type «n; .I8» Tuple
e (: etype)? bottom Bot
e (: etype)? top Top
e n identifier or annex name fe::Sym/Annex
e { d* e } blocks -
Note
An elided type of
  • a literal defaults to .Nat,
  • a bottom/top defaults to *.

Functions

LHS RHS Comment Thorin Class
e edom ecodom function type Pi
e Π .? b (.? b[ ])* (: ecodom)? dependent function types Pi
e .Cn .? b (.? b[ ])* continuation types Pi
e .Fn .? b (.? b[ ])* (: eret)? returning continuation types Pi
e λ (.? p)+ ( ecodom)? = d* e lambda expressions Lam
e .cn (.? p)+ = d* e continuation expressions Lam
e .fn (.? p)+ ( ecodom)? = d* e function expressions Lam
e e e application App
e e @ e application making implicit arguments explicit App
e .ret p = e $ e ; d* e ret expresison App

Tuples

LHS RHS Comment Thorin Class
e b[ ] sigma Sigma
e « s ; ebody» arrays Arr
e ( e0 , ... , en-1) (: etype)? tuple Tuple
e s ; ebody packs Pack
e e # eindex extract Extract
e e # 𝖨 extract via field "𝖨" Extract
e .ins ( etuple , eindex , evalue ) insert Insert
s eshape | S : eshape shape -

Precedence

Expressions nesting is disambiguated according to the following precedence table (from strongest to weakest binding):

Level Operator Description Associativity
1 L : e type ascription of a literal -
2 e # e extract left-to-right
3 e e application left-to-right
3 e @ e application making implicit arguments explicit left-to-right
4 Π b domain of a dependent function type -
5 e e function type right-to-left
Note
The domain of a dependent function type binds slightly stronger than . This has the effect that
Π T: * → T → T
has the expected binding like this:
(Π T: *) → (T → T)
Otherwise, would be consumed by the domain:
Π T: (* → (T → T)) ↯

Summary: Functions & Types

The following table summarizes the different tokens used for functions declarations, expressions, and types:

Declaration Expression Type
.lam .lm
λ
|~|
Π
.con .cn .Cn
.fun .fn .Fn

Declarations

The following function declarations are all equivalent:

.lam f(T: *)((x y: T), return: T → ⊥)@(.ff): ⊥ = return x;
.con f(T: *)((x y: T), return: .Cn T) = return x;
.fun f(T: *) (x y: T): T = return x;

Note that all partial evaluation filters default to .tt except for .con/.cn/.fun/.fn.

Expressions

The following function expressions are all equivalent. What is more, since they are bound by a let declaration, they have the exact same effect as the function declarations above:

.let f = λ (T: *)((x y: T), return: T → ⊥)@(.ff): ⊥ = return x;
.let f = .lm (T: *)((x y: T), return: T → ⊥) : ⊥ = return x;
.let f = .cn (T: *)((x y: T), return: .Cn T) = return x;
.let f = .fn (T: *) (x y: T): T = return x;

Applications

The following expressions for applying f are also equivalent:

f .Nat ((23, 42), .cn res: .Nat = use(res))
.ret res = f .Nat $ (23, 42); use(res)

Function Types

Finally, the following function types are all equivalent and denote the type of f above.

Π [T:*][[T, T], T → ⊥] → ⊥
.Cn [T:*][[T, T], .Cn T]
.Fn [T:*] [T, T] → T

Scoping

Thorin uses lexical scoping where all names live within the same namespace - with a few exceptions noted below.

Note
The grammar tables above also indiciate which constructs open new scopes (and close them again) with thiss annotation in the Comments column.

Underscore

The symbol _ is special and never binds to an entity. For this reason, _ can be bound over and over again within the same scope (without effect). Hence, using the symbol _ will always result in a scoping error.

Pis

Note
Only
Π x: e → e
introduces a new scope whereas
x: e → e
is a syntax error. If the variable name of a Pi's domain is elided and the domain is a sigma, its elements will be imported into the Pi's scope to make these elements available in the Pi's codomain:
Π [T: *, U: *] → [T, U]

Annex

Annex names are special and live in a global namespace.

Field Names of Sigmas

Named elements of mutable sigmas are available for extracts/inserts.

Note
These names take precedence over the usual scope. In the following example, i refers to the first element i of X and not to the i introduced via .let:
.let i = 1_2;
Π X: [i: .Nat, j: .Nat] → f X#i;
Use parentheses to refer to the .let-bounded i:
.let i = 1_2;
Π X: [i: .Nat, j: .Nat] → f X#(i);