Notation

We use the following notation:

Notation	Meaning
`a`	literally the terminal token `a`
[`ab`]	matches `a` or `b`
[`a`-`c`]	matches `a` - `c`
a*	zero or more repetitions of "a"
a+	one or more repetitions of "a"
a?	"a" is optional
a `,` ... `,` a	`,`-separated list of zero or more "a" elements; may contain a trailing `,` at the end of the list

Lexical Structure

Thorin files are UTF-8 encoded and lexed from left to right.

Note: The maximal munch strategy resolves any ambiguities in the lexical rules below. For Example, <<< is lexed as << and <.

Terminals

The actual terminals are specified in the following tables.

Primary Terminals

The grammatical rules will directly reference these primary terminals. Secondary terminals are ASCII-only tokens that represent the same lexical element as its corresponding primary token. For example, the lexer doesn't care, if you use ⊥ or .bot. Both tokens are identified as ⊥.

Primary Terminals	Secondary Terminals	Comment
`(` `)` `[` `]` `{` `}`		delimiters
`‹` `›` `«` `»`	`<<` `>>` `<` `>`	UTF-8 delimiters
`→` `⊥` `⊤` `★` `□` `λ` `Π`	`->` `.bot` `.top` `*` `.lm` `\|~\|`	further UTF-8 tokens
`=` `,` `;` `.` `#` `:` `%` `@`		further tokens
`<eof>`		marks the end of the file

In addition you can use ⟨, ⟩, ⟪, and ⟫ as an alternative for ‹, ›, «, and ».

Keywords

In addition the following keywords are terminals:

Terminal	Comment
`.module`	starts a module
`.import`	imports another Thorin file
`.plugin`	like `.import` and additionally loads the compiler plugin
`.ax`	axiom
`.let`	let declaration
`.con`	continuation declaration
`.fun`	function declaration
`.lam`	lambda declaration
`.ret`	ret expression
`.cn`	continuation expression
`.fn`	function expression
`.lm`	lambda expression
`.Pi`	Pi declaration
`.Sigma`	Sigma declaration
`.extern`	marks function as external
`.ins`	thorin::Insert expression
`.insert`	alias for `.ins`
`.Nat`	thorin::Nat
`.Idx`	thorin::Idx
`.Type`	thorin::Type
`.Univ`	thorin::Univ
`.ff`	alias for `0₂`
`.tt`	alias for `1₂`

Terminal	Alias	Terminal	Alias
`.tt` `ff`	`0₂` `1₂`	`.Bool`	`.Idx .i1`
`.i1`	`2`	`.I1`	`.Idx .i1`
`.i8`	`0x100`	`.I8`	`.Idx .i8`
`.i16`	`0x1'0000`	`.I16`	`.Idx .i16`
`.i32`	`0x1'0000'0000`	`.I32`	`.Idx .i32`
`.i64`	`0`	`.I64`	`.Idx .i64`

All keywords start with a . to prevent name clashes with identifiers.

Other Terminals

The following terminals comprise more complicated patterns:

Terminal	Regular Expression	Comment
𝖨	sym	identifier
A	`%` sym `.` sym (`.` sym)?	Annex name
L	dec+	unsigned decimal literal
L	0b bin+	unsigned binary literal
L	0o oct+	unsigned octal literal
L	0x hex+	unsigned hexadecimal literal
L	sign dec+	signed decimal literal
L	sign 0b bin+	signed binary literal
L	sign 0o oct+	signed octal literal
L	sign 0x hex+	signed hexadecimal literal
L	sign? dec+ eE sign dec+	floating-point literal
L	sign? dec+ `.` dec* (eE sign dec+)?	floating-point literal
L	sign? dec* `.` dec+ (eE sign dec+)?	floating-point literal
L	sign? 0x hex+ pP sign dec+	floating-point hexadecimal literal
L	sign? 0x hex+ `.` hex* pP sign dec+	floating-point hexadecimal literal
L	sign? 0x hex* `.` hex+ pP sign dec+	floating-point hexadecimal literal
X_n	dec+ sub+_n	index literal of type `.Idx n`
X_n	dec+ `_` dec+_n	index literal of type `.Idx n`
C	`'`(a \| esc)`'`	character literal; a ∊ ASCII ∖ {`\`, `'`}
S	`"`(a \| esc)*`"`	string literal; a ∊ ASCII ∖ {`\`, `"`}

The previous table resorts to the following definitions as shorthand:

Name	Definition	Comment
0b	`0` [ `bB` ]	prefix for binary literals
0o	`0` [ `oO` ]	prefix for octal literals
0x	`0` [ `xX` ]	prefix for hexadecimal literals
bin	[ `01` ]	binary digit
oct	[ `0`-`7` ]	octal digit
dec	[ `0`-`9` ]	decimal digit
sub	[ `₀`-`₉` ]	subscript digit (always decimal)
hex	[ `0`-`9a`-`fA`-`F` ]	hexadecimal digit
eE	[ `e` `E` ]	exponent in floating point literals
pP	[ `p` `P` ]	exponent in floating point hexadecimal literals
sign	[ `+` `-` ]
sym	[ `_a`-`zA`-`Z` ][ `._0`-`9a`-`zA`-`Z` ]*	symbol
esc	[ `\'\"\0\a\b\f\n\r\t\v` ]	escape sequences

So, sym refers to the shorthand rule while 𝖨 refers to the terminal that is identical to sym. However, the terminal A also uses the shorthand rule sym.

Comments

In addition, the following comments are available:

/* ... */: multi-line comment
//: single-line comment
///: single-line comment for Markdown output:
- Single-line /// xxx comments will put xxx directly into the Markdown output. You can put an optional space after the /// that will be elided in the Markdown output.
- Everything else will be put verbatim within a fenced code block.

Grammar

Thorin's grammar is defined as a context-free grammar that consists of the terminals defined above as well as the nonterminals and productions defined below. The start symbol is "m" (module).

The follwing tables summarizes the main nonterminals:

Nonterminal	Meaning
m	module
d	declaration
p	`()`-style pattern
b	`[]`-style pattern
e	expression

The following tables comprise all production rules:

Module

LHS	RHS	Comment	Thorin Class
m	dep* d*	module	World
dep	`.import` S `;`	import
dep	`.plugin` S `;`	plugin

Declarations

LHS	RHS	Comment	Thorin Class
d	`.let` (p \| A) `=` e `;`	let	-
d	`.lam` n (`.`? p)+ (`:` e_codom)? ( `=` d* e)? `;`	lambda declaration^s	Lam
d	`.con` n (`.`? p)+ ( `=` d* e)? `;`	continuation declaration^s	Lam
d	`.fun` n (`.`? p)+ (`:` e_ret)? ( `=` d* e)? `;`	function declaration^s	Lam
d	`.Pi` n (`:` e_type)? (`=` e)? `;`	Pi declaration	Pi
d	`.Sigma` n (`:` e_type )? (`,` L_arity)? (`=` b_{[ ]})? `;`	sigma declaration	Sigma
d	`.ax` A `:` e_type (`(` sa `,` ... `,` sa `)`)? (`,` 𝖨_normalizer)? (`,` L_curry)? (`,` L_trip)? `;`	axiom	Axiom
n	𝖨 \| A	identifier or annex name	`fe::Sym`/Annex
sa	𝖨 (`=` 𝖨 `,` ... `,` 𝖨)?	subtag with aliases

Note: An elided type of a .Pi or .Sigma declaration defaults to *.

Patterns

Patterns allow you to decompose a value into its components like in Standard ML or other functional languages.

LHS	RHS	Comment
p	`? 𝖨 (`:` e_type )?	identifier `()`-pattern
p	(`? 𝖨 `::`)? `(` d* (p \| g) `,` ... `,` d* (p \| g) `)`	`()`-`()`-tuple pattern^s
p	(`? 𝖨 `::`)? b_{[ ]}	`[]`-`()`-tuple pattern
b	(`? 𝖨 `:`)? e_type	identifier `[]`-pattern
b	(`? 𝖨 `::`)? b_{[ ]}	`[]`-`[]`-tuple pattern
b_{[ ]}	`[` d* (b \| g) `,` ... `,` d* (b \| g) `]`	`[]`-tuple pattern^s
g	𝖨+ `:` e	group

()-style vs []-style

There are

p: parenthesis-style patterns (()-style), and
b: bracket-style patterns ([]-style) .

The main difference is that

(a, b, c) means (a: ?, b: ?, c: ?) whereas
[a, b, c] means [_: a, _: c, _: d] while
(a: A, b: B, c: C) is the same as [a: A, b: B, C: C].

You can switch from a ()-style pattern to a []-pattern but not vice versa. For this reason there is no rule for a ()-[]-pattern.

Groups

Tuple patterns allow for groups:

(a b c: .Nat, d e: .Bool) means (a: .Nat, b: .Nat, c: .Nat, d: .Bool, e: .Bool).
[a b c: .Nat, d e: .Bool] means [a: .Nat, b: .Nat, c: .Nat, d: .Bool, e: .Bool].

You can introduce an optional name for the whole tuple pattern:

.let abc::(a, b, c) = (1, 2, 3);

This will bind

a to 1,
b to 2,
c to 3, and
abc to (1, 2, 3).

Here is another example:

Π.Tas::[T: *, as: .Nat][%mem.M, %mem.Ptr Tas] → [%mem.M, T]

Rebind

Finally, you can put a ` in front of an identifier of a ()-style pattern to (potentially) rebind a name to a different value. This is particularly useful, when dealing with memory:

.let (`mem, ptr) = %mem.alloc (I32, 0) mem;
.let `mem        = %mem.store (mem, ptr, 23:I32);
.let (`mem, val) = %mem.load (mem, ptr);

Expressions

Kinds & Builtin Types

LHS	RHS	Comment	Thorin Class
e	`.Univ`	universe: type of a type level	Univ
e	`.Type` e	type of level e	Type
e	`*`	alias for `.Type (0:.Univ)`	Type
e	`□`	alias for `.Type (1:.Univ)`	Type
e	`.Nat`	natural number	Nat
e	`.Idx`	builtin of type `.Nat → *`	Idx
e	`.Bool`	alias for `.Bool`	Idx

Literals & Co.

LHS	RHS	Comment	Thorin Class
e	L (`:` e_type)?	literal	Lit
e	X_n	literal of type `.Idx n`	Lit
e	`.ff`	alias for `0_2`	Lit
e	`.tt`	alias for `1_2`	Lit
e	C	character literal of type `.I8`	Lit
e	S	string tuple of type `«n; .I8»`	Tuple
e	`⊥` (`:` e_type)?	bottom	Bot
e	`⊤` (`:` e_type)?	top	Top
e	n	identifier or annex name	`fe::Sym`/Annex
e	`{` d* e `}`	block^s	-

Note

An elided type of

a literal defaults to .Nat,
a bottom/top defaults to *.

Functions

LHS	RHS	Comment	Thorin Class
e	e_dom `→` e_codom	function type	Pi
e	`Π` `.`? b (`.`? b_{[ ]})* (`:` e_codom)?	dependent function type^s	Pi
e	`.Cn` `.`? b (`.`? b_{[ ]})*	continuation type^s	Pi
e	`.Fn` `.`? b (`.`? b_{[ ]})* (`:` e_ret)?	returning continuation type^s	Pi
e	`λ` (`.`? p)+ (`→` e_codom)? `=` d* e	lambda expression^s	Lam
e	`.cn` (`.`? p)+ `=` d* e	continuation expression^s	Lam
e	`.fn` (`.`? p)+ (`→` e_codom)? `=` d* e	function expression^s	Lam
e	e e	application	App
e	e `@` e	application making implicit arguments explicit	App
e	`.ret` p `=` e `$` e `;` d* e	ret expresison	App

Tuples

LHS	RHS	Comment	Thorin Class
e	b_{[ ]}	sigma	Sigma
e	`«` s `;` e_body`»`	array^s	Arr
e	`(` e₀ `,` ... `,` e_n-1`)` (`:` e_type)?	tuple	Tuple
e	`‹` s `;` e_body`›`	pack^s	Pack
e	e `#` e_index	extract	Extract
e	e `#` 𝖨	extract via field "𝖨"	Extract
e	`.ins` `(` e_tuple `,` e_index `,` e_value `)`	insert	Insert
s	e_shape \| S `:` e_shape	shape	-

Precedence

Expressions nesting is disambiguated according to the following precedence table (from strongest to weakest binding):

Level	Operator	Description	Associativity
1	L `:` e	type ascription of a literal	-
2	e `#` e	extract	left-to-right
3	e e	application	left-to-right
3	e `@` e	application making implicit arguments explicit	left-to-right
4	`Π` b	domain of a dependent function type	-
5	e `→` e	function type	right-to-left

Note: The domain of a dependent function type binds slightly stronger than →. This has the effect that
Π T: * → T → T

has the expected binding like this:
(Π T: *) → (T → T)

Otherwise, → would be consumed by the domain:
Π T: (* → (T → T)) ↯

Summary: Functions & Types

The following table summarizes the different tokens used for functions declarations, expressions, and types:

Declaration	Expression	Type
`.lam`	`.lm` `λ`	`\|~\|` `Π`
`.con`	`.cn`	`.Cn`
`.fun`	`.fn`	`.Fn`

Declarations

The following function declarations are all equivalent:

.lam f(T: *)((x y: T), return: T → ⊥)@(.ff): ⊥ = return x;
.con f(T: *)((x y: T), return: .Cn T)          = return x;
.fun f(T: *) (x y: T): T                       = return x;

Note that all partial evaluation filters default to .tt except for .con/.cn/.fun/.fn.

Expressions

The following function expressions are all equivalent. What is more, since they are bound by a let declaration, they have the exact same effect as the function declarations above:

.let f =   λ (T: *)((x y: T), return: T → ⊥)@(.ff): ⊥ = return x;
.let f = .lm (T: *)((x y: T), return: T → ⊥)      : ⊥ = return x;
.let f = .cn (T: *)((x y: T), return: .Cn T)          = return x;
.let f = .fn (T: *) (x y: T): T                       = return x;

Applications

The following expressions for applying f are also equivalent:

f .Nat ((23, 42), .cn res: .Nat = use(res))

.ret res = f .Nat $ (23, 42); use(res)

Function Types

Finally, the following function types are all equivalent and denote the type of f above.

 Π  [T:*][[T, T], T → ⊥] → ⊥
.Cn [T:*][[T, T], .Cn T]
.Fn [T:*] [T, T] → T

Scoping

Thorin uses lexical scoping where all names live within the same namespace - with a few exceptions noted below.

Note: The grammar tables above also indiciate which constructs open new scopes (and close them again) with this^s annotation in the Comments column.

Underscore

The symbol _ is special and never binds to an entity. For this reason, _ can be bound over and over again within the same scope (without effect). Hence, using the symbol _ will always result in a scoping error.

Pis

Note: Only
Π x: e → e

introduces a new scope whereas
x: e → e

is a syntax error. If the variable name of a Pi's domain is elided and the domain is a sigma, its elements will be imported into the Pi's scope to make these elements available in the Pi's codomain:
Π [T: *, U: *] → [T, U]

Annex

Annex names are special and live in a global namespace.

Field Names of Sigmas

Named elements of mutable sigmas are available for extracts/inserts.

Note: These names take precedence over the usual scope. In the following example, i refers to the first element i of X and not to the i introduced via .let:
.let i = 1_2;

Π X: [i: .Nat, j: .Nat] → f X#i;

Use parentheses to refer to the .let-bounded i:
.let i = 1_2;

Π X: [i: .Nat, j: .Nat] → f X#(i);

Table of Contents

Notation

Lexical Structure

Terminals

Primary Terminals

Keywords

Other Terminals

Comments

Grammar

Module

Declarations

Patterns

()-style vs []-style

Groups

Rebind

Expressions

Kinds & Builtin Types

Literals & Co.

Functions

Tuples

Precedence

Summary: Functions & Types

Declarations

Expressions

Applications

Function Types

Scoping

Underscore

Pis

Annex

Field Names of Sigmas