MimIR 0.1
MimIR is my Intermediate Representation
Loading...
Searching...
No Matches
Mim Language Reference

Notation

We use the following notation:

Notation Meaning
a literally the terminal token a
[ab] matches a or b
[a-c] matches a - c
a* zero or more repetitions of "a"
a+ one or more repetitions of "a"
a? "a" is optional
a , ... , a ,-separated list of zero or more "a" elements; may contain a trailing , at the end of the list

Lexical Structure

Mim files are UTF-8 encoded and lexed from left to right.

Note
The maximal munch strategy resolves any ambiguities in the lexical rules below. For Example, <<< is lexed as << and <.

Terminals

The actual terminals are specified in the following tables.

Primary Terminals

The grammatical rules will directly reference these primary terminals. Secondary terminals are ASCII-only tokens that represent the same lexical element as its corresponding primary token. For example, the lexer doesn't care, if you use or .bot. Both tokens are identified as .

Primary Terminals Secondary Terminals Comment
( ) [ ] { } delimiters
« » << >> < > UTF-8 delimiters
λ -> .bot .top * lm further UTF-8 tokens
= , ; . # : % @ further tokens
<eof> marks the end of the file

In addition you can use , , , and as an alternative for , , «, and ».

Keywords

In addition the following keywords are terminals:

Terminal Comment
.module starts a module
import imports another Mim file
plugin like import and additionally loads the compiler plugin
axm axiom
let let declaration
con continuation declaration
fun function declaration
lam lambda declaration
ret ret expression
cn continuation expression
fn function expression
lm lambda expression
Sigma Sigma declaration
extern marks function as external
ins mim::Insert expression
insert alias for ins
Nat mim::Nat
Idx mim::Idx
Type mim::Type
Univ mim::Univ
ff alias for 0₂
tt alias for 1₂
Terminal Alias Terminal Alias
tt ff 0₂ 1₂ Bool Idx i1
i1 2 I1 Idx i1
i8 0x100 I8 Idx i8
i16 0x1'0000 I16 Idx i16
i32 0x1'0000'0000 I32 Idx i32
i64 0 I64 Idx i64

All keywords start with a . to prevent name clashes with identifiers.

Other Terminals

The following terminals comprise more complicated patterns:

Terminal Regular Expression Comment
𝖨 sym identifier
A % sym . sym (. sym)? Annex name
L dec+ unsigned decimal literal
L 0b bin+ unsigned binary literal
L 0o oct+ unsigned octal literal
L 0x hex+ unsigned hexadecimal literal
L sign dec+ signed decimal literal
L sign 0b bin+ signed binary literal
L sign 0o oct+ signed octal literal
L sign 0x hex+ signed hexadecimal literal
L sign? dec+ eE sign dec+ floating-point literal
L sign? dec+ . dec* (eE sign dec+)? floating-point literal
L sign? dec* . dec+ (eE sign dec+)? floating-point literal
L sign? 0x hex+ pP sign dec+ floating-point hexadecimal literal
L sign? 0x hex+ . hex* pP sign dec+ floating-point hexadecimal literal
L sign? 0x hex* . hex+ pP sign dec+ floating-point hexadecimal literal
Xn dec+ sub+n index literal of type Idx n
Xn dec+ _ dec+n index literal of type Idx n
C '(a | esc)' character literal; a ∊ ASCII ∖ {\, '}
S "(a | esc)\*" string literal; a ∊ ASCII ∖ {\, "}

The previous table resorts to the following definitions as shorthand:

Name Definition Comment
0b 0 [ bB ] prefix for binary literals
0o 0 [ oO ] prefix for octal literals
0x 0 [ xX ] prefix for hexadecimal literals
bin [ 01 ] binary digit
oct [ 0-7 ] octal digit
dec [ 0-9 ] decimal digit
sub [ - ] subscript digit (always decimal)
hex [ 0-9a-fA-F ] hexadecimal digit
eE [ e E ] exponent in floating point literals
pP [ p P ] exponent in floating point hexadecimal literals
sign [ + - ]
sym [ _a-zA-Z ][ ._0-9a-zA-Z ]* symbol
esc [ \'\"\0\a\b\f\n\r\t\v ] escape sequences

So, sym refers to the shorthand rule while 𝖨 refers to the terminal that is identical to sym. However, the terminal A also uses the shorthand rule sym.

Comments

In addition, the following comments are available:

  • /* ... */: multi-line comment
  • //: single-line comment
  • ///: single-line comment for Markdown output:
    • Single-line /// xxx comments will put xxx directly into the Markdown output. You can put an optional space after the /// that will be elided in the Markdown output.
    • Everything else will be put verbatim within a fenced code block.

Grammar

Mim's grammar is defined as a context-free grammar that consists of the terminals defined above as well as the nonterminals and productions defined below. The start symbol is "m" (module).

The follwing tables summarizes the main nonterminals:

Nonterminal Meaning
m module
d declaration
p ()-style pattern
b []-style pattern
e expression

The following tables comprise all production rules:

Module

LHS RHS Comment MimIR Class
m dep* d* module World
dep import S ; import
dep plugin S ; plugin

Declarations

LHS RHS Comment MimIR Class
d let (p | A) = e ; let -
d lam n (.? p)+ (: ecodom)? ( = d* e)? ; lambda declarations Lam
d con n (.? p)+ ( = d* e)? ; continuation declarations Lam
d fun n (.? p)+ (: eret)? ( = d* e)? ; function declarations Lam
d Sigma n (: etype )? (, Larity)? (= b[ ])? ; sigma declaration Sigma
d axm A : etype (( sa , ... , sa ))?
(, 𝖨normalizer)? (, Lcurry)? (, Ltrip)? ;
axiom Axiom
n 𝖨 | A identifier or annex name fe::Sym/Annex
sa 𝖨 (= 𝖨 , ... , 𝖨)? subtag with aliases
Note
An elided type of a .Pi or Sigma declaration defaults to *.

Patterns

Patterns allow you to decompose a value into its components like in Standard ML or other functional languages.

LHS RHS Comment
p `? 𝖨 (: etype )? identifier ()-pattern
p (`? 𝖨 ::)? ( d* (p | g) , ... , d* (p | g) ) ()-()-tuple patterns
p (`? 𝖨 ::)? b[ ] []-()-tuple pattern
b (`? 𝖨 :)? etype identifier []-pattern
b (`? 𝖨 ::)? b[ ] []-[]-tuple pattern
b[ ] [ d* (b | g) , ... , d* (b | g) ] []-tuple patterns
g 𝖨+ : e group

()-style vs []-style

There are

  • p: parenthesis-style patterns (()-style), and
  • b: bracket-style patterns ([]-style) .

The main difference is that

  • (a, b, c) means (a: ?, b: ?, c: ?) whereas
  • [a, b, c] means [_: a, _: c, _: d] while
  • (a: A, b: B, c: C) is the same as [a: A, b: B, C: C].

You can switch from a ()-style pattern to a []-pattern but not vice versa. For this reason there is no rule for a ()-[]-pattern.

Groups

Tuple patterns allow for groups:

  • (a b c: Nat, d e: Bool) means (a: Nat, b: Nat, c: Nat, d: Bool, e: Bool).
  • [a b c: Nat, d e: Bool] means [a: Nat, b: Nat, c: Nat, d: Bool, e: Bool].

You can introduce an optional name for the whole tuple pattern:

let abc::(a, b, c) = (1, 2, 3);

This will bind

  • a to 1,
  • b to 2,
  • c to 3, and
  • abc to (1, 2, 3).

Here is another example:

{T: *, as: Nat}::Tas [%mem.M, %mem.Ptr Tas] → [%mem.M, T]

Rebind

Finally, you can put a ` in front of an identifier of a ()-style pattern to (potentially) rebind a name to a different value. This is particularly useful, when dealing with memory:

let (`mem, ptr) = %mem.alloc (I32, 0) mem;
let `mem = %mem.store (mem, ptr, 23:I32);
let (`mem, val) = %mem.load (mem, ptr);

Expressions

Kinds & Builtin Types

LHS RHS Comment MimIR Class
e Univ universe: type of a type level Univ
e Type e type of level e Type
e * alias for Type (0:Univ) Type
e alias for Type (1:Univ) Type
e Nat natural number Nat
e Idx builtin of type Nat → * Idx
e Bool alias for Bool Idx

Literals & Co

LHS RHS Comment MimIR Class
e L (: etype)? literal Lit
e Xn literal of type Idx n Lit
e ff alias for 0_2 Lit
e tt alias for 1_2 Lit
e C character literal of type I8 Lit
e S string tuple of type «n; I8» Tuple
e (: etype)? bottom Bot
e (: etype)? top Top
e n identifier or annex name fe::Sym/Annex
e { d* e } blocks -
Note
An elided type of
  • a literal defaults to Nat,
  • a bottom/top defaults to *.

Functions

LHS RHS Comment MimIR Class
e edom ecodom function type Pi
e Π .? b (.? b[ ])* (: ecodom)? dependent function types Pi
e Cn .? b (.? b[ ])* continuation types Pi
e Fn .? b (.? b[ ])* (: eret)? returning continuation types Pi
e λ (.? p)+ ( ecodom)? = d* e lambda expressions Lam
e cn (.? p)+ = d* e continuation expressions Lam
e fn (.? p)+ ( ecodom)? = d* e function expressions Lam
e e e application App
e e @ e application making implicit arguments explicit App
e ret p = e $ e ; d* e ret expresison App

Tuples

LHS RHS Comment MimIR Class
e b[ ] sigma Sigma
e « s ; ebody» arrays Arr
e ( e0 , ... , en-1) (: etype)? tuple Tuple
e s ; ebody packs Pack
e e # eindex extract Extract
e e # 𝖨 extract via field "𝖨" Extract
e ins ( etuple , eindex , evalue ) insert Insert
s eshape | S : eshape shape -

Precedence

Expressions nesting is disambiguated according to the following precedence table (from strongest to weakest binding):

Level Operator Description Associativity
1 L : e type ascription of a literal -
2 e # e extract left-to-right
3 e e application left-to-right
3 e @ e application making implicit arguments explicit left-to-right
4 e e function type right-to-left

Summary: Functions & Types

The following table summarizes the different tokens used for functions declarations, expressions, and types:

Declaration Expression Type
lam lm
λ
->
con cn Cn
fun fn Fn

Declarations

The following function declarations are all equivalent:

lam f(T: *)((x y: T), return: T → ⊥)@(ff): ⊥ = return x;
con f(T: *)((x y: T), return: Cn T) = return x;
fun f(T: *) (x y: T): T = return x;

Note that all partial evaluation filters default to tt except for con/cn/fun/fn.

Expressions

The following function expressions are all equivalent. What is more, since they are bound by a let declaration, they have the exact same effect as the function declarations above:

let f = λ (T: *)((x y: T), return: T → ⊥)@(ff): ⊥ = return x;
let f = lm (T: *)((x y: T), return: T → ⊥) : ⊥ = return x;
let f = cn (T: *)((x y: T), return: Cn T) = return x;
let f = fn (T: *) (x y: T): T = return x;

Applications

The following expressions for applying f are also equivalent:

f Nat ((23, 42), cn res: Nat = use(res))
ret res = f Nat $ (23, 42); use(res)

Function Types

Finally, the following function types are all equivalent and denote the type of f above.

[T:*] [[T, T], T → ⊥] → ⊥
Cn [T:*] [[T, T], Cn T]
Fn [T:*] [T, T] → T

Scoping

Mim uses lexical scoping where all names live within the same namespace - with a few exceptions noted below.

Note
The grammar tables above also indiciate which constructs open new scopes (and close them again) with thiss annotation in the Comments column.

Underscore

The symbol _ is special and never binds to an entity. For this reason, _ can be bound over and over again within the same scope (without effect). Hence, using the symbol _ will always result in a scoping error.

Annex

Annex names are special and live in a global namespace.

Field Names of Sigmas

Named elements of mutable sigmas are available for extracts/inserts.

Note
These names take precedence over the usual scope. In the following example, i refers to the first element i of X and not to the i introduced via let:
let i = 1_2;
[i: Nat, j: Nat]::X → f X#i;

Use parentheses to refer to the let-bounded i:

let i = 1_2;
[i: Nat, j: Nat]::X → f X#(i);