MimIR 0.1
MimIR is my Intermediate Representation
|
We use the following notation:
Notation | Meaning |
---|---|
a | literally the terminal token a |
[a b ] | matches a or b |
[a -c ] | matches a - c |
a* | zero or more repetitions of "a" |
a+ | one or more repetitions of "a" |
a? | "a" is optional |
a , ... , a | , -separated list of zero or more "a" elements; may contain a trailing , at the end of the list |
Mim files are UTF-8 encoded and lexed from left to right.
<<<
is lexed as <<
and <
.The actual terminals are specified in the following tables.
The grammatical rules will directly reference these primary terminals. Secondary terminals are ASCII-only tokens that represent the same lexical element as its corresponding primary token. For example, the lexer doesn't care, if you use ⊥
or .bot
. Both tokens are identified as ⊥
.
Primary Terminals | Secondary Terminals | Comment |
---|---|---|
( ) [ ] { } | delimiters | |
‹ › « » | << >> < > | UTF-8 delimiters |
→ ⊥ ⊤ ★ □ λ | -> .bot .top * lm | further UTF-8 tokens |
= , ; . # : % @ | further tokens | |
<eof> | marks the end of the file |
In addition you can use ⟨
, ⟩
, ⟪
, and ⟫
as an alternative for ‹
, ›
, «
, and »
.
In addition the following keywords are terminals:
Terminal | Comment |
---|---|
.module | starts a module |
import | imports another Mim file |
plugin | like import and additionally loads the compiler plugin |
axm | axiom |
let | let declaration |
con | continuation declaration |
fun | function declaration |
lam | lambda declaration |
ret | ret expression |
cn | continuation expression |
fn | function expression |
lm | lambda expression |
Sigma | Sigma declaration |
extern | marks function as external |
ins | mim::Insert expression |
insert | alias for ins |
Nat | mim::Nat |
Idx | mim::Idx |
Type | mim::Type |
Univ | mim::Univ |
ff | alias for 0₂ |
tt | alias for 1₂ |
Terminal | Alias | Terminal | Alias |
---|---|---|---|
tt ff | 0₂ 1₂ | Bool | Idx i1 |
i1 | 2 | I1 | Idx i1 |
i8 | 0x100 | I8 | Idx i8 |
i16 | 0x1'0000 | I16 | Idx i16 |
i32 | 0x1'0000'0000 | I32 | Idx i32 |
i64 | 0 | I64 | Idx i64 |
All keywords start with a .
to prevent name clashes with identifiers.
The following terminals comprise more complicated patterns:
Terminal | Regular Expression | Comment |
---|---|---|
𝖨 | sym | identifier |
A | % sym . sym (. sym)? | Annex name |
L | dec+ | unsigned decimal literal |
L | 0b bin+ | unsigned binary literal |
L | 0o oct+ | unsigned octal literal |
L | 0x hex+ | unsigned hexadecimal literal |
L | sign dec+ | signed decimal literal |
L | sign 0b bin+ | signed binary literal |
L | sign 0o oct+ | signed octal literal |
L | sign 0x hex+ | signed hexadecimal literal |
L | sign? dec+ eE sign dec+ | floating-point literal |
L | sign? dec+ . dec* (eE sign dec+)? | floating-point literal |
L | sign? dec* . dec+ (eE sign dec+)? | floating-point literal |
L | sign? 0x hex+ pP sign dec+ | floating-point hexadecimal literal |
L | sign? 0x hex+ . hex* pP sign dec+ | floating-point hexadecimal literal |
L | sign? 0x hex* . hex+ pP sign dec+ | floating-point hexadecimal literal |
Xn | dec+ sub+n | index literal of type Idx n |
Xn | dec+ _ dec+n | index literal of type Idx n |
C | ' (a | esc)' | character literal; a ∊ ASCII ∖ {\ , ' } |
S | " (a | esc)\*" | string literal; a ∊ ASCII ∖ {\ , " } |
The previous table resorts to the following definitions as shorthand:
Name | Definition | Comment |
---|---|---|
0b | 0 [ b B ] | prefix for binary literals |
0o | 0 [ o O ] | prefix for octal literals |
0x | 0 [ x X ] | prefix for hexadecimal literals |
bin | [ 0 1 ] | binary digit |
oct | [ 0 -7 ] | octal digit |
dec | [ 0 -9 ] | decimal digit |
sub | [ ₀ -₉ ] | subscript digit (always decimal) |
hex | [ 0 -9 a -f A -F ] | hexadecimal digit |
eE | [ e E ] | exponent in floating point literals |
pP | [ p P ] | exponent in floating point hexadecimal literals |
sign | [ + - ] | |
sym | [ _ a -z A -Z ][ . _ 0 -9 a -z A -Z ]* | symbol |
esc | [ \' \" \0 \a \b \f \n \r \t \v ] | escape sequences |
So, sym refers to the shorthand rule while 𝖨 refers to the terminal that is identical to sym. However, the terminal A also uses the shorthand rule sym.
In addition, the following comments are available:
/* ... */
: multi-line comment//
: single-line comment///
: single-line comment for Markdown output:/// xxx
comments will put xxx
directly into the Markdown output. You can put an optional space after the ///
that will be elided in the Markdown output.Mim's grammar is defined as a context-free grammar that consists of the terminals defined above as well as the nonterminals and productions defined below. The start symbol is "m" (module).
The follwing tables summarizes the main nonterminals:
Nonterminal | Meaning |
---|---|
m | module |
d | declaration |
p | () -style pattern |
b | [] -style pattern |
e | expression |
The following tables comprise all production rules:
LHS | RHS | Comment | MimIR Class |
---|---|---|---|
m | dep* d* | module | World |
dep | import S ; | import | |
dep | plugin S ; | plugin |
LHS | RHS | Comment | MimIR Class |
---|---|---|---|
d | let (p | A) = e ; | let | - |
d | lam n (. ? p)+ (: ecodom)? ( = d* e)? ; | lambda declarations | Lam |
d | con n (. ? p)+ ( = d* e)? ; | continuation declarations | Lam |
d | fun n (. ? p)+ (: eret)? ( = d* e)? ; | function declarations | Lam |
d | Sigma n (: etype )? (, Larity)? (= b[ ])? ; | sigma declaration | Sigma |
d | axm A : etype (( sa , ... , sa ) )? ( , 𝖨normalizer)? (, Lcurry)? (, Ltrip)? ; | axiom | Axiom |
n | 𝖨 | A | identifier or annex name | fe::Sym /Annex |
sa | 𝖨 (= 𝖨 , ... , 𝖨)? | subtag with aliases |
.Pi
or Sigma
declaration defaults to *
.Patterns allow you to decompose a value into its components like in Standard ML or other functional languages.
LHS | RHS | Comment |
---|---|---|
p | ` ? 𝖨 (: etype )? | identifier () -pattern |
p | (` ? 𝖨 :: )? ( d* (p | g) , ... , d* (p | g) ) | () -() -tuple patterns |
p | (` ? 𝖨 :: )? b[ ] | [] -() -tuple pattern |
b | (` ? 𝖨 : )? etype | identifier [] -pattern |
b | (` ? 𝖨 :: )? b[ ] | [] -[] -tuple pattern |
b[ ] | [ d* (b | g) , ... , d* (b | g) ] | [] -tuple patterns |
g | 𝖨+ : e | group |
There are
()
-style), and[]
-style) .The main difference is that
(a, b, c)
means (a: ?, b: ?, c: ?)
whereas[a, b, c]
means [_: a, _: c, _: d]
while(a: A, b: B, c: C)
is the same as [a: A, b: B, C: C]
.You can switch from a ()
-style pattern to a []
-pattern but not vice versa. For this reason there is no rule for a ()
-[]
-pattern.
Tuple patterns allow for groups:
(a b c: Nat, d e: Bool)
means (a: Nat, b: Nat, c: Nat, d: Bool, e: Bool)
.[a b c: Nat, d e: Bool]
means [a: Nat, b: Nat, c: Nat, d: Bool, e: Bool]
.You can wrap a pattern into an alias pattern:
This will bind
a
to 1
,b
to 2
,c
to 3
, andabc
to (1, 2, 3)
.Here is another example:
A let
and ret
expression allows you to rebind the same name to a different value: This is particularly useful, when dealing with memory:
LHS | RHS | Comment | MimIR Class |
---|---|---|---|
e | Univ | universe: type of a type level | Univ |
e | Type e | type of level e | Type |
e | * | alias for Type (0:Univ) | Type |
e | □ | alias for Type (1:Univ) | Type |
e | Nat | natural number | Nat |
e | Idx | builtin of type Nat → * | Idx |
e | Bool | alias for Bool | Idx |
LHS | RHS | Comment | MimIR Class |
---|---|---|---|
e | L (: etype)? | literal | Lit |
e | Xn | literal of type Idx n | Lit |
e | ff | alias for 0_2 | Lit |
e | tt | alias for 1_2 | Lit |
e | C | character literal of type I8 | Lit |
e | S | string tuple of type «n; I8» | Tuple |
e | ⊥ (: etype)? | bottom | Bot |
e | ⊤ (: etype)? | top | Top |
e | n | identifier or annex name | fe::Sym /Annex |
e | { d* e } | blocks | - |
Nat
,*
.LHS | RHS | Comment | MimIR Class |
---|---|---|---|
e | edom → ecodom | function type | Pi |
e | Π . ? b (. ? b[ ])* (: ecodom)? | dependent function types | Pi |
e | Cn . ? b (. ? b[ ])* | continuation types | Pi |
e | Fn . ? b (. ? b[ ])* (: eret)? | returning continuation types | Pi |
e | λ (. ? p)+ (→ ecodom)? = d* e | lambda expressions | Lam |
e | cn (. ? p)+ = d* e | continuation expressions | Lam |
e | fn (. ? p)+ (→ ecodom)? = d* e | function expressions | Lam |
e | e e | application | App |
e | e @ e | application making implicit arguments explicit | App |
e | ret p = e $ e ; d* e | ret expresison | App |
LHS | RHS | Comment | MimIR Class |
---|---|---|---|
e | b[ ] | sigma | Sigma |
e | « s ; ebody» | arrays | Arr |
e | ( e0 , ... , en-1) (: etype)? | tuple | Tuple |
e | ‹ s ; ebody› | packs | Pack |
e | e # eindex | extract | Extract |
e | e # 𝖨 | extract via field "𝖨" | Extract |
e | ins ( etuple , eindex , evalue ) | insert | Insert |
s | eshape | S : eshape | shape | - |
Expressions nesting is disambiguated according to the following precedence table (from strongest to weakest binding):
Level | Operator | Description | Associativity |
---|---|---|---|
1 | L : e | type ascription of a literal | - |
2 | e # e | extract | left-to-right |
3 | e e | application | left-to-right |
3 | e @ e | application making implicit arguments explicit | left-to-right |
4 | e → e | function type | right-to-left |
The following table summarizes the different tokens used for functions declarations, expressions, and types:
Declaration | Expression | Type |
---|---|---|
lam | lm λ | -> |
con | cn | Cn |
fun | fn | Fn |
The following function declarations are all equivalent:
Note that all partial evaluation filters default to tt
except for con
/cn
/fun
/fn
.
The following function expressions are all equivalent. What is more, since they are bound by a let declaration, they have the exact same effect as the function declarations above:
The following expressions for applying f
are also equivalent:
Finally, the following function types are all equivalent and denote the type of f
above.
Mim uses lexical scoping where all names live within the same namespace - with a few exceptions noted below.
The symbol _
is special and never binds to an entity. For this reason, _
can be bound over and over again within the same scope (without effect). Hence, using the symbol _
will always result in a scoping error.
Annex names are special and live in a global namespace.
Named elements of mutable sigmas are available for extracts/inserts.
i
refers to the first element i
of X
and not to the i
introduced via let
:Use parentheses to refer to the let
-bounded i
: