MimIR 0.1
MimIR is my Intermediate Representation
Loading...
Searching...
No Matches
Phases

At a high level, a phase is a compiler transformation or analysis step that runs in isolation. Phases are meant to do one thing at a time and to compose in a straightforward sequence. See also the Rewriting Guide, since several phase families are built directly on top of Rewriter.

Overview

A Phase is a Stage with a single entry point, run(), which wraps the actual implementation in start().

Phase

Phase is the minimal base class.

A phase has:

  • a name or annex,
  • access to the current World,
  • a run() wrapper for logging and verification,
  • a todo() accessor backed by the internal todo_ flag for fixed-point iteration.

A key design point is the internal todo_ flag exposed through todo():

Note
A phase sets todo_ = true if its work discovers that another round is needed. PhaseMan uses this to drive fixed-point pipelines.

Typical Shape

A custom phase usually derives from Phase and implements start():

class MyPhase : public mim::Phase {
public:
MyPhase(World& world)
: Phase(world, "my_phase") {}
private:
void start() override {
// do work here
}
};
As opposed to a Pass, a Phase does one thing at a time and does not mix with other Phases.
Definition phase.h:27
virtual void start()=0
Actual entry.

Run it with:

virtual void run()
Entry point and generates some debug output; invokes Phase::start.
Definition phase.cpp:13

Analysis

Analysis is the base class for phases that inspect the current world using the Rewriter traversal machinery. It inherits from both Phase and Rewriter, but unlike RWPhase, it rewrites into the same world. In practice, this means Rewriter is used as a structured, graph-aware traversal over ordinary MimIR terms.

Note
An Analysis based on Rewriter has a domain of ordinary Defs.

This is often convenient because analysis information can itself be represented as regular MimIR terms. As a result, existing IR machinery applies automatically, including:

  • hash-consing / canonical sharing,
  • built-in normalizations,
  • and other simplifications already provided by the World.

So if your abstract domain fits naturally into MimIR, you can often encode it directly as Def and reuse the existing IR infrastructure.

An Analysis visits:

  1. all registered annex roots, then
  2. all external mutables.

During the first part, mim::Analysis::is_bootstrapping() is true. During the second part, it is false.

Typical usage:

  • override rewrite(), rewrite_imm(), rewrite_mut(), or node-specific rewrite hooks,
  • compute analysis information while traversing reachable IR,
  • store that information in side tables,
  • set todo_ = true if new information was discovered and another iteration is needed.

Reset Between Iterations

If an analysis participates in a fixed-point loop, it should be ready to run multiple times. The base reset() clears the rewriter state and resets todo_ for the next round.

RWPhase

RWPhase is the base class for phases that rebuild the current world into a new one, thereby eliminating garbage. This is the standard base class for optimization phases that structurally transform IR.

It inherits from both Phase and Rewriter, but here the two worlds differ:

Note
To avoid confusion, direct world() access is deleted. Use:

Execution Model

An RWPhase runs in three conceptual steps:

  1. optionally perform a fixed-point analysis on the old world,
  2. rewrite reachable old Defs into the new world:
    1. rewrite annex roots,
    2. rewrite external mutables;
  3. swap the old and new worlds.

After the swap, the rewritten world becomes the current one.

Note
An RWPhase follows the standard whole-world transformation pattern: analyze the old program, rebuild the transformed program, then replace the old world.

Cleanup

Cleanup is simply an RWPhase with no custom rewrites. Because an RWPhase reconstructs only what is reachable from the world roots, rebuilding automatically eliminates dead and unreachable code.

Bootstrapping

Like Analysis, RWPhase processes annex roots before externals.

While annexes are being rewritten, mim::RWPhase::is_bootstrapping() is true.

This matters because annexes may depend on one another. During bootstrapping, rewrites that require other annexes to already exist in the new world may need to be deferred or skipped.

Optional Pre-Analysis

An RWPhase may be given an associated Analysis. If so, analyze() runs that analysis to a fixed point before rewriting begins.

This is a common pattern:

  • the analysis computes facts on the old world,
  • the rewrite uses those facts to produce the new world.

If no analysis is needed, analyze() can simply return false.

Typical Shape

class MyRWPhase : public mim::RWPhase {
public:
MyRWPhase(World& world)
: RWPhase(world, "my_rw_phase") {}
private:
const Def* rewrite_imm_App(const App* app) override {
// customize rebuilding here
return Rewriter::rewrite_imm_App(app);
}
};
Rewrites the RWPhase::old_world into the RWPhase::new_world and swaps them afterwards.
Definition phase.h:111

Run it with:

PhaseMan

PhaseMan organizes several phases into a pipeline.

It can run them:

  • once, in sequence, or
  • repeatedly to a fixed point.

A fixed-point PhaseMan reruns the whole pipeline as long as at least one phase sets todo_ = true.

Between iterations, each phase is recreated from its original configuration. This keeps phase-local state from leaking across rounds unless the phase explicitly recomputes it.

Note
PhaseMan is the orchestration layer for classical phase pipelines.

Typical Usage

auto phases = mim::Phases();
phases.emplace_back(std::make_unique<PhaseA>(world));
phases.emplace_back(std::make_unique<PhaseB>(world));
man.apply(/*fixed_point=*/true, std::move(phases));
man.run();
Organizes several Phases in a a pipeline.
Definition phase.h:263
std::deque< std::unique_ptr< Phase > > Phases
Definition phase.h:22

Use a fixed-point pipeline when phases expose new optimization opportunities for one another.

ClosedMutPhase

ClosedMutPhase is a traversal helper for phases that visit all reachable, closed mutables in the world.

A mutable is relevant here if it is:

  • reachable,
  • closed, i.e. it has no free variables,
  • optionally non-empty, depending on elide_empty.

This is useful for local analyses or transformations that are naturally phrased as:

Note
For every reachable closed mutable, inspect or process it.

You override visit(M*), where M defaults to Def but may be restricted to a particular mutable subtype.

Typical Shape

class MyClosedPhase : public mim::ClosedMutPhase<Lam> {
public:
MyClosedPhase(World& world)
: ClosedMutPhase(world, "my_closed_phase", /*elide_empty=*/true) {}
private:
void visit(Lam* lam) override {
// process each reachable closed Lam
}
};
Transitively visits all reachable, closed mutables in the World.
Definition phase.h:294
virtual void visit(M *)=0

NestPhase

NestPhase builds on ClosedMutPhase and computes a Nest for each visited mutable.

Use it when your phase needs a structured view of nested control or binding structure rather than just the raw mutable.

Instead of overriding visit(M*), override:

visit(const Nest&)

This is convenient for analyses that reason about nesting, dominance-like structure, or hierarchical regions.

Example: SCCP

#pragma once
#include <mim/phase.h>
namespace mim {
/// SCCP - Sparse Conditional Constant Propagation - but propagates **arbitrary expressions**.
/// @see [Constant propagation with conditional branches](https://dl.acm.org/doi/pdf/10.1145/103135.103136)
///
/// Lattice per Lam::var:
/// ```
/// ⊤ ← Keep as is
/// |
/// Expr ← Whole expression is propagated (vertically) through var
/// |
/// ⊥
/// ```
class SCCP : public RWPhase {
private:
class Analysis : public mim::Analysis {
public:
Analysis(World& world)
: mim::Analysis(world, "SCCP::Analyzer") {}
auto& lattice() { return lattice_; }
private:
Def* rewrite_mut(Def*) final;
const Def* propagate(const Def* var, const Def* def);
const Def* rewrite_imm_App(const App* app) final;
Def2Def lattice_;
};
public:
SCCP(World& world)
: RWPhase(world, "SCCP", &analysis_)
, analysis_(world) {}
private:
const Def* lattice(const Def* def) { return analysis_.lattice()[def]; }
const Def* rewrite_imm_App(const App* old_app) final;
Analysis analysis_;
Lam2Lam lam2lam_;
};
} // namespace mim
This Phase will recursively Rewriter::rewrite.
Definition phase.h:66
Definition ast.h:14
DefMap< const Def * > Def2Def
Definition def.h:76
LamMap< Lam * > Lam2Lam
Definition lam.h:221

The provided Sparse Conditional Constant Propagation (SCCP) implementation is a good example of the intended phase structure. Its architecture is:

  • an inner Analysis computes propagation facts on the old world,
  • an outer RWPhase uses those facts to rebuild a simplified new world.
Note
The implementation propagates not only constants but also arbitrary expressions.

Analysis

#include "sccp.h"
namespace mim {
Def* SCCP::Analysis::rewrite_mut(Def* mut) {
map(mut, mut);
if (auto var = mut->has_var()) {
map(var, var);
if (mut->isa<Lam>())
for (auto var : mut->tvars()) {
map(var, var);
lattice_[var] = var;
}
}
for (auto d : mut->deps())
rewrite(d);
return mut;
}
const Def* SCCP::Analysis::propagate(const Def* var, const Def* def) {
auto [i, ins] = lattice_.emplace(var, def);
if (ins) {
todo_ = true;
DLOG("propagate: {} → {}", var, def);
return def;
}
auto cur = i->second;
if (!cur || def->isa<Bot>() || cur == def || cur == var) return cur;
todo_ = true;
if (cur->isa<Bot>()) return i->second = def;
return i->second = var; // top
}
const Def* SCCP::Analysis::rewrite_imm_App(const App* app) {
if (auto lam = app->callee()->isa_mut<Lam>(); isa_optimizable(lam)) {
auto n = app->num_targs();
auto abstr_args = absl::FixedArray<const Def*>(n);
auto abstr_vars = absl::FixedArray<const Def*>(n);
// propagate
for (size_t i = 0; i != n; ++i) {
auto abstr = rewrite(app->targ(i));
abstr_vars[i] = propagate(lam->tvar(i), abstr);
abstr_args[i] = abstr;
}
// set new abstract var
auto abstr_var = world().tuple(abstr_vars);
map(lam->var(), abstr_var);
lattice_[lam->var()] = abstr_var;
if (!lookup(lam))
for (auto d : lam->deps())
rewrite(d);
return world().app(lam, abstr_args);
}
return mim::Analysis::rewrite_imm_App(app);
}
} // namespace mim
Base class for all Defs.
Definition def.h:252
A function.
Definition lam.h:110
#define DLOG(...)
Vaporizes to nothingness in Debug build.
Definition log.h:95
auto lookup(const C &container, const K &key)
Yields pointer to element (or the element itself if it is already a pointer), if found and nullptr ot...
Definition util.h:100
Lam * isa_optimizable(Lam *lam)
These are Lams that are.
Definition lam.h:352
TExt< false > Bot
Definition lattice.h:171
@ Lam
Definition def.h:115

The SCCP analysis associates each lambda variable with a lattice value:

  • bottom: no useful information yet,
  • a concrete expression: this value can be propagated,
  • top: keep the variable as-is (a Def maps to itself).

In the implementation, this lattice is stored as a Def2Def map. A nice aspect here is that the propagated value is itself a regular Def. This illustrates the benefit of building analysis on top of Rewriter: the abstract domain can live directly inside MimIR, so canonicalization and normalization come for free.

The analysis traverses the old world and updates the lattice when it sees applications of optimizable lambdas. If new information is discovered, it sets todo_ = true, causing the analysis to rerun until stable. This is a textbook use of Analysis:

  • walk the old IR,
  • collect facts,
  • iterate to a fixed point.

Transformation

#include "sccp.h"
namespace mim {
const Def* SCCP::rewrite_imm_App(const App* old_app) {
if (auto old_lam = old_app->callee()->isa_mut<Lam>()) {
if (auto l = lattice(old_lam->var()); l && l != old_lam->var()) {
todo_ = true;
size_t num_old = old_lam->num_tvars();
Lam* new_lam;
if (auto i = lam2lam_.find(old_lam); i != lam2lam_.end())
new_lam = i->second;
else {
// build new dom
auto new_doms = DefVec();
for (size_t i = 0; i != num_old; ++i) {
auto old_var = old_lam->var(num_old, i);
auto abstr = lattice(old_var);
if (old_var == abstr) new_doms.emplace_back(rewrite(old_lam->dom(num_old, i)));
}
// build new lam
size_t num_new = new_doms.size();
auto new_vars = absl::FixedArray<const Def*>(num_old);
new_lam = new_world().mut_lam(new_doms, rewrite(old_lam->codom()))->set(old_lam->dbg());
lam2lam_[old_lam] = new_lam;
// build new var
for (size_t i = 0, j = 0; i != num_old; ++i) {
auto old_var = old_lam->var(num_old, i);
auto abstr = lattice(old_var);
if (old_var == abstr) {
auto v = new_lam->var(num_new, j++);
new_vars[i] = v;
} else {
new_vars[i] = rewrite(abstr); // SCCP propagate
}
}
map(old_lam->var(), new_vars);
new_lam->set(rewrite(old_lam->filter()), rewrite(old_lam->body()));
}
// build new app
size_t num_new = new_lam->num_vars();
auto new_args = absl::FixedArray<const Def*>(num_new);
for (size_t i = 0, j = 0; i != num_old; ++i) {
auto old_var = old_lam->var(num_old, i);
auto abstr = lattice(old_var);
if (old_var == abstr) new_args[j++] = rewrite(old_app->targ(i));
}
return map(old_app, new_world().app(new_lam, new_args));
}
}
return Rewriter::rewrite_imm_App(old_app);
}
} // namespace mim
Vector< const Def * > DefVec
Definition def.h:78
@ App
Definition def.h:115

Once the lattice is stable, the outer SCCP phase starts rewriting. When it sees an application of a lambda whose parameters have propagated values, it rebuilds a specialized lambda:

  • parameters with known propagated expressions are removed,
  • remaining parameters are kept,
  • the lambda body is rewritten with the propagated values substituted,
  • the call site is rebuilt with only the remaining arguments.

So SCCP follows the standard RWPhase pattern:

  1. analyze the old world,
  2. rewrite into a new world using the computed facts,
  3. swap worlds.

Why This Split Is Useful

Separating SCCP into analysis and rewrite keeps both parts simple:

  • the analysis never mutates or partially rewrites the program,
  • the rewrite does not need to discover facts on the fly,
  • fixed-point logic stays in the analysis stage where it belongs.

This separation is the main design pattern to follow for nontrivial optimizations.

Note
The complete SCCP example fits into roughly 150 lines of C++ source code. Most of the usual compiler boilerplate is absorbed by the existing Analysis, RWPhase, and Rewriter infrastructure, so the implementation can focus on the optimization itself.

Choosing the Right Base Class

A useful rule of thumb is:

  • derive from Phase if you just need a custom one-off action,
  • derive from Analysis if you want a graph-aware traversal that computes facts on the current world,
  • derive from RWPhase if you want to rebuild the world into a transformed new one,
  • derive from ClosedMutPhase if you want to visit all reachable closed mutables,
  • derive from NestPhase if that visit should come with a computed Nest.

Recommended Design Pattern

For most optimization phases, the preferred structure is:

  1. write an Analysis that computes facts to a fixed point,
  2. write an RWPhase that consumes those facts while rebuilding the world.

This keeps analyses and transformations cleanly separated and fits naturally with MimIR’s rewriting-based infrastructure.

Minimal Examples

A simple whole-world rewrite phase:

class Simplify : public mim::RWPhase {
public:
Simplify(mim::World& world)
: RWPhase(world, "simplify") {}
private:
const mim::Def* rewrite_imm_App(const mim::App* app) override {
// rewrite or simplify selected applications
// fallback:
return Rewriter::rewrite_imm_App(app);
}
};
The World represents the whole program and manages creation of MimIR nodes (Defs).
Definition world.h:31

A simple analysis phase:

class CountMutLams : public mim::Analysis {
public:
CountMutLams(mim::World& world)
: Analysis(world, "count_lams") {}
size_t num_lams = 0;
private:
const mim::Def* rewrite_mut_Lam(mim::Lam* lam) override {
++num_lams;
return mim::Analysis::rewrite_mut_Lam(lam);
}
};

Using both:

CountMutLams analysis(world);
analysis.run();

Compilation Pipelines in M⁠im

You can also expose your custom phases as axioms in Mim via the compile plugin and build your own compilation pipeline. Mim's default compilation pipeline is defined in the opt plugin.

Summary

Phases are MimIR’s main unit of compiler work.

  • Phase is the minimal base abstraction.
  • Analysis is for graph-aware fact collection on the current world.
  • RWPhase is for rewriting the current world into a transformed new one.
  • PhaseMan sequences phases, optionally to a fixed point.
  • ClosedMutPhase and NestPhase are traversal helpers for common whole-world inspections.

The key design idea is that MimIR phases are built around structured traversal and rewriting. For substantial optimizations, the usual pattern is: