- Compile: `make`
  - `make good` to enable optimizations and exclude debug flags (when compiling my code with g++, not in the `mirc` compiler)
- Run: `./mirc [src file]`
  - If no source file is specified, the default is `mir/hello.mir`
  - `graph.gv` will automatically be generated, representing the graph of the program
  - `mir.s` will automatically be generated, containing the compiled assembly of the program
- Render abstract syntax graph: `make graph` (`dot -Tpng -O graph.gv`)
  - Install `dot` with `sudo apt install graphviz`
- Assemble `mir.s`: `make assemble` (`gcc -nostdlib -no-pie -o mir.out mir.s`)
  - Install `gcc` with `sudo apt install build-essential`
- `core` - a "standard library"
  - `core/prelude.h` - a file with all required definitions for the standard library
- `lang` - defines some language structures without any parsing or anything
- `token` - lexer/tokenizer, breaks up source code into tokens
- `son` - parser into a sea of nodes AST
- `compile` - contains different "compilation targets" (and also `reg_alloc` for some reason =Ь)
Types:
- Signed integer types: `i8`, `i16`, etc
- Unsigned integer types: `u8`, `u16`, etc
- Floating point types: `f32`, `f64`, etc
- Raw pointer type: `type*`
- Fixed sized array type: `type[size]`
  - Compile time known size, effectively a raw pointer, but semantically is an array and has a size
  - `sizeof(type[size]) == sizeof(type*)`
- Slice type: `type[]`
  - Effectively `struct Slice { data: type*, len: usize }`
  - `sizeof(type[]) == sizeof(type*) + sizeof(usize)`
- Struct types: `CustomName` where `struct CustomName { ... }` was declared
- Enum type: `CustomEnum` where `enum CustomEnum { Element1, Element2, ... }`
  - Naming elements by `CustomEnum::Element1`
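The size relations for fixed-size arrays and slices can be checked against a C++ mirror of the two layouts. This is only a sketch: `FixedArray`, `Slice`, and `as_slice` are hypothetical names that do not come from the `mirc` sources, and `std::size_t` stands in for `usize`. The slice equality assumes a platform where a pointer followed by a `std::size_t` needs no padding, which holds on common 32-bit and 64-bit targets.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of `type[size]`: only a pointer is stored,
// the size lives in the type and costs no storage.
template <typename T, std::size_t N>
struct FixedArray {
    T* data;
    static constexpr std::size_t size() { return N; }
};

// Hypothetical mirror of `type[]`: a pointer plus a run-time length.
template <typename T>
struct Slice {
    T* data;
    std::size_t len;
};

// Turning an array into a slice moves the size from the type into a field.
template <typename T, std::size_t N>
Slice<T> as_slice(FixedArray<T, N> arr) {
    return Slice<T>{arr.data, N};
}
```

The design point this illustrates is that an array's length is free at run time, while a slice pays one extra word to carry it.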
Flow control/loops:
- `if condition { ... } else { ... }`
- `while condition { ... }`
Dereference/Member access/Address of:
- After `struct A { num: u8 }; let i: A = A { 10 };`:
let addr_i: A* = &i; // address of i gives an A pointer
let val_i: A = *addr_i; // dereferencing a raw pointer gives its underlying value
let num_i: u8 = i.num; // . operator dereferences until it finds an object
let double_indirection: A** = &addr_i;
let num_i2: u8 = double_indirection.num; // . dereferences through both pointers until it finds an object
let arr: A[2] = [A { 1 }, A { 2 }];
let arr_ptr: A[2]* = &arr;
//let deref: u8 = arr_ptr.num; // ERROR: dereferencing the pointer once leads to an array, not a pointer or an object
let deref: u8 = arr_ptr[0].num; // correct: indexing also dereferences until an array is found
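The "dereference until it finds an object / array" rule can be sketched as a small type walk. Everything here is an assumption made for illustration: `Kind`, `Type`, `resolve_member_base`, and `resolve_index_element` are hypothetical names, not the data structures used in `son` or `lang`.

```cpp
#include <stdexcept>
#include <string>

enum class Kind { Object, Pointer, Array };

// Hypothetical type description: `inner` is the pointee or element type.
struct Type {
    Kind kind;
    std::string name;   // struct name, only meaningful for Kind::Object
    const Type* inner;  // nullptr for Kind::Object
};

// `.`: strip pointers until an object is reached; anything else is an error.
const Type& resolve_member_base(const Type& t) {
    const Type* cur = &t;
    while (cur->kind == Kind::Pointer) cur = cur->inner;
    if (cur->kind != Kind::Object)
        throw std::runtime_error("member access does not reach an object");
    return *cur;
}

// `[...]`: strip pointers until an array is reached, then yield its element type.
const Type& resolve_index_element(const Type& t) {
    const Type* cur = &t;
    while (cur->kind == Kind::Pointer) cur = cur->inner;
    if (cur->kind != Kind::Array)
        throw std::runtime_error("indexing does not reach an array");
    return *cur->inner;
}
```

With these two helpers, `arr_ptr.num` fails because stripping the pointer lands on an array, while `arr_ptr[0].num` succeeds because the index step yields an `A` first.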
Functions:
fn name(arg: type, arg2: type2)->type_return { ... };
Structs:
// declare
struct Name { field1: type1, field2: type2 };
// member functions
impl Name {
fn constructor( ... )->Name { ... };
fn member_fn(this*, ... )->return_type { ... };
fn copy_fn(this, ... )->return_type { ... };
};
Operators:
- Binary:
  - Arithmetic: `+`, `-`, `*`, `/`, `%` (modulo)
  - Logical: `||`, `&&` (short circuit)
  - Bitwise: `|`, `&`, `^` (xor)
- Unary: `-` (arithmetic negate), `*` (dereference), `&` (address of), `!` (logical not), `~` (bitwise not)
If extra time:
- Allow references to an rvalue: `let i = 0; let j: u8** = &&i;`
- Union type: `CustomUnion` where `union CustomUnion { e1: type1, e2: type2, ... }`
- Reference type: `let i = 10; let j: u8& = &i;`
  - References are immutable; `const*`
  - Can only call const member functions of objects: `fn const_member_fn(this&, ... )->return_type { ... };`
- Unsized array type: `type..` or `type[?]`
  - Acts like an array, but does not have a size. Basically any C/C++ array pointer
- Global level expression - The entire expression (including `;`) that starts in global scope
- Top level expression - The entire expression (including `;`) that starts in any non-global scope
- Primary expression - a sequence that produces a value. May or may not contain operators, function calls, nested scopes (and as a result more nested top level expressions), etc
- Term - a single symbol with any postfix operations applied (member access `.` or index `[...]`)
- Buffer overflow
- Use after free
- Double free
- Out of bounds write
- Integer overflow
In chapter 2 Node implementation for peephole:
if (!(this instanceof ConstantNode) && type.isConstant()) {
kill(); // Kill `this` because replacing with a Constant
return new ConstantNode(type).peephole();
}
It's possible for StartNode to be killed here. With something like Start <- Const(10) <- OpNeg, kill() will kill OpNeg, then Const(10), and then Start, because each one is left with no uses. But we're about to give Start a new use, Const(-10), so killing it is strange. The problem is present in chapter 3 too, I think.
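The ordering problem can be reproduced on a toy graph. The sketch below is not the tutorial's Java code: `Node`, `kill`, and the two helper functions are hypothetical C++ stand-ins, and "build the replacement before killing" is just one possible workaround, assumed here for illustration rather than taken from the tutorial.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Toy node: registers itself as a use of every input on construction.
struct Node {
    std::vector<Node*> inputs;
    std::vector<Node*> uses;
    bool dead = false;

    explicit Node(std::vector<Node*> in = {}) : inputs(std::move(in)) {
        for (Node* n : inputs) n->uses.push_back(this);
    }

    // Drop all inputs; any input left without uses is killed recursively.
    void kill() {
        dead = true;
        for (Node* in : inputs) {
            auto it = std::find(in->uses.begin(), in->uses.end(), this);
            if (it != in->uses.end()) in->uses.erase(it);
            if (in->uses.empty() && !in->dead) in->kill();
        }
        inputs.clear();
    }
};

// Order from the quoted snippet: kill first, then create the constant.
bool start_survives_kill_first() {
    Node start;
    Node c10(std::vector<Node*>{&start});
    Node neg(std::vector<Node*>{&c10});
    neg.kill();  // cascades: OpNeg, then Const(10), then Start
    bool survived = !start.dead;
    Node c_neg10(std::vector<Node*>{&start});  // the new use arrives too late
    (void)c_neg10;
    return survived;
}

// Assumed alternative: create the constant first so Start keeps a use.
bool start_survives_build_first() {
    Node start;
    Node c10(std::vector<Node*>{&start});
    Node neg(std::vector<Node*>{&c10});
    Node c_neg10(std::vector<Node*>{&start});  // Start now has two uses
    neg.kill();  // cascades to Const(10) and stops at Start
    return !start.dead;
}
```

In this toy model the first ordering reports Start as dead and the second keeps it alive, which matches the behaviour described above.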
When parsing loops, the inner scope is created with 'sentinel' values for variables, meaning they point to the head scope. This is done so that a 'lazy phi' is created when a variable is modified. However, a lazy phi also needs to be created on any variable lookup, not only on assignment. Example:
x = 0
while {
use(x) // read x before any write in this iteration
if {
x = x + 1 // only sometimes updated
}
use(x)
}
When x is used for the first time, no lazy phi has been created yet, so if the first lookup does not create one, that use will forever point to the original value of x. We don't know yet whether x will ever change in the loop, but we still need to create the lazy phi, because it is the access, not the assignment, that will use the lazy phi value.
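A minimal sketch of "create the lazy phi on first lookup" could look like the following. All names (`kSentinel`, `Value`, `LoopParser`, and its members) are hypothetical, and values are modelled as plain indices rather than real graph nodes, so this only shows the bookkeeping and not the actual parser.

```cpp
#include <map>
#include <string>
#include <vector>

constexpr int kSentinel = -1;  // hypothetical marker: "ask the loop head scope"

struct Value {
    bool is_phi;
    int first_input;  // for a phi: the value flowing in from before the loop
};

struct LoopParser {
    std::vector<Value> values;        // value ids index into this vector
    std::map<std::string, int> head;  // scope at the loop head
    std::map<std::string, int> body;  // scope inside the loop body

    // Define a variable before the loop.
    int define(const std::string& name) {
        values.push_back(Value{false, kSentinel});
        head[name] = static_cast<int>(values.size()) - 1;
        return head[name];
    }

    // Entering the loop fills the body scope with sentinels.
    void enter_loop() {
        for (const auto& entry : head) body[entry.first] = kSentinel;
    }

    // Any lookup, read or write, materializes the phi on first touch.
    int lookup(const std::string& name) {
        int& slot = body.at(name);
        if (slot == kSentinel) {
            values.push_back(Value{true, head.at(name)});
            slot = static_cast<int>(values.size()) - 1;
            head[name] = slot;  // the head scope now names the phi too
        }
        return slot;
    }
};
```

Because the first `use(x)` already receives the phi, a later `x = x + 1` only has to supply the phi's back-edge input; no earlier use needs to be patched.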
When parsing a loop:
initial_scope
...
break_scope {
while (...)
continue_scope {
if(...) {
a++;
continue;
}
if(...) {
a++;
break;
}
}
}
merge(initial_scope, break_scope.pop)
The condition can have side effects and rebind names.
break scope = the active scope after break is called OR when the while loop finishes naturally
continue scope = the active scope after continue is called OR when the while loop loops back
If no break or continue is present, the scope from after the entire while body was parsed is merged with the *scope.
When continue is called, the current state of the scope is merged with the scope at the beginning of the next loop iteration.
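The way break and continue scopes accumulate can be sketched by tracking, per variable, the set of definitions that may reach a point. This is an illustration under assumptions: `Scope`, `merge`, and `LoopScopes` are hypothetical names, and a set of definition labels stands in for the phi that a real merge would build.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical scope: variable name -> labels of definitions that may reach here.
using Scope = std::map<std::string, std::set<std::string>>;

// Merging two scopes unions the reaching definitions of every variable.
Scope merge(const Scope& a, const Scope& b) {
    Scope out = a;
    for (const auto& entry : b)
        out[entry.first].insert(entry.second.begin(), entry.second.end());
    return out;
}

struct LoopScopes {
    Scope break_scope;     // every state that leaves the loop
    Scope continue_scope;  // every state that jumps back to the loop head

    void on_break(const Scope& current) {
        break_scope = merge(break_scope, current);
    }
    void on_continue(const Scope& current) {
        continue_scope = merge(continue_scope, current);
    }
};
```

For the loop above, the first `if` would call `on_continue` with `a` incremented, the second `if` would call `on_break` with `a` incremented, and the natural loop-back and loop-exit paths would contribute the unmodified `a`.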
Expensive operations can be scheduled early even if the most common path doesn't use them.
So there's a little quirk with how I parse my code rn. When I have
# ...
if(condition) {
return 10;
};
# ...
it actually parses it as NodeIf's CtrlProj having 2 ctrl outputs... which is a little strange, as both are true control outputs. It works ok for now, since the first one will be the return, the true control output. But that's still a little strange and I should really fix it soon.