
IvanGav/Mir


Mir

Compiling/Running

  • Compile: make
    • make good: build with optimizations enabled and debug flags excluded (this applies to compiling the mirc compiler itself with g++, not to the code mirc generates)
  • Run: ./mirc [src file]
    • If no source file is specified, default is mir/hello.mir
    • graph.gv will automatically be generated, representing the graph of the program
    • mir.s will automatically be generated, being the compiled assembly of the program
  • Render abstract syntax graph: make graph (dot -Tpng -O graph.gv)
    • Install dot with sudo apt install graphviz
  • Assemble mir.s: make assemble (gcc -nostdlib -no-pie -o mir.out mir.s)
    • Install gcc with sudo apt install build-essential

Code hierarchy

  • core - a "standard library"
    • core/prelude.h - a file with all required definitions for the standard library
  • lang - defines language structures, independent of any parsing
  • token - the lexer/tokenizer, which breaks source code up into tokens
  • son - the parser, which builds a sea-of-nodes AST
  • compile - contains different "compilation targets" (and also reg_alloc for some reason =Ь)

Features (planned, and may be completely changed)

Types:

  • Signed integer types: i8, i16, etc
  • Unsigned integer types: u8, u16, etc
  • Floating point types: f32, f64, etc
  • Raw pointer type: type*
  • Fixed-size array type: type[size]
    • The size is known at compile time; at runtime it is effectively a raw pointer, but semantically it is an array and has a size
    • sizeof(type[size]) == sizeof(type*)
  • Slice type: type[]
    • Effectively struct Slice { data: type*, len: usize }
    • sizeof(type[]) == sizeof(type*) + sizeof(usize)
  • Struct types: CustomName where struct CustomName { ... } was declared
  • Enum type: CustomEnum where enum CustomEnum { Element1, Element2, ... } was declared
    • Elements are named as CustomEnum::Element1
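A rough C++ model of the two array layouts above, to make the sizeof rules concrete. FixedArray, Slice, as_slice and at are hypothetical names used only for illustration, and usize is assumed to be pointer-sized (std::size_t); the README does not define usize. The bounds-checked at() only shows that carrying len makes such a check possible; whether Mir will check bounds is not stated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: Mir's usize is pointer-sized, modeled here as std::size_t.
using usize = std::size_t;

// type[size]: only the pointer exists at runtime; the size N lives in the type.
template <typename T, usize N>
struct FixedArray {
    T* data;
    static constexpr usize size() { return N; }
};

// type[]: a pointer plus a runtime length, like struct Slice { data: type*, len: usize }.
template <typename T>
struct Slice {
    T* data;
    usize len;
};

static_assert(sizeof(FixedArray<std::uint8_t, 16>) == sizeof(std::uint8_t*),
              "a fixed-size array is just one pointer");
static_assert(sizeof(Slice<std::uint8_t>) == sizeof(std::uint8_t*) + sizeof(usize),
              "a slice is a pointer plus a length");

// Turning a fixed-size array into a slice attaches the compile-time size as a runtime length.
template <typename T, usize N>
Slice<T> as_slice(FixedArray<T, N> arr) {
    return Slice<T>{arr.data, N};
}

// Because the slice carries its length, indexing can be bounds-checked.
template <typename T>
T& at(Slice<T> s, usize i) {
    assert(i < s.len && "slice index out of bounds");
    return s.data[i];
}
```

The design trade-off is visible in the two static_asserts: the fixed-size array stays one word wide because its size is a compile-time fact, while the slice pays one extra word to know its length at runtime.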

Flow control/loops:

  • if condition { ... } else { ... }
  • while condition { ... }

Dereference/Member access/Address of:

  • Given struct A { num: u8 }; let i: A = A { 10 };
  • let addr_i: A* = &i; // address of i gives an A pointer
  • let val_i: A = *addr_i; // dereferencing a raw pointer gives its underlying value
  • let num_i: u8 = i.num; // the . operator dereferences until it finds an object
  • let double_indirection: A** = &addr_i;
  • let num_i: u8 = double_indirection.num; // keeps dereferencing until it finds an object
  • let arr: A[2] = [A { 1 }, A { 2 }];
  • let arr_ptr: A[2]* = &arr;
  • //let deref: u8 = arr_ptr.num; // ERROR: dereferencing the pointer leads to an array, not a pointer or an object
  • let deref: u8 = arr_ptr[0].num; // correct: indexing also dereferences until an array is found
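The dereference rules above can be read as a type-checking rule: `.` strips pointers until it reaches an object, and `[...]` strips pointers until it reaches an array. Below is a minimal C++ sketch of that rule over a toy type representation; Type, Kind, member_access_base and index_result are hypothetical and are not mirc's real data structures (array sizes are also left out for brevity).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified model of Mir types (not mirc's real representation).
enum class Kind { Object, Pointer, Array };

struct Type {
    Kind kind;
    std::string name;             // struct name, only meaningful for Kind::Object
    std::shared_ptr<Type> inner;  // pointee (Pointer) or element type (Array)
};
using TypeRef = std::shared_ptr<Type>;

TypeRef object(std::string name) { return std::make_shared<Type>(Type{Kind::Object, std::move(name), nullptr}); }
TypeRef pointer_to(TypeRef t) { return std::make_shared<Type>(Type{Kind::Pointer, "", std::move(t)}); }
TypeRef array_of(TypeRef t) { return std::make_shared<Type>(Type{Kind::Array, "", std::move(t)}); }

// Follow pointers until reaching something that is not a pointer.
TypeRef strip_pointers(TypeRef t) {
    assert(t != nullptr);
    while (t->kind == Kind::Pointer) t = t->inner;
    return t;
}

// Rule for `x.field`: dereference until an object is found; nullptr signals a type error.
TypeRef member_access_base(TypeRef t) {
    TypeRef base = strip_pointers(std::move(t));
    return base->kind == Kind::Object ? base : nullptr;
}

// Rule for `x[i]`: dereference until an array is found; the result is the element type.
TypeRef index_result(TypeRef t) {
    TypeRef base = strip_pointers(std::move(t));
    return base->kind == Kind::Array ? base->inner : nullptr;
}
```

With this model, double_indirection.num resolves to A, arr_ptr.num is rejected because stripping the pointer lands on an array, and arr_ptr[0].num works because indexing yields A before `.` is applied.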

Functions:

  • fn name(arg: type, arg2: type2)->type_return { ... };

Structs:

// declare
struct Name { field1: type1, field2: type2 };
// member functions
impl Name {
  fn constructor( ... )->Name { ... };
  fn member_fn(this*, ... )->return_type { ... }; // receives a pointer to the object
  fn copy_fn(this, ... )->return_type { ... }; // receives a copy of the object
};
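One way to read the three kinds of functions in the impl block, sketched in C++ as free functions. This assumes this* means the function receives a pointer to the object and this means it receives a copy, which is inferred from the names member_fn and copy_fn; the README does not say how mirc lowers impl blocks, and the Name_* functions and int fields are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for struct Name { field1: type1, field2: type2 }.
struct Name {
    int field1;
    int field2;
};

// fn constructor( ... )->Name : no receiver, just builds and returns a value.
Name Name_constructor(int a, int b) {
    return Name{a, b};
}

// fn member_fn(this*, ... ) : receives a pointer, so changes are visible to the caller.
void Name_member_fn(Name* self, int delta) {
    assert(self != nullptr);
    self->field1 += delta;
}

// fn copy_fn(this, ... ) : receives a copy, so changes stay local to the call.
int Name_copy_fn(Name self, int delta) {
    self.field1 += delta;
    return self.field1;
}
```

Under this reading, calling member_fn mutates the caller's object while copy_fn leaves it untouched, which is the practical difference between the two receiver forms.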

Operators:

  • Binary:
    • Arithmetic: +, -, *, /, % (modulo)
    • Logical: ||, && (short circuit)
    • Bitwise: |, &, ^ (xor)
  • Unary: - (arithmetic negate), * (dereference), & (address of), ! (logical not), ~ (bitwise not)
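Short-circuit evaluation means a && b behaves like if a { b } else { false } (and a || b like if a { true } else { b }), so in a sea-of-nodes IR these operators would plausibly lower to an If node plus a merge rather than a plain binary node; that is an assumption about mirc, which the README does not describe. The C++ sketch below shows the equivalent control flow; lowered_and and lowered_or are hypothetical names, and operands are passed as callables so it is observable whether the right-hand side runs.

```cpp
#include <cassert>

// a && b  ==  if a { b } else { false }: the right operand runs only when a is true.
template <typename A, typename B>
bool lowered_and(A a, B b) {
    if (a()) {
        return b();
    }
    return false;
}

// a || b  ==  if a { true } else { b }: the right operand runs only when a is false.
template <typename A, typename B>
bool lowered_or(A a, B b) {
    if (a()) {
        return true;
    }
    return b();
}
```

This is also why && and || cannot simply reuse the bitwise & and | nodes: the bitwise operators always evaluate both operands.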

If there is extra time:

  • Allow taking the address of an rvalue: let i = 0; let j: u8** = &&i; (&i is an rvalue)
  • Union type: CustomUnion where union CustomUnion { e1: type1, e2: type2, ... } was declared
  • Reference type: let i = 10; let j: u8& = &i;
    • References are immutable (effectively const*)
    • Can only call const member functions of objects: fn const_member_fn(this&, ... )->return_type { ... };
  • Unsized array type: type.. or type[?]
    • Acts like an array, but does not have a size; basically a plain C/C++ array pointer

Terminology

  • Global level expression - The entire expression (including ;) that starts in global scope
  • Top level expression - The entire expression (including ;) that starts in any non-global scope
  • Primary expression - a sequence that produces a value. May or may not contain operators, function calls, nested scopes (and as a result more nested top level expressions), etc
  • Term - a single symbol with any postfix operations applied (member access . or index [...])

Biggest memory-related problems (in general)

  • Buffer overflow
  • Use after free
  • Double free
  • Out of bounds write
  • Integer overflow

Notes on Simple

StartNode can die...

In chapter 2, the Node implementation of peephole:

if (!(this instanceof ConstantNode) && type.isConstant()) {
  kill(); // Kill `this` because replacing with a Constant
  return new ConstantNode(type).peephole();
}

StartNode can be killed here. For a graph like Start <- Const(10) <- OpNeg, kill() removes OpNeg, then Const(10) (which is now unused), and then Start, because Start no longer has any uses either. But we are about to give Start a new use, Const(-10), so killing it is wrong. I think the problem is present in chapter 3 too.
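To make the ordering problem concrete, here is a small C++ model of the def-use bookkeeping. Node, kill, make_node and replace_with_constant are hypothetical stand-ins, not Simple's Java classes or mirc's code; the model assumes, as described above, that the new constant takes Start as its input and that kill() recursively kills inputs left without uses. Creating the constant before killing is one possible reordering, not necessarily how Simple resolves it.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, minimal node graph with explicit def-use edges.
struct Node {
    std::string name;
    std::vector<Node*> inputs;   // nodes this node uses
    std::vector<Node*> outputs;  // nodes that use this node
    bool dead = false;

    Node(std::string n, std::vector<Node*> ins) : name(std::move(n)), inputs(std::move(ins)) {
        for (Node* in : inputs) in->outputs.push_back(this);  // register def-use edges
    }

    // Kill this node, then recursively kill any input that is left without uses.
    void kill() {
        assert(outputs.empty() && "only unused nodes can be killed");
        dead = true;
        for (Node* in : inputs) {
            auto& outs = in->outputs;
            outs.erase(std::find(outs.begin(), outs.end(), this));
            if (outs.empty()) in->kill();
        }
        inputs.clear();
    }
};

using Pool = std::vector<std::unique_ptr<Node>>;

Node* make_node(Pool& pool, const std::string& name, std::vector<Node*> inputs) {
    pool.push_back(std::make_unique<Node>(name, std::move(inputs)));
    return pool.back().get();
}

// kill_first == true mirrors the chapter 2 order: kill `self`, then create the constant.
// kill_first == false creates the constant first, so `start` still has a use during kill().
Node* replace_with_constant(Pool& pool, Node* self, Node* start, const std::string& const_name,
                            bool kill_first) {
    if (kill_first) {
        self->kill();  // may cascade all the way up to `start`
        return make_node(pool, const_name, {start});
    }
    Node* c = make_node(pool, const_name, {start});
    self->kill();
    return c;
}
```

With kill_first set, Start ends up marked dead while Const(-10) still points at it; with the constant created first, the cascade stops at Const(10) and Start survives with exactly one use.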

Explain

Lazy phi creation on variable lookup

When parsing loops, the loop body's scope is created with 'sentinel' values for its variables, meaning they point back to the loop head scope. This is done so that a 'lazy phi' is only created when a variable is actually touched: modifying a variable creates a lazy phi, and so must any variable lookup. Example:

x = 0
while {
  use(x)        // read x before any write in this iteration
  if {
    x = x + 1  // only sometimes updated
  }
  use(x)
}

At the first use(x), no lazy phi has been created yet. If that lookup does not create one, the use will forever point to the value x had before the loop, even though a previous iteration may have incremented it. At that point we don't know whether x will ever change in the loop, but we still need to create the lazy phi, because it is the access, not the assignment, that uses the lazy phi's value.
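A minimal C++ sketch of lazy phi creation on lookup. Value, LoopScope, lookup, assign and close are hypothetical names, not mirc's real classes; a nullptr binding in the body plays the role of the sentinel. To keep it short, the conditional update inside the if from the example is simplified to an unconditional assignment, so no merge at the if is modeled.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical SSA value; a phi's inputs are [value on loop entry, value from the back edge].
struct Value {
    std::string label;
    std::vector<Value*> inputs;
};

// Hypothetical loop-body scope. A nullptr in `body` is the sentinel:
// "not touched yet in this loop body, ask the loop head".
struct LoopScope {
    std::map<std::string, Value*> head;        // bindings at the loop head
    std::map<std::string, Value*> body;        // bindings inside the loop body
    std::map<std::string, bool> has_phi;       // lazy phi already created at the head?
    std::vector<std::unique_ptr<Value>> pool;  // owns the phis created here

    explicit LoopScope(const std::map<std::string, Value*>& before) : head(before) {
        for (const auto& entry : before) {
            body[entry.first] = nullptr;
            has_phi[entry.first] = false;
        }
    }

    // Every lookup resolves the sentinel, creating the lazy phi if needed, even though
    // we cannot know yet whether the loop will ever assign this name.
    Value* lookup(const std::string& name) {
        Value*& slot = body.at(name);
        if (slot == nullptr) {
            if (!has_phi.at(name)) {
                pool.push_back(std::make_unique<Value>(Value{"phi(" + name + ")", {head.at(name)}}));
                head[name] = pool.back().get();
                has_phi[name] = true;
            }
            slot = head.at(name);
        }
        return slot;
    }

    // Assignment also forces the phi to exist, then rebinds the name in the body.
    void assign(const std::string& name, Value* v) {
        lookup(name);
        body.at(name) = v;
    }

    // At the end of the body, each phi receives the body's final value as its back-edge input.
    void close() {
        for (auto& entry : head) {
            if (has_phi.at(entry.first)) entry.second->inputs.push_back(lookup(entry.first));
        }
    }
};
```

In the example, the first use(x) returns phi(x) rather than the pre-loop x; after x = x + 1 and closing the loop, the phi's back-edge input is the incremented value, so every iteration's first read sees the right x. Variables the loop never touches get no phi at all.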

Ramble ramble

When parsing a loop:

initial_scope
...
break_scope {
  while (...)
  continue_scope {
    if(...) {
      a++;
      continue;
    }
    if(...) {
      a++;
      break;
    }
  }
}
merge(initial_scope, break_scope.pop)

  • The condition can have side effects and rebind names
  • break scope = the active scope when break is called OR when the while loop finishes naturally
  • continue scope = the active scope when continue is called OR when the loop body ends and the loop loops

If no break or continue is present, the scope from after the entire while body was parsed is merged with the *scope

When continue is called, the current state of the scope is merged with the scope at the beginning of the next loop iteration
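A small C++ sketch of how the break and continue scopes could accumulate: each exit point folds its active scope into the target scope, and a merge turns names bound to different values into phis. Value, Scope, merge and add_exit are hypothetical names, not mirc's code, and each merge here creates a fresh two-input phi, a simplification of a real implementation that would extend one phi per merge region.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical SSA value and scope types.
struct Value {
    std::string label;
    std::vector<Value*> inputs;
};
using Scope = std::map<std::string, Value*>;
using Pool = std::vector<std::unique_ptr<Value>>;

// Merge two scopes at a join point: identical bindings pass through,
// differing bindings become a phi of the two incoming values.
Scope merge(const Scope& a, const Scope& b, Pool& pool) {
    Scope out;
    for (const auto& entry : a) {
        auto it = b.find(entry.first);
        if (it == b.end()) continue;  // names not visible on both paths do not survive the join
        if (it->second == entry.second) {
            out[entry.first] = entry.second;
        } else {
            pool.push_back(std::make_unique<Value>(
                Value{"phi(" + entry.first + ")", {entry.second, it->second}}));
            out[entry.first] = pool.back().get();
        }
    }
    return out;
}

// Fold the scope active at one exit (a break, the natural loop exit, a continue,
// or the end of the body) into the accumulated break or continue scope.
void add_exit(std::optional<Scope>& target, const Scope& current, Pool& pool) {
    target = target ? merge(*target, current, pool) : current;
}
```

For the a++; break; branch in the pseudocode, the break scope first holds the incremented a; when the loop condition later fails with a unchanged, folding that natural exit in produces phi(a) over both values, which is what merge(initial_scope, break_scope.pop) then sees after the loop.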

Um

Expensive operations can be scheduled early even if the most common path doesn't use them.

There's a little quirk with how I parse my code right now. When I have

# ...
if(condition) {
  return 10;
};
# ...

it actually parses it as NodeIf's CtrlProj having 2 control outputs, which is a little strange, as both are true-branch control outputs. It works OK for now, since the first one will be the return, the real true-branch control output, but it's still strange and I should really fix it soon.
