
Add cross-assembler and linker for FLUX bytecode #23

Open
SuperInstance wants to merge 1 commit into main from superz/cross-assembler


@SuperInstance (Owner) commented Apr 13, 2026

Cross-assembler with 100+ opcodes, label resolution (@Label and name: syntax), macros (#define, #ifdef, .include), multiple output formats (binary, hex, JSON, Intel HEX, Python list), linker with symbol resolution and relocation, binary patcher, and ELF header generation.

Features

  • Cross-assembler: Two-pass assembly with 100+ opcodes across 8 categories (control flow, integer/float arithmetic, bitwise, comparison, stack, SIMD, A2A protocol, trust/capabilities)
  • Label resolution: Both name: and @label syntax with forward references
  • Macro preprocessor: #define, #ifdef/#ifndef/#else/#endif, #undef, .set, .include
  • Output formats: Raw binary, hex string, JSON with metadata, Intel HEX, Python list
  • Linker: Object file serialization (FLUXOBJ format), multi-file linking, symbol resolution, relocation tables
  • BinaryPatcher: Post-assembly binary patching with undo support
  • Branch aliases: BEQ, BNE, BLT, BGE, BGT, BLE for common conditional jumps
  • Arithmetic aliases: ADD, SUB, MUL, DIV, MOD, NEG, NOT, AND, OR, XOR, SHL, SHR
  • Comment styles: ;, //, and # (on non-preprocessor lines)
  • Data directives: .byte, .word, .dword, .ascii, .asciz, .fill, .align, .org
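The two-pass scheme with forward references can be illustrated with a toy assembler. This is a minimal sketch, not the PR's implementation: the opcode values are invented, only `name:` label definitions on their own line are handled, and an operand written as `@name` or `name` is treated as a label reference (an assumption about how the two syntaxes relate). Because Pass 1 only computes sizes, any mismatch between size estimation and emission (as in the `.ascii` finding in the review) shifts every later label.

```python
# Hypothetical opcode table: mnemonic -> (opcode byte, operand size in bytes).
# These encodings are made up for illustration; they are not FLUX's real ones.
OPCODES = {"NOP": (0x00, 0), "HALT": (0x01, 0), "JMP": (0x02, 2)}


def assemble(source: str) -> bytes:
    """Two-pass assembly: pass 1 records label offsets, pass 2 emits bytes."""
    # Strip ';' comments, then drop blank lines.
    lines = [raw.split(";", 1)[0].strip() for raw in source.splitlines()]
    lines = [line for line in lines if line]

    # Pass 1: walk the program, giving each "name:" label the current offset.
    labels: dict[str, int] = {}
    offset = 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = offset
        else:
            _, operand_size = OPCODES[line.split()[0].upper()]
            offset += 1 + operand_size

    # Pass 2: emit bytes; forward references resolve because every label is known.
    out = bytearray()
    for line in lines:
        if line.endswith(":"):
            continue
        parts = line.split()
        opcode, operand_size = OPCODES[parts[0].upper()]
        out.append(opcode)
        if operand_size:
            target = parts[1].lstrip("@")  # accept "@end" or "end" as a reference
            out += labels[target].to_bytes(operand_size, "little")
    return bytes(out)


print(assemble("JMP @end\nNOP\nend:\nHALT").hex())  # 0204000001
```

Here `JMP @end` refers to a label that Pass 2 can resolve only because Pass 1 already recorded `end` at offset 4.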

Tests

96 tests covering all features: errors, macros, assembler, patcher, linker, ELF headers, and integration.



- Full cross-assembler with 100+ opcodes, label resolution (@Label and name:), two-pass assembly
- Macro preprocessor: #define, #ifdef/#ifndef/#else/#endif, .set, .include
- Multiple output formats: binary, hex, JSON, Intel HEX, Python list
- Linker with object file serialization, symbol resolution, relocation table
- BinaryPatcher for post-assembly binary patching
- ELF-like header generation
- Branch aliases: BEQ/JE, BNE/JNE, BLT/JL, BGE/JGE, BGT/JG, BLE/JLE
- Arithmetic aliases: ADD, SUB, MUL, DIV, MOD, NEG, NOT, AND, OR, XOR, SHL, SHR
- # comment support (non-preprocessor lines)
- Data directives: .byte, .word, .dword, .ascii, .asciz, .fill, .align, .org
- SIMD vector ops, A2A protocol ops, trust/capability ops, float ops
- 96 tests covering all features

@beta-devin-ai-integration (bot) left a comment


Devin Review found 7 potential issues.

View 7 additional findings in Devin Review.

Comment on src/flux/asm/cross_assembler.py


# Build record: :LLAAAATT[DD...]CC
# LL = byte count, AAAA = address, TT = record type (00=data)
checksum = chunk_size + (addr >> 8) & 0xFF + addr & 0xFF + 0x00

🔴 Intel HEX checksum calculation incorrect due to operator precedence

The checksum calculation at src/flux/asm/cross_assembler.py:582 uses chunk_size + (addr >> 8) & 0xFF + addr & 0xFF + 0x00. In Python, + has higher precedence than &, so this evaluates as (chunk_size + (addr >> 8)) & (0xFF + addr) & (0xFF + 0x00) instead of the intended chunk_size + ((addr >> 8) & 0xFF) + (addr & 0xFF). Verified: for addr=16, chunk_size=2, the buggy expression yields 2 while the correct value is 18. This produces corrupt Intel HEX output for any data larger than 16 bytes.

Suggested change:
- checksum = chunk_size + (addr >> 8) & 0xFF + addr & 0xFF + 0x00
+ checksum = chunk_size + ((addr >> 8) & 0xFF) + (addr & 0xFF) + 0x00
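For comparison, the Intel HEX format defines a record's checksum as the two's complement of the low byte of the sum of every byte in the record: the count, both address bytes, the record type, and the data. The sketch below is a standalone illustration rather than the PR's code; `intel_hex_record` and `to_intel_hex` are hypothetical names, the 16-byte chunk size is an assumption, and only 16-bit addresses are supported (no extended-address records).

```python
def intel_hex_record(addr: int, data: bytes, record_type: int = 0x00) -> str:
    """Format one Intel HEX record as ':LLAAAATT[DD...]CC'."""
    if len(data) > 0xFF or not 0 <= addr <= 0xFFFF:
        raise ValueError("record data too long or address outside 16 bits")
    # Each address byte is masked on its own, so '+'/'&' precedence cannot mix them.
    fields = bytes([len(data), (addr >> 8) & 0xFF, addr & 0xFF, record_type]) + data
    checksum = -sum(fields) & 0xFF  # two's complement of the low byte of the sum
    return ":" + fields.hex().upper() + f"{checksum:02X}"


def to_intel_hex(code: bytes, chunk_size: int = 16) -> str:
    """Split code into data records and finish with the end-of-file record."""
    records = [intel_hex_record(addr, code[addr:addr + chunk_size])
               for addr in range(0, len(code), chunk_size)]
    records.append(intel_hex_record(0, b"", record_type=0x01))
    return "\n".join(records)


print(intel_hex_record(0x0010, b"\x01\x02"))  # :020010000102EB
```

A quick self-check for any record: the sum of all its bytes, checksum included, is 0 modulo 256.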

Comment on src/flux/asm/elf_header.py

shdr_offset: int, timestamp: float) -> bytes:
"""Build the ELF64-like file header."""
ident = bytearray(16)
ident[0:4] = FLUX_MAGIC

🔴 FLUX_MAGIC is 5 bytes but assigned to 4-byte slice, causing 65-byte header

FLUX_MAGIC = b"\x7fFLUX" is 5 bytes, but ident[0:4] = FLUX_MAGIC at src/flux/asm/elf_header.py:243 assigns it into a 4-byte slice. Python's bytearray slice assignment resizes the array, making ident 17 bytes instead of 16. This cascades: header[0:16] = ident makes header 65 bytes instead of 64. The rest of generate() assumes a 64-byte header for offset calculations (phdr_offset = header_size = 64), so all program headers and section data are misaligned by 1 byte in the output binary.

Suggested change:
- ident[0:4] = FLUX_MAGIC
+ ident[0:5] = FLUX_MAGIC
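The resizing behavior is easy to reproduce. A short sketch (`build_ident` is a hypothetical helper, not the PR's code) that writes the magic with `struct.pack_into`, which writes in place and raises `struct.error` instead of growing the buffer when the data would not fit:

```python
import struct

FLUX_MAGIC = b"\x7fFLUX"  # 5 bytes, as quoted in the review


def build_ident(magic: bytes = FLUX_MAGIC) -> bytearray:
    """Return a 16-byte identification block that starts with the magic."""
    ident = bytearray(16)
    # pack_into never resizes the buffer; a too-long magic raises struct.error.
    struct.pack_into(f"{len(magic)}s", ident, 0, magic)
    return ident


# The bug in miniature: a length-mismatched slice assignment grows the array.
buggy = bytearray(16)
buggy[0:4] = FLUX_MAGIC
print(len(buggy), len(build_ident()))  # 17 16
```

Sizing the write from the value (rather than a hand-typed slice bound) also means a future change to the magic's length fails loudly instead of shifting every later offset.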

Comment on src/flux/asm/macros.py
elif line.startswith(".set"):
    self._handle_set(line, loc)
elif line.startswith(".include"):
    self._handle_include(line, loc, filename)

🔴 .include directive discards preprocessed content — included files have no effect

_handle_include (line 231) returns the preprocessed included file content as a str, but _handle_directive (line 128) calls it without capturing the return value: self._handle_include(line, loc, filename). Since _handle_directive returns None and the caller in preprocess() does continue after calling it, the included file's assembly lines are completely lost. Only side effects on self.macros persist — no code from the included file is emitted.

Prompt for agents
The _handle_include method returns the preprocessed included file content as a string, but _handle_directive discards the return value at line 128. The output_lines list in the preprocess() method never receives the included content.

To fix this, the architecture needs rethinking. One approach: instead of returning the content, _handle_include should directly append to output_lines. But output_lines is local to preprocess(). Options:
1. Make output_lines an instance variable so _handle_include's recursive preprocess() call can contribute to it.
2. Have _handle_directive return the included content, and have preprocess() check and append it to output_lines.
3. Change _handle_include to not recursively call preprocess() but instead inline the include content into the current preprocess()'s line list.

The simplest fix might be option 2: have _handle_directive return Optional[str], and in preprocess(), capture the return and split/extend output_lines when non-None.
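Option 2 might look roughly like the following toy model. It is a sketch, not the PR's preprocessor: `MiniPreprocessor` and its in-memory `files` mapping are hypothetical stand-ins (the real code reads from disk and handles many more directives), and there is no guard against include cycles.

```python
from typing import Optional


class MiniPreprocessor:
    """Toy model of the include path only; names are hypothetical."""

    def __init__(self, files: dict[str, str]):
        self.files = files  # filename -> source text, standing in for disk reads

    def preprocess(self, source: str, filename: str = "<main>") -> str:
        output_lines: list[str] = []
        for line in source.splitlines():
            stripped = line.strip()
            if stripped.startswith(".include "):
                included = self._handle_directive(stripped, filename)
                # Option 2: capture the returned text and splice it into the output.
                if included is not None:
                    output_lines.extend(included.splitlines())
                continue
            output_lines.append(line)
        return "\n".join(output_lines)

    def _handle_directive(self, line: str, filename: str) -> Optional[str]:
        if line.startswith(".include "):
            return self._handle_include(line, filename)
        return None  # other directives would only update internal state

    def _handle_include(self, line: str, filename: str) -> str:
        # filename is kept for resolving relative paths; unused in this sketch.
        target = line.split(None, 1)[1].strip().strip('"')
        # Recursing means nested .include lines are expanded as well.
        return self.preprocess(self.files[target], target)
```

With `{"lib.s": "NOP\nHALT"}`, preprocessing `.include "lib.s"` followed by `JMP start` yields the three lines `NOP`, `HALT`, `JMP start`.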

Comment on src/flux/asm/macros.py, lines +85 to +87
if stripped.startswith("#") or stripped.startswith(".set ") or stripped.startswith(".include "):
    self._handle_directive(stripped, loc, filename)
    continue

🔴 #define, #undef, .set, .include processed inside inactive #ifdef blocks

In preprocess() (lines 85-87), all preprocessor directives are dispatched to _handle_directive without checking self.is_active. The is_active guard at line 90 only protects non-directive lines. This means #define, #undef, .set, and .include inside inactive #ifdef blocks are still executed. Verified: #ifdef UNDEFINED\n#define LEAKED 42\n#endif results in LEAKED being defined even though the conditional block is inactive.

Prompt for agents
In _handle_directive (macros.py:111-129), the conditional directives (#ifdef, #ifndef, #else, #endif) must always be processed to maintain the conditional stack, but #define, #undef, .set, and .include should only be processed when self.is_active is True.

The fix should add a guard in _handle_directive after the conditional directives are handled. For example, after the #else elif block and before the #undef elif block, add:

    elif not self.is_active:
        return  # Skip non-conditional directives in inactive blocks

This ensures the conditional stack is always maintained correctly while preventing side effects from #define/#undef/.set/.include inside inactive blocks.
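A minimal model of the suggested guard, assuming the conditional state is kept as a stack of booleans and `is_active` means "every enclosing condition is true". The PR's actual representation is not shown here; `CondPreprocessor`, `cond_stack`, and `handle_directive` are hypothetical.

```python
class CondPreprocessor:
    """Toy model of the conditional stack; names are hypothetical."""

    def __init__(self):
        self.macros: dict[str, str] = {}
        self.cond_stack: list[bool] = []  # one entry per open #ifdef/#ifndef

    @property
    def is_active(self) -> bool:
        # Active only if every enclosing condition is true.
        return all(self.cond_stack)

    def handle_directive(self, line: str) -> None:
        parts = line.split()
        name = parts[0]
        # Conditional directives always run so the stack stays balanced.
        if name == "#ifdef":
            self.cond_stack.append(parts[1] in self.macros)
        elif name == "#ifndef":
            self.cond_stack.append(parts[1] not in self.macros)
        elif name == "#else":
            self.cond_stack[-1] = not self.cond_stack[-1]
        elif name == "#endif":
            self.cond_stack.pop()
        elif not self.is_active:
            return  # inactive block: skip #define/#undef/.set/.include
        elif name == "#define":
            self.macros[parts[1]] = " ".join(parts[2:])
        elif name == "#undef":
            self.macros.pop(parts[1], None)
```

Feeding it `#ifdef UNDEFINED`, `#define LEAKED 42`, `#endif` leaves `LEAKED` undefined. Because `is_active` checks the whole stack, an `#else` inside an inactive outer block still cannot re-enable definitions.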

Comment on lines +322 to +324
elif directive == ".ascii":
    match = re.search(r'"([^"]*)"', line)
    return len(match.group(1)) if match else 0

🔴 .ascii/.asciz size estimate doesn't account for escape sequences, corrupting label addresses

_estimate_directive_size (line 322-324) returns the raw string length from the regex match for .ascii/.asciz directives, but _emit_directive processes escape sequences via _unescape_string, which converts two-character sequences like \t and \n into single bytes. This causes Pass 1 label addresses to be larger than the actual emitted byte count in Pass 2. Verified: .ascii "hello\tworld\n" estimates 14 bytes but emits 12, causing subsequent labels to have incorrect offsets (e.g., end label recorded at offset 14 but HALT actually emitted at offset 12).

Suggested change:
- elif directive == ".ascii":
-     match = re.search(r'"([^"]*)"', line)
-     return len(match.group(1)) if match else 0
+ elif directive == ".ascii":
+     match = re.search(r'"([^"]*)"', line)
+     return len(self._unescape_string(match.group(1)).encode("utf-8")) if match else 0
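One way to keep Pass 1 and Pass 2 in agreement is to have the size estimate call the same encoder the emitter uses. That also covers `.asciz`, whose NUL terminator adds one byte (the suggested change above only touches `.ascii`). The escape table is an assumption, since the PR's `_unescape_string` may accept a different set, and `unescape`, `encode_string_directive`, and `estimate_size` are hypothetical names.

```python
# Assumed escape table; the PR's _unescape_string may support other sequences.
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "0": "\0", "\\": "\\", '"': '"'}


def unescape(text: str) -> str:
    """Turn two-character escapes like \\t into the single characters they denote."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in _ESCAPES:
            out.append(_ESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def encode_string_directive(directive: str, literal: str) -> bytes:
    """Bytes emitted for .ascii / .asciz (the latter adds a NUL terminator)."""
    data = unescape(literal).encode("utf-8")
    return data + b"\x00" if directive == ".asciz" else data


def estimate_size(directive: str, literal: str) -> int:
    # Pass 1 derives the size from the encoder Pass 2 uses, so label offsets
    # cannot drift from the emitted bytes.
    return len(encode_string_directive(directive, literal))
```

For the review's example, the raw literal `hello\tworld\n` is 14 characters, but `estimate_size(".ascii", ...)` returns 12, matching what is emitted.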

Comment on src/flux/asm/elf_header.py

struct.pack_into("<H", header, 56, n_phdrs)
struct.pack_into("<H", header, 58, ELF64_SECTION_HEADER_SIZE)
struct.pack_into("<H", header, 60, n_sections)
struct.pack_into("<H", header, 62, 4) # shndx of .shstrtab (index 4 in our layout)

🟡 shstrtab section index hardcoded as 4 but is actually at index 3

At src/flux/asm/elf_header.py:265, the section header string table index (e_shstrndx) is hardcoded as 4, but all_sections at line 151 is [null(0), code(1), data(2), strtab(3), symtab(4), ...]. The .shstrtab section is at index 3, not 4. This causes ELF loaders/readers to look at the .symtab section instead of .shstrtab for section name resolution.

Suggested change:
- struct.pack_into("<H", header, 62, 4) # shndx of .shstrtab (index 4 in our layout)
+ struct.pack_into("<H", header, 62, 3) # shndx of .shstrtab (index 3 in our layout)
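Looking the index up from the section list, rather than hardcoding it, keeps e_shstrndx correct if the layout changes later. A sketch assuming the standard ELF64 header offsets (e_phnum at 56, e_shentsize at 58, e_shnum at 60, e_shstrndx at 62, matching the code above). The section names other than `.flux.code` are placeholders following the review's description of `all_sections`, and `pack_section_counts` is hypothetical.

```python
import struct

# Placeholder section order following the review's description of all_sections;
# only ".flux.code" is a name taken from the review itself.
SECTION_NAMES = ["", ".flux.code", ".flux.data", ".shstrtab", ".symtab"]


def pack_section_counts(header: bytearray, n_phdrs: int, shentsize: int) -> None:
    """Write e_phnum, e_shentsize, e_shnum and e_shstrndx into an ELF64 header."""
    struct.pack_into("<H", header, 56, n_phdrs)
    struct.pack_into("<H", header, 58, shentsize)
    struct.pack_into("<H", header, 60, len(SECTION_NAMES))
    # Derive the index from the layout instead of hardcoding a number.
    struct.pack_into("<H", header, 62, SECTION_NAMES.index(".shstrtab"))
```

If sections are ever reordered, e_shnum and e_shstrndx both follow the list automatically.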

Comment on lines +311 to +316
"""Build a string table for section names."""
table = bytearray(b'\x00')  # Start with null byte
for name in names:
    table.extend(name.encode("utf-8"))
    table.append(0x00)
return bytes(table)

🟡 String table has extra leading null byte causing all section name indices to be off by 1

_build_string_table (line 312) starts with a \x00 byte, then iterates through names appending each name + null. Since names starts with "" (the null section), this produces two consecutive null bytes at the start. The _build_section_header name index calculation (line 290-295) iterates through all_names accumulating len(name) + 1 without accounting for the extra initial null byte. Verified: .flux.code is at actual table offset 2, but the computed name_idx is 1 (pointing to an empty string).

Suggested change:
-     """Build a string table for section names."""
-     table = bytearray(b'\x00')  # Start with null byte
-     for name in names:
-         table.extend(name.encode("utf-8"))
-         table.append(0x00)
-     return bytes(table)
+ def _build_string_table(self, names: list[str]) -> bytes:
+     """Build a string table for section names."""
+     table = bytearray()
+     for name in names:
+         table.extend(name.encode("utf-8"))
+         table.append(0x00)
+     return bytes(table)
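Recording each name's offset while the table is being built means the offsets and the bytes come from the same loop and cannot drift apart. A hypothetical helper, not the PR's API: it returns the offsets alongside the table, which something like `_build_section_header` could then consume instead of recomputing them.

```python
def build_string_table(names: list[str]) -> tuple[bytes, list[int]]:
    """Return (table, offsets) where offsets[i] is where names[i] starts."""
    table = bytearray()
    offsets = []
    for name in names:
        offsets.append(len(table))  # record the offset before appending the name
        table.extend(name.encode("utf-8"))
        table.append(0x00)  # each name is NUL-terminated
    return bytes(table), offsets
```

Since the first name is the empty string for the null section, the table still begins with exactly one NUL byte, and `.flux.code` lands at offset 1.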

Labels

None yet

Projects

None yet


1 participant