Initial impl of BytecodeIonReader by popematt · Pull Request #1136 · amazon-ion/ion-java

popematt · 2025-10-28T23:51:21Z

Issue #, if available:

None

Description of changes:

Starts implementation of BytecodeIonReader

* Supports reading all data model values and IVMs.
* Adds a skeleton for an EncodingContextManager.
* Updates instruction_reference.md with some more information about integers and unevaluated macro invocations, and cleans up some formatting.
* Adds a method to the bytecode debugger to dump a single instruction.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

codecov · 2025-10-28T23:56:42Z

Codecov Report

❌ Patch coverage is 77.61557% with 92 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (ion11@e3e5ab9). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../java/com/amazon/ion/bytecode/BytecodeIonReader.kt | 77.91% | 41 Missing and 31 partials ⚠️ |
| .../com/amazon/ion/bytecode/EncodingContextManager.kt | 67.44% | 14 Missing ⚠️ |
| ...c/main/java/com/amazon/ion/bytecode/ir/Debugger.kt | 0.00% | 4 Missing ⚠️ |
| ...n/java/com/amazon/ion/bytecode/ir/OperationKind.kt | 86.66% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@           Coverage Diff            @@
##             ion11    #1136   +/-   ##
========================================
  Coverage         ?   69.20%           
  Complexity       ?     6298           
========================================
  Files            ?      202           
  Lines            ?    25486           
  Branches         ?     4418           
========================================
  Hits             ?    17637           
  Misses           ?     6474           
  Partials         ?     1375


austnwil

This all looks pretty solid to me. Nice!

austnwil · 2025-10-29T16:46:35Z

src/main/java/com/amazon/ion/bytecode/BytecodeIonReader.kt

+        private set
+
+    private val context = EncodingContextManager()
+    private var symbolTable: Array<String?> = EncodingContextManager.SYSTEM_SYMBOLS


Why have a symbol table and constant pool both here and in EncodingContextManager? I assume the symbol table here is planned to be removed and the one in the context manager used instead?

I'm still figuring some of that out, but I think the symbol table and constant pool in the EncodingContextManager will be a class that wraps an array and manages resizing the array, etc., whereas the BytecodeIonReader will be granted "unsafe" access to the underlying array in order to reduce overhead in the hot code paths.
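
The split described in this reply could look roughly like the sketch below. It is only an illustration of the idea: `GrowableSymbolTable`, `add`, and `unsafeGetArray` are hypothetical names, not part of ion-java, and the real EncodingContextManager may end up structured differently.

```kotlin
/**
 * Hypothetical growable symbol table. The class and member names are
 * illustrative and are not taken from the ion-java code base.
 */
class GrowableSymbolTable(initialCapacity: Int = 8) {
    private var symbols: Array<String?> = arrayOfNulls(initialCapacity)

    /** Number of slots in use; slots at or beyond this index are unused. */
    var size: Int = 0
        private set

    /** Appends a symbol, doubling the backing array when it is full. Returns the new index. */
    fun add(text: String?): Int {
        if (size == symbols.size) {
            symbols = symbols.copyOf(maxOf(1, size * 2))
        }
        symbols[size] = text
        return size++
    }

    /**
     * "Unsafe" accessor: returns the backing array itself, not a copy.
     * The caller must fetch it again after any call to [add], because growth
     * replaces the array, and must not read indices >= [size].
     */
    fun unsafeGetArray(): Array<String?> = symbols
}
```

A reader that caches the result of `unsafeGetArray()` in a field can index it directly in hot paths, at the cost of having to refresh that field whenever the table may have grown.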

austnwil · 2025-10-29T17:07:17Z

src/main/java/com/amazon/ion/bytecode/BytecodeIonReader.kt

+    private var isInStruct = false
+    private var isNextAlreadyLoaded = false
+
+    private val containerStack = _Private_RecyclingStack(10) { ContainerInfo() }


Just noting here that the recycling data structures were removed on main in favor of managing ContainerInfo arrays directly to improve performance for deeply nested containers.

I'm aware. Once we're further along, we can run tests to see if we get similar results here. I'm not surprised that it made a difference in #1101, but another consideration is the object size. I'm trying to keep this class small (64 bytes) so that it can fit in a single cache line. Once it's complete, we can experiment with adding more members to this class to see if the performance improvement from reducing overhead outweighs any potential performance penalty from having a larger object size.

austnwil · 2025-10-29T17:28:59Z

src/main/java/com/amazon/ion/bytecode/BytecodeIonReader.kt

+    private var instruction = 0
+    private var fieldNameIndex = -1
+    private var annotationsIndex = -1
+    private var annotationCount: Byte = 0


Curious why this is a byte? toByte() is called when assigning it and toInt() is called every time it is used.

I'm hoping to keep the memory footprint of the class to no more than 64 bytes so that it can fit in a single cache line. This is an easy way to help keep it smaller.
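
The trade-off discussed here can be sketched as follows. `AnnotationState`, `setCount`, and `count` are hypothetical names used only for illustration, and the range check is an assumption of this sketch rather than something the PR is known to do. Whether a `Byte` field actually shrinks the object depends on the JVM's field layout and alignment, so any saving would need to be measured.

```kotlin
/** Hypothetical holder showing a Byte-backed counter with an Int-facing API. */
class AnnotationState {
    // Stored as a Byte to save space; this sketch only accepts counts in 0..127.
    private var annotationCount: Byte = 0

    /** Narrows the count on write, rejecting values a signed Byte cannot hold. */
    fun setCount(count: Int) {
        require(count in 0..Byte.MAX_VALUE) { "count out of range: $count" }
        annotationCount = count.toByte()
    }

    /** Widens the stored count on read. */
    fun count(): Int = annotationCount.toInt()
}
```

The conversions on each access are the cost the reviewer points out; the narrower field is the benefit the author is aiming for.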

austnwil · 2025-10-29T18:42:54Z

src/main/java/com/amazon/ion/bytecode/ir/instruction_reference.md

+* Generators may be configured to expose some or all macro invocations.
+* Generators should expose macro invocations using the `INVOKE` instruction
+* The `INVOKE` instruction is followed by one argument for each parameter in the signature (making any "no argument" expressions explicit).
+* The arguments are followed by the `END_CONTAINER` instruction. (TODO: Should we have a distinct `END_INVOKE` instruction?)


I think it makes sense to have a separate END_INVOKE if you already have a separate END_TEMPLATE, but I'm pretty indifferent.

I agree, but I've left a TODO because I think we can defer this until we are exposing un-evaluated macro invocations.

austnwil · 2025-10-29T18:47:36Z

src/test/java/com/amazon/ion/bytecode/BytecodeIonReaderTest.kt

+        }
+
+        @Test
+        fun `after reading I_END_OF_INPUT, it should be possible to check for more input`() {


This is not related to this PR, but I was pretty surprised to see that this is supported functionality. The TODO below exists in the bytecode reference doc, and if it is possible to read more data after END_OF_INPUT, then I would be inclined to eventually apply the rename it proposes, since I don't think END_OF_INPUT communicates this well.

* OR rename `END_OF_INPUT` to `NEEDS_DATA`?

Sure, we can rename this in another PR.
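
To make the behavior under discussion concrete, here is a minimal sketch of a source that reports that it has run out of data and can then resume once more input arrives. `Event`, `IncrementalSource`, `feed`, and `lastValue` are all hypothetical names; this does not reproduce the BytecodeIonReader API or its instruction encoding.

```kotlin
/** Hypothetical result of asking the source for the next item. */
enum class Event { VALUE, NEEDS_DATA }

/** Hypothetical incremental source: running dry is a resumable state, not a terminal one. */
class IncrementalSource {
    private val pending = ArrayDeque<Int>()

    /** The value produced by the most recent call to [next] that returned VALUE. */
    var lastValue: Int = 0
        private set

    /** Makes more input available to later calls to [next]. */
    fun feed(vararg values: Int) {
        values.forEach { pending.addLast(it) }
    }

    /** Returns VALUE if an item was available, or NEEDS_DATA if the buffer is empty. */
    fun next(): Event {
        val v = pending.removeFirstOrNull() ?: return Event.NEEDS_DATA
        lastValue = v
        return Event.VALUE
    }
}
```

Because `next()` can succeed after previously reporting an empty buffer, a name like `NEEDS_DATA` describes the state more accurately than one that suggests the stream is finished.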

austnwil · 2025-10-29T19:50:38Z

src/main/java/com/amazon/ion/bytecode/BytecodeIonReader.kt

+    override fun bigIntegerValue(): BigInteger {
+        val instruction = this.instruction
+        val op = Instructions.toOperation(instruction)
+        var i = bytecodeI
+        val bigInt = when (op) {
+            Operation.OP_INT_I16 -> Instructions.getData(instruction).toBigInteger()
+            Operation.OP_INT_I32 -> bytecode[i].toLong().toBigInteger()
+            Operation.OP_INT_I64 -> {
+                val msb = bytecode[i++].toLong() and 0xFFFFFFFF
+                val lsb = bytecode[i].toLong() and 0xFFFFFFFF
+                ((msb shl 32) or lsb).toBigInteger()
+            }
+            Operation.OP_INT_CP -> {
+                val cpIndex = Instructions.getData(instruction)
+                constantPool[cpIndex] as BigInteger
+            }
+            Operation.OP_INT_REF -> {
+                val length = Instructions.getData(instruction)
+                val position = bytecode[bytecodeI]
+                generator.readBigIntegerReference(position, length)
+            }


I know we haven't landed on the exact null behavior for these yet, but I feel like it is reasonable to at least support null.int here. According to your research, all the current implementations exhibit this behavior (same for null.float).

You can ignore this though if you want to wait for a complete solution to these null questions.

Suggested change

-    override fun bigIntegerValue(): BigInteger {
+    override fun bigIntegerValue(): BigInteger? {
         val instruction = this.instruction
         val op = Instructions.toOperation(instruction)
         var i = bytecodeI
         val bigInt = when (op) {
             Operation.OP_INT_I16 -> Instructions.getData(instruction).toBigInteger()
             Operation.OP_INT_I32 -> bytecode[i].toLong().toBigInteger()
             Operation.OP_INT_I64 -> {
                 val msb = bytecode[i++].toLong() and 0xFFFFFFFF
                 val lsb = bytecode[i].toLong() and 0xFFFFFFFF
                 ((msb shl 32) or lsb).toBigInteger()
             }
             Operation.OP_INT_CP -> {
                 val cpIndex = Instructions.getData(instruction)
                 constantPool[cpIndex] as BigInteger
             }
             Operation.OP_INT_REF -> {
                 val length = Instructions.getData(instruction)
                 val position = bytecode[bytecodeI]
                 generator.readBigIntegerReference(position, length)
             }
+            Operation.OP_NULL_INT -> null

I'm going to leave nulls for later.
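
As a side note on the `OP_INT_I64` branch in the excerpt above, the following standalone sketch shows why the low word is masked before the two 32-bit words are combined. `splitI64` and `joinI64` are hypothetical helpers written for this illustration; only the masking and shifting mirror the code under review.

```kotlin
/** Splits a Long into two 32-bit words, most significant word first. */
fun splitI64(value: Long): IntArray =
    intArrayOf((value ushr 32).toInt(), value.toInt())

/** Rebuilds the Long from two consecutive words, mirroring the OP_INT_I64 branch. */
fun joinI64(words: IntArray, start: Int): Long {
    // Int.toLong() sign-extends, so each word is masked down to its low 32 bits.
    val msb = words[start].toLong() and 0xFFFFFFFF
    val lsb = words[start + 1].toLong() and 0xFFFFFFFF
    return (msb shl 32) or lsb
}
```

Without the mask on `lsb`, a low word with its top bit set (for example the low half of `0x180000000`) would sign-extend to a negative `Long` and overwrite the high word when the two are OR-ed together. The mask on `msb` is harmless but not strictly needed, because the left shift discards those upper bits anyway.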

Initial impl of BytecodeIonReader

49c2366

popematt requested review from austnwil and tgregg October 28, 2025 23:51

austnwil approved these changes Oct 29, 2025

Apply suggestions from code review

9c6ace1

Co-authored-by: Austin Williams <austnwil@amazon.com>

popematt merged commit 6493fc4 into amazon-ion:ion11 Oct 31, 2025
29 of 36 checks passed
