Jawk currently supports only standard awk-style simulated multidimensional arrays via SUBSEP and comma subscripts (a[i,j]). The frontend explicitly rejects chained brackets (a[i][j]), so true arrays of arrays are not available.
Reference:
We want to add gawk-style arrays of arrays, with these constraints:
- The feature is optional, but enabled by default through AwkSettings.allowArraysOfArrays = true.
- This is a compilation flag only. AwkParser should either accept and compile chained-bracket syntax or reject it. AVM and JRT should always be able to execute nested-array tuples. Runtime behavior should not branch on the setting unless that turns out to be impossible.
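The compile-time gate described above can be sketched in isolation. This is a minimal illustration, not Jawk's parser: the class ChainedSubscriptGate, its ArrayRef result type, and the simplified string-based parsing (no nested brackets or quoted commas inside a subscript) are all assumptions made for the example. Only the error message text is taken from this issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the parser gate; not part of Jawk. */
public class ChainedSubscriptGate {

    /** Parse result: the array name plus one subscript list per bracket pair. */
    public static final class ArrayRef {
        public final String name;
        public final List<List<String>> levels;

        public ArrayRef(String name, List<List<String>> levels) {
            this.name = name;
            this.levels = levels;
        }
    }

    /**
     * Splits a reference such as a[1][2,3] into bracket levels.
     * The flag is consulted only here, at "compile" time.
     */
    public static ArrayRef parse(String source, boolean allowArraysOfArrays) {
        String src = source.trim();
        List<List<String>> levels = new ArrayList<>();
        int open = src.indexOf('[');
        if (open < 0) {
            return new ArrayRef(src, levels);
        }
        String name = src.substring(0, open).trim();
        int pos = open;
        while (pos < src.length()) {
            if (src.charAt(pos) != '[') {
                throw new IllegalArgumentException("Expected [ at position " + pos);
            }
            int close = src.indexOf(']', pos);
            if (close < 0) {
                throw new IllegalArgumentException("Missing ]");
            }
            // A second bracket pair is only legal when the flag is on.
            if (!levels.isEmpty() && !allowArraysOfArrays) {
                throw new IllegalArgumentException(
                        "Use [a,b,c,...] instead of [a][b][c]... for multi-dimensional arrays.");
            }
            // Commas inside one bracket pair stay together as a single level.
            List<String> subscripts = new ArrayList<>();
            for (String part : src.substring(pos + 1, close).split(",")) {
                subscripts.add(part.trim());
            }
            levels.add(subscripts);
            pos = close + 1;
        }
        return new ArrayRef(name, levels);
    }
}
```

With the flag on, a[1][2,3] yields two levels, the second holding both comma subscripts. With the flag off, a[1,2] still parses as one level and a[1][2] is rejected.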
Expected behavior
- Keep existing a[i,j] / SUBSEP behavior unchanged.
- When the flag is enabled, support a[i][j], a[i][j][k], and mixed forms such as a[i][j,k].
- Chained brackets must represent true nested maps. a[i][j] must not be compiled as a[i SUBSEP j].
- Reusing a slot as both scalar and array should follow gawk-style behavior: if a[i] currently holds a scalar, a[i][j] should fail unless the scalar slot is deleted first.
- Array-valued operations should work on nested references too, not only top-level array variables. At minimum this needs to cover reads, writes, in, for (k in ...), delete, ++/--, compound assignment, sub/gsub third argument, getline targets, and any built-in or extension path that accepts array values.
- When the flag is disabled, keep rejecting chained-bracket syntax and preserve current behavior.
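The difference between a true nested map and a SUBSEP-joined key, and the scalar/array conflict rule, can be illustrated with plain java.util maps. This is a sketch under assumptions: the class NestedArraySemantics and its methods are hypothetical, keys are modeled as strings, the exception type is arbitrary, and SUBSEP is set to the conventional awk default "\034". Jawk's actual runtime types may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the expected semantics; not Jawk runtime code. */
public class NestedArraySemantics {

    /** Conventional awk default for SUBSEP (octal 034). */
    public static final String SUBSEP = "\034";

    /** Key used by a[i,j]: the subscripts joined with SUBSEP into one flat key. */
    public static String flatKey(String... subscripts) {
        return String.join(SUBSEP, subscripts);
    }

    /**
     * Returns the subarray stored at base[key], creating it when the slot is
     * empty. Fails when the slot already holds a scalar, mirroring the
     * gawk-style rule that the scalar must be deleted first.
     */
    @SuppressWarnings("unchecked")
    public static Map<Object, Object> subarray(Map<Object, Object> base, Object key) {
        Object slot = base.get(key);
        if (slot == null) {
            Map<Object, Object> created = new LinkedHashMap<>();
            base.put(key, created);
            return created;
        }
        if (slot instanceof Map) {
            return (Map<Object, Object>) slot;
        }
        throw new IllegalStateException("Attempt to use scalar at key " + key + " as an array");
    }

    public static void main(String[] args) {
        Map<Object, Object> a = new LinkedHashMap<>();
        subarray(a, "1").put("2", 42);                // a[1][2] = 42
        a.put(flatKey("1", "2"), 7);                  // a[1,2] = 7, a separate flat slot
        subarray(a, "1").put(flatKey("2", "3"), 99);  // a[1][2,3] = 99, SUBSEP only inside the inner pair
        System.out.println(a.keySet().size());
    }
}
```

In this model a[1][2] and a[1,2] never collide, because the first lives inside the map stored at key "1" while the second is a single top-level key containing SUBSEP.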
Implementation outline
- Add allowArraysOfArrays to AwkSettings, default it to true, and update ImmutableAwkSettings.
- Thread the setting into Awk.compile(...) and Awk.compileExpression(...) so AwkParser sees the compilation flag.
- Update AwkParser so SYMBOL can parse chained bracket expressions instead of throwing "Use [a,b,c,...] instead of [a][b][c]... for multi-dimensional arrays."
- Keep ArrayIndexAst / APPLY_SUBSEP for comma subscripts inside a single bracket pair only. Do not use them for chained brackets.
- Introduce tuple/AST support for nested array lvalues. Today many code paths assume the base of an array reference is an IDAst; they need to work when the base is another array reference that evaluates to a Map.
- Reuse the runtime pieces that already work on generic Map values (DEREF_ARRAY, IS_IN, AssocArray, JRT.createAwkMap(...)) and add generic assign/delete/inc/dec/sub/getline helpers where the current tuple set is too root-array-specific.
- Audit frontend sites that currently insist on a plain array name, especially InExpressionAst, ForInStatementAst, DeleteStatementAst, assignment/compound assignment, pre/post inc/dec, getline, sub/gsub, split, and user-defined function array-parameter checks.
- Add regression coverage with AwkTestSupport. Existing gawk fixture files in src/test/resources/gawk (mdim1*, mdim6*, typeof4.awk, etc.) are useful references, but Jawk tests should be behavior-based because Jawk does not currently expose gawk’s typeof() / isarray() helpers.
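One way to picture the "base is another array reference" requirement is a base expression that evaluates recursively to a Map, with the element operations written once against that abstraction. The sketch below is an assumption-heavy illustration: ArrayBase, NameBase, ElementBase, and the helper methods are hypothetical names, the globals table is a plain Map, and none of this reflects Jawk's actual AST or tuple design.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of array bases that may themselves be array elements. */
public class NestedBaseSketch {

    /** Anything that can evaluate to an array value (modeled as a Map). */
    public interface ArrayBase {
        Map<Object, Object> eval(Map<String, Object> globals);
    }

    /** Base is a plain array name, the only case root-array-specific code handles. */
    public static final class NameBase implements ArrayBase {
        private final String name;

        public NameBase(String name) {
            this.name = name;
        }

        @Override
        public Map<Object, Object> eval(Map<String, Object> globals) {
            Object slot = globals.computeIfAbsent(name, k -> new HashMap<Object, Object>());
            return asArray(slot, name);
        }
    }

    /** Base is an element of another array, e.g. the a[1] in a[1][2]. */
    public static final class ElementBase implements ArrayBase {
        private final ArrayBase parent;
        private final Object key;

        public ElementBase(ArrayBase parent, Object key) {
            this.parent = parent;
            this.key = key;
        }

        @Override
        public Map<Object, Object> eval(Map<String, Object> globals) {
            Map<Object, Object> outer = parent.eval(globals);
            Object slot = outer.computeIfAbsent(key, k -> new HashMap<Object, Object>());
            return asArray(slot, String.valueOf(key));
        }
    }

    @SuppressWarnings("unchecked")
    private static Map<Object, Object> asArray(Object slot, String what) {
        if (slot instanceof Map) {
            return (Map<Object, Object>) slot;
        }
        throw new IllegalStateException("Attempt to use scalar " + what + " as an array");
    }

    // Generic helpers: each works on any base, top-level or nested.

    public static void assign(ArrayBase base, Object key, Object value, Map<String, Object> g) {
        base.eval(g).put(key, value);
    }

    public static boolean isIn(ArrayBase base, Object key, Map<String, Object> g) {
        return base.eval(g).containsKey(key);
    }

    public static void delete(ArrayBase base, Object key, Map<String, Object> g) {
        base.eval(g).remove(key);
    }

    /** Pre-increment of base[key]; an unset element counts as 0. */
    public static double increment(ArrayBase base, Object key, Map<String, Object> g) {
        Map<Object, Object> array = base.eval(g);
        Object old = array.get(key);
        if (old instanceof Map) {
            throw new IllegalStateException("Attempt to use array element " + key + " as a scalar");
        }
        double next = (old instanceof Number ? ((Number) old).doubleValue() : 0.0) + 1.0;
        array.put(key, next);
        return next;
    }
}
```

Under this shape, delete a[1][2] becomes delete(new ElementBase(new NameBase("a"), "1"), "2", globals), and the same helper serves delete a[2] with a NameBase alone.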
Acceptance criteria
- BEGIN { a[1][2] = 42; print a[1][2] } prints 42.
- BEGIN { a[1][2,3] = 42; print a[1][2,3] } works, and the inner comma subscript still uses SUBSEP.
- BEGIN { a[1,2] = 42; print a[1,2] } still behaves exactly as today.
- BEGIN { a[1] = 5; print a[1][2] } fails with an array/scalar type error until delete a[1] happens.
- BEGIN { a[1][2] = 7; print ((2 in a[1]) ? "yes" : "no") } reports membership correctly.
- BEGIN { a[1][2] = 7; delete a[1][2]; print (2 in a[1]) } removes the nested element correctly.
- for (k in a[1]) iterates a subarray.
- split("x y", a[1]) and equivalent array-accepting runtime paths work if a[1] is used as a subarray.
- Turning allowArraysOfArrays off makes a[1][2] a parser error again.
Out of scope for this issue unless needed to unblock implementation
- Adding gawk-only introspection helpers such as typeof() or isarray().
- Separately implementing length(array) / length(subarray) semantics.