Add optional gawk-style arrays of arrays (a[i][j]) #438

@bertysentry

Jawk currently supports only standard awk-style simulated multidimensional arrays via SUBSEP and comma subscripts (a[i,j]). The frontend explicitly rejects chained brackets (a[i][j]), so true arrays of arrays are not available.

Reference:

We want to add gawk-style arrays of arrays, with these constraints:

  • The feature is optional, but enabled by default through AwkSettings.allowArraysOfArrays = true.
  • This is a compilation flag only. AwkParser should either accept and compile chained-bracket syntax or reject it.
  • AVM and JRT should always be able to execute nested-array tuples. Runtime behavior should not branch on the setting, unless a setting-independent runtime turns out to be impossible.
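
To make the "compilation flag only" constraint concrete, here is a minimal sketch of a flag-gated subscript check. It is a toy string splitter, not Jawk's actual AwkParser: the class `SubscriptGate` and its `parse` method are hypothetical, the boolean field only stands in for `AwkSettings.allowArraysOfArrays`, and the error text is modeled on the message quoted in this issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a compile-time flag; not Jawk's real AwkParser. */
final class SubscriptGate {

    /** Hypothetical stand-in for AwkSettings.allowArraysOfArrays. */
    private final boolean allowArraysOfArrays;

    SubscriptGate(boolean allowArraysOfArrays) {
        this.allowArraysOfArrays = allowArraysOfArrays;
    }

    /**
     * Splits a single reference such as "a[1][2,3]" into one subscript list
     * per bracket pair. Assumes simple subscripts: no nested brackets and
     * no quoted commas.
     */
    List<List<String>> parse(String ref) {
        List<List<String>> groups = new ArrayList<>();
        int pos = ref.indexOf('[');
        while (pos >= 0) {
            int close = ref.indexOf(']', pos);
            if (close < 0) {
                throw new IllegalArgumentException("Unterminated '[' in " + ref);
            }
            if (!groups.isEmpty() && !allowArraysOfArrays) {
                // A second bracket pair with the flag off is rejected up front.
                throw new IllegalArgumentException(
                        "Use [a,b,c,...] instead of [a][b][c]... for multi-dimensional arrays");
            }
            List<String> group = new ArrayList<>();
            for (String part : ref.substring(pos + 1, close).split(",")) {
                group.add(part.trim());
            }
            groups.add(group);
            pos = ref.indexOf('[', close);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Flag on: two bracket pairs, the second holding a comma subscript.
        System.out.println(new SubscriptGate(true).parse("a[1][2,3]"));
        try {
            new SubscriptGate(false).parse("a[1][2]");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the decision happens entirely while parsing. Nothing downstream needs to know whether the flag was set.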

Expected behavior

  • Keep existing a[i,j] / SUBSEP behavior unchanged.
  • When the flag is enabled, support a[i][j], a[i][j][k], and mixed forms such as a[i][j,k].
  • Chained brackets must represent true nested maps. a[i][j] must not be compiled as a[i SUBSEP j].
  • Reusing a slot as both scalar and array should follow gawk-style behavior: if a[i] currently holds a scalar, a[i][j] should fail unless the scalar slot is deleted first.
  • Array-valued operations should work on nested references too, not only top-level array variables. At minimum this needs to cover reads, writes, in, for (k in ...), delete, ++/--, compound assignment, sub/gsub third argument, getline targets, and any built-in or extension path that accepts array values.
  • When the flag is disabled, keep rejecting chained-bracket syntax and preserve current behavior.
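
The difference between the two subscript forms can be modeled with plain Java maps. This is only an illustration of the intended semantics, not Jawk's runtime: `SubscriptForms`, `flatKey`, and `subarray` are hypothetical names, keys are left as raw Java objects rather than awk strings, and the scalar/array conflict case is deliberately not handled here.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of the two subscript forms; not Jawk's runtime classes. */
final class SubscriptForms {

    /** Default awk SUBSEP value, the character "\034". */
    static final String SUBSEP = "\034";

    /** a[i,j]: one flat key built by joining the subscripts with SUBSEP. */
    static String flatKey(Object... subscripts) {
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < subscripts.length; i++) {
            if (i > 0) {
                key.append(SUBSEP);
            }
            key.append(subscripts[i]);
        }
        return key.toString();
    }

    /** a[i]: returns the subarray stored under i, creating it on first use. */
    @SuppressWarnings("unchecked")
    static Map<Object, Object> subarray(Map<Object, Object> array, Object i) {
        return (Map<Object, Object>) array.computeIfAbsent(i, k -> new HashMap<Object, Object>());
    }

    public static void main(String[] args) {
        Map<Object, Object> a = new HashMap<>();

        a.put(flatKey(1, 2), 42);               // a[1,2] = 42    (flat key in a)
        subarray(a, 1).put(2, 42);              // a[1][2] = 42   (nested map)
        subarray(a, 1).put(flatKey(2, 3), 42);  // a[1][2,3] = 42 (SUBSEP key inside a[1])

        System.out.println(a.containsKey("1" + SUBSEP + "2")); // flat entry exists
        System.out.println(a.get(1) instanceof Map);           // a[1] is a real subarray
        System.out.println(subarray(a, 1).size());             // keys 2 and "2" SUBSEP "3"
    }
}
```

In this model `a[1,2]` and `a[1][2]` never collide: the first lives under a single joined key in `a`, the second under key `2` of the map stored at `a[1]`.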

Implementation outline

  • Add allowArraysOfArrays to AwkSettings, default it to true, and update ImmutableAwkSettings.
  • Thread the setting into Awk.compile(...) and Awk.compileExpression(...) so AwkParser sees the compilation flag.
  • Update AwkParser so SYMBOL can parse chained bracket expressions instead of throwing the "Use [a,b,c,...] instead of [a][b][c]... for multi-dimensional arrays" error.
  • Keep ArrayIndexAst / APPLY_SUBSEP for comma subscripts inside a single bracket pair only. Do not use them for chained brackets.
  • Introduce tuple/AST support for nested array lvalues. Today many code paths assume the base of an array reference is an IDAst; they need to work when the base is another array reference that evaluates to a Map.
  • Reuse the runtime pieces that already work on generic Map values (DEREF_ARRAY, IS_IN, AssocArray, JRT.createAwkMap(...)) and add generic assign/delete/inc/dec/sub/getline helpers where the current tuple set is too root-array-specific.
  • Audit frontend sites that currently insist on a plain array name, especially InExpressionAst, ForInStatementAst, DeleteStatementAst, assignment/compound assignment, pre/post inc/dec, getline, sub/gsub, split, and user-defined function array-parameter checks.
  • Add regression coverage with AwkTestSupport. Existing gawk fixture files in src/test/resources/gawk (mdim1*, mdim6*, typeof4.awk, etc.) are useful references, but Jawk tests should be behavior-based because Jawk does not currently expose gawk’s typeof() / isarray() helpers.
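
One possible shape for the generic helpers mentioned above is to resolve the chain of subscripts down to the innermost map first, then apply the same assign/increment/delete logic that a root array would get. The sketch below assumes that approach. `NestedArrayOps` and all of its methods are hypothetical names, not existing Jawk tuples or JRT methods; values are modeled as plain numbers without awk string coercion, and the error wording is illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helpers; illustrative only, not existing Jawk code. */
final class NestedArrayOps {

    /**
     * Walks root[path[0]][path[1]]... and returns the innermost map.
     * Missing levels are created; a scalar in the way is a type error.
     */
    @SuppressWarnings("unchecked")
    static Map<Object, Object> resolve(Map<Object, Object> root, Object... path) {
        Map<Object, Object> current = root;
        for (Object key : path) {
            Object slot = current.get(key);
            if (slot == null) {
                slot = new HashMap<Object, Object>();
                current.put(key, slot);
            } else if (!(slot instanceof Map)) {
                throw new IllegalStateException(
                        "Attempt to use scalar element [" + key + "] as an array");
            }
            current = (Map<Object, Object>) slot;
        }
        return current;
    }

    /** container[key] = value, for any container returned by resolve(). */
    static void assign(Map<Object, Object> container, Object key, Object value) {
        if (container.get(key) instanceof Map) {
            throw new IllegalStateException(
                    "Attempt to use array element [" + key + "] in a scalar context");
        }
        container.put(key, value);
    }

    /** ++container[key]; a missing element counts as 0 (numbers only). */
    static long increment(Map<Object, Object> container, Object key) {
        Object old = container.get(key);
        long next = (old == null ? 0L : ((Number) old).longValue()) + 1;
        container.put(key, next);
        return next;
    }

    /** delete container[key]; removes a scalar or a whole subarray. */
    static void delete(Map<Object, Object> container, Object key) {
        container.remove(key);
    }

    public static void main(String[] args) {
        Map<Object, Object> a = new HashMap<>();
        assign(resolve(a, 1), 2, 7L);               // a[1][2] = 7
        increment(resolve(a, 1), 2);                // ++a[1][2]
        System.out.println(resolve(a, 1).get(2));   // prints 8

        assign(a, "x", 5L);                         // a["x"] = 5
        try {
            resolve(a, "x", "y");                   // a["x"]["y"] while a["x"] is a scalar
        } catch (IllegalStateException e) {
            System.out.println("type error: " + e.getMessage());
        }
        delete(a, "x");                             // delete a["x"]
        assign(resolve(a, "x"), "y", 1L);           // a["x"]["y"] = 1 is now allowed
    }
}
```

Because every operation takes a (container, key) pair, the same helper serves `a[k]` and `a[i][j][k]`; only the resolution step differs.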

Acceptance criteria

  • BEGIN { a[1][2] = 42; print a[1][2] } prints 42.
  • BEGIN { a[1][2,3] = 42; print a[1][2,3] } works, and the inner comma subscript still uses SUBSEP.
  • BEGIN { a[1,2] = 42; print a[1,2] } still behaves exactly as today.
  • BEGIN { a[1] = 5; print a[1][2] } fails with an array/scalar type error until delete a[1] happens.
  • BEGIN { a[1][2] = 7; print ((2 in a[1]) ? "yes" : "no") } reports membership correctly.
  • BEGIN { a[1][2] = 7; delete a[1][2]; print (2 in a[1]) } removes the nested element, so the membership test prints 0.
  • for (k in a[1]) iterates over the keys of the subarray a[1].
  • split("x y", a[1]) and equivalent array-accepting runtime paths work if a[1] is used as a subarray.
  • Turning allowArraysOfArrays off makes a[1][2] a parser error again.

Out of scope for this issue unless needed to unblock implementation

  • Adding gawk-only introspection helpers such as typeof() or isarray().
  • Separately implementing length(array) / length(subarray) semantics.
