Gawk Compatibility IT Harness for Jawk #450

@bertysentry

Summary

  • Treat gawk’s test machinery as metadata, not as an executable dependency.
    Makefile.am defines test groups and handwritten exceptions, Gentests turns most declarative entries into Maketests, and Maketests is the closest thing to a machine-readable test manifest.
  • Do not base Jawk’s compatibility suite on make check or shelling out to gawk’s Makefiles.
    On this repo’s current Windows environment, that path is brittle because the harness assumes Unix tools and shell behavior (make, cmp, sed, grep, /dev/fd, locale exports, shebang execution, shell scripts).
  • Recommended v1 approach:
    1. reimport a clean gawk snapshot into src/it/resources/gawk
    2. parse vendored Maketests in Java
    3. run parsed cases through JUnit 4 + Failsafe using AwkTestSupport
    4. explicitly skip unsupported handwritten/shell-heavy cases instead of trying to emulate all of GNU make

Implementation Changes

  • Maven layout:
    • Move compatibility integration tests and resources to src/it/java and src/it/resources.
    • Update pom.xml to add src/it/java as a test source root and src/it/resources as a test resource root.
    • Keep Failsafe as the runner for compatibility tests; switch includes from the current class names to **/*IT.java.
    • Update formatter and license plugin config so src/it/java is included.
  • Resource source of truth:
    • Replace the altered vendored gawk files with a fresh import from C:\Dev\gawk\test.
    • Store the imported snapshot under src/it/resources/gawk.
    • Preserve Makefile.am, Gentests, Maketests, all .awk, .ok, .in*, and support files exactly as imported.
    • Do not read C:\Dev\gawk\test at test runtime; that path is only for the one-time reimport workflow.
  • Harness design:
    • Add one parameterized JUnit 4 suite, e.g. GawkCompatibilityIT, backed by a small parser/data model:
      • GawkCase: test name, script mode (awk or sh), CLI flags, stdin file, expected output file, locale, skip reason.
      • GawkMaketestsParser: parse only the generated Maketests rule shapes that Jawk needs.
    • Parse these rule features from Maketests:
      • -f $@.awk
      • CLI flags such as --posix, --lint, --lint-old, --traditional, --sandbox, --csv, --re-interval, -M
      • stdin redirection from *.in
      • locale directives emitted before the AWK command
      • expected output file selection between .ok and -mpfr.ok
      • shell-script targets as a distinct unsupported type
    • Execution policy:
      • Use AwkTestSupport.cliTest(...) in-process, not java -jar jawk-standalone.jar.
      • Feed *.in files to stdin, not as filename operands, unless a manual override says otherwise.
      • Convert parsed locale directives to Jawk CLI --locale when possible.
      • Any case requiring mutable process env, shell redirection beyond stdin, file-descriptor tricks, pipelines, profile outputs, or shell scripts is marked skipped in v1.
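A minimal sketch of the data model and parser described under "Harness design", assuming generated rules follow the common `$(AWK) -f $@.awk ... < "$(srcdir)"/$@.in` shape. The class name `GawkMaketestsParserSketch`, the field types, and the three regular expressions are illustrative assumptions and should be checked against the vendored Maketests before use. Script mode and skip reason are left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: a trimmed GawkCase plus a parser for one generated rule body. */
final class GawkMaketestsParserSketch {

    /** Parsed view of one rule; the field set is a subset of what the issue lists. */
    static final class GawkCase {
        final String name;
        final List<String> flags = new ArrayList<>();
        String stdinFile;        // null when the rule has no stdin redirection
        String expectedFile;     // name.ok when the compare step references it
        String mpfrExpectedFile; // name-mpfr.ok when the rule offers an MPFR variant
        String locale;           // null when no locale directive precedes the command

        GawkCase(String name) {
            this.name = name;
        }
    }

    // Long options such as --posix or --lint-old, plus the short -M flag.
    private static final Pattern FLAG =
            Pattern.compile("(?<=\\s)(--[a-z][a-z-]*|-M)(?=\\s)");
    // Assumed stdin shape: < "$(srcdir)"/$@.in (or .in1, .in2, ...).
    private static final Pattern STDIN =
            Pattern.compile("<\\s*\"\\$\\(srcdir\\)\"/\\$@(\\.in\\w*)");
    // Assumed locale shape: an assignment like XLOCALE=en_US.UTF-8; before the command.
    private static final Pattern LOCALE =
            Pattern.compile("\\b[A-Z]*LOCALE=([A-Za-z0-9_.@-]+);");

    /** Extracts flags, stdin file, locale, and expected-output files from one rule body. */
    static GawkCase parseRule(String name, String body) {
        GawkCase parsed = new GawkCase(name);
        Matcher flag = FLAG.matcher(body);
        while (flag.find()) {
            parsed.flags.add(flag.group(1));
        }
        Matcher stdin = STDIN.matcher(body);
        if (stdin.find()) {
            parsed.stdinFile = name + stdin.group(1);
        }
        Matcher locale = LOCALE.matcher(body);
        if (locale.find()) {
            parsed.locale = locale.group(1);
        }
        if (body.contains("$@.ok")) {
            parsed.expectedFile = name + ".ok";
        }
        if (body.contains("$@-mpfr.ok")) {
            parsed.mpfrExpectedFile = name + "-mpfr.ok";
        }
        return parsed;
    }
}
```

Keeping both expected-output names on the case, rather than choosing one in the parser, leaves the MPFR decision to the suite, which can then skip those cases in v1.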
  • Scope boundaries for v1:
    • Include all cases that can be derived from Maketests and represented with the existing Jawk CLI model plus trivial mapping.
    • Exclude handwritten Makefile.am targets that are not represented in Maketests.
    • Exclude .sh-driven targets and Unix-heavy manual rules from v1 rather than partially emulating them.
    • Do not add an external-process runner in v1.
    • Do not add a general environment override layer in Jawk runtime code for v1.
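One way to make the v1 exclusions mechanical is a small classifier that returns a skip reason for a raw rule command. This is a sketch only: `GawkSkipPolicySketch`, the reason strings, and the assumed capture suffix (`>_$@ 2>&1 || echo EXIT CODE: $$? >>_$@`) are hypothetical, and the real rule shapes in the vendored Maketests may differ in spacing or detail.

```java
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical sketch of the v1 skip policy applied to one raw rule command. */
final class GawkSkipPolicySketch {

    // Assumed output-capture suffix on generated rules; it is harness plumbing,
    // so it is removed before looking for redirections or pipes.
    private static final String CAPTURE = ">_$@ 2>&1 || echo EXIT CODE: $$? >>_$@";

    // A single '|' that is not part of the '||' operator.
    private static final Pattern PIPE = Pattern.compile("(?<!\\|)\\|(?!\\|)");

    /** Returns a skip reason, or empty when the command looks runnable in-process. */
    static Optional<String> skipReason(String command) {
        String cmd = command.replace(CAPTURE, "");
        if (cmd.contains("$@.sh")) {
            return Optional.of("shell-script target");
        }
        if (cmd.contains("/dev/fd/")) {
            return Optional.of("file-descriptor trick");
        }
        if (PIPE.matcher(cmd).find()) {
            return Optional.of("pipeline");
        }
        if (cmd.contains(">")) {
            return Optional.of("output redirection beyond stdin");
        }
        return Optional.empty();
    }
}
```

The checks are deliberately conservative: anything the sketch cannot classify as a plain in-process run is skipped with a named reason rather than partially emulated.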
  • Reporting/stability:
    • Add one explicit skip manifest in test resources or Java constants so skipped cases are intentional and reviewable.
    • Make the suite assert parser coverage against Maketests:
      discovered cases = runnable cases + explicitly skipped cases.
    • Keep current compatibility docs, but update the description to reflect that gawk compatibility tests now live under src/it and are metadata-driven from the vendored gawk snapshot.
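The coverage rule above (discovered cases = runnable cases + explicitly skipped cases) can be checked with plain set arithmetic. The sketch below is an assumption about how that check might look; `GawkCoverageCheckSketch` and the representation of the skip manifest as a name-to-reason map are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the parser-coverage assertion for the gawk suite. */
final class GawkCoverageCheckSketch {

    /**
     * Fails unless every discovered case is either runnable or listed in the
     * skip manifest, with no case in both and no stale manifest entries.
     */
    static void assertCovered(Set<String> discovered,
                              Set<String> runnable,
                              Map<String, String> skipManifest) {
        Set<String> overlap = new TreeSet<>(runnable);
        overlap.retainAll(skipManifest.keySet());
        if (!overlap.isEmpty()) {
            throw new IllegalStateException("Both runnable and skipped: " + overlap);
        }

        Set<String> accounted = new TreeSet<>(runnable);
        accounted.addAll(skipManifest.keySet());

        Set<String> missing = new TreeSet<>(discovered);
        missing.removeAll(accounted);
        Set<String> stale = new TreeSet<>(accounted);
        stale.removeAll(discovered);

        if (!missing.isEmpty() || !stale.isEmpty()) {
            throw new IllegalStateException(
                    "Unaccounted cases: " + missing + ", stale entries: " + stale);
        }
    }
}
```

Reporting stale entries as well as missing ones keeps the skip manifest from drifting after a future reimport of the gawk snapshot.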

Public / Interface Changes

  • No production API changes.
  • Build/test interface changes:
    • Failsafe becomes the owner of src/it/java compatibility tests.
    • Compatibility suite naming changes to *IT.
    • Vendored gawk resources move from src/test/resources/gawk to src/it/resources/gawk.

Test Plan

  • Parser tests:
    • parse a standard generated AWK rule with stdin
    • parse rules with locale directives
    • parse rules with option flags (--posix, --traditional, --sandbox, --csv, --re-interval)
    • parse MPFR expected-output switching and mark those cases skipped in v1 unless Jawk has matching support
    • identify shell-script targets as unsupported
  • Integration tests:
    • run a small stable subset of parsed gawk cases end-to-end through GawkCompatibilityIT
    • verify stdin-vs-operand behavior with known cases where current GawkTest is wrong
    • verify skipped cases are reported as skipped, not silently dropped
    • verify Failsafe picks up only src/it compatibility suites
  • Acceptance criteria:
    • mvn test does not run the gawk compatibility suite
    • mvn verify runs the gawk compatibility suite through Failsafe
    • the suite is driven from vendored Maketests, not by scanning *.awk
    • Linux CI is the authoritative pass-rate environment
    • Windows can execute the parser and portable subset without requiring make

Assumptions and Defaults

  • Source of truth after this work is the vendored snapshot under src/it/resources/gawk.
  • The initial reimport source is C:\Dev\gawk\test, but runtime tests do not depend on that checkout.
  • Linux CI is authoritative; Windows may skip shell/locale-heavy cases.
  • Direct Makefile execution and java -jar jawk-standalone.jar are both deferred.
  • v1 optimizes for a small, maintainable harness, not for full fidelity to every handwritten gawk test rule.
