Skip to content

Conversation

@asnare
Copy link
Contributor

@asnare asnare commented May 22, 2025

This PR continues from #189 by updating it with the following changes:

  • It introduces a specific JsonValue type alias that represents the maximum bounds of the values that can be saved for an installation.
  • It removes support for Any and object as type annotations on data classes that we marshal. The existing tests for these have been updated to use JsonValue instead.
  • Support for marshalling raw list and dict fields is deprecated. Saving these used to work, but loading would fail. I've chosen to deal with this by:
    • Issuing a DeprecationWarning if they're hit when saving, with instructions to use list[T] or dict[T] instead.
    • During loading, a specific error is thrown indicating how it can be fixed.
  • Documents the restrictions and limitations on .save() for what it can actually handle.
  • Adds a marker for an improvement whereby during marshalling the type hints should actually be checked against the values that are being saved.
  • Fixed some tests that were accidentally passing.
  • Tightens some of the type-hints we had in place where we declared Any but really only accepted a subset.

This is sufficient to allow the use-case that we need in databrickslabs/lakebridge#1488, and is preferred to the approach in #189 because it forces developers to fix their code:

  • Declaring JSONValue as a field will let the type-checker tell them if a value is assigned that does not conform. (The fix is for the client code to validate the values.)
  • Similarly during loading the type-checker forces the caller to check and handle the possible types that might be present.

One of the bugs fixed in #189 relates to automatic conversion between JSON values: for example, from 123 to "123" or vice-versa. Some tests were (incorrectly) relying on this, as were some tests in a downstream project (remorph). There might be other places in production code where the previous behaviour is relied upon. As such, the next release that includes this code (or that from #189) should bump the minor version and not just be a patch release.

ericvergnaud and others added 29 commits March 13, 2025 15:56
…tabrickslabs/blueprint into support-marshalling-of-object-and-any

* 'support-marshalling-of-object-and-any' of github.com:databrickslabs/blueprint:
  Initial test coverage for configuration marshalling/unmarshalling (#226)
  Fixed Blueprint Install (#225)
* main:
  Fix argument interpolation in colorised logs (#233)
  Ensure the non-colorized logs also include timestamps with second-lev… (#235)
  Test coverage for the customised logger setup (#232)
  Type hints for existing installation tests (#231)

# Conflicts:
#	tests/unit/test_installation.py
* main:
  Release v0.10.2 (#239)
  Ensure that App logger emits `DEBUG` events if the CLI is invoked with `--debug` (#238)
  Consistent exception formatting in logs (#237)
  Ensure the names of logger levels are consistent (#234)
  Fix logger name abbreviation fails if the logger name contains `..` (#236)
…nfiguration files.

This is a convenience value, and represents the maximum bounds for a value that can be natively serialised/deserialised using the YAML format that we use.
It was being handled in two-places, which wasn't necessary.
JSON represents the bounds of what can be serialised.
JSONValue can be used as a drop-in replacement.
@asnare asnare self-assigned this May 22, 2025
@asnare asnare force-pushed the support-marshalling-json branch from c416539 to 8ddfd96 Compare May 23, 2025 09:17
@asnare asnare marked this pull request as draft May 23, 2025 11:32
asnare added 2 commits May 26, 2025 12:35
…eserialized.

Previously the field would be reinitialized with its default value, if possible.
@asnare asnare marked this pull request as ready for review May 26, 2025 13:40
Reduces complexity of the main unmarshalling function.
@sundarshankar89 sundarshankar89 merged commit cfd0522 into main May 27, 2025
9 of 11 checks passed
@sundarshankar89 sundarshankar89 deleted the support-marshalling-json branch May 27, 2025 04:33
sundarshankar89 added a commit that referenced this pull request May 27, 2025
* Marshalling: allow JSON-like fields ([#241](#241)). The library has undergone significant changes to improve its marshalling functionality, code readability, and maintainability. A new `JsonValue` type alias has been introduced to represent the maximum bounds of values that can be saved for an installation, and support for `Any` and `object` as type annotations on data classes has been removed. The library now issues a `DeprecationWarning` when saving raw `list` and `dict` fields, and raises a specific error during loading, instructing users to use `list[T]` or `dict[T]` instead. Various methods, including `_marshal_generic_list`, `_marshal_raw_list`, `_marshal_generic_dict`, and `_marshal_raw_dict`, have been updated to handle the serialization of lists and dictionaries, while the `_unmarshal` method now handles the deserialization of unions, lists, and dictionaries. Additionally, the library has been updated to provide more informative error messages, and several tests have been added to cover various scenarios, including generic dict and list JSON values, bool in union, and raw list and dict deprecation. The `Installation` class, `MockInstallation` class, and `Paths` class have also been updated with new methods, type hints, and custom initialization to improve code flexibility and maintainability.
@sundarshankar89 sundarshankar89 mentioned this pull request May 27, 2025
sundarshankar89 added a commit that referenced this pull request May 27, 2025
* Marshalling: allow JSON-like fields
([#241](#241)). The
library has undergone significant changes to improve its marshalling
functionality, code readability, and maintainability. A new `JsonValue`
type alias has been introduced to represent the maximum bounds of values
that can be saved for an installation, and support for `Any` and
`object` as type annotations on data classes has been removed. The
library now issues a `DeprecationWarning` when saving raw `list` and
`dict` fields, and raises a specific error during loading, instructing
users to use `list[T]` or `dict[T]` instead. Various methods, including
`_marshal_generic_list`, `_marshal_raw_list`, `_marshal_generic_dict`,
and `_marshal_raw_dict`, have been updated to handle the serialization
of lists and dictionaries, while the `_unmarshal` method now handles the
deserialization of unions, lists, and dictionaries. Additionally, the
library has been updated to provide more informative error messages, and
several tests have been added to cover various scenarios, including
generic dict and list JSON values, bool in union, and raw list and dict
deprecation. The `Installation` class, `MockInstallation` class, and
`Paths` class have also been updated with new methods, type hints, and
custom initialization to improve code flexibility and maintainability.
github-merge-queue bot pushed a commit to databrickslabs/lakebridge that referenced this pull request May 28, 2025
This PR continues from #1488, introducing the following:

- The `--debug` handling introduced in #1488 is removed. This was a
workaround for a problem solved upstream in #238.
- The logging has been adjusted to reflect the updates we made in #1576.
- A test from #1568 is fixed; something on this branch leads the version
information to include a timestamp, which needs to be ignored by that
test.
- Some existing (installation) tests were incorrectly mocking the
install-state: the values should always be strings, but the tests were
placing integers in the fixture. Some upstream changes on
#189 change the behaviour in this area.

### Implementation details

~This pull-request depends on databrickslabs/blueprint#241, and the
project dependencies have been updated to depend on them. This is a
temporary change to ensure that CI can work. Before merging a release of
blueprint is required, and the released version needs to be used
instead.~ [Done]

### Caveats/things to watch out for when reviewing:

Please check the changes to the `transpile` command.

### Linked issues

Follows #1488.

### Functionality

- modified existing command: `databricks labs remorph transpile` (from
#1488)

### Tests

- existing unit tests
- added integration tests

---------

Co-authored-by: Eric Vergnaud <eric.vergnaud@databricks.com>
Co-authored-by: sundar.shankar <sundar.shankar@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants