Skip to content

Conversation

@extemporalgenome
Copy link
Contributor

@extemporalgenome extemporalgenome commented Jul 29, 2025

This is meant to be read commit-by-commit.

This refers to reference cycles in data structures which are passed into the json package for encoding, for example a struct which has a pointer to itself, a slice or map of any values which refer back to the containing slice/map, or any combination of these.

Cycle detection had been implemented (but only for cycles involving normal pointers), but was effectively broken/non-functional due to a call-chain subtlety.

This also expands cycle detection to cover maps and slices, as the stdlib implementation covers these cases as well.

Side note: I had alternatively explored using the global codec cache itself as the type-cycle detector. That altogether felt like a much more elegant solution than the bespoke "seen" map which had already been in use (and which this PR continues to use), and would have the benefit of sharing more generated codecs and reducing memory use in some cases, but did involve a larger refactor than I had been willing to do at the time.

@extemporalgenome extemporalgenome force-pushed the kgillette/2025-07/fix-cycle-detection branch from 0c1b278 to e7e3ee1 Compare July 29, 2025 18:51
This is necessary for cycle detection to work,
which had been implemented, yet was broken.

Also introduce cachedCodec helper function.
Also set UnsupportedValueError.Value (better stdlib compat).
@extemporalgenome extemporalgenome force-pushed the kgillette/2025-07/fix-cycle-detection branch 3 times, most recently from 657600e to 722d2e3 Compare November 26, 2025 01:11
@extemporalgenome extemporalgenome force-pushed the kgillette/2025-07/fix-cycle-detection branch 2 times, most recently from 62c840e to 9aaaa4b Compare November 30, 2025 20:12
func cachedCodec(t reflect.Type) codec {
cache := cacheLoad()

c, found := cache[typeid(t)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thread safety for this access, or is "single thread assumed" documented somewhere higher up?

Can you fix the typo on line 62 - "Marshal"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This concurrency behavior was already in place within this json package, and the decision is maintained within this PR despite some minor refactoring around the concept.

Essentially the "global cache" is stored in an atomic.Pointer wrapping a map. To access the cache, the map is atomically loaded but then read normally. To update the cache, we first create a shallow copy of the map (cached values are never modified), then set our new key(s), then atomically store the new map in place of the old one.

Thus reads concurrent with writes are okay: the reader already has what it needs from whichever map it had loaded.
Concurrent writes are also okay: if both goroutines needed to handle different types which hadn't already been in the map, both will build and use codecs for the immediate operation they're performing, and then the last writer will "win" the race, keeping its particular codec in the cache for next time. The type that lost the race will get regenerated in a future call and eventually end up durably in the cache.

The code comments aptly refer to this as an eventually consistent cache, because over time, with enough repeat calls, there will be fewer and fewer benign races each time (any type that wins a race never needs to participate in the race again), and eventually every needed type will end up in the cache.

e.refSeen = nil
cycleMapPool.Put(m)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding some tests for this commit? (Or maybe coming in a later commit?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add bespoke tests here because the borrowed stdlib tests provide very good coverage over this behavior. That's pretty easy because this PR was specifically developed against those stdlib tests, including the behavior in this specific commit.

That said, it looks like map cycles have no test coverage. I will add tests for that.

// =============================================================================
// Copyright 2010 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Want to share what file these were taken from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/golang/go/blob/f22d37d574836ed4f1b42e283dda55dbd4d80aca/src/encoding/json/stream_test.go#L27-L47

The strategy has been (I didn't come up with it, but am continuing it):

  1. Use as many stdlib json tests as possible (e.g. the tests the stdlib encoding/json package uses for itself). Copy those test files more or less intact.
  2. Copy whatever test helper types/functions are needed for the above intact copying to work without modifications.

This golang_shim_test.go serves to fulfill point 2.

math.NaN(),
math.Inf(-1),
math.Inf(1),
type SamePointerNoCycle struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you confirm that these tests broke with the behavior on tip before you applied your patches? Ie, that the patches fix the tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sure did. Essentially the main branch of segmentio/encoding/json fails all of the stdlib cycle tests. It tried to implement detection of cycles involving regular pointers, but failed at that goal, and since there hadn't been any meaningful tests covering cycle detection, that broken feature went unnoticed.

This PR fixes not only cycle detection involving pointers, but also adds (working) support for data cycles involving maps and slices, as well as codec generation support for recursively defined slices, e.g. type T []T. There may be other hypothetical cases that aren't yet covered, but as of this PR, segmentio/encoding/json has parity with the stdlib encoding/json (at least in terms of passing all the relevant stdlib tests) for any kind of cycle.

@kevinburkesegment
Copy link
Contributor

I don't see any obvious problems; I'd probably prefer for there to be more tests to detect this condition, given the trickiness of the changes and how critical this logic is.

I was also looking for any logic that e.g. stops after 2000 iterations, if for some reason the cycle detection fails, to avoid blowing up the stack, but maybe this was already present in the code

@extemporalgenome
Copy link
Contributor Author

There isn't any test validating "does it stop after 1000 iterations, no more no less?"

I'm not convinced that such a test has a lot of value, since the particular number isn't especially important (if it were 1003 or 997 or 9999, the high level behavior would be the same).

There is, however, definitely good coverage (from the stdlib) verifying that cycling values don't cause stack overflows/infinite-recursion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants