Skip to content

API: Pt.1, add ResponsiveStoreFactory and store options #50

Closed
ableegoldman wants to merge 2 commits intomainfrom
api-Pt1-ResponsiveStoreFactory
Closed

API: Pt.1, add ResponsiveStoreFactory and store options #50
ableegoldman wants to merge 2 commits intomainfrom
api-Pt1-ResponsiveStoreFactory

Conversation

@ableegoldman
Copy link
Copy Markdown
Contributor

First up: start migration to a static factory class similar to Kafka Streams' Stores. Stubbed everything in the new file for now to focus on the API, will follow up with all the guts and pipes once we get consensus here

@ableegoldman ableegoldman requested review from agavra and rodesai July 1, 2023 04:51
Comment thread kafka-client/src/main/java/dev/responsive/kafka/store/StoreUtils.java Outdated
Comment thread kafka-client/src/main/java/dev/responsive/kafka/store/StoreUtils.java Outdated
// required configs
private final String name;
private final long windowSize;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use the builder pattern here so that we know objects of this class are immutable once they're built?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have anything against it, it's just really awkward when mixed with generics (at least to me, maybe a true java expert like @agavra would have a better idea)

That said we only need generic for the key/value serdes, and technically we only need those for the StoreBuilders since that requires the serdes up front, whereas Streams puts the key/value serde setters directly on the Materialized class. It's just down to a really awkward asymmetry in Streams (which I would guess is itself the fallout of some insane serde inheritance logic in the DSL).

Point is, I was hoping to smooth over that weird asymmetry but we don't need to, and can use the builder pattern by just moving the serdes to the StoreBuilder parameters. I'll see what that looks like

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point is, I was hoping to smooth over that weird asymmetry but we don't need to, and can use the builder pattern by just moving the serdes to the StoreBuilder parameters. I'll see what that looks like

Make sense. I'll take a trade of a bigger list of parameters for immutability

private Serde<K> keySerde;
private Serde<V> valueSerde;

/**
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use the builder pattern here so that we know objects of this class are immutable once they're built?


// optional configs
private boolean truncateChangelog = false;
private Long ttlMs = null;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what lifetime does this refer to (all keys? change log topic? changelog topic tombstones?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well technically those should all be the same, no? (with maybe a little extra time added as a buffer for the actual changelog retention, like what Streams does with the windowstore.additional.changelog.ms config or whatever it's called)

I didn't actually add any setters for this so it's not exposed at the moment anyway, but if we're thinking of offering/exposing this in the near future it would be nice to include it when laying the groundwork so it's easy to switch on when ready

Of course this is kind of getting ahead of things so I can take it out 🤷‍♀️

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol are you seeing the emoji come up as "🤷 ♀️ "? Looks like the gender-based emojis broke (it the previous it shows up correctly as the woman-shrugging) 😂

Copy link
Copy Markdown
Contributor

@rodesai rodesai Jul 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well technically those should all be the same, no?

Not following. Each of these seem like different properties to me:

  • Record TTL: A semantic property of the store that defines a lifetime for each entry (in stream time)
  • Retention Time: An implementation detail of a store that defines how long records are stored (in wall clock time)
  • Changelog retention: A property of the changelog Kafka topic (for topics that use the delete policy). We could still set this for topics with infinite Record TTL if the user is willing to accept data loss if no flush happens for tha time period.
  • Tombstone retention: A property of the changelog Kafka topic (for topics that use the compact policy)
  • (Related but not exactly TTL) Cleanup policy: the cleanup policy of the Kafka topic

I'm not suggesting we expose or not expose all these as configurable by the user. I was just wondering which of these the ttlMs refers to. Or maybe it doesn't refer to any of these exactly? Basically - what does this config mean 😄 ?

Lol are you seeing the emoji come up as "🤷 ♀️ "? Looks like the gender-based emojis broke (it the previous it shows up correctly as the woman-shrugging) 😂

Yep that's what it looks like to me. They should fix that. It's messed up that it fails in that particular way.

Comment thread kafka-client/src/main/java/dev/responsive/kafka/api/ResponsiveStoreFactory.java Outdated
Copy link
Copy Markdown
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I'm on to your idea of making things as close to Streams as possible, I'm really on to that idea 😆 See inline suggestions, but I think we can get rid of the Options classes altogether.

Comment thread kafka-client/src/main/java/dev/responsive/kafka/api/ResponsiveStoreFactory.java Outdated
import org.apache.kafka.streams.processor.StateStore;
import org.apache.kafka.streams.state.WindowStore;

public class WindowStoreOptions<K, V> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the utility here, but this seems like something that should just be directly in streams, right? I think we should avoid extra convenience things like this in responsive-clients and instead contribute them back to streams directly so that we don't have to update these if streams changes

the general motivation here is to have a smaller surface area of API we need to maintain and it also makes migration for the customer easier since they can just replace Stores with RepsonsiveStoreFactory (I also think we should rename the class ResponsiveStores so everything feels familiar)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be better as a "face to face" conversation because we might both feel strongly here, but I have to (respectively 😋) disagree about avoiding convenience in favor of replicating the existing Stores API. I don't think I can stress enough just how incredibly awkward, frustrating, and difficult to understand the Stores factory and surrounding context really is.

First of all, there are a million different methods on Stores and more are added all the time -- each new store type means (at the least) one supplier and a corresponding builder, and the docs aren't really clear enough about when to use which. There's also a lot of gotchas, for example there are timestamped and "normal" variants of each supplier and each builder, and you have to think carefully about which suppliers can and can't be passed into which builders.

But (besides stream-stream joins -- see other comment) there's actually no top-level usage for store suppliers -- they are solely for configuring either a StoreBuilder (for PAPI) or Materialized (for DSL) which isn't even part of the Stores API. Why not just offer a store factory that provides the two actual store provider-types, and save users the headache of figuring out which store supplier to use and then what to actually do with it?

Regarding the Options classes, the benefit here is that users don't have to figure out the value of all those parameters and can ignore everything they don't want to override and let us figure out the correct values. This can honestly be infuriating when trying to customize a window store, because for some operators the parameters are flexible while others expect a certain value and only throw an exception at runtime if you gave the wrong thing. And it's not obvious or even easy to figure out what the right value is -- especially stream-stream joins. I had to work it out through trial-and-error because the error message doesn't even tell you (I actually did contribute a PR to improve this but haven't had time to finish fixing up the tests )

So it's hard to figure out which store suppliers to use, how to hook them into your actual Topology depending on DSL vs PAPI, and hard to figure out the value of parameters that you don't need to care about. This would already be bad even if it was "familiar", but sure, some users are already doing this and could substitute our stores for whatever they had before. But a lot of users don't customize their stores at all right now! So migrating to Responsive will be their first experience with this mess, and even though we just stuck to Kafka Streams's API they're going to view it as a problem with our API, because we're the ones forcing them into this. This is the ultimate reason I want to present a cleaner API (and why I wrote this manifesto 😅 )

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, just to put the whole argument out there, it's nice to wrap configs in one options class for compatibility and our ability to add new optional features without rapid expansion of the store factory API. John actually proposed a new grammar for Streams based on this idea, but it never went anywhere (partly pushback from Matthias, partly it was going to be a huge project and disruption for users to migrate)

On that note...with all that said, I of course agree wholeheartedly with your opening line:

this seems like something that should just be directly in streams, right?

But getting a good clean API in Streams is going to take forever. It's almost 100% touching on the public API which is the nittiest part of every KIP discussion so that will drag out, it'll involve many large PRs that will add nontrivial review load and won't be prioritized since it's purely cosmetic within AK Streams, and the surface area of the whole thing is huge so there will be a lot of compatibility concerns about whether this is necessary (a fair question imo).

I definitely DO want to clean up the Streams API, but imo this is probably one of the most difficult things to contribute back. Whereas we can benefit from it right away, like next week. Maybe this is overly pessimistic based on how the grammar proposal from John went, but you can probably imagine how this would go. So I just don't think it makes sense for us pass on improving the user experience until it gets into Streams

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize if that came off as too aggressive -- I'm not frustrated with you, I'm frustrated with Kafka Streams's stores 😂

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely let's chat face-to-face on this one. Clearly there's a lot of history here I'm not aware of - I 100% agree that this proposal is better than what's there in streams, that was not in question (at least on face value, admittedly I'm ignorant about the nuance of what's in streams today). It's a shame we can't just contribute this to Streams...

I think what it boils down to is this comment:

This would already be bad even if it was "familiar", but sure, some users are already doing this and could substitute our stores for whatever they had before. But a lot of users don't customize their stores at all right now!

If we're confident many people don't customize their stores at all, then that's a good argument to provide a better API for us.

I'm thinking specifically of our first use case, where they do use the stores and they'll need to change their code structure a little to use the new API - not the end of the world, but every change increases the migration guide length (both literal word count and time to migrate). To me, that is a critical metric.

The other thing to consider is that we don't have docs or examples published anywhere - there's a lot of public material out there about Kafka Streams, which we'd have to replicate somehow.

Copy link
Copy Markdown
Contributor

@rodesai rodesai Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my 2c having just waded into this and still catching up on the state store apis - I'm very torn here. I have managed not to really look at the state store API in all my years at Confluent, and after trying to take a pass over it just now my head is spinning. What you've proposed here is way cleaner @ableegoldman.

I think at the end of the day both the use case where an existing user doesn't customize state stores and does customize state stores are important for us to solve. I doubt we can just write off either scenario as being sufficiently rare.

What's the cost of keeping around something that mirrors the existing API, while also offering our own better API for users that just want the basic stores? Is it really that much extra overhead? It might be worth it to keep the migration experience palatable.

Also, I'm skeptical about trying to get our API into Streams. We're probably going to spend more time bikeshedding than just keeping the API up to date.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the cost of keeping around something that mirrors the existing API, while also offering our own better API for users that just want the basic stores? Is it really that much extra overhead?

I think that should be possible, especially b/c our new API can just delegate the old one, and then we get the best of both worlds. It is more API surface area and more mental overhead but I think optimizing for ease of transition for both of those use cases makes sense.

* @return a store supplier that can be used to build a window store for stream-stream joins
* with the given name that uses Responsive's storage for its backend
*/
public static WindowBytesStoreSupplier streamJoinedStore(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same comment as below, unless there's something specific to Responsive here I think it's better to just contribute this back to streams and stick with the existing stream APIs here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main context for this is in my super-long manifesto comment above, but to clarify why this is the only one to return a StoreSupplier, it's because the StreamJoined options class in Streams is, awkwardly, the only place raw you can/must use a raw StoreSupplier to plug in a custom store (rather than Materialized or StoreBuilder). I don't remember all the details of why, but basically these have special window stores. Also the parameters are pre-fixed and users currently have to figure out what to plug in themselves, so we may as well provide a nice API where they can just pass in their join definition and let us work out the right values for them

Comment thread kafka-client/src/main/java/dev/responsive/kafka/api/ResponsiveStoreFactory.java Outdated

// required configs
private final String name;
private final boolean isGlobal;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect you're creating this because with the current API we need to know whether or not the store is global up front. I don't think we actually need this though, we can get rid of ResponsiveGlobalStore and have only one ResponsiveStore that has two different internal implementations and we check whether the StateStoreContext that is passed in is instanceof GlobalProcessorContextImpl (or taskId().partition() == -1 if we don't want to use instanceof) to initialize the "global" version of it

The benefit is that we can keep the ResponsiveStoreFactory API identical to Stores and it's one less thing for the user to worry about.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to push back a bit on replicating the existing Stores exactly, but when it comes to the global store thing I agree 100%. It would be way too easy for someone to miss this and pass in a non-global store by accident, and it's clunky. So if you found a way to figure out which kind of store it is implicitly, I'm all for it!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do want to make sure the basic API is done ASAP so I'll just remove the isGlobal stuff altogether for now. Would you mind filing a ticket for refactoring the global store implementation based on this idea to check the type during init?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can take this Action Item.

@ableegoldman ableegoldman force-pushed the api-Pt1-ResponsiveStoreFactory branch from f74d981 to 1fae66b Compare July 2, 2023 03:43
* @return a store builder that can be used to build a key-value store with the given options
* that uses Responsive's storage for its backend
*/
public static <K, V> StoreBuilder<KeyValueStore<K, V>> keyValueStoreBuilder(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will we do if someone tries to set something on one of these builders that conflicts with the passed in option? (e.g. withLoggingEnabled with some topic config that conflicts like setting compact when truncateChangelongs is true)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see from pt3 we're throwing an exception. Very nice.

@ableegoldman
Copy link
Copy Markdown
Contributor Author

Closing this since we're going in a different direction with the API for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants