Skip to content
This repository was archived by the owner on Nov 24, 2025. It is now read-only.

Blueprint/refetch#5910

Merged
rawlinp merged 7 commits intoapache:masterfrom
tcfdev:blueprint/refetch
Jun 22, 2021
Merged

Blueprint/refetch#5910
rawlinp merged 7 commits intoapache:masterfrom
tcfdev:blueprint/refetch

Conversation

@tcfdev
Copy link
Copy Markdown
Collaborator

@tcfdev tcfdev commented Jun 3, 2021

What does this PR (Pull Request) do?

This is a blueprint proposal to solve a few issues, however the primary one is:

Which Traffic Control components are affected by this PR?

This blueprint has the potential to impact:

  • CDN in a Box
  • Documentation
  • Traffic Control Clients (Go)
  • Traffic Ops
  • Traffic Ops ORT
  • Traffic Portal
  • CI tests

What is the best way to verify this PR?

No verification necessary for the blueprint. Just review the file blueprints/refetch_invalidation.md

Comment thread traffic_portal/test/integration/README.md Outdated
Comment thread go.mod Outdated
Comment thread blueprints/refetch-invalidation.md Outdated
Comment thread blueprints/refetch-invalidation.md Outdated
Comment thread blueprints/refetch-invalidation.md Outdated
Comment thread blueprints/refetch-invalidation.md Outdated
Comment thread blueprints/refetch-invalidation.md Outdated
Comment thread blueprints/refetch-invalidation.md Outdated
Comment thread blueprints/refetch-invalidation.md Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, if the value of invalidationType is malformed or of the wrong data type, it will result in a 400, else, if it is missing, or any value other than refetch or refresh will always default to refresh, correct?

Copy link
Copy Markdown
Collaborator Author

@tcfdev tcfdev Jun 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh excellent question. What are your thoughts? I suppose I would probably make this a bit more strict.

To be explicitly correct, the caller would have to send either refetch, refresh, or nothing at all. By that I mean they would not include the invalidationType key at all (which would then default to refresh). If they did include the key and the value was an empty string, that would result in a 400. In fact, any other value would result in a 400. I think this is probably the best approach. The response can contain a helpful message to why it was rejected to assist any callers in the future.

Should it be case insensitive?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think if the user includes the invalidationType key and gives a blank string, that could probably be treated as refresh as well, but I'm new to invalidation jobs so maybe @ocket8888 would know better.
The following four types should be accepted IMO:
1.) refresh
2.) refetch
3.) blank string -> defaults to refresh
4.) invalidationType key absent -> defaults to refresh
Anything else is treated as a 400.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You absolutely can do that - I would suggest instead that the field not be optional, and the client creates objects exactly as they desire them to exist with no room for surprises.

Copy link
Copy Markdown
Member

@mitchell852 mitchell852 Jun 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, surprises are not good imo. so i guess 400 if invalidationType != refresh or refetch. and because you are adding this to a new major version of the api (4.0), you have the luxury of adding the new required field to the request...but tbh, if everyone likes the idea of a default value, i'm good with that too :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because we have a new major version we can add this to doesn't mean we should just make every new change a breaking change. I don't think making this a new required field adds enough value to warrant that since we can easily provide a default instead.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified this to be explicitly either refresh or refetch. All other submissions, including missing/omitted, will be reject as a client error (400 - Bad Request).

@tcfdev tcfdev force-pushed the blueprint/refetch branch from 5ce9a7b to 6a52fa0 Compare June 3, 2021 17:18
Comment thread blueprints/refetch-invalidation.md Outdated
Comment thread blueprints/refetch-invalidation.md Outdated
Comment thread blueprints/refetch-invalidation.md Outdated
#### Data Model / Database Impact


The current column `parameters` will now contain a cskv (comma separated key value) string. Currently it only stores the `TTL` for the invalidation request:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which table is this in?

Why not add a new column rather than storing multiple values in the same column?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. stuffing lots of values in a column seems problematic going forward. can't you just reuse the keyword column of the job table? currently, keyword is PURGE for EVERY job. maybe keyword should be REFRESH and REFETCH instead and do a database migration to change all existing PURGE values to REFRESH?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pivoted a bit on this original idea. My first attempt was to change as little as possible, reusing the parameters column. However, I believe we all agree that explicity and purposefully is a better approach. I've taken the approach of explicit columns (and datatypes). I also added a recommendation which I think we should implement that performs a great deal of clean up for the jobs table. We are carrying a decent amount of future-proofing along with us that doesn't appear to be utilized anywhere.

Definitely let me know if this newer proposal is lacking in clarity.


> The removal of these columns will impact the `UserInvalidationJob` struct. Even though the only endpoint utilizing this struct (v1.1) is deprecated, it is still in use.

### ORT/T3C Impact
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please describe how the construction of the regexrevalidate.config file will change given that this an external interface of Traffic Control?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a bit more of a description. There isn't much more to the construction of T3C's regexrevalidate.config file. This is already implemented, we are allowing it to be explicitly controlled via ATC's APIs, interfaces, and clients.

Essentially, it remains the same format, but with an optional third field for each non-comment entry.

Comment thread blueprints/refetch-invalidation.md Outdated

### ORT/T3C Impact

The `regexrevalidatedotconfig.go` in _lib/tc-atscfg_ currently has a function called `filterJobs` that is responsible for parsing the _parameters_ information from the job. Right now, that is only parsing the **TTL**, however it would then also need to parse the optional, extra field of **invalidationType**.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What considerations are made for TrafficServer version dependencies?

Is MISS supported by all ATS versions? If not, what happens when an operator sets a REFETCH job on a set of caches that do not support that invalidation type?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like there are two options when MISS is unavailable based on ATS version: ignore it or use refresh. Can you suggest which would be better?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an ATS expert, so I can't make a solid suggestion here.

I think whatever happens probably needs to be bubbled up to the operator. You don't want to silently ignore or silently do a stale-refresh when you expect a Miss-Refetch.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, when I need an ATS and/or ORT/t3c/atstccfg expert I ask @rob05c - lets see if he has an opinion

Copy link
Copy Markdown
Member

@rob05c rob05c Jun 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely not supported on older versions. It's a new feature of the regex_revalidate plugin.

  1. Unfortunately, there's not an easy way to bubble it up. We don't really have a mechanism to send data from caches to TO, and even if we did, it'd be pretty unwieldy to collect that from every single cache.

    We do have the ATS version in a Parameter in TO, we could hard-code it. But someone might backport the plugin changeset to an older version, and then ATC just won't allow using it. So that's not a great solution.

  2. the "refetch" feature is only necessary or useful if the Origin server is fundamentally broken and completely violating the HTTP Spec. Without "refetch," ATS does an IMS request. So, the only reason it's ever needed is when a Tenant's Origin is responding with a 304 when the asset actually is modified. In complete violation of RFC 7234 and every HTTP Caching specification. But of course, we deal with horribly broken Origins every day. But I do think we should make that clear in the docs, and maybe the UI.

  3. To work correctly, every single Cache has to support it. Which is mostly fine, since CDN operators don't usually operate with widely varying ATS versions. Some do, but I think the majority have a single version and maybe some newer Canaries.

With those 3 things in mind, I think the best solution might be to make it a config Feature Flag in Traffic Ops/+Portal, with Documentation that they must have version X of regex_revalidate. As an additional safety, we should also make t3c check and log an error (which operators can then Alarm on).

It's not as good as bubbling it up, but considering all those things, that's the best solution I'm seeing at the moment.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like you're saying you set the flag in TO, then ORT/t3c/atstccfg looks at it and determines based on that what to do with jobs that use "refetch" - which makes sense, because otherwise I suppose I don't know how t3c could "check and log an error"

Yes, that is how use_reval_pending works currently and is how I imagine a feature flag for this feature would work as well.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I said "feature flag" I was thinking the TO configuration file; but a global Parameter also works.

I thought it'd be like: you make a new job, if it's using "refetch" then TO checks if the feature flag is allowed. If it isn't, you can't make that job

That's what I was thinking, but also having a check in t3c-apply as a fail-safe, if it ever does get a job and the installed regex_revalidate is too old to understand it, it would log an error (but probably still do the reval, because an IMS reval is better than nothing and doesn't hurt if they wanted a no-IMS reval).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to not allow creating those jobs at all, if the feature-flag isn't set. Because by the time it gets to the cache, it's too late: the user put in jobs, that they expected to work, and now we have an error we can't reasonably propagate up to them that the job won't work.

But we already knew it wasn't going to work, we have the flag in TO. We should be able to return an error from the /jobs POST.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bearing in mind, these won't be intermittent errors. The endpoint will always return an error if the flag isn't set.

That's a good reason to put the flag in the Parameter instead of the config file: the Param is already available in the API. Which lets UIs like the Portal fetch it, to decide whether to hide that extra button from users, so they never even see the option that will always error.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some language specifically stating the addition of a global parameter, defaulting to rejecting refetch POSTS until specfically set to allow.

Comment thread blueprints/refetch-invalidation.md Outdated

## Problem Description

Currently, within ATC, there is a concept of Invalidation Jobs. These Invalidation Jobs give a user the ability to queue an invalidation for a resource, primarily based on regular expressions. The invalidation is gathered and treated as though there was a cache **STALE**, allowing the cache to query the origin server to **REFRESH** the resource. However, should the cache policy still be incorrect or misconfigured, the resource could be updated on the origin server, but the cache will still receive a 304 - Not Modified HTTP status response.
Copy link
Copy Markdown
Member

@mitchell852 mitchell852 Jun 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so i don't think the cache queries the "origin server" per se. instead it queries the "parent server" which in some cases will be a parent mid (if the cache is an edge) and the origin server (if the cache is a mid) and is simply the "parent server" when a topology is imployed by a delivery service (where there can be N tiers).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this is a really good point and observation. My understanding is that this all revolves entirely around misconfigured origins, and even more specifically around origins we can't "fix". If it was an issue within the CDN topology, would we be able to manually intervene in the chain?

Copy link
Copy Markdown
Contributor

@ocket8888 ocket8888 Jun 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the purpose for introducing the feature, but as I understand it the implementation will have the same effects on any requests made by a cache server to its parent for the affected content - regardless of whether or not it's an origin.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the original description to reference a CDN rather than a cache. Also added clarity that this could be a parent along the chain.

Comment thread blueprints/refetch-invalidation.md Outdated
##### Read
When displaying the information, the **Invalidation Requests** table current shows the `Parameters` field, so the display will be straight forward with no manipulation.

However, we derive and calculate the expiration field based on the TTL. This code will need to be modified to account for the additional information contained in the `Parameters` field.
Copy link
Copy Markdown
Member

@mitchell852 mitchell852 Jun 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh i think the parameters column was a poor design (there should have simply been a TTL column) and we should probably not follow that approach. imo i would suggest hijacking the keyword column and maybe renaming it to invalidation_type or simply type with 2 acceptable values: refresh and refetch

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, any jobs that don't have a numeric TTL parameter will fail to operate properly; I think a migration could reasonably parse it and change it to a numeric TTL column, since if the migration fails it means ORT is failing in some regard.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified the proposal to be more explicit, rather than the first attempt (which was to make the most minimal changes). Relish the feedback.

{
"alerts":[
{
"text":"Invalidation request created for http://amc-linear-origin.local.tld/path/.*\\.jpeg, start:2021-06-02 15:23:21.348 +0000 UTC end 2021-06-03 15:23:21.348 +0000 UTC",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing the response text to "Invalidation request (refetch) created for..."

Copy link
Copy Markdown
Collaborator Author

@tcfdev tcfdev Jun 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some clarity to the alert message response below.

Comment thread blueprints/refetch-invalidation.md Outdated
This struct now contains the `InvalidationType *string` field.
```go
type InvalidationJobInput struct {
DeliveryService *interface{} `json:"deliveryService"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a *interface{} in APIv3 to give a transition period between using Delivery Service ID (APIv2 and earlier) to XMLID (hopefully all versions moving forward) - so this could be a string. It's not really related to the changes we're making, but it's something that could be cleaned up since we're making structure changes anyway.

That would also eliminate the need for the unexported dsid and the hoops you need to jump through for that.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good catch, and I believe it should be documented at the very least. Same with the TTL below as well. I'm happy to add this as part of the refactor effort, especially since I expect more - than + :)

Comment thread blueprints/refetch-invalidation.md Outdated
```go
type InvalidationJobInput struct {
DeliveryService *interface{} `json:"deliveryService"`
Regex *string `json:"regex"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regex isn't actually optional, right? Can this just be a string instead of a *string?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good observation. What is actually optional? Any of the fields? We can make them all explicitly required at this time as well if you believe that's a benefit at this time.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of the properties were optional, but it's easier to tell for strings than other things if a value was actually given. For example "regex": "" is just as invalid as "regex": null or a missing "regex" property, and Go's parser can't tell the difference between those cases for a string property. However, for something numeric, Go can't tell the difference between 0, null and undefined values for an int property, but sometimes 0 is a valid value, so you have to use *int. It's a very annoying and pedantic balancing act you have to perform with the number of things you have to check for nil all the time versus what the Go parser can and can't distinguish between.

Historically, though, the reason these fields are all pointers is because originally this was going to be the first endpoint to allow PATCH requests, where every property is optional by definition. That didn't pan out because there was prerequisite work that needed to be done to support PATCH safely, but I guess I never changed the fields back from pointers.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool info on the PATCH. That makes sense. I've gone ahead and made both DeliveryService and Regex nonoptional (required for brevity?).

Comment thread blueprints/refetch-invalidation.md Outdated
type InvalidationJobInput struct {
DeliveryService *interface{} `json:"deliveryService"`
Regex *string `json:"regex"`
InvalidationType *string `json:"invalidationType,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like optional properties, but also why omitempty? That won't actually be allowed to be empty in the database, right?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed omitempty. I originally thought we could aim for backwards compatibility. Also, whether it was an explicit column in the DB I figured it would default to NULL. However, with the newer changes this will be explicit and no longer optional.

Comment thread blueprints/refetch-invalidation.md Outdated
Regex *string `json:"regex"`
InvalidationType *string `json:"invalidationType,omitempty"`
StartTime *Time `json:"startTime"`
TTL *interface{} `json:"ttl"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone really like being able to put in "duration strings"? I don't think that's as useful as I thought it was. JSON and Javascript numbers are IEEE double-precision floating point numbers, which means that as long as we don't need jobs to last longer than 290 years this could just be a time.Duration which is a duration in nanoseconds.

That would also eliminate the need for the unexported ttl and the hoops you need to jump through for that.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API call implementations (whether go clients or TP), from what I can gather, are sending up JS numbers. We can certainly enforce a numeric data type here, and even if we stick with the hours format, it would still be clearer than empty interface.

Comment thread blueprints/refetch-invalidation.md Outdated
@rawlinp rawlinp added the blueprint feature requirements / implementation details label Jun 8, 2021
Taylor Frey added 4 commits June 14, 2021 10:56
RFC3339 for Time format
Explain impact for T3C
Explicitly set 'refresh' or 'refetch' for all API interaction
Minor formatting fixes
Updated InvdalidationJobInput fields
iIncorporate global Parameter check
Add clarity around parent vs origin
@rawlinp rawlinp merged commit dd1a333 into apache:master Jun 22, 2021
@tcfdev tcfdev mentioned this pull request Aug 18, 2021
4 tasks
@tcfdev tcfdev mentioned this pull request Nov 3, 2021
4 tasks
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

blueprint feature requirements / implementation details

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants