Skip to content

cachekey: capture cache key elements from headers#3995

Merged
gtenev merged 1 commit intoapache:masterfrom
gtenev:cachekey_caputure_elements
Aug 14, 2018
Merged

cachekey: capture cache key elements from headers#3995
gtenev merged 1 commit intoapache:masterfrom
gtenev:cachekey_caputure_elements

Conversation

@gtenev
Copy link
Copy Markdown
Contributor

@gtenev gtenev commented Jul 20, 2018

--capture-header=<headername>:<capture_definition>
captures elements from header <headername> using <capture_definition>
and adds them to the cache key.

@gtenev gtenev added the Plugins label Jul 20, 2018
@gtenev gtenev added this to the 9.0.0 milestone Jul 20, 2018
@gtenev gtenev self-assigned this Jul 20, 2018
Comment thread plugins/cachekey/configs.cc Outdated

static void
setPattern(MultiPattern &multiPattern, const char *arg)
setPattern(MultiPattern *multiPattern, const char *arg)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code doesn't handle the multiPattern == nullptr case. If you assume it is never null, then why not leave it as a reference?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, makes sense.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the new callers gets multiPattern from a value in a std::map. We cannot store references as values in a map, so the signature here was changed to use pointers instead. We might want to add the null-check just to be safe against future changes to the code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@d2r I believe we can go either way, that is why I did not think it is worth the time arguing.

Initially I wrote it as a reference for the exact reasons mentioned by @a-canary.

But you are right too, the new caller gets a pointer value from the map and this is why I changed to to a pointer. Well, it might be also because I don't like dereferencing a pointer when passing by reference much (may be because I learned C before C++ 😊) but the same way I also don't care for certain colors that others like a lot.

The main point is to make sure we don't dereference nullptr. It can be checked in the caller, in the called function or both. Two callers will never dereference nullptr by nature, for the one who "can try to do it" I made sure it doesn't (otherwise I have a bug)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I was clarifying for @a-canary why it is the signature changed from reference to pointer. I think we both suggested a null-check. If instead we want the caller to ensure multiPattern is not null, then we can just add a small code comment saying multiPattern is assumed to be a non-null value or something like that.

{
bool res = false;
for (auto p : this->_list) {
if (nullptr != p && p->process(subject, result)) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you comment why this is calling Pattern::process() instead of Pattern::capture(). Its unclear to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pattern::process() supports capture or capture-and-replace depending on whether a replacement string is specified.

It seemed to me that there might be much more different ways to process a MultiPattern than a Pattern so I thought I would call the corresponding method capture() instead of process() which would be closer to what it does but this seems to be confusing people in a different way.

I will change the MultiPattern::capture() to MultiPattern::process()

Comment thread plugins/cachekey/pattern.cc Outdated
}

bool
MultiPattern::capture(const String subject, StringVector &result) const
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use const String& or StringView?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, it is meant to be const String&

}
}

template <class T>
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think T is any std::container<std:string>. Can you clarify that in comments or param name.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this code this is true, I can see the benefit of the clarification but I am not sure we should limit things. Also this is a local template which helped me process the headers + taking 2 different actions w/o duplicating code and it is not meant to be reused a lot (not part of the "public" interface of the class, etc). If you look through out the plugin code you can see I like lots of comments but it feels overkill in this case.

@a-canary
Copy link
Copy Markdown
Collaborator

a-canary commented Aug 9, 2018

Is that a plan to add Catch or Au tests for this feature?

Copy link
Copy Markdown
Contributor

@d2r d2r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same questions as @a-canary .

Comment thread doc/admin-guide/plugins/cachekey.en.rst Outdated

Cache key would be set to::

/example-cdn.com/80/ClientID:MKIARYMOG51PTCKQ0DLD/path/file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be /clientID:... (lower 'c') in this example?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, fixed!

template <class T>
void processHeader(const String &name_s, const ConfigHeaders &config, T &dst,
void (*fun)(const ConfigHeaders &config, const String &name_s, const String &value_s, T &captures));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is only a helper for the implementation, maybe we don't need to declare it here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO there is nothing wrong in having helper functions as private members of a class. I usually try to have them localized to the unit where they are used but sometimes there are lots of class members used by the function and in this case it feels that the usage would become cumbersome (lots of parameter passing, more chances for mistakes too).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree. Just wondering why we need to add its declaration to a header if no one else is using it. We can leave it if there we want.

Comment thread plugins/cachekey/configs.cc Outdated

static void
setPattern(MultiPattern &multiPattern, const char *arg)
setPattern(MultiPattern *multiPattern, const char *arg)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the new callers gets multiPattern from a value in a std::map. We cannot store references as values in a map, so the signature here was changed to use pointers instead. We might want to add the null-check just to be safe against future changes to the code.

@gtenev
Copy link
Copy Markdown
Contributor Author

gtenev commented Aug 14, 2018

@a-canary, @d2r, literally all cachekey features were added "test-first" (TDD) until the point where we killed TSQA, the fixes and features that were added after that time (which is relatively small set, may be < 5%) don't have corresponding tests. I had plans but never had a chance to convert the test-bench to AuTest.

Now after the cachekey TSQA tests were removed from the source tree, no-one new will know that they exist and it seems getting the TDD back on track is stuck with me, will try to do it soon.

Copy link
Copy Markdown
Contributor

@d2r d2r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review pass done. Couple of minor questions.

As to tests, it would be nice to have them but seeing as we currently do not have any coverage here it could be done in future work.

captures.insert(header);
CacheKeyDebug("adding header '%s: %s'", name.c_str(), value.c_str());
} else {
CacheKeyDebug("failed to find header '%s'", name.c_str());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How noisy will this be? Will this print every time there is a header that is not captured? Maybe it helps enough in debugging that it is worth it. Just don't want to roll logs too frequently if we turn on debugging here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be ok, let's leave it (we can always trim). Here is why I think it should be ok...

When one wants a header to be included or captured from the request into the cache key one would most likely expect the header or the capture to result in non-empty string on all requests, otherwise the cache key would become somehow unstructured (missing elements based on the traffic). So headers specified in --include-header and --capture-header are usually expected in the requests.

For efficiency reason the plugin does not iterate over all headers in the request (at least when the plugin was developed it was not efficient to iterate over headers in ATS, as far as I know it is still the case, see the comments in the code for more details).

The plugin iterates over the --include-header and --capture-header lists instead which in the usual use case will not have big number of headers (compared to the number of header fields in the request). Usually less then 5-10 parameters, 1 or 2 most of the time.

So the noise is not expected to be huge and the thinking is that while debugging the operator should know that an expected header or header capture was not found.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK sounds good. Agree we can trim it later if it is an issue.

itMp->second->capture(value, captures);
CacheKeyDebug("found capture pattern for header '%s'", name.c_str());
} else {
CacheKeyDebug("failed to find header '%s'", name.c_str());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above: Will this get too noisy?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@d2r please see my comment above

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

}

String value_s(value, vlen);
if (!config.toBeRemoved() && !config.toBeSkipped()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the original logic reversed and the early return removed here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before adding --capture-header list there was only code that handled --include-header list. toBeRemoved() and toBeSkipped() are associated with --capture-header list processing so exiting the function earlier will not give --capture-header list related code to run.

Beforehand considered various ways to add the new --capture-header functionality (parameters/design-wise) and it seems there might be a moment in time when I will redesign or refactor things to generalize various lists handling in the plugin but with this implementation I think we are still in a good shape (unless we find something problematic or not performing well).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me. Thanks.

@d2r
Copy link
Copy Markdown
Contributor

d2r commented Aug 14, 2018

Now after the cachekey TSQA tests were removed from the source tree, no-one new will know that they exist and it seems getting the TDD back on track is stuck with me, will try to do it soon.

@gtenev I did not know about this! It seems we perhaps made a mistake abandoning useful tests without a replacement. Thanks for the note. I mentioned earlier, the test coverage would be nice, but I don't want to block this PR on it.

--capture-header=<headername>:<capture_definition>
captures elements from header <headername> using <capture_definition>
and adds them to the cache key.
@gtenev gtenev requested a review from d2r August 14, 2018 20:59
Copy link
Copy Markdown
Contributor

@d2r d2r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look OK to me. @a-canary ?

@gtenev
Copy link
Copy Markdown
Contributor Author

gtenev commented Aug 14, 2018

[approve ci autest]

Copy link
Copy Markdown
Collaborator

@a-canary a-canary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@gtenev
Copy link
Copy Markdown
Contributor Author

gtenev commented Aug 14, 2018

[approve ci autest]

@gtenev gtenev merged commit d49271b into apache:master Aug 14, 2018
@gtenev gtenev deleted the cachekey_caputure_elements branch August 14, 2018 22:08
@d2r d2r mentioned this pull request Aug 14, 2018
@bryancall bryancall modified the milestones: 9.0.0, 8.0.0 Aug 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants