Skip to content

stats: Configure default max_obj_name_length at compile time#2768

Closed
rlenglet wants to merge 1 commit intoenvoyproxy:masterfrom
rlenglet:fix-default-max-obj-name-len
Closed

stats: Configure default max_obj_name_length at compile time#2768
rlenglet wants to merge 1 commit intoenvoyproxy:masterfrom
rlenglet:fix-default-max-obj-name-len

Conversation

@rlenglet
Copy link
Copy Markdown
Contributor

@rlenglet rlenglet commented Mar 9, 2018

stats: Configure default max_obj_name_length at compile time

DEFAULT_MAX_OBJ_NAME_LENGTH was still hard-coded to 60.
Set it to ENVOY_DEFAULT_MAX_OBJ_NAME_LENGTH to make it configurable
at compile time.

Signed-off-by: Romain Lenglet romain@covalent.io
Risk Level: Low

DEFAULT_MAX_OBJ_NAME_LENGTH was still hard-coded to 60.
Set it to ENVOY_DEFAULT_MAX_OBJ_NAME_LENGTH to make it configurable
at compile time.

Signed-off-by: Romain Lenglet <romain@covalent.io>
@rlenglet
Copy link
Copy Markdown
Contributor Author

rlenglet commented Mar 9, 2018

Istio always passes --max-obj-name-len 189, which systematically triggered the assert.
This change allows really setting the default to 189 at compile time, which makes it usable with Istio.

Copy link
Copy Markdown
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I don't think we need a test here, since it would involve a distinct build target in CI, so seems reasonable. Will leave for @ggreenway to merge in case he has any additional input.

@ggreenway
Copy link
Copy Markdown
Member

I'm missing a piece of why this is needed. As I understand it, 'DEFAULT_MAX_OBJ_NAME_LENGTH' is only used by the tests. The value used by any non-test comes from the command-line options, and the default is already set at compile-time. So I don't understand what issue is being fixed here.

@rlenglet
Copy link
Copy Markdown
Contributor Author

rlenglet commented Mar 12, 2018

@ggreenway
Istio's pilot-agent passes the --max-obj-name-len 189 option, which systematically triggers this assert:

RELEASE_ASSERT(max_obj_name_length == configured);

DEFAULT_MAX_OBJ_NAME_LENGTH is used as the default value in RawStatData::maxObjNameLength(). Not just in tests. This means that RawStatData::maxObjNameLength() is somehow called before RawStatData::configure() is called with the option passed on the command line.
That's a bug in itself.

But the other bug, which is fixed in this PR, is that DEFAULT_MAX_OBJ_NAME_LENGTH was not matching ENVOY_DEFAULT_MAX_OBJ_NAME_LENGTH.

@ggreenway
Copy link
Copy Markdown
Member

I think the bug is that RawStatData::maxObjNameLength() is called before it has been configured; I do not think this is the correct fix. Can you please get a backtrace from that RELEASE_ASSERT triggering? I suspect there is a startup ordering problem.

@rlenglet
Copy link
Copy Markdown
Contributor Author

I've not been able to get a backtrace yet.

@rlenglet
Copy link
Copy Markdown
Contributor Author

The problem is that the backtrace would be useless. We need to know what calls RawStatData::maxObjNameLength() before configure() triggers the assert.

@ggreenway
Copy link
Copy Markdown
Member

Oh right, you're correct that the backtrace is useless.

Here's how I would debug:

Add a static bool 'configured_' next to 'max_obj_name_length'. Set it to true in RawStatData::configure(). Then add 'RELEASE_ASSERT(configured_)' in RawStatData::maxObjNameLength(). This build will fail tests, but ignore that. Run it in your environment and get a backtrace from that, and we should get the answer of what is calling it too early.

@ggreenway
Copy link
Copy Markdown
Member

Also, regarding the change in this PR, my clarified objection:

The bug is that both the compile-time and CLI option are not working for setting the max name length. The change in this PR only fixes the compile-time option; I'm pretty sure the CLI option would still not work. I'd like to fix both.

@rlenglet
Copy link
Copy Markdown
Contributor Author

rlenglet commented Mar 12, 2018

I can also try doing this, temporarily:

struct RawStatData {
...
  static size_t maxObjNameLength() {
    size_t res = initializeAndGetMutableMaxObjNameLength(0);
    RELEASE_ASSERT(res > 0);
    return res
  }
...
}

My problem is that I've not able to get backtraces with the other assert, even with a debug build. I need to look deeper into that.

@ggreenway
Copy link
Copy Markdown
Member

Another thought on this issue:

Right now, only users that use a non-default value for this CLI option will hit this issue. If we changed the default value of DEFAULT_MAX_OBJ_NAME_LENGTH to be different from the CLI option, this issue would have been detected sooner, and hopefully tracked back to the commit that introduced the problem very quickly.

@rlenglet
Copy link
Copy Markdown
Contributor Author

Interesting. I can't reproduce the issue with --max-obj-name-len 189 option triggering the assert in master's HEAD. I had found this problem when I branched off commit 508c7baefc5fb8724c412b9ba05d80d79785aea2. So this bug has been fixed between that commit and the current head.

This leaves us with the bug of configuring DEFAULT_MAX_OBJ_NAME_LENGTH != ENVOY_DEFAULT_MAX_OBJ_NAME_LENGTH?

@ggreenway
Copy link
Copy Markdown
Member

DEFAULT_MAX_OBJ_NAME_LENGTH should only matter for the tests, because we always configure it with the CLI value, even if the default CLI value is used (which can be set at compile time). So I don't think any change is needed, if this is now working for you against HEAD.

@rlenglet
Copy link
Copy Markdown
Contributor Author

FYI, here's the stacktrace for the call to maxObjNameLength() before it's configured, in that commit:

  #4 Envoy::Stats::RawStatData::maxObjNameLength() at stats_impl.h:125 (discriminator 1)
  #5 Envoy::Stats::RawStatData::maxNameLength() at stats_impl.h:117
  #6 Envoy::Stats::RawStatData::nameSize() at stats_impl.h:138
  #7 Envoy::Stats::RawStatData::size() at stats_impl.cc:34
  #8 Envoy::SharedMemoryHashSet<Envoy::Stats::RawStatData>::cellOffset(unsigned int) at shared_memory_hash_set.h:327
  #9 Envoy::SharedMemoryHashSet<Envoy::Stats::RawStatData>::numBytes(Envoy::SharedMemoryHashSetOptions const&) at shared_memory_hash_set.h:85
  #10 Envoy::Server::HotRestartImpl::HotRestartImpl(Envoy::Server::Options&) at hot_restart_impl.cc:119 (discriminator 1)
  #11 Envoy::MainCommonBase::MainCommonBase(Envoy::OptionsImpl&, bool) at main_common.cc:49
  #12 Envoy::MainCommon::MainCommon(int, char**, bool) at main_common.cc:92 (discriminator 2)
  #13 std::_MakeUniq<Envoy::MainCommon>::__single_object std::make_unique<Envoy::MainCommon, int&, char**&, bool const&>(int&, char**&, bool const&) at unique_ptr.h:765
  #14 main at main.cc:25

@ggreenway
Copy link
Copy Markdown
Member

ggreenway commented Mar 12, 2018

That makes sense in the commit you referenced. The hot restarter is always created before the call to RawStatData::configure() in that commit. It was fixed later.

Broken version (which you had in your fork):
https://github.com/envoyproxy/envoy/blob/508c7baefc5fb8724c412b9ba05d80d79785aea2/source/exe/main_common.cc

Fixed version:
https://github.com/envoyproxy/envoy/blob/master/source/exe/main_common.cc#L46

@ggreenway
Copy link
Copy Markdown
Member

@rlenglet I think there is nothing to fix here. Are you ok with closing this PR?

@rlenglet
Copy link
Copy Markdown
Contributor Author

Sounds good. Thanks!

@ggreenway
Copy link
Copy Markdown
Member

Also, I tracked down where this got fixed. It was in commit b8ddf40 in unrelated PR #2701.

@ggreenway ggreenway closed this Mar 12, 2018
@rlenglet rlenglet deleted the fix-default-max-obj-name-len branch March 12, 2018 19:48
rlenglet added a commit to cilium/cilium that referenced this pull request Mar 12, 2018
envoyproxy/envoy#2768

Signed-off-by: Romain Lenglet <romain@covalent.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants