Skip to content

cgroup: fix panic in parse memory.numa_stat#2751

Closed
Ace-Tang wants to merge 1 commit into
opencontainers:masterfrom
Ace-Tang:ace
Closed

cgroup: fix panic in parse memory.numa_stat#2751
Ace-Tang wants to merge 1 commit into
opencontainers:masterfrom
Ace-Tang:ace

Conversation

@Ace-Tang
Copy link
Copy Markdown
Contributor

strings.SplitN not always return N fields if not staify, sometimes
cgroup interface add some custom fields make parse memory.numa_stat
fails, it will case panic

Signed-off-by: acetang aceapril@126.com

Fix panic in parse memory.numa_stat, since memory.numa_stat was add some custom fields that make splitN fail
image

Comment thread libcontainer/cgroups/fs/memory.go Outdated

for _, column := range columns {
pagesByNode := strings.SplitN(column, numaStatKeyValueSeparator, numaStatColumnSliceLength)
if len(pagesByNode) != 2 {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/2/numaStatColumnSliceLength/

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update~

strings.SplitN not always return N fields if not staify, sometimes
cgroup interface add some custom fields make parse memory.numa_stat
fails, it will case panic

Signed-off-by: acetang <aceapril@126.com>
@kolyshkin
Copy link
Copy Markdown
Contributor

@Ace-Tang can you show the contents of memory.numa_stat from that system?

For the record, here's kernel doc about memory.numa_stat (from https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt)

5.6 numa_stat

This is similar to numa_maps but operates on a per-memcg basis.  This is
useful for providing visibility into the numa locality information within
an memcg since the pages are allowed to be allocated from any physical
node.  One of the use cases is evaluating application performance by
combining this information with the application's CPU allocation.

Each memcg's numa_stat file includes "total", "file", "anon" and "unevictable"
per-node page counts including "hierarchical_<counter>" which sums up all
hierarchical children's values in addition to the memcg's own value.

The output format of memory.numa_stat is:

total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ...
file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ...
anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
hierarchical_<counter>=<counter pages> N0=<node 0 pages> N1=<node 1 pages> ...

The "total" count is sum of file + anon + unevictable.

The code was introduced by #2278, @iwankgb PTAL

@kolyshkin
Copy link
Copy Markdown
Contributor

kolyshkin commented Jan 19, 2021

On my (non-NUMA) system it's always 2 fields:
find /sys/fs/cgroup/ -name memory.numa_stat | xargs awk '{print NF}' | grep -v ^2$

Copy link
Copy Markdown
Contributor

@dqminh dqminh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is reasonable. However, it would be nice if we can add a failing test.

@Ace-Tang
Copy link
Copy Markdown
Contributor Author

The memory.numa_stat interface changes was add by other guys, but it was used online, althrough the interface develop not follow kernel doc, fields is
image

@kolyshkin @dqminh

@kolyshkin
Copy link
Copy Markdown
Contributor

Ah, OK, I was not able to find these kernel patches in the wild -- the most similar one is "sched/numa: introduce per-cgroup NUMA locality info" from Alibaba's Michael Wang (which also adds some non-standard fields to memory.numa_stat). Another showcase of why text-based interfaces are not that good...

I took a look at the code and found that if someone ever adds some line that will contain = to memory.numa_stat file (e.g. a=b), the whole GetStats will return an error. Maybe it makes sense to make this code less fragile / more forgiving to unknown fields while we're at it. I'll look into it.

In the meantime, LGTM

Copy link
Copy Markdown
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolyshkin
Copy link
Copy Markdown
Contributor

@dqminh can we merge this hotfix as it is? I'm working on more robust parser and will add more tests.

@kolyshkin
Copy link
Copy Markdown
Contributor

@dqminh can we merge this hotfix as it is? I'm working on more robust parser and will add more tests.

Alternatively, PTAL #2755 which has this fix, and more fixes and tests.

@dqminh
Copy link
Copy Markdown
Contributor

dqminh commented Jan 20, 2021

@kolyshkin lets go with #2755 . Thanks @Ace-Tang very much for your fix !

@dqminh dqminh closed this Jan 20, 2021
@Ace-Tang
Copy link
Copy Markdown
Contributor Author

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants