Skip to content

Conversation

@kamalesh-babulal
Copy link
Contributor

It was observed by Evgenii Shatokhin in #755, that when the RCU callback was called on the patched function, from unloaded livepatch module triggered a kernel crash.

This patch implements the approach on Powerpc outlined in #755. With -mcmodel=large, like any other data, function pointers are also loaded relative to the current TOC base and are populated as relocation entries in .toc section. Every function call/passing such function as pointers needs to access them through .toc section + offset.

The simplest approach would be to convert the .toc + offset relocation into a dynamic rela referring to the original function address but the same function is also a patched function. Thus, on PowerPC we need to append a new relocation entry into .rela.toc, duplicating the entry used as a function pointer and converting the newly added rela as dynamic rela.

Also, modify the function to load the new entry by modifying the addend value to newly created dynamic rela.

@kamalesh-babulal
Copy link
Contributor Author

The current patch applies on top of @euspectre patch. I have included the patch from @euspectre tree in this pull request for readability and testing purpose. I will redo the pull request, with my patch alone, in the final version.

toc_dynrela_off = malloc(sizeof(int) * (nr/2));
memset(toc_dynrela_off, 0, sizeof(*toc_dynrela_off));
off_idx = 0;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check for malloc failure here: if (!toc_dynrela_off) ERROR("malloc") ... also isn't that memset going to only clear the first offset value (ie, toc_dynrela_off[0] and not toc_dynrela_off[0..n]) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: also maybe the malloc should be sizeof(*toc_dynrela_off) * (nr/2) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. Will include the suggestions in next iteration.

@joe-lawrence
Copy link
Contributor

How much of the added code in kpatch_create_intermediate_sections() should be enclosed in #ifdef __powerpc__ blocks? (Or was this just a prototype for comment?)

@kamalesh-babulal
Copy link
Contributor Author

How much of the added code in kpatch_create_intermediate_sections() should be enclosed in #ifdef powerpc blocks? (Or was this just a prototype for comment?)

It's a prototype for comments. Changes introduced to kpatch_create_intermediate_sections are powerpc specific and should be guarded within #ifdef __powerpc__.

@jpoimboe
Copy link
Member

I'm confused about why we have to add a new .toc entry. Why can't we just convert those .toc.rela function ptr relas to dynrelas?

@kamalesh-babulal
Copy link
Contributor Author

kamalesh-babulal commented Mar 1, 2018

I'm confused about why we have to add a new .toc entry. Why can't we just convert those .toc.rela function ptr relas to dynrelas?

I got little mixed with details and implemented it with duplicating .toc rela. Yes, we do not need the duplicate entry and just converting the .toc to dynrela should be sufficient. I was testing the v2 of this patch (which just converts the .toc to dynrela), against integration test cases and debugging an issue with v2. Will update the finding soon.

@kamalesh-babulal
Copy link
Contributor Author

With #791, function_ptr_rela will need be modified, to something like:

static int function_ptr_rela(const struct rela *rela)
{
        const struct rela *rela1 = toc_rela(rela);
        return (rela1->sym->type == STT_FUNC &&
                   /* Skip switch table on PowerPC */
                   rela1->addend == rela1->sym->sym.st_value &&
                   (rela->type == R_X86_64_32S ||
                   rela->type == R_PPC64_TOC16_HA ||
                   rela->type == R_PPC64_TOC16_LO_DS));
}

Will wait for the comments on #791 and on this pull-request, before re-spinning v3.

continue;
}
}
#else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The control flow is way too complex and hard to follow.

What if we add a rela->needs_dynrela bool, which gets set in the first loop which increments the nr variable? Then we could move the may_need_dynrela() call there, and that would hopefully vastly simplify this code (e.g., no need for the toc_dynrela_off array or the spaghetti code flow).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clever :). Will re-do the patch based on your suggestion.

@kamalesh-babulal
Copy link
Contributor Author

v3 Changes:

  • Drop the complex looping and array for tracking .toc offsets.
  • Introduce need_dynrela in relocation struct to decide on dynrela.
  • Don't carry const for toc_rela(), returned rela from toc_rela(), will be modified.

Copy link
Member

@jpoimboe jpoimboe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking much better now. I added some more comments.

Also, instead of merging, can you just cherry pick the #755 patch before yours? That way we'll be able to easily merge x86 and ppc support with this PR. Thanks.

}
if (may_need_dynrela(rela))
toc_rela(rela)->need_dynrela = 1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot better than before, but I'm still confused by this code. Why are all the toc special cases needed in function_ptr_rela(), may_need_rela(), and here? This code loops through all the sections, including .rela.toc., so can we just do something like

		list_for_each_entry(rela, &sec->relas, list) {
			nr++; /* upper bound on number of kpatch relas and symbols */
			if (may_need_dynrela(rela))
				rela->need_dynrela = 1;
		}

along with removing all the toc_rela() calls in function_ptr_rela() and may_need_rela()?

Copy link
Contributor Author

@kamalesh-babulal kamalesh-babulal Mar 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relocation section '.rela.toc' at offset 0xcc6b0 contains 46 entries:
...
0000000000000128  0000353400000026 R_PPC64_ADDR64         0000000000000000 genl_sk_destructing_waitq + 0
0000000000000130  0000003e00000026 R_PPC64_ADDR64         0000000000000000 .text.netlink_release + 35c
0000000000000138  0000002a00000026 R_PPC64_ADDR64         0000000000000000 .text.deferred_put_nlk_sk + 8

Relocation section '.rela.text.netlink_release' at offset 0xcadf0 contains 44 entries:
...
0000000000000390  000035580000000a R_PPC64_REL24          0000000000000000 __local_bh_enable_ip + 0
0000000000000398  0000007300000032 R_PPC64_TOC16_HA       0000000000000000 .toc + 138
00000000000003a0  0000007300000040 R_PPC64_TOC16_LO_DS    0000000000000000 .toc + 138

When looping with:

        list_for_each_entry(rela, &sec->relas, list) {
			nr++; /* upper bound on number of kpatch relas and symbols */
			if (may_need_dynrela(rela))
				rela->need_dynrela = 1;
		}

netlink_release + 0x3a0 is not a function symbol, but of relocation type (R_PPC64_TOC16_HA/R_PPC64_TOC16_LO_DS)
referring to symbol in .rela.toc + 0x138, We need to operate on both rela->type and rela->sym->sec->rela + rela->addend, where netlink_release + 0x3a0 is checked for rela-type and .text.deferred_put_nlk_sk + 8 for STT_FUNC/switch label, but the need_dynrela should be set only for the .rela.toc symbol.

Without toc_rela() netlink_release + 0x3a0 referring to .text.deferred_put_nlk_sk + 8 symbol's need_dynrela will not be set, failing the check for STT_FUNC in function_ptr_rela() and while parsing .rela.toc section it's not obvious, if .text.deferred_put_nlk_sk + 8 need_dynrela to be set, because we can not determine if it's function ptr.

Alternatively, the above loop can be re-written as:

        list_for_each_entry(rela, &sec->relas, list) {
            nr++; /* upper bound on number of kpatch relas and symbols */
            if (may_need_dynrela(rela))
                toc_rela(rela)->need_dynrela = 1;
        }

With an additional check in may_need_dynrela():

static int may_need_dynrela(const struct rela *rela)
{
        /* Don't over-ride need_dynrela flag if set, required for .toc sym*/
        if (rela->need_dynrela || !rela->sym->sec)
                return 1;
    ...
}

may_need_dynrela() and function_ptr_rela() operates on both rela and rela->sym->sec->rela. We might need to keep toc_rela() in both.

Copy link
Member

@jpoimboe jpoimboe Mar 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. Sorry, I keep forgetting that the R_PPC64_TOC16_HA relocation is for the original section, not the .toc. Can you add the above readelf snippet to a comment above the if (may_need_dynrela(rela)) line explaining why the toc_rela() is getting need_dynrela set? Otherwise I'll keep getting confused :-)

Can you also explain why you need to check rela->need_dynrela before setting toc_rela(rela)->need_dynrela? I'm trying to understand how it would be possible for a ".toc + xxx)" relocation to be converted to a dynrela.

Assuming we do really need that, how about doing the loop like this?

        list_for_each_entry(rela, &sec->relas, list) {
            nr++; /* upper bound on number of kpatch relas and symbols */
            if (!rela->need_dynrela && may_need_dynrela(rela))
                toc_rela(rela)->need_dynrela = 1;
        }

Also a comment for the !strchr(toc_rela(rela)->sym->name, '.') check in may_need_dynrela() would also be helpful.

Another thing that would make it easier for me to follow would be to rename rela1 to rela_toc in function_ptr_rela().

Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the if (may_need_dynrela(rela)) line explaining why the toc_rela() is getting need_dynrela set? Otherwise I'll keep getting confused :-)
Can you also explain why you need to check rela->need_dynrela before setting toc_rela(rela)->need_dynrela? I'm trying to understand how it would be possible for a ".toc + xxx)" relocation to be converted to a dynrela.

:) .rela.toc is superseded by .rela.text.* sections, those might have set the need_dynrela for entries in .rela.toc. Without the !rela->need_dynrela check, may_need_dynrela() for .rela.toc rela* will unset the need_dynrela flag, failing to rela->type check in function_ptr_rela().
.rela.toc rela type is always R_PPC64_ADDR64.

check in may_need_dynrela() would also be helpful.

function_ptr_rela(rela) will return 0, if the rela returned by toc_rela() is NULL. Duplicate check for NULL in may_need_dynrela(), can be avoided.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the !rela->need_dynrela check, may_need_dynrela() for .rela.toc rela* will unset the need_dynrela flag, failing to rela->type check in function_ptr_rela().

Hm, but if it's already set, how would it get unset? To clarify, I'm talking about your rewritten version of the loop:

        list_for_each_entry(rela, &sec->relas, list) {
            nr++; /* upper bound on number of kpatch relas and symbols */
            if (may_need_dynrela(rela))
                toc_rela(rela)->need_dynrela = 1;
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry got little mixed up here, true it will not get unset. Having the check for need_dynrela,
if (!rela->need_dynrela && may_need_dynrela(rela)) will speed the handling of .rela.toc entries but not sure, if its worth the optimization. Arguably .rela.toc is only on PPC64le and an additional check for rela entries for non-toc sections might be an overkill. I will re-spin the patch with

list_for_each_entry(rela, &sec->relas, list) {
            nr++; /* upper bound on number of kpatch relas and symbols */
            if (may_need_dynrela(rela))
                toc_rela(rela)->need_dynrela = 1;
        }

Thanks.

int addend;
int offset;
char *string;
int need_dynrela;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this a bool?

rela->type = GELF_R_TYPE(rela->rela.r_info);
rela->addend = rela->rela.r_addend;
rela->offset = rela->rela.r_offset;
rela->need_dynrela = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this explicit initialization, we already memset the rela struct to zero in ALLOC_LINK.

@kamalesh-babulal
Copy link
Contributor Author

kamalesh-babulal commented Mar 19, 2018

v4 Changes:

  • Incorporated review comments from Josh.
    • Remove the check for .rela.toc section relas.
    • Add PowerPC specific comments for toc_rela() usage.
    • Change rela->need_dynrela from int to bool.

* if exist, to check for nested functions instead of rela->sym.
* Referenced sym holds the function name and rela->name holds
* reference section name.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this comment is really needed. What I really wanted was a comment describing why it's doing the strchr() call with a check for the '.' character in the symbol name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this comment is really needed. What I really wanted was a comment describing why it's doing the strchr() call with a check for the '.' character in the symbol name.

Sorry if I have got it wrong, last paragraph from Evgenii comments above function_ptr_rela() explains why symbol name strchr() is checked for '.', Should that paragraph be moved here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Yes, I believe that last paragraph belongs here.

Evgenii Shatokhin and others added 2 commits March 21, 2018 09:01
…ctly

A kernel crash happened in __do_softirq() in very rare cases when the
binary patch created from mainline commit be82485fbcbb
("netlink: fix an use-after-free issue for nlk groups") was unloaded.

Investigation has shown that the kernel tried to execute an RCU
callback, deferred_put_nlk_sk(), defined in the patch module after the
module had been unloaded.

The callback was set by the patched variant of netlink_release() and
the address of the patched deferred_put_nlk_sk() was used, rather than
the address of the original function.

Similar problems occur with workqueue functions as well.

As suggested in
dynup#755 (comment),
create-diff-object was modified so that the addresses of the original
functions were used in such situations, at least for x86 systems. A
similar fix for PowerPC was added as well.

Changes in v4:

* '#ifdef __x86_64__' was removed. It is not actually needed right now
because the constants for relocation types are different on different
architectures.

Changes in v3:

* Minor refactoring and a comment explaining what this all is about.
Quite lengthy, but the dynrela-related code is really far from obvious.

Changes in v2:

* Handle the nested functions the same way as before, because they are
unlikely to be used as asynchronous callbacks. Example: cmp() in
bch_cache_show() from drivers/md/bcache/sysfs.c in the kernel 4.4.

As the nested functions are local to the functions they are defined in,
the compiler names them in a similar way to static locals: <name>.<number>.
Currently, we filter out all functions with '.' in their names. If there
are any asynchronous callbacks in the kernel that have a dot in their
names too, they could be handled in the future patches. It is unclear
though, if the callbacks with such names can appear in the kernel.

Signed-off-by: Evgenii Shatokhin <eshatokhin@virtuozzo.com>
It was observed by Evgenii Shatokhin in PR#755, that when the RCU
callback was called on the patched function, from unloaded livepatch
module triggered a kernel crash.

This patch implements the approach on PowerPC outlined in PR#755.
With -mcmodel=large, like any other data, function pointers are also
loaded relative to the current TOC base and are populated as
relocation entries in .toc section. Every function passing a function
pointer as the argument need to load the function address through
.toc section + offset. Convert such .toc + offset relocation into
a dynamic rela, which resolves to original function address, during
module load.

Also move the comment related to nested function check, into
may_need_dynrela().

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Evgenii Shatokhin <eshatokhin@virtuozzo.com>
Cc: Joe Lawrence <jdl1291@gmail.com>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
@kamalesh-babulal
Copy link
Contributor Author

v5 Changes:

  • Replaced PowerPC specific comment related to nested function, with the original comments on why the check is required, as suggested by Josh.

Copy link
Member

@jpoimboe jpoimboe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good. kpatch_create_intermediate_sections() could use some more cleanups, but I'll propose those in a separate PR.

@joe-lawrence
Copy link
Contributor

👍

@kamalesh-babulal , I went reading back through @euspectre 's PR #755 , and remembered that I had posted this integration patch and test:

function-pointer.patch.txt
function-pointer.test.txt

This should be applicable to PowerPC implementation as well, right? If so, I can add this test in a follow-up PR.

@jpoimboe
Copy link
Member

I'm going to be out next week. Here's my (rough, untested) WIP for refactoring the dynrela code in case anybody's interested: https://github.com/jpoimboe/kpatch/tree/dynrela . I'll try to pick it up again in April.

@kamalesh-babulal
Copy link
Contributor Author

This should be applicable to PowerPC implementation as well, right? If so, I can add this test in a follow-up PR.

@joe-lawrence Yes, It will test the functionality on PowerPC too. Please add this test case to the integration tests. I have been testing this PR with the kernel commit be82485fbcb mentioned in the first comment of PR #755. Can you please merge this PR, so I can rebase and push the patch for PR #793 and other refactoring code too.

I'm going to be out next week. Here's my (rough, untested) WIP for refactoring the dynrela code in case anybody's interested: https://github.com/jpoimboe/kpatch/tree/dynrela . I'll try to pick it up again in April.

@jpoimboe I will try and review/test the patch next week.

@joe-lawrence
Copy link
Contributor

@kamalesh-babulal - okay, sounds good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants