Skip to content

Conversation

@PeterSolMS
Copy link
Contributor

The hypothesis is that when there were no pinned plugs in this GC, saved_pinned_plug_index will be 0, but it's erroneous to refer to pinned_plug_of (saved_pinned_plug_index) in this case.

The fix is to initialize saved_pinned_plug_index to an invalid value instead of 0, and to only refer to pinned_plug_of (saved_pinned_plug_index) if saved_pinned_plug_index has indeed been set in this GC.

It is yet unclear whether the observed AVs are due to this issue, so add instrumentation to narrow down the issue further.

@ghost ghost added the area-GC-coreclr label Oct 19, 2021
@ghost
Copy link

ghost commented Oct 19, 2021

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

The hypothesis is that when there were no pinned plugs in this GC, saved_pinned_plug_index will be 0, but it's erroneous to refer to pinned_plug_of (saved_pinned_plug_index) in this case.

The fix is to initialize saved_pinned_plug_index to an invalid value instead of 0, and to only refer to pinned_plug_of (saved_pinned_plug_index) if saved_pinned_plug_index has indeed been set in this GC.

It is yet unclear whether the observed AVs are due to this issue, so add instrumentation to narrow down the issue further.

Author: PeterSolMS
Assignees: -
Labels:

area-GC-coreclr

Milestone: -

#endif // MAX_LONGPATH

//#define TRACE_GC
#define TRACE_GC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess this needs to be undone?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, this is only for the private build... Peter wanted to log these particular dprintfs for instrumentation in the stress log so we can get them in the dump file.

// to do so.
//#define dprintf(l, x)
#define dprintf(l,x) STRESS_LOG_VA(l,x);
#define dprintf(l,x) {if ((l == 1) || (l == 5555)) {STRESS_LOG_VA(l,x);}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@Maoni0
Copy link
Member

Maoni0 commented Oct 20, 2021

Peter, so I have the following suggestions for instrumentation -

  1. instead of dprintf 1 and 5555, you could change the dprintf in do_pre_gc and do_post_gc to 5555, there are other things that we log at level 1 (like the amount of different kinds of allocs we do in a gen1 GC (the "dprintf (1, ("older gen's free alloc: %Id->%Id, seg alloc: %Id->%Id, condemned alloc: %Id->%Id"," and etc), those don't seem useful for this purpose.

  2. you could actually save the full pinned plugs info in each gen1 GC - BTW, did you happen to verify that this always happens right after a gen1 GC? I presumed that was the case. in the dump I looked at none of the heaps needed to expand the mark array so far so they all stayed as 0x400 which means each heap's mark stack is < 150k. so we could just save another mark stack array per heap in each gen1 GC. you can do this in compact_phase after we call reset_pinned_queue_bos(). at this point you could copy the mark stack to the saved one. remember to save mark_stack_tos so we know where it ends. in fact, you could even keep multiple copies of the mark stack, as in you could save at multiple stages in that GC, eg, you could also save a copy at the end of compact phase (after we call recover_saved_pinned_info). you could of course do this approach completely without using dprintf's at all. in adjust_limit where you are enabling dprintfs, you could also save that info into a (growable) array on that heap, eg, have a struct that looks like

struct saved_short_obj_in_ac
{
    uint8_t* alloc_ptr;
    uint8_t* alloc_limit;
    uint8_t* old_loc,;
    uint8_t* filler_free_obj_size_location;
    uint8_t* compact_bit_location;
    // whatever else you'd like here
};

this would include all the places where we need to do this filler object but there'd doubtful be many of these (there could be more than the # of pinned plugs though). but the heap is already >10GB so this kind of mem increase is tolerable.

- reduce diagnostic output
- save pinned queue right when compact_phase starts
@PeterSolMS
Copy link
Contributor Author

@Maoni0: I implemented your suggestion 1 and part of your suggestion 2 (saving the pinned plugs).

I did not save additional info in adjust_limit, because that is already in the dprintfs and so the additional risk didn't seem justified.

I looked at the dumps again and at the time of the crash we are either:

  • in a gen 0 GC with a preceding gen 1 GC
  • in a background GC with a preceding gen 1 GC

@Maoni0
Copy link
Member

Maoni0 commented Oct 20, 2021

I did not save additional info in adjust_limit, because that is already in the dprintfs and so the additional risk didn't seem justified.

that's fine; that was a suggestion for if you want to completely eliminate the dprintfs (since they do require setting some configs).

do you want to save the pinned plugs multiple times like I mentioned? might as well since it takes up so little memory. you could do this again after recover_saved_pinned_info because we do update the pinned plug info in compact_phase. other than that it LGTM.

@PeterSolMS
Copy link
Contributor Author

Ok, changed the logic to save before and after compact_phase. While I was at it, I also saved the info for multiple gen 1 GCs because it was easy to do and might turn out to be useful if the problem is not in the very last gen 1 GC after all.

@PeterSolMS PeterSolMS closed this Feb 15, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Mar 17, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants