This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Correct iterator in JIT #24160

Merged
CarolEidt merged 1 commit into dotnet:master from franksinankaya:gcc_cleanup_19 on Apr 22, 2019

Conversation

@franksinankaya commented Apr 22, 2019

Correct the iterator in genNumberOperandUse: operand was used both as the loop variable and as the node on which Operands() is called.

Use of uninitialised value of size 8
GenTree::OperGet() const (gentree.h:366)
GenTreeUseEdgeIterator::GenTreeUseEdgeIterator(GenTree*) (gentree.cpp:8394)
GenTreeOperandIterator::GenTreeOperandIterator(GenTree*) (gentree.h:2291)
GenTree::OperandsBegin() (gentree.cpp:8977)
GenTree::Operands() (gentree.cpp:8987)
CodeGen::genNumberOperandUse(GenTree*, int&) const (codegenlinear.cpp:1209)
CodeGen::genCodeForBBlist() (codegenlinear.cpp:405)
CodeGen::genGenerateCode(void**, unsigned int*) (codegencommon.cpp:2147)
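
For context, the pattern the description refers to looks roughly like the fragment below. This is only a sketch, not the actual diff: the names (genNumberOperandUse, operand, useNum) come from the description and stack trace above, while the renamed loop variable op is a hypothetical choice.

    // Before (sketch): the loop variable reuses the name of the node being
    // walked, so the range expression calls Operands() on the freshly
    // declared, uninitialized pointer (the "Use of uninitialised value"
    // reported above).
    for (GenTree* operand : operand->Operands())
    {
        genNumberOperandUse(operand, useNum);
    }

    // After (sketch): a distinct loop-variable name keeps the range
    // expression bound to the enclosing node.
    for (GenTree* op : operand->Operands())
    {
        genNumberOperandUse(op, useNum);
    }

Any loop-variable name other than operand leaves the range expression referring to the function's parameter.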

@franksinankaya (Author)

@am11 @janvorli @jkotas

@RussKeldorph

@dotnet/jit-contrib

@CarolEidt left a comment

LGTM - thanks!

@CarolEidt

There are 4 sub-legs of the "Test Pri0 Linux_musl x64 release" that failed with a message like:

running $HELIX_CORRELATION_PAYLOAD/scripts/791cfe552e6f46709583de5f8b9eed97/execute.sh in /home/helixbot/work/d55db4b4-dd9b-4765-bebd-992b2109b242/Work/f37d0d0a-5e50-4c80-9137-1d788f5dc53f/Exec max 1800 seconds

I assume that this rather obtuse message means that it timed out.
cc @dotnet/dnceng

And I don't see a need to wait for the two "arm" legs. Will merge soon unless someone thinks it should wait.

@MattGal (Member) commented Apr 22, 2019

@CarolEidt this isn't a (workitem) timeout; if you check out the log underneath that log, there's this:

2019-04-22 04:39:23,333: INFO: dockerhelper(215): write_commands_to_file: Generating Docker execution script
2019-04-22 04:39:33,352: INFO: servicebusrepository(85): renew_workitem_lock: Entering renew_workitem_lock for https://nethelix.servicebus.windows.net/ubuntu.1604.amd64.open/messages/85385471/29daee39-dae6-4fc0-989b-821a30993a3d
2019-04-22 04:39:33,468: INFO: saferequests(90): request_with_retry: Response complete with status code '200'
2019-04-22 04:39:33,469: INFO: servicebusrepository(95): renew_workitem_lock: Renewed work item lock. Status Code: 200
2019-04-22 04:40:18,469: INFO: servicebusrepository(85): renew_workitem_lock: Entering renew_workitem_lock for https://nethelix.servicebus.windows.net/ubuntu.1604.amd64.open/messages/85385471/29daee39-dae6-4fc0-989b-821a30993a3d
2019-04-22 04:40:18,623: INFO: saferequests(90): request_with_retry: Response complete with status code '200'
2019-04-22 04:40:18,624: INFO: servicebusrepository(95): renew_workitem_lock: Renewed work item lock. Status Code: 200
2019-04-22 04:40:23,395: ERROR: executor(521): _execute_command_in_container: Exception: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
2019-04-22 04:40:23,395: INFO: executor(523): _execute_command_in_container: Finished _execute_command_in_container, exit code: None

... so whatever happened on this machine, the timeout waiting to hear back from the docker daemon hit about a minute in.

It's not something I've seen before, and we can certainly increase this timeout setting on the machines, though I'm unsure whether it will help:

  • This reproduced on different machines every time it ran, so it's unlikely to be a corrupted machine state (though it might be a "when some other test runs first and messes up the docker daemon" state)
  • It looks similar to docker/compose#3927 (UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)), which seems to be related to large files being mounted.
  • Looking at the payloads, they're not terribly big (the biggest is ~360 MB, definitely on the larger side for Helix payloads but seemingly normal).
  • Just a guess but it seems plausible that severe CPU loading might cause the same sort of problem, as tests with names like "JIT...." can be CPU intensive.

My thoughts here are: if you see this reproduce outside this PR, let me know and we'll do a longer investigation; in the meantime I think the best thing to try is to pull the same docker image and run the same test locally to see if it does anything exceptional in this case.

@briansull left a comment

Looks Good

CarolEidt merged commit 9401fa6 into dotnet:master on Apr 22, 2019
franksinankaya deleted the gcc_cleanup_19 branch on April 22, 2019 17:09
picenka21 pushed a commit to picenka21/runtime that referenced this pull request on Feb 18, 2022