
Conversation

@Atry (Contributor) commented Jan 28, 2022

According to https://aws.amazon.com/ec2/instance-types/, r6i.4xlarge has double the memory of m5.4xlarge. Hopefully it will reduce the out-of-memory errors.
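For reference, a minimal boto3 sketch (assuming AWS credentials and a default region are configured; this script is illustrative and not part of this repo) that prints the vCPU and memory figures behind that comparison:

```python
import boto3

# Look up the published specs for the old and new builder instance types.
ec2 = boto3.client("ec2")
resp = ec2.describe_instance_types(InstanceTypes=["m5.4xlarge", "r6i.4xlarge"])
for it in resp["InstanceTypes"]:
    vcpus = it["VCpuInfo"]["DefaultVCpus"]
    mem_gib = it["MemoryInfo"]["SizeInMiB"] / 1024
    print(f'{it["InstanceType"]}: {vcpus} vCPUs, {mem_gib:.0f} GiB')
# m5.4xlarge:  16 vCPUs,  64 GiB
# r6i.4xlarge: 16 vCPUs, 128 GiB
```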

@Atry (Contributor, Author) commented Jan 29, 2022

I have deployed the step functions for testing.

@Atry (Contributor, Author) commented Jan 29, 2022

I triggered a previously failed build (bin/build-on-aws 2022.01.21 debian-11-bullseye) to test this PR.

@Atry (Contributor, Author) commented Jan 29, 2022

Sorry, deploying the step functions did not take effect.

I deployed the lambdas instead, and triggered a previously failed build (bin/build-on-aws 2022.01.20 debian-11-bullseye) to test this PR.

@Atry (Contributor, Author) commented Jan 29, 2022

I triggered a previously failed build (bin/build-on-aws 2022.01.21 debian-11-bullseye) to test this PR.

The retry instance type is now r6i.4xlarge, since I have deployed the new lambdas.

[Screenshot, 2022-01-28 17:46]

@fredemmott (Contributor)

Nice; the m* family used to be the high-memory tier, and I assumed that was still the case.

Given we don't need these jobs to be super fast, we should also try smaller options in the r6i range: changing the cores:RAM ratio should fix the problem, not just adding more RAM.

It's worth trying r6i.xlarge and seeing what the build times are like.

--

As a concrete example, we were building with 32 GB until August last year (a4c60f8), when some OOMs started; doubling the cores and doubling the RAM did not fix the problem.
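To make the cores:RAM point concrete, here is a small illustration using the published specs of the instance types discussed in this thread (figures from https://aws.amazon.com/ec2/instance-types/; the script is just an aside, not part of the build):

```python
# Published specs (vCPUs, memory in GiB) for the instance types mentioned above.
instance_types = {
    "m5.4xlarge": (16, 64),    # previous builder
    "r6i.4xlarge": (16, 128),  # this PR
    "r6i.xlarge": (4, 32),     # suggested cheaper option
}
for name, (vcpus, mem_gib) in instance_types.items():
    print(f"{name}: {mem_gib / vcpus:.0f} GiB per vCPU")
# m5.4xlarge:  4 GiB per vCPU
# r6i.4xlarge: 8 GiB per vCPU
# r6i.xlarge:  8 GiB per vCPU (same ratio as r6i.4xlarge, with a quarter of the cores)
```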

@Atry (Contributor, Author) commented Jan 29, 2022

[Screenshot, 2022-01-28 19:39]

There is one retry attempt for the nightly build. That is fewer retry attempts than in previous nightly builds.

@Atry (Contributor, Author) commented Jan 29, 2022

We currently limit the concurrency via #260.

Let's revert #260 for better CPU utilization.

@fredemmott (Contributor)

> There is one retry attempt for the nightly build. That is fewer retry attempts than in previous nightly builds.

If it still needs an OOM retry, the problem still exists; with the previous settings, it sometimes succeeded with 0 retries. "Fewer retries" is an extremely low signal.

I think the next step is to enable atop to figure out what's going on; I suspect cargo may be parallelizing to NCORES in combination with HHVM's own parallelization, but we need more data.
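As an illustration of that suspicion (all per-process figures below are hypothetical, chosen only to show the shape of the problem), one-job-per-core parallelism nested inside one-job-per-core parallelism grows roughly with the square of the core count:

```python
# Back-of-the-envelope sketch: if the outer build runs one job per core and each
# embedded cargo invocation also defaults to one rustc per logical CPU, peak
# concurrency is the product of the two, so more cores can mean more OOMs even
# with more RAM.
vcpus = 16                   # m5.4xlarge and r6i.4xlarge both have 16 vCPUs
mem_gib = 128                # r6i.4xlarge memory
outer_jobs = vcpus           # e.g. make/ninja -j$(nproc)
cargo_jobs = vcpus           # cargo's default job count is the logical CPU count
mem_per_compiler_gib = 1.0   # hypothetical average per compiler process

peak = outer_jobs * cargo_jobs
print(f"worst case: {peak} compilers, ~{peak * mem_per_compiler_gib:.0f} GiB "
      f"needed vs {mem_gib} GiB available")
# worst case: 256 compilers, ~256 GiB needed vs 128 GiB available
```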

@Atry (Contributor, Author) commented Feb 15, 2022

Even though it's a weak signal, we have seen fewer failures recently. Shall we merge this PR?

@Atry requested a review from @fredemmott on February 15, 2022.
@fredemmott merged commit e4c3de7 into master on February 18, 2022.
@Atry deleted the r6i.4xlarge branch on February 18, 2022.