Fix armv7 build by updating the base container and pin it #17063

larroy · 2019-12-13T19:35:07Z

larroy · 2019-12-13T19:37:27Z

leezu · 2019-12-14T00:05:00Z

Thanks @larroy. The same update needs to be done for the other arches, since the failure mode affects all of them. This can happen either in this PR or in separate PRs.

leezu · 2019-12-14T00:16:52Z

ci/docker/Dockerfile.build.armv7

+# The container is pinned for preventing CI failures on updates, swap below to use 
+# the upstream container
+#FROM dockcross/linux-armv7
+FROM mxnetci/dockcross-linux-armv7-pinned


Should this container be given another tag besides latest? See https://hub.docker.com/r/mxnetci/dockcross-linux-armv7-pinned/tags

larroy · 2019-12-14

What do you mean?

leezu · 2019-12-14

If you only specify the name mxnetci/dockcross-linux-armv7-pinned instead of mxnetci/dockcross-linux-armv7-pinned:TAG, docker will always pull the version tagged as latest, i.e. mxnetci/dockcross-linux-armv7-pinned:latest.
But that version changes whenever someone pushes a new version of dockcross-linux-armv7-pinned to mxnetci. So it becomes impossible to test whether an update of dockcross-linux-armv7-pinned breaks the CI. Instead, all CI runs are affected immediately.

If instead you add a tag TAG for the particular version of the mxnetci/dockcross-linux-armv7-pinned container at https://hub.docker.com/r/mxnetci/dockcross-linux-armv7-pinned/tags, our Dockerfiles here on GitHub can reference mxnetci/dockcross-linux-armv7-pinned:TAG. Then, when a new update of dockcross-linux-armv7-pinned is pushed to mxnetci and tagged as TAG2, the CI will continue to use the old version until a PR is opened on GitHub to switch to TAG2.
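
To illustrate the resolution rule described above, here is a minimal Python sketch of how an image reference without a tag ends up pointing at `latest`. The helper name `resolve_image_reference` is hypothetical and not part of docker or of the MXNet CI scripts; it only mimics the defaulting behaviour and ignores other details of reference parsing.

```python
def resolve_image_reference(ref: str) -> str:
    """Return the reference docker would effectively pull for `ref`.

    Hypothetical helper for illustration only: it appends ":latest"
    when the reference carries neither a tag nor a digest.
    """
    # A digest reference (name@sha256:...) is already fully pinned.
    if "@" in ref:
        return ref
    # Only the last path component can carry a tag; a colon earlier in
    # the string would belong to a registry port (e.g. localhost:5000).
    last_component = ref.rsplit("/", 1)[-1]
    if ":" in last_component:
        return ref
    return ref + ":latest"


if __name__ == "__main__":
    # Untagged name: silently follows whatever "latest" currently is.
    print(resolve_image_reference("mxnetci/dockcross-linux-armv7-pinned"))
    # Tagged name: stays on the chosen version until the tag is changed.
    print(resolve_image_reference("mxnetci/dockcross-linux-armv7-pinned:TAG"))
```

With an explicit tag in the Dockerfile, a newly pushed image only reaches the CI once the tag in the repository is changed, which is the property argued for in the comment above.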

larroy · 2019-12-14

I think that's the idea: we push the container if it can be built, so we don't have to constantly update this file. Another job would push the container to this tag whenever it has been built successfully, so we don't need the :TAG here.

leezu · 2019-12-16

But then there is no need to use mxnetci. We can just use the dockcross repository directly.
You mentioned that in the past dockcross sometimes broke our build. I thought we used pinning to fix that?

marcoabreu · 2019-12-16

mxnetci is a fully automated repository, while mxnetcipinned is manually curated. In case of a security breach, the pinned repo wouldn't be impacted. Because dockcross doesn't tag its images, the pinned image is usually no longer available upstream, so it would be lost if the automatic process deleted it. Thus, I'd recommend staying on the separate pinned repo.

larroy · 2019-12-16

The pinned image needs to be maintained automatically. That's a good point about the credentials. We might consider having it in a private pipeline with a different repo to preserve those credentials. Edit: now that I think of it, from a security perspective it doesn't matter much, since we are generating the mxnetci images from an unsafe environment.

larroy · 2019-12-16T18:17:22Z

@mxnet-label-bot add [pr-awaiting-review]

leezu

I withdraw the approval until we reach consensus on 1) whether we need to pin images, 2) if we do need pinned images, how they should be maintained, and 3) where the pinned images should live.
In my understanding, if we maintain pinned images automatically, we don't need to pin in the first place and can rely on the upstream repo directly.

larroy · 2019-12-17T04:41:31Z

@leezu how do you suggest we proceed to reach consensus? Also, wouldn't it be good to have a hotfix for armv7 first and reach consensus later?

leezu · 2019-12-17T04:58:34Z

Maybe consensus is the wrong word. Essentially, it's not clear to me what the advantage is of using mxnetci/dockcross-X without a version tag and automating the updates to mxnetci, compared to directly using the upstream dockcross.

leezu · 2019-12-17T05:02:47Z

Regarding the hotfix: armv7 alone is not sufficient, we need to update all arches. It's straightforward to update, i.e. d3571d1 passes CI (part of #17031).

larroy · 2019-12-18T00:17:57Z

Well, why do you think it was pinned in the first place? It happened a few times that dockcross updating their containers broke the CI as we couldn't compile anymore. Otherwise it wouldn't be pinned. Cross compilation is very tricky, so if it starts failing at some point and people don't know how to fix it then the most likely outcome is that the job would be disabled.

leezu · 2019-12-18T03:06:22Z

@larroy fair enough, but if you don't pin it inside the codebase and instead push the relevant container to the mxnetcipinned docker repository automatically, the same breakage will occur.
Only the way of intervening would be different: someone would need to roll back the container change on mxnetcipinned. That is very opaque to all bystanders and may thus lead to the same outcome, namely that a failing test case gets disabled.
Or how would you suggest avoiding it?
My understanding is that the only way to avoid it would be to pin it inside the mxnet codebase, as done currently?

larroy · 2019-12-18T18:50:22Z

One possible idea is to push the pinned container only if the build works. Or leave it pinned until we need to update, and then do the update manually. Which do you think is better? Or do you have a better idea?

leezu · 2019-12-19T00:13:08Z

Yes, I agree the update should be done automatically when it doesn't break the CI build.
Instead of replicating the checking logic in a separate system and using that to decide whether to push a new container to the mxnetcipinned repo (which is then immediately used by any CI run), we can use the source code to fix the container (via tag) and have a bot open pull requests whenever the base container is updated.

Then the CI will test the pull request as normal, and it can be merged if it passes.
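
The core step such a bot would need is rewriting the `FROM` line of a Dockerfile to a new tag, so that the change can be proposed as a pull request and tested by the CI. A minimal Python sketch of that step follows. The function name `pin_base_image` is hypothetical, no such bot exists in this PR, and opening the pull request itself is left out.

```python
import re


def pin_base_image(dockerfile_text: str, image: str, new_tag: str) -> str:
    """Return `dockerfile_text` with active `FROM <image>[:tag]` lines set to `new_tag`.

    Hypothetical helper for illustration. Assumes the conventional
    upper-case FROM; commented-out lines such as "#FROM ..." are left alone.
    """
    pattern = re.compile(
        r"^([ \t]*FROM[ \t]+)"      # instruction keyword and spacing
        + re.escape(image)          # the exact image name
        + r"(?::[\w][\w.-]*)?"      # an optional existing tag
        + r"(?=\s|$)",              # do not match longer image names
        flags=re.MULTILINE,
    )
    # A function replacement avoids backslash handling in the new tag.
    return pattern.sub(lambda m: f"{m.group(1)}{image}:{new_tag}", dockerfile_text)


if __name__ == "__main__":
    original = (
        "#FROM dockcross/linux-armv7\n"
        "FROM mxnetci/dockcross-linux-armv7-pinned\n"
    )
    updated = pin_base_image(original, "mxnetci/dockcross-linux-armv7-pinned", "TAG2")
    print(updated)
```

If the returned text differs from the input, the bot would commit it on a branch and open a pull request. The tag only changes on the main branch once that pull request passes CI and is merged.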

leezu · 2019-12-23T09:11:45Z

As part of #16753, the docker containers have been updated and switched to the upstream version.
#17151 tracks reintroducing pinning.

Fix armv7 build by updating the base container and pin it

36c581a

Fixes apache#16753

larroy requested review from aaronmarkham and marcoabreu as code owners December 13, 2019 19:35

leezu approved these changes Dec 14, 2019

leezu reviewed Dec 14, 2019

lanking520 added the pr-awaiting-review label (PR is waiting for code review) Dec 16, 2019

leezu self-requested a review December 17, 2019 03:54

leezu reviewed Dec 17, 2019

leezu suggested changes Dec 17, 2019

leezu self-requested a review December 17, 2019 03:56

leezu mentioned this pull request Dec 23, 2019

[CI] Pin edge build docker containers #17151

Open

larroy closed this Jan 8, 2020

4 participants
