Skip to content

Conversation

@omarismail94
Copy link
Contributor

@omarismail94 omarismail94 commented Sep 13, 2020

Dockerfile for development container as alternative to what is listed in:

https://beam.apache.org/contribute/#prerequisites


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Dataflow Flink Samza Spark Twister2
Go Build Status --- Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status --- Build Status --- Build Status ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status --- --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@omarismail94
Copy link
Contributor Author

R: @TheNeuralBit

@aaltay aaltay requested a review from TheNeuralBit September 24, 2020 23:54
Copy link
Member

@TheNeuralBit TheNeuralBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to encapsulate the dev dependencies we need in a Docker container like this, but I'm not sure we should recommend users do it. I think many of our tests will fail because the tests themselves require building and running docker containers, and I don't think that's possible from another docker container, is it?

I started the dev container and tried building the py37 container:

root@c9785ea712eb:/workspaces/beam# ./gradlew :sdks:python:container:py37:docker
...

> Task :sdks:python:container:py37:docker FAILED
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':sdks:python:container:py37:docker'.
> Process 'command 'docker'' finished with non-zero exit value 1

...

BUILD FAILED in 6s
20 actionable tasks: 15 executed, 5 up-to-date

@aaltay
Copy link
Member

aaltay commented Nov 19, 2020

Does #13308 cover the changes from this PR as well? What is the plan between the two open PRs?

@nielsbasjes
Copy link
Contributor

@aaltay
A bit of background: I am trying to make #12962 happen and this turns out to be really hard (Calcite has a LOT of API changes).

What is making it even harder for me is that just building Beam as-is turns out to be a terribly hard thing to get the build environment "just right" without making it impossible to build other projects I work on as well.

So I found this pull request and tried to make it work. Since it didn't work, had some things to fix and it appeared to be a stale pull request; I picked this up and combined it with the work I initiated at the Hadoop end 5 years ago and turned it into #13308

I included "everything" I found in this pull request and incorporated in into mine.

At this point it is still for me not possible to fully build the project, there is always something that breaks.

I could really use some help and suggestions on how to fix it all.

@aaltay
Copy link
Member

aaltay commented Nov 21, 2020

@aaltay
A bit of background: I am trying to make #12962 happen and this turns out to be really hard (Calcite has a LOT of API changes).

What is making it even harder for me is that just building Beam as-is turns out to be a terribly hard thing to get the build environment "just right" without making it impossible to build other projects I work on as well.

So I found this pull request and tried to make it work. Since it didn't work, had some things to fix and it appeared to be a stale pull request; I picked this up and combined it with the work I initiated at the Hadoop end 5 years ago and turned it into #13308

I included "everything" I found in this pull request and incorporated in into mine.

At this point it is still for me not possible to fully build the project, there is always something that breaks.

I could really use some help and suggestions on how to fix it all.

Thank you for the background. I understand this is frustrating. I am not the best person to help, both @omarismail94 and @TheNeuralBit could help you.

@omarismail94 - maybe we can get this PR in first, since it is smaller. And then help with getting the delta from @nielsbasjes's change. What do you think?

@nielsbasjes
Copy link
Contributor

Note that #13308 is not a patch on top of this one. I started with a clean master and copied the Hadoop code and adjusted that with the code of #12837 and the open comments.

@omarismail94
Copy link
Contributor Author

omarismail94 commented Nov 23, 2020

Sorry folks for the delayed response on this; I was OOO during October and forgot when I came back that I had turned off GitHub notifications!

@nielsbasjes - I built the newly committed Dockerfile (I made minor changes), then ran it using:

docker run -it --network=host -v /var/run/docker.sock:/var/run/docker.sock --mount type=bind,source="$PWD",target=/workspaces/beam <IMAGE_ID> bash

I then ran:

./gradlew  :sdks:java:core:compileJava

It compiled without the error you had in your instance. I am not sure what is causing the error on your end. I think it could be one of the system libraries being installed causing some conflict with the Java SDK

stable"
RUN apt-get install -y docker-ce docker-ce-cli containerd.io

# Set the locale
Copy link
Contributor Author

@omarismail94 omarismail94 Nov 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching to UTF-8 as some tests (Java getBytes()) fail if default is used.

Used code here: https://stackoverflow.com/a/28406007/14101188 to implement this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool fix.
Yet to me this sounds like a serious bug in these parts of Beam.
In my mind I would even go for setting it to something that is never used and fix the code to 'always run correctly'.
For now: Let's just make sure the build works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's our intention to not use any APIs that depend on the locale (e.g. getBytes() calls should specify the encoding). This seems like something one of our static analysis plugins could verify and I'm a little surprised its not happening already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just verified that errorprone will raise a compile error if we use APIs that depend on the locale. I changed an existing call to getBytes():

> Task :sdks:java:core:compileTestJava
/usr/local/google/home/bhulette/working_dir/beam/sdks/java/core/src/test/java/org/apache/beam/sdk/util/ExposedByteArrayInputStreamTest.java:77: warning: [DefaultCharset] Implicit use of the platform default charset, which can result in differing behaviour between JVM executions or incorrect behavior if the encoding of the data source doesn't match expectations.
    assertArrayEquals("ello World!".getBytes(), ret);
                                            ^
    (see http://errorprone.info/bugpattern/DefaultCharset)
  Did you mean 'assertArrayEquals("ello World!".getBytes(UTF_8), ret);' or 'assertArrayEquals("ello World!".getBytes(Charset.defaultCharset()), ret);'?

We can't control if one of our dependencies is doing this though, maybe that's what's going on. We should use this opportunity to file a jira about the specific tests that are sensitive to locale (if one doesn't exist already)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible different source of this class of problem is generated code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spoke with @omarismail94 about this offline. It looks the issue is we have getBytes(Charsets.DefaultCharset()) in many places, which errorprone allows (in fact its one of the suggested corrections in the error message).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just changed them all and set up a new PR: #13410

@nielsbasjes
Copy link
Contributor

So I cloned your repo and checkout the devDocker branch.
Then docker build -t devdocker:20201123 .
After this image was built I did as you indicated docker run -it --network=host -v /var/run/docker.sock:/var/run/docker.sock --mount type=bind,source="$PWD",target=/workspaces/beam devdocker:20201123 bash
Then (as indicated on https://beam.apache.org/contribute/ ) I run ./gradlew check

Now I the problem I had indeed does not occur (good!) yet I did run into some other problems.

Some things I found while comparing your setup and mine

  • Yours works better for the Java stuff
  • Several of the website tasks fail.
  • Big difference in the installed tools.
    • Yours is missing gcc (needed for :sdks:java:container:goVet )
    • You install docker-ce (manually pulling in the repo) where I install the ubuntu native docker-io. Which is better?

In addition I think the scripting my version has is really a must have as I found that the .gradle directory is owned by root.

So I propose to make a combination of your image (+gcc) and my scripting to create something people can easily use.

@TheNeuralBit
Copy link
Member

Thank you for the analysis @nielsbasjes!

So I propose to make a combination of your image (+gcc) and my scripting to create something people can easily use.

This sounds like a reasonable approach to me

@omarismail94
Copy link
Contributor Author

Yeah, I think @nielsbasjes 's PR is the way to go! I tested it and it seems to work. I left my comments in #13308

@TheNeuralBit
Copy link
Member

Closing this since #13308 is now merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants