Skip to content

Fix permission problems in docker#11299

Merged
clintropolis merged 7 commits intoapache:masterfrom
FrankChen021:data_dir_permission
Jun 2, 2021
Merged

Fix permission problems in docker#11299
clintropolis merged 7 commits intoapache:masterfrom
FrankChen021:data_dir_permission

Conversation

@FrankChen021
Copy link
Copy Markdown
Member

@FrankChen021 FrankChen021 commented May 25, 2021

Fixes #11278 , #11298

Description

To fix 11278, mv instruction is used to replace original ln instruction to eliminate symlink so that the docker image works with AWS Fargate. This change has been verified with the help of @mattmassicotte , reporter of that issue.

To fix 11298, /opt/shared is created in the Dockerfile so that a named volume could be used in the docker-compose. And a named volume druid_shared is used to share segments and task log files among Druid services

Because the problems we have encountered are platform independent, I test these changes on 3 different platform following the steps below

Test Steps

  1. Execute command docker volume prune to clean up previous volumes
  2. Launch the example docker-compose
  3. Start an ingestion task | Task Successes
  4. Task successes

Test Result

Host OS Host OS Version Docker Versoin Test Result
CentOS 7.2.1511 20.10.6 OK
Ubuntu 20.04.2 20.10.2 OK
macOS 11.3 3.3.1 OK

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@FrankChen021 FrankChen021 added the Docker https://hub.docker.com/r/apache/druid label May 25, 2021
@jihoonson jihoonson added the Bug label May 25, 2021
@jihoonson
Copy link
Copy Markdown
Contributor

@FrankChen021 thank you for making this PR. I tested your change on my machine, but, unfortunately, ingestion still fails with the same error. I'm using linux mint 20.1, which is based on Ubuntu Focal. Should we update the docker-compose.yml file per your suggestion in this PR instead of documenting it?

@FrankChen021
Copy link
Copy Markdown
Member Author

@jihoonson I don't know why host directory is used in the original docker-compose.yml, and I'm waiting for an answer from @clintropolis . If there's no special reason, I think it's acceptable to use a named volume in the example compose file. Before that, we have to execute a command to set the permission for linux systems.

@FrankChen021
Copy link
Copy Markdown
Member Author

@jihoonson will the ingestion success after executing the following command ?

sudo docker exec --user root middlemanager chown druid:druid /opt/data

@clintropolis
Copy link
Copy Markdown
Member

@jihoonson I don't know why host directory is used in the original docker-compose.yml, and I'm waiting for an answer from @clintropolis . If there's no special reason, I think it's acceptable to use a named volume in the example compose file. Before that, we have to execute a command to set the permission for linux systems.

Sorry for the delay, as long as the same volume for storage is shared and available to the historical, middle manager, and overlord containers then everything should be ok. The important part is that it is shared between containers, because the storage directory is used as deep storage for the example docker cluster, so the segments and task logs that the peons running in the middle manager publish can be then read by the overlord and historical. I don't think it matters too much however this happens.

@FrankChen021
Copy link
Copy Markdown
Member Author

@clintropolis A named volume in docker-compose also shares contents among overlords, middle managers and historicals. At first I thought the reason why the host directory 'storge' is used has something to do with the disk size at first. Because by default, docker creates directory for a named volume under /var which usually has smaller disk size for druid data storage.

Since there's no special reasons to use a directory on the host OS to mount as a volume in docker, I think it's acceptable for us to use a named volume in the example docker-compose.yml for data storage, so that there won't be permission problems on Linux platforms. Another reason is that the docker-compose.yml is only an example, it's not recommended in production, the disk size problem of a named volume should not be a concern for us if we state it in the doc.

What do you think ? @clintropolis @jihoonson @xvrl
If you agree, I will push a new commit.

@clintropolis
Copy link
Copy Markdown
Member

I think anything that makes the quick start work across platforms is agreeable; it is not very large data so disk size shouldn't matter afaik.

Comment thread distribution/docker/Dockerfile Outdated
@@ -58,7 +58,10 @@ COPY --chown=druid:druid --from=builder /opt /opt
COPY distribution/docker/druid.sh /druid.sh
RUN mkdir /opt/druid/var \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
RUN mkdir /opt/druid/var \
RUN mkdir /opt/druid/var /opt/data \

Comment thread distribution/docker/Dockerfile Outdated
RUN mkdir /opt/druid/var \
&& chown druid:druid /opt/druid/var \
&& chmod 775 /opt/druid/var
&& chmod 775 /opt/druid/var \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
&& chmod 775 /opt/druid/var \
&& chmod 775 /opt/druid/var /opt/data \

Comment thread distribution/docker/Dockerfile Outdated
@@ -58,7 +58,10 @@ COPY --chown=druid:druid --from=builder /opt /opt
COPY distribution/docker/druid.sh /druid.sh
RUN mkdir /opt/druid/var \
&& chown druid:druid /opt/druid/var \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
&& chown druid:druid /opt/druid/var \
&& chown druid:druid /opt/druid/var /opt/data \

Comment thread distribution/docker/Dockerfile Outdated
&& chown druid:druid /opt/druid/var \
&& chmod 775 /opt/druid/var
&& chmod 775 /opt/druid/var \
&& mkdir /opt/data \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a comment to explain the difference between /opt/data and /opt/druid/var ?

Comment thread docs/tutorials/docker.md Outdated
The example `docker-compose.yml` will create a container for each Druid service, as well as Zookeeper and a PostgreSQL container as the metadata store. Deep storage will be a local directory, by default configured as `./storage` relative to your `docker-compose.yml` file, and will be mounted as `/opt/data` and shared between Druid containers which require access to deep storage. The Druid containers are configured via an [environment file](https://github.com/apache/druid/blob/{{DRUIDVERSION}}/distribution/docker/environment).
The example `docker-compose.yml` will create a container for each Druid service, as well as Zookeeper and a PostgreSQL container as the metadata store.

It will also create a named volumes `druid-data`, which is mounted as `opt/data` in container, as deep storage to keep and share segments and task logs among Druid services.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel druid-data and /opt/data is a little generic and doesn't really convey the idea that this is meant for shared storage. /opt/druid/var is also used for local "data" so the distinction is not very clear.

Maybe we should give it a more descriptive name such as /opt/shared, or something else if you have better suggestions, wdyt?

@FrankChen021
Copy link
Copy Markdown
Member Author

Thanks @xvrl , I have submitted a new commit to resolve all your comments. Test result will be updated in this weekend.

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest changes lgtm, will try to test soon

@clintropolis clintropolis added this to the 0.21.1 milestone May 28, 2021
Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FrankChen021 thanks. +1 after CI. The quick start ingestion works as it is after applying this change.

As I mentioned in the dev thread, it seems to me that the real problem is this docker image is not being tested properly. I'm not sure why the tests we have today don't catch this error. Do you have any plan for follow-ups to investigate this issue and add more tests if necessary?

@FrankChen021
Copy link
Copy Markdown
Member Author

As I mentioned in the dev thread, it seems to me that the real problem is this docker image is not being tested properly. I'm not sure why the tests we have today don't catch this error.

I have the same question as you. I will check it when I'm free.

@FrankChen021
Copy link
Copy Markdown
Member Author

I tested these changes on 3 different host OS, all worked. See the description of this PR.

@clintropolis clintropolis merged commit d5139c9 into apache:master Jun 2, 2021
clintropolis pushed a commit to clintropolis/druid that referenced this pull request Jun 2, 2021
* Create /opt/data to fix permission problem

* eliminate symlink to avoid compatibility problem on AWS Fargate

* Add a workaround section

* Update instruction for named volume

* Use named volume in docker-compose

* Revert some doc change

* Resolve review comments
@FrankChen021 FrankChen021 deleted the data_dir_permission branch June 2, 2021 01:49
jihoonson pushed a commit that referenced this pull request Jun 2, 2021
* Create /opt/data to fix permission problem

* eliminate symlink to avoid compatibility problem on AWS Fargate

* Add a workaround section

* Update instruction for named volume

* Use named volume in docker-compose

* Revert some doc change

* Resolve review comments

Co-authored-by: frank chen <frank.chen021@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Docker https://hub.docker.com/r/apache/druid

Projects

None yet

Development

Successfully merging this pull request may close these issues.

storage directory permission problem when using the example docker-compose.yml Unable to use AWS Fargate bind mounts with Dockerfile

4 participants