logging: EFK must avoid NFS by sosiouxme · Pull Request #2599 · openshift/openshift-docs

sosiouxme · 2016-08-02T21:09:03Z

It came to our attention via
https://bugzilla.redhat.com/show_bug.cgi?id=1347666
and further research
(http://mail-archives.apache.org/mod_mbox/lucene-java-user/201210.mbox/%3C01a401cda09e$17b00160$47100420$@thetaphi.de%3E
and
https://lucene.apache.org/core/4_8_0/core/org/apache/lucene/store/NativeFSLockFactory.html)
that NFS is not suitable for Lucene storage. This documents how to
use local storage, that NFS is not supported, and what to do if NFS is
all you have.
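Because the outcome is "keep Elasticsearch storage off NFS", an admin may want to confirm what actually backs a storage directory on a node before handing it to Elasticsearch. The following is a minimal Python sketch, assuming the Linux `/proc/mounts` format (device, mount point, file system type, ...). The helper names `find_mount` and `is_nfs_backed` and the set of NFS type names are illustrative assumptions, not part of any OpenShift tooling, and the check does not resolve symlinks.

```python
import os
import re
from typing import Optional, Tuple

# File system type names that /proc/mounts reports for NFS mounts (assumption:
# extend this set if other network file systems should also be rejected).
NFS_TYPES = {"nfs", "nfs4"}


def _unescape(field: str) -> str:
    """Decode the octal escapes /proc/mounts uses, e.g. \\040 for a space."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def find_mount(path: str, mounts_text: str) -> Optional[Tuple[str, str]]:
    """Return (mount_point, fs_type) of the longest mount point containing path."""
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = _unescape(fields[1]), fields[2]
        # Match whole path components so /var does not match /variable.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later entry on the same mount point shadow an earlier one.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


def is_nfs_backed(path: str, mounts_text: str) -> bool:
    """True if the file system holding path is reported as NFS."""
    mount = find_mount(path, mounts_text)
    return mount is not None and mount[1] in NFS_TYPES
```

On a node, one might read `/proc/mounts` and pass its contents, together with the directory intended for a local Elasticsearch volume, to `is_nfs_backed`; a `True` result means that directory should not be used for Lucene data.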

sosiouxme · 2016-08-02T21:10:33Z

@ewolinetz @richm PTAL

@adellape or @ahardin-rh this is relatively high priority per the bug. It's relevant to origin and OSE 3.1+

sosiouxme · 2016-08-02T21:11:27Z

heck /cc @thoraxe too.

ewolinetz · 2016-08-02T21:20:12Z

Should probably remind users to stop their cluster first

I suppose so. I figured they were gonna lose any ephemeral data anyway...

ewolinetz · 2016-08-02T21:20:21Z

oi... do we foresee having a flag to do this for users with the deployer?

sosiouxme · 2016-08-03T12:09:13Z

I hadn't thought much about deployer parameters. It might be difficult to specify individual node selectors for each instance. But it probably wouldn't be hard to patch in the local mounts and the privileged security context.

BTW, need to re-examine whether there's a way short of "privileged" that gets us past the SELinux problem with local mounts.

pweil- · 2016-08-03T14:52:20Z

do you need access to the privileged SCC here, or will hostmount-anyuid (which does not allow privileged) be enough?

@pweil- I tried hostmount-anyuid first and it did not have access due to SELinux context. I believe it's much the same problem we had with fluentd - openshift/origin-aggregated-logging#89 (comment)

It seems like less-than-privileged may be possible, but I'm not quite sure how and it seems like it would be a PITA for a user to set up. What do you think?

Hey, whaddya know... openshift/origin#8504

@pweil- I'm a little foggy on whether exactly the same fix will apply. The problem with fluentd was that it was trying to read and write in /var/log. Here we're trying to read and write in an admin-supplied storage volume; I suppose we could have them chcon the volume to whatever would be convenient? If so, what would that be - is there a label that will allow read/write for any context the pod may be running in?

The kubelet (when a pod is using host namespaces) or Docker should relabel the volume when it can; it passes the SELinux context in use through the Docker options. If that isn't working, or this is a different use case, we can figure out what is different. cc @pmorie, who is very familiar with the SELinux code for volumes.

What the AVC looks like, FYI:

type=AVC msg=audit(1470323991.042:27487): avc: denied { write } for pid=9883 comm="java" name="es-storage" dev="dm-0" ino=68862303 scontext=system_u:system_r:svirt_lxc_net_t:s0:c2,c8 tcontext=unconfined_u:object_r:usr_t:s0 tclass=dir
type=SYSCALL msg=audit(1470323991.042:27487): arch=c000003e syscall=83 success=no exit=-13 a0=7ff7d43d3780 a1=1ff a2=7ff7d43d3780 a3=7ff7c47bd728 items=0 ppid=15669 pid=9883 auid=4294967295 uid=1000 gid=0 euid=1000 suid=1000 fsuid=1000 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="java" exe="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64/jre/bin/java" subj=system_u:system_r:svirt_lxc_net_t:s0:c2,c8 key=(null)
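The AVC record says that the `java` process, running with the container type `svirt_lxc_net_t` (MCS categories `c2,c8`), was denied `write` on the `es-storage` directory, whose label carries the host type `usr_t`. As a rough aid for reading such records, here is a minimal Python sketch, assuming one `type=AVC` record per string. The helper names `parse_avc`, `selinux_type`, and `summarize` are hypothetical; for real triage the standard audit tooling (`ausearch`, `audit2why`) is the better choice.

```python
import re
from typing import Dict

# Permissions appear between braces, e.g. "denied { write }".
_PERMS = re.compile(r"\{\s*([^}]*?)\s*\}")
# key=value pairs; values are either double-quoted or run to the next space.
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_avc(record: str) -> Dict[str, object]:
    """Split one type=AVC audit record into its fields and denied permissions."""
    fields: Dict[str, object] = {}
    for key, value in _FIELD.findall(record):
        fields.setdefault(key, value.strip('"'))  # keep the first occurrence
    match = _PERMS.search(record)
    fields["denied"] = match.group(1).split() if match else []
    return fields


def selinux_type(context: str) -> str:
    """Return the type part of an SELinux context user:role:type:level."""
    return context.split(":")[2]


def summarize(record: str) -> str:
    """One-line summary: who was denied what, on which object, and the two types."""
    f = parse_avc(record)
    return (f"{f['comm']} ({selinux_type(f['scontext'])}) denied "
            f"{','.join(f['denied'])} on {f['tclass']} {f['name']} "
            f"({selinux_type(f['tcontext'])})")
```

Seeing the two types side by side (`svirt_lxc_net_t` for the process, `usr_t` for the directory) is what points at the volume's label, rather than file ownership or mode, as the cause of the failure.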

sferich888 · 2016-08-03T19:01:23Z

Should we also warn customers to avoid gluster (which uses NFS on the backend)?

sosiouxme · 2016-08-04T14:36:01Z

Think I've addressed existing concerns... any further?

ahardin-rh · 2016-08-04T18:26:23Z

what is an example complication?

The points that follow are the complications. Perhaps I should call them
something else... considerations?

On Thu, Aug 4, 2016 at 2:26 PM, Ashley Hardin notifications@github.com
wrote:

In install_config/aggregate_logging.adoc (#2599 comment):

@@ -416,24 +416,82 @@ The deployer creates an ephemeral deployment in which all of a pod's data is
lost upon restart. For production usage, add a persistent storage volume to each
Elasticsearch deployment configuration.

-The following example specifies a volume for an Elasticsearch replica (using a
-xref:../architecture/additional_concepts/storage.adoc#persistent-volume-claims[PersistentVolumeClaim]):
+The best-performing volumes are local disks, if it is possible to use
+them. There are some complications with doing so.


ahardin-rh · 2016-08-04T18:31:49Z

@sosiouxme just a few minor comments from me. Thanks!

sosiouxme · 2016-08-04T21:00:36Z

@ahardin-rh think I addressed your comments now.

ahardin-rh · 2016-08-04T21:16:18Z

@sosiouxme Looks good! Thanks! Just a squash and we're good to go 🍻


sosiouxme · 2016-08-04T21:20:36Z

ready, then.


ahardin-rh · 2016-08-04T21:40:52Z

[rev_history]
|xref:../install_config/aggregate_logging.adoc#install-config-aggregate-logging[Aggregating Container Logs]
|Added that NFS is not suitable for Lucene storage, that NFS is not supported, and how to
use local storage.
%

sosiouxme force-pushed the 20160801-efk-no-nfs-2 branch from a459a2f to e49eab5 August 2, 2016 21:13

ewolinetz reviewed Aug 2, 2016

pweil- reviewed Aug 3, 2016

ahardin-rh added this to the Next Release milestone Aug 4, 2016

ahardin-rh added the branch/enterprise-3.1 label Aug 4, 2016

ahardin-rh self-assigned this Aug 4, 2016

ahardin-rh reviewed Aug 4, 2016

sosiouxme force-pushed the 20160801-efk-no-nfs-2 branch from 510fef9 to 91ba11c August 4, 2016 21:19

ahardin-rh added branch/enterprise-3.2 and removed branch/enterprise-3.1 labels Aug 4, 2016

ahardin-rh merged commit f662c80 into openshift:master Aug 4, 2016

ahardin-rh mentioned this pull request Aug 4, 2016: Follow-up edits to PR#2599 #2607 (Merged)

ahardin-rh added the branch/enterprise-3.1 label Aug 4, 2016

sosiouxme deleted the 20160801-efk-no-nfs-2 branch August 5, 2016 12:55

bfallonf modified the milestones: Next Release, Staging Aug 8, 2016

bfallonf modified the milestones: Staging, Next Release, Published - 08/08/2016 Aug 8, 2016

ahardin-rh mentioned this pull request Aug 25, 2016: logging: clarifications on NFS workaround #2703 (Merged)


6 participants
