
logging: EFK must avoid NFS #2599

Merged
ahardin-rh merged 1 commit into openshift:master from sosiouxme:20160801-efk-no-nfs-2 on Aug 4, 2016

@sosiouxme
Member

It came to our attention via
https://bugzilla.redhat.com/show_bug.cgi?id=1347666
and further research (
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201210.mbox/%3C01a401cda09e$17b00160$47100420$@thetaphi.de%3E
and
https://lucene.apache.org/core/4_8_0/core/org/apache/lucene/store/NativeFSLockFactory.html
) that NFS is not suitable for Lucene storage. This documents how to
use local storage, that NFS is not supported, and what to do if NFS is
all you have.

@sosiouxme
Member Author

@ewolinetz @richm PTAL

@adellape or @ahardin-rh this is relatively high priority per the bug. It's relevant to origin and OSE 3.1+

@sosiouxme
Member Author

heck /cc @thoraxe too.

@sosiouxme force-pushed the 20160801-efk-no-nfs-2 branch from a459a2f to e49eab5 on August 2, 2016 21:13
Comment thread install_config/aggregate_logging.adoc Outdated

Should probably remind users to stop their cluster first

Member Author

I suppose so. I figured they were gonna lose any ephemeral data anyway...

@ewolinetz

oi... do we foresee having a flag to do this for users with the deployer?

@sosiouxme
Member Author

I hadn't thought much about deployer parameters. It might be difficult to specify individual nodeselectors for each instance. But it probably wouldn't be hard to patch in the local mounts and the privileged security context.

BTW, need to re-examine whether there's a way short of "privileged" that gets us past the SELinux problem with local mounts.

Comment thread install_config/aggregate_logging.adoc Outdated

do you need access to the privileged SCC here, or will hostmount-anyuid (which does not allow privileged) be enough?

Member Author

@pweil- I tried hostmount-anyuid first and it did not have access due to SELinux context. I believe it's much the same problem we had with fluentd - openshift/origin-aggregated-logging#89 (comment)

It seems like less-than-privileged may be possible, but I'm not quite sure how and it seems like it would be a PITA for a user to set up. What do you think?


Hey, whaddya know... openshift/origin#8504

Member Author

@sosiouxme, Aug 4, 2016

@pweil- I'm a little foggy on whether exactly the same fix will apply. The problem with fluentd was that it was trying to read and write in /var/log. Here we're trying to read and write in an admin-supplied storage volume; I suppose we could have them chcon the volume to whatever would be convenient? If so, what would that be - is there a label that will allow read/write for any context the pod may be running in?


The kubelet (when a pod is using host namespaces) or docker should be performing a relabeling of the volume when it can. It uses the docker opts to pass in the selinux context that is being used. If that isn't working or this is a different use case then we can figure out what is different. cc @pmorie who is very familiar with the selinux code for volumes

Member Author

What the AVC looks like, FYI:

type=AVC msg=audit(1470323991.042:27487): avc:  denied  { write } for  pid=9883 comm="java" name="es-storage" dev="dm-0" ino=68862303 scontext=system_u:system_r:svirt_lxc_net_t:s0:c2,c8 tcontext=unconfined_u:object_r:usr_t:s0 tclass=dir
type=SYSCALL msg=audit(1470323991.042:27487): arch=c000003e syscall=83 success=no exit=-13 a0=7ff7d43d3780 a1=1ff a2=7ff7d43d3780 a3=7ff7c47bd728 items=0 ppid=15669 pid=9883 auid=4294967295 uid=1000 gid=0 euid=1000 suid=1000 fsuid=1000 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="java" exe="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64/jre/bin/java" subj=system_u:system_r:svirt_lxc_net_t:s0:c2,c8 key=(null)
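
For readers less used to audit logs, here is a minimal Python sketch (not part of the original thread) that pulls the relevant fields out of the AVC record above. The helper names `parse_avc` and `selinux_type` are hypothetical, chosen only for illustration. The sketch assumes the record has been joined onto a single line and uses the standard library only. It shows the mismatch behind the denial: the Elasticsearch `java` process runs with the container type `svirt_lxc_net_t`, while the `es-storage` directory is labeled `usr_t`.

```python
import re

# AVC record copied from the comment above, joined onto one line.
AVC = (
    'type=AVC msg=audit(1470323991.042:27487): avc:  denied  { write } for  '
    'pid=9883 comm="java" name="es-storage" dev="dm-0" ino=68862303 '
    'scontext=system_u:system_r:svirt_lxc_net_t:s0:c2,c8 '
    'tcontext=unconfined_u:object_r:usr_t:s0 tclass=dir'
)


def parse_avc(record):
    """Return the denied permissions and the key=value fields of an AVC record."""
    perms = re.search(r"denied\s+\{([^}]*)\}", record)
    # Values are either double-quoted strings or runs of non-whitespace.
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', record))
    return {
        "permissions": perms.group(1).split() if perms else [],
        "fields": {key: value.strip('"') for key, value in fields.items()},
    }


def selinux_type(context):
    """Extract the type (third colon-separated field) from an SELinux context."""
    return context.split(":")[2]


parsed = parse_avc(AVC)
source_type = selinux_type(parsed["fields"]["scontext"])  # the process (the pod)
target_type = selinux_type(parsed["fields"]["tcontext"])  # the directory on the host
print(parsed["permissions"], source_type, target_type)
```

The sketch only reads the log line. It does not suggest which label the storage directory should carry, which is the open question in this thread.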

@sferich888
Contributor

Should we also warn customers to avoid gluster (which uses NFS on the backend)?

@sosiouxme
Member Author

Think I've addressed existing concerns... any further?

@ahardin-rh added this to the Next Release milestone on Aug 4, 2016
@ahardin-rh self-assigned this on Aug 4, 2016
Comment thread install_config/aggregate_logging.adoc Outdated
Contributor

what is an example complication?

Member Author

The points that follow are the complications. Perhaps I should call them
something else... considerations?

On Thu, Aug 4, 2016 at 2:26 PM, Ashley Hardin notifications@github.com
wrote:

In install_config/aggregate_logging.adoc, #2599 (comment):

@@ -416,24 +416,82 @@ The deployer creates an ephemeral deployment in which all of a pod's data is
lost upon restart. For production usage, add a persistent storage volume to each
Elasticsearch deployment configuration.

-The following example specifies a volume for an Elasticsearch replica (using a
-xref:../architecture/additional_concepts/storage.adoc#persistent-volume-claims[PersistentVolumeClaim]):
+The best-performing volumes are local disks, if it is possible to use
+them. There are some complications with doing so.

what is an example complication?


@ahardin-rh
Contributor

@sosiouxme just a few minor comments from me. Thanks!

@sosiouxme
Member Author

@ahardin-rh think I addressed your comments now.

@ahardin-rh
Contributor

@sosiouxme Looks good! Thanks! Just a squash and we're good to go 🍻

@sosiouxme force-pushed the 20160801-efk-no-nfs-2 branch from 510fef9 to 91ba11c on August 4, 2016 21:19
@sosiouxme
Member Author

ready, then.

On Thu, Aug 4, 2016 at 5:16 PM, Ashley Hardin notifications@github.com
wrote:

@sosiouxme https://github.com/sosiouxme Looks good! Thanks! Just a
squash and we're good to go 🍻


@ahardin-rh
Contributor

[rev_history]
|xref:../install_config/aggregate_logging.adoc#install-config-aggregate-logging[Aggregating Container Logs]
|Added that NFS is not suitable for Lucene storage, that NFS is not supported, and how to
use local storage.
%

@sosiouxme deleted the 20160801-efk-no-nfs-2 branch on August 5, 2016 12:55
@bfallonf modified the milestones: Next Release, Staging on Aug 8, 2016
@bfallonf modified the milestones: Staging, Next Release, Published - 08/08/2016 on Aug 8, 2016