diff --git a/_topic_map.yml b/_topic_map.yml index 55114dde7562..9be26537b2b8 100644 --- a/_topic_map.yml +++ b/_topic_map.yml @@ -1633,19 +1633,21 @@ Name: Logging Dir: logging Distros: openshift-enterprise,openshift-origin,openshift-dedicated Topics: -- Name: About cluster logging +- Name: Release notes + File: cluster-logging-release-notes +- Name: About Logging File: cluster-logging -- Name: Installing cluster logging +- Name: Installing Logging File: cluster-logging-deploying Distros: openshift-enterprise,openshift-origin -- Name: Installing the Cluster Logging and Elasticsearch Operators +- Name: Installing the Logging and Elasticsearch Operators File: dedicated-cluster-deploying Distros: openshift-dedicated -- Name: Configuring your cluster logging deployment +- Name: Configuring your Logging deployment Dir: config Distros: openshift-enterprise,openshift-origin Topics: - - Name: About the Cluster Logging Custom Resource + - Name: About the Logging custom resource File: cluster-logging-configuring-cr - Name: Configuring the logging collector File: cluster-logging-collector @@ -1653,15 +1655,15 @@ Topics: File: cluster-logging-log-store - Name: Configuring the log visualizer File: cluster-logging-visualizer - - Name: Configuring cluster logging storage + - Name: Configuring Logging storage File: cluster-logging-storage-considerations - - Name: Configuring CPU and memory limits for cluster logging components + - Name: Configuring CPU and memory limits for Logging components File: cluster-logging-memory - - Name: Using tolerations to control cluster logging pod placement + - Name: Using tolerations to control Logging pod placement File: cluster-logging-tolerations - - Name: Moving the cluster logging resources with node selectors + - Name: Moving the Logging resources with node selectors File: cluster-logging-moving-nodes - - Name: Configuring systemd-journald for cluster logging + - Name: Configuring systemd-journald for Logging File: cluster-logging-systemd - 
Name: Configuring the log curator File: cluster-logging-curator @@ -1685,22 +1687,22 @@ Topics: # - Name: Forwarding logs using ConfigMaps # File: cluster-logging-external-configmap # Distros: openshift-enterprise,openshift-origin -- Name: Updating cluster logging +- Name: Updating Logging File: cluster-logging-upgrading -- Name: Uninstalling cluster logging +- Name: Uninstalling Logging File: cluster-logging-uninstall Distros: openshift-dedicated - Name: Viewing cluster dashboards File: cluster-logging-dashboards -- Name: Troubleshooting cluster logging +- Name: Troubleshooting Logging Dir: troubleshooting Distros: openshift-enterprise,openshift-origin Topics: - - Name: Viewing cluster logging status + - Name: Viewing Logging status File: cluster-logging-cluster-status - Name: Viewing the status of the log store File: cluster-logging-log-store-status - - Name: Understanding cluster logging alerts + - Name: Understanding Logging alerts File: cluster-logging-alerts - Name: Troubleshooting the log visualizer File: cluster-logging-troubleshooting-visualizer @@ -1708,7 +1710,7 @@ Topics: File: cluster-logging-troubleshooting-curator - Name: Collecting logging data for Red Hat Support File: cluster-logging-must-gather -- Name: Uninstalling cluster logging +- Name: Uninstalling Logging File: cluster-logging-uninstall - Name: Exported fields File: cluster-logging-exported-fields diff --git a/logging/cluster-logging-release-notes.adoc b/logging/cluster-logging-release-notes.adoc new file mode 100644 index 000000000000..73aa9079444b --- /dev/null +++ b/logging/cluster-logging-release-notes.adoc @@ -0,0 +1,242 @@ +[id="cluster-logging-release-notes"] += {ProductName} 5.0 release notes +include::modules/cluster-logging-document-attributes.adoc[] +:context: cluster-logging-release-notes-v5x + +toc::[] + +[id="openshift-logging-5-0-about-this-release"] +== About this release + +(link:https://errata.devel.redhat.com/docs/show/66974[RHBA-2020:66974-04 Errata Advisory for 
OpenShift Logging 5.0.0]) is now available. New features, changes, and known issues that pertain to {ProductName} 5.0 are included in this topic. + +[id="openshift-logging-5-0-inclusive-language"] +== Making open source more inclusive + +Red Hat is committed to replacing problematic language in our code, documentation, and web properties. We are beginning with these four terms: master, slave, blacklist, and whitelist. Because of the enormity of this endeavor, these changes will be implemented gradually over several upcoming releases. For more details, see link:https://www.redhat.com/en/blog/making-open-source-more-inclusive-eradicating-problematic-language[Red Hat CTO Chris Wright’s message]. + +[id="openshift-logging-5-0-new-features-and-enhancements"] +== New features and enhancements + +This release adds improvements related to the following concepts. + +[discrete] +[id="openshift-logging-5-0-cluster-logging-renamed-openshift-logging"] +=== Cluster Logging becomes Red Hat OpenShift Logging + +With this release, Cluster Logging becomes Red Hat OpenShift Logging, version 5.0. + +[discrete] +[id="openshift-logging-5-0-eo-max-five-shards"] +// https://bugzilla.redhat.com/show_bug.cgi?id=1883444 +=== Maximum five primary shards per index + +With this release, the Elasticsearch Operator (EO) sets the number of primary shards for an index between one and five, depending on the number of data nodes defined for a cluster. + +Previously, the EO set the number of shards for an index to the number of data nodes. When an index in Elasticsearch was configured with a number of replicas, it created that many replicas for each primary shard, not per index. Therefore, as the index sharded, a greater number of replica shards existed in the cluster, which created a lot of overhead for the cluster to replicate and keep in sync. 
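The sizing rule above can be sketched as follows. This is a minimal illustration of the documented behavior, not the Elasticsearch Operator's actual code, and the function names are hypothetical:

```python
def primary_shard_count(data_nodes: int) -> int:
    """Primary shards per index: one per data node, capped at five."""
    return max(1, min(data_nodes, 5))

def total_shards(data_nodes: int, replicas: int) -> int:
    """Replicas are created per primary shard, not per index, so replica
    shards multiply with the primary count."""
    primaries = primary_shard_count(data_nodes)
    return primaries + primaries * replicas
```

For example, with nine data nodes and one replica, the previous rule (one primary per data node) produced 9 primaries plus 9 replicas, or 18 shards per index; the new cap keeps it at 5 primaries plus 5 replicas, or 10 shards.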
+ +[discrete] +[id="openshift-logging-5-0-updated-eo-name"] +// https://bugzilla.redhat.com/show_bug.cgi?id=1898920 +=== Updated Elasticsearch Operator name and maturity level + +This release updates the display name and Operator maturity level of the Elasticsearch Operator. The updated display name and a clarified description of the Operator's specific use appear in OperatorHub. + +[discrete] +[id="openshift-logging-5-0-es-csv-success"] +// https://bugzilla.redhat.com/show_bug.cgi?id=1913464 +=== Elasticsearch Operator reports on CSV success + +This release adds reporting metrics to indicate that installing or upgrading the Elasticsearch Operator ClusterServiceVersion (CSV) was successful. Previously, there was no way to determine, or generate an alert, if the CSV installation or upgrade for the Elasticsearch Operator failed. Now, an alert is provided as part of the Elasticsearch Operator. + +[discrete] +[id="openshift-logging-5-0-reduced-cert-warnings"] +// https://bugzilla.redhat.com/show_bug.cgi?id=1884812 +=== Reduced Elasticsearch pod certificate permission warnings + +Previously, when the Elasticsearch pod started, it generated certificate permission warnings, which misled some users into troubleshooting their clusters. The current release fixes these permissions issues to reduce these types of notifications. + +[discrete] +[id="openshift-logging-5-0-links-from-alerts"] +// https://bugzilla.redhat.com/show_bug.cgi?id=1913469 +=== New links from alerts to explanations and troubleshooting + +This release adds a link from the alerts that an Elasticsearch cluster generates to a page of explanations and troubleshooting steps for that alert. 
+ +[discrete] +[id="openshift-logging-5-0-curl-connection-timeout"] +// https://bugzilla.redhat.com/show_bug.cgi?id=1881709 +=== New connection timeout for deletion jobs + +The current release adds a connection timeout for deletion jobs, which helps prevent pods from occasionally hanging when they query Elasticsearch to delete indices. Now, if the underlying `curl` call does not connect before the timeout period elapses, the timeout terminates the call. + +[discrete] +[id="openshift-logging-5-0-minimize-updates-to-rollover-index-templates"] +// https://bugzilla.redhat.com/show_bug.cgi?id=1920215 +=== Minimized updates to rollover index templates + +With this enhancement, the Elasticsearch Operator only updates its rollover index templates if they have different field values. Index templates have a higher priority than indices. When a template is updated, the cluster prioritizes distributing the updated templates to the index shards, which impacts performance. To minimize Elasticsearch cluster operations, the Operator only updates the templates when the number of primary shards or replica shards changes from what is currently configured. + + +// UNUSED IN THIS 5.0.0 RELEASE - KEEP THIS BOILERPLATE FOR FUTURE RELEASES +// [id="openshift-logging-5-0-notable-technical-changes"] +// == Notable technical changes +// +// {ProductName} 5.0 introduces the following notable technical changes. +// +// [id="openshift-logging-5-0-deprecated-removed-features"] +// == Deprecated and removed features +// +// Some features available in previous releases have been deprecated or removed. +// +// Deprecated functionality is still included in {ProductName} and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments. For the most recent list of major functionality deprecated and removed within {ProductName} {product-version}, refer to the table below. 
Additional details for more fine-grained functionality that has been deprecated and removed are listed after the table. +// +// In the table, features are marked with the following statuses: +// +// * *GA*: _General Availability_ +// * *DEP*: _Deprecated_ +// * *REM*: _Removed_ +// +// .Deprecated and removed features tracker +// [cols="3,1,1,1",options="header"] +// |==== +// |Feature |OCP 4.5 |OCP 4.6 |OCP 4.7 +// +// |`OperatorSource` objects +// |DEP +// |REM +// |REM +// |==== +// +// [id="openshift-logging-5-0-deprecated-features"] +// === Deprecated features +// +// [id="openshift-logging-5-0-removed-features"] +// === Removed features + +[id="openshift-logging-5-0-bug-fixes"] +== Bug fixes + +* Previously, Elasticsearch rejected HTTP requests whose headers exceeded the default max header size, 8 KB. Now, the max header size is 128 KB, and Elasticsearch no longer rejects HTTP requests for exceeding the max header size. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1845293[*BZ#1845293*]) + +* Previously, nodes did not recover from `Pending` status because a software bug did not correctly update their statuses in the Elasticsearch custom resource (CR). The current release fixes this issue, so the nodes can recover when their status is `Pending`. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1887357[*BZ#1887357*]) + +* Previously, when the Cluster Logging Operator (CLO) scaled down the number of Elasticsearch nodes in the `clusterlogging` CR to three nodes, it omitted previously-created nodes that had unique IDs. The Elasticsearch Operator rejected the update because it has safeguards that prevent nodes with unique IDs from being removed. Now, when the CLO scales down the number of nodes and updates the Elasticsearch CR, it marks nodes with unique IDs as count `0` instead of omitting them. As a result, users can scale down their cluster to three nodes by using the `clusterlogging` CR. 
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1879150[*BZ#1879150*]) + +* Previously, the Fluentd collector pod went into a crash loop when the `ClusterLogForwarder` had an incorrectly configured secret. The current release fixes this issue. Now, the `ClusterLogForwarder` validates the secrets and reports any errors in its status field. As a result, it does not cause the Fluentd collector pod to crash. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1888943[*BZ#1888943*]) + +* Previously, if you updated the Kibana resource configuration in the `clusterlogging` instance to `resource{}`, the resulting nil map caused a panic and changed the status of the Elasticsearch Operator to `CrashLoopBackOff`. The current release fixes this issue by initializing the map. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1889573[*BZ#1889573*]) + +* Previously, the Fluentd collector pod went into a crash loop when the `ClusterLogForwarder` had multiple outputs using the same secret. The current release fixes this issue. Now, multiple outputs can share a secret. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1890072[*BZ#1890072*]) + +* Previously, if you deleted a Kibana route, the Cluster Logging Operator (CLO) could not recover or recreate it. Now, the CLO watches the route, and if you delete the route, the Elasticsearch Operator can reconcile or recreate it. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1890825[*BZ#1890825*]) + +* Previously, the Cluster Logging Operator (CLO) would attempt to reconcile the Elasticsearch resource, which depended upon the Red Hat-provided Elastic Custom Resource Definition (CRD). Attempts to list an unknown kind caused the CLO to exit its reconciliation loop. This happened because the CLO tried to reconcile all of its managed resources whether they were defined or not. The current release fixes this issue. The CLO only reconciles types provided by the Elasticsearch Operator if a user defines managed storage. 
As a result, users can create collector-only deployments of cluster logging by deploying the CLO. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1891738[*BZ#1891738*]) + +* Previously, because of an LF GA syslog implementation for RFC 3164, logs sent to remote syslog were not compatible with the legacy behavior. The current release fixes this issue. `AddLogSource` adds details about the log's source to the `message` field. Now, logs sent to remote syslog are compatible with the legacy behavior. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1891886[*BZ#1891886*]) + +* Previously, the Elasticsearch rollover pods failed with a `resource_already_exists_exception` error. Within the Elasticsearch rollover API, when the next index was created, the `*-write` alias was not updated to point to it. As a result, the next time the rollover API endpoint was triggered for that particular index, it received an error that the resource already existed. ++ +The current release fixes this issue. Now, when a rollover occurs in the `indexmanagement` cronjobs, if a new index was created, it verifies that the alias points to the new index. This behavior prevents the error. If the cluster is already receiving this error, a cronjob fixes the issue so that subsequent runs work as expected. Now, performing rollovers no longer produces the exception. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1893992[*BZ#1893992*]) + +* Previously, Fluentd stopped sending logs even though the logging stack seemed functional. Logs were not shipped to an endpoint for an extended period even when an endpoint came back up. This happened if the max backoff time was too long and the endpoint was down. The current release fixes this issue by lowering the max backoff time, so the logs are shipped sooner. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1894634[*BZ#1894634*]) + +* Previously, omitting the Storage size of the Elasticsearch node caused panic in the Elasticsearch Operator code. 
This panic appeared in the logs as: `Observed a panic: "invalid memory address or nil pointer dereference"`. The panic happened because although Storage size is a required field, the software did not check for it. The current release fixes this issue, so there is no panic if the storage size is omitted. Instead, the storage defaults to ephemeral storage and generates a log message for the user. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1899589[*BZ#1899589*]) + +* Previously, `elasticsearch-rollover` and `elasticsearch-delete` pods remained in the `Invalid JSON:` or `ValueError: No JSON object could be decoded` error states. This exception was raised because there was no exception handler for invalid JSON input. The current release fixes this issue by providing a handler for invalid JSON input. As a result, the handler outputs an error message instead of an exception traceback, and the `elasticsearch-rollover` and `elasticsearch-delete` jobs do not remain in those error states. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1899905[*BZ#1899905*]) + +* Previously, when deploying Fluentd as a stand-alone, a Kibana pod was created even if the value of `replicas` was `0`. This happened because Kibana defaulted to `1` pod even when there were no Elasticsearch nodes. The current release fixes this. Now, Kibana only defaults to `1` pod when there are one or more Elasticsearch nodes. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1901424[*BZ#1901424*]) + +* Previously, if you deleted the `master-certs` secret, it was not recreated. Even though the certificates were on a disk local to the operator, they were not rewritten because they had not changed. That is, certificates were only written if they changed. The current release fixes this issue. It rewrites the secret if the certificate changes or is not found. Now, if you delete the `master-certs` secret, it is replaced. 
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1901869[*BZ#1901869*]) + +* Previously, if a cluster had multiple custom resources with the same name, the resource would get selected alphabetically when not fully qualified with the API group. As a result, if you installed the Red Hat Elasticsearch Operator alongside the Elastic Elasticsearch operator, you would see failures when collecting data via a must-gather report. The current release fixes this issue by ensuring that must-gather uses the full API group when gathering information about the cluster's custom resources. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1897731[*BZ#1897731*]) + +* An earlier bug fix to address issues related to certificate generation introduced an error. Trying to read the certificates caused them to be regenerated because they were recognized as missing. This, in turn, triggered the Elasticsearch Operator to perform a rolling upgrade on the Elasticsearch cluster and, potentially, to have mismatched certificates. This bug was caused by the Operator incorrectly writing certificates to the working directory. The current release fixes this issue. Now the Operator consistently reads and writes certificates to the same working directory, and the certificates are only regenerated if needed. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1905910[*BZ#1905910*]) + +* Previously, queries to the root endpoint to retrieve the Elasticsearch version received a 403 response. The 403 response broke any services that used this endpoint in prior releases. This error happened because non-administrative users did not have the `monitor` permission required to query the root endpoint and retrieve the Elasticsearch version. Now, non-administrative users can query the root endpoint for the deployed version of Elasticsearch. 
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1906765[*BZ#1906765*]) + +* Previously, in some bulk insertion situations, the Elasticsearch proxy timed out connections between Fluentd and Elasticsearch. As a result, Fluentd failed to deliver messages and logged a `Server returned nothing (no headers, no data)` error. The current release fixes this issue: It increases the default HTTP read and write timeouts in the Elasticsearch proxy from five seconds to one minute. It also provides command-line options in the Elasticsearch proxy to control HTTP timeouts in the field. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1908707[*BZ#1908707*]) + +* Previously, in some cases, the {ProductName}/Elasticsearch dashboard was missing from the {product-title} monitoring dashboard because the dashboard configuration resource referred to a different namespace owner and caused {product-title} to garbage-collect that resource. Now, the ownership reference is removed from the Elasticsearch Operator reconciler configuration, and the logging dashboard appears in the console. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1910259[*BZ#1910259*]) + +* Previously, the code that uses environment variables to replace values in the Kibana configuration file did not consider commented lines. This prevented users from overriding the default value of `server.maxPayloadBytes`. The current release fixes this issue by uncommenting the default value of `server.maxPayloadBytes` within the Kibana configuration file. Now, users can override the value by using environment variables, as documented. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1918876[*BZ#1918876*]) + +* Previously, logs were not sent to managed storage when legacy log forwarding was enabled. This happened because the internal generation of the `logforwarding` configuration improperly made a decision for either `logforwarding` or legacy `logforwarding`. 
The current release fixes this issue: Logs are sent to managed storage when the logstore is defined in the `clusterlogging` instance. Additionally, logs are sent to legacy `logforwarding` when enabled regardless of whether a managed logstore is enabled or not. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1921263[*BZ#1921263*]) + +* Previously, the Kibana log level was increased so that it did not suppress instructions to delete indices that failed to migrate. This change also caused GET requests that contained the Kibana user's email address and OAuth token to be displayed at the INFO level. The current release fixes this issue by masking these fields, so the Kibana logs do not display them. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1925081[*BZ#1925081*]) + + +[id="openshift-logging-5-0-technology-preview"] +== Technology Preview features + +Some features in this release are currently in Technology Preview. These experimental features are not intended for production use. Note the following scope of support on the Red Hat Customer Portal for these features: + +link:https://access.redhat.com/support/offerings/techpreview[Technology Preview Features Support Scope] + +In the table below, features are marked with the following statuses: + +* *TP*: _Technology Preview_ +* *GA*: _General Availability_ +* *-*: _Not Available_ + +.Technology Preview tracker +[cols="4,1,1,1",options="header"] +|==== +|Feature |OCP 4.5 |OCP 4.6 |Logging 5.0 + +|Log forwarding +|TP +|GA +|GA + +|==== + +[id="openshift-logging-5-0-known-issues"] +== Known issues + +* Fluentd pods with the `ruby-kafka-1.1.0` and `fluent-plugin-kafka-0.13.1` gems are not compatible with Apache Kafka version 0.10.1.0. ++ +As a result, log forwarding to Kafka fails with a message: `error_class=Kafka::DeliveryFailed error="Failed to send messages to flux-openshift-v4/1"` ++ +The `ruby-kafka-0.7` gem dropped support for Kafka 0.10 in favor of native support for Kafka 0.11. 
The `ruby-kafka-1.0.0` gem added support for Kafka 2.3 and 2.4. The current version of OpenShift Logging is tested with, and therefore supports, Kafka version 2.4.1. ++ +To work around this issue, upgrade to a supported version of Apache Kafka. ++ +(link:https://bugzilla.redhat.com/show_bug.cgi?id=1907370[*BZ#1907370*]) + +// UNUSED IN THIS 5.0.0 RELEASE - KEEP THIS BOILERPLATE FOR FUTURE RELEASES +// [id="openshift-logging-5-0-asynchronous-errata-updates"] +// == Asynchronous errata updates +// +// Security, bug fix, and enhancement updates for {ProductName} 5.0 are released as asynchronous errata through the Red Hat Network. All {ProductName} 5.0 errata are https://access.redhat.com/downloads/[available on the Red Hat Customer Portal]. See the https://access.redhat.com/support/policy/updates/openshift#logging[{ProductName} Life Cycle] for more information about asynchronous errata. +// // TBD Update https://access.redhat.com/downloads/ to something like https://access.redhat.com/downloads/content/201/ once the Logging product has been released. +// +// Red Hat Customer Portal users can enable errata notifications in the account settings for Red Hat Subscription Management (RHSM). When errata notifications are enabled, users are notified via email whenever new errata relevant to their registered systems are released. +// +// [NOTE] +// ==== +// Red Hat Customer Portal user accounts must have systems registered and consuming {ProductName} entitlements for {ProductName} errata notification emails to generate. +// ==== +// +// This section will continue to be updated over time to provide notes on enhancements and bug fixes for future asynchronous errata releases of {ProductName} 5.0. Versioned asynchronous releases, for example with the form {ProductName} 5.0.z, will be detailed in subsections. In addition, releases in which the errata text cannot fit in the space provided by the advisory will be detailed in subsections that follow. 
+// +// [IMPORTANT] +// ==== +// For any {ProductName} release, always review the instructions on xref:../updating/updating-cluster.adoc#TBD[updating your cluster] properly. +// ==== +// +// [id="openshift-logging-5-0-0-ga"] +// === RHBA-2020:66974-04 Errata Advisory for Openshift Logging 5.0.0 +// +// (link:https://errata.devel.redhat.com/docs/show/66974[RHBA-2020:66974-04 Errata Advisory for Openshift Logging 5.0.0]) is now available. New features, changes, +// +// Issued: 2021-02-24 +// +// {ProductName} release 5.0 is now available. The list of bug fixes that are included in the update is documented in the link:https://errata.devel.redhat.com/docs/show/66974[RHBA-2020:66974-04] advisory. + +// UNUSED IN THIS 5.0.0 RELEASE - KEEP THIS BOILERPLATE FOR FUTURE RELEASES +// The RPM packages that are included in the update are provided by the link:https://access.redhat.com/errata/RHBA-2020:5678[RHBA-2020:5678] advisory. +// +// Space precluded documenting all of the container images for this release in the advisory. See the following article for notes on the container images in this release: +// +// link:https://access.redhat.com/solutions/[{ProductName} 5.0.0 container image list] diff --git a/logging/cluster-logging.adoc b/logging/cluster-logging.adoc index 78eabb06c750..9d35bb7eddb7 100644 --- a/logging/cluster-logging.adoc +++ b/logging/cluster-logging.adoc @@ -1,6 +1,6 @@ :context: cluster-logging [id="cluster-logging"] -= Understanding OpenShift Logging += Understanding Red Hat OpenShift Logging include::modules/common-attributes.adoc[] toc::[] @@ -16,7 +16,7 @@ OpenShift Logging aggregates the following types of logs: * `application` - Container logs generated by user applications running in the cluster, except infrastructure container applications. * `infrastructure` - Logs generated by infrastructure components running in the cluster and {product-title} nodes, such as journal logs. 
Infrastructure components are pods that run in the `openshift*`, `kube*`, or `default` projects. -* `audit` - Logs generated by the node audit system (auditd), which are stored in the */var/log/audit/audit.log* file, and the audit logs from the Kubernetes apiserver and the OpenShift apiserver. +* `audit` - Logs generated by the node audit system (auditd), which are stored in the */var/log/audit/audit.log* file, and the audit logs from the Kubernetes apiserver and the OpenShift apiserver. [NOTE] ==== diff --git a/modules/cluster-logging-document-attributes.adoc b/modules/cluster-logging-document-attributes.adoc new file mode 100644 index 000000000000..74edda273c32 --- /dev/null +++ b/modules/cluster-logging-document-attributes.adoc @@ -0,0 +1,38 @@ +// Standard document attributes to be used in the Logging documentation +// +// The following are shared by all Logging documents: +:toc: +:toclevels: 4 +:toc-title: +:experimental: +// +// Product content attributes, that is, substitution variables in the files. +// +:product-title: OpenShift Container Platform +:ProductName: Red Hat OpenShift Logging +:ProductShortName: Logging +:ProductRelease: +:ProductVersion: 5.0.0 +:product-build: +:DownloadURL: registry.redhat.io +:cloud-redhat-com: Red Hat OpenShift Cluster Manager +:kebab: image:kebab.png[title="Options menu"] +// +// Documentation publishing attributes used in the master-docinfo.xml file +// Note that the DocInfoProductName generates the URL for the product page. +// Changing the value changes the generated URL. +// +:DocInfoProductName: OpenShift Logging +:DocInfoProductNumber: 5.0 +// +// Book Names: +// Defining the book names in document attributes instead of hard-coding them in +// the master.adoc files and in link references. This makes it easy to change the +// book name if necessary. 
+// Using the pattern ending in 'BookName' makes it easy to grep for occurrences +// throughout the topics +// +:Install_BookName: Installing Red Hat OpenShift Logging +:Using_BookName: Using Red Hat OpenShift Logging +:RN_BookName: Red Hat OpenShift Logging Release Notes diff --git a/release_notes/ocp-4-7-release-notes.adoc b/release_notes/ocp-4-7-release-notes.adoc index 8ca5fa0723eb..6c4df7002b25 100644 --- a/release_notes/ocp-4-7-release-notes.adoc +++ b/release_notes/ocp-4-7-release-notes.adoc @@ -712,57 +712,7 @@ link:https://bugzilla.redhat.com/show_bug.cgi?id=1775444[*BZ#1775444*] for more [id="ocp-4-7-cluster-logging-renamed-openshift-logging"] ==== Cluster Logging becomes Red Hat OpenShift Logging -With this release, Cluster Logging becomes Red Hat OpenShift Logging, version 5.0. - -[discrete] -[id="ocp-4-7-eo-max-five-shards"] -// https://bugzilla.redhat.com/show_bug.cgi?id=1883444 -==== Maximum five primary shards per index - -With this release, the Elasticsearch Operator (EO) sets the number of primary shards for an index between one and five, depending on the number of data nodes defined for a cluster. - -Previously, the EO set the number of shards for an index to the number of data nodes. When an index in Elasticsearch was configured with a number of replicas, it created that many replicas for each primary shard, not per index. Therefore, as the index sharded, a greater number of replica shards existed in the cluster, which created a lot of overhead for the cluster to replicate and keep in sync. - -[discrete] -[id="ocp-4-7-updated-eo-name"] -// https://bugzilla.redhat.com/show_bug.cgi?id=1898920 -==== Updated Elasticsearch Operator name and maturity level - -This release updates the display name of the Elasticsearch Operator and Operator maturity level. The new display name and clarified specific use for the Elasticsearch Operator are updated in Operator Hub. 
- -[discrete] -[id="ocp-4-7-es-csv-success"] -// https://bugzilla.redhat.com/show_bug.cgi?id=1913464 -==== Elasticsearch Operator reports on CSV success - -This release adds reporting metrics to indicate that installing or upgrading the Elasticsearch Operator ClusterServiceVersion (CSV) was successful. Previously, there was no way to determine, or generate an alert, if the CSV installation or upgrade for the Elasticsearch Operator failed. Now, an alert is provided as part of the Elasticsearch Operator. - -[discrete] -[id="ocp-4-7-es-operator-template-update-changes"] -==== Elasticsearch Operator template update changes - -The Elasticsearch Operator now only updates its rollover index templates if they have different field values. Index templates have a higher priority than indices. When the template is updated, the cluster prioritizes distributing them over the index shards, impacting performance. To minimize Elasticsearch cluster operations, the Operator only updates the templates when the number of primary shards or replica shards changes from what is currently configured. See link:https://bugzilla.redhat.com/show_bug.cgi?id=1920215[*1920215*] for more information. - -[discrete] -[id="ocp-4-7-reduced-cert-warnings"] -// https://bugzilla.redhat.com/show_bug.cgi?id=1884812 -==== Reduce Elasticsearch pod certificate permission warnings - -Previously, when the Elasticsearch pod started, it generated certificate permission warnings, which misled some users to troubleshoot their clusters. The current release fixes these permissions issues to reduce these types of notifications. - -[discrete] -[id="ocp-4-7-links-from-alerts"] -// https://bugzilla.redhat.com/show_bug.cgi?id=1913469 -==== New links from alerts to explanations and troubleshooting - -This release adds a link from the alerts that an Elasticsearch cluster generates to a page of explanations and troubleshooting steps for that alert. 
- -[discrete] -[id="ocp-4-7-curl-conn-timeout"] -// https://bugzilla.redhat.com/show_bug.cgi?id=1881709 -==== New connection timeout for deletion jobs - -The current release adds a connection timeout for deletion jobs, which helps prevent pods from occasionally hanging when they query Elasticsearch to delete indices. Now, if the underlying 'curl' call does not connect before the timeout period elapses, the timeout terminates the call. +With this release, _Cluster Logging_ becomes _Red Hat OpenShift Logging_, version 5.0. For more information, see xref:../logging/cluster-logging-release-notes.adoc[Red Hat OpenShift Logging 5.0 release notes]. [id="ocp-4-7-monitoring"] === Monitoring @@ -1394,49 +1344,7 @@ If you are using the option `--keep-manifest-list=true`, the only valid value fo *Red Hat OpenShift Logging* -* Previously, logs were not sent to managed storage when legacy log forwarding was enabled. This happened because the internal generation of the `logforwarding` configuration improperly made a decision for either `logforwarding` or legacy `logforwarding`. The current release fixes this issue: Logs are sent to managed storage when the logstore is defined in the `clusterlogging` instance. Additionally, logs are sent to legacy `logforwarding` when enabled regardless of whether a managed logstore is enabled or not. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1921263[*BZ#1921263*]) - -* Previously, the Fluentd collector pod went into a crash loop when the `ClusterLogForwarder` had an incorrectly-configured secret. The current release fixes this issue. Now, the `ClusterLogForwarder` validates the secrets and reports any errors in its status field. As a result, it does not cause the Fluentd collector pod to crash. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1888943[*BZ#1888943*]) - -* Previously, nodes did not recover from `Pending` status because a software bug did not correctly update their statuses in the Elasticsearch custom resource (CR). 
The current release fixes this issue, so the nodes can recover when their status is `Pending.` (link:https://bugzilla.redhat.com/show_bug.cgi?id=1887357[*BZ#1887357*]) - -* Previously, omitting the Storage size of the Elasticsearch node caused panic in the Elasticsearch Operator code. This panic appeared in the logs as: `Observed a panic: "invalid memory address or nil pointer dereference"` The panic happened because although Storage size is a required field, the software didn't check for it. The current release fixes this issue, so there is no panic if the storage size is omitted. Instead, the storage defaults to ephemeral storage and generates a log message for the user. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1899589[*BZ#1899589*]) - -* Previously, Elasticsearch rejected HTTP requests whose headers exceeded the default max header size, 8 KB. Now, the max header size is 128 KB, and Elasticsearch no longer rejects HTTP requests for exceeding the max header size. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1845293[*BZ#1845293*]) - -* Previously, when the Cluster Logging Operator (CLO) scaled down the number of Elasticsearch nodes in the `clusterlogging` CR to three nodes, it omitted previously-created nodes that had unique IDs. The Elasticsearch Operator rejected the update because it has safeguards that prevent nodes with unique IDs from being removed. Now, when the CLO scales down the number of nodes and updates the Elasticsearch CR, it marks nodes with unique IDs as count 0 instead of omitting them. As a result, users can scale down their cluster to 3 nodes by using the `clusterlogging` CR. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1879150[*BZ#1879150*]) - -* Previously, the Elasticsearch rollover pods failed with a `resource_already_exists_exception` error. Within the Elasticsearch rollover API, when the next index was created, the `*-write` alias was not updated to point to it. 
As a result, the next time the rollover API endpoint was triggered for that particular index, it received an error that the resource already existed. -+ -The current release fixes this issue. Now, when a rollover occurs in the `indexmanagement` cronjobs, if a new index was created, it verifies that the alias points to the new index. This behavior prevents the error. If the cluster is already receiving this error, a cronjob fixes the issue so that subsequent runs work as expected. Now, performing rollovers no longer produces the exception. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1893992[*BZ#1893992*]) - -* Previously, if you updated the Kibana resource configuration in the `clusterlogging` instance to `resource{}`, the resulting nil map caused a panic and changed the status of the Elasticsearch Operator to `CrashLoopBackOff`. The current release fixes this issue by initializing the map. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1889573[*BZ#1889573*]) - -* Previously, if you deleted a Kibana route, the Cluster Logging Operator (CLO) could not recover or recreate it. Now, the CLO watches the route, and if you delete the route, the Elasticsearch Operator can reconcile or recreate it. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1890825[*BZ#1890825*]) - -* Previously, `elasticsearch-rollover` and `elasticsearch-delete` pods remained in the `Invalid JSON:` or `ValueError: No JSON object could be decoded` error states. This exception was raised because there was no exception handler for invalid JSON input. The current release fixes this issue by providing a handler for invalid JSON input. As a result, the handler outputs an error message instead of an exception traceback, and the `elasticsearch-rollover` and `elasticsearch-delete` jobs do not remain those error states. 
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1899905[*BZ#1899905*]) - -* Previously, in some cases, the Red Hat OpenShift Logging/Elasticsearch dashboard was missing from the {product-title} monitoring dashboard because the dashboard configuration resource referred to a different namespace owner and caused the {product-title} to garbage-collect that resource. Now, the ownership reference is removed from the Elasticsearch Operator reconciler configuration, and the logging dashboard appears in the console. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1910259[*BZ#1910259*]) - -* Previously, Fluent stopped sending logs even though the logging stack seemed functional. Logs were not shipped to an endpoint for an extended period even when an endpoint came back up. This happened if the max backoff time was too long and the endpoint was down. The current release fixes this issue by lowering the max backoff time, so the logs are shipped sooner. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1894634[*BZ#1894634*]) - -* Previously, if you deleted the secret, it was not recreated. Even though the certificates were on a disk local to the Operator, they weren't rewritten because they hadn't changed. That is, certificates were only written if they changed. The current release fixes this issue. It rewrites the secret if the certificate changes or is not found. Now, if you delete the master certificates, they are replaced. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1901869[*BZ#1901869*]) - -* Previously, because of a bug, the software did not find some certificates and regenerated them. This triggered the Elasticsearch Operator to perform a rolling upgrade on the Elasticsearch cluster, which sometimes produced mismatched certificates. The current release fixes this issue. Now, the Operator consistently reads and writes certificates to the same working directory and only regenerates the certificates if needed. 
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1905910[*BZ#1905910*]) - -* Previously, queries to the root endpoint to retrieve the Elasticsearch version received a 403 response. The 403 response broke any services that used this endpoint in prior releases. This error happened because non-administrative users did not have the `monitor` permission required to query the root endpoint and retrieve the Elasticsearch version. Now, non-administrative users can query the root endpoint for the deployed version of Elasticsearch. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1906765[*BZ#1906765*]) - -* Previously, the Cluster Logging Operator (CLO) would attempt to reconcile the Elasticsearch resource, which depended upon the Red Hat-provided Elastic Custom Resource Definition (CRD). Attempts to list an unknown kind caused the CLO to exit its reconciliation loop. This happened because the CLO tried to reconcile all of its managed resources whether they were defined or not. The current release fixes this issue. The CLO only reconciles types provided by the Elasticsearch Operator if a user defines managed storage. As a result, users can create collector-only deployments of cluster logging by deploying the CLO. -(link:https://bugzilla.redhat.com/show_bug.cgi?id=1891738[*BZ#1891738*]) - -* Previously, when deploying Fluentd as a stand-alone, a Kibana pod was created even if the value of `replicas` was `0`. This happened because Kibana defaulted to `1` pod even when there were no Elasticsearch nodes. The current release fixes this. Now, a Kibana only defaults to `1` when there are one or more Elasticsearch nodes. -(link:https://bugzilla.redhat.com/show_bug.cgi?id=1901424[*BZ#1901424*]) - -* Previously, in some bulk insertion situations, the Elasticsearch proxy timed out connections between fluentd and Elasticsearch. As a result, fluentd failed to deliver messages and logged a `Server returned nothing (no headers, no data)` error. 
The current release fixes this issue: It increases the default HTTP read and write timeouts in the Elasticsearch proxy from five seconds to one minute. It also provides command-line options in the Elasticsearch proxy to control HTTP timeouts in the field. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1908707[*BZ#1908707*]) - -* Previously, the Kibana log level was increased not to suppress instructions to delete indices that failed to migrate, which also caused the display of GET requests at the INFO level that contained the Kibana user's email address and OAuth token. The current release fixes this issue by masking these fields, so the Kibana logs do not display them. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1925081[*BZ#1925081*]) - -* Previously, the fluentd collector pod went into a crash loop when the ClusterLogForwarder had multiple outputs using the same secret. The current release fixes this issue. Now, multiple outputs can share a secret. (link:https://bugzilla.redhat.com/show_bug.cgi?id=1890072[*BZ#1890072*]) +With this release, _Cluster Logging_ becomes _Red Hat OpenShift Logging_, version 5.0. For more information, see xref:../logging/cluster-logging-release-notes.adoc[Red Hat OpenShift Logging 5.0 release notes]. *Machine Config Operator* @@ -2049,15 +1957,7 @@ If a new node is being added to a Machine Config Pool that includes SR-IOV, this * The `stalld` service triggers a bug in the kernel, which results in the node freezing. In order to work around this issue, the Performance Addon Operator disables `stalld` by default. The fix impacts latency associated with DPDK based workloads, however the functionality will be restored once the kernel bug (link:https://bugzilla.redhat.com/show_bug.cgi?id=1912118[*BZ#1912118*]) is fixed. -* Fluentd pods with the `ruby-kafka-1.1.0` and `fluent-plugin-kafka-0.13.1` gems are not compatible with Apache Kafka version 0.10.1.0. 
-+ -As a result, log forwarding to Kafka fails with a message: `error_class=Kafka::DeliveryFailed error="Failed to send messages to flux-openshift-v4/1"` -+ -The `ruby-kafka-0.7` gem dropped support for Kafka 0.10 in favor of native support for Kafka 0.11. The `ruby-kafka-1.0.0` gem added support for Kafka 2.3 and 2.4. The current version of OpenShift Logging tests and therefore supports Kafka version 2.4.1. -+ -To work around this issue, upgrade to a supported version of Apache Kafka. -+ -(link:https://bugzilla.redhat.com/show_bug.cgi?id=1907370[*BZ#1907370*]) +* Fluentd pods with the `ruby-kafka-1.1.0` and `fluent-plugin-kafka-0.13.1` gems are not compatible with Apache Kafka version 0.10.1.0. For more information, see xref:../logging/cluster-logging-release-notes.adoc#openshift-logging-5-0-known-issues["Known issues" in the Red Hat OpenShift Logging 5.0 release notes]. * Precision Time Protocol (PTP) faults are observed on the Mellanox MT27800 Family [ConnectX-5] of adapter cards. In the `ptp4l` log, errors are observed which disturb clock synchronization. +