Improve configuration of connection to Elasticsearch #127
Conversation
The issues that are reported by SonarCloud seem to be false positives. I cannot see anything that is wrong with the use of properties in the marked code and it works as expected.
@smarsching You're right. SonarCloud doesn't seem to be good at evaluating those expressions. I'll try turning them off. For the integration tests, you might need to update the docker-compose files to bring them in line with your changes, and/or the application.properties file in the test resources folder.
@jacomago I just took a look at the logs from the integration tests, and to me it does not look like the test failure is related to my changes. I specifically designed the changes so that configuration through the old properties is still possible (so that users do not have to adapt their working configuration when upgrading). To me, it rather looks like a problem with the Elasticsearch service in the test environment. If you like, I can open a dummy PR without any real changes in order to confirm whether the integration tests go through for the unchanged code or not.
Another question I have is why you want to do load balancing in ChannelFinder, rather than using an in-between load balancer. I can only think of doing this if you are using Elasticsearch for multiple applications, and then setting up a load balancer would have the benefit of affecting every application. We currently have no performance problems with ChannelFinder, so I'm not quite sure about the use of multiple Elasticsearch nodes. If you don't want to use a load balancer, I would rather you split adding authentication and adding multiple Elasticsearch nodes into separate commits anyway. In addition, I think you should update the docker-compose files and test properties to use the authentication, so we know it works in the tests.
This is not about performance or load-balancing but about high availability. Yes, we run multiple applications on this ES cluster, but adding a load balancer in front of the cluster would make this load balancer the single point of failure, so there would be nothing gained from having a multi-node ES cluster. Of course we could add a fail-over setup for the load-balancer, but this would render the whole setup even more complex. In general, we have a policy of implementing failover in the client, where reasonably possible, because it reduces complexity and makes it easier to find the underlying issue when a connection fails. As the ES client library already supports this kind of configuration, just using it seemed like the easiest approach.
I already implemented this in multiple commits, it’s just a common branch because I needed all these changes combined for our setup. Authentication is implemented in 0eef1ab and the switch to URLs / supporting multiple ES hosts is implemented in a750938. As written earlier, the second change preserves the ability to configure the host and port through the existing properties, so it should not affect users who don’t need it in any way.
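As a sketch of the configuration style described above, a comma-separated list of URLs can be split into individual scheme/host/port triples before being handed to the client. This is only an illustration: the property value and class name below are assumptions, not taken from the actual code.

```java
import java.net.URI;
import java.util.Arrays;

public class EsUrlListExample {
    public static void main(String[] args) {
        // Hypothetical property value listing two Elasticsearch nodes.
        String property = "https://es1.example.com:9200,https://es2.example.com:9200";
        // Parse each entry as a URI so that scheme (HTTP vs. HTTPS), host,
        // and port are all taken from the configured value.
        URI[] uris = Arrays.stream(property.split(","))
                .map(String::trim)
                .map(URI::create)
                .toArray(URI[]::new);
        for (URI uri : uris) {
            System.out.println(uri.getScheme() + " " + uri.getHost() + " " + uri.getPort());
        }
    }
}
```

In the real code the parsed values would be turned into `HttpHost` instances for the Elasticsearch REST client, which can then fail over between the configured hosts.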
That’s a good idea. I will look into how authentication for ES can be enabled in the test environment.
I have added the changes in 42ff81f, so the test setup should use authentication.
jacomago left a comment:
Can you also update the docker-compose.yml and the docker-compose-integrationtest.yml to include the authorization?
Thanks for the update.
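As a rough illustration, enabling authentication for the Elasticsearch service in a compose file typically amounts to turning on security and setting a password. The service name, image version, and password below are assumptions, not taken from the actual files in this repository:

```yaml
# Hypothetical sketch for docker-compose-integrationtest.yml (names and
# versions assumed): enable security so that clients must authenticate.
elasticsearch:
  image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
  environment:
    - discovery.type=single-node
    - xpack.security.enabled=true
    - ELASTIC_PASSWORD=changeme
```

The client side then needs matching credentials (for example the `elastic` user and the configured password) in its connection properties.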
Force-pushed from 8aa401b to 6644244.
@jacomago Thank you very much for your code review. I addressed the issues that you pointed out as far as I could with a reasonable amount of effort. Regarding the authentication in the test environment, the only solution that I can see would be to write our own action for starting Elasticsearch instead of using https://github.com/ankane/setup-elasticsearch, but I do not think that testing just one small aspect (that authorization headers are added correctly) justifies the amount of work needed to completely replace that action with our own code, in particular from a maintenance perspective (updating the code for future versions of Elasticsearch).
@smarsching I figured out why the tests are failing: it is the new code. It seems like `HttpHost`, when built from just a string, uses the string as the hostname rather than as a URI.

```java
HttpHost[] localHttpHosts = this.httpHosts.stream().map(HttpHost::create).toArray(HttpHost[]::new);
```
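The distinction matters because a value like `https://host:9200` only works if it is parsed as a URI. A minimal stdlib sketch of the difference follows; the example value and class name are assumptions for illustration:

```java
import java.net.URI;

public class HostParsingExample {
    public static void main(String[] args) {
        String value = "https://es.example.com:9200";
        // Treating the whole string as a hostname keeps the scheme and port
        // embedded in the "hostname", which no DNS lookup can resolve.
        System.out.println("as hostname: " + value);
        // Parsing it as a URI extracts the individual components, which is
        // what a URI-aware factory method is expected to do.
        URI uri = URI.create(value);
        System.out.println("as uri: " + uri.getScheme() + " " + uri.getHost() + " " + uri.getPort());
    }
}
```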
@jacomago Thanks for checking the code. You are right, I broke this when I changed the code to directly inject the property into an array of HttpHost objects. I didn’t notice it because I didn’t update our ChannelFinder instance to use the newest version of the code after this change. Now I have updated our ChannelFinder to use the latest version of this branch and it seems to be fine, so I am reasonably sure that this problem is resolved.
@smarsching I think the problem might have been with the version of Elasticsearch. I've updated it on master, so if you rebase or merge in the changes then hopefully the CI tests will pass.
Before, esInitialized would be reset when createClient was called a second time.
This enables ChannelFinder to continue working when one of the hosts in an Elasticsearch HA cluster goes down. By switching to URLs, it is also possible to connect to Elasticsearch through HTTPS instead of HTTP.
The latter one expects a hostname instead of a URL.
Force-pushed from c1d3f26 to b10db0c.
@jacomago I just rebased my changes on top of the current master branch. Interestingly, when I run the checks locally with act, they also fail. Looking at the log output, the cause of the issues seems to have changed though: it seems like most or all tests fail now, but they consistently fail with the same Jackson-related error.
Using a version of jackson-core that does not match jackson-databind causes problems (see ChannelFinder#127 for details).
@jacomago I managed to fix the problem with Jackson: the cause was that different versions of jackson-core (2.14.2) and jackson-databind (2.16.0) were used. This problem was introduced when I rebased my changes on the latest version of the master branch. After fixing this issue in my branch, we are now back to the original situation, where only a few tests fail. I created a separate PR #130 that only fixes the Jackson issue. If the CI checks succeed for that PR, we know that some of the other changes I made must be the cause. If the CI fails for that PR as well, the problems must also be present in the master branch.
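One common way to prevent this kind of mismatch is to let a single BOM pin all Jackson artifacts to one version. The Maven fragment below is only a sketch: the version shown matches the jackson-databind version mentioned above, but the actual build files may manage versions differently.

```xml
<!-- Sketch: import the Jackson BOM so that jackson-core, jackson-databind,
     and related artifacts always resolve to the same version. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.fasterxml.jackson</groupId>
      <artifactId>jackson-bom</artifactId>
      <version>2.16.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```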
Just to note that the tests that fail are the ones that rely on the Elasticsearch setup by the action. The tests that passed create a new Elasticsearch docker container for each test (a bit complicated, I know; plans are to merge the tests).
@jacomago Okay, so the cause of the problems was a change that I made as part of this PR, in particular the change to the initialization logic. I changed:

```java
esInitialized.set(!Boolean.parseBoolean(createIndices));
if (esInitialized.compareAndSet(false, true)) {
    config.elasticIndexValidation(client);
}
```

to

```java
if (Boolean.parseBoolean(createIndices) && esInitialized.compareAndSet(false, true)) {
    config.elasticIndexValidation(client);
}
```

The difference is that my version only runs `config.elasticIndexValidation(client)` the first time the client is created, while the old version reset `esInitialized` on every call, so the validation ran every time. Under these circumstances, having the `esInitialized` flag does not make much sense, and it might be clearer to simply write:

```java
if (Boolean.parseBoolean(createIndices)) {
    config.elasticIndexValidation(client);
}
```

This way, it at least is clear that this method might be called more than once.

With these checks now succeeding, are there any other issues that need to be addressed before this PR can be merged?
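The run-at-most-once behavior of `compareAndSet` discussed above can be demonstrated with a small self-contained sketch; the class and method names here are made up for illustration and only stand in for the real initialization code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class InitOnceExample {
    private static final AtomicBoolean esInitialized = new AtomicBoolean(false);
    private static int validationRuns = 0;

    // Hypothetical stand-in for createClient: the guarded block runs at most
    // once, because compareAndSet succeeds only for the first caller that
    // observes the flag as false.
    static void createClient(boolean createIndices) {
        if (createIndices && esInitialized.compareAndSet(false, true)) {
            validationRuns++; // stand-in for config.elasticIndexValidation(client)
        }
    }

    public static void main(String[] args) {
        createClient(true);
        createClient(true);
        createClient(true);
        System.out.println("validation runs: " + validationRuns);
    }
}
```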
Awesome! I will merge it in.

@jacomago Thanks for your help in figuring this issue out.
This PR introduces two major improvements:
- Authentication for the connection to Elasticsearch (implemented in 0eef1ab).
- Configuration of the Elasticsearch connection through URLs, which makes it possible to specify multiple hosts (for high availability) and to use HTTPS instead of HTTP (implemented in a750938).
As a side effect, this PR also introduces three minor improvements:
- The `clusterName` variable is removed from `ElasticConfig`.
- The way the `esInitialized` flag is used in `ElasticConfig` is changed, fixing a potential race condition.
- Properties in `application.properties` that apparently were accidentally copied from the Elasticsearch server configuration file and were not relevant to the ChannelFinder service are removed.