
Deploy search engine#359

Merged
sbesson merged 22 commits into IDR:master from khaledk2:master
May 2, 2022
Conversation

@khaledk2
Contributor

I have added playbooks to deploy the searchengine, the searchengine client and Elasticsearch.
"management-searchengine.yml" is used to configure and run all three applications.
There is a variables file (searchengine_vars.yml) that the user needs to customize before running the playbook.
After deploying the apps with the playbook, another playbook (run_searchengine_index_cache_services.yml) must be run to perform caching and indexing.
As the caching and indexing processes take a long time, there are another two playbooks that enable the user to check whether they have finished, i.e. check_indexing_service.yml and check_caching_service.yml.
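For illustration, a status-check playbook of this kind could poll the state of the indexing container. This is a hypothetical sketch only: the group name, container name and message are assumptions, and the real check_indexing_service.yml may be shaped differently.

```yaml
# Hypothetical sketch of a status-check playbook; the container name
# "searchengine_index" and the hosts group are assumptions.
- hosts: searchengine-hosts
  tasks:
    - name: Inspect the indexing container
      docker_container_info:
        name: searchengine_index
      register: index_info

    - name: Report whether indexing has finished
      debug:
        msg: >-
          {{ 'indexing finished' if not index_info.exists
             or not index_info.container.State.Running
             else 'indexing still running' }}
```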

Comment thread ansible/management-searchengine.yml Outdated
Comment thread ansible/management-searchengine.yml Outdated
Comment thread ansible/management-searchengine.yml Outdated
Comment thread ansible/searchengine_vars.yml Outdated
@khaledk2 khaledk2 requested a review from sbesson January 27, 2022 12:27
@sbesson
Member

sbesson commented Jan 27, 2022

After copying the vars file to match the name of the group and removing the variable pointing to a local path

(idr-ansible) (base) sbesson@ls30630:ansible ((db001bb...)) $ diff group_vars/searchengine_vars.yml group_vars/management-hosts.yml 
11d10
< ansible_python_interpreter: path/to/bin/python

the playbook executed until

TASK [configure elasticsearch  for docker searchengine] **********************************************************************************************************************************************
fatal: [test104-management]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python"}, "changed": false, "msg": "Docker SDK for Python version is 1.10.6 (test104-management.novalocal's Python /usr/bin/python). Minimum version required is 2.1.0 to set auto_remove option. Try `pip uninstall docker-py` followed by `pip install docker`."}

PLAY RECAP *******************************************************************************************************************************************************************************************
test104-management         : ok=12   changed=11   unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

Possible options to move forward are:

  • drop the auto_remove option for now (does any behavior depend on it?)
  • review how the Docker SDK for Python is installed and upgraded, and/or create a local virtual environment with a recent version of the docker module
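The second option could take a shape like the following sketch: install a recent Docker SDK into a dedicated virtualenv and point Ansible at its interpreter. The paths and version pin here are illustrative, not part of this PR.

```yaml
# Illustrative only: virtualenv path and version pin are assumptions.
- name: Create a virtualenv with a recent Docker SDK for Python
  pip:
    name: docker>=2.1.0
    virtualenv: /opt/searchengine-venv
    virtualenv_command: python3 -m venv

# Then, in the inventory or group_vars for these hosts:
# ansible_python_interpreter: /opt/searchengine-venv/bin/python
```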

@khaledk2
Contributor Author

auto_remove instructs Docker to delete the container after it exits. I think we may comment it out for the time being, what do you think?

@sbesson
Member

sbesson commented Jan 27, 2022

Agreed, let's comment it out and come back to it later in the testing.

@khaledk2
Contributor Author

Sorry, I should have mentioned before that I commented out auto_remove and pushed the playbooks yesterday.
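For illustration, the commented-out option in a docker_container task might look like this; the container name, image and surrounding task are assumptions, not the actual playbook contents.

```yaml
# Illustrative sketch only; names and image are assumptions.
- name: Run the searchengine container
  docker_container:
    name: searchengine
    image: openmicroscopy/omero-searchengine:latest
    state: started
    # auto_remove requires Docker SDK for Python >= 2.1.0;
    # commented out until the SDK on the host is upgraded.
    # auto_remove: yes
```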

Comment thread ansible/group_vars/searchengine_vars.yml Outdated
Comment thread docs/searchengine_deployemnt.md Outdated
Comment thread ansible/group_vars/searchengine_vars.yml Outdated
Comment thread ansible/management-searchengine.yml
Comment thread ansible/check_caching_service.yml Outdated
@sbesson
Member

sbesson commented Feb 1, 2022

Added a minimal configuration allowing port 5567 to be proxied under the /searchengine endpoint:

TASK [ome.nginx_proxy : nginx | proxy config] *************************************************************************************************
--- before: /etc/nginx/conf.d/proxy-default.conf
+++ after: /Users/sbesson/.ansible/tmp/ansible-local-26677fmuyzkn/tmp_tkpwlmz/nginx-confd-proxy.j2
@@ -253,15 +253,6 @@
 
     }
 
-    location ^~ /searchengine {
-        proxy_pass http://searchengine/;
-        proxy_redirect http://searchengine $scheme://$server_name;
-
-
-        proxy_ignore_headers   "Set-Cookie" "Vary" "Expires";
-        proxy_hide_header Set-Cookie;
-    }
-
 
     add_header Access-Control-Allow-Origin $allow_origin;
 
changed: [test104-proxy] => (item={'nginx_proxy_is_default': True, 'nginx_proxy_additional_directives': ['add_header Access-Control-Allow-Origin $allow_origin']})
ok: [test104-proxy] => (item={'nginx_proxy_server_name': 'cachebuster', 'nginx_proxy_listen_http': 0, 'nginx_proxy_ssl': False, 'nginx_proxy_cachebuster_enabled': True, 'nginx_proxy_backends': [{'name': 'omerocached', 'location': '~ /webclient/metadata_*|/webclient/render_*|/webclient/get_thumbnail*|/webgateway/metadata_*|/webgateway/render_*|/webgateway/get_thumbnail*|/webclient/api/*|/webclient/search/*|/api/*|/webclient/img_detail/*|/iviewer/*|/figure/*|/gallery-api/*|/mapr/*', 'server': 'http://omeroreadwrite', 'cache_validity': '1d', 'read_timeout': 900}, {'name': 'omerostatic', 'location': '~ /static/*', 'server': 'http://omeroreadwrite', 'cache_validity': '1d'}, {'name': 'omero', 'location': '/', 'server': 'http://omeroreadwrite'}]})
ok: [test104-proxy] => (item={'nginx_proxy_server_name': 'idr-demo.openmicroscopy.org', 'nginx_proxy_ssl': True, 'nginx_proxy_redirect_map_locations': [], 'nginx_proxy_direct_locations': [{'location': '/', 'redirect301': '$scheme://idr.openmicroscopy.org$request_uri'}], 'nginx_proxy_backends': []})
TASK [ome.nginx_proxy : nginx | proxy upstream servers] ***************************************************************************************
--- before: /etc/nginx/conf.d/proxy-upstream.conf
+++ after: /Users/sbesson/.ansible/tmp/ansible-local-26677fmuyzkn/tmplufdxnu2/nginx-confd-proxy-upstream.j2
@@ -13,6 +13,3 @@
 upstream omeroreadwrite {
   server 192.168.3.22;
 }
-upstream searchengine {
-  server 192.168.3.120:5567;
-}

Following this morning's discussion, we are currently running into two issues:

  • the static files are missing from the endpoint. Khaled is looking into the Nginx configuration
  • the indexing process failed with a statement timeout. Khaled identified the issue as memory-related and reduced the number of rows processed concurrently to allow the indexing to run. The management VM currently used for the deployment is relatively small, with 8GB RAM and 4 VCPUs. When moving to a production state, we might consider hardening this configuration and provisioning the searchengine VM with more resources, similar to the OMERO ro/rw VMs.

@khaledk2
Contributor Author

  • I should have written this before: I have changed one line in the searchengine section of the proxy-default.conf Nginx configuration file to add the "searchengine" subdomain to the HOST header
    proxy_set_header Host $host/searchengine
  • Also, I have made some modifications to the search engine client code so that it works when its URL is not the domain root.
  • I have renamed the "static" folder to "searchengineclientstatic" in the search engine client.
  • I have changed the searchengine code to allow the number of rows to be configured externally (in the app configuration) so it can be customized according to the host machine configuration.
  • I have changed the deployment files: I added a "cache_rows" variable in management-searchengine-hosts.yml to set the number of rows and added an additional step to management-searchengine.yml to set this configuration.
  • I have added searchengine_secret_key and searchengineclient_secret_key variables to set the SECRET_KEYs for the searchengine and the searchengine client, and added steps inside the deployment yml file to configure them.
  • The Docker images for the searchengine and searchengineclient are hosted in my Docker Hub account; they are built and pushed manually. I have created GitHub Actions to build and push them automatically to the openmicroscopy Docker Hub account; they are in the testing stage.
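The cache_rows wiring described above could be sketched as follows. The variable name matches the PR; the configuration file path and key are assumptions, not the actual implementation.

```yaml
# group_vars/management-searchengine-hosts.yml (variable from the PR)
cache_rows: 10000

# management-searchengine.yml — task sketch; the config path and the
# CACHE_ROWS key are assumptions for illustration.
- name: Configure the number of rows processed concurrently
  lineinfile:
    path: /etc/searchengine/app_config.yml
    regexp: '^CACHE_ROWS:'
    line: "CACHE_ROWS: {{ cache_rows }}"
```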

Comment thread ansible/management-searchengine.yml Outdated
Comment thread ansible/management-searchengine.yml Outdated
Member

@sbesson sbesson left a comment


With the last set of commits, I was able to successfully deploy the search engine application and launch the indexing/caching processes on a fresh test104 deployment.

The following local changes to the idr-proxy.yml playbook/variables were also applied to deploy the client under the /searchengine endpoint:

diff --git a/ansible/group_vars/proxy-hosts.yml b/ansible/group_vars/proxy-hosts.yml
index e082a821..63dc5cd8 100644
--- a/ansible/group_vars/proxy-hosts.yml
+++ b/ansible/group_vars/proxy-hosts.yml
@@ -38,6 +38,8 @@ nginx_proxy_upstream_servers:
   servers: "{{ omero_omeroreadonly_hosts_external | map('regex_replace', '^(.*)$', '\\1:4065') | sort }}"
 - name: omeroreadwrite
   servers: "{{ omero_omeroreadwrite_hosts }}"
+- name: searchengine
+  servers: "{{ searchengine_hosts | map('regex_replace', '^(.*)$', '\\1:5567') | sort }}"
 
 # The regex is getting complicated, so unroll it into a list and join
 _nginx_proxy_omero_locations:
@@ -100,11 +102,19 @@ _nginx_proxy_backends_prometheus_federate:
   server: "http://{{ management_host_ansible | default('localhost') }}:9090/federate"
   cache_validity: 15s
 
+_nginx_proxy_backends_searchengine:
+- name: prometheusfederate
+  location: "^~ /searchengine"
+  server: http://searchengine/
+  host_header: "$host/searchengine"
+
+
 nginx_proxy_backends: >
   {{ _nginx_proxy_backends_omero +
      _nginx_proxy_backends_omerowebsockets +
      _nginx_proxy_backends_grafana_render +
-     _nginx_proxy_backends_prometheus_federate
+     _nginx_proxy_backends_prometheus_federate +
+     _nginx_proxy_backends_searchengine
   }}
 
 
diff --git a/ansible/idr-proxy.yml b/ansible/idr-proxy.yml
index edc0db47..adb601f2 100644
--- a/ansible/idr-proxy.yml
+++ b/ansible/idr-proxy.yml
@@ -61,6 +61,12 @@
             idr_environment | default('idr') + '-management-hosts'][0]]
             ['ansible_' + (idr_net_iface | default('eth0'))]['ipv4']['address']
         }}
+      searchengine_hosts: >-
+        {{
+          groups[idr_environment | default('idr') + '-management-hosts'] |
+          map('extract', hostvars,
+            ['ansible_' + (idr_net_iface | default('eth0')), 'ipv4', 'address']) | list
+        }}
     when: groups[idr_environment | default('idr') + '-management-hosts'] is defined
 
   roles:

A few inline comments, and the client probably needs some testing once the indexing/caching has completed.
From a code perspective, I think we are approaching the point where this playbook can be safely merged into the repository. Importantly, as things stand, this app is not included by default and needs to be deployed manually. Probably the biggest question for the IDR team is whether we would consider deploying it on all production deployments as an experimental endpoint and/or the steps to move towards this target.

database_user_password: "{{ idr_secret_postgresql_password_ro | default('omero') }}"
searchenginecache_folder: /data/searchengine/searchengine/cacheddata/
search_engineelasticsearch_docker_image: docker.elastic.co/elasticsearch/elasticsearch:7.16.2
searchengine_docker_image: openmicroscopy/omero-searchengine:latest
Member


For a deployment from scratch this will do the job, but as soon as we want to update the Docker images, it will be preferable to use tagged images rather than latest.
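Pinning the images would only change the variable values; the tag below is an example, not a released version.

```yaml
# Example of pinning a tagged image instead of latest;
# the "0.1.0" tag is illustrative only.
searchengine_docker_image: openmicroscopy/omero-searchengine:0.1.0
search_engineelasticsearch_docker_image: docker.elastic.co/elasticsearch/elasticsearch:7.16.2
```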

searchengine_index: searchengine_index
cache_rows: 10000
# I think that the following two variables should be in secret
searchengine_secret_key: "fagfdssf3fgdnvhg56ghhgfhgfgh45f"
Member

@sbesson sbesson Feb 24, 2022


Proposing to update the value of these keys and migrate them to private variables.
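Migrating the keys to private variables could follow the idr_secret_* lookup pattern already used for the database password in this file; the secret variable names below are assumptions.

```yaml
# Sketch only: the idr_secret_* variable names are assumptions,
# following the pattern of idr_secret_postgresql_password_ro above.
searchengine_secret_key: "{{ idr_secret_searchengine_key | default('change-me') }}"
searchengineclient_secret_key: "{{ idr_secret_searchengineclient_key | default('change-me') }}"
```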

Comment thread docs/searchengine_deployemnt.md Outdated
* If the PostgreSQL database server is located on the same machine that hosts the searchengine, you need to:
* Edit the pg_hba.conf file (one of the PostgreSQL configuration files) and add the two client IPs (i.e. 10.11.0.10 and 10.11.0.11)
* Reload the configuration so that PostgreSQL accepts the connections from the indexing and caching services.
* As the caching and indexing processes take a long time, there are another two playbooks that enable the user to check whether they have finished:
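The pg_hba.conf step above could be automated with an Ansible task like this sketch; the client IPs come from the docs excerpt, while the pg_hba.conf path is distribution-dependent and the auth method is an assumption.

```yaml
# Illustrative task for the pg_hba.conf change; path and auth
# method ("md5") are assumptions.
- name: Allow searchengine clients to connect to PostgreSQL
  lineinfile:
    path: /var/lib/pgsql/data/pg_hba.conf
    line: "host all all {{ item }}/32 md5"
  loop:
    - 10.11.0.10
    - 10.11.0.11
  notify: reload postgresql
```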
Member

@sbesson sbesson Feb 24, 2022


Unlike the service set-up playbook, I am less convinced of the value of running and checking the indexing/caching via Ansible playbooks.

Unless there is an obvious alternative, happy to keep things as they are right now and revisit this behavior in the future. I suspect this will become apparent as we start running these workflows during the app lifecycle, e.g. before a release.

@sbesson sbesson self-requested a review February 24, 2022 20:58
@khaledk2
Contributor Author

I have pushed changes to run on the searchengine-hosts group and removed the hdf5 caching service, as all the cached data is now saved in Elasticsearch.

Comment thread ansible/group_vars/management-hosts.yml
Comment thread ansible/group_vars/management-hosts.yml Outdated
Comment thread ansible/management-searchengine.yml
Comment thread ansible/management-searchengine.yml
@khaledk2
Contributor Author

khaledk2 commented Apr 20, 2022

I have renamed the files and increased cache_rows to 50000; I think we can increase it even further.

@khaledk2
Contributor Author

I have reverted renaming dockermanager-hosts.yml and renamed the files.

Member

@sbesson sbesson left a comment


With the change to the hosts section of idr-searchengine.yml, I was able to deploy the searchengine stack on pilot-idr0000 and start the indexing process which completed in ~12h.

Comment thread ansible/idr-searchengine.yml Outdated
@khaledk2
Contributor Author

khaledk2 commented Apr 28, 2022

These are the modifications to fix the issue of displaying the Swagger documents via the searchengineapi URL:

I have added a variable to "searchengine-hosts.yml"; its value is the URL prefix (searchengineapi)
searchengineurlprefix: "searchengineapi"

It is used to set the script_name when running gunicorn.

Also, I have changed the Nginx configuration in the searchengineapi section:

location ^~ /searchengineapi {
    proxy_pass http://searchengineapi/searchengineapi;
    proxy_redirect http://searchengineapi/searchengineapi $scheme://$server_name;
}
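Passing the prefix through to gunicorn could use the standard WSGI SCRIPT_NAME environment variable; this is a sketch under that assumption, and the container/image names plus how the image consumes the variable are not confirmed by this PR.

```yaml
# Sketch only: container name, image and env handling are assumptions;
# SCRIPT_NAME is the standard WSGI/gunicorn mechanism for URL prefixes.
- name: Run the searchengine container with a URL prefix
  docker_container:
    name: searchengine
    image: openmicroscopy/omero-searchengine:latest
    state: started
    env:
      SCRIPT_NAME: "/{{ searchengineurlprefix }}"
```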

Member

@sbesson sbesson left a comment


With the latest set of changes and #367, I was able to successfully deploy the search engine stack onto a newly created pilot using the new group.

The API is available when forwarding the port 5577 and the client is available when accessing the port 5556.

The playbook is currently set up so that it will only run when executed manually against VMs with the correct groups.

As discussed this morning as part of the weekly IDR call, merging this so that we can make incremental progress towards a production release of the new service via smaller PRs. I will capture the outstanding issues as todos.

@sbesson sbesson merged commit c7bc747 into IDR:master May 2, 2022