From 8cefd78d85ccd42effc261c833b5a0fc0794d357 Mon Sep 17 00:00:00 2001 From: Daniel <61777625+Sokadyn@users.noreply.github.com> Date: Sun, 23 Apr 2023 22:40:21 +0200 Subject: [PATCH 1/3] Dev (#13) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * added cli submodule * use dev branch of execDAT-CLI * use main branch for checkout of cli in main * add index and valueproposition * Template for the ADRs We adapted an ADR template we found online. Some changes are unique to our template: We are using the date as prefix in the ADR folder for natural ordering. Also, we require a link to the corresponding ADR in deprecated and superseded ADRs. * add operator as submodule * add architecture and local k3d setup * Template for the ADRs (#6) We adapted an ADR template we found online. Some changes are unique to our template: We are using the date as prefix in the ADR folder for natural ordering. Also, we require a link to the corresponding ADR in deprecated and superseded ADRs. 
* public vs private ADR Co-authored-by: Thomas Weber * testing for gpg signing * accepted public-vs-private-data ADR * proposed branch-naming ADR * fix: use the port of the devel (#7) * Start CICD ADR * test signed commit * Fix commit sign * Finish CICD ADR * Some small changes to cicd ADR * Start lecture 9 readme Co-authored-by: Naexon * finish lecture 9 Co-authored-by: Thomas Weber --------- Co-authored-by: Naexon Co-authored-by: Naexon <75274749+Naexon@users.noreply.github.com> Co-authored-by: Thomas Weber Co-authored-by: Daniel Hofstätter Co-authored-by: Thomas Weber Co-authored-by: Naexon --- .gitmodules | 8 + README.md | 16 +- docs/Index.md | 5 + docs/ValueProposition.md | 91 ++++++ docs/adrs/.gitignore | 0 .../adrs/2023-03-16-public-vs-private-data.md | 18 ++ docs/adrs/2023-03-23-branch-naming.md | 25 ++ docs/adrs/2023-03-23-cicd-solution.md | 19 ++ docs/adrs/2023-03-DD-regsitry-solution.md | 26 ++ .../2023-03-DD-storage-bucket-solution.md | 26 ++ docs/adrs/YYYY-MM-DD-template.md | 26 ++ docs/drawio/architecture_overview.drawio | 294 ++++++++++++++++++ docs/images/.gitignore | 0 execDAT-CLI | 1 + execDAT-operator | 1 + k3d-dev.yaml | 47 +++ lecture9/README.md | 120 +++++++ 17 files changed, 722 insertions(+), 1 deletion(-) create mode 100644 .gitmodules create mode 100644 docs/Index.md create mode 100644 docs/ValueProposition.md create mode 100644 docs/adrs/.gitignore create mode 100644 docs/adrs/2023-03-16-public-vs-private-data.md create mode 100644 docs/adrs/2023-03-23-branch-naming.md create mode 100644 docs/adrs/2023-03-23-cicd-solution.md create mode 100644 docs/adrs/2023-03-DD-regsitry-solution.md create mode 100644 docs/adrs/2023-03-DD-storage-bucket-solution.md create mode 100644 docs/adrs/YYYY-MM-DD-template.md create mode 100644 docs/drawio/architecture_overview.drawio create mode 100644 docs/images/.gitignore create mode 160000 execDAT-CLI create mode 160000 execDAT-operator create mode 100644 k3d-dev.yaml create mode 100644 lecture9/README.md 
diff --git a/.gitmodules b/.gitmodules new file mode 100644 index 0000000..1bfe593 --- /dev/null +++ b/.gitmodules @@ -0,0 +1,8 @@ +[submodule "execDAT-CLI"] + path = execDAT-CLI + url = git@github.com:AustrianDataLAB/execDAT-CLI.git + branch = main +[submodule "execDAT-operator"] + path = execDAT-operator + url = git@github.com:AustrianDataLAB/execDAT-operator.git + branch = main diff --git a/README.md b/README.md index 622d3d9..8a2c0e7 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,16 @@ # execDAT -execDAT - remote code execution for research + +execDAT - remote code execution for research + +## Getting Started + +### Prerequisites + +* k3d +* docker + +### Start k3d cluster + +```shell +k3d cluster create -c k3d-dev.yaml +``` diff --git a/docs/Index.md b/docs/Index.md new file mode 100644 index 0000000..f884d33 --- /dev/null +++ b/docs/Index.md @@ -0,0 +1,5 @@ +# ExecDat + +Table of Contents + +- [Value Proposition](./ValueProposition.md) diff --git a/docs/ValueProposition.md b/docs/ValueProposition.md new file mode 100644 index 0000000..09dc035 --- /dev/null +++ b/docs/ValueProposition.md @@ -0,0 +1,91 @@ +# ExecDat - Value Proposition + +## What is the core value being generated? + +The goal of this project is to provide researchers with an easy and efficient way to execute and verify scientific evaluations, regardless of whether the required data and code is local or remote. + +This will be achieved through the development of a user-friendly tool, which could take the form of a CLI tool, kubectl plugin, or API endpoint. The tool will enable easy remote execution of code that requires access to remote or local datasets, simplifying the research process and reducing the technical barriers to entry. + +## Team + +### Project owner / Deputy owner + +DAT Team + +### Team members + +Daniel Hofstätter, Alexander Woda and Thomas Weber + +## Problem Space + +Why are we doing this? How do we judge success? 
+ +### Problem statement + +Researchers face difficulties executing code that requires access to remote or local datasets. + +Specifically, running scientific evaluations on those datasets exclusively on local machines can be problematic, because users face large datasets and depend on, for example, a particular operating system or hardware. Furthermore, the current coupling of code to local hardware limits parallel execution, resulting in long evaluation and iteration times. + +### Impact of this problem + +This problem can slow down the progress of research and create barriers to entry for researchers with limited technical expertise. The manual setup and management of research environments can be time-consuming, distracting, and prone to errors. This can limit the ability of researchers to explore and analyze data and, ultimately, hinder the development of new scientific insights and breakthroughs. The impact is especially significant in fields such as data science and machine learning, where access to large and complex datasets is crucial for research. + +For example, imagine a scientific paper has been published, or is about to be, and reviewers want to verify its results, maybe even on different datasets. Downloading gigabytes of data or waiting through hours of runtime on limited hardware slows down the review process. + +### Who is the customer / target audience + +The target audience for the proposed software tool is researchers and scientists who require access to remote or local datasets for their research. This includes researchers in fields such as data science, machine learning, and other areas that require extensive data analysis. + +For example: + +Everyone interested in research, but with an initial scope limited to universities (professors, students, etc.) 
+Universities to host our service and provide access to staff +Research teams at any organization + +### Criteria for Success + +We provide simplicity of execution, reusability of environments, provable validity of results and asynchronicity in the evaluation process. Our solution is to create a user-friendly software tool that simplifies the process, reduces time and effort, and allows researchers to focus on their research questions. + +According to these goals, we define the following criteria: + +Usability: One simple function call should be enough. +Scalability: Multiple users should be able to run evaluations in parallel. +Flexibility: The tool should support multiple languages and a variety of operating systems. +Repeatability: Different users should get the same results for the same evaluation. + +## MVP + +### What needs to be true in order for a prototype to be ready for release? + +We can ship an MVP to researchers and scientists as soon as we have the following MUST-HAVEs: + +#### Functional MUST-HAVEs + +Remote code execution: The tool enables remote execution of code that requires access to remote datasets. + +Parallel execution support: The tool supports parallel execution of scientific evaluations to reduce evaluation and iteration times. + +Result size: Result files of up to one gigabyte are supported. +User interaction: Only one CLI command and one config file are needed to run a job. +E.g., "execDAT " or "kubectl apply -f " + +#### Non-Functional MUST-HAVEs + +Flexibility: We support at least two different environment configurations. +Scalability: The tool scales to at least two users, each having at least two jobs running. +Validity of results: Two users executing with the same configuration file get the same result. + +### What crucial factors are we missing? + +Definition of work packages +Technical Overview Diagram +Cluster Access + +### What is the key question we would ask to understand if we are on the right track? 
+ +Do we simplify the research process? +In a side-by-side comparison, do users prefer our service over running the task locally on their own hardware? + +### Who are the alpha testers that we can use for validating our assumptions? + +DAT Team diff --git a/docs/adrs/.gitignore b/docs/adrs/.gitignore new file mode 100644 index 0000000..e69de29 diff --git a/docs/adrs/2023-03-16-public-vs-private-data.md b/docs/adrs/2023-03-16-public-vs-private-data.md new file mode 100644 index 0000000..b10e381 --- /dev/null +++ b/docs/adrs/2023-03-16-public-vs-private-data.md @@ -0,0 +1,18 @@ +# Decide if source code and data of users can be private. + +Date: 2023-03-16 + +## Status +__ACCEPTED__ + +## Context + +The jobs need to access the data and source code of the user in order to create the image and run the task. Private repositories need additional user authentication whereas public ones don't. + +## Decision + +For now we only allow public code repositories and data sources. This means that the code and data of the user are public. This is the easiest way to implement the jobs. We can always change this later. + +## Consequences + +This means that the user has to make the code and data public. This is not a problem for the user, because the user wants to publish the code and data anyway. The user can always make the code and data private later. \ No newline at end of file diff --git a/docs/adrs/2023-03-23-branch-naming.md b/docs/adrs/2023-03-23-branch-naming.md new file mode 100644 index 0000000..9a577f3 --- /dev/null +++ b/docs/adrs/2023-03-23-branch-naming.md @@ -0,0 +1,25 @@ +# Branch Naming + +Date: 2023-03-23 + +## Status +__PROPOSED__ + +## Context +We want a unified naming scheme for branches. So far, no concrete scheme has been decided on, and we discussed the `feature/`, `features/` and `issue/` prefixes for branches other than `dev` and `main`. 
+ +## Decision +We decided on the following names: +* `main` for the main branch +* `dev` for the development branch +* `feature/` for all branches that implement a new functionality or feature +* `issue/` for all branches that are concerned with a bug fix or issue +* `testing/` for all branches that fit neither `feature/` nor `issue/` + +We researched pre-commit hooks to enforce this. However, they would require a local installation of the CLI tool, and we do not want the added tool requirements and complexity. + +Instead we will use the __branch protection rules__ to pattern match all other names and lock the corresponding branches. This should provide some form of enforcement. + +## Consequences +* developers need to adhere to the naming scheme for branches +* tighter control over the branch protection rules because we only have a small set of legal names \ No newline at end of file diff --git a/docs/adrs/2023-03-23-cicd-solution.md b/docs/adrs/2023-03-23-cicd-solution.md new file mode 100644 index 0000000..03da31d --- /dev/null +++ b/docs/adrs/2023-03-23-cicd-solution.md @@ -0,0 +1,19 @@ +# Title + +Date: 2023-03-23 + +## Status + +__PROPOSED__ + +## Context + +We need to decide on a CI/CD solution for our project, so we can automate certain tasks, such as building, testing and releasing. + +## Decision + +Some choices for our CI/CD platform would be GitHub Actions, Tekton, Jenkins or Argo CD. Solutions like Tekton or Argo CD are built on top of Kubernetes, are cloud-native and platform agnostic. GitHub Actions workflows are much simpler, have predefined workflow steps and only require a YAML file for configuration. We decided to use GitHub Actions workflows because we already use GitHub for other related tasks, such as branch naming and protection rules, and therefore we have all of our configuration in one place. Additionally, GitHub Actions is much easier to set up, and there are many existing YAML configurations we can build on. 
+ +## Consequences + +By choosing GitHub Actions workflows over running custom workflows in a Kubernetes environment with one of the other solutions, we get a much simpler setup. But we are also more limited in our possibilities, as a custom Kubernetes cluster offers more options. diff --git a/docs/adrs/2023-03-DD-regsitry-solution.md b/docs/adrs/2023-03-DD-regsitry-solution.md new file mode 100644 index 0000000..2c75292 --- /dev/null +++ b/docs/adrs/2023-03-DD-regsitry-solution.md @@ -0,0 +1,26 @@ +# Title + +Date: YYYY-MM-DD + +## Status + +What is the status of the ADR? + +Possible options: +* __PROPOSED__ +* __ACCEPTED__ +* __REJECTED__ +* __DEPRECATED__ (include reference to the superseding ADR) +* __SUPERSEDED__ (include reference to the deprecating ADR) + +## Context + +What is the context of this ADR? What is the issue that we are seeing? What is motivating this decision or change? + +## Decision + +What is the change we are proposing? What do we plan on doing to solve the issue? + +## Consequences + +What are the consequences of the change? What will be more difficult? What will be easier? \ No newline at end of file diff --git a/docs/adrs/2023-03-DD-storage-bucket-solution.md b/docs/adrs/2023-03-DD-storage-bucket-solution.md new file mode 100644 index 0000000..2c75292 --- /dev/null +++ b/docs/adrs/2023-03-DD-storage-bucket-solution.md @@ -0,0 +1,26 @@ +# Title + +Date: YYYY-MM-DD + +## Status + +What is the status of the ADR? + +Possible options: +* __PROPOSED__ +* __ACCEPTED__ +* __REJECTED__ +* __DEPRECATED__ (include reference to the superseding ADR) +* __SUPERSEDED__ (include reference to the deprecating ADR) + +## Context + +What is the context of this ADR? What is the issue that we are seeing? What is motivating this decision or change? + +## Decision + +What is the change we are proposing? What do we plan on doing to solve the issue? + +## Consequences + +What are the consequences of the change? What will be more difficult? What will be easier? 
\ No newline at end of file diff --git a/docs/adrs/YYYY-MM-DD-template.md b/docs/adrs/YYYY-MM-DD-template.md new file mode 100644 index 0000000..2c75292 --- /dev/null +++ b/docs/adrs/YYYY-MM-DD-template.md @@ -0,0 +1,26 @@ +# Title + +Date: YYYY-MM-DD + +## Status + +What is the status of the ADR? + +Possible options: +* __PROPOSED__ +* __ACCEPTED__ +* __REJECTED__ +* __DEPRECATED__ (include reference to the superseding ADR) +* __SUPERSEDED__ (include reference to the deprecating ADR) + +## Context + +What is the context of this ADR? What is the issue that we are seeing? What is motivating this decision or change? + +## Decision + +What is the change we are proposing? What do we plan on doing to solve the issue? + +## Consequences + +What are the consequences of the change? What will be more difficult? What will be easier? \ No newline at end of file diff --git a/docs/drawio/architecture_overview.drawio b/docs/drawio/architecture_overview.drawio new file mode 100644 index 0000000..2b36fa8 --- /dev/null +++ b/docs/drawio/architecture_overview.drawio @@ -0,0 +1,294 @@ +[draw.io diagram markup for the architecture overview, 294 lines] diff --git a/docs/images/.gitignore b/docs/images/.gitignore new file mode 100644 index 0000000..e69de29 diff --git a/execDAT-CLI b/execDAT-CLI new file mode 160000 index 0000000..84dd8f5 --- /dev/null +++ b/execDAT-CLI @@ -0,0 +1 @@ +Subproject commit 84dd8f5b66d070aefa88eab195a7aa7fb0cf24cb diff --git a/execDAT-operator 
b/execDAT-operator new file mode 160000 index 0000000..75ee221 --- /dev/null +++ b/execDAT-operator @@ -0,0 +1 @@ +Subproject commit 75ee221e70b73d4857c420e8de590ca57c3e4081 diff --git a/k3d-dev.yaml b/k3d-dev.yaml new file mode 100644 index 0000000..bd73196 --- /dev/null +++ b/k3d-dev.yaml @@ -0,0 +1,47 @@ +--- +apiVersion: k3d.io/v1alpha4 +kind: Simple +metadata: + name: execdat-dev +servers: 1 +agents: 0 +image: docker.io/rancher/k3s:v1.25.6-k3s1 + +# kubeAPI: # same as `--api-port myhost.my.domain:6445` (where the name would resolve to 127.0.0.1) +# host: "myhost.my.domain" # important for the `server` setting in the kubeconfig +# hostIP: "127.0.0.1" # where the Kubernetes API will be listening on +# hostPort: "6445" # where the Kubernetes API listening port will be mapped to on your host system +network: k3d-network # same as `--network k3d-network` +ports: + - port: 6699:80 # same as `--port '6699:80@loadbalancer'` + nodeFilters: + - loadbalancer + +# registries: # define how registries should be created or used +# create: # creates a default registry to be used with the cluster; same as `--registry-create registry.localhost` +# name: registry.localhost +# host: "0.0.0.0" +# hostPort: "5000" +# proxy: # omit this to have a "normal" registry, set this to create a registry proxy (pull-through cache) +# remoteURL: https://registry-1.docker.io # mirror the DockerHub registry +# username: "" # unauthenticated +# password: "" # unauthenticated +# volumes: +# - /some/path:/var/lib/registry # persist registry data locally +# use: +# - k3d-myotherregistry:5000 # some other k3d-managed registry; same as `--registry-use 'k3d-myotherregistry:5000'` +# config: | # define contents of the `registries.yaml` file (or reference a file); same as `--registry-config /path/to/config.yaml` +# mirrors: +# "my.company.registry": +# endpoint: +# - http://my.company.registry:5000 + +options: + k3s: # options passed on to K3s itself + extraArgs: # additional arguments passed to the 
`k3s server|agent` command; same as `--k3s-arg` + - arg: "--tls-san=k3d.localhost" + nodeFilters: + - server:* + kubeconfig: + updateDefaultKubeconfig: true # add new cluster to your default Kubeconfig; same as `--kubeconfig-update-default` (default: true) + switchCurrentContext: true # also set current-context to the new cluster's context; same as `--kubeconfig-switch-context` (default: true) diff --git a/lecture9/README.md b/lecture9/README.md new file mode 100644 index 0000000..ef02a19 --- /dev/null +++ b/lecture9/README.md @@ -0,0 +1,120 @@ +# Lecture 9 + +**- implemented adding at least one environment value and prove it is being read by the application** + +Go to `kubernetes/deployments/pacman-deployment.yaml` and add an environment variable to the `env:` key. E.g.: + +``` +- name: MY_ENV_VAR + value: "Lecture9" +``` + +Apply the changes with `kubectl apply -f ` +You can check the environment variable of your deployed pod with this command +`kubectl exec -- printenv | grep MY_ENV_VAR` + + +**- discuss for what env should be used (think about the 12-factor)** + +Configs that varry significantly between different deploys (e.g. resource handles, credentials, canonical hostnames in DNS records), should be separated from code in order to be flexible and avoid information leaks. E.g. config files have the problem that they can accidently checked into version control and potentially leak credentials for external services. + +Environment variables can easily be changed between separate deployments without changing the code, centralize configurations for an app and allow for grouping into different environments. + +**- delete or modify mongo's pvc and explain what happens (check the pv)** + +When we `kubectl delete pvc `, the pvc goes from status BOUND into TERMINATING. 
This is due to [Storage Object in Use Protection](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#storage-object-in-use-protection), which postpones the deletion until the pvc is no longer in use by the pod, to protect from accidental data loss. You can see if a pvc or pv is protected when you let kubernetes describe it and see `kubernetes.io/pvc-protection` or `kubernetes.io/pv-protection` in the finalizers. + +**- explain the difference between a pv and a pvc** + +A PV is an independend resouce in the cluster that represents some type of storage. It has a separate lifecycle from any pod but can be bound to one using a PVC. + +A PVC represents a request for storage in the cluster. PVC configurations are applied to a pod. Kubernetes then searches for any PVs that satisfy the requirements in the PVC and binds a matching PV to the pod. + +**- inspect mongodb contents without using any ingress to the mongo-pod: write down how you achieved that** + +Directly via the mongo shell from inside the mongodb pod: +```bash +kubectl exec -it deployment/mongo -- bash -c 'mongo -u root -p $MONGODB_ROOT_PASSWORD' +kubectl exec -it deployment/mongo -- bash -c 'mongo -u $MONGODB_USERNAME -p $MONGODB_PASSWORD --authenticationDatabase $MONGODB_DATABASE' +``` +Export the credentials from the secret and connect to the mongo pod via port forwarding: +```bash +kubectl get secret mongodb-users-secret -o go-template='{{range $k,$v := .data}}{{printf "export %s=" $k}}{{if not $v}}{{$v}}{{else}}{{$v | base64decode}}{{end}}{{"\n"}}{{end}}' +kubectl port-forward svc/mongo 27017:27017 +``` + +**- try to alter the secret , explain what happened** + +If we update just the secret directly in k8s, nothing changes. This is because the secret is exposed as an environment variable in the pod and the pod needs to be restarted for the changes to take effect. We can do this by deleting the pod, which will be recreated by the deployment. 
+ +**- modify the replication factor while altering the deployment strategy, what happens ? (did this make sense?, discuss)** + +Recreate: +- all pods are deleted and then recreated. There is a short downtime where no pod is available. + +RollingUpdate: +- the pods are updated sequentially, depending on the max surge factor (maximum number of Pods that can be created over the desired number of Pods) and the max unavailable pods (max unavailable pods at a time) +- both factors can be absolute numbers or percentages of the desired number of pods + +As the default value was 25% for both factors and the number of replicas was 3, the pods were updated one by one. +But as we set the replicas to 4 we had at one point 2 pods running the old version and 2 pods running the new version. + +**Scenarios:** + +- 3 replicas, MaxSurge: 1, MaxUnavailable: 0 - one after another: + - 1 new pod gets created before 1 old pod is deleted + +- 3 replicas, MaxSurge: 0, MaxUnavailable: 1 - one after another: + - 1 old pod gets deleted before 1 new pod is created + + +**- redeploy the application after some minor change, alter the deployment strategy , decide which deployment strategy is best for mongodb vs which is best for pacman ? Why did you make this choice?** + +In this case we cannot scale mongodb horizontally, as it is a single node database. So we need to use the recreate strategy, as the rolling update strategy would not work. (There is a replicaset solution for mongodb, but this is not covered in this course) + +For pacman we can scale horizontally, so we can use the rolling update strategy. + +**- explain the difference between liveness health and readiness probe, modify the manifests and show clearly how they behave. Is it like you expected? Discuss how having them (or some of them) is differently important for the mongodb vs pacman deployment strategy (see point above)** + +Liveness probe: +Periodic probe of container liveness. Container will be restarted if the probe fails. 
This is useful in situations where the container may become stuck in a non-responsive state but the underlying infrastructure is still running. + +Readiness probe: +Periodic probe of container service readiness. Container will be removed from service endpoints if the probe fails, and traffic will be redirected to other available containers This is useful in situations where a container needs time to start up properly, or if a container needs to perform some initial setup before it can start accepting traffic. + +If the deployment strategy is set to recreate (as with mongodb) and the readiness probe is set to `initialDelaySeconds: 60`, the pod will be in status `Running` but not ready for 60 seconds. This is because the readiness probe is only executed after the initial delay. This would mean that the pod is at least not ready for 60 seconds after the deployment. + +In case of a rolling update strategy (as with pacman), the update just takes longer. But the other pods are still ready and can serve traffic if the updating pod can not get ready. So liveness is more important for pacman than mongodb here. + +The difference between pacman and mongodb are also exec vs http probes. The exec probe just executes a command inside the container and checks the exit code. The http probe executes a http request against a specified endpoint and checks the response code. +Furthermore the delays differ between the two deployments. + +**- what use case do you see for a post start hook for a database deployment?** + +- creating databases and users respectively +- filling up databases with mock data, e.g., for testing +- migrating data from another database + +**- last but not least: make the pod die from a OOM (out of memory) by setting resource limits and resource requests . Discuss which setting does what and how to calculate the memory limit** + +```yaml +spec: + ... 
+ resources: + limits: + memory: 100Mi + requests: + cpu: 10m + memory: 10Mi +``` + +```bash +# this will write 200M to memory and trigger an OOM kill (137) +kubectl exec -it deployment/pacman -- bash -c "cat /dev/zero | head -c 200M | tail" +``` + +The memory limit is the maximum amount of memory that a container can use. If the container tries to use more memory than the limit, the container is killed. The memory limit is enforced by the kernel. + +To calculate the memory limit, we need to take into account the memory usage of the container and the memory usage of the processes running inside the container. The memory usage of the processes running inside the container is the most important factor. The memory usage of the container itself is usually very small. + +Monitoring the memory usage of the processes running inside the container is not easy. We can use the `top` command to get a rough estimate of the memory usage of the processes running inside the container. But this is not very accurate. The `top` command shows the memory usage of the processes running inside the container at a certain point in time. The memory usage of the processes running inside the container can change over time. So we need to monitor the memory usage of the processes running inside the container over a longer period of time. This is not easy to do. 
\ No newline at end of file From 887314edf3c500e26ef8a787ccfb719e9b34fef7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Daniel=20Hofst=C3=A4tter?= Date: Thu, 1 Jun 2023 15:47:36 +0200 Subject: [PATCH 2/3] updtate readme --- README.md | 13 +++++ lecture9/README.md | 120 --------------------------------------------- 2 files changed, 13 insertions(+), 120 deletions(-) delete mode 100644 lecture9/README.md diff --git a/README.md b/README.md index 8a2c0e7..7b1ddf9 100644 --- a/README.md +++ b/README.md @@ -2,6 +2,19 @@ execDAT - remote code execution for research +## TL;DR + +We enable researchers to run their source code on remote execution environments through a single interface. + +Normally this would require a researcher to complete the following steps: +- build a container image with their source code +- push the image to a registry +- push the data to somewhere the cluster has access to +- create k8s manifests to deploy everything in a cluster +- publish the results somewhere to access afterwards + +With our application, these steps are simplified to writing a single standardized development environment configuration file and applying it using our provided CLI tool. + ## Getting Started ### Prerequisites diff --git a/lecture9/README.md b/lecture9/README.md deleted file mode 100644 index ef02a19..0000000 --- a/lecture9/README.md +++ /dev/null @@ -1,120 +0,0 @@ -# Lecture 9 - -**- implemented adding at least one environment value and prove it is being read by the application** - -Go to `kubernetes/deployments/pacman-deployment.yaml` and add an environment variable to the `env:` key. E.g.: - -``` -- name: MY_ENV_VAR - value: "Lecture9" -``` - -Apply the changes with `kubectl apply -f ` -You can check the environment variable of your deployed pod with this command -`kubectl exec -- printenv | grep MY_ENV_VAR` - - -**- discuss for what env should be used (think about the 12-factor)** - -Configs that varry significantly between different deploys (e.g. 
resource handles, credentials, canonical hostnames in DNS records), should be separated from code in order to be flexible and avoid information leaks. E.g. config files have the problem that they can accidently checked into version control and potentially leak credentials for external services. - -Environment variables can easily be changed between separate deployments without changing the code, centralize configurations for an app and allow for grouping into different environments. - -**- delete or modify mongo's pvc and explain what happens (check the pv)** - -When we `kubectl delete pvc `, the pvc goes from status BOUND into TERMINATING. This is due to [Storage Object in Use Protection](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#storage-object-in-use-protection), which postpones the deletion until the pvc is no longer in use by the pod, to protect from accidental data loss. You can see if a pvc or pv is protected when you let kubernetes describe it and see `kubernetes.io/pvc-protection` or `kubernetes.io/pv-protection` in the finalizers. - -**- explain the difference between a pv and a pvc** - -A PV is an independend resouce in the cluster that represents some type of storage. It has a separate lifecycle from any pod but can be bound to one using a PVC. - -A PVC represents a request for storage in the cluster. PVC configurations are applied to a pod. Kubernetes then searches for any PVs that satisfy the requirements in the PVC and binds a matching PV to the pod. 
- -**- inspect mongodb contents without using any ingress to the mongo-pod: write down how you achieved that** - -Directly via the mongo shell from inside the mongodb pod: -```bash -kubectl exec -it deployment/mongo -- bash -c 'mongo -u root -p $MONGODB_ROOT_PASSWORD' -kubectl exec -it deployment/mongo -- bash -c 'mongo -u $MONGODB_USERNAME -p $MONGODB_PASSWORD --authenticationDatabase $MONGODB_DATABASE' -``` -Export the credentials from the secret and connect to the mongo pod via port forwarding: -```bash -kubectl get secret mongodb-users-secret -o go-template='{{range $k,$v := .data}}{{printf "export %s=" $k}}{{if not $v}}{{$v}}{{else}}{{$v | base64decode}}{{end}}{{"\n"}}{{end}}' -kubectl port-forward svc/mongo 27017:27017 -``` - -**- try to alter the secret , explain what happened** - -If we update just the secret directly in k8s, nothing changes. This is because the secret is exposed as an environment variable in the pod and the pod needs to be restarted for the changes to take effect. We can do this by deleting the pod, which will be recreated by the deployment. - -**- modify the replication factor while altering the deployment strategy, what happens ? (did this make sense?, discuss)** - -Recreate: -- all pods are deleted and then recreated. There is a short downtime where no pod is available. - -RollingUpdate: -- the pods are updated sequentially, depending on the max surge factor (maximum number of Pods that can be created over the desired number of Pods) and the max unavailable pods (max unavailable pods at a time) -- both factors can be absolute numbers or percentages of the desired number of pods - -As the default value was 25% for both factors and the number of replicas was 3, the pods were updated one by one. -But as we set the replicas to 4 we had at one point 2 pods running the old version and 2 pods running the new version. 
-
-**Scenarios:**
-
-- 3 replicas, MaxSurge: 1, MaxUnavailable: 0 - one after another:
-  - 1 new pod gets created before 1 old pod is deleted
-
-- 3 replicas, MaxSurge: 0, MaxUnavailable: 1 - one after another:
-  - 1 old pod gets deleted before 1 new pod is created
-
-
-**- redeploy the application after some minor change, alter the deployment strategy, decide which deployment strategy is best for mongodb vs which is best for pacman? Why did you make this choice?**
-
-In this case we cannot scale mongodb horizontally, as it is a single-node database. So we need to use the Recreate strategy, as the RollingUpdate strategy would not work. (There is a replica set solution for mongodb, but it is not covered in this course.)
-
-For pacman we can scale horizontally, so we can use the RollingUpdate strategy.
-
-**- explain the difference between liveness health and readiness probe, modify the manifests and show clearly how they behave. Is it like you expected? Discuss how having them (or some of them) is differently important for the mongodb vs pacman deployment strategy (see point above)**
-
-Liveness probe:
-Periodic probe of container liveness. The container is restarted if the probe fails. This is useful in situations where the container may become stuck in a non-responsive state while the underlying infrastructure is still running.
-
-Readiness probe:
-Periodic probe of container service readiness. The container is removed from the service endpoints if the probe fails, and traffic is redirected to other available containers. This is useful in situations where a container needs time to start up properly, or needs to perform some initial setup before it can start accepting traffic.
-
-If the deployment strategy is set to Recreate (as with mongodb) and the readiness probe is set to `initialDelaySeconds: 60`, the pod will be in status `Running` but not ready for 60 seconds. This is because the readiness probe is only executed after the initial delay. This means that the pod is not ready for at least 60 seconds after the deployment.
-
-With a rolling update strategy (as with pacman), the update just takes longer. But the other pods are still ready and can serve traffic if the updating pod cannot become ready. So liveness is more important for pacman than for mongodb here.
-
-Another difference between pacman and mongodb is exec vs HTTP probes. The exec probe executes a command inside the container and checks the exit code. The HTTP probe sends an HTTP request to a specified endpoint and checks the response code.
-Furthermore, the delays differ between the two deployments.
-
-**- what use case do you see for a post start hook for a database deployment?**
-
-- creating databases and users
-- filling databases with mock data, e.g., for testing
-- migrating data from another database
-
-**- last but not least: make the pod die from a OOM (out of memory) by setting resource limits and resource requests. Discuss which setting does what and how to calculate the memory limit**
-
-```yaml
-spec:
-  ...
-    resources:
-      limits:
-        memory: 100Mi
-      requests:
-        cpu: 10m
-        memory: 10Mi
-```
-
-```bash
-# this will write 200M to memory and trigger an OOM kill (137)
-kubectl exec -it deployment/pacman -- bash -c "cat /dev/zero | head -c 200M | tail"
-```
-
-The memory limit is the maximum amount of memory that a container can use. If the container tries to use more memory than the limit, the container is killed. The memory limit is enforced by the kernel.
-
-To calculate the memory limit, we need to take into account the memory used by the processes running inside the container, which is the most important factor. The overhead of the container itself is usually very small.
-
-Monitoring that memory usage is not easy. The `top` command gives a rough estimate, but it only shows a snapshot at a single point in time. Because the memory usage of the processes changes over time, it has to be monitored over a longer period, which is harder to do.
\ No newline at end of file

From b8dfb60fce6deb28aca4a17f71d73a597b430365 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Daniel=20Hofst=C3=A4tter?=
Date: Thu, 1 Jun 2023 15:52:59 +0200
Subject: [PATCH 3/3] update readmea again

---
 README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.md b/README.md
index 7b1ddf9..fe64918 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,9 @@ Normally this would require a researcher to complete the following steps:
 
 With our application, these steps are simplified to writing a single standardized development environment configuration file and applying it using our provided CLI tool.
 
+## Architecture
+![Architecture](docs/images/architecture_overview.png)
+
 ## Getting Started
 
 ### Prerequisites