
Refactor fetching of wathola receiver's delivery report using special batch Job #4460

Merged
knative-prow-robot merged 6 commits into
knative:masterfrom
cardil:feature/wathola-fetch-report-by-job
Nov 6, 2020

@cardil
Contributor

@cardil cardil commented Nov 4, 2020

This change targets the problem of how to get the report from the cluster. Clusters may have different networking setups, and it might not be possible to make an HTTP request directly from outside the cluster.

The previous approach tried to guess an external address of the cluster. That definitely fails on OpenShift deployed on AWS.

This approach deploys a special Job that, running inside the cluster, can download the report and print it to its logs. The test client can then fetch the logs of the completed Job, parse them, replay the logs, and process the report further.

Fixes #3175
Closes #4430

Proposed Changes

  • Use a K8s Job to fetch the Wathola report via the Job pod's logs
  • Remove the guessing of the node's external address

@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 4, 2020
@google-cla google-cla Bot added the cla: yes Indicates the PR's author has signed the CLA. label Nov 4, 2020
@knative-prow-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@knative-prow-robot knative-prow-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. area/test-and-release Test infrastructure, tests or release labels Nov 4, 2020
@codecov

codecov Bot commented Nov 4, 2020

Codecov Report

Merging #4460 into master will increase coverage by 0.07%.
The diff coverage is n/a.


```diff
@@            Coverage Diff             @@
##           master    #4460      +/-   ##
==========================================
+ Coverage   81.19%   81.27%   +0.07%
==========================================
  Files         281      282       +1
  Lines        7981     8004      +23
==========================================
+ Hits         6480     6505      +25
  Misses       1112     1112
+ Partials      389      387       -2
```
| Impacted Files | Coverage Δ |
|---|---|
| pkg/kncloudevents/message_sender.go | 78.00% <0.00%> (-1.67%) ⬇️ |
| pkg/channel/message_dispatcher.go | 77.31% <0.00%> (-0.24%) ⬇️ |
| ...econciler/inmemorychannel/dispatcher/controller.go | 78.26% <0.00%> (ø) |
| pkg/kncloudevents/http_client.go | 100.00% <0.00%> (ø) |
| ...iler/inmemorychannel/dispatcher/inmemorychannel.go | 89.39% <0.00%> (+0.86%) ⬆️ |
| pkg/mtbroker/filter/filter_handler.go | 79.51% <0.00%> (+0.99%) ⬆️ |

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b1706b6...8294d9f.

@cardil cardil force-pushed the feature/wathola-fetch-report-by-job branch 2 times, most recently from 12dbfe2 to 5e31ffb November 4, 2020 17:56
@cardil cardil force-pushed the feature/wathola-fetch-report-by-job branch from 5e31ffb to a4124eb November 4, 2020 18:06
@cardil cardil changed the title Reimplementing fetching of wathola report with K8s job Refactor fetching of wathola report with K8s job Nov 4, 2020
@cardil cardil marked this pull request as ready for review November 4, 2020 21:49
@knative-prow-robot knative-prow-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 4, 2020
@cardil cardil changed the title Refactor fetching of wathola report with K8s job Refactor fetching of wathola receiver's delivery report using special batch Job Nov 4, 2020
Contributor

@devguyio devguyio left a comment

/assign
Thanks for doing this @cardil! Initial review with small comments on the README language. Hope they make sense. Reviewing the rest now.

Comment thread test/upgrade/README.md Outdated
Comment thread test/upgrade/README.md Outdated
Comment thread test/upgrade/README.md Outdated
@zhongduo
Contributor

zhongduo commented Nov 5, 2020

/assign

Thanks for doing this. This is close to the question I raised in the previous PR about using a "curl" pod to make the report request, like the one in the Knative getting-started docs. I'm not sure that can do the same as the fetcher. Another possible way is to simply have the receiver pod send out the report once it gets the last event. Do you think that is feasible?

Comment thread test/upgrade/prober/wathola/fetcher/operations.go
Comment thread test/upgrade/prober/wathola/fetcher/operations.go
Comment thread test/upgrade/prober/verify.go Outdated
Comment thread test/upgrade/prober/verify.go
Comment thread test/upgrade/prober/verify.go Outdated
Comment thread test/upgrade/prober/verify.go Outdated
@cardil
Contributor Author

cardil commented Nov 5, 2020

Re @zhongduo:

I don't think that's possible.

First of all, there is no guarantee that the finished event will reach the receiver. I saw that happen when using an interval of 2ms.

Secondly, how would the receiver send the message? It's unlikely that the test runner, outside of the k8s cluster, is reachable over the network. The receiver might create a ConfigMap with the response, but then it would need to have a kubeconfig injected. At that point, it could just as well notify the test runner by creating a k8s event. That's what was proposed as the solution in the issue this addresses.

The approach in this PR is simple and doesn't require any additional kubeconfig injection.

@zhongduo
Contributor

zhongduo commented Nov 5, 2020

Thanks for the response, makes sense. I was thinking more about merging the fetcher and the receiver, so that the receiver prints the report to its log directly, the same way the fetcher does now. If we can't find that log entry, something is wrong, and it can be treated as an error.

Comment thread test/upgrade/README.md
cardil and others added 2 commits November 5, 2020 18:50
Co-authored-by: Ahmed Abdalla Abdelrehim <aabdelre@redhat.com>
@cardil cardil requested a review from devguyio November 6, 2020 12:44
@devguyio
Contributor

devguyio commented Nov 6, 2020

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 6, 2020
@AlexandraRoatis
Contributor

Thank you for the thorough explanation and documentation of the changes!

/lgtm

Member

@pierDipi pierDipi left a comment

/approve

@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cardil, devguyio, pierDipi

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 6, 2020
@knative-prow-robot knative-prow-robot merged commit daf0d0f into knative:master Nov 6, 2020
@cardil cardil deleted the feature/wathola-fetch-report-by-job branch November 6, 2020 16:19
cardil added a commit to cardil/knative-eventing that referenced this pull request Nov 18, 2020
… batch Job (knative#4460)

* Reimplementing fetching of wathola report with K8s job

This change targets the problem of how to get the report from the
cluster. Clusters may have different networking setups, and it might
not be possible to make an HTTP request directly from outside the
cluster.

The previous approach tried to guess an external address of the
cluster. That definitely fails on OpenShift deployed on AWS.

This approach deploys a special Job that, running inside the cluster,
can download the report and print it to its logs. The test client can
then fetch the logs of the completed Job, parse them, replay the logs,
and process the report further.

* Removal of unneeded external node address package

* Fixing lints & boilerplate

* spec.template.spec.restartPolicy=never

* Apply @devguyio suggestions for test/upgrade/README.md

Co-authored-by: Ahmed Abdalla Abdelrehim <aabdelre@redhat.com>

* Changes after review

Co-authored-by: Ahmed Abdalla Abdelrehim <aabdelre@redhat.com>
openshift-merge-robot pushed a commit to openshift/knative-eventing that referenced this pull request Nov 19, 2020
* Eventing upgrade tests prober fully configurable (knative#4421)

* Eventing upgrade tests prober fully configurable

* Embedding configuration structs

* Reduce a test name length to prevent DNS label too long error (knative#4442)

Having a namespace or kservice name that is too long can lead to an error like:

```
$ host wathola-receiver-test-continuous-events-propagation-with-prober-zxmkp.apps.example.org
host: 'wathola-receiver-test-continuous-events-propagation-with-prober-zxmkp.apps.example.org' is not a legal IDN name (domain label longer than 63 characters), use +noidnin
```

In this case my namespace is test-continuous-events-propagation-with-prober-zxmkp
and the Knative service name is wathola-receiver. The namespace is taken
from the Go test method name. The limit is 63 characters; in this example
the subdomain is 69 characters.

This affects OpenShift Serverless, as kservices there have a URL
format of `${ksvc.name}-${ksvc.namespace}` to enable usage of TLS
wildcard certificates.

Reducing this test method name length will help fit within this strict
limit of 63 chars.
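The length arithmetic above can be checked with a small Go sketch. It assumes the `${ksvc.name}-${ksvc.namespace}` scheme described for OpenShift Serverless and checks the resulting first DNS label against the 63-octet label limit from RFC 1035; `ksvcHostLabel` and `checkLabel` are hypothetical helpers, not part of the test suite.

```go
package main

import "fmt"

// maxDNSLabel is the RFC 1035 limit on the length of a single DNS label.
const maxDNSLabel = 63

// ksvcHostLabel builds the first DNS label of a kservice host under the
// assumed `${ksvc.name}-${ksvc.namespace}` URL scheme.
func ksvcHostLabel(name, namespace string) string {
	return name + "-" + namespace
}

// checkLabel returns an error if the label exceeds the DNS label limit.
// len counts bytes, which equals characters for ASCII names.
func checkLabel(label string) error {
	if len(label) > maxDNSLabel {
		return fmt.Errorf("label %q is %d chars, limit is %d", label, len(label), maxDNSLabel)
	}
	return nil
}

func main() {
	label := ksvcHostLabel("wathola-receiver", "test-continuous-events-propagation-with-prober-zxmkp")
	fmt.Println(len(label), checkLabel(label) != nil) // 69 true
}
```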

* Use deployment to avoid disparity in effective user (knative#4445)

On OpenShift we've observed a disparity when using pods vs deployments.
Using both of those can lead to bare pods and pods managed by a
deployment having different effective users.

That leads to differences in how wathola components read their config
file, as `~` points to different places for the sender and for the
receiver+forwarder.

This changes the code to avoid using bare pods for wathola components.
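To illustrate why the effective user matters: a `~`-prefixed config path only has meaning relative to the home directory of the user the process runs as (on Unix, Go's `os.UserHomeDir` reads `$HOME`, for instance). The Go helper below is a hypothetical sketch; the config path and the two home directories are illustrative assumptions, not values taken from wathola or OpenShift.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// expandHome resolves a leading "~" against the given home directory.
// The result depends entirely on home, so processes running as different
// effective users can resolve the same path to different files.
func expandHome(p, home string) string {
	if p == "~" {
		return home
	}
	if strings.HasPrefix(p, "~/") {
		return path.Join(home, p[2:])
	}
	return p
}

func main() {
	const cfg = "~/.config/wathola/config.toml" // hypothetical config location
	// Two pods running as different effective users see different homes.
	fmt.Println(expandHome(cfg, "/root"))         // /root/.config/wathola/config.toml
	fmt.Println(expandHome(cfg, "/home/nonroot")) // /home/nonroot/.config/wathola/config.toml
}
```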

* Refactor fetching of wathola receiver's delivery report using special batch Job (knative#4460)

Co-authored-by: Ahmed Abdalla Abdelrehim <aabdelre@redhat.com>

Development

Successfully merging this pull request may close these issues.

Use K8s etcd database to publish and fetch upgrade tests reports

6 participants