@snjypl
Contributor

@snjypl snjypl commented Dec 15, 2022

Fix #28391 manual task trigger from UI fails for k8s executor

Manually triggering a task from the UI fails with the k8s executor. The
executor.job_id is currently set to "manual", but the task instance's
queued_by_job_id field is expected to be None or an integer. As a result,
the filter query in the clear_not_launched_queued_tasks method of
kubernetes_executor fails with
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for integer: "manual".

Setting the job_id to None fixes the issue.
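
A minimal sketch of the mismatch, for illustration only: the hypothetical helper `coerce_job_id` below is not Airflow code and is not what this PR changes. It only mimics what an integer column accepts, to show why "manual" is rejected while None passes through.

```python
from __future__ import annotations


def coerce_job_id(job_id: object) -> int | None:
    """Hypothetical stand-in for the database's integer check.

    Returns the value unchanged when it is None or an int and raises
    ValueError for anything else, loosely mirroring how the query fails
    when the string "manual" is compared against an integer column.
    """
    if job_id is None:
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(job_id, int) and not isinstance(job_id, bool):
        return job_id
    raise ValueError(f"invalid input syntax for integer: {job_id!r}")


if __name__ == "__main__":
    for candidate in (None, 42, "manual"):
        try:
            print(candidate, "->", coerce_job_id(candidate))
        except ValueError as exc:
            print(candidate, "-> rejected:", exc)
```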

Fixes: #28391


@boring-cyborg boring-cyborg bot added area:CLI area:webserver Webserver related Issues labels Dec 15, 2022
@snjypl snjypl changed the title Fix manual trigger failing for k8s. Fix manual task trigger failing for k8s. Dec 15, 2022
@snjypl snjypl force-pushed the bugfix/28391-manual-trigger-fails-for-k8s branch from bf09d07 to cacec62 Compare January 6, 2023 22:09
@snjypl snjypl marked this pull request as ready for review January 7, 2023 13:46
@arjunanan6
Contributor

arjunanan6 commented Jan 9, 2023

@snjypl To test whether this works, I've patched our installation of Airflow with the changes in this PR, and unfortunately there are still problems when executing a task manually:

[2023-01-09T11:34:50.901+0000] {app.py:1741} ERROR - Exception on /run [POST]
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/airflow/.local/lib/python3.9/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/airflow/.local/lib/python3.9/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/airflow/.local/lib/python3.9/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/www/auth.py", line 47, in decorated
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/www/decorators.py", line 125, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/session.py", line 75, in wrapper
    return func(*args, session=session, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/www/views.py", line 1822, in run
    executor.start()
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py", line 529, in start
    raise AirflowException("Could not get scheduler_job_id")
airflow.exceptions.AirflowException: Could not get scheduler_job_id

@snjypl snjypl force-pushed the bugfix/28391-manual-trigger-fails-for-k8s branch from cacec62 to d2da7c4 Compare January 9, 2023 12:29
@snjypl
Contributor Author

snjypl commented Jan 9, 2023

@arjunanan6 Can you please try the latest code? I removed the job_id check.

Also, while viewing the log you might hit this issue, depending on your logging configuration: #28817
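
For context, the traceback above appears to come from a guard in the executor's start method that refuses to run without a scheduler job id. A rough sketch of that kind of guard, and of what dropping it looks like, is below. `SketchExecutor` and its `require_job_id` flag are hypothetical names used only for illustration; they do not mirror the actual Airflow classes or the exact change in this PR.

```python
from __future__ import annotations


class SketchExecutor:
    """Hypothetical executor used only to illustrate the guard."""

    def __init__(self, job_id: int | None = None, require_job_id: bool = True) -> None:
        self.job_id = job_id
        self.require_job_id = require_job_id
        self.scheduler_job_id: str | None = None

    def start(self) -> None:
        # With the guard on, a missing job id aborts start-up. A run that
        # passes job_id=None would presumably fail here.
        if self.require_job_id and self.job_id is None:
            raise RuntimeError("Could not get scheduler_job_id")
        # Without the guard, start-up proceeds and the id is left unset.
        self.scheduler_job_id = None if self.job_id is None else str(self.job_id)


if __name__ == "__main__":
    relaxed = SketchExecutor(job_id=None, require_job_id=False)
    relaxed.start()
    print("started without a job id:", relaxed.scheduler_job_id)
```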

@arjunanan6
Contributor

@snjypl I will test it later this evening. Logging-wise, we should be okay as we use remote logging (WASB).

@eladkal eladkal added this to the Airflow 2.5.1 milestone Jan 11, 2023
@snjypl snjypl force-pushed the bugfix/28391-manual-trigger-fails-for-k8s branch 2 times, most recently from ddf4336 to 44640d0 Compare January 13, 2023 19:56
@arjunanan6
Contributor

@snjypl Just got time now... I can't run your patch though, it fails to build:

#10 0.625 patching file airflow/utils/log/file_task_handler.py
#10 0.625 Hunk #1 FAILED at 215.
#10 0.625 1 out of 1 hunk FAILED -- saving rejects to file /tmp/rejects
#10 0.632 patching file airflow/utils/log/file_task_handler.py
#10 0.632 Hunk #1 FAILED at 215.
#10 0.633 1 out of 1 hunk FAILED -- saving rejects to file /tmp/rejects
#10 0.634 --- airflow/utils/log/file_task_handler.py
#10 0.634 +++ airflow/utils/log/file_task_handler.py
#10 0.634 @@ -215,7 +215,7 @@ def _read(self, ti: TaskInstance, try_number: int, metadata: dict[str, Any] | No
#10 0.634                  selector = PodGenerator.build_selector_for_k8s_executor_pod(
#10 0.634                      dag_id=ti.dag_id,
#10 0.634                      task_id=ti.task_id,
#10 0.634 -                    try_number=ti.try_number,
#10 0.634 +                    try_number=try_number,
#10 0.634                      map_index=ti.map_index,
#10 0.634                      run_id=ti.run_id,
#10 0.634                      airflow_worker=ti.queued_by_job_id,
#10 0.634 --- airflow/utils/log/file_task_handler.py
#10 0.634 +++ airflow/utils/log/file_task_handler.py
#10 0.634 @@ -215,7 +215,7 @@ def _read(self, ti: TaskInstance, try_number: int, metadata: dict[str, Any] | No
#10 0.634                  selector = PodGenerator.build_selector_for_k8s_executor_pod(
#10 0.634                      dag_id=ti.dag_id,
#10 0.634                      task_id=ti.task_id,
#10 0.634 -                    try_number=try_number,
#10 0.634 +                    try_number=ti.try_number,
#10 0.634                      map_index=ti.map_index,
#10 0.634                      run_id=ti.run_id,
#10 0.634                      airflow_worker=ti.queued_by_job_id,

@snjypl snjypl force-pushed the bugfix/28391-manual-trigger-fails-for-k8s branch 3 times, most recently from bbd5c69 to 653083a Compare January 17, 2023 20:57
@snjypl
Contributor Author

snjypl commented Jan 17, 2023

@arjunanan6 can you please try it now?

@arjunanan6
Contributor

@snjypl That solved the previous issue, and the execution is attempted now. However, manual runs still fail because the service account gets rejected from creating a pod for the execution:

[2023-01-18T07:20:26.236+0000] {kubernetes_executor.py:527} INFO - Start Kubernetes executor
[2023-01-18T07:20:26.261+0000] {kubernetes_executor.py:130} INFO - Event: and now my watch begins starting at resource_version: 0
[2023-01-18T07:20:26.395+0000] {kubernetes_executor.py:476} INFO - Found 0 queued task instances
[2023-01-18T07:20:26.438+0000] {base_executor.py:95} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'x-y-z', 'X-Y', 'scheduled__2023-01-18T07:00:00+00:00', '--ignore-all-dependencies', '--ignore-dependencies', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/airflow-dags-sap/X-Y/x-y-z.py']
[2023-01-18T07:20:26.438+0000] {base_executor.py:215} INFO - task TaskInstanceKey(dag_id='x-y-z', task_id='x-y-z', run_id='scheduled__2023-01-18T07:00:00+00:00', try_number=3, map_index=-1) is still running
[2023-01-18T07:20:26.508+0000] {kubernetes_executor.py:339} INFO - Creating kubernetes pod for job is TaskInstanceKey(dag_id='x-y-z', task_id='X-Y', run_id='scheduled__2023-01-18T07:00:00+00:00', try_number=3, map_index=-1), with pod name x-y-z-78e3092210f94420bb0e98a744969f29
[2023-01-18T07:20:26.538+0000] {kubernetes_executor.py:274} ERROR - Exception when attempting to create Namespaced Pod: {
.
.
.
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'ddc99dcf-9d70-4f88-8c7c-77f543879844', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'e7834783-2050-421a-b99e-0615f85f6e92', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'e9e2e589-5d4c-442b-8568-f7bfbdbfaafd', 'Date': 'Wed, 18 Jan 2023 07:20:26 GMT', 'Content-Length': '315'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:airflow-test-ns:airflow-test-webserver\" cannot create resource \"pods\" in API group \"\" in the namespace \"airflow-test-ns\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}

Which is strange, because this SA is allowed to create pods, and other tasks are being executed by the same service account. I trimmed out the pod definition, but nothing looks particularly out of place there. Any idea why this fails only on a manual run attempt?

@snjypl
Contributor Author

snjypl commented Jan 18, 2023

This issue is caused by airflow-webserver not having the pod-launcher-role. You can try the fix in this PR: #29012
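
One way to confirm which identity was rejected is to parse the `message` field of the API response body quoted in the log. The sketch below uses only the Python standard library. The helper name `denied_service_account` and the regular expression are illustrative assumptions, and the sample body is copied from the log above with its wrapped line rejoined.

```python
from __future__ import annotations

import json
import re

# Response body from the 403 in the log, split across raw strings for width.
BODY = (
    r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    r'"message":"pods is forbidden: User '
    r'\"system:serviceaccount:airflow-test-ns:airflow-test-webserver\" '
    r'cannot create resource \"pods\" in API group \"\" in the namespace '
    r'\"airflow-test-ns\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}'
)


def denied_service_account(body: str) -> tuple[str, str] | None:
    """Return (namespace, service_account) named in a Forbidden message.

    Hypothetical helper: it assumes the message uses the
    'User "system:serviceaccount:<namespace>:<name>"' form seen in the log.
    """
    status = json.loads(body)
    match = re.search(
        r'User "system:serviceaccount:([^:"]+):([^"]+)"', status.get("message", "")
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(denied_service_account(BODY))
```

For the body above, the account named in the message is the webserver's service account, which is consistent with the webserver lacking the pod-launcher role.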

@arjunanan6
Contributor

arjunanan6 commented Jan 18, 2023

@snjypl Tried out your patch in 29012 as well, but still no luck. I'm still getting the same forbidden error. If you want, I can move pod creation specific comments to the discussion on #29012.

@snjypl
Contributor Author

snjypl commented Jan 18, 2023

Yes @arjunanan6, I think it would be better if you could open a new issue with details about your environment, deployment method, etc. That will help in debugging and resolving the issue. We can link that issue to the other PR, #29012, and address it there.

@arjunanan6
Contributor

@snjypl A little update, I've tried out this fix in combination with #29012 locally, and still get the same forbidden error. I'll add more details on that PR since it's more relevant there.

Member

@potiuk potiuk left a comment

I think we need a lot more context in the commit: what happens here?

@snjypl snjypl force-pushed the bugfix/28391-manual-trigger-fails-for-k8s branch from 92f3ce0 to 4de0b0a Compare January 23, 2023 08:06
@snjypl
Contributor Author

snjypl commented Jan 23, 2023

I have added a more descriptive commit message.

@snjypl snjypl force-pushed the bugfix/28391-manual-trigger-fails-for-k8s branch from 4de0b0a to aa0fedf Compare January 23, 2023 13:51
@snjypl snjypl force-pushed the bugfix/28391-manual-trigger-fails-for-k8s branch from aa0fedf to dfccccd Compare January 24, 2023 11:51
@potiuk potiuk merged commit 9510043 into apache:main Jan 24, 2023
@snjypl snjypl deleted the bugfix/28391-manual-trigger-fails-for-k8s branch January 24, 2023 15:35
@pierrejeambrun pierrejeambrun added the type:bug-fix Changelog: Bug Fixes label Feb 27, 2023
pierrejeambrun pushed a commit that referenced this pull request Mar 6, 2023
(cherry picked from commit 9510043)
pierrejeambrun pushed a commit that referenced this pull request Mar 8, 2023
(cherry picked from commit 9510043)
Labels

area:CLI area:webserver Webserver related Issues type:bug-fix Changelog: Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Manual task trigger fails for kubernetes executor with psycopg2 InvalidTextRepresentation error

7 participants