Fix Unable to fetch logs from worker pod error in UI for k8s executor #28817

snjypl · 2023-01-09T22:04:40Z

[Note: I have set delete_worker_pods to False so that the pod is not deleted after completion.]

With the k8s executor, trying to view the task log from the UI gives the following error:

airflow-webserver-5bf48475c-zdjxv
*** Trying to get logs (last 100 lines) from worker pod airflow-webserver-5bf48475c-zdjxv ***

*** Unable to fetch logs from worker pod airflow-webserver-5bf48475c-zdjxv ***
('Cannot find pod for ti %s', <TaskInstance: dataset_produces_2.producing_task_2 manual__2022-12-28T21:13:27.229615+00:00 [success]>)

I think the issue is that, when calling PodGenerator.build_selector_for_k8s_executor_pod, we pass ti.try_number instead of the try_number that was passed to the _read method.

airflow/airflow/utils/log/file_task_handler.py

Line 167 in 3ececb2

    def _read(self, ti: TaskInstance, try_number: int, metadata: dict[str, Any] | None = None):

airflow/airflow/utils/log/file_task_handler.py

Lines 215 to 222 in 3ececb2

    selector = PodGenerator.build_selector_for_k8s_executor_pod(
        dag_id=ti.dag_id,
        task_id=ti.task_id,
        try_number=ti.try_number,
        map_index=ti.map_index,
        run_id=ti.run_id,
        airflow_worker=ti.queued_by_job_id,
    )
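To illustrate the mismatch in the snippet above, here is a minimal, self-contained sketch. `build_selector` is a hypothetical, simplified stand-in for `PodGenerator.build_selector_for_k8s_executor_pod` (the real label set and formatting may differ), and `FakeTI` is an illustrative stand-in for a task instance, not the Airflow class. The try numbers are made up. It only shows that a selector built from the task instance's current try number need not match the try whose log was requested.

```python
from dataclasses import dataclass


@dataclass
class FakeTI:
    """Illustrative stand-in for a TaskInstance (not the Airflow class)."""

    dag_id: str
    task_id: str
    run_id: str
    try_number: int  # whatever the task instance currently reports


def build_selector(ti: FakeTI, try_number: int) -> str:
    """Hypothetical, simplified label selector; the real labels may differ."""
    labels = {
        "dag_id": ti.dag_id,
        "task_id": ti.task_id,
        "run_id": ti.run_id,
        "try_number": str(try_number),
    }
    return ",".join(f"{key}={value}" for key, value in labels.items())


# Assumed scenario: the task instance reports try 2, the UI asks for try 1.
ti = FakeTI("dataset_produces_2", "producing_task_2", "manual__run", try_number=2)
requested_try = 1

buggy = build_selector(ti, ti.try_number)  # ignores the requested try
fixed = build_selector(ti, requested_try)  # uses the argument given to _read

print(buggy)
print(fixed)
```

If the worker pod is labelled with the try it actually ran, the first selector would look for a pod that does not exist, which would be consistent with the "Cannot find pod" message.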

ashb · 2023-01-11T11:35:19Z

Paging @dstandish

dstandish · 2023-02-10T16:46:37Z

Looks reasonable to me, but needs a rebase

o-nikolas

lgtm, pending passing tests of course.

dstandish · 2023-02-14T21:25:54Z

airflow/executors/base_executor.py

Wait, this bit doesn't make sense to me. If we already have the TI, why do we also need to supply the try number?

There is some explanation in the code for read:

airflow/airflow/utils/log/file_task_handler.py

Lines 376 to 410 in c266aac

    def read(self, task_instance, try_number=None, metadata=None):
        """
        Read logs of given task instance from local machine.

        :param task_instance: task instance object
        :param try_number: task instance try_number to read logs from. If None
                           it returns all logs separated by try_number
        :param metadata: log metadata, can be used for steaming log reading and auto-tailing.
        :return: a list of listed tuples which order log string by host
        """
        # Task instance increments its try number when it starts to run.
        # So the log for a particular task try will only show up when
        # try number gets incremented in DB, i.e logs produced the time
        # after cli run and before try_number + 1 in DB will not be displayed.
        if try_number is None:
            next_try = task_instance.next_try_number
            try_numbers = list(range(1, next_try))
        elif try_number < 1:
            logs = [
                [("default_host", f"Error fetching the logs. Try number {try_number} is invalid.")],
            ]
            return logs, [{"end_of_log": True}]
        else:
            try_numbers = [try_number]

        logs = [""] * len(try_numbers)
        metadata_array = [{}] * len(try_numbers)

        # subclasses implement _read and may not have log_type, which was added recently
        for i, try_number_element in enumerate(try_numbers):
            log, out_metadata = self._read(task_instance, try_number_element, metadata)
            # es_task_handler return logs grouped by host. wrap other handler returning log string
            # with default/ empty host so that UI can render the response in the same way
            logs[i] = log if self._read_grouped_logs() else [(task_instance.hostname, log)]
            metadata_array[i] = out_metadata
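The branch on try_number in read can be isolated as a small sketch. `resolve_try_numbers` is a hypothetical helper name, not part of Airflow, and it raises ValueError for an invalid try number where the quoted code returns an error message to the UI instead.

```python
from __future__ import annotations


def resolve_try_numbers(next_try_number: int, try_number: int | None) -> list[int]:
    """Return the try numbers whose logs should be read.

    Hypothetical helper mirroring the branch in ``read``: None means
    "every try so far", a value below 1 is invalid, and anything else
    selects exactly that try.
    """
    if try_number is None:
        # All tries from 1 up to, but not including, the next try.
        return list(range(1, next_try_number))
    if try_number < 1:
        raise ValueError(f"Try number {try_number} is invalid.")
    return [try_number]


print(resolve_try_numbers(3, None))
print(resolve_try_numbers(3, 2))
```

Each resolved value is what read hands to _read as try_number_element, so it can differ from whatever the task instance itself reports.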

try_number is plumbed in from read to _read and should then have been passed on to the executors, but that was missed:

airflow/airflow/utils/log/file_task_handler.py

Lines 274 to 315 in c266aac

    def _read(
        self,
        ti: TaskInstance,
        try_number: int,
        metadata: dict[str, Any] | None = None,
    ):
        """
        Template method that contains custom logic of reading
        logs given the try_number.

        :param ti: task instance record
        :param try_number: current try_number to read log from
        :param metadata: log metadata,
                         can be used for steaming log reading and auto-tailing.
                         Following attributes are used:
                         log_pos: (absolute) Char position to which the log
                                  which was retrieved in previous calls, this
                                  part will be skipped and only following test
                                  returned to be added to tail.
        :return: log message as a string and metadata.
                 Following attributes are used in metadata:
                 end_of_log: Boolean, True if end of log is reached or False
                             if further calls might get more log text.
                             This is determined by the status of the TaskInstance
                 log_pos: (absolute) Char position to which the log is retrieved
        """
        # Task instance here might be different from task instance when
        # initializing the handler. Thus explicitly getting log location
        # is needed to get correct log path.
        worker_log_rel_path = self._render_filename(ti, try_number)
        messages_list: list[str] = []
        remote_logs: list[str] = []
        running_logs: list[str] = []
        local_logs: list[str] = []
        executor_messages: list[str] = []
        executor_logs: list[str] = []
        served_logs: list[str] = []
        with suppress(NotImplementedError):
            remote_messages, remote_logs = self._read_remote_logs(ti, try_number, metadata)
            messages_list.extend(remote_messages)
        if ti.state == TaskInstanceState.RUNNING:
            response = self._executor_get_task_log(ti)

ti.try_number was used to fetch the log from the k8s pod, which caused the wrong log to be returned. Fixed by passing try_number from _read to the get_task_log method, so that the try_number argument is used instead of ti.try_number when selecting the pod in the k8s executor.
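A rough sketch of the shape of that change, under stated assumptions: `FakeTI`, `FakeExecutor` and `FakeHandler` are hypothetical stand-ins rather than Airflow classes, and the `(messages, logs)` return shape is assumed for illustration. The only point is that the handler forwards the try_number it was given, and the executor selects by that argument.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeTI:
    """Illustrative stand-in for a TaskInstance (not the Airflow class)."""

    task_id: str
    try_number: int


class FakeExecutor:
    """Hypothetical executor holding one log per (task_id, try_number)."""

    def __init__(self, logs: dict[tuple[str, int], str]) -> None:
        self._logs = logs

    def get_task_log(self, ti: FakeTI, try_number: int) -> tuple[list[str], list[str]]:
        # Select by the explicit try_number argument, not ti.try_number.
        key = (ti.task_id, try_number)
        if key not in self._logs:
            return [f"Cannot find pod for {ti.task_id} try {try_number}"], []
        return [], [self._logs[key]]


class FakeHandler:
    """Hypothetical log handler that forwards try_number to the executor."""

    def __init__(self, executor: FakeExecutor) -> None:
        self._executor = executor

    def _read(self, ti: FakeTI, try_number: int) -> tuple[list[str], list[str]]:
        return self._executor.get_task_log(ti, try_number)


executor = FakeExecutor({("producing_task_2", 1): "log of try 1"})
handler = FakeHandler(executor)
ti = FakeTI("producing_task_2", try_number=2)  # TI reports a later try

messages, logs = handler._read(ti, 1)
print(messages, logs)
```

In this toy setup, selecting by ti.try_number instead would look up try 2 and come back with only the "Cannot find pod" message.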

o-nikolas · 2023-03-01T01:20:16Z

Anyone have more feedback for this one or shall I merge it?

potiuk · 2023-03-01T06:36:35Z

@dstandish ?

o-nikolas · 2023-03-09T00:18:54Z

Shall we just merge this fix then @potiuk @eladkal? Worst case we can always follow-up with more changes.

pierrejeambrun · 2023-03-23T17:23:41Z

Conflicting, requires #29482 and #28161. Marking for 2.6.

boring-cyborg bot added the area:logging label Jan 9, 2023

snjypl mentioned this pull request Jan 9, 2023

Fix manual task trigger failing for k8s. #28394

Merged

ashb requested a review from dstandish January 11, 2023 11:26

snjypl force-pushed the bugfix/Fix-Unable-to-fetch-logs-from-worker-pod-error-in-UI-for-k8s-executor branch 4 times, most recently from 752c046 to 14724ce Compare January 18, 2023 12:17

snjypl force-pushed the bugfix/Fix-Unable-to-fetch-logs-from-worker-pod-error-in-UI-for-k8s-executor branch 2 times, most recently from 05f0d7d to aa64a4f Compare January 23, 2023 10:50

eladkal added this to the Airflow 2.5.2 milestone Jan 23, 2023

snjypl force-pushed the bugfix/Fix-Unable-to-fetch-logs-from-worker-pod-error-in-UI-for-k8s-executor branch 2 times, most recently from b54cdcc to 6d1d744 Compare January 24, 2023 11:52

snjypl force-pushed the bugfix/Fix-Unable-to-fetch-logs-from-worker-pod-error-in-UI-for-k8s-executor branch from 6d1d744 to bb17bf5 Compare February 14, 2023 20:17

snjypl requested review from XD-DENG, ashb, jedcunningham, kaxil and o-nikolas as code owners February 14, 2023 20:17

o-nikolas approved these changes Feb 14, 2023

dstandish previously requested changes Feb 14, 2023

snjypl force-pushed the bugfix/Fix-Unable-to-fetch-logs-from-worker-pod-error-in-UI-for-k8s-executor branch 5 times, most recently from 190d348 to 60f7d12 Compare February 16, 2023 14:45

snjypl force-pushed the bugfix/Fix-Unable-to-fetch-logs-from-worker-pod-error-in-UI-for-k8s-executor branch from 60f7d12 to 7357b53 Compare February 27, 2023 07:03

potiuk approved these changes Mar 1, 2023

pierrejeambrun added the type:bug-fix Changelog: Bug Fixes label Mar 1, 2023

Merge branch 'main' into bugfix/Fix-Unable-to-fetch-logs-from-worker-…

75a4507

…pod-error-in-UI-for-k8s-executor

eladkal modified the milestones: Airflow 2.5.2, Airflow 2.5.3 Mar 9, 2023

eladkal approved these changes Mar 9, 2023

eladkal merged commit f5ed4d5 into apache:main Mar 9, 2023

snjypl deleted the bugfix/Fix-Unable-to-fetch-logs-from-worker-pod-error-in-UI-for-k8s-executor branch March 9, 2023 15:13

pierrejeambrun modified the milestones: Airflow 2.5.3, Airflow 2.6.0 Mar 23, 2023
