-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Closed
Labels
kind:bugThis is a clearly a bugThis is a clearly a bugneeds-triagelabel for new issues that we didn't triage yetlabel for new issues that we didn't triage yettelemetryTelemetry-related issuesTelemetry-related issues
Description
Apache Airflow version
2.7.1
What happened
When running a dag an error ocurred. The error says that there is a metric with an invalid name. This causes that the task of the dag is set up for retry. Then the task executes again and is marked as success.
2023-10-10 13:05:21 [2023-10-10T11:05:21.738+0000] {local_executor.py:135} ERROR - Failed to execute task Invalid stat name: ***.dag.cwf_path_inspector_generator.delete-xcom-task.queued_duration. Please see https://opentelemetry.io/docs/reference/specification/metrics/api/#instrument-name-syntax.
2023-10-10 13:05:21 Traceback (most recent call last):
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/executors/local_executor.py", line 131, in _execute_work_in_fork
2023-10-10 13:05:21 args.func(args)
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/cli/cli_config.py", line 49, in command
2023-10-10 13:05:21 return func(*args, **kwargs)
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/utils/cli.py", line 113, in wrapper
2023-10-10 13:05:21 return f(*args, **kwargs)
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/cli/commands/task_command.py", line 430, in task_run
2023-10-10 13:05:21 task_return_code = _run_task_by_selected_method(args, _dag, ti)
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/cli/commands/task_command.py", line 208, in _run_task_by_selected_method
2023-10-10 13:05:21 return _run_task_by_local_task_job(args, ti)
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/cli/commands/task_command.py", line 270, in _run_task_by_local_task_job
2023-10-10 13:05:21 ret = run_job(job=job_runner.job, execute_callable=job_runner._execute)
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/utils/session.py", line 77, in wrapper
2023-10-10 13:05:21 return func(*args, session=session, **kwargs)
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/jobs/job.py", line 289, in run_job
2023-10-10 13:05:21 return execute_job(job, execute_callable=execute_callable)
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/jobs/job.py", line 318, in execute_job
2023-10-10 13:05:21 ret = execute_callable()
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/jobs/local_task_job_runner.py", line 143, in _execute
2023-10-10 13:05:21 if not self.task_instance.check_and_change_state_before_execution(
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/utils/session.py", line 77, in wrapper
2023-10-10 13:05:21 return func(*args, session=session, **kwargs)
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/models/taskinstance.py", line 1366, in check_and_change_state_before_execution
2023-10-10 13:05:21 self.emit_state_change_metric(TaskInstanceState.RUNNING)
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/models/taskinstance.py", line 1450, in emit_state_change_metric
2023-10-10 13:05:21 Stats.timing(f"dag.{self.dag_id}.{self.task_id}.{metric_name}", timing)
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/metrics/otel_logger.py", line 266, in timing
2023-10-10 13:05:21 if self.metrics_validator.test(stat) and name_is_otel_safe(self.prefix, stat):
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/metrics/otel_logger.py", line 95, in name_is_otel_safe
2023-10-10 13:05:21 return bool(stat_name_otel_handler(prefix, name, max_length=OTEL_NAME_MAX_LENGTH))
2023-10-10 13:05:21 File "/usr/local/lib/python3.10/dist-packages/airflow/metrics/validators.py", line 142, in stat_name_otel_handler
2023-10-10 13:05:21 raise InvalidStatsNameException(
2023-10-10 13:05:21 airflow.exceptions.InvalidStatsNameException: Invalid stat name: ***.dag.cwf_path_inspector_generator.delete-xcom-task.queued_duration. Please see https://opentelemetry.io/docs/reference/specification/metrics/api/#instrument-name-syntaxWhat you think should happen instead
There should not be an error with the name of a default metric causing a task to retry.
How to reproduce
Enable opentelemetry in airflow.cfg:
otel_on = True
otel_host = breeze-otel-collector
otel_port = 4318
otel_prefix = airflow
otel_interval_milliseconds = 30000 # The interval between exports, defaults to 60000
otel_ssl_active = FalseRun opentelemetry collector docker:
otel-collector:
image: otel/opentelemetry-collector-contrib:0.70.0
container_name: "breeze-otel-collector"
command: [--config=/etc/otel-collector-config.yml]
volumes:
- ./otel-collector/otel-collector-config.yml:/etc/otel-collector-config.yml
# - ./otel-collector/keys:/etc/keys
ports:
- "24318:4318" # OTLP http receiver
- "28889:8889" # Prometheus exporter metricsOperating System
Ubuntu 22.04.3 LTS
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct
Metadata
Metadata
Assignees
Labels
kind:bugThis is a clearly a bugThis is a clearly a bugneeds-triagelabel for new issues that we didn't triage yetlabel for new issues that we didn't triage yettelemetryTelemetry-related issuesTelemetry-related issues