@insomnes
Contributor

@insomnes insomnes commented Jan 30, 2025

Adding name and hostname to KubernetesPodOperator template_fields

  • Move name validation and normalization inside the execute method
  • Expand the KubernetesPodOperator tests
  • Refactor the SparkKubernetesOperator create_application tests due to a bug in the tests
  • Fix the too-long-name test flow in kubernetes_tests/test_kubernetes_pod_operator.py and add this case to the unit tests
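
A minimal sketch of why validation has to run inside execute once the name is templated. It assumes a hypothetical `SketchPodOperator`, uses `str.format` as a stand-in for Jinja rendering, and picks a 220-character limit purely for illustration; none of these names are taken from the provider code.

```python
import re

MAX_NAME_LENGTH = 220  # assumed limit, chosen only for this sketch


class SketchPodOperator:
    """Hypothetical stand-in for an operator with a templated `name`."""

    template_fields = ("name",)

    def __init__(self, name):
        # Keep the raw value: it may still contain template placeholders,
        # so validating here would reject names that are fine once rendered.
        self.name = name

    def render_template_fields(self, context):
        # Stand-in for Jinja rendering: fill "{key}" placeholders from context.
        for field in self.template_fields:
            value = getattr(self, field)
            if isinstance(value, str):
                setattr(self, field, value.format(**context))

    @staticmethod
    def _normalize_name(name):
        # Validation and normalization see the rendered value only.
        if not isinstance(name, str):
            raise TypeError(f"name must be a str, got {type(name).__name__}")
        if len(name) > MAX_NAME_LENGTH:
            raise ValueError(f"name is longer than {MAX_NAME_LENGTH} characters")
        return re.sub(r"[^a-z0-9-]+", "-", name.lower())

    def execute(self, context):
        # By now the templates are rendered, so the name can be checked.
        self.name = self._normalize_name(self.name)
        return self.name


op = SketchPodOperator(name="Dump_{wiki}")
context = {"wiki": "enwiki"}
op.render_template_fields(context)
result = op.execute(context)
print(result)
```

With this ordering, a test that only constructs the operator no longer triggers validation; it has to call execute to see a too-long name rejected.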

closes: #43480
Based on a previously stale PR



@boring-cyborg boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Jan 30, 2025
@insomnes
Contributor Author

insomnes commented Jan 30, 2025

I was interested in this feature, so I've brought back the stale PR with some extra tests. Maybe we can succeed with it this time.

@insomnes
Contributor Author

I think I've found a bug in the Spark k8s operator test code. I will check and try to fix the affected tests, and open a bug issue to fix any other occurrences.

@insomnes
Contributor Author

insomnes commented Jan 31, 2025

The bug I mentioned in the Spark k8s operator tests is related to mocking and the new changes.

Previously, the tests failed because the new KPO execute() validates the pod name set on the operator by calling _set_name (because of the templating changes, context must be applied to the templates before name validation).
As a result, the k8s Spark operator tests that use mock_create_job_name could not succeed: a MagicMock is not a str instance, so validation failed.

This led me to realize that the assert op.name.startswith("...") calls in these tests are broken; otherwise, they would have failed too (this is checked after operator execution). My investigation showed that changing the prefix to something like name.startswith("ABRA-CADABRA") did not make the tests fail. Functions patched with unittest.mock.patch actually return a MagicMock object, so op.name was a MagicMock instance in these tests.

By design, any method call on a MagicMock instance does not fail and returns another MagicMock object, which is truthy, so these assertions always pass. That's why I have refactored the affected Spark k8s operator tests: I've extracted the create_application tests into a separate class where create_job_name is not patched.
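
A small, self-contained illustration of the effect described above. `NameFactory` and its return value are hypothetical stand-ins for the patched helper, not code from the provider tests.

```python
from unittest import mock


class NameFactory:
    """Hypothetical stand-in for the module that owns create_job_name."""

    @staticmethod
    def create_job_name():
        return "spark-job-1234"


with mock.patch.object(NameFactory, "create_job_name") as mocked:
    # The patched function returns a MagicMock, not a string.
    name = NameFactory.create_job_name()

    # A strict type check, like name validation, rejects it.
    is_str = isinstance(name, str)

    # startswith() on a MagicMock is just another MagicMock, and it is truthy,
    # so this assertion passes for any prefix at all.
    vacuous = name.startswith("ABRA-CADABRA")
    assert vacuous

    # Comparing against the real boolean exposes the problem.
    strict_ok = name.startswith("ABRA-CADABRA") is True

print(is_str, type(vacuous).__name__, strict_ok)
```

One way to guard against this in a test is to assert `isinstance(op.name, str)` before checking the prefix, or to leave the name helper unpatched, as done in the refactored tests.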

I believe all affected tests are fixed now, so there is no need for a new issue.

@insomnes insomnes changed the title Add name and hostname to KPO template_fields Add name to KPO template_fields Jan 31, 2025
@insomnes insomnes changed the title Add name to KPO template_fields Add name and hostname to KPO template_fields Jan 31, 2025
@insomnes insomnes changed the title Add name and hostname to KPO template_fields Expand KPO template_fields, fix Spark k8s operator tests Jan 31, 2025
@shahar1
Contributor

shahar1 commented Feb 1, 2025

LGTM!

@shahar1 shahar1 merged commit 6235002 into apache:main Feb 1, 2025
71 checks passed
@insomnes insomnes deleted the kpo-name-template branch February 1, 2025 20:51
amoghrajesh pushed a commit to astronomer/airflow that referenced this pull request Feb 3, 2025
* Add name and hostname to KPO template_fields

* Add fetch container mock to name normalization test

* Fix bugged tests

* Run execute in test_pod_name to get validation fail

* Add long name case to KPO unit tests
dabla pushed a commit to dabla/airflow that referenced this pull request Feb 3, 2025
niklasr22 pushed a commit to niklasr22/airflow that referenced this pull request Feb 8, 2025
ambika-garg pushed a commit to ambika-garg/airflow that referenced this pull request Feb 17, 2025
brouberol added a commit to brouberol/airflow that referenced this pull request Mar 17, 2025
I would like to propose adding `base_container_name` to the
`KubernetesPodOperator` templated fields.

The rationale is that the base container name is part of the log lines
emitted by the KubernetesPodManager, which is a good opportunity to have
it give as much context as possible.

For example, in a Wikimedia DAG, we defined the following operators:

```python
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator


class WikimediaDumpOperator(KubernetesPodOperator):
    """
    Base class for all types of wiki dumps run as Kubernetes Pods.
    """

    dump_type = "generic"

    def __init__(self, wiki: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.wiki = wiki

        # Name of the "dumps" container (default is base, which isn't super telling)
        self.base_container_name = f"mediawiki-{self.dump_type}-dump"

        # name of the pod itself
        # made templated in apache#46268
        self.name = f"{self.base_container_name}-{wiki}"


class WikimediaSqlXmlDumpsOperator(WikimediaDumpOperator):
    """Operator class running the sql/xml wiki dumps as Kubernetes Pods"""

    dump_type = "sql-xml"


class WikimediaWikidataDumpsOperator(WikimediaDumpOperator):
    """Operator class running the wikidata dumps as Kubernetes Pods"""

    dump_type = "wikidata"
```

Adding `base_container_name` to the templated fields would allow us to
rewrite the `WikimediaDumpOperator` to the following:

```python
class WikimediaDumpOperator(KubernetesPodOperator):
    """
    Base class for all types of wiki dumps run as Kubernetes Pods.
    """

    dump_type = "generic"

    def __init__(self, wiki: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.wiki = wiki
```

and we could invoke the operator as such:

```python
WikimediaSqlXmlDumpsOperator(
    ...,
    base_container_name='mediawiki-{{ task.dump_type }}-dump',
    name='{{ task.base_container_name }}-{{ task.wiki }}',
    ...
)
```

The endgame would be to have our logs contain as much context as
possible while avoiding a mix of passing keyword args to the
constructor _and_ inferring some attributes _within_ the `__init__`
method itself.
brouberol added a commit to brouberol/airflow that referenced this pull request Mar 17, 2025
brouberol added a commit to brouberol/airflow that referenced this pull request Mar 17, 2025
jscheffl pushed a commit that referenced this pull request Mar 22, 2025
#47864)

shubham-pyc pushed a commit to shubham-pyc/airflow that referenced this pull request Apr 2, 2025
apache#47864)

nailo2c pushed a commit to nailo2c/airflow that referenced this pull request Apr 4, 2025
apache#47864)

kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request May 28, 2025
…s (#47864)

GitOrigin-RevId: 204020a329d954dca14ef30ea7f72c25782da85b
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 24, 2025
…s (#47864)

GitOrigin-RevId: 204020a329d954dca14ef30ea7f72c25782da85b
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 22, 2025
…s (#47864)

GitOrigin-RevId: 204020a329d954dca14ef30ea7f72c25782da85b
Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues

Development

Successfully merging this pull request may close these issues.

Add KubernetesPodOperator's name in templated fields

2 participants