Skip to content

Conversation

@brouberol
Copy link
Contributor

@brouberol brouberol commented Mar 17, 2025

I would like to propose adding base_container_name to the KubernetesPodOperator templated fields.

The rationale is that the base container name is part of the log lines emitted by the KubernetesPodManager, which is a good opportunity to have it give as much context as possible. It can also makes searching logs in observability tooling easier.

Screenshot 2025-03-17 at 15 44 22

For example, in a Wikimedia DAG, we defined the following operators:

class WikimediaDumpOperator(KubernetesPodOperator):
    """
    Base class for all types of wiki dumps run as Kubernetes Pods.
    """

    dump_type = "generic"

    def __init__(self, wiki: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.wiki = wiki

        # Name of the "dumps" container (default is base, which isn't super telling)
        self.base_container_name = f"mediawiki-{self.dump_type}-dump"

        # name of the pod itself
        # made templated in https://github.com/apache/airflow/pull/46268
        self.name = f"{self.base_container_name}-{wiki}"


class WikimediaSqlXmlDumpsOperator(WikimediaDumpOperator):
    """Operator class running the sql/xml wiki dumps as Kubernetes Pods"""

    dump_type = "sql-xml"


class WikimediaWikidataDumpsOperator(WikimediaDumpOperator):
    """Operator class running the wikidata dumps as Kubernetes Pods"""

    dump_type = "wikidata"

Adding base_container_name to the templated fields would allow us to rewrite the WikimediaDumpOperator to the following:

class WikimediaDumpOperator(KubernetesPodOperator):
    """
    Base class for all types of wiki dumps run as Kubernetes Pods.
    """

    dump_type = "generic"

    def __init__(self, wiki: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.wiki = wiki

and we could invoke the operator as such:

WikimediaSqlXmlOperator(
    ...,
    wiki="value",
    base_container_name='mediawiki-{{ task.dump_type }}-dump',
    name='{{ task.base_container_name }}-{{ task.wiki }}'
    ...
)

The endgame would be to have our logs contain as much context as possible while avoiding mixing passing both keyword args to the conttructor and infering some attributes within the __init__ method itself.

@boring-cyborg boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Mar 17, 2025
@brouberol brouberol force-pushed the k8spodoperator-base-container-name-tpl branch from a1202cd to 8d56835 Compare March 17, 2025 15:19
@brouberol brouberol marked this pull request as ready for review March 17, 2025 19:58
I would like to propose adding `base_container_name` to the
`KubernetesPodOperator` templated fields.

The rationale is that the base container name is part of the log lines
emitted by the KubernetesPodManager, which is a good opportunity to have
it give as much context as possible.

For example, in a Wikimedia DAG, we defined the following operators:

```python
class WikimediaDumpOperator(KubernetesPodOperator):
    """
    Base class for all types of wiki dumps run as Kubernetes Pods.
    """

    dump_type = "generic"

    def __init__(self, wiki: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.wiki = wiki

        # Name of the "dumps" container (default is base, which isn't super telling)
        self.base_container_name = f"mediawiki-{self.dump_type}-dump"

        # name of the pod itself
        # made templated in apache#46268
        self.name = f"{self.base_container_name}-{wiki}"

class WikimediaSqlXmlDumpsOperator(WikimediaDumpOperator):
    """Operator class running the sql/xml wiki dumps as Kubernetes Pods"""

    dump_type = "sql-xml"

class WikimediaWikidataDumpsOperator(WikimediaDumpOperator):
    """Operator class running the wikidata dumps as Kubernetes Pods"""

    dump_type = "wikidata"

```

Adding `base_container_name` to the templated fields would allow us to
rewrite the `WikimediaDumpOperator` to the following:

```python
class WikimediaDumpOperator(KubernetesPodOperator):
    """
    Base class for all types of wiki dumps run as Kubernetes Pods.
    """

    dump_type = "generic"

    def __init__(self, wiki: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.wiki = wiki
```

and we could invoke the operator as such:

```python
WikimediaSqlXmlOperator(
    ...,
    base_container_name='mediawiki-{{ task.dump_type }}-dump',
    name='{{ task.base_container_name }}-{{ task.wiki }}'
    ...
)
```

The endgame would be to have our logs contain as much context as
possible while avoiding mixing passing both keyword args to the
conttructor _and_ infering some attributes _within_ the `__init__`
method itself.
@brouberol brouberol force-pushed the k8spodoperator-base-container-name-tpl branch from 65b1e44 to 4238034 Compare March 17, 2025 20:04
Copy link
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable for me.

@jscheffl jscheffl merged commit 204020a into apache:main Mar 22, 2025
70 checks passed
shubham-pyc pushed a commit to shubham-pyc/airflow that referenced this pull request Apr 2, 2025
apache#47864)

I would like to propose adding `base_container_name` to the
`KubernetesPodOperator` templated fields.

The rationale is that the base container name is part of the log lines
emitted by the KubernetesPodManager, which is a good opportunity to have
it give as much context as possible.

For example, in a Wikimedia DAG, we defined the following operators:

```python
class WikimediaDumpOperator(KubernetesPodOperator):
    """
    Base class for all types of wiki dumps run as Kubernetes Pods.
    """

    dump_type = "generic"

    def __init__(self, wiki: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.wiki = wiki

        # Name of the "dumps" container (default is base, which isn't super telling)
        self.base_container_name = f"mediawiki-{self.dump_type}-dump"

        # name of the pod itself
        # made templated in apache#46268
        self.name = f"{self.base_container_name}-{wiki}"

class WikimediaSqlXmlDumpsOperator(WikimediaDumpOperator):
    """Operator class running the sql/xml wiki dumps as Kubernetes Pods"""

    dump_type = "sql-xml"

class WikimediaWikidataDumpsOperator(WikimediaDumpOperator):
    """Operator class running the wikidata dumps as Kubernetes Pods"""

    dump_type = "wikidata"

```

Adding `base_container_name` to the templated fields would allow us to
rewrite the `WikimediaDumpOperator` to the following:

```python
class WikimediaDumpOperator(KubernetesPodOperator):
    """
    Base class for all types of wiki dumps run as Kubernetes Pods.
    """

    dump_type = "generic"

    def __init__(self, wiki: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.wiki = wiki
```

and we could invoke the operator as such:

```python
WikimediaSqlXmlOperator(
    ...,
    base_container_name='mediawiki-{{ task.dump_type }}-dump',
    name='{{ task.base_container_name }}-{{ task.wiki }}'
    ...
)
```

The endgame would be to have our logs contain as much context as
possible while avoiding mixing passing both keyword args to the
conttructor _and_ infering some attributes _within_ the `__init__`
method itself.
nailo2c pushed a commit to nailo2c/airflow that referenced this pull request Apr 4, 2025
apache#47864)

I would like to propose adding `base_container_name` to the
`KubernetesPodOperator` templated fields.

The rationale is that the base container name is part of the log lines
emitted by the KubernetesPodManager, which is a good opportunity to have
it give as much context as possible.

For example, in a Wikimedia DAG, we defined the following operators:

```python
class WikimediaDumpOperator(KubernetesPodOperator):
    """
    Base class for all types of wiki dumps run as Kubernetes Pods.
    """

    dump_type = "generic"

    def __init__(self, wiki: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.wiki = wiki

        # Name of the "dumps" container (default is base, which isn't super telling)
        self.base_container_name = f"mediawiki-{self.dump_type}-dump"

        # name of the pod itself
        # made templated in apache#46268
        self.name = f"{self.base_container_name}-{wiki}"

class WikimediaSqlXmlDumpsOperator(WikimediaDumpOperator):
    """Operator class running the sql/xml wiki dumps as Kubernetes Pods"""

    dump_type = "sql-xml"

class WikimediaWikidataDumpsOperator(WikimediaDumpOperator):
    """Operator class running the wikidata dumps as Kubernetes Pods"""

    dump_type = "wikidata"

```

Adding `base_container_name` to the templated fields would allow us to
rewrite the `WikimediaDumpOperator` to the following:

```python
class WikimediaDumpOperator(KubernetesPodOperator):
    """
    Base class for all types of wiki dumps run as Kubernetes Pods.
    """

    dump_type = "generic"

    def __init__(self, wiki: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.wiki = wiki
```

and we could invoke the operator as such:

```python
WikimediaSqlXmlOperator(
    ...,
    base_container_name='mediawiki-{{ task.dump_type }}-dump',
    name='{{ task.base_container_name }}-{{ task.wiki }}'
    ...
)
```

The endgame would be to have our logs contain as much context as
possible while avoiding mixing passing both keyword args to the
conttructor _and_ infering some attributes _within_ the `__init__`
method itself.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants