Expand KPO template_fields, fix Spark k8s operator tests #46268
I was interested in this feature, so I've brought back the stale PR with some extra tests. Maybe we can get it merged this time.
I think I've found a bug in the Spark k8s operator test code. I'll check it, try to fix the affected test, and open an issue for fixing any other occurrences.
The bug I mentioned in the Spark k8s operator tests is related to mocking and the new changes. Previously, the tests failed because the new KPO … This led me to understand that calls to … Any method call on an instance of … I believe all affected tests are fixed now, so there is no need for a new issue.
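The specific calls are missing from the comment above, but the pitfall it points at presumably matches general `unittest.mock` behaviour: every method call on a `MagicMock` returns another `MagicMock` instead of failing, so code that expects a real value (such as a string) silently receives a mock. A minimal illustration; `pod_manager` and `fetch_container_state` are hypothetical names, not the objects actually patched in the Spark k8s operator tests:

```python
from unittest import mock

# Hypothetical stand-in for a dependency patched in an operator test.
pod_manager = mock.MagicMock()

# Calling any method on a MagicMock succeeds and returns another MagicMock,
# so the call site never fails even though no real value was produced.
state = pod_manager.fetch_container_state("base")
print(isinstance(state, mock.MagicMock))  # True

# Code that expects a string (e.g. a name check) receives a mock instead.
print(isinstance(state, str))  # False

# Setting an explicit return value makes the mocked call deterministic.
pod_manager.fetch_container_state.return_value = "running"
print(pod_manager.fetch_container_state("base"))  # running
```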
LGTM!
* Add name and hostname to KPO template_fields
* Add fetch container mock to name normalization test
* Fix bugged tests
* Run execute in test_pod_name to get validation fail
* Add long name case to KPO unit tests
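Once `name` and `hostname` are template fields, the value passed to the constructor may still contain template markers, so it can only be validated after rendering, which is why the name test has to run `execute`. The sketch below is a hypothetical, framework-free model of that flow: `ToyPodOperator` and `render_fields` are illustrative names, `str.format` placeholders stand in for Jinja, and the 63-character DNS-1123 label rule is the Kubernetes constraint for `hostname`; applying it to `name` as well is a simplifying assumption, not necessarily the limit KPO enforces.

```python
import re
from typing import Optional

# Kubernetes DNS-1123 label: lowercase alphanumerics and "-", starting and
# ending with an alphanumeric character, at most 63 characters long.
DNS_1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_LABEL_LENGTH = 63


def is_valid_label(value: str) -> bool:
    """Return True if `value` satisfies the DNS-1123 label rules."""
    return len(value) <= MAX_LABEL_LENGTH and DNS_1123_LABEL.fullmatch(value) is not None


class ToyPodOperator:
    """Hypothetical operator with templated `name` and `hostname` fields."""

    template_fields = ("name", "hostname")

    def __init__(self, name: str, hostname: Optional[str] = None):
        # No validation here: the values may still contain placeholders.
        self.name = name
        self.hostname = hostname

    def render_fields(self, context: dict) -> None:
        # Stand-in for Jinja rendering: str.format on each string field.
        for field in self.template_fields:
            value = getattr(self, field)
            if isinstance(value, str):
                setattr(self, field, value.format(**context))

    def execute(self, context: dict) -> str:
        # Validate only after rendering, when the final values are known.
        self.render_fields(context)
        for field in self.template_fields:
            value = getattr(self, field)
            if value is not None and not is_valid_label(value):
                raise ValueError(f"{field}={value!r} is not a valid DNS-1123 label")
        return self.name


# The raw template would fail validation, so it cannot be checked in __init__.
print(is_valid_label("{wiki}-dump"))  # False

op = ToyPodOperator(name="{wiki}-dump", hostname="{wiki}-host")
print(op.execute({"wiki": "enwiki"}))  # enwiki-dump

# A rendered name longer than 63 characters is rejected at execute time.
too_long = ToyPodOperator(name="{wiki}-" + "x" * 70)
try:
    too_long.execute({"wiki": "enwiki"})
except ValueError as err:
    print("rejected:", err)
```

Deferring the check to `execute` is what makes a "too long name" case observable in a unit test only when `execute` is actually called.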
…s (#47864)
I would like to propose adding `base_container_name` to the
`KubernetesPodOperator` templated fields.
The rationale is that the base container name is part of the log lines
emitted by the KubernetesPodManager, so making it templated is a good
opportunity to have those lines carry as much context as possible.
For example, in a Wikimedia DAG, we defined the following operators:
```python
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator


class WikimediaDumpOperator(KubernetesPodOperator):
    """
    Base class for all types of wiki dumps run as Kubernetes Pods.
    """

    dump_type = "generic"

    def __init__(self, wiki: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.wiki = wiki
        # Name of the "dumps" container (the default, "base", isn't very telling)
        self.base_container_name = f"mediawiki-{self.dump_type}-dump"
        # Name of the pod itself
        # (made templated in apache/airflow#46268)
        self.name = f"{self.base_container_name}-{wiki}"


class WikimediaSqlXmlDumpsOperator(WikimediaDumpOperator):
    """Operator class running the sql/xml wiki dumps as Kubernetes Pods"""

    dump_type = "sql-xml"


class WikimediaWikidataDumpsOperator(WikimediaDumpOperator):
    """Operator class running the wikidata dumps as Kubernetes Pods"""

    dump_type = "wikidata"
```
Adding `base_container_name` to the templated fields would allow us to
rewrite the `WikimediaDumpOperator` to the following:
```python
class WikimediaDumpOperator(KubernetesPodOperator):
    """
    Base class for all types of wiki dumps run as Kubernetes Pods.
    """

    dump_type = "generic"

    def __init__(self, wiki: str, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.wiki = wiki
```
and we could then invoke the operator as follows:
```python
WikimediaSqlXmlDumpsOperator(
    # ... other operator arguments
    base_container_name="mediawiki-{{ task.dump_type }}-dump",
    name="{{ task.base_container_name }}-{{ task.wiki }}",
    # ...
)
```
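A template that references another templated attribute, such as `name="{{ task.base_container_name }}-{{ task.wiki }}"`, only resolves to the final value if `base_container_name` has already been rendered and written back to the task when `name` is rendered. The toy renderer below illustrates that dependency. It is a hypothetical, regex-based stand-in for Jinja that only understands `{{ task.<attr> }}`, `FakeDumpTask` is an illustrative stand-in for the operator, and whether Airflow renders fields in `template_fields` order and writes each result back before rendering the next is an assumption worth verifying for the Airflow version in use.

```python
import re

# Matches "{{ task.<attr> }}" and captures the attribute name.
TASK_ATTR = re.compile(r"\{\{\s*task\.(\w+)\s*\}\}")


def render_fields(task, field_order):
    """Render each field in order, writing the result back onto `task`."""
    for field in field_order:
        template = getattr(task, field)
        rendered = TASK_ATTR.sub(lambda m: str(getattr(task, m.group(1))), template)
        setattr(task, field, rendered)


class FakeDumpTask:
    """Hypothetical stand-in for a WikimediaSqlXmlDumpsOperator instance."""

    dump_type = "sql-xml"

    def __init__(self):
        self.wiki = "enwiki"
        self.base_container_name = "mediawiki-{{ task.dump_type }}-dump"
        self.name = "{{ task.base_container_name }}-{{ task.wiki }}"


good = FakeDumpTask()
render_fields(good, ["base_container_name", "name"])
print(good.name)  # mediawiki-sql-xml-dump-enwiki

# Rendering `name` first picks up the still-unrendered container name.
bad = FakeDumpTask()
render_fields(bad, ["name", "base_container_name"])
print(bad.name)  # mediawiki-{{ task.dump_type }}-dump-enwiki
```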
The endgame would be for our logs to contain as much context as
possible, while avoiding both passing keyword args to the
constructor _and_ inferring some attributes _within_ the `__init__`
method itself.
Adding name and hostname to KubernetesPodOperator template_fields
- … `kubernetes_tests/test_kubernetes_pod_operator.py`
- … too long name testing flow, and add this case to the unit tests

closes: #43480
Based on a previously stale PR.