@amoghrajesh
Contributor

This PR builds a basic framework for publishing docs through a breeze command. For the discussion, see #32491.

It adds a simple framework that copies the built doc packages into a locally cloned airflow-site repo.

What is pending:

  1. Checking out the main branch of airflow-site
  2. Running the post-docs script on the copied files and folders.

)
from airflow_breeze.utils.shared_options import get_dry_run, get_verbose, set_forced_answer
from airflow_breeze.utils.visuals import ASCIIART, ASCIIART_STYLE, CHEATSHEET, CHEATSHEET_STYLE
from docs.exts.docs_build.docs_builder import AirflowDocsBuilder
Member

Comment: Whatever we use here from "docs.exts" needs to be copied to "breeze". Breeze should be largely self-contained (except for occasional retrieval of information from "airflow" sources). Importing the code should always be done as:

from airflow_breeze.....

And the code imported should live in dev/breeze/src/airflow_breeze folder.

Member

(that's part of the problem to solve - to get rid of the "docs" folder that needs to be added on PYTHONPATH)

Contributor Author

Makes sense to me. Will rework this.

)

# set AIRFLOW_SITE_DIR env variable
os.environ["AIRFLOW_SITE_DIR"] = airflow_site_directory
Member

Another comment: You do not need to do that. The AIRFLOW_SITE_DIRECTORY variable should be defined in the click option. A click option has an envvar field, and you can set the env variable there. So we should reverse the behaviour:

Instead of relying on AIRFLOW_SITE_DIRECTORY in the env, the directory can be set either by AIRFLOW_SITE_DIRECTORY or by the --airflow-site-directory flag, and then it should be passed directly to the (modified) AirflowDocsBuilder as a parameter.
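
A minimal sketch of this suggestion, assuming click is installed. `SketchDocsBuilder` and the `docs-archive` path segment are hypothetical stand-ins, not the real builder or the real airflow-site layout; only the `envvar` field of `click.option` and the flag and variable names come from the comment above.

```python
import click
from click.testing import CliRunner


class SketchDocsBuilder:
    """Hypothetical stand-in for the modified builder; not the real class."""

    def __init__(self, package_name: str) -> None:
        self.package_name = package_name

    def publish(self, airflow_site_dir: str) -> str:
        # The directory arrives as an argument; nothing is read from os.environ here.
        # "docs-archive" is an illustrative path segment only.
        return f"{airflow_site_dir}/docs-archive/{self.package_name}"


@click.command()
@click.option(
    "-s",
    "--airflow-site-directory",
    envvar="AIRFLOW_SITE_DIRECTORY",  # click falls back to this variable when the flag is absent
    required=True,
    help="Local directory path of cloned airflow-site repo.",
)
def publish_docs(airflow_site_directory: str) -> None:
    builder = SketchDocsBuilder("apache-airflow")
    click.echo(builder.publish(airflow_site_dir=airflow_site_directory))


# Exercise both ways of supplying the directory without touching the real environment.
runner = CliRunner()
from_flag = runner.invoke(publish_docs, ["--airflow-site-directory", "/tmp/site"])
from_env = runner.invoke(publish_docs, [], env={"AIRFLOW_SITE_DIRECTORY": "/tmp/env-site"})
```

With this shape the command body never needs to write to `os.environ`; the value is resolved once by click and handed down explicitly.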

Member

BTW, it's OK to partially copy the code for now and have a different "builder" in publish docs and in "build_docs.py". We can re-join them later, when we also get rid of build_docs.py from the "docs" folder and move it to breeze (which will most likely happen eventually).

os.environ["AIRFLOW_SITE_DIR"] = airflow_site_directory

available_packages = get_available_packages()
package_filters = package_filter
Contributor

@Adaverse Adaverse Jul 10, 2023

Seems like a redundant reassignment. Maybe we can have package_filters in the function signature itself?

Contributor Author

Yeah, will make the changes here

@amoghrajesh
Contributor Author

The easiest way to achieve the task was to migrate the file structures as-is with the required tweaks. @potiuk can you take a look?

I think the CI is definitely going to fail, need some mentoring on fixing that too :)

@amoghrajesh
Contributor Author

For starters, I have been hitting this:

Run breeze ci free-space
Traceback (most recent call last):
  File "/opt/pipx_bin/breeze", line 5, in <module>
    from airflow_breeze.breeze import main
  File "/home/runner/work/airflow/airflow/dev/breeze/src/airflow_breeze/breeze.py", line 31, in <module>
    from airflow_breeze.commands import developer_commands  # noqa
  File "/home/runner/work/airflow/airflow/dev/breeze/src/airflow_breeze/commands/developer_commands.py", line 89, in <module>
    from airflow_breeze.utils.publish_docs_builder import PublishDocsBuilder
  File "/home/runner/work/airflow/airflow/dev/breeze/src/airflow_breeze/utils/publish_docs_builder.py", line 29, in <module>
    from .errors import DocBuildError, parse_sphinx_warnings
  File "/home/runner/work/airflow/airflow/dev/breeze/src/airflow_breeze/utils/errors.py", line 26, in <module>
    from airflow_breeze.utils.publish_docs_helpers import prepare_code_snippet
  File "/home/runner/work/airflow/airflow/dev/breeze/src/airflow_breeze/utils/publish_docs_helpers.py", line 27, in <module>
    import jsonschema
ModuleNotFoundError: No module named 'jsonschema'

Added it to requirements, but probably needs addition elsewhere too

@amoghrajesh amoghrajesh requested a review from potiuk July 11, 2023 08:35
@amoghrajesh
Contributor Author

My speculation on this task is that we send in a skeleton first and then continually improve/refactor it

@potiuk
Member

potiuk commented Jul 11, 2023

You need to add requirements to the setup.py/setup.cfg of breeze. Breeze is just a regular Python package that follows the usual way package definition is done (so that pipx install -e ./dev/breeze can install it in editable mode).

requirements.txt is a convention many people use and its format is supported by pip install -r, but it's not standardised via PEPs as part of the package definition. The dev requirements are only for regular scripts you run from the dev env, but breeze is a completely standard Python package following https://docs.python.org/3/distributing/index.html#distributing-index
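
To illustrate the difference, here is a small sketch of a dependency declared in the package definition itself. The `setup.cfg` content below is a made-up example (hypothetical package name and dependency list), not the real breeze configuration; it is parsed with the standard library only to show that `install_requires` is part of the package metadata that an installer reads.

```python
import configparser

# Illustrative setup.cfg content; the package name and dependency list are assumptions.
SETUP_CFG = """
[metadata]
name = example-breeze-like-package

[options]
install_requires =
    click
    jsonschema
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

# Multi-line values keep their line breaks, so split and drop the empty first line.
declared = [
    line.strip()
    for line in parser["options"]["install_requires"].splitlines()
    if line.strip()
]
print(declared)
```

A dependency listed only in requirements.txt would not show up here, which is why an editable install of the package would still miss it.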

@potiuk
Member

potiuk commented Jul 11, 2023

> My speculation on this task is that we send in a skeleton first and then continually improve/refactor it

Quite the contrary, I think it will be ok to iterate until it's ready and we can use it.

There is no big value in having a new command that cannot provide good, usable output. By doing that you can also learn about all the different pieces, so it makes sense to iterate and do it "right" rather than "fast". Also, Breeze is an integral part of the CI process and there are many pre-commits that guard the right things being done there. For example, they make sure that any command added to breeze is described in BREEZE.rst, including an automatically generated screenshot, so we cannot merge a "failing" PR which skips those steps (and this is a very deliberate choice). Any PR merged should contain:

  • new command
  • documentation
  • tests if applicable
  • and provide a usable command that does something useful
  • and that command should be part of CI and (in this case) release process.

This is the "complete" implementation of a new command, which I consider necessary for the PR to be mergeable.

@potiuk
Member

potiuk commented Jul 11, 2023

> The easiest way to achieve the task was to migrate the file structures as-is with required tweaks. @potiuk can you take a look?

Yep. Looks right. I might want to take a closer look once the build is "green" :)

@amoghrajesh
Contributor Author

Thanks for the point. Now, I agree with you.

In this case, having a complete PR is more important than having parts and pieces going in. Working on it!

@amoghrajesh amoghrajesh changed the title Basic skeleton for publish docs breeze command New breeze command to publish docs Jul 11, 2023
@amoghrajesh
Contributor Author

I added jsonschema to setup.cfg, following the packaging link: https://docs.python.org/3/distributing/index.html#distributing-index

I think I am still running into this issue:

Traceback (most recent call last):
  File "/Users/adesai/Documents/OSS/airflow/dev/breeze/src/airflow_breeze/breeze.py", line 31, in <module>
    from airflow_breeze.commands import developer_commands  # noqa
  File "/Users/adesai/Documents/OSS/airflow/dev/breeze/src/airflow_breeze/commands/developer_commands.py", line 89, in <module>
    from airflow_breeze.utils.publish_docs_builder import PublishDocsBuilder
  File "/Users/adesai/Documents/OSS/airflow/dev/breeze/src/airflow_breeze/utils/publish_docs_builder.py", line 29, in <module>
    from .errors import DocBuildError, parse_sphinx_warnings
  File "/Users/adesai/Documents/OSS/airflow/dev/breeze/src/airflow_breeze/utils/errors.py", line 26, in <module>
    from airflow_breeze.utils.publish_docs_helpers import prepare_code_snippet
  File "/Users/adesai/Documents/OSS/airflow/dev/breeze/src/airflow_breeze/utils/publish_docs_helpers.py", line 27, in <module>
    import jsonschema
ModuleNotFoundError: No module named 'jsonschema'

@potiuk
Member

potiuk commented Jul 11, 2023

Nope, it seems fixed. Now the unit tests are failing.

@@ -0,0 +1,117 @@
# Licensed to the Apache Software Foundation (ASF) under one
Member

@potiuk potiuk Jul 11, 2023

should be renamed to "docs_errors.py" I think.

Contributor Author

Yeah makes sense. Renaming

@click.option(
    "-s", "--airflow-site-directory", help="Local directory path of cloned airflow-site repo.", required=True
)
@click.option(
Member

I think this can be moved to "common_options" - same option is now used in build_docs so this can be made DRY
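
One way to read this suggestion, sketched with click: define the option once as a reusable decorator and apply it to both commands. Which option the comment refers to is not visible in the excerpt, so `--package-filter` is used here purely as an assumption, and the command bodies are placeholders rather than the real breeze commands.

```python
import click
from click.testing import CliRunner

# Hypothetical shared option, defined once (for example in a common options module).
option_package_filter = click.option(
    "--package-filter",
    multiple=True,
    help="Filter(s) selecting which packages to process.",
)


@click.command(name="build-docs")
@option_package_filter
def build_docs(package_filter):
    click.echo(f"build-docs: {list(package_filter)}")


@click.command(name="publish-docs")
@option_package_filter
def publish_docs(package_filter):
    click.echo(f"publish-docs: {list(package_filter)}")


# Both commands now expose the same parameter from a single definition.
shared_params = [
    [param.name for param in command.params] for command in (build_docs, publish_docs)
]

runner = CliRunner()
result = runner.invoke(
    publish_docs,
    ["--package-filter", "apache-airflow", "--package-filter", "docker-stack"],
)
```

Changing the help text or adding an `envvar` then happens in one place instead of once per command.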

@potiuk
Member

potiuk commented Jul 11, 2023

As part of this change (to make it really complete), we should also add a step that publishes docs to this CI job:

- name: "Build docs"

This way the new command will be automatically tested on CI and we will not have regressions at least as a "smoke test". This is the approach we have for pretty much all other commands in breeze - they are executed as part of the CI and this way we make sure they continue working.

This CI job has already built all the docs and we have them locally, so it should be done in a few separate steps:

  • cloning the "apache/airflow-site" repo (the AIRFLOW_SITE_DIRECTORY environment variable should point to the directory so that it can be used automatically by the breeze command)

  • running the publish-docs command with the --override-versioned flag (this is because in many cases we will be overriding documentation for already released packages). The --package-filter flags for the command can be taken from ${{ needs.build-info.outputs.docs-filter-list-as-string }}, which is already used in build-docs. Those --package-filter strings are generated automatically during selective checks based on which files are changed in the PR, so documentation will be generated only for those packages and we can publish the generated documentation safely with those filters.
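
The second step can be sketched as a small helper that assembles the command line. This is an illustration only: the function name is hypothetical, and it assumes the selective-checks output is a single string of the form `--package-filter <name> --package-filter <name>`. The command and flag names are the ones mentioned in this thread.

```python
import shlex


def build_publish_docs_command(site_dir, docs_filter_list, override_versioned=True):
    """Assemble the argv list for the publish step (hypothetical helper)."""
    command = [
        "breeze",
        "release-management",
        "publish-docs",
        "--airflow-site-directory",
        site_dir,
    ]
    if override_versioned:
        command.append("--override-versioned")
    # The filter string already contains the flags, so it is only split, not rebuilt.
    command.extend(shlex.split(docs_filter_list))
    return command


command = build_publish_docs_command(
    "/tmp/airflow-site",
    "--package-filter apache-airflow --package-filter docker-stack",
)
print(" ".join(command))
```

In the CI job itself the directory could come from the AIRFLOW_SITE_DIRECTORY environment variable instead of an explicit flag, as described in the first step.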

@amoghrajesh
Contributor Author

Awesome idea, let us integrate the CI to do this as well. I will try this portion out too!

@potiuk
Member

potiuk commented Jul 11, 2023

> Awesome idea, let us integrate the CI to do this as well. I will try this portion out too!

You will get full circle dev-env + CI :). They are really tightly integrated in our case and one helps to keep the other "working".

@amoghrajesh
Contributor Author

@potiuk tried to handle all your comments. The CI passed for the first time in my dev env. Hopefully it will here too!

@amoghrajesh amoghrajesh requested a review from potiuk July 12, 2023 08:11
@amoghrajesh
Contributor Author

I deleted docs/publish_docs.py and handled all occurrences of it. There is one left, however, in publish_provider_documentation.sh. Any guidance here?

@potiuk
Member

potiuk commented Jul 12, 2023

> I deleted the docs/publish_docs.py. Handled all occurrences of it. There is one however, in publish_provider_documentation.sh. Any guidance here?

Just replace it with the new command. The script is referred to in dev/RELEASE_PROVIDERS* as a way to pass a list of providers and build --package-filter from those, so using breeze release-management build-docs makes sense in the shell script.

@potiuk
Member

potiuk commented Jul 12, 2023

    builder.publish(override_versioned=override_versioned, airflow_site_dir=airflow_site_directory)
  File "/home/runner/work/airflow/airflow/dev/breeze/src/airflow_breeze/utils/publish_docs_builder.py", line 292, in publish
    shutil.copytree(self._build_dir, output_dir)
  File "/opt/hostedtoolcache/Python/3.8.17/x64/lib/python3.8/shutil.py", line 555, in copytree
    with os.scandir(src) as itr:
FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/airflow/airflow/docs/_build/docs/apache-airflow/stable'
Error: Process completed with exit code 1.

This error occurs because the preceding build-docs command does not have the --for-production switch. Just add it and it should be solved.
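
For illustration, a sketch of how a publish step could fail with a clearer message when the "stable" build output is missing. This is not the actual `publish` method from publish_docs_builder.py; the function, its signature and the directory names in the demo are assumptions based on the traceback above.

```python
import shutil
import tempfile
from pathlib import Path


def publish(build_dir: Path, output_dir: Path, override_versioned: bool = False) -> bool:
    """Copy built docs into the site checkout; return True if files were copied."""
    if not build_dir.is_dir():
        raise FileNotFoundError(
            f"Built docs not found in {build_dir}. "
            "Was build-docs run with the --for-production switch?"
        )
    if output_dir.exists():
        if not override_versioned:
            return False  # leave the already published version untouched
        shutil.rmtree(output_dir)
    shutil.copytree(build_dir, output_dir)
    return True


# Demo in a throwaway directory so nothing outside it is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    stable = root / "_build" / "docs" / "apache-airflow" / "stable"
    target = root / "site" / "apache-airflow" / "stable"

    try:
        publish(stable, target)
        missing_build_message = ""
    except FileNotFoundError as err:
        missing_build_message = str(err)

    stable.mkdir(parents=True)
    (stable / "index.html").write_text("<html></html>")
    first_copy = publish(stable, target)
    second_copy = publish(stable, target)
    overridden = publish(stable, target, override_versioned=True)
```

The explicit check turns the bare `os.scandir` failure into a hint about the likely cause, and the `override_versioned` branch mirrors the flag discussed earlier in the thread.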

@amoghrajesh
Contributor Author

@potiuk just handled those comments 👍🏽

Member

@potiuk potiuk left a comment

Looks PERFECT!

@potiuk potiuk merged commit 4d3c832 into apache:main Jul 12, 2023
@potiuk
Member

potiuk commented Jul 12, 2023

🎉 🎉 🎉 🎉 🎉 🎉

@amoghrajesh
Contributor Author

amoghrajesh commented Jul 12, 2023

Thanks for closely following up on this one with me @potiuk. I will definitely raise some refactors on this one soon

cd "${AIRFLOW_REPO_ROOT}"

-./docs/publish_docs.py \
+breeze release-management publish-docs \
Contributor

Do we need the whole section above?
Now that the command is in breeze, we probably don't need the above section.


**NOTE** In order to run the publish documentation you need to activate virtualenv where you installed
apache-airflow with doc extra:
* `pip install '.[doc_gen]'`
If you don't have virtual env set you can do:
```shell script
cd <path_you_want_to_save_your_virtual_env>
virtualenv providers
source venv/providers/bin/activate
pip install 'apache-airflow[doc_gen]'
```

Contributor Author

Yeah I don't think we'd need that anymore. I'll raise a PR soon to get it fixed. Thanks for noticing

Contributor

Cool.
Please note that lines 432-434 also need to be edited.
It's an alternative approach to running the publishing command.

Member

oh yeah. I missed that we had a bit more details in PROVIDERS docs :)

Contributor Author

I do not think we need to edit lines 432-434. Those changes have been handled in this PR already

Member

> Cool. Please note that lines 432-434 also need to be edited. It's an alternative approach to running the publishing command.

Yep. The bash script in question already uses the new command

Contributor Author

@eladkal @potiuk the PR having this fix is here: #32559

Contributor

@eladkal eladkal Jul 12, 2023

But why do we need a bash script to begin with?
It's executing the same command with different kinds of parameters; breeze should do the mitigation.

Member

It's a helper script converting a list of ids which you already have (amazon google ...) to a list of flags (--package-filter apache-airflow-providers-amazon --package-filter apache-airflow-providers-google ...).

If you prefer to do it manually, fine, but I think having the script is useful. We could of course add it to the command, but --package-filter is traditionally there and serves other purposes too (not everything is a provider, for one).
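
The conversion the helper script performs can be sketched in a few lines. The function name is hypothetical, and replacing dots with hyphens for ids such as `apache.hive` is an assumption about the package naming convention rather than something stated in this thread.

```python
def providers_to_package_filters(provider_ids):
    """Turn short provider ids into repeated --package-filter flags (hypothetical helper)."""
    flags = []
    for provider_id in provider_ids:
        # Assumed convention: dots in the provider id become hyphens in the package name.
        package = "apache-airflow-providers-" + provider_id.replace(".", "-")
        flags.extend(["--package-filter", package])
    return flags


flags = providers_to_package_filters(["amazon", "google"])
print(" ".join(flags))
```

Keeping this as a thin wrapper leaves `--package-filter` free to accept non-provider packages as well, which matches the reasoning above.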

potiuk added a commit to potiuk/airflow that referenced this pull request Jul 12, 2023
apache#32495 changed the way the docs are built by using the --for-production
flag, which means that the docs are generated under the "stable" link
rather than "latest", in order to test document publishing.

We could change it back to latest, but the problem revealed that
we do not have to actually make a distinction there. It's ok
to keep it all as "stable" - no problem with that - there is no
distinction between latest and stable other than different link
and we never mix the two (our production docs have only stable),
so it's ok to switch everything to stable.

As a follow-up `--for-production` flag will be removed and we will
switch everything to stable for all documentation building. For
now fetching docs from stable link should fix the problem.
potiuk added a commit that referenced this pull request Jul 12, 2023
potiuk added a commit to potiuk/airflow that referenced this pull request Jul 12, 2023
We had two types of documentation building:
 * latest
 * for production

But in fact they never overlapped and we were never mixing the two.
We used the latest for all development work and for production for
releasing to "airflow.apache.org".

However, as we saw in apache#32562 (triggered by apache#32495 changing the build
in main to run with the `--for-production` flag), we actually do not
need the "latest" builds at all. Everything can be built with
"for production" by default.

This change removes the `--for-production` flag entirely, leaving
the "for production" build mode as the only one available.
potiuk added a commit that referenced this pull request Jul 12, 2023
pateash pushed a commit to pateash/airflow that referenced this pull request Jul 23, 2023
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Aug 2, 2023
potiuk added a commit to potiuk/airflow that referenced this pull request Aug 8, 2023
apache#32495 mistakenly replaced the "build" script with a
"release-management build-docs" command, but there is no such
command; there is just "build-docs".
eladkal pushed a commit that referenced this pull request Aug 8, 2023
ephraimbuddy pushed a commit that referenced this pull request Aug 9, 2023
(cherry picked from commit 199c604)
Labels

area:dev-tools changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)
