
Conversation

@turbaszek
Member

This is the first step to address #12508


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@turbaszek turbaszek requested review from ashb and potiuk November 20, 2020 15:55
@turbaszek turbaszek changed the title Run pip check when to verify production images Run pip check to verify production images Nov 20, 2020
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Nov 20, 2020
@github-actions

The PR needs to run all tests because it modifies core of Airflow! Please rebase it to latest master or ask committer to re-run it!

@turbaszek
Member Author

@potiuk do I have to make a change in airflow sources to trigger this step or is there another way?

@potiuk
Member

potiuk commented Nov 20, 2020

@potiuk do I have to make a change in airflow sources to trigger this step or is there another way?

No - the step is queued. Whenever any of the "scripts" changes, all tests are run.

Ah. But in this case, the "master" script is run (because building the image is done in master). You need to push it to your own fork as master if you want to check it: `git push yourbranch:master` (just be careful not to push to Airflow).
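For illustration, with this PR's branch the push could look like this (here `fork` is just a placeholder for whatever remote name points at your own fork - take care not to use the `apache` remote):

    # push the PR branch to the fork's master so the master-only image workflow runs there
    git push fork run-check-on-prod:master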

But after we merge it, it will run for all PRs.

@turbaszek
Member Author

turbaszek commented Nov 20, 2020

I got this:

botocore 1.17.44 has requirement docutils<0.16,>=0.10, but you have docutils 0.16.

and I'm not sure how this should be addressed, as docutils is only a devel dependency in setup.py. Should this constraint be added to the constraints or to the aws section?

https://github.com/PolideaInternal/airflow/runs/1432065455?check_suite_focus=true

@potiuk
Member

potiuk commented Nov 20, 2020

It is likely required by another dependency transitively. We have to run "pipdeptree" to find out.
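For example, inside the image one way to see what pulls in docutils is something along these lines (a sketch - the package to inspect is the one from the error above):

    pip install pipdeptree
    # list the installed packages that depend on docutils
    pipdeptree --reverse --packages docutils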

@potiuk
Member

potiuk commented Nov 20, 2020

Those cases should be analysed one by one, and a decision should be taken individually in each case, I am afraid.

@ashb
Member

ashb commented Nov 20, 2020

Upgrading boto3 should update the version of botocore we use

@turbaszek
Member Author

turbaszek commented Nov 21, 2020

Upgrading boto3 should update the version of botocore we use

Yup, using boto3>=1.15.0 should pull in botocore>=1.18.0, which does not require docutils. I updated this in setup.py, however I'm not sure if this will solve the issue, as we probably also need to update the constraints and the aws provider @potiuk?

@potiuk
Member

potiuk commented Nov 21, 2020

I think we will only find out if we have problems after running (and possibly fixing) all the tests (and then possibly running some system tests with real AWS). This is the "individual" part of each such upgrade.

Regarding constraints - if we have a "green" (all tests passing) build in master with such a setup.py change, then the constraints will be updated automatically. The condition is that the PR passes all the tests after merging to master -> this automatically triggers a constraints update for master. Same with v1-10-stable: once the tests pass in "v1-10-stable" (those are run with an "eager" pip upgrade that updates to the latest versions of dependencies matching setup.py), the constraints-1-10 are also updated automatically.

@potiuk
Member

potiuk commented Nov 21, 2020

So the process is simple:

  1. we update limitations in setup.py
  2. we get PR with those
  3. we make the PR green
  4. we merge it to master/v1-10-stable and after it is green there, the constraints are updated

@turbaszek
Member Author

turbaszek commented Nov 21, 2020

So what should I do now? The setup.py was updated, the dependencies in the image were not, so the check is failing. And because it is failing we won't run the tests 😄

@potiuk
Member

potiuk commented Nov 22, 2020

So what should I do now? The setup.py was updated, the dependencies in the image were not, so the check is failing. And because it is failing we won't run the tests

Push the branch as master to your own fork. When the tests run against your own master, they will run the eager upgrade check after first installing it, and all tests will run there.

@potiuk
Member

potiuk commented Nov 22, 2020

And we can also think about handling this case automatically. Selective checks should allow that: if setup.py or setup.cfg gets modified, "upgradeToLatestConstraints" should be set to "true".

If you look in build-images-workflow-run.yml, this is the condition now:

      - name: "Set upgrade to latest constraints"
        id: upgrade-constraints
        run: |
          if [[ ${{ steps.cancel.outputs.sourceEvent == 'push' ||
              steps.cancel.outputs.sourceEvent == 'scheduled' }} == 'true' ]]; then
              echo "::set-output name=upgradeToLatestConstraints::${{ github.sha }}"
          else
              echo "::set-output name=upgradeToLatestConstraints::false"
          fi

Maybe that's a good time for you to implement it :)? We'd have to move it to "selective_checks", because that is where we set conditional variables based on which files changed. In this case, the logic should be:

#  Needed by default to prevent regular PRs from failing when transitive dependencies cause problems
upgradeToLatestConstraints = false

# handle the case where a change in setup.py/setup.cfg makes an upgrade necessary
if setup.py or setup.cfg changed -> upgradeToLatestConstraints = "true"

# we want to keep track of failing transitive dependencies as soon as they appear, therefore we want to attempt to upgrade to
# latest constraints. Also if all tests are OK in the "master" build, constraints are then automatically updated to the latest ones
# (but this is a separate step in CI). The github.sha is needed because we want to upgrade as soon as possible
# and not only when setup.py changes. This is because we need to invalidate the layer in docker when we first install the
# pip dependencies for caching (so that we can run the subsequent `pip install http://github...`). If we use the SHA as the value of
# upgradeToLatestConstraints, it will still upgrade them every time a new commit is merged - otherwise the Docker cache is
# used and no upgrade is done.

if push or schedule -> upgradeToLatestConstraints = github.sha

But note again -> this logic in "build-workflow-image" will only work in master, so in order to check it and test all the changes, you have to push it to your fork's master.
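A rough bash sketch of that logic (the variable names and the file-change check here are illustrative, not the actual selective_checks code):

    # default: do not upgrade, so regular PRs are not broken by transitive dependencies
    upgradeToLatestConstraints="false"

    # a change to setup.py or setup.cfg makes an upgrade necessary
    if echo "${CHANGED_FILES}" | grep -E "^setup\.(py|cfg)$" >/dev/null; then
        upgradeToLatestConstraints="true"
    fi

    # push/schedule builds always attempt the upgrade; using the commit SHA as the value
    # invalidates the Docker cache layer, so the upgrade actually happens on every merged commit
    if [[ ${sourceEvent} == "push" || ${sourceEvent} == "scheduled" ]]; then
        upgradeToLatestConstraints="${GITHUB_SHA}"
    fi

    echo "::set-output name=upgradeToLatestConstraints::${upgradeToLatestConstraints}"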

@turbaszek
Member Author

Push the branch as master to your own fork. When the tests run against your own master, they will run the eager upgrade check after first installing it, and all tests will run there.

I've been doing this since you suggested it in one of the first comments. For example here:
https://github.com/turbaszek/airflow/runs/1438401698?check_suite_focus=true

But the dependencies are still not updated even if the flag is set to true:
https://github.com/turbaszek/airflow/runs/1438439661?check_suite_focus=true

@potiuk
Member

potiuk commented Nov 22, 2020

There is a bug in the code - a result of one of the latest refactors (and the main reason why I opened a support ticket with GitHub). It's extremely difficult to see such bugs in GA when you have a typo in the name of an output, because it is silently ignored:

The "Set upgrade to latest constraints" is not executed because the output has earlier version of the job name,
steps.cancel.outputs.sourceEvent should be steps.source-run-info.outputs.sourceEvent similarly as in the previous step!

In this code (this bug is in two lines):

      - name: "Set upgrade to latest constraints"
        id: upgrade-constraints
        run: |
          if [[ ${{ steps.cancel.outputs.sourceEvent == 'push' ||
              steps.cancel.outputs.sourceEvent == 'scheduled' }} == 'true' ]]; then
              echo "::set-output name=upgradeToLatestConstraints::${{ github.sha }}"
          else
              echo "::set-output name=upgradeToLatestConstraints::false"
          fi
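With that correction, the two offending lines would read (simply applying the rename described above):

          if [[ ${{ steps.source-run-info.outputs.sourceEvent == 'push' ||
              steps.source-run-info.outputs.sourceEvent == 'scheduled' }} == 'true' ]]; then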

@turbaszek turbaszek added the area:dependencies Issues related to dependencies problems label Nov 22, 2020
@turbaszek
Member Author

In this build:
https://github.com/turbaszek/airflow/runs/1438634661
I see:

Image build variables:
    UPGRADE_TO_LATEST_CONSTRAINTS: true
    CHECK_IMAGE_FOR_REBUILD: true

but the problem still occurs. The update of constraints happens in the CI build, not in the build image workflow, so I'm wondering if I'm making changes in the right place.

@potiuk
Member

potiuk commented Nov 22, 2020

Image build variables:
UPGRADE_TO_LATEST_CONSTRAINTS: true
CHECK_IMAGE_FOR_REBUILD: true

That's good!

The problem is that we are running the tests in the CI environment and UPGRADE_TO_LATEST_CONSTRAINTS works only in the CI image. The pip check should be running there!

@potiuk
Member

potiuk commented Nov 22, 2020

I generally think failing the prod image build on the PIP check in the workflow build is a bad idea, because we don't even see the tests running then. I think such a check should only run AFTER all the tests have run and before we push the new constraints. That is also better, because we can then make sure that we are doing the check in the very place we need it. Give me a few moments and I will provide guidance:

@potiuk
Member

potiuk commented Nov 22, 2020

Proposal:

In ci.yaml we have this (I added some comments):

  constraints-push:
      ......
       # here we have all the constraints generated in the CI image in the "artifacts" folder. So in case
       # "UPGRADE_TO_LATEST_CONSTRAINTS" was derived, we should get the new constraints here
      - name: "Get all artifacts (constraints)"
        uses: actions/download-artifact@v2
        with:
          path: 'artifacts'

      # you can see how to retrieve the artifacts in the ./scripts/ci/constraints/ci_commit_constraints.sh below
      # In order to build the image with those new constraints you need to copy the constraints to "./docker-context-files" and
      # set AIRFLOW_CONSTRAINTS_LOCATION="./docker-context-files/constraints-${PYTHON_MAJOR_MINOR_VERSION}"
      # This will build the production image with those new constraints and then you will be able to run `pip check`

      - name: "Commit changed constraint files for ${{needs.build-info.outputs.pythonVersions}}"
        run: ./scripts/ci/constraints/ci_commit_constraints.sh

I think you will have to add a new matrix job (inject it before "constraints-push" and give it the same prerequisites), because we want to run pip check for all python versions. The current "constraints-push" is not a matrix job - it simply performs one commit with all the constraint files for all python versions that were generated in the build.

It should likely have a few steps similar to these:

      steps:
        - name: "Free space"
          run: ./scripts/ci/tools/ci_free_space_on_ci.sh
        - name: "Get all artifacts (constraints)"
          uses: actions/download-artifact@v2
          with:
            path: 'artifacts'
        - name: "Extract constraint files to ./docker-context-files"
          run: ....  # Here you need to copy the constraint files from "artifacts". You want to copy only the constraint files, as "artifacts" will be far too big (and it takes a lot of time to upload a big context to the docker engine before the build)

        - name: "Build PROD images ${{ matrix.python-version }}:${{ github.event.workflow_run.id }}"
          run: ./scripts/ci/images/ci_prepare_prod_image_on_ci.sh
          env:
            AIRFLOW_CONSTRAINTS_LOCATION: "./docker-context-files/constraints-...."
        - name: "Run PIP check"
          run: ...

Then you should make "constraints-push" depend on the above job (and only this job). This way we will only update the constraints in the "constraints-master" branch when both the CI tests and the pip check pass.
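For the "Extract constraint files" placeholder above, a minimal sketch could look like this (the artifact directory layout is an assumption, not verified against the actual upload step):

    # keep the docker context small: copy only the constraint files out of "artifacts"
    mkdir -p ./docker-context-files
    cp artifacts/constraints*/constraints-*.txt ./docker-context-files/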

@turbaszek
Member Author

turbaszek commented Nov 22, 2020

I will take a look tomorrow. But this PR aimed to validate only the prod image. And I think having this check in the same place where we run other validations for the image makes sense. And it does not require an additional matrix = fewer jobs to run in parallel.

@potiuk
Member

potiuk commented Nov 22, 2020

I will take a look tomorrow. But this PR aimed to validate only the prod image. And I think having this check in the same place where we run other validations for the image makes sense. And it does not require an additional matrix = fewer jobs to run in parallel.

I am afraid it has to be done in the matrix and using the upgraded constraints from the CI builds.

We have quite a different set of dependencies for different python versions, and if a pip check succeeds on one there is no guarantee it will succeed on the others. We also cannot upgrade the constraints when we are building the image for regular PRs - because of our transitive dependency problems.

If we do this, then we go back to the situation we had where PRs start failing because someone released a new version of a transitive library we are using (remember the werkzeug drama?).

The constraint mechanism is specifically designed to prevent this case. Do you remember the last time it happened to us recently? Probably not, because the constraint mechanism prevents it from happening.

So we have to run the constraint upgrade only in the "master push/schedule builds", and the constraints are now only really updated after all tests pass for them. And this is the perfect time to do the pip check - not sooner, not later - because this is the moment when the constraints get updated. And we must do a pip check there, otherwise pip check will start failing in master after such a constraint push. So we really have to do the pip check just before we push the updated constraints.

@turbaszek
Member Author

Thanks @potiuk for the explanation. I'm testing the suggested approach:
https://github.com/turbaszek/airflow/runs/1441614288

@turbaszek
Member Author

turbaszek commented Nov 23, 2020

@potiuk it seems that we have exactly the same problem but in a different place:
https://github.com/turbaszek/airflow/runs/1443863231

I would say that the problem may be with the image tag apache/airflow:master-python3.6 against which we run the check. The image that was built and pushed to the registry is docker.pkg.github.com/turbaszek/airflow/master-python3.6:379534179.
Nope, that image still raises the error.

Digging deeper, the problem seems to be that we are using the master constraints to build the image (--build-arg AIRFLOW_CONSTRAINTS_REFERENCE=constraints-master)? And that's something I don't get.

I changed setup.py but the image is still built using the old dependencies. Can you explain (or point to docs) how changes to setup.py|cfg get propagated to prod images?

@apache apache deleted a comment from github-actions bot Nov 23, 2020
@apache apache deleted a comment from github-actions bot Nov 23, 2020
@potiuk
Member

potiuk commented Nov 24, 2020

Digging deeper, the problem seems to be that we are using the master constraints to build the image (--build-arg AIRFLOW_CONSTRAINTS_REFERENCE=constraints-master)? And that's something I don't get.

I changed setup.py but the image is still built using the old dependencies. Can you explain (or point to docs) how changes to setup.py|cfg get propagated to prod images?

Sure. It is something new we want to do, so it likely needs running locally first to see all the details, and then transferring that into the CI setup with all the right environment variables etc. It has a lot of moving parts, because we have complex constraint requirements and local optimizations that make the image builds faster in CI - this is why it needs some special treatment.

The technical details and architecture of how the images are built are here: https://github.com/apache/airflow/blob/master/IMAGES.rst#technical-details-of-airflow-images. Building the production image from the user perspective is described here: https://github.com/apache/airflow/blob/master/docs/production-deployment.rst#production-container-images, because this is not a developer-facing but a user-facing feature.

But I perfectly understand it can be overwhelming initially. It's also good to take a look at the Dockerfile: https://github.com/apache/airflow/blob/master/Dockerfile

Basically, in the CI environment you will see the docker build command that is used to build the image, and you can just run it locally and modify it to get what you want (with a new thing like this you might also end up modifying the Dockerfile itself). Eventually, you end up with the right combination of build args and sometimes new features to add.

The installation you mentioned is controlled by AIRFLOW_PRE_CACHED_PIP_PACKAGES. When it is set to true, we first install airflow from the current "master". This is an optimization step for rebuilding the image: it installs the "latest master" version of the dependencies so that subsequent steps only incrementally add new dependencies or upgrade them.

# In case of Production build image segment we want to pre-install master version of airflow
# dependencies from GitHub so that we do not have to always reinstall it from the scratch.
RUN if [[ ${AIRFLOW_PRE_CACHED_PIP_PACKAGES} == "true" ]]; then \
       if [[ ${INSTALL_MYSQL_CLIENT} != "true" ]]; then \
          AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS/mysql,}; \
       fi; \
       pip install --user \
          "https://github.com/${AIRFLOW_REPO}/archive/${AIRFLOW_BRANCH}.tar.gz#egg=apache-airflow[${AIRFLOW_EXTRAS}]" \
          --constraint "https://raw.githubusercontent.com/apache/airflow/${AIRFLOW_CONSTRAINTS_REFERENCE}/constraints-${PYTHON_MAJOR_MINOR_VERSION}.txt" \
          && pip uninstall --yes apache-airflow; \
    fi

This step can be entirely skipped by setting AIRFLOW_PRE_CACHED_PIP_PACKAGES to "false", in which case the only installation step that will happen is the one below. The default value of INSTALL_AIRFLOW_VIA_PIP is "true", so the next step should be the installation of Airflow, and that step should install airflow using the constraints specified.

# remove mysql from extras if client is not installed
RUN if [[ ${INSTALL_MYSQL_CLIENT} != "true" ]]; then \
        AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS/mysql,}; \
    fi; \
    if [[ ${INSTALL_AIRFLOW_VIA_PIP} == "true" ]]; then \
        pip install --user "${AIRFLOW_INSTALL_SOURCES}[${AIRFLOW_EXTRAS}]${AIRFLOW_INSTALL_VERSION}" \
            --constraint "${AIRFLOW_CONSTRAINTS_LOCATION}"; \
    fi; \
    if [[ -n "${ADDITIONAL_PYTHON_DEPS}" ]]; then \
        pip install --user ${ADDITIONAL_PYTHON_DEPS} --constraint "${AIRFLOW_CONSTRAINTS_LOCATION}"; \
    fi; \
    if [[ ${AIRFLOW_LOCAL_PIP_WHEELS} == "true" ]]; then \
        if ls /docker-context-files/*.whl 1> /dev/null 2>&1; then \
            pip install --user --no-deps /docker-context-files/*.whl; \
        fi ; \
    fi; \
    find /root/.local/ -name '*.pyc' -print0 | xargs -0 rm -r || true ; \
    find /root/.local/ -type d -name '__pycache__' -print0 | xargs -0 rm -r || true

So to cut a long story short - you need to set AIRFLOW_PRE_CACHED_PIP_PACKAGES to false and AIRFLOW_CONSTRAINTS_LOCATION to your files located in "./docker-context-files".
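For illustration, a local build with those settings might look roughly like this (a sketch - the constraint file name and the image tag are examples for Python 3.6, and the path follows the "./docker-context-files" form used above):

    docker build . \
        --build-arg AIRFLOW_PRE_CACHED_PIP_PACKAGES="false" \
        --build-arg AIRFLOW_CONSTRAINTS_LOCATION="./docker-context-files/constraints-3.6.txt" \
        --tag airflow-prod-check:3.6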

@turbaszek
Member Author

After discussion with @potiuk we decided to move the check to the CI image. And these are the problems we have:

➜ docker run --rm --entrypoint /bin/bash apache/airflow:master-python3.6-ci -c "pip check"
snowflake-connector-python 2.2.10 has requirement azure-storage-blob<13.0.0,>=12.0.0; python_version >= "3.5.2", but you have azure-storage-blob 2.1.0.
snowflake-connector-python 2.2.10 has requirement cryptography<3.0.0,>=2.5.0, but you have cryptography 3.0.
snowflake-connector-python 2.2.10 has requirement idna<2.10, but you have idna 2.10.
snowflake-connector-python 2.2.10 has requirement requests<2.24.0, but you have requests 2.24.0.
moto 1.3.14 has requirement idna<2.9,>=2.5, but you have idna 2.10.
cfn-lint 0.35.0 has requirement importlib-resources~=1.4; python_version < "3.7" and python_version != "3.4", but you have importlib-resources 3.0.0.
botocore 1.17.44 has requirement docutils<0.16,>=0.10, but you have docutils 0.16.
astroid 2.4.2 has requirement lazy-object-proxy==1.4.*, but you have lazy-object-proxy 1.5.1.

The main problem is snowflake due to snowflakedb/snowflake-connector-python#324 and this comment:

airflow/setup.py

Lines 385 to 392 in ce91991

snowflake = [
# snowflake is not compatible with latest version.
# This library monkey patches the requests library, so SSL is broken globally.
# See: https://github.com/snowflakedb/snowflake-connector-python/issues/324
'requests<2.24.0',
'snowflake-connector-python>=1.5.2',
'snowflake-sqlalchemy>=1.1.0',
]

What's more, this library has a constraint for boto3: 'boto3>=1.4.4,<1.16'
https://github.com/snowflakedb/snowflake-connector-python/blob/29ad620f7d8bb14f2bb0bc1d95b4dc35b65447bb/setup.py#L195

I'm wondering how we should solve this. Or what should our criteria be when selecting versions?

@potiuk
Member

potiuk commented Nov 24, 2020

I am fixing some of those in a moment. It is rather easy to do manually.

What I tried to do is to run those in Breeze:

pip install --upgrade snowflake-connector-python
pip install --upgrade moto
pip install --upgrade astroid 

And after that:

pip check
No broken requirements found.

It seems that simply downgrading requests to 2.23.0 and docutils to 0.15.2 should fix the problem for requests (but there will still be problems for azure that need fixing).

The problem seems to come from snowflake adding that limitation after we had upgraded to 2.24.0. But now, with the latest version of snowflake, requests will not be upgraded. Then we can move on to the others.

@potiuk
Member

potiuk commented Nov 24, 2020

I am double-checking that and pushing new images shortly (BTW, once we add pip check to the pipeline and fix all of those, this should never happen again).

@turbaszek
Member Author

Indeed this seems to be working on 3.8:

pip install --upgrade "requests<2.24.0
pip install --upgrade snowflake-connector-python
pip install --upgrade moto
pip install --upgrade astroid

@potiuk
Member

potiuk commented Nov 24, 2020

It's not that easy though. I found that on 3.7 there is more to it and we might break some tests when we do it. What I am going to do is to add a special case so that we can test the new set of constraints separately and not break people's code. Bear with me.

@turbaszek turbaszek force-pushed the run-check-on-prod branch 2 times, most recently from e791357 to f8d4eff on November 25, 2020 16:20
This is first step to address apache#12508

fixup! Run pip check when to verify production images

fixup! Run pip check when to verify production images
@github-actions

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@potiuk
Member

potiuk commented Nov 28, 2020

Closing as it has been merged into #12664

@potiuk potiuk closed this Nov 28, 2020