Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Dec 18, 2022

Previously, the Azure Service Bus had to be disabled in order to get ARM compatibility (it failed to build cleanly as uampq did not have binary wheels released and they failed to compile cleanly on debian). But the last problem is fixed now, so we can re-enable it for ARM.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member Author

potiuk commented Dec 18, 2022

Actually - when I tried it on ARM now, the azure-service-bus package built cleanly on ARM without any setup and special libraries. I think it is quite ok to add it - we have quite a few packages that are built during installation on ARM, and this generally is not a big problem as long as the packages CAN be cleanly built without special workarounds or libraries.

Similarly plyvel - plyvel is optional dependency of Google Package and as long as we can build it cleanly (will try your solution from #28432 @Taragolis) I will re-add it.

@Taragolis
Copy link
Contributor

Hmmm.. I've unable to install it into the regular airflow image

docker run -it --rm apache/airflow:2.5.0-python3.9 bash

airflow@7e0bc7525889:/opt/airflow$ uname -a
Linux 7e0bc7525889 5.10.76-linuxkit #1 SMP PREEMPT Mon Nov 8 11:22:26 UTC 2021 aarch64 GNU/Linux

airflow@7e0bc7525889:/opt/airflow$ pip install azure-servicebus
Defaulting to user installation because normal site-packages is not writeable
Collecting azure-servicebus
  Downloading azure_servicebus-7.8.1-py3-none-any.whl (210 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 210.4/210.4 kB 2.3 MB/s eta 0:00:00
Requirement already satisfied: azure-core<2.0.0,>=1.14.0 in /home/airflow/.local/lib/python3.9/site-packages (from azure-servicebus) (1.26.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /home/airflow/.local/lib/python3.9/site-packages (from azure-servicebus) (4.4.0)
Requirement already satisfied: six>=1.11.0 in /home/airflow/.local/lib/python3.9/site-packages (from azure-servicebus) (1.16.0)
Requirement already satisfied: isodate>=0.6.0 in /home/airflow/.local/lib/python3.9/site-packages (from azure-servicebus) (0.6.1)
Requirement already satisfied: azure-common~=1.1 in /home/airflow/.local/lib/python3.9/site-packages (from azure-servicebus) (1.1.28)
Requirement already satisfied: msrest<2.0.0,>=0.6.17 in /home/airflow/.local/lib/python3.9/site-packages (from azure-servicebus) (0.7.1)
Collecting uamqp<2.0.0,>=1.5.1
  Downloading uamqp-1.6.3.tar.gz (4.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.4/4.4 MB 3.7 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: requests>=2.18.4 in /home/airflow/.local/lib/python3.9/site-packages (from azure-core<2.0.0,>=1.14.0->azure-servicebus) (2.28.1)
Requirement already satisfied: certifi>=2017.4.17 in /home/airflow/.local/lib/python3.9/site-packages (from msrest<2.0.0,>=0.6.17->azure-servicebus) (2022.9.24)
Requirement already satisfied: requests-oauthlib>=0.5.0 in /home/airflow/.local/lib/python3.9/site-packages (from msrest<2.0.0,>=0.6.17->azure-servicebus) (1.3.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/airflow/.local/lib/python3.9/site-packages (from requests>=2.18.4->azure-core<2.0.0,>=1.14.0->azure-servicebus) (1.26.13)
Requirement already satisfied: charset-normalizer<3,>=2 in /home/airflow/.local/lib/python3.9/site-packages (from requests>=2.18.4->azure-core<2.0.0,>=1.14.0->azure-servicebus) (2.1.1)
Requirement already satisfied: idna<4,>=2.5 in /home/airflow/.local/lib/python3.9/site-packages (from requests>=2.18.4->azure-core<2.0.0,>=1.14.0->azure-servicebus) (3.4)
Requirement already satisfied: oauthlib>=3.0.0 in /home/airflow/.local/lib/python3.9/site-packages (from requests-oauthlib>=0.5.0->msrest<2.0.0,>=0.6.17->azure-servicebus) (3.2.2)
Building wheels for collected packages: uamqp
  Building wheel for uamqp (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for uamqp (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [101 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-aarch64-cpython-39
      creating build/lib.linux-aarch64-cpython-39/uamqp
      copying uamqp/mgmt_operation.py -> build/lib.linux-aarch64-cpython-39/uamqp
      copying uamqp/message.py -> build/lib.linux-aarch64-cpython-39/uamqp
      copying uamqp/address.py -> build/lib.linux-aarch64-cpython-39/uamqp
      copying uamqp/errors.py -> build/lib.linux-aarch64-cpython-39/uamqp
      copying uamqp/session.py -> build/lib.linux-aarch64-cpython-39/uamqp
      copying uamqp/receiver.py -> build/lib.linux-aarch64-cpython-39/uamqp
      copying uamqp/utils.py -> build/lib.linux-aarch64-cpython-39/uamqp
      copying uamqp/__init__.py -> build/lib.linux-aarch64-cpython-39/uamqp
      copying uamqp/types.py -> build/lib.linux-aarch64-cpython-39/uamqp
      copying uamqp/connection.py -> build/lib.linux-aarch64-cpython-39/uamqp
      copying uamqp/compat.py -> build/lib.linux-aarch64-cpython-39/uamqp
      copying uamqp/sender.py -> build/lib.linux-aarch64-cpython-39/uamqp
      copying uamqp/constants.py -> build/lib.linux-aarch64-cpython-39/uamqp
      copying uamqp/client.py -> build/lib.linux-aarch64-cpython-39/uamqp
      creating build/lib.linux-aarch64-cpython-39/uamqp/authentication
      copying uamqp/authentication/common.py -> build/lib.linux-aarch64-cpython-39/uamqp/authentication
      copying uamqp/authentication/cbs_auth.py -> build/lib.linux-aarch64-cpython-39/uamqp/authentication
      copying uamqp/authentication/__init__.py -> build/lib.linux-aarch64-cpython-39/uamqp/authentication
      copying uamqp/authentication/cbs_auth_async.py -> build/lib.linux-aarch64-cpython-39/uamqp/authentication
      creating build/lib.linux-aarch64-cpython-39/uamqp/async_ops
      copying uamqp/async_ops/session_async.py -> build/lib.linux-aarch64-cpython-39/uamqp/async_ops
      copying uamqp/async_ops/connection_async.py -> build/lib.linux-aarch64-cpython-39/uamqp/async_ops
      copying uamqp/async_ops/receiver_async.py -> build/lib.linux-aarch64-cpython-39/uamqp/async_ops
      copying uamqp/async_ops/utils.py -> build/lib.linux-aarch64-cpython-39/uamqp/async_ops
      copying uamqp/async_ops/__init__.py -> build/lib.linux-aarch64-cpython-39/uamqp/async_ops
      copying uamqp/async_ops/mgmt_operation_async.py -> build/lib.linux-aarch64-cpython-39/uamqp/async_ops
      copying uamqp/async_ops/client_async.py -> build/lib.linux-aarch64-cpython-39/uamqp/async_ops
      copying uamqp/async_ops/sender_async.py -> build/lib.linux-aarch64-cpython-39/uamqp/async_ops
      running egg_info
      writing uamqp.egg-info/PKG-INFO
      writing dependency_links to uamqp.egg-info/dependency_links.txt
      writing requirements to uamqp.egg-info/requires.txt
      writing top-level names to uamqp.egg-info/top_level.txt
      reading manifest file 'uamqp.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      writing manifest file 'uamqp.egg-info/SOURCES.txt'
      copying uamqp/c_uamqp.c -> build/lib.linux-aarch64-cpython-39/uamqp
      running build_ext
      will build uamqp in build/temp.linux-aarch64-cpython-39/cmake
      Building with generator flags: -G "Unix Makefiles"
      calling cmake /tmp/pip-install-bblny97h/uamqp_6cd81584b2bb407a9ad505cc6952e587/src/vendor/azure-uamqp-c/ -G "Unix Makefiles" -Duse_openssl:bool=ON -Duse_default_uuid:bool=ON  -Duse_builtin_httpapi:bool=ON  -Dskip_samples:bool=ON -DCMAKE_POSITION_INDEPENDENT_CODE=TRUE -DCMAKE_BUILD_TYPE=Release
      CMake Error: CMake was unable to find a build program corresponding to "Unix Makefiles".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
      CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
      CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
      -- Configuring incomplete, errors occurred!
      See also "/tmp/pip-install-bblny97h/uamqp_6cd81584b2bb407a9ad505cc6952e587/build/temp.linux-aarch64-cpython-39/cmake/CMakeFiles/CMakeOutput.log".
      Traceback (most recent call last):
        File "/home/airflow/.local/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 351, in <module>
          main()
        File "/home/airflow/.local/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 333, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/airflow/.local/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 249, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 413, in build_wheel
          return self._build_with_temp_dir(['bdist_wheel'], '.whl',
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 398, in _build_with_temp_dir
          self.run_setup()
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 484, in run_setup
          super(_BuildMetaLegacyBackend,
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 335, in run_setup
          exec(code, locals())
        File "<string>", line 260, in <module>
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/__init__.py", line 87, in setup
          return distutils.core.setup(**attrs)
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/wheel/bdist_wheel.py", line 325, in run
          self.run_command("build")
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/build.py", line 132, in run
          self.run_command(cmd_name)
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-3uhdze9t/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "<string>", line 134, in run
        File "<string>", line 183, in build_cmake
        File "/usr/local/lib/python3.9/subprocess.py", line 373, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command 'cmake /tmp/pip-install-bblny97h/uamqp_6cd81584b2bb407a9ad505cc6952e587/src/vendor/azure-uamqp-c/ -G "Unix Makefiles" -Duse_openssl:bool=ON -Duse_default_uuid:bool=ON  -Duse_builtin_httpapi:bool=ON  -Dskip_samples:bool=ON -DCMAKE_POSITION_INDEPENDENT_CODE=TRUE -DCMAKE_BUILD_TYPE=Release' returned non-zero exit status 1.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for uamqp
Failed to build uamqp
ERROR: Could not build wheels for uamqp, which is required to install pyproject.toml-based projects

Maybe better keep requirement without any changes for now, create additional extra where we do not limit platform and later I could investigate and add into the Azure provider documentation about additional requirements for install it on Linux ARM.

WDYT?

@potiuk
Copy link
Member Author

potiuk commented Dec 18, 2022

Hmmm.. I've unable to install it into the regular airflow image

We have a number of requirements that require customising image rather than extending it. We have good instructions on it including how to customize image (if you want optimal size of the image):
https://airflow.apache.org/docs/docker-stack/build.html#extending-the-image

Or how to add a dependcy that requires compilation (I guess if you follow it, you will be able to install it):

https://airflow.apache.org/docs/docker-stack/build.html#example-when-you-add-packages-requiring-compilation

Maybe better keep requirement without any changes for now, create additional extra where we do not limit platform and later I could investigate and add into the Azure provider documentation about additional requirements for install it on Linux ARM.

I think this is rather fine when a package has no wheel to expect it is going to be compiled - and when it compiles cleanly without too much of a hassle, I think it's perfectly fine. We have a number of other deps and providers that expect the "customize/add build-essentials" approach. And this one is only in case of ARM so I am not too worried - someone who wants to install azure provider on ARM is anyhow walking an experimental path and should be well aware of what they are doing.

But maybe indeed in case of Azure we should not make it a requirement and make it optional extra of the provider - same as plyvel for google provider?

That would be - however - backwards incompatible for x86 users - because installing the provider will not pull azure-service-bus as requirement, so I am not too sure (we could likely come with some mixed approach where we have != aarch64 as "core" requirement and then have an extra to install it without that limitation - but I feel that would be rather cumbersome.

@Taragolis
Copy link
Contributor

Taragolis commented Dec 18, 2022

Oh, I just repeat step which probably most of user do first, just try to install it by pip into the regular image.

Some of them would read documentation and found this info, which really easy to find https://www.google.com/search?q=airflow+docker+install+python+packages, other would open issue, discussion, question on SO or into the Slack.

Another problem for users what packages they need to install for build it, so that is why I suggest to create per-provider information about each extra/optional dependency (I could do that)

That would be - however - backwards incompatible for x86 users - because installing the provider will not pull azure-service-bus as requirement, so I am not too sure (we could likely come with some mixed approach where we have != aarch64 as "core" requirement and then have an extra to install it without that limitation - but I feel that would be rather cumbersome.

I think better to keep compatibility and add as extra section even if it looks dumb. I bet that x86 main users now and for ARM we just have experimental support

@potiuk
Copy link
Member Author

potiuk commented Dec 18, 2022

Oh, I just repeat step which probably most of user do first, just try to install it by pip into the regular image.

I see what you mean. But the same error might happen for a number of other packages - not even our providers. There are plenty of packages there that require being compiled (unfortunately). And even some of our providers do. And some of them require some system packages to be installed. I do not think we can reasonably describe it for each provider separately - especially that many of our users do not use the image at all - they install airflow in their own virtualenv and on different distributions and there different dependencies are needed for those different distributions.

I am afraid (but maybe I am exaggerating) that if we start describing what else is needed, we might make people think that we should describe all the prerequisites for all the possible distributions, and that this description is "complete" - i.e. explains all the necessary requirements. For example you won't be able to install mysql provider without having mysql client libraries installed - and there are different ways how those can be installed on centos, redhat, debian, mint, etc. not even mentioning MacOS installation - which brings the whole new host of problems (additionally M1 + Intel).

I think if we start describing things like that in the docs, we are going down the rabbit hole. Should we verify and check all the prerequisites and describe them for all providers? Doing it for one provider will - pretty inevitably - lead to people asking:

My provider fails to install and I see those other providers have instructions - surely you should have instruction here and I should not need to solve it myself when I am installing it on <TYPE_YOUR_WEIRD_DISTRIBUTION_HERE>. - what are the instructions?"

I think it's a dangerous path to walk.

BuI just thought about another idea.

Something that I have thought about before. Maybe - on top of having optimized apache/airflow image and "slim" image we should have "fat" image (maybe a bit better named) that will have build-essentials installed and all of the librarires CI image has installed in "dev". Then the solution for anyone who wants to install such package will be - "customize" (you will have optimized image), "extend with build essentials" (less optimized) and "use fat image" (huge image, but will install any provider by default).

Then we can even add a generic FAQ on "build error" when adding your package (or maybe even we could try to invent some smart way of capturing the error in the PROD image and displaying the instruction to the user. I think it could be done by changing shelll in the image to our custom command.

I think building such "generic" message what to do in case of build error and three options to follow - one of them as easy as changing FROM apache/airflow:2.5.0 to FROM apache/airflow:fat-2.5.0 - should do the job nicely.

WDYT?

@Taragolis
Copy link
Contributor

I do not think we can reasonably describe it for each provider separately - especially that many of our users do not use the image at all - they install airflow in their own virtualenv and on different distributions and there different dependencies are needed for those different distributions.

We could try to do with simple cases. It is not mandatory I'd rather say it nice to have, if we know that package could be install on Linux (glib-based, not musl) but required install from sources then would be nice to have this information which save some time.

My provider fails to install and I see those other providers have instructions - surely you should have instruction here and I should not need to solve it myself when I am installing it on <TYPE_YOUR_WEIRD_DISTRIBUTION_HERE>. - what are the instructions?"

I think it's a dangerous path to walk.

I thought you have some kind template about how many contributors we already have and would be nice if someone who find become a contributor and improve this part 🤣

There is always would be someone who want to everything and now. I more focus on users which might be never appear into issues/discussions and slack and find this info by themself (Airflow / providers documentation or in some search engine).

Something that I have thought about before. Maybe - on top of having optimized apache/airflow image and "slim" image we should have "fat"

I think this would be nice if it not required a lot of effort from our side.
BTW, do we have some statistic about downloads particular image from Docker Hub?

Previously, the Azure Service Bus had to be disabled in order
to get ARM compatibility (it failed to build cleanly as uampq did
not have binary wheels released and they failed to compile cleanly
on debian). But the last problem is fixed now, so we can re-enable
it for ARM.
@potiuk potiuk force-pushed the re-enable-azure-service-bus-for-arm branch from 1228eac to 194232a Compare December 19, 2022 20:57
@potiuk
Copy link
Member Author

potiuk commented Dec 19, 2022

We could try to do with simple cases. It is not mandatory I'd rather say it nice to have, if we know that package could be install on Linux (glib-based, not musl) but required install from sources then would be nice to have this information which save some time.

A better option will be to add links to installation instructions for those packages that we know might be problematic. For example for Plyvel linking https://plyvel.readthedocs.io/en/latest/installation.html would be fine. NOTE - even Plyvel developer limited those to "Ubuntu and Debian".

I thought you have some kind template about how many contributors we already have and would be nice if someone who find become a contributor and improve this part 🤣

Well. I do.. But this one is tricky :). Any contribution there will at most explain what is needed for the OS/distribution of that particular user. Which might be even more misleading for other distros/MacOS/ARM. It's super-hard to write a generic installation instructions even if we limit to apt/yum (debian/RedHat). Because the same OS packages are often even named differently. This is a true rabbit hole we want to avoid. For some reason even creators of the libraries are super vague sometimes and only limit it to some distros.

I think this would be nice if it not required a lot of effort from our side.
BTW, do we have some statistic about downloads particular image from Docker Hub?

Very little. We know the total for apache/airflow:

curl -s https://hub.docker.com/v2/repositories/apache/airflow | jq -r ".pull_count"

Result: 95575880

Also we can see "per tag" last pull - and usually it is between few seconds and 2 hours (8-10 hrs sometimes) last when I checked. But this is unscientific really. We do not pull the images during the CI (we only use ghcr.io) so we know it's not skewed by our CI though.

@Taragolis
Copy link
Contributor

A better option will be to add links to installation instructions for those packages that we know might be problematic. For example for Plyvel linking https://plyvel.readthedocs.io/en/latest/installation.html would be fine. NOTE - even Plyvel developer limited those to "Ubuntu and Debian".

Yep initially i tried to use that doc, and unfortunately they refers to outdated name of package, from Debian 9 I guess 🤦‍♂️
https://packages.debian.org/search?suite=bullseye&arch=any&searchon=names&keywords=libleveldb

Ok, lets better see if any kind of issue appear, it is possible that no one actually affected or found solution by their own.

@potiuk potiuk merged commit 6e3cee1 into apache:main Dec 19, 2022
@potiuk potiuk deleted the re-enable-azure-service-bus-for-arm branch December 19, 2022 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants