Skip to content
This repository was archived by the owner on Mar 6, 2023. It is now read-only.

Setup CI#177

Merged
LukeWeidenwalker merged 96 commits into
Open-EO:masterfrom
LukeWeidenwalker:master
Jul 1, 2022
Merged

Setup CI#177
LukeWeidenwalker merged 96 commits into
Open-EO:masterfrom
LukeWeidenwalker:master

Conversation

@LukeWeidenwalker
Copy link
Copy Markdown
Contributor

@LukeWeidenwalker LukeWeidenwalker commented Jun 7, 2022

Sets up a CI pipeline to resolve #99.

More things to do before this can be merged:

  • Clean up the build + dependencies
    • Note: I removed the xarray-extras dependency, because I couldn't get it to build with everything else + the functionality we use it for now seems to be in xarray
    • needs documentation
    • add gdal python dependency
    • add sensible build variants
  • Adds a devcontainer to develop in
    • Needs documentation
    • git isn't properly installed, so the vcs extension doesn't work
  • Adds a GH actions workflow that runs mypy and pytest for the supported Python versions.

@LukeWeidenwalker LukeWeidenwalker mentioned this pull request Jun 8, 2022
4 tasks
@LukeWeidenwalker
Copy link
Copy Markdown
Contributor Author

@soxofaan @clausmichele
Hey both - I've recently been working on setting up a CI pipeline for this repo. Now that the basic flow works and @SerRichard has helped me iron out the obvious flaws, I think these changes are ready to be looked upon/tried out by both of you.

Some notes:

  • I've been using docker version 20.10.15 to build this
  • I will extend the range of Python versions to be tested across all of 3.6-3.10, just wanted to save some compute while I was still experimenting
  • I'll add proper linting in a separate PR
  • Some tests and type checking turned out to be currently failing - I'm currently skipping those, just to get to an initial green build, but we'll have to fix these asap after this is merged.
  • I've recently taken a liking to poetry and have been using it here - I like how all the settings are in a single file, that there's well-defined dev-dependencies, that it resolves fast and most importantly that it will make it much easier to publish this package to PyPI later! If you can't be persuaded into using it, we can look at options to include a frozen requirements.txt. I'd like the source of truth for dependencies to be managed by poetry though, for the reasons outlined above.
  • I realize that using docker to setup GDAL adds complexity. I also realize that you might not have used the Remote Containers extension, or even be using VSCode for development at all. However, I've come at this from various angles already and as long as this repo depends on GDAL I truly think that just running everything in docker is the best option to ensure a consistent build across contributors and CI runs. It's also quite nice to use once setup. If you run into any troubles, please feel free to ping me, I'm more than happy to jump on as many calls necessary to help through any setup troubles (and thereby also improve the instructions in the readme)!
  • Are there any further build variants you'd like to see included here? We were thinking about adding one for the dask worker specifically, any other suggestions?

@clausmichele
Copy link
Copy Markdown
Member

Hi @LukeWeidenwalker! Thanks for the effort, really like to see this going on. I'm not an expert in setting up CI pipelines, so I can't help you much here.

About your last question: it would be nice to have a version which does not depend on the EODC set-up. For example, the save_result process currently has hard coded that the data in ODC should be stored in EQUI7:

def get_equi7_tiles(data: xr.Dataset):

def eodc_collections_to_res():

@LukeWeidenwalker
Copy link
Copy Markdown
Contributor Author

@clausmichele That's totally fine - the thing I'm most keen on getting feedback for is whether the new build process (as described in the updated README) works for everyone or whether the instructions need further clarifications. I.e. when you setup the new dev environment, can you actually get everything to run as you'd expect it to! ;)

Ah, so in that question I was just referring to build variants, i.e. subsets of dependencies, for when you only want to use parts of this library! I totally agree though that these hardcoded EODC-specific bits want to go, but let's tackle that in a different issue!

@LukeWeidenwalker LukeWeidenwalker changed the title Draft: Setup CI Setup CI Jun 28, 2022
@clausmichele
Copy link
Copy Markdown
Member

Ok I had a look at the README file. So, if you merge this we would be forced to use Visual Studio for updating/developing the repo? Or just for running the tests? Wouldn't be possible to include them as a GitHub action?

@LukeWeidenwalker
Copy link
Copy Markdown
Contributor Author

I would certainly recommend it, but I don't want to force specific tooling on anyone - if you choose to develop outside of the devcontainer and if you already have GDAL installed on your system, you should also be able to create a local virtual environment to develop in using the step described in https://github.com/LukeWeidenwalker/openeo-processes-python#virtual-environment (as I said, please let me know whether that works for you!). With this you will be able to run the tests locally using poetry run python -m pytest. However, the CI runs the tests across multiple Python versions (eventually 3.6-3.10, see above) using nox. This obviously requires all these python versions to be installed on the system. The devcontainer already has these setup for you, but if you develop outside of it, they might not be present. I.e. when tests fail for a specific version in CI, you might not be able to recreate that situation locally as easily, whereas when using the devcontainer you'll have the exact same setup locally that has run in CI.

@soxofaan
Copy link
Copy Markdown
Member

Looks interesting! A couple of notes:

  • as mentioned, I see a lot of VSCode references, e.g. also in user names in the docker container, which looks weird to me.
  • do I understand correctly that the image build is part of the test github actions workflow? It looks like a non-trivial build because you install 5 python versions 😄 . Isn't that going to be a serious time drain?
  • you also drop the extras_require feature (optional dependencies), right? So the package unconditionally depends on dask, odc-algo, ... That's not ideal if we want to use this library from the VITO backend, just for numpy implementation and don't want to drag in dependencies we don't use

@LukeWeidenwalker
Copy link
Copy Markdown
Contributor Author

  1. Ah, fair point - the important parts are the UIDs anyways, so I'll change the USERNAME to the more generic ubuntu!
  2. The image build is indeed part of the test workflow, but I've added layer caching, so layers will only have to be rebuilt when they've actually changed!
  3. Right - this is a good point too! So the desired behaviour would be to be able to do something like pip install openeo-processes[dask], pip install openeo-processes[odc] or pip install openeo-processes[bare], right? I haven't tried this yet, but something similar should be feasible with poetry dependency groups - which will only become available with poetry 1.2 (see this issue). I've created Publish v0.2.0 to PyPI #182 and added optional dependencies as a sub-task there. Maybe by the time I get a chance to pick this up, 1.2 will be out already ;)

@LukeWeidenwalker LukeWeidenwalker mentioned this pull request Jun 28, 2022
5 tasks
@soxofaan
Copy link
Copy Markdown
Member

So the desired behaviour would be to be able to do something like pip install openeo-processes[dask], pip install openeo-processes[odc] or pip install openeo-processes[bare], right? I haven't tried this yet, but something similar should be feasible with poetry dependency groups ```

Indeed something like that. With the difference that a bare install would just be pip install openeo-processes, without more specification.

I don't know poetry very well, but it seem that it already supports "extras" (which is the feature I referred to): https://python-poetry.org/docs/pyproject/#extras

@LukeWeidenwalker
Copy link
Copy Markdown
Contributor Author

Thanks for that, I missed that feature somehow - I've updated pyproject.toml to allow installing these variants. Note however, that unless everything is installed, the test suite won't run because src/openeo_processes/__init__.py attempts to import the entire library using from openeo_processes.arrays import * statements. Idk how you've been using numpy-only processes previously, surely this would have failed too if these imports were missing?

Anyways, I think this is out of scope for this PR (but should be cleaned up at some point!).
Let me know if there's anything else, otherwise I'd really like to get this merged today.

@soxofaan
Copy link
Copy Markdown
Member

Note however, that unless everything is installed, the test suite won't run because src/openeo_processes/init.py attempts to import the entire library using from openeo_processes.arrays import * statements.

If that is the case, then it doesn't make sense to have optional dependencies. Anytime you do any sub-import like import openeo_processes.foo.bar you implicitly do an import of src/openeo_processes/__init__.py first, so nothing will work unless all the dependencies of src/openeo_processes/__init__.py are available.

Idk how you've been using numpy-only processes previously,

We are pinned to a very old version 0.0.4 that does work without having and optional dependency like dask installed: https://github.com/Open-EO/openeo-python-driver/blob/master/setup.py#L44

There are several approaches to handling optional dependencies:

  • local "just in time" imports (instead of global module level imports)
  • falling back on None, e.g.
    try:
        import dask
    except ImportError:
        dask = None
    

@LukeWeidenwalker
Copy link
Copy Markdown
Contributor Author

Yes, nothing will work until we've done this refactoring to wrap imports of potentially missing dependencies.
I'll create a dedicated issue for this and put it as one of the things to do before releasing v.0.2.0 in #182.

@LukeWeidenwalker LukeWeidenwalker merged commit 85d11af into Open-EO:master Jul 1, 2022
This was referenced Jul 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Setup CI

5 participants