32 changes: 15 additions & 17 deletions .circleci/config.yml
@@ -8,7 +8,7 @@ jobs:
docker:
# specify the version you desire here
# use `-browsers` prefix for selenium tests, e.g. `3.6.1-browsers`
- image: cimg/python:3.8.0
- image: continuumio/miniconda3

# Specify service dependencies here if necessary
# CircleCI maintains a library of pre-built images
@@ -19,39 +19,37 @@ jobs:

steps:
- checkout

- run:
name: Set up Anaconda
name: Set up Conda
command: |
wget -q http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh;
chmod +x ~/miniconda.sh;
~/miniconda.sh -b -p ~/miniconda;
export PATH=~/miniconda/bin:$PATH
echo "export PATH=~/miniconda/bin:$PATH" >> $BASH_ENV;
conda update --yes --quiet conda;
conda init bash
sed -ne '/>>> conda initialize/,/<<< conda initialize/p' ~/.bashrc >> $BASH_ENV

conda update --yes --quiet conda;
export CONDA_EXE=/opt/conda/bin/conda
sed -ne '/>>> conda initialize/,/<<< conda initialize/p' ~/.bashrc >> $BASH_ENV

- run:
name: Build cookiecutter environment and test-env project
command: |
conda create -n cookiecutter --yes python=3.8
conda create -n cookiecutter --yes python=3.8 make
conda activate cookiecutter
pip install cookiecutter
pip install ruamel.yaml
mkdir /home/circleci/.cookiecutter_replay
cp circleci-cookiecutter-easydata.json /home/circleci/.cookiecutter_replay/cookiecutter-easydata.json
mkdir -p /root/repo/.cookiecutter_replay
cp circleci-cookiecutter-easydata.json /root/repo/.cookiecutter_replay/cookiecutter-easydata.json
pwd
which make
cookiecutter --config-file .cookiecutter-easydata-test-circleci.yml . -f --no-input
conda deactivate


- run:
name: Create test-env environment and contrive to always use it
command: |
conda activate cookiecutter
cd test-env
export CONDA_EXE=/home/circleci/miniconda/bin/conda
export CONDA_EXE=/opt/conda/bin/conda
make create_environment
conda activate test-env
conda install -c anaconda make
touch environment.yml
make update_environment
echo "conda activate test-env" >> $BASH_ENV;
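The new `Set up Conda` step leans on two CircleCI behaviors: every `run` step starts in a fresh shell, and `$BASH_ENV` is sourced at the start of each step. The `sed` range address copies just the conda-managed block from `~/.bashrc` into `$BASH_ENV`, so `conda activate` keeps working in later steps. A standalone sketch of that extraction (the sample `.bashrc` contents below are fabricated for illustration):

```
# Fabricated stand-in for ~/.bashrc containing a conda-managed block
cat > /tmp/demo_bashrc <<'EOF'
alias ll='ls -l'
# >>> conda initialize >>>
export PATH="/opt/conda/bin:$PATH"
# <<< conda initialize <<<
echo "unrelated line"
EOF

# Print only the lines between the conda initialize markers (inclusive);
# in CI these same lines are appended to "$BASH_ENV" instead of printed.
sed -ne '/>>> conda initialize/,/<<< conda initialize/p' /tmp/demo_bashrc
```

The `/start/,/end/p` address range is plain POSIX sed, so this works in any of the images above.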
2 changes: 1 addition & 1 deletion docs/00-xyz-sample-notebook.ipynb
@@ -150,7 +150,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(ds.DESCR)"
"print(ds.README)"
]
},
{
14 changes: 7 additions & 7 deletions docs/Add-csv-template.ipynb
@@ -83,7 +83,7 @@
"* `csv_path`: The desired path to your .csv file (in this case `epidemiology.csv`) relative to paths['raw_data_path']\n",
"* `download_message`: The message to display to indicate to the user how to manually download your .csv file.\n",
"* `license_str`: Information on the license for the dataset\n",
"* `descr_str`: Information on the dataset itself"
"* `readme_str`: Information on the dataset itself"
]
},
{
@@ -123,7 +123,7 @@
"metadata": {},
"outputs": [],
"source": [
"descr_str = \"\"\"\n",
"readme_str = \"\"\"\n",
"The epidemiology table from Google's [COVID-19 Open-Data dataset](https://github.com/GoogleCloudPlatform/covid-19-open-data). \n",
"\n",
"The full dataset contains datasets of daily time-series data related to COVID-19 for over 20,000 distinct locations around the world. The data is at the spatial resolution of states/provinces for most regions and at county/municipality resolution for many countries such as Argentina, Brazil, Chile, Colombia, Czech Republic, Mexico, Netherlands, Peru, United Kingdom, and USA. All regions are assigned a unique location key, which resolves discrepancies between ISO / NUTS / FIPS codes, etc. The different aggregation levels are:\n",
@@ -170,7 +170,7 @@
" csv_path=csv_path,\n",
" download_message=download_message,\n",
" license_str=license_str,\n",
" descr_str=descr_str,\n",
" readme_str=readme_str,\n",
" overwrite_catalog=True)"
]
},
@@ -206,9 +206,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, the workflow helper function also created a `covid-19-epidemiology_raw` dataset that has an empty `ds.data`, but keeps a record of the location of the final `epidemiology.csv` file in `ds.EXTRA`.\n",
"By default, the workflow helper function also created a `covid-19-epidemiology_raw` dataset that has an empty `ds.data`, but keeps a record of the location of the final `epidemiology.csv` file in `ds.FILESET`.\n",
"\n",
"The `.EXTRA` functionality is covered in other documentation."
"The `.FILESET` functionality is covered in other documentation."
]
},
{
@@ -236,7 +236,7 @@
"metadata": {},
"outputs": [],
"source": [
"ds_raw.EXTRA"
"ds_raw.FILESET"
]
},
{
@@ -246,7 +246,7 @@
"outputs": [],
"source": [
"# fq path to epidemiology.csv file\n",
"ds_raw.extra_file('epidemiology.csv')"
"ds_raw.fileset_file('epidemiology.csv')"
]
},
{
10 changes: 5 additions & 5 deletions docs/Add-derived-dataset.ipynb
@@ -85,7 +85,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(ds.DESCR)"
"print(ds.README)"
]
},
{
@@ -219,7 +219,7 @@
" source_dataset_name\n",
" dataset_name\n",
" data_function\n",
" added_descr_txt\n",
" added_readme_txt\n",
"\n",
"We'll want our `data_function` to be defined in the project module (in this case `src`) for reproducibility reasons (which we've already done with `subselect_by_key` above)."
]
@@ -250,7 +250,7 @@
"metadata": {},
"outputs": [],
"source": [
"added_descr_txt = f\"\"\"The dataset {dataset_name} is the subselection \\\n",
"added_readme_txt = f\"\"\"The dataset {dataset_name} is the subselection \\\n",
"to the {key} dataset.\"\"\""
]
},
@@ -281,7 +281,7 @@
" source_dataset_name=source_dataset_name,\n",
" dataset_name=dataset_name,\n",
" data_function=data_function,\n",
" added_descr_txt=added_descr_txt,\n",
" added_readme_txt=added_readme_txt,\n",
" overwrite_catalog=True)"
]
},
@@ -318,7 +318,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(ds.DESCR)"
"print(ds.README)"
]
},
{
12 changes: 6 additions & 6 deletions docs/New-Dataset-Template.ipynb
@@ -167,7 +167,7 @@
"metadata": {},
"source": [
"### Create a process function\n",
"By default, we recommend that you use the `process_extra_files` functionality and then use a transformer function to create a derived dataset, but you can optionally create your own."
"By default, we recommend that you use the `process_fileset_files` functionality and then use a transformer function to create a derived dataset, but you can optionally create your own."
]
},
{
@@ -176,11 +176,11 @@
"metadata": {},
"outputs": [],
"source": [
"from src.data.extra import process_extra_files\n",
"process_function = process_extra_files\n",
"from src.data.fileset import process_fileset_files\n",
"process_function = process_fileset_files\n",
"process_function_kwargs = {'file_glob':'*.csv',\n",
" 'do_copy': True,\n",
" 'extra_dir': ds_name+'.extra',\n",
" 'fileset_dir': ds_name+'.fileset',\n",
" 'extract_dir': ds_name}"
]
},
@@ -355,7 +355,7 @@
"metadata": {},
"outputs": [],
"source": [
"ds.EXTRA"
"ds.FILESET"
]
},
{
@@ -364,7 +364,7 @@
"metadata": {},
"outputs": [],
"source": [
"ds.extra_file('epidemiology.csv')"
"ds.fileset_file('epidemiology.csv')"
]
},
{
4 changes: 2 additions & 2 deletions docs/New-Edge-Template.ipynb
@@ -88,7 +88,7 @@
"metadata": {},
"outputs": [],
"source": [
"source_ds.EXTRA"
"source_ds.FILESET"
]
},
{
@@ -178,7 +178,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(ds.DESCR)"
"print(ds.README)"
]
},
{
3 changes: 3 additions & 0 deletions docs/test_docs.py
@@ -9,6 +9,8 @@
import requests

from src import paths
from src.log import logger


CCDS_ROOT = Path(__file__).parents[1].resolve()
DOCS_DIR = CCDS_ROOT / "docs"
@@ -35,6 +37,7 @@ def test_notebook_csv(self):
csv_url = "https://storage.googleapis.com/covid19-open-data/v2/epidemiology.csv"
csv_dest = paths['raw_data_path'] / "epidemiology.csv"
if not csv_dest.exists():
logger.debug("Downloading epidemiology.csv")
csv_file = requests.get(csv_url)
with open(csv_dest, 'wb') as f:
f.write(csv_file.content)
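The guard added to `test_docs.py` above — fetch `epidemiology.csv` only when the destination file is missing, logging when a download actually happens — is a standard idempotent-fetch pattern. A minimal shell sketch of the same guard, with a local file standing in for the remote CSV (all paths and contents here are fabricated):

```
# Fabricated source standing in for the remote epidemiology.csv
mkdir -p /tmp/easydata_demo
printf 'date,cases\n2020-01-01,1\n' > /tmp/easydata_demo/remote.csv

dest=/tmp/easydata_demo/epidemiology.csv
# Fetch only when the cached copy is absent, mirroring
# the `if not csv_dest.exists()` check in test_docs.py
if [ ! -f "$dest" ]; then
    echo "Downloading epidemiology.csv"   # stands in for logger.debug(...)
    cp /tmp/easydata_demo/remote.csv "$dest"
fi
head -n 1 "$dest"
```

Rerunning the block skips the copy entirely, which is exactly what keeps the notebook tests fast across repeated CI runs.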
11 changes: 2 additions & 9 deletions {{ cookiecutter.repo_name }}/.circleci/config.yml
@@ -8,7 +8,8 @@ jobs:
docker:
# specify the version you desire here
# use `-browsers` prefix for selenium tests, e.g. `3.6.1-browsers`
- image: circleci/python:3.7.0
- image: continuumio/miniconda3


# Specify service dependencies here if necessary
# CircleCI maintains a library of pre-built images
@@ -20,14 +21,6 @@ jobs:
steps:
- checkout

- run:
name: Set up Anaconda
command: |
wget -q http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh;
chmod +x ~/miniconda.sh;
~/miniconda.sh -b -p ~/miniconda;
echo "export PATH=~/miniconda/bin:$PATH" >> $BASH_ENV;

- run:
name: Create environment and contrive to always use it
command: |
@@ -4,13 +4,19 @@ The `{{ cookiecutter.repo_name }}` repo is set up with template code to make man

If you haven't yet, configure your conda environment.

**WARNING**: If you have conda-forge listed as a channel in your `.condarc` (or any other channels other than defaults), you may experience great difficulty generating reproducible conda environments.

We recommend you remove conda-forge (and all other non-default channels) from your `.condarc` file and [set your channel priority to 'strict'](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-channels.html). You can still use conda-forge (or any other conda channel), just specify it explicitly in your `environment.yml` by prefixing your package name with `channel-name::`; e.g.
```
- wheel # install from the default (anaconda) channel
- pytorch::pytorch # install this from the `pytorch` channel
- conda-forge::tokenizers # install this from conda-forge
```

## Configuring your python environment
Easydata uses conda to manage python packages installed by both conda **and pip**.

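Because conda manages the pip-installed packages too, both kinds of dependency live in the same `environment.yml`. A hypothetical minimal file showing the pip subsection (the environment name and package list are illustrative, not part of the template):

```
name: example-env
channels:
  - defaults
dependencies:
  - python=3.8
  - pip
  - wheel            # conda-managed, from the defaults channel
  - pip:
      - cookiecutter  # pip-managed, recorded in the same file
      - ruamel.yaml
```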
### Adjust your `.condarc`
**WARNING FOR EXISTING CONDA USERS**: If you have `conda-forge` listed as a channel in your `.condarc` (or any other channels other than `default`), **remove them**. These channels should be specified in `environment.yml` instead.

We also recommend [setting your channel priority to 'strict'](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-channels.html) to reduce package incompatibility problems. This will be the default in conda 5.0, but in order to assure reproducibility, we need to use this behavior now.

```
conda config --set channel_priority strict
@@ -26,18 +32,30 @@ conda config --prepend channels defaults
conda config --prepend envs_dirs ~/.conda/envs # Store environments in local dir for JupyterHub
```

### Fix the CONDA_EXE path
* Make note of the path to your conda binary:
#### Locating the `conda` binary
Ensure the Makefile can find your conda binary, either by setting the `CONDA_EXE` environment variable, or by modifying `Makefile.include` directly.

First, check if `CONDA_EXE` is already set
```
$ which conda
>>> export | grep CONDA_EXE
CONDA_EXE=/Users/your_username/miniconda3/bin/conda
```

If `CONDA_EXE` is not set, you will need to set it manually in `Makefile.include`; i.e.

* Make note of the path to your conda binary. It should be in the `bin` subdirectory of your Anaconda (or miniconda) installation directory:
```
>>> which conda # this will only work if conda is in your PATH, otherwise, verify manually
~/miniconda3/bin/conda
```
* ensure your `CONDA_EXE` environment variable is set correctly in `Makefile.include`
* ensure your `CONDA_EXE` environment variable is set to this value; i.e.
```
export CONDA_EXE=~/miniconda3/bin/conda
>>> export CONDA_EXE=~/miniconda3/bin/conda
```
or edit `Makefile.include` directly.

### Create the conda environment
* Create and switch to the virtual environment:
Create and switch to the virtual environment:
```
cd {{ cookiecutter.repo_name }}
make create_environment