Commit a8c75d5: Update tutorials

File tree: 11 files changed, +1544 additions, -0 deletions
Lines changed: 54 additions & 0 deletions
```yaml
# Converts .md files to .ipynb
name: Convert md files to ipynb
on:
  push:
    branches:
      - sync
jobs:
  convert-ipynb:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install jupytext
        run: |
          pip install jupytext

      - uses: webfactory/ssh-agent@v0.9.0
        with:
          ssh-private-key: ${{ secrets.TUTORIALS_SSH_KEY }}

      # We need to set this to ensure that `release-it` can push the tag
      # https://github.com/release-it/release-it/blob/master/docs/ci.md#github-actions
      - name: git config
        run: |
          git config user.name "${GITHUB_ACTOR}"
          git config user.email "${GITHUB_ACTOR}@users.noreply.github.com"

      - name: Convert .md files to .ipynb and push to main branch
        run: |
          # Check out a temporary orphan branch
          git checkout --orphan tmp

          # Convert .md to .ipynb
          jupytext --to ipynb Data\ Science\ Tasks/*.md
          jupytext --to ipynb Connecting\ Data\ \&\ Creating\ Pods/*.md

          # Remove the .md files
          rm -r Data\ Science\ Tasks/*.md
          rm -r Connecting\ Data\ \&\ Creating\ Pods/*.md

          # Add all files to the branch
          git add -A

          # Commit the .ipynb files
          git commit -m 'Bitfount tutorials'

          # Rename the current branch to main
          git branch -m main

          # Push changes to main
          git push -f -u git@github.com:bitfount/tutorials.git main
```
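
The conversion step above relies on jupytext. As a rough, stdlib-only sketch of the idea (not the real jupytext implementation, which also handles metadata, cell pairing, and many other formats), converting a markdown file to a notebook amounts to splitting the text into markdown and code cells and emitting nbformat JSON:

```python
import json


def md_to_ipynb(md_text: str) -> dict:
    """Toy approximation of jupytext's md -> ipynb conversion.

    Fences opened with ```python become code cells; everything else
    becomes markdown cells. Metadata and other cell types are ignored.
    """
    cells = []
    buffer, in_code = [], False

    def flush(kind):
        text = "\n".join(buffer).strip("\n")
        if text:
            cell = {
                "cell_type": kind,
                "metadata": {},
                "source": text.splitlines(keepends=True),
            }
            if kind == "code":
                cell.update(outputs=[], execution_count=None)
            cells.append(cell)
        buffer.clear()

    for line in md_text.splitlines():
        if not in_code and line.startswith("```python"):
            flush("markdown")
            in_code = True
        elif in_code and line.startswith("```"):
            flush("code")
            in_code = False
        else:
            buffer.append(line)
    flush("code" if in_code else "markdown")

    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 5}


notebook = md_to_ipynb("# Running a Pod\n\n```python\npod.start()\n```\n")
print([c["cell_type"] for c in notebook["cells"]])  # ['markdown', 'code']
print(json.dumps(notebook["cells"][1]["source"]))   # ["pod.start()"]
```

The real workflow simply shells out to `jupytext --to ipynb`, which does all of this (and much more) per file.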

.gitignore

Lines changed: 129 additions & 0 deletions
```
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
```
Lines changed: 127 additions & 0 deletions
<!--

---
hide_title: true
jupyter:
  jupytext:
    hide_notebook_metadata: true
    root_level_metadata_as_raw_cell: false
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.16.4
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
sidebar_label: Running a Pod
sidebar_position: 1
slug: /connecting-data-and-creating-pods/running-a-pod
---

-->

# Running a Pod

Welcome to the Bitfount federated learning tutorials! In this sequence of tutorials, you will learn how federated learning works on the Bitfount platform.

This first tutorial introduces the concept of Pods (Processors of Data). A Pod is the component of the Bitfount network that allows models or queries to run on remote data. Pods are co-located with data; they check that users are authorised to perform a given operation and then execute any approved computation.

By the end of this Jupyter notebook, you should know how to run a Pod by interacting with the Bitfount Python API.
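
As a toy illustration of that role (entirely hypothetical code, not the Bitfount API): a Pod sits next to its data, checks that the requester is authorised, and only then runs the computation, returning the result rather than the raw data.

```python
class ToyPod:
    """Minimal illustration of the Pod idea: data stays put and only
    approved computations run against it. Not the real Bitfount API."""

    def __init__(self, data, approved_users):
        self._data = data                    # the data never leaves the pod
        self._approved = set(approved_users)

    def run(self, user, computation):
        # Authorisation check happens before any computation executes.
        if user not in self._approved:
            raise PermissionError(f"{user} is not authorised on this pod")
        # Only the computed result, not the raw data, is returned.
        return computation(self._data)


pod = ToyPod(data=[40_000, 52_000, 61_000], approved_users={"alice"})
print(pod.run("alice", lambda rows: sum(rows) / len(rows)))  # 51000.0
```

A real Pod adds messaging, identity verification, and task orchestration on top of this basic "check, then compute locally" loop.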

### Prerequisites

```python
!pip install bitfount
```

### Setting up

If you haven't already, create your Bitfount account on the [Bitfount Hub](https://hub.bitfount.com).

If you'd like to run these tutorials locally, clone this repository, activate your virtual environment, install our package with `pip install bitfount`, and open a Jupyter notebook by running `jupyter notebook` in your preferred terminal client.

To run a Pod, we must import the relevant pieces from our [API reference](https://docs.bitfount.com/api/bitfount/federated/pod) for constructing a Pod. While several of these are optional, it is best practice to import them all for flexibility.

```python
import logging

import nest_asyncio

from bitfount import CSVSource, DatasourceContainerConfig, Pod, setup_loggers
from bitfount.runners.config_schemas.pod_schemas import PodDataConfig, PodDetailsConfig

nest_asyncio.apply()  # Needed because Jupyter also has an asyncio loop
```

Let's set up the loggers. These are necessary so that you receive real-time feedback on your task's progress, and error messages if something goes wrong:

```python tags=["logger_setup"]
loggers = setup_loggers([logging.getLogger("bitfount")])
```

### Setting up the Pod

To set up a Pod, we must specify a config detailing its characteristics. For example:

```python
# Configure a pod using the census income data.

# First, let's set up the pod details configuration for the datasource.
datasource_details = PodDetailsConfig(
    display_name="Census Income Demo Pod",
    description="This pod contains data from the census income demo set",
)

# Set up the datasource and data configuration.
datasource = CSVSource(
    "https://bitfount-hosted-downloads.s3.eu-west-2.amazonaws.com/bitfount-tutorials/census_income.csv",
    partition_size=1000,
)
data_config = PodDataConfig(
    ignore_cols=["fnlwgt"],
    force_stypes={
        "categorical": [
            "TARGET",
            "workclass",
            "marital-status",
            "occupation",
            "relationship",
            "race",
            "native-country",
            "gender",
            "education",
        ],
    },
    modifiers=None,
    datasource_args={"seed": 100},
)

pod = Pod(
    name="census-income-demo",
    datasources=[
        DatasourceContainerConfig(
            name="census-income-demo-dataset",
            datasource=datasource,
            datasource_details=datasource_details,
            data_config=data_config,
        )
    ],
    # approved_pods is an optional attribute that we will use later in the
    # "Training a Model on Two Pods" tutorial.
    approved_pods=["census-income-yaml-demo-dataset"],
)
```

Notice how we specified which dataset to connect to using the `DatasourceContainerConfig`, and how to read the dataset by including the details in `data_config`. [PodDataConfig](https://docs.bitfount.com/api/bitfount/runners/config_schemas#poddataconfig) has several parameters, many of them optional, so be sure to check what will work best for your dataset configuration.
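
To make the `ignore_cols`/`force_stypes` idea concrete, here is a loose, stdlib-only illustration of how such a config could be applied while reading a CSV. This is only a sketch: the sample rows are invented, and Bitfount's actual column handling may differ.

```python
import csv
import io

# A tiny stand-in for the census income CSV (rows invented for illustration).
raw = """age,workclass,fnlwgt,TARGET
39,Private,77516,<=50K
50,Self-emp,83311,<=50K
"""

ignore_cols = ["fnlwgt"]               # dropped entirely, as in the config above
categorical = ["workclass", "TARGET"]  # kept as string categories

rows = []
for record in csv.DictReader(io.StringIO(raw)):
    row = {}
    for col, value in record.items():
        if col in ignore_cols:
            continue  # ignored columns never reach the processed data
        # Categorical columns stay as strings; everything else is numeric.
        row[col] = value if col in categorical else float(value)
    rows.append(row)

print(rows[0])  # {'age': 39.0, 'workclass': 'Private', 'TARGET': '<=50K'}
```

The real `PodDataConfig` expresses the same intent declaratively, leaving the parsing to the datasource.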

### Running the Pod

That's the setup done; let's run the Pod. You'll notice that the notebook cell doesn't complete: the Pod is designed to run until it is interrupted, and it must stay running in order to be accessed. If you plan to continue to the next tutorials, keep the kernel running!

```python
pod.start()
```
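
Because `pod.start()` blocks by design, the usual approach in these tutorials is simply to leave the cell running. If you ever need a kernel (or script) to stay responsive alongside a long-running call, one generic pattern is a daemon thread with a stop signal. The sketch below is not Bitfount-specific; it uses a stand-in blocking function rather than a real Pod:

```python
import threading
import time


def blocking_service(stop_event: threading.Event) -> None:
    """Stand-in for a long-running blocking call such as a service loop."""
    while not stop_event.is_set():
        time.sleep(0.05)  # the real service would be doing work here


stop = threading.Event()
worker = threading.Thread(target=blocking_service, args=(stop,), daemon=True)
worker.start()

# The main thread stays responsive while the service runs in the background...
print("service running:", worker.is_alive())

# ...and can shut the service down cleanly when finished.
stop.set()
worker.join(timeout=1)
print("service stopped:", not worker.is_alive())
```

Whether a real Pod is safe to drive from a background thread depends on its own event loop and signal handling, so treat this purely as a general threading pattern.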

You should now be able to see your Pod registered on the [Bitfount Hub Datasets page](https://am.hub.bitfount.com/datasets). If you'd like to learn an alternative way to run a Pod by pointing to a YAML configuration file, go to "Running a Pod Using YAML". If you'd like to skip to training a model or running a SQL query on a Pod, open "Querying and Training a Model".

Contact our support team at [support@bitfount.com](mailto:support@bitfount.com) if you have any questions.
