Commit a8c75d5: Update tutorials

File tree: 11 files changed, +1544 additions, -0 deletions
Lines changed: 54 additions & 0 deletions
```yaml
# Converts .md files to .ipynb
name: Convert md files to ipynb
on:
  push:
    branches:
      - sync
jobs:
  convert-ipynb:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install jupytext
        run: |
          pip install jupytext

      - uses: webfactory/ssh-agent@v0.9.0
        with:
          ssh-private-key: ${{ secrets.TUTORIALS_SSH_KEY }}

      # We need to set this to ensure that `release-it` can push the tag
      # https://github.com/release-it/release-it/blob/master/docs/ci.md#github-actions
      - name: git config
        run: |
          git config user.name "${GITHUB_ACTOR}"
          git config user.email "${GITHUB_ACTOR}@users.noreply.github.com"

      - name: Convert .md files to .ipynb and push to main branch
        run: |
          # Check out a temporary orphan branch
          git checkout --orphan tmp

          # Convert .md to .ipynb
          jupytext --to ipynb Data\ Science\ Tasks/*.md
          jupytext --to ipynb Connecting\ Data\ \&\ Creating\ Pods/*.md

          # Remove the .md files
          rm -r Data\ Science\ Tasks/*.md
          rm -r Connecting\ Data\ \&\ Creating\ Pods/*.md

          # Add all files to the branch
          git add -A

          # Commit the .ipynb files
          git commit -m 'Bitfount tutorials'

          # Rename the current branch to main
          git branch -m main

          # Push changes to main
          git push -f -u git@github.com:bitfount/tutorials.git main
```
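
The conversion step above relies on jupytext. As a rough, stdlib-only sketch of the idea (not the real jupytext implementation, which also handles metadata, cell pairing, and many other formats), converting a markdown file to a notebook amounts to splitting the text into markdown and code cells and emitting nbformat JSON:

```python
import json


def md_to_ipynb(md_text: str) -> dict:
    """Toy approximation of jupytext's md -> ipynb conversion.

    Fences opened with ```python become code cells; everything else
    becomes markdown cells. Metadata and other cell types are ignored.
    """
    cells = []
    buffer, in_code = [], False

    def flush(kind):
        text = "\n".join(buffer).strip("\n")
        if text:
            cell = {
                "cell_type": kind,
                "metadata": {},
                "source": text.splitlines(keepends=True),
            }
            if kind == "code":
                cell.update(outputs=[], execution_count=None)
            cells.append(cell)
        buffer.clear()

    for line in md_text.splitlines():
        if not in_code and line.startswith("```python"):
            flush("markdown")
            in_code = True
        elif in_code and line.startswith("```"):
            flush("code")
            in_code = False
        else:
            buffer.append(line)
    flush("code" if in_code else "markdown")

    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 5}


notebook = md_to_ipynb("# Running a Pod\n\n```python\npod.start()\n```\n")
print([c["cell_type"] for c in notebook["cells"]])  # ['markdown', 'code']
print(json.dumps(notebook["cells"][1]["source"]))   # ["pod.start()"]
```

The real workflow simply shells out to `jupytext --to ipynb`, which does all of this (and much more) per file.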

.gitignore

Lines changed: 129 additions & 0 deletions
```
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
```
Lines changed: 127 additions & 0 deletions
<!--

---
hide_title: true
jupyter:
  jupytext:
    hide_notebook_metadata: true
    root_level_metadata_as_raw_cell: false
    text_representation:
      extension: .md
      format_name: markdown
      format_version: '1.3'
      jupytext_version: 1.16.4
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
sidebar_label: Running a Pod
sidebar_position: 1
slug: /connecting-data-and-creating-pods/running-a-pod
---

-->

# Running a Pod

Welcome to the Bitfount federated learning tutorials! In this sequence of tutorials, you will learn how federated learning works on the Bitfount platform.

This first tutorial introduces the concept of Pods (Processors of Data). A Pod is the component of the Bitfount network that allows models or queries to run on remote data. Pods are co-located with data; they check that users are authorised to perform a given operation and then execute any approved computation.

By the end of this Jupyter notebook, you should know how to run a Pod by interacting with the Bitfount Python API.
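
As a toy illustration of that role (entirely hypothetical code, not the Bitfount API): a Pod sits next to its data, checks that the requester is authorised, and only then runs the computation, returning the result rather than the raw data.

```python
class ToyPod:
    """Minimal illustration of the Pod idea: data stays put and only
    approved computations run against it. Not the real Bitfount API."""

    def __init__(self, data, approved_users):
        self._data = data                    # the data never leaves the pod
        self._approved = set(approved_users)

    def run(self, user, computation):
        # Authorisation check happens before any computation executes.
        if user not in self._approved:
            raise PermissionError(f"{user} is not authorised on this pod")
        # Only the computed result, not the raw data, is returned.
        return computation(self._data)


pod = ToyPod(data=[40_000, 52_000, 61_000], approved_users={"alice"})
print(pod.run("alice", lambda rows: sum(rows) / len(rows)))  # 51000.0
```

A real Pod adds messaging, identity verification, and task orchestration on top of this basic "check, then compute locally" loop.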

### Prerequisites

```python
!pip install bitfount
```

### Setting up

If you haven't already, create your Bitfount account on the [Bitfount Hub](https://hub.bitfount.com).

If you'd like to run these tutorials locally, clone this repository, activate your virtual environment, install our package with `pip install bitfount`, and open a Jupyter notebook by running `jupyter notebook` in your preferred terminal client.

To run a Pod, we must import the relevant pieces from our [API reference](https://docs.bitfount.com/api/bitfount/federated/pod) for constructing a Pod. While several of these are optional, it is best practice to import them all for flexibility.

```python
import logging

import nest_asyncio

from bitfount import CSVSource, DatasourceContainerConfig, Pod, setup_loggers
from bitfount.runners.config_schemas.pod_schemas import PodDataConfig, PodDetailsConfig

nest_asyncio.apply()  # Needed because Jupyter also has an asyncio loop
```

Let's set up the loggers. These are necessary so that you receive real-time feedback on your task's progress, and error messages if something goes wrong:

```python tags=["logger_setup"]
loggers = setup_loggers([logging.getLogger("bitfount")])
```

### Setting up the Pod

To set up a Pod, we must specify a config detailing its characteristics. For example:

```python
# Configure a pod using the census income data.

# First, let's set up the pod details configuration for the datasource.
datasource_details = PodDetailsConfig(
    display_name="Census Income Demo Pod",
    description="This pod contains data from the census income demo set",
)

# Set up the datasource and data configuration.
datasource = CSVSource(
    "https://bitfount-hosted-downloads.s3.eu-west-2.amazonaws.com/bitfount-tutorials/census_income.csv",
    partition_size=1000,
)
data_config = PodDataConfig(
    ignore_cols=["fnlwgt"],
    force_stypes={
        "categorical": [
            "TARGET",
            "workclass",
            "marital-status",
            "occupation",
            "relationship",
            "race",
            "native-country",
            "gender",
            "education",
        ],
    },
    modifiers=None,
    datasource_args={"seed": 100},
)

pod = Pod(
    name="census-income-demo",
    datasources=[
        DatasourceContainerConfig(
            name="census-income-demo-dataset",
            datasource=datasource,
            datasource_details=datasource_details,
            data_config=data_config,
        )
    ],
    # approved_pods is an optional attribute that we will use later in the
    # "Training a Model on Two Pods" tutorial.
    approved_pods=["census-income-yaml-demo-dataset"],
)
```

Notice how we specified which dataset to connect to using the `DatasourceContainerConfig`, and how to read the dataset by including the details in `data_config`. [PodDataConfig](https://docs.bitfount.com/api/bitfount/runners/config_schemas#poddataconfig) has several parameters, many of them optional, so be sure to check what will work best for your dataset configuration.
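
To make the `ignore_cols`/`force_stypes` idea concrete, here is a loose, stdlib-only illustration of how such a config could be applied while reading a CSV. This is only a sketch: the sample rows are invented, and Bitfount's actual column handling may differ.

```python
import csv
import io

# A tiny stand-in for the census income CSV (rows invented for illustration).
raw = """age,workclass,fnlwgt,TARGET
39,Private,77516,<=50K
50,Self-emp,83311,<=50K
"""

ignore_cols = ["fnlwgt"]               # dropped entirely, as in the config above
categorical = ["workclass", "TARGET"]  # kept as string categories

rows = []
for record in csv.DictReader(io.StringIO(raw)):
    row = {}
    for col, value in record.items():
        if col in ignore_cols:
            continue  # ignored columns never reach the processed data
        # Categorical columns stay as strings; everything else is numeric.
        row[col] = value if col in categorical else float(value)
    rows.append(row)

print(rows[0])  # {'age': 39.0, 'workclass': 'Private', 'TARGET': '<=50K'}
```

The real `PodDataConfig` expresses the same intent declaratively, leaving the parsing to the datasource.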

### Running the Pod

That's the setup done; let's run the Pod. You'll notice that the notebook cell doesn't complete: the Pod is designed to run until it is interrupted, and it must stay running in order to be accessed. If you plan to continue to the next tutorials, keep the kernel running!

```python
pod.start()
```
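
Because `pod.start()` blocks by design, the usual approach in these tutorials is simply to leave the cell running. If you ever need a kernel (or script) to stay responsive alongside a long-running call, one generic pattern is a daemon thread with a stop signal. The sketch below is not Bitfount-specific; it uses a stand-in blocking function rather than a real Pod:

```python
import threading
import time


def blocking_service(stop_event: threading.Event) -> None:
    """Stand-in for a long-running blocking call such as a service loop."""
    while not stop_event.is_set():
        time.sleep(0.05)  # the real service would be doing work here


stop = threading.Event()
worker = threading.Thread(target=blocking_service, args=(stop,), daemon=True)
worker.start()

# The main thread stays responsive while the service runs in the background...
print("service running:", worker.is_alive())

# ...and can shut the service down cleanly when finished.
stop.set()
worker.join(timeout=1)
print("service stopped:", not worker.is_alive())
```

Whether a real Pod is safe to drive from a background thread depends on its own event loop and signal handling, so treat this purely as a general threading pattern.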

You should now be able to see your Pod registered on the [Bitfount Hub Datasets page](https://am.hub.bitfount.com/datasets). If you'd like to learn an alternative way to run a Pod by pointing to a YAML configuration file, go to "Running a Pod Using YAML". If you'd like to skip to training a model or running a SQL query on a Pod, open "Querying and Training a Model".

Contact our support team at [support@bitfount.com](mailto:support@bitfount.com) if you have any questions.
