diff --git a/.github/workflows/tutorials.yml b/.github/workflows/tutorials.yml index ac2f452a42..b4d62a4c86 100644 --- a/.github/workflows/tutorials.yml +++ b/.github/workflows/tutorials.yml @@ -71,4 +71,4 @@ jobs: token: ${{ secrets.GITHUB_TOKEN }} - name: Run tutorials - run: ./.github/utils/tutorials.sh ${{ env.pythonLocation }} "${{ steps.diff.outputs.added_modified }}" "Tutorial2_ Tutorial9_ Tutorial13_" + run: ./.github/utils/tutorials.sh ${{ env.pythonLocation }} "${{ steps.diff.outputs.added_modified }}" "Tutorial2_ Tutorial9_ Tutorial13_ Tutorial18_" diff --git a/docs/_src/tutorials/tutorials/18.md b/docs/_src/tutorials/tutorials/18.md index 34f97c1d3f..47605b7f21 100644 --- a/docs/_src/tutorials/tutorials/18.md +++ b/docs/_src/tutorials/tutorials/18.md @@ -8,6 +8,8 @@ id: "tutorial18md" ---> # Generative Pseudo Labeling for Domain Adaptation of Dense Retrievals +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial18_GPL.ipynb) + #### Note: Adapted to Haystack from Nils Riemers' original [notebook](https://colab.research.google.com/gist/jamescalam/d2c888775c87f9882bb7c379a96adbc8/gpl-domain-adaptation.ipynb#scrollTo=183ff7ab) The NLP models we use every day were trained on a corpus of data that reflects the world from the past. In the meantime, we've experienced world-changing events, like the COVID pandemics, and we'd like our models to know about them. Training a model from scratch is tedious work but what if we could just update the models with new data? Generative Pseudo Labeling comes to the rescue. @@ -37,6 +39,15 @@ If we search again with the updated model, we get the search results we would ex - 95.14 Polio is transmitted via contaminated water or food - 94.13 HIV is transmitted via sex or sharing needles +### Prepare the Environment + +#### Colab: Enable the GPU runtime +Make sure you enable the GPU runtime to experience decent speed in this tutorial. +**Runtime -> Change Runtime type -> Hardware accelerator -> GPU** + + + + ```python !nvidia-smi diff --git a/tutorials/Tutorial18_GPL.ipynb b/tutorials/Tutorial18_GPL.ipynb index 28fba5adab..9634dca7fb 100644 --- a/tutorials/Tutorial18_GPL.ipynb +++ b/tutorials/Tutorial18_GPL.ipynb @@ -4,6 +4,8 @@ "cell_type": "markdown", "source": [ "# Generative Pseudo Labeling for Domain Adaptation of Dense Retrievals\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial18_GPL.ipynb)\n", + "\n", "#### Note: Adapted to Haystack from Nils Riemers' original [notebook](https://colab.research.google.com/gist/jamescalam/d2c888775c87f9882bb7c379a96adbc8/gpl-domain-adaptation.ipynb#scrollTo=183ff7ab)\n", "\n", "The NLP models we use every day were trained on a corpus of data that reflects the world from the past. In the meantime, we've experienced world-changing events, like the COVID pandemics, and we'd like our models to know about them. Training a model from scratch is tedious work but what if we could just update the models with new data? Generative Pseudo Labeling comes to the rescue.\n", @@ -31,7 +33,15 @@ "- 97.70\tCorona is transmitted via the air\n", "- 96.71\tEbola is transmitted via direct contact with blood\n", "- 95.14\tPolio is transmitted via contaminated water or food\n", - "- 94.13\tHIV is transmitted via sex or sharing needles" + "- 94.13\tHIV is transmitted via sex or sharing needles\n", + "\n", + "### Prepare the Environment\n", + "\n", + "#### Colab: Enable the GPU runtime\n", + "Make sure you enable the GPU runtime to experience decent speed in this tutorial.\n", + "**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**\n", + "\n", + "\n" ], "metadata": { "collapsed": false,