From 48d1ebbda9040aa8f4b8fab41a1c02038647f5fe Mon Sep 17 00:00:00 2001
From: Vladimir Blagojevic
Date: Mon, 27 Jun 2022 10:41:22 +0200
Subject: [PATCH 1/3] GPL tutorial - add GPU header and open in colab button

---
 tutorials/Tutorial18_GPL.ipynb | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tutorials/Tutorial18_GPL.ipynb b/tutorials/Tutorial18_GPL.ipynb
index 28fba5adab..9634dca7fb 100644
--- a/tutorials/Tutorial18_GPL.ipynb
+++ b/tutorials/Tutorial18_GPL.ipynb
@@ -4,6 +4,8 @@
    "cell_type": "markdown",
    "source": [
     "# Generative Pseudo Labeling for Domain Adaptation of Dense Retrievals\n",
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial18_GPL.ipynb)\n",
+    "\n",
     "#### Note: Adapted to Haystack from Nils Riemers' original [notebook](https://colab.research.google.com/gist/jamescalam/d2c888775c87f9882bb7c379a96adbc8/gpl-domain-adaptation.ipynb#scrollTo=183ff7ab)\n",
     "\n",
     "The NLP models we use every day were trained on a corpus of data that reflects the world from the past. In the meantime, we've experienced world-changing events, like the COVID pandemics, and we'd like our models to know about them. Training a model from scratch is tedious work but what if we could just update the models with new data? Generative Pseudo Labeling comes to the rescue.\n",
@@ -31,7 +33,15 @@
     "- 97.70\tCorona is transmitted via the air\n",
     "- 96.71\tEbola is transmitted via direct contact with blood\n",
     "- 95.14\tPolio is transmitted via contaminated water or food\n",
-    "- 94.13\tHIV is transmitted via sex or sharing needles"
+    "- 94.13\tHIV is transmitted via sex or sharing needles\n",
+    "\n",
+    "### Prepare the Environment\n",
+    "\n",
+    "#### Colab: Enable the GPU runtime\n",
+    "Make sure you enable the GPU runtime to experience decent speed in this tutorial.\n",
+    "**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**\n",
+    "\n",
+    "\n"
    ],
    "metadata": {
     "collapsed": false,

From 71e5436a15b427ea388cbbb0a1d7b7f6f77323a4 Mon Sep 17 00:00:00 2001
From: Vladimir Blagojevic
Date: Mon, 4 Jul 2022 11:09:33 +0200
Subject: [PATCH 2/3] Add GPL tutorial to exclusion list

---
 .github/workflows/tutorials.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/tutorials.yml b/.github/workflows/tutorials.yml
index ac2f452a42..b4d62a4c86 100644
--- a/.github/workflows/tutorials.yml
+++ b/.github/workflows/tutorials.yml
@@ -71,4 +71,4 @@ jobs:
           token: ${{ secrets.GITHUB_TOKEN }}
 
       - name: Run tutorials
-        run: ./.github/utils/tutorials.sh ${{ env.pythonLocation }} "${{ steps.diff.outputs.added_modified }}" "Tutorial2_ Tutorial9_ Tutorial13_"
+        run: ./.github/utils/tutorials.sh ${{ env.pythonLocation }} "${{ steps.diff.outputs.added_modified }}" "Tutorial2_ Tutorial9_ Tutorial13_ Tutorial18_"

From 2e3280890b1168af9a4bb8fcec6e7b778940f9fe Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com>
Date: Mon, 4 Jul 2022 09:12:04 +0000
Subject: [PATCH 3/3] Update Documentation & Code Style

---
 docs/_src/tutorials/tutorials/18.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/docs/_src/tutorials/tutorials/18.md b/docs/_src/tutorials/tutorials/18.md
index 34f97c1d3f..47605b7f21 100644
--- a/docs/_src/tutorials/tutorials/18.md
+++ b/docs/_src/tutorials/tutorials/18.md
@@ -8,6 +8,8 @@ id: "tutorial18md"
 --->
 
 # Generative Pseudo Labeling for Domain Adaptation of Dense Retrievals
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial18_GPL.ipynb)
+
 #### Note: Adapted to Haystack from Nils Riemers' original [notebook](https://colab.research.google.com/gist/jamescalam/d2c888775c87f9882bb7c379a96adbc8/gpl-domain-adaptation.ipynb#scrollTo=183ff7ab)
 
 The NLP models we use every day were trained on a corpus of data that reflects the world from the past. In the meantime, we've experienced world-changing events, like the COVID pandemics, and we'd like our models to know about them. Training a model from scratch is tedious work but what if we could just update the models with new data? Generative Pseudo Labeling comes to the rescue.
@@ -37,6 +39,15 @@ If we search again with the updated model, we get the search results we would ex
 - 95.14 Polio is transmitted via contaminated water or food
 - 94.13 HIV is transmitted via sex or sharing needles
 
+### Prepare the Environment
+
+#### Colab: Enable the GPU runtime
+Make sure you enable the GPU runtime to experience decent speed in this tutorial.
+**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**
+
+
+
+
 
 ```python
 !nvidia-smi