Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/tutorials.yml
Original file line number Diff line number Diff line change
Expand Up @@ -71,4 +71,4 @@ jobs:
token: ${{ secrets.GITHUB_TOKEN }}

- name: Run tutorials
run: ./.github/utils/tutorials.sh ${{ env.pythonLocation }} "${{ steps.diff.outputs.added_modified }}" "Tutorial2_ Tutorial9_ Tutorial13_"
run: ./.github/utils/tutorials.sh ${{ env.pythonLocation }} "${{ steps.diff.outputs.added_modified }}" "Tutorial2_ Tutorial9_ Tutorial13_ Tutorial18_"
11 changes: 11 additions & 0 deletions docs/_src/tutorials/tutorials/18.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@ id: "tutorial18md"
--->

# Generative Pseudo Labeling for Domain Adaptation of Dense Retrievals
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial18_GPL.ipynb)

#### Note: Adapted to Haystack from Nils Riemers' original [notebook](https://colab.research.google.com/gist/jamescalam/d2c888775c87f9882bb7c379a96adbc8/gpl-domain-adaptation.ipynb#scrollTo=183ff7ab)

The NLP models we use every day were trained on a corpus of data that reflects the world from the past. In the meantime, we've experienced world-changing events, like the COVID pandemics, and we'd like our models to know about them. Training a model from scratch is tedious work but what if we could just update the models with new data? Generative Pseudo Labeling comes to the rescue.
Expand Down Expand Up @@ -37,6 +39,15 @@ If we search again with the updated model, we get the search results we would ex
- 95.14 Polio is transmitted via contaminated water or food
- 94.13 HIV is transmitted via sex or sharing needles

### Prepare the Environment

#### Colab: Enable the GPU runtime
Make sure you enable the GPU runtime to experience decent speed in this tutorial.
**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**

<img src="https://raw.githubusercontent.com/deepset-ai/haystack/master/docs/img/colab_gpu_runtime.jpg">



```python
!nvidia-smi
Expand Down
12 changes: 11 additions & 1 deletion tutorials/Tutorial18_GPL.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@
"cell_type": "markdown",
"source": [
"# Generative Pseudo Labeling for Domain Adaptation of Dense Retrievals\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial18_GPL.ipynb)\n",
"\n",
"#### Note: Adapted to Haystack from Nils Riemers' original [notebook](https://colab.research.google.com/gist/jamescalam/d2c888775c87f9882bb7c379a96adbc8/gpl-domain-adaptation.ipynb#scrollTo=183ff7ab)\n",
"\n",
"The NLP models we use every day were trained on a corpus of data that reflects the world from the past. In the meantime, we've experienced world-changing events, like the COVID pandemics, and we'd like our models to know about them. Training a model from scratch is tedious work but what if we could just update the models with new data? Generative Pseudo Labeling comes to the rescue.\n",
Expand Down Expand Up @@ -31,7 +33,15 @@
"- 97.70\tCorona is transmitted via the air\n",
"- 96.71\tEbola is transmitted via direct contact with blood\n",
"- 95.14\tPolio is transmitted via contaminated water or food\n",
"- 94.13\tHIV is transmitted via sex or sharing needles"
"- 94.13\tHIV is transmitted via sex or sharing needles\n",
"\n",
"### Prepare the Environment\n",
"\n",
"#### Colab: Enable the GPU runtime\n",
"Make sure you enable the GPU runtime to experience decent speed in this tutorial.\n",
"**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/deepset-ai/haystack/master/docs/img/colab_gpu_runtime.jpg\">\n"
],
"metadata": {
"collapsed": false,
Expand Down