Merged
27 commits
11f59ce
add Templatic Augmentation NB to tutorials
ArshaanNazir Jul 20, 2023
1eade28
Merge remote-tracking branch 'origin/main' into gh-pages
ArshaanNazir Jul 20, 2023
f412df9
Merge branch 'main' into gh-pages
ArshaanNazir Aug 3, 2023
574f929
updated docs and Tutorials
Prikshit7766 Aug 18, 2023
ef31c2c
Merge branch 'main' into gh-pages
ArshaanNazir Aug 18, 2023
4d2884a
Merge pull request #718 from JohnSnowLabs/website-update
ArshaanNazir Aug 18, 2023
e95dbfb
fix urls
alytarik Aug 24, 2023
bc5adc5
Merge branch 'gh-pages' into docs/fix-notebook-urls
alytarik Aug 24, 2023
4a5aac3
Merge pull request #723 from JohnSnowLabs/docs/fix-notebook-urls
ArshaanNazir Aug 25, 2023
20d529f
update clinical NB
ArshaanNazir Sep 1, 2023
2938158
fix typo
alytarik Sep 1, 2023
c94ca38
chore(readme): Added new datasets
RakshitKhajuria Sep 1, 2023
6e8dac9
chore(readme): updated links
RakshitKhajuria Sep 1, 2023
7a6a7b3
chore(website): Updated new datasets to website
RakshitKhajuria Sep 1, 2023
d6582ca
chore(website): Updated one_liner.md
Prikshit7766 Sep 1, 2023
cd09857
chore(website): notebook link and updated task.md
Prikshit7766 Sep 1, 2023
9a57595
chore(website): added Disinformation test
Prikshit7766 Sep 1, 2023
5ce3a1c
Merge branch 'chore/website_nb_updates' of https://github.com/JohnSno…
Prikshit7766 Sep 1, 2023
beaae20
update blog link
ArshaanNazir Sep 2, 2023
4c4ea79
chore(website): Added new datset nbs to tutorials
RakshitKhajuria Sep 3, 2023
e2664b0
notebook: updated Harness parameters table
Prikshit7766 Sep 3, 2023
31a1f7d
chore(website): updated harness.md
Prikshit7766 Sep 3, 2023
617b060
update torch dependecny
ArshaanNazir Sep 3, 2023
47e2098
chore(website): updated data.md
Prikshit7766 Sep 3, 2023
0398867
pull from gh-pages
Prikshit7766 Sep 3, 2023
cd6fc5e
chore(website): Updated Data page
RakshitKhajuria Sep 3, 2023
6efeaee
notebook: minor fix
Prikshit7766 Sep 3, 2023
8 changes: 7 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,12 @@ Langtest comes with different datasets to test your models, covering a wide rang
| [**BBQ**](https://arxiv.org/abs/2110.08193) | Evaluate how your model responds to questions in the presence of social biases against protected classes across various social dimensions. Assess biases in model outputs with both under-informative and adequately informative contexts, aiming to promote fair and unbiased question-answering models. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/BBQ_dataset.ipynb) |
|[**XSum**](https://aclanthology.org/D18-1206/) | Evaluate your model's ability to generate concise and informative summaries for long articles with the XSum dataset. It consists of articles and corresponding one-sentence summaries, offering a valuable benchmark for text summarization models. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/XSum_dataset.ipynb)|
|[**Real Toxicity Prompts**](https://aclanthology.org/2020.findings-emnlp.301/) | Evaluate your model's accuracy in recognizing and handling toxic language with the Real Toxicity Prompts dataset. It contains real-world prompts from online platforms, ensuring robustness in NLP models to maintain safe environments. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/OpenAI_QA_Testing_Notebook.ipynb)
|[**LogiQA**](https://aclanthology.org/2020.findings-emnlp.301/) | Evaluate your model's accuracy on Machine Reading Comprehension with Logical Reasoning questions. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/LogiQA_dataset.ipynb)
|[**BigBench Abstract narrative understanding**](https://arxiv.org/abs/2206.04615) | Evaluate your model's performance in selecting the most relevant proverb for a given narrative. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/Bigbench_dataset.ipynb)
|[**BigBench Causal Judgment**](https://arxiv.org/abs/2206.04615) | Evaluate your model's ability to reason about cause and effect. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/Bigbench_dataset.ipynb)
|[**BigBench DisambiguationQA**](https://arxiv.org/abs/2206.04615) | Evaluate your model's performance in determining the interpretation of sentences containing ambiguous pronoun references. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/Bigbench_dataset.ipynb)
|[**BigBench DisflQA**](https://arxiv.org/abs/2206.04615) | Evaluate your model's performance in picking the correct answer span from the context given the disfluent question. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/Bigbench_dataset.ipynb)
|[**ASDiv**](https://arxiv.org/abs/2106.15772) | Evaluate your model's ability to answer questions based on Math Word Problems. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/ASDiv_dataset.ipynb)

> **Note**
> For usage and documentation, head over to [langtest.org](https://langtest.org/docs/pages/docs/data#question-answering)
Expand All @@ -95,7 +101,7 @@ You can check out the following langtest blogs:

| Blog | Description |
|------|-------------|
| [**Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models**](https://medium.com/p/ffcf358b6092/edit) | Helps in understanding and testing demographic bias in clinical treatment plans generated by LLM. |
| [**Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models**](https://medium.com/john-snow-labs/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-ffcf358b6092) | Helps in understanding and testing demographic bias in clinical treatment plans generated by LLM. |
| [**LangTest: Unveiling & Fixing Biases with End-to-End NLP Pipelines**](https://www.johnsnowlabs.com/langtest-unveiling-fixing-biases-with-end-to-end-nlp-pipelines/) | The end-to-end language pipeline in LangTest empowers NLP practitioners to tackle biases in language models with a comprehensive, data-driven, and iterative approach. |
| [**Beyond Accuracy: Robustness Testing of Named Entity Recognition Models with LangTest**](https://medium.com/@prikshit7766/fb046ace7eb9) | While accuracy is undoubtedly crucial, robustness testing takes natural language processing (NLP) models evaluation to the next level by ensuring that models can perform reliably and consistently across a wide array of real-world conditions. |

Expand Up @@ -1313,11 +1313,11 @@
"\n",
"\n",
"\n",
"| Parameter | Description |\n",
"| ------------- | ----------- |\n",
"| **task** | Task for which the model is to be evaluated (text-classification or ner) |\n",
"| **model** | Specifies the model(s) to be evaluated. Can be a dictionary or a list of dictionaries. Each dictionary should contain 'model' and 'hub' keys. If a path is specified, the dictionary must contain 'model' and 'hub' keys. |\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li></ul> |\n",
"| Parameter | Description | \n",
"| - | - | \n",
"|**task** |Task for which the model is to be evaluated (text-classification or ner)|\n",
"| **model** | Specifies the model(s) to be evaluated. This parameter can be provided as either a dictionary or a list of dictionaries. Each dictionary should contain the following keys: <ul><li>model (mandatory): \tPipelineModel or path to a saved model or pretrained pipeline/model from hub.</li><li>hub (mandatory): Hub (library) to use in back-end for loading model from public models hub or from path</li></ul>|\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li><li>source (optional): Set to 'huggingface' when loading Hugging Face dataset.</li></ul> |\n",
"| **config** | Configuration for the tests to be performed, specified in the form of a YAML file. |\n",
"\n",
"\n",
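The updated **model** row says the parameter may be a single dictionary or a list of dictionaries, each with mandatory `model` and `hub` keys. A minimal sketch of that shape check follows; `normalize_model_param` is a hypothetical helper written for illustration and is not part of langtest, and the model names are placeholders.

```python
from typing import Union

# Keys the parameter table marks as mandatory for every model dictionary.
REQUIRED_MODEL_KEYS = ("model", "hub")


def normalize_model_param(model: Union[dict, list]) -> list:
    """Hypothetical helper (not a langtest function).

    Returns a list of model dictionaries and raises ValueError
    when an entry lacks one of the mandatory keys.
    """
    # Wrap a single dictionary so both accepted shapes are handled alike.
    entries = [model] if isinstance(model, dict) else list(model)
    for entry in entries:
        missing = [key for key in REQUIRED_MODEL_KEYS if key not in entry]
        if missing:
            raise ValueError(f"model entry {entry!r} is missing keys: {missing}")
    return entries


# One model, given as a dictionary (placeholder names).
single = normalize_model_param({"model": "my-ner-model", "hub": "huggingface"})

# Several models, given as a list of dictionaries (placeholder names).
several = normalize_model_param([
    {"model": "my-ner-model", "hub": "huggingface"},
    {"model": "my-other-model", "hub": "spacy"},
])
```

Normalizing to a list first keeps the validation loop identical for both shapes the table allows.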
Expand Up @@ -112,11 +112,11 @@
"\n",
"\n",
"\n",
"| Parameter | Description |\n",
"| ------------- | ----------- |\n",
"| **task** | Task for which the model is to be evaluated (text-classification or ner) |\n",
"| **model** | Specifies the model(s) to be evaluated. Can be a dictionary or a list of dictionaries. Each dictionary should contain 'model' and 'hub' keys. If a path is specified, the dictionary must contain 'model' and 'hub' keys. |\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li></ul> |\n",
"| Parameter | Description | \n",
"| - | - | \n",
"|**task** |Task for which the model is to be evaluated (text-classification or ner)|\n",
"| **model** | Specifies the model(s) to be evaluated. This parameter can be provided as either a dictionary or a list of dictionaries. Each dictionary should contain the following keys: <ul><li>model (mandatory): \tPipelineModel or path to a saved model or pretrained pipeline/model from hub.</li><li>hub (mandatory): Hub (library) to use in back-end for loading model from public models hub or from path</li></ul>|\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li><li>source (optional): Set to 'huggingface' when loading Hugging Face dataset.</li></ul> |\n",
"| **config** | Configuration for the tests to be performed, specified in the form of a YAML file. |\n",
"\n",
"\n",
Expand Up @@ -110,11 +110,11 @@
"\n",
"\n",
"\n",
"| Parameter | Description |\n",
"| ------------- | ----------- |\n",
"| **task** | Task for which the model is to be evaluated (text-classification or ner) |\n",
"| **model** | Specifies the model(s) to be evaluated. Can be a dictionary or a list of dictionaries. Each dictionary should contain 'model' and 'hub' keys. If a path is specified, the dictionary must contain 'model' and 'hub' keys. |\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li></ul> |\n",
"| Parameter | Description | \n",
"| - | - | \n",
"|**task** |Task for which the model is to be evaluated (text-classification or ner)|\n",
"| **model** | Specifies the model(s) to be evaluated. This parameter can be provided as either a dictionary or a list of dictionaries. Each dictionary should contain the following keys: <ul><li>model (mandatory): \tPipelineModel or path to a saved model or pretrained pipeline/model from hub.</li><li>hub (mandatory): Hub (library) to use in back-end for loading model from public models hub or from path</li></ul>|\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li><li>source (optional): Set to 'huggingface' when loading Hugging Face dataset.</li></ul> |\n",
"| **config** | Configuration for the tests to be performed, specified in the form of a YAML file. |\n",
"\n",
"\n",
Expand Up @@ -110,11 +110,11 @@
"\n",
"\n",
"\n",
"| Parameter | Description |\n",
"| ------------- | ----------- |\n",
"| **task** | Task for which the model is to be evaluated (text-classification or ner) |\n",
"| **model** | Specifies the model(s) to be evaluated. Can be a dictionary or a list of dictionaries. Each dictionary should contain 'model' and 'hub' keys. If a path is specified, the dictionary must contain 'model' and 'hub' keys. |\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li></ul> |\n",
"| Parameter | Description | \n",
"| - | - | \n",
"|**task** |Task for which the model is to be evaluated (text-classification or ner)|\n",
"| **model** | Specifies the model(s) to be evaluated. This parameter can be provided as either a dictionary or a list of dictionaries. Each dictionary should contain the following keys: <ul><li>model (mandatory): \tPipelineModel or path to a saved model or pretrained pipeline/model from hub.</li><li>hub (mandatory): Hub (library) to use in back-end for loading model from public models hub or from path</li></ul>|\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li><li>source (optional): Set to 'huggingface' when loading Hugging Face dataset.</li></ul> |\n",
"| **config** | Configuration for the tests to be performed, specified in the form of a YAML file. |\n",
"\n",
"\n",
Expand Up @@ -90,11 +90,11 @@
"\n",
"\n",
"\n",
"| Parameter | Description |\n",
"| ------------- | ----------- |\n",
"| **task** | Task for which the model is to be evaluated (text-classification or ner) |\n",
"| **model** | Specifies the model(s) to be evaluated. Can be a dictionary or a list of dictionaries. Each dictionary should contain 'model' and 'hub' keys. If a path is specified, the dictionary must contain 'model' and 'hub' keys. |\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li></ul> |\n",
"| Parameter | Description | \n",
"| - | - | \n",
"|**task** |Task for which the model is to be evaluated (text-classification or ner)|\n",
"| **model** | Specifies the model(s) to be evaluated. This parameter can be provided as either a dictionary or a list of dictionaries. Each dictionary should contain the following keys: <ul><li>model (mandatory): \tPipelineModel or path to a saved model or pretrained pipeline/model from hub.</li><li>hub (mandatory): Hub (library) to use in back-end for loading model from public models hub or from path</li></ul>|\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li><li>source (optional): Set to 'huggingface' when loading Hugging Face dataset.</li></ul> |\n",
"| **config** | Configuration for the tests to be performed, specified in the form of a YAML file. |\n",
"\n",
"\n",
Expand Up @@ -98,8 +98,8 @@
"| Parameter | Description | \n",
"| - | - | \n",
"|**task** |Task for which the model is to be evaluated (question-answering or summarization)|\n",
"| **model** | Specifies the model(s) to be evaluated. Can be a dictionary or a list of dictionaries. Each dictionary should contain 'model' and 'hub' keys. If a path is specified, the dictionary must contain 'model' and 'hub' keys.|\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li></ul> |\n",
"| **model** | Specifies the model(s) to be evaluated. This parameter can be provided as either a dictionary or a list of dictionaries. Each dictionary should contain the following keys: <ul><li>model (mandatory): \tPipelineModel or path to a saved model or pretrained pipeline/model from hub.</li><li>hub (mandatory): Hub (library) to use in back-end for loading model from public models hub or from path</li></ul>|\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li><li>source (optional): Set to 'huggingface' when loading Hugging Face dataset.</li></ul> |\n",
"| **config** | Configuration for the tests to be performed, specified in the form of a YAML file. |\n",
"\n",
"<br/>\n",
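The **data** row lists one mandatory key (`data_source`) and several optional ones, including the newly documented `source`. The sketch below checks only the key names taken from that table. `check_data_param` is a hypothetical helper, not a langtest API, and rejecting unlisted keys is an assumption made for illustration; the library itself may accept other keys. The dataset name is a placeholder.

```python
# Key names copied from the `data` row of the parameter table.
MANDATORY_DATA_KEYS = {"data_source"}
OPTIONAL_DATA_KEYS = {"subset", "feature_column", "target_column", "split", "source"}


def check_data_param(data: dict) -> dict:
    """Hypothetical helper (not a langtest function): validate key names only."""
    keys = set(data)
    missing = MANDATORY_DATA_KEYS - keys
    unknown = keys - MANDATORY_DATA_KEYS - OPTIONAL_DATA_KEYS
    if missing:
        raise ValueError(f"missing mandatory keys: {sorted(missing)}")
    if unknown:
        # Assumption for this sketch: keys outside the table are treated as errors.
        raise ValueError(f"unrecognized keys: {sorted(unknown)}")
    return data


# A data dictionary for a Hugging Face dataset, following the table's wording.
hf_data = check_data_param({
    "data_source": "my-dataset",  # placeholder dataset name
    "split": "test",
    "source": "huggingface",
})
```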