Merged
167 commits
11f59ce
add Templatic Augmentation NB to tutorials
ArshaanNazir Jul 20, 2023
1eade28
Merge remote-tracking branch 'origin/main' into gh-pages
ArshaanNazir Jul 20, 2023
f412df9
Merge branch 'main' into gh-pages
ArshaanNazir Aug 3, 2023
1f9e7fb
support for running hf text-generation models
alytarik Aug 16, 2023
b896e43
support for running hf text-generation models
alytarik Aug 17, 2023
6a7d2c2
Merge branch 'feature/text-generation-hf-models' of https://github.co…
alytarik Aug 17, 2023
923fdca
bugfix
alytarik Aug 17, 2023
574f929
updated docs and Tutorials
Prikshit7766 Aug 18, 2023
ef31c2c
Merge branch 'main' into gh-pages
ArshaanNazir Aug 18, 2023
4d2884a
Merge pull request #718 from JohnSnowLabs/website-update
ArshaanNazir Aug 18, 2023
cd28282
Merge branch 'release/1.3.0' into feature/text-generation-hf-models
alytarik Aug 21, 2023
8f9ff0c
remove hardcoded value
alytarik Aug 21, 2023
48fb975
fix for summarization
alytarik Aug 22, 2023
1444f93
fix for summarization
alytarik Aug 22, 2023
a5b8f86
fix for summarization
alytarik Aug 22, 2023
327df94
Merge branch 'feature/text-generation-hf-models' of https://github.co…
alytarik Aug 22, 2023
57762cc
add notebook
alytarik Aug 23, 2023
d1da16c
rename notebooks
alytarik Aug 23, 2023
ef5e36a
add new notebook to website
alytarik Aug 23, 2023
78f1a5c
add hf llm tests
alytarik Aug 23, 2023
e95dbfb
fix urls
alytarik Aug 24, 2023
1e46516
fix urls
alytarik Aug 24, 2023
bc5adc5
Merge branch 'gh-pages' into docs/fix-notebook-urls
alytarik Aug 24, 2023
c815e14
task(dataset/Bigbench): Added AbstractUnderstanding and DisambiguationQA
RakshitKhajuria Aug 24, 2023
b97e9cb
task(dataset): Added LogiQA
RakshitKhajuria Aug 24, 2023
d6d9d5a
dataset: added casual-judgement-test and disfl-qa-test
Prikshit7766 Aug 24, 2023
fd39a49
dataset: added asdiv-test
Prikshit7766 Aug 24, 2023
76c6645
datasource.py: added dataset path
Prikshit7766 Aug 24, 2023
1f5faa5
helpers.py : prompt added (asdiv, causaljudgment, disflqa)
Prikshit7766 Aug 24, 2023
ac3647c
task(datasource.py): Dataset added to path
RakshitKhajuria Aug 24, 2023
adae708
task(helpers.py): Prompts added
RakshitKhajuria Aug 24, 2023
470362d
fix: path
RakshitKhajuria Aug 24, 2023
6f3751b
resolved: augmentation issue
chakravarthik27 Aug 25, 2023
5bd6128
task(restructure):BBQ data
RakshitKhajuria Aug 25, 2023
c23c216
dataset: added BBQ-test-tiny
Prikshit7766 Aug 25, 2023
4a5aac3
Merge pull request #723 from JohnSnowLabs/docs/fix-notebook-urls
ArshaanNazir Aug 25, 2023
0df8977
notebook: BBQ_dataset.ipynb
Prikshit7766 Aug 25, 2023
12933dd
updated Healthcare_NER_Model_Evaluation_with_LangTest.ipynb
Prikshit7766 Aug 25, 2023
b14c843
transform\__init__.py: Removing samples with no transformation
Prikshit7766 Aug 26, 2023
ce7220e
Add Log for Removing Untransformed Samples
Prikshit7766 Aug 26, 2023
bbc802d
updated the warning and logging level
Prikshit7766 Aug 26, 2023
530b495
test(test_harness.py): Added test
Prikshit7766 Aug 26, 2023
a054942
Merge pull request #711 from JohnSnowLabs/feature/text-generation-hf-…
ArshaanNazir Aug 27, 2023
02affe9
fix gastroenterology data
ArshaanNazir Aug 28, 2023
a8c2fcd
update clinical prompt
ArshaanNazir Aug 28, 2023
9680ee7
Task(dataset): Added ASDiv tiny version
RakshitKhajuria Aug 28, 2023
1b07925
Task(dataset): Added casual-judgement and disfl-qa tiny version
Prikshit7766 Aug 28, 2023
e6965e7
Task(dataset): Added DisambiguationQA tiny version
RakshitKhajuria Aug 28, 2023
9fafe1a
fix: path asdiv
RakshitKhajuria Aug 28, 2023
c83f917
Rename ASDiv-test-tiny.jsonl to asdiv-test-tiny.jsonl
Prikshit7766 Aug 28, 2023
354e9ff
Chore(notebook): Added asdiv dataset nb
RakshitKhajuria Aug 28, 2023
f62332a
Chore(notebook): Added LogiQA dataset nb
RakshitKhajuria Aug 28, 2023
9e8d35f
Merge branch 'dataset-lm-evaluation-library' of https://github.com/Jo…
RakshitKhajuria Aug 28, 2023
51887d3
update default config
ArshaanNazir Aug 28, 2023
f734772
rename: abstract narrative understanding
RakshitKhajuria Aug 28, 2023
9e8013b
rename: abstract narrative understanding
RakshitKhajuria Aug 28, 2023
0e9dcb8
added Bigbench_dataset notebooks
Prikshit7766 Aug 28, 2023
f85676e
updated colab links
Prikshit7766 Aug 28, 2023
594a92f
Rename DisflQA and Causal-judgment
Prikshit7766 Aug 28, 2023
75fb13b
rename: abstract narrative understanding
RakshitKhajuria Aug 28, 2023
a48c5a5
setup.py: added dataset path
Prikshit7766 Aug 28, 2023
d8184b0
fixed the white space issue
Prikshit7766 Aug 28, 2023
f804fba
fix linting
ArshaanNazir Aug 29, 2023
5902f6d
add political compass test
alytarik Aug 29, 2023
68ed622
Merge branch 'release/1.4.0' into feature/political-compass-test
alytarik Aug 29, 2023
af07645
add sample for llm answers
alytarik Aug 29, 2023
277a406
add new sample type
alytarik Aug 29, 2023
409d2e3
add political compass test
alytarik Aug 29, 2023
50a4920
add political compass prompt
alytarik Aug 29, 2023
9dc9ae7
add run function to llm answer sample
alytarik Aug 29, 2023
555113a
fix formatting
Prikshit7766 Aug 29, 2023
a06569c
Merge pull request #731 from JohnSnowLabs/fix/clinical_tests
ArshaanNazir Aug 29, 2023
60096ab
augmentation: update export_mode as inplace
Prikshit7766 Aug 29, 2023
bc5b214
update test_augmentation
Prikshit7766 Aug 29, 2023
1134da4
fix swap_entities
ArshaanNazir Aug 30, 2023
57b2e1a
fix linting
ArshaanNazir Aug 30, 2023
5f0e3fb
Merge pull request #724 from JohnSnowLabs/dataset-lm-evaluation-library
ArshaanNazir Aug 31, 2023
e13717e
Merge pull request #725 from JohnSnowLabs/improve-bbq-data
ArshaanNazir Aug 31, 2023
ed44f20
Add blogs
ArshaanNazir Aug 31, 2023
1401feb
update Readme
ArshaanNazir Aug 31, 2023
73658a2
Update blog section
ArshaanNazir Aug 31, 2023
25f6563
fix readme
ArshaanNazir Aug 31, 2023
b09602d
Merge pull request #735 from JohnSnowLabs/chore/add_blogs
ArshaanNazir Aug 31, 2023
58bb60c
fix import in NB
ArshaanNazir Aug 31, 2023
ed38fc9
Merge pull request #726 from JohnSnowLabs/blog-post
ArshaanNazir Aug 31, 2023
3fb022f
add deafault config for disinformation test
Prikshit7766 Aug 31, 2023
740fe88
add DisinformationSample to samples
Prikshit7766 Aug 31, 2023
0c813bc
add prompt for disinformation test
Prikshit7766 Aug 31, 2023
6bb1c00
update datasource
Prikshit7766 Aug 31, 2023
307e564
update modelhandler for disinformation test
Prikshit7766 Aug 31, 2023
fd599ca
langtest.py:added columns names and config path
Prikshit7766 Aug 31, 2023
cd7c8a5
added DisinformationTestFactory
Prikshit7766 Aug 31, 2023
8344aa6
added NarrativeReiteration and NarrativeWedging datasets
Prikshit7766 Aug 31, 2023
564a3cd
add political task
alytarik Aug 31, 2023
2a08e64
add political model handler
alytarik Aug 31, 2023
6f8b7f6
add run method
alytarik Aug 31, 2023
3dc094f
Remove NarrativeReiteration dataset
Prikshit7766 Aug 31, 2023
431502f
update modelhandler
Prikshit7766 Aug 31, 2023
05c90c6
rename: task name
Prikshit7766 Aug 31, 2023
6ca0a0c
Bug: fix errors in testcases
chakravarthik27 Aug 31, 2023
b86680a
Merge pull request #734 from JohnSnowLabs/bug/augmentation-output-dif…
ArshaanNazir Aug 31, 2023
33e8e33
Added disinformation_test notebook
Prikshit7766 Aug 31, 2023
f087212
conflict resolved
Prikshit7766 Aug 31, 2023
bbdc761
Merge pull request #737 from JohnSnowLabs/Feature/disinformation_test
ArshaanNazir Aug 31, 2023
9a72afb
report for political
alytarik Aug 31, 2023
7d4b5a6
scoring for political
alytarik Aug 31, 2023
4f0e0c7
remove commetns
alytarik Aug 31, 2023
f81bc61
add political compass plot
alytarik Aug 31, 2023
b80968a
change political prompt
alytarik Aug 31, 2023
80cc34c
add dependency
alytarik Aug 31, 2023
c3eeef8
Merge branch 'release/1.4.0' into feature/political-compass-test
alytarik Aug 31, 2023
46811f1
added filter_unique_samples function
Prikshit7766 Aug 31, 2023
e1b9627
political config
alytarik Aug 31, 2023
f088369
linting
alytarik Aug 31, 2023
41b65d3
add matplotlib to dependencies
alytarik Aug 31, 2023
0a4b64b
update dependencies
ArshaanNazir Aug 31, 2023
038e822
update lock file
ArshaanNazir Sep 1, 2023
e930f2c
utils.py: added docstring
Prikshit7766 Sep 1, 2023
530d80b
Merge pull request #738 from JohnSnowLabs/feature/political-compass-test
ArshaanNazir Sep 1, 2023
742052a
Merge pull request #732 from JohnSnowLabs/ensure-uniqueness-of-senten…
ArshaanNazir Sep 1, 2023
20d529f
update clinical NB
ArshaanNazir Sep 1, 2023
2938158
fix typo
alytarik Sep 1, 2023
4b261de
fix import PromptTemplate
Prikshit7766 Sep 1, 2023
01ab6fd
add _check_langchain_package
Prikshit7766 Sep 1, 2023
a955472
minor fix
Prikshit7766 Sep 1, 2023
5946433
fix political plot showing incorrect results
alytarik Sep 1, 2023
c94ca38
chore(readme): Added new datasets
RakshitKhajuria Sep 1, 2023
06633b9
updated disinformation config
Prikshit7766 Sep 1, 2023
6e8dac9
chore(readme): updated links
RakshitKhajuria Sep 1, 2023
92bb9b2
rename test_type in DisinformationTestFactory
Prikshit7766 Sep 1, 2023
77c1c47
Disinformation_Test notebook
Prikshit7766 Sep 1, 2023
6d096a1
Merge pull request #742 from JohnSnowLabs/fix/political-plot-fix
ArshaanNazir Sep 1, 2023
9780839
Merge pull request #740 from JohnSnowLabs/fix-import
ArshaanNazir Sep 1, 2023
7a6a7b3
chore(website): Updated new datasets to website
RakshitKhajuria Sep 1, 2023
52b5296
fix typo
Prikshit7766 Sep 1, 2023
d6582ca
chore(website): Updated one_liner.md
Prikshit7766 Sep 1, 2023
cd09857
chore(website): notebook link and updated task.md
Prikshit7766 Sep 1, 2023
9a57595
chore(website): added Disinformation test
Prikshit7766 Sep 1, 2023
5ce3a1c
Merge branch 'chore/website_nb_updates' of https://github.com/JohnSno…
Prikshit7766 Sep 1, 2023
5924f6e
Merge pull request #743 from JohnSnowLabs/rename_disinformation_test_…
ArshaanNazir Sep 2, 2023
beaae20
update blog link
ArshaanNazir Sep 2, 2023
4c4ea79
chore(website): Added new datset nbs to tutorials
RakshitKhajuria Sep 3, 2023
e2664b0
notebook: updated Harness parameters table
Prikshit7766 Sep 3, 2023
31a1f7d
chore(website): updated harness.md
Prikshit7766 Sep 3, 2023
617b060
update torch dependecny
ArshaanNazir Sep 3, 2023
47e2098
chore(website): updated data.md
Prikshit7766 Sep 3, 2023
0398867
pull from gh-pages
Prikshit7766 Sep 3, 2023
cd6fc5e
chore(website): Updated Data page
RakshitKhajuria Sep 3, 2023
6efeaee
notebook: minor fix
Prikshit7766 Sep 3, 2023
9b2e2d6
Merge pull request #739 from JohnSnowLabs/chore/website_nb_updates
ArshaanNazir Sep 4, 2023
3ea53f6
add notebook
alytarik Sep 4, 2023
488b22e
fix pip install in notebook
alytarik Sep 4, 2023
118ee09
add political main page
alytarik Sep 4, 2023
8e28dd3
add political nb to notebooks list
alytarik Sep 4, 2023
912dcc3
add political compass test to website
alytarik Sep 4, 2023
d40ef96
reorder tutorials list
alytarik Sep 4, 2023
1a149af
transform\utils.py: updated filter_unique_samples
Prikshit7766 Sep 4, 2023
95d5ce3
add one liner
alytarik Sep 4, 2023
1fb52d4
transform\__init__.py: updated warning_message
Prikshit7766 Sep 4, 2023
7a4aeac
add datpolitical data source and description
alytarik Sep 4, 2023
2b4e31f
add political task to tasks page
alytarik Sep 4, 2023
c5ef180
Merge pull request #745 from JohnSnowLabs/docs/political-nb-and-website
ArshaanNazir Sep 4, 2023
d489413
website(data.md): minor fix
Prikshit7766 Sep 4, 2023
2d5774c
Update README.md
ArshaanNazir Sep 4, 2023
45bea8a
conflict resolved
Prikshit7766 Sep 4, 2023
74a1e6e
Merge pull request #746 from JohnSnowLabs/updated-filter_unique_samples
ArshaanNazir Sep 4, 2023
06935a8
Merge pull request #747 from JohnSnowLabs/fix/blog-section
ArshaanNazir Sep 4, 2023
21 changes: 21 additions & 0 deletions README.md
@@ -84,10 +84,31 @@ Langtest comes with different datasets to test your models, covering a wide rang
| [**BBQ**](https://arxiv.org/abs/2110.08193) | Evaluate how your model responds to questions in the presence of social biases against protected classes across various social dimensions. Assess biases in model outputs with both under-informative and adequately informative contexts, aiming to promote fair and unbiased question-answering models. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/BBQ_dataset.ipynb) |
|[**XSum**](https://aclanthology.org/D18-1206/) | Evaluate your model's ability to generate concise and informative summaries for long articles with the XSum dataset. It consists of articles and corresponding one-sentence summaries, offering a valuable benchmark for text summarization models. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/XSum_dataset.ipynb)|
|[**Real Toxicity Prompts**](https://aclanthology.org/2020.findings-emnlp.301/) | Evaluate your model's accuracy in recognizing and handling toxic language with the Real Toxicity Prompts dataset. It contains real-world prompts from online platforms, ensuring robustness in NLP models to maintain safe environments. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/OpenAI_QA_Testing_Notebook.ipynb) |
|[**LogiQA**](https://arxiv.org/abs/2007.08124) | Evaluate your model's accuracy on Machine Reading Comprehension with Logical Reasoning questions. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/LogiQA_dataset.ipynb) |
|[**BigBench Abstract Narrative Understanding**](https://arxiv.org/abs/2206.04615) | Evaluate your model's performance in selecting the most relevant proverb for a given narrative. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/Bigbench_dataset.ipynb) |
|[**BigBench Causal Judgment**](https://arxiv.org/abs/2206.04615) | Evaluate your model's performance in measuring the ability to reason about cause and effect. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/Bigbench_dataset.ipynb) |
|[**BigBench DisambiguationQA**](https://arxiv.org/abs/2206.04615) | Evaluate your model's performance on determining the interpretation of sentences containing ambiguous pronoun references. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/Bigbench_dataset.ipynb) |
|[**BigBench DisflQA**](https://arxiv.org/abs/2206.04615) | Evaluate your model's performance in picking the correct answer span from the context given the disfluent question. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/Bigbench_dataset.ipynb) |
|[**ASDiv**](https://arxiv.org/abs/2106.15772) | Evaluate your model's ability to answer questions based on Math Word Problems. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/ASDiv_dataset.ipynb) |

> **Note**
> For usage and documentation, head over to [langtest.org](https://langtest.org/docs/pages/docs/data#question-answering)


## Blog

You can check out the following LangTest articles:

| Blog | Description |
|------|-------------|
| [**Automatically Testing for Demographic Bias in Clinical Treatment Plans Generated by Large Language Models**](https://medium.com/john-snow-labs/automatically-testing-for-demographic-bias-in-clinical-treatment-plans-generated-by-large-language-ffcf358b6092) | Helps in understanding and testing demographic bias in clinical treatment plans generated by LLMs. |
| [**LangTest: Unveiling & Fixing Biases with End-to-End NLP Pipelines**](https://www.johnsnowlabs.com/langtest-unveiling-fixing-biases-with-end-to-end-nlp-pipelines/) | The end-to-end language pipeline in LangTest empowers NLP practitioners to tackle biases in language models with a comprehensive, data-driven, and iterative approach. |
| [**Beyond Accuracy: Robustness Testing of Named Entity Recognition Models with LangTest**](To be Published Soon) | While accuracy is undoubtedly crucial, robustness testing takes natural language processing (NLP) model evaluation to the next level by ensuring that models can perform reliably and consistently across a wide array of real-world conditions. |
| [**Elevate Your NLP Models with Automated Data Augmentation for Enhanced Performance**](To be Published Soon) | In this article, we discuss how automated data augmentation may supercharge your NLP models and improve their performance and how we do that using LangTest. |

> **Note**
> To check out all blogs, head over to [Blogs](https://www.johnsnowlabs.com/responsible-ai-blog/)

## Contributing

We welcome all sorts of contributions:
1 change: 1 addition & 0 deletions conda/meta.yaml
@@ -22,6 +22,7 @@ requirements:
- python >=3.7,<3.12
- numpy
- pandas
- matplotlib
- scikit-learn
- transformers <=4.28.1
- pytorch
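
The pinned ranges in this hunk (`python >=3.7,<3.12`, `transformers <=4.28.1`) can be sanity-checked against a local environment. The following is a minimal sketch using only the standard library; `parse_version`, `satisfies`, and `report` are hypothetical helper names, not part of LangTest or conda, and the comparison is a simplified numeric one that only approximates conda's real version matching.

```python
import operator
import re
import sys
from importlib import metadata

# Comparison operators supported by this sketch (a subset of conda's spec syntax).
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        ">": operator.gt, "<": operator.lt}


def parse_version(text):
    """Turn '4.28.1' into (4, 28, 1); stops at the first non-numeric piece."""
    parts = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(version, spec):
    """Check a version string against a comma-separated spec like '>=3.7,<3.12'."""
    current = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators before one-character ones.
        for symbol in (">=", "<=", "==", ">", "<"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not _OPS[symbol](current, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def report():
    """Print whether the running interpreter and transformers fit the pins."""
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    print("python", python_version, "ok:", satisfies(python_version, ">=3.7,<3.12"))
    try:
        installed = metadata.version("transformers")
        print("transformers", installed, "ok:", satisfies(installed, "<=4.28.1"))
    except metadata.PackageNotFoundError:
        print("transformers is not installed")


if __name__ == "__main__":
    report()
```

Pre-release and build-string handling is deliberately left out, so treat a passing check as a quick hint rather than a substitute for resolving the environment with conda itself.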