Merged

Commits (86)
61b57de
Add NER model handler and update PretrainedModelForNER class
chakravarthik27 Apr 11, 2024
b3d16d7
Add default_llm_chat_prompt to helpers.py and Added NER support for C…
chakravarthik27 Apr 11, 2024
30c20ef
Fix role_extract regex pattern in PretrainedModelForNER class and lin…
chakravarthik27 Apr 11, 2024
a46fa39
Refactor predict method in PretrainedModelForNER class for better per…
chakravarthik27 Apr 11, 2024
85a3b80
Enhancements for user prompt handling for multi-dataset testing in Ha…
chakravarthik27 Apr 12, 2024
c9055cf
Update user prompt handling in Harness for multi-dataset testing
chakravarthik27 Apr 12, 2024
75453d0
Merge pull request #1010 from JohnSnowLabs/Enchaments/implement-the-p…
chakravarthik27 Apr 15, 2024
a32b3d8
Refactor PretrainedModelForNER class for better performance and error…
chakravarthik27 Apr 15, 2024
1c34bd9
Refactor LLMChain and PromptTemplate imports in llm_modelhandler.py
chakravarthik27 Apr 15, 2024
fc20e04
Refactor PretrainedModelForNER class for better performance and error…
chakravarthik27 Apr 15, 2024
01c5473
fix linting issues
chakravarthik27 Apr 15, 2024
82bbf4a
New: created the leaderboard and summary classes in utils to track th…
chakravarthik27 Apr 17, 2024
ab450d8
Add benchmarking functionality to Harness class
chakravarthik27 Apr 17, 2024
6801583
Add benchmarking functionality to Harness class
chakravarthik27 Apr 19, 2024
fc8855a
Refactor sorting logic in Leaderboard class to sort by model and aver…
chakravarthik27 Apr 19, 2024
b6b131c
Refactor benchmarking logic in Harness class and Leaderboard class
chakravarthik27 Apr 19, 2024
d64a6a5
Update data source and target column in TestNERDataset class
chakravarthik27 Apr 22, 2024
c9e011e
benchmarking logic in Harness and Leaderboard classes, and update lea…
chakravarthik27 Apr 23, 2024
a0de4f6
Refactor benchmarking logic in Harness and Leaderboard classes, and u…
chakravarthik27 Apr 23, 2024
214f44b
Refactor dataset_name column logic in Harness class to handle single …
chakravarthik27 Apr 23, 2024
e0f1bfd
Fix comparison operator in SpeedTestSample class
chakravarthik27 Apr 24, 2024
b7769bd
Implement Augmenter class for data augmentation in langtest
chakravarthik27 Apr 24, 2024
b50dcfd
Update data source and target column in TestNERDataset
chakravarthik27 Apr 24, 2024
08d88fd
Refactor Augmenter class to remove unused code and improve performance
chakravarthik27 Apr 24, 2024
042d3d2
updated the nb
chakravarthik27 Apr 25, 2024
74589d3
Refactor Augmenter class to improve code organization and readability
chakravarthik27 Apr 25, 2024
702b2b7
update the nb
chakravarthik27 Apr 27, 2024
60e3f1a
Merge pull request #1015 from JohnSnowLabs/bug_fix/performance_tests
chakravarthik27 Apr 27, 2024
4adf0f4
Merge remote-tracking branch 'origin/release/2.2.0' into feature/data…
chakravarthik27 Apr 27, 2024
8b308d9
Implemented new, inplace and extend functionality style to genereate …
chakravarthik27 Apr 28, 2024
ca7c75f
Merge pull request #1009 from JohnSnowLabs/Enhancements/nertask-suppo…
chakravarthik27 Apr 28, 2024
652009d
Improved Augmenter class to handle task name validation and improve d…
chakravarthik27 Apr 28, 2024
fb35378
Merge remote-tracking branch 'origin/release/2.2.0' into feature/data…
chakravarthik27 Apr 29, 2024
0501298
Add langchain-openai package to pyproject.toml
chakravarthik27 Apr 29, 2024
3d0b1c2
Add langchain-openai package and boto3 to pyproject.toml
chakravarthik27 Apr 29, 2024
71f2183
Basic Implementation of prompt techniques handles from config.
chakravarthik27 May 1, 2024
4160fea
Refactor field order in MessageType and Conversion classes in prompts.py
chakravarthik27 May 1, 2024
38c4749
lint fix
chakravarthik27 May 1, 2024
06c2ef6
Refactor field order in MessageType and Conversion classes in prompts.py
chakravarthik27 May 3, 2024
46acac1
Refactor RandomAge class to improve code readability and maintainability
chakravarthik27 May 3, 2024
6e6216c
fixed lint
chakravarthik27 May 3, 2024
33c9933
Merge pull request #1020 from JohnSnowLabs/fix/bug-in-randomize_age-test
chakravarthik27 May 6, 2024
21ae26b
improved to get prompt based on the style like chat or instruct
chakravarthik27 May 6, 2024
649a5dc
Handle the prompts for mulitple datasets.
chakravarthik27 May 6, 2024
73f16e6
Refactor prompt manager to handle default prompt configuration
chakravarthik27 May 7, 2024
db35333
Integrated with model_handler and prompt manager.
chakravarthik27 May 7, 2024
ca0a4c1
error handling, when `prompt_config` is not available in config
chakravarthik27 May 7, 2024
e1cb02a
improve the prompt handling for instruct models
chakravarthik27 May 8, 2024
cb0c2ed
Refactor prompt manager to handle default prompt configuration
chakravarthik27 May 8, 2024
f5d2c91
improved in the lm studio
chakravarthik27 May 8, 2024
92309a6
Merge pull request #1016 from JohnSnowLabs/feature/data-augmentation-…
chakravarthik27 May 8, 2024
5e536ee
support orderless field in MessageType object
chakravarthik27 May 8, 2024
9cd4e1c
Refactor data handling in Harness class to support multiple datasets
chakravarthik27 May 8, 2024
3b0712c
fix the prompt issues and nb update
chakravarthik27 May 9, 2024
d180a27
Merge pull request #1012 from JohnSnowLabs/Improvements/load-and-save…
chakravarthik27 May 10, 2024
36a4426
refactor: improve test case handling in Harness class
chakravarthik27 May 10, 2024
3396282
Refactor test case handling in Harness class
chakravarthik27 May 10, 2024
3fc0c59
improved the `import_edited_testcases()` functionality. and updated d…
chakravarthik27 May 10, 2024
5d8196d
refactor: remove "robustness" and "bias" categories from categories_c…
chakravarthik27 May 10, 2024
cd10c70
Merge remote-tracking branch 'origin/release/2.2.0' into feature/impl…
chakravarthik27 May 11, 2024
e7e94a2
fix lint and format issue
chakravarthik27 May 11, 2024
89dd5f3
Merge pull request #1022 from JohnSnowLabs/enhancements/improving-the…
chakravarthik27 May 11, 2024
e2d08fc
Merge pull request #1018 from JohnSnowLabs/feature/implement-the-prom…
chakravarthik27 May 11, 2024
760d9a0
Notebook for LLM evaluation in ner task
chakravarthik27 May 12, 2024
876e39c
Data_Augmenter Nb
chakravarthik27 May 12, 2024
4cc11d0
Added the MultiPrompt_MultiDataset NB
chakravarthik27 May 12, 2024
dd9e9bb
chore: Update Langtest_Cli_Eval_Command.ipynb and Benchmarking Report NB
chakravarthik27 May 13, 2024
04d67e4
Refactor Summary class to update summary dataframe and handle file path
chakravarthik27 May 13, 2024
34dfdf9
Refactor `Augmenter` class to `DataAugmenter` for improved code organ…
chakravarthik27 May 13, 2024
c7b29f7
resolved: lint issues.
chakravarthik27 May 13, 2024
f9fcf7e
Merge pull request #1024 from JohnSnowLabs/Improvements/load-and-save…
chakravarthik27 May 13, 2024
e641ee9
Merge pull request #1025 from JohnSnowLabs/feature/data-augmentation-…
chakravarthik27 May 13, 2024
43be5ab
Merge remote-tracking branch 'origin/release/2.2.0' into chore/final_…
chakravarthik27 May 13, 2024
cf34e05
Updated the Augmeter to DataAugmeter
chakravarthik27 May 13, 2024
95f12d3
updated the description in nb
chakravarthik27 May 13, 2024
214ac2f
updated: ner task on llm
chakravarthik27 May 14, 2024
5481109
Refactor leaderboard class to support ranking by different criteria
chakravarthik27 May 14, 2024
ce7a06a
Merge pull request #1027 from JohnSnowLabs/Improvements/load-and-save…
chakravarthik27 May 14, 2024
ac75314
Merge remote-tracking branch 'origin/release/2.2.0' into chore/final_…
chakravarthik27 May 14, 2024
39a0116
Updated: Benchmarking NB
chakravarthik27 May 15, 2024
fc00494
Refactor pagination component to include additional release notes links
chakravarthik27 May 15, 2024
3688c5c
Add Fewshot Model Evaluation and Evaluating NER in LLMs tutorials
chakravarthik27 May 15, 2024
9e84ae9
Merge pull request #1023 from JohnSnowLabs/chore/final_website_updates
chakravarthik27 May 15, 2024
aa76c7f
updated: langtest version in pip
chakravarthik27 May 15, 2024
dd5e083
minor format fix
ArshaanNazir May 15, 2024
c0642ed
Merge pull request #1028 from JohnSnowLabs/chore/final_website_updates
ArshaanNazir May 15, 2024
Files changed
2,426 changes: 2,426 additions & 0 deletions demo/tutorials/benchmarks/Benchmarking_with_Harness.ipynb

19 changes: 2 additions & 17 deletions demo/tutorials/benchmarks/Langtest_Cli_Eval_Command.ipynb
@@ -46,24 +46,9 @@
     "id": "OPPUwGvzyAoV",
     "outputId": "670c68e7-83fe-418c-8e3e-094590f5b7f2"
    },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m19.7/19.7 MB\u001b[0m \u001b[31m73.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
-      "\u001b[?25h\u001b[33mWARNING: langtest 2.1.0rc2 does not provide the extra 'all'\u001b[0m\u001b[33m\n",
-      "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m13.0/13.0 MB\u001b[0m \u001b[31m99.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
-      "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.1/3.1 MB\u001b[0m \u001b[31m105.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
-      "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m345.4/345.4 kB\u001b[0m \u001b[31m41.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
-      "\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
-      "google-colab 1.0.0 requires pandas==1.5.3, but you have pandas 2.2.1 which is incompatible.\u001b[0m\u001b[31m\n",
-      "\u001b[0m"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
-    "!pip install -q langtest[all]==2.1.0rc2"
+    "!pip install -q langtest[all]"
    ]
   },
  {
1,392 changes: 1,392 additions & 0 deletions demo/tutorials/llm_notebooks/Fewshot_QA_Notebook.ipynb

1,017 changes: 1,017 additions & 0 deletions demo/tutorials/llm_notebooks/NER Casual LLM.ipynb

1 change: 1 addition & 0 deletions demo/tutorials/misc/Data_Augmenter_Notebook.ipynb

1 change: 1 addition & 0 deletions demo/tutorials/misc/MultiPrompt_MultiDataset.ipynb

4,198 changes: 4,197 additions & 1 deletion demo/tutorials/misc/PerformanceTest_Notebook.ipynb

21 changes: 12 additions & 9 deletions docs/_includes/docs-langtest-pagination.html
@@ -1,12 +1,15 @@
 <ul class="pagination owl-carousel pagination_big">
+    <li><a href="release_notes_2_1_0">2.1.0</a></li>
     <li><a href="release_notes_2_0_0">2.0.0</a></li>
     <li><a href="release_notes_1_10_0">1.10.0</a></li>
     <li><a href="release_notes_1_9_0">1.9.0</a></li>
     <li><a href="release_notes_1_8_0">1.8.0</a></li>
-    <li><a href="release_notes_1_7_0">1.7.0</a></li>
-    <li><a href="release_notes_1_6_0">1.6.0</a></li>
-    <li><a href="release_notes_1_5_0">1.5.0</a></li>
-    <li><a href="release_notes_1_4_0">1.4.0</a></li>
-    <li><a href="release_notes_1_3_0">1.3.0</a></li>
-    <li><a href="release_notes_1_2_0">1.2.0</a></li>
-    <li><a href="release_notes_1_1_0">1.1.0</a></li>
-    <li><a href="release_notes_1_0_0">1.0.0</a></li>
-</ul>
+    <li><a href="release_notes_1_7_0">1.7.0</a></li>
+    <li><a href="release_notes_1_6_0">1.6.0</a></li>
+    <li><a href="release_notes_1_5_0">1.5.0</a></li>
+    <li><a href="release_notes_1_4_0">1.4.0</a></li>
+    <li><a href="release_notes_1_3_0">1.3.0</a></li>
+    <li><a href="release_notes_1_2_0">1.2.0</a></li>
+    <!-- <li><a href="release_notes_1_1_0">1.1.0</a></li>
+    <li><a href="release_notes_1_0_0">1.0.0</a></li> -->
+</ul>
482 changes: 276 additions & 206 deletions docs/pages/docs/langtest_versions/latest_release.md

327 changes: 327 additions & 0 deletions docs/pages/docs/langtest_versions/release_notes_2_1_0.md
@@ -0,0 +1,327 @@
---
layout: docs
header: true
seotitle: LangTest - Deliver Safe and Effective Language Models | John Snow Labs
title: LangTest Release Notes
permalink: /docs/pages/docs/langtest_versions/release_notes_2_1_0
key: docs-release-notes
modify_date: 2024-04-02
---

<div class="h3-box" markdown="1">

## 2.1.0

## 📢 Highlights

John Snow Labs is thrilled to announce the release of LangTest 2.1.0! This update brings exciting new features and improvements designed to streamline your language model testing workflows and provide deeper insights.

- **🔗 Enhanced API-based LLM Integration:** LangTest now supports testing API-based Large Language Models (LLMs). This allows you to seamlessly integrate diverse LLMs with LangTest and conduct performance evaluations across various datasets.

- **📂 Expanded File Format Support:** LangTest 2.1.0 introduces support for additional file formats, further increasing its flexibility in handling different data structures used in LLM testing.

- **📊 Improved Multi-Dataset Handling:** We've made significant improvements in how LangTest manages multiple datasets. This simplifies workflows and allows for more efficient testing across a wider range of data sources.

- **🖥️ New Benchmarking Commands**: LangTest now boasts a set of new commands specifically designed for benchmarking language models. These commands provide a structured approach to evaluating model performance and comparing results across different models and datasets.

</div><div class="h3-box" markdown="1">

## 🔥 Key Enhancements:

### **🔗 Streamlined Integration and Enhanced Functionality for API-Based Large Language Models:**
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/Generic_API-Based_Model_Testing_Demo.ipynb)

This feature empowers you to seamlessly integrate virtually any language model hosted on an external API platform. Whether you prefer OpenAI, Hugging Face, or even custom vLLM solutions, LangTest now adapts to your workflow. The `input_processor` and `output_parser` functions are not required for OpenAI API-compatible servers; a minimal sketch of that case appears at the end of this subsection.

#### Key Features:

- **Effortless API Integration:** Connect to any API system by specifying the API URL, parameters, and a custom function for parsing the returned results. This intuitive approach allows you to leverage your preferred language models with minimal configuration.

- **Customizable Parameters:** Define the URL, parameters specific to your chosen API, and a parsing function tailored to extract the desired output. This level of control ensures compatibility with diverse API structures.

- **Unparalleled Flexibility:** Generic API Support removes platform limitations. Now, you can seamlessly integrate language models from various sources, including OpenAI, Hugging Face, and even custom vLLM solutions hosted on private platforms.

#### How it Works:

**Parameters:**
Define the `input_processor` function, which builds the request payload, and the `output_parser` function, which extracts the model output from the response.

```python
GOOGLE_API_KEY = "<YOUR API KEY>"
model_url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key={GOOGLE_API_KEY}"

# request headers
headers = {
    "Content-Type": "application/json",
}

# function to create a payload from the input text
def input_processor(content):
    return {
        "contents": [
            {
                "role": "user",
                "parts": [{"text": content}],
            }
        ]
    }

# function to extract the generated text from the model response
def output_parser(response):
    try:
        return response["candidates"][0]["content"]["parts"][0]["text"]
    except (KeyError, IndexError):
        return ""
```
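
For instance, here is how `output_parser` behaves on an illustrative response payload (the payload below is made up, but follows the structure the parser expects):

```python
# Illustrative response payload, shaped like the structure parsed above.
sample_response = {
    "candidates": [
        {"content": {"parts": [{"text": "Paris is the capital of France."}]}}
    ]
}

print(output_parser(sample_response))  # -> Paris is the capital of France.
print(output_parser({}))               # -> "" (missing keys fall back to an empty string)
```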

To take advantage of this feature, use the following setup code:

```python
from langtest import Harness

# Initialize Harness with API parameters
harness = Harness(
    task="question-answering",
    model={
        "model": {
            "url": model_url,
            "headers": headers,
            "input_processor": input_processor,
            "output_parser": output_parser,
        },
        "hub": "web",
    },
    data={
        "data_source": "OpenBookQA",
        "split": "test-tiny",
    },
)

# Generate, run, and get the report
harness.generate().run().report()
```
![image](https://github.com/JohnSnowLabs/langtest/assets/23481244/9754c506-e715-4e2c-8b9d-dfd98f0695e5)
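
For OpenAI API-compatible servers, the request and response already follow the standard chat-completions schema, so `input_processor` and `output_parser` can be omitted. A minimal sketch, assuming a hypothetical local endpoint at `http://localhost:8000`:

```python
from langtest import Harness

# Minimal sketch for an OpenAI API-compatible server: no custom
# input_processor/output_parser is needed. The endpoint URL is hypothetical.
harness = Harness(
    task="question-answering",
    model={
        "model": {
            "url": "http://localhost:8000/v1/chat/completions",
            "headers": {"Content-Type": "application/json"},
        },
        "hub": "web",
    },
    data={"data_source": "OpenBookQA", "split": "test-tiny"},
)
harness.generate().run().report()
```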


### 📂 Streamlined Data Handling and Evaluation

This feature streamlines your testing workflows by enabling LangTest to process a wider range of file formats directly.

#### Key Features:

- **Effortless File Format Handling:** LangTest now seamlessly ingests data from various file formats, including pickles (.pkl) in addition to previously supported formats. Simply provide the data source path in your harness configuration, and LangTest takes care of the rest.

- **Simplified Data Source Management**: LangTest intelligently recognizes the file extension and automatically selects the appropriate processing method. This eliminates the need for manual configuration, saving you time and effort.

- **Enhanced Maintainability**: The underlying code structure is optimized for flexibility. Adding support for new file formats in the future requires minimal effort, ensuring LangTest stays compatible with evolving data storage practices.

#### How it works:

```python
from langtest import Harness

harness = Harness(
    task="question-answering",
    model={
        "model": "http://localhost:1234/v1/chat/completions",
        "hub": "lm-studio",
    },
    data={
        "data_source": "path/to/file.pkl",  # format inferred from the file extension
    },
)

# generate, run and report
harness.generate().run().report()
```
### 📊 Multi-Dataset Handling and Evaluation
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Multiple_dataset.ipynb)

This feature empowers you to efficiently benchmark your language models across a wider range of datasets.

#### Key Features:

- **Effortless Multi-Dataset Testing:** LangTest now seamlessly integrates and executes tests on multiple datasets within a single harness configuration. This streamlined approach eliminates the need for repetitive setups, saving you time and resources.

- **Enhanced Fairness Evaluation**: By testing models across diverse datasets, LangTest helps identify and mitigate potential biases. This ensures your models perform fairly and accurately on a broader spectrum of data, promoting ethical and responsible AI development.

- **Robust Accuracy Assessment:** Multi-dataset support empowers you to conduct more rigorous accuracy testing. By evaluating models on various datasets, you gain a deeper understanding of their strengths and weaknesses across different data distributions. This comprehensive analysis strengthens your confidence in the model's real-world performance.

#### How it works:

Initialize the Harness class:
```python
harness = Harness(
    task="question-answering",
    model={"model": "gpt-3.5-turbo-instruct", "hub": "openai"},
    data=[
        {"data_source": "NQ-open", "split": "test-tiny"},
        {"data_source": "MedQA", "split": "test-tiny"},
        {"data_source": "LogiQA", "split": "test-tiny"},
    ],
)
```
Configure the accuracy tests in the Harness class:
```python
harness.configure(
    {
        "tests": {
            "defaults": {"min_pass_rate": 0.65},
            "accuracy": {
                "llm_eval": {"min_score": 0.60},
                "min_rouge1_score": {"min_score": 0.60},
                "min_rouge2_score": {"min_score": 0.60},
                "min_rougeL_score": {"min_score": 0.60},
                "min_rougeLsum_score": {"min_score": 0.60},
            },
        }
    }
)
```
`harness.generate()` generates the test cases, `.run()` executes them, and `.report()` compiles the results.
```python
harness.generate().run().report()
```
![image](https://github.com/JohnSnowLabs/langtest/assets/23481244/0d48be2f-e5bc-4971-b0a1-2756a10d3f24)

### 🖥️ Streamlined Evaluation Workflows with Enhanced CLI Commands
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/benchmarks/Langtest_Cli_Eval_Command.ipynb)

This release extends LangTest's evaluation capabilities, focusing on report management and leaderboards. These enhancements include:

- **Streamlined Reporting and Tracking:** Save and load detailed evaluation reports directly from the command line using `langtest eval`, enabling efficient performance tracking and comparative analysis over time. Report files can also be reviewed manually in the `~/.langtest` or `./.langtest` folder.

- **Enhanced Leaderboards:** Gain valuable insights with the new `langtest show-leaderboard` command. It displays existing leaderboards, providing a centralized view of ranked model performance across evaluations.

- **Average Model Ranking:** Leaderboards now include the average ranking for each evaluated model. This metric provides a comprehensive view of model performance across various datasets and tests.

#### How it works:

First, create a `parameter.json` or `parameter.yaml` file in the working directory:

**JSON Format**
```json
{
  "task": "question-answering",
  "model": {
    "model": "google/flan-t5-base",
    "hub": "huggingface"
  },
  "data": [
    { "data_source": "MedMCQA" },
    { "data_source": "PubMedQA" },
    { "data_source": "MMLU" },
    { "data_source": "MedQA" }
  ],
  "config": {
    "model_parameters": {
      "max_tokens": 64,
      "device": 0,
      "task": "text2text-generation"
    },
    "tests": {
      "defaults": {
        "min_pass_rate": 0.70
      },
      "robustness": {
        "add_typo": {
          "min_pass_rate": 0.70
        }
      }
    }
  }
}
```
**YAML Format**
```yaml
task: question-answering
model:
  model: google/flan-t5-base
  hub: huggingface
data:
  - data_source: MedMCQA
  - data_source: PubMedQA
  - data_source: MMLU
  - data_source: MedQA
config:
  model_parameters:
    max_tokens: 64
    device: 0
    task: text2text-generation
  tests:
    defaults:
      min_pass_rate: 0.70
    robustness:
      add_typo:
        min_pass_rate: 0.70

```
Then open a terminal (or cmd) on your system and run:
```bash
langtest eval --model <your model name or endpoint> \
              --hub <model hub, e.g. huggingface, lm-studio, web> \
              -c <your configuration file, e.g. parameter.json or parameter.yaml>
```
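For example, with the `parameter.json` above, the call might look like:
```bash
langtest eval --model google/flan-t5-base \
              --hub huggingface \
              -c parameter.json
```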
Finally, you can view the leaderboard and the model's rank.
![image](https://github.com/JohnSnowLabs/langtest/assets/23481244/a405d0c6-5ef1-4efb-924c-0ba8667ebe43)

----

To view the leaderboard at any time, use the CLI command:
```bash
langtest show-leaderboard
```
![image](https://github.com/JohnSnowLabs/langtest/assets/23481244/f357c173-e4b1-4dc8-86ad-98438046b89c)

## 📒 New Notebooks

{:.table2}
| Notebooks | Colab Link |
|--------------------|-------------|
| Generic API-based Model Testing | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/Generic_API-Based_Model_Testing_Demo.ipynb)|
| Multi-Dataset | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Multiple_dataset.ipynb) |
| Langtest Eval Cli Command | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/benchmarks/Langtest_Cli_Eval_Command.ipynb) |


## 🐛 Fixes

- Fixed multi-dataset support for accuracy task [#998]
- Fixed bugs in langtest package [#1003][#1004]


## ⚡ Enhancements
- Improved error handling in the Harness run method [#990]
- Website updates [#1001]
- Updated dependency versions [#992]
- Improved data augmentation for the question-answering task [#991]

## What's Changed

* Feautre/integration with web api by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/986
* Refactor TestFactory class to handle exceptions in async tests by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/990
* data augmentation support for question-answering task by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/991
* Updated dependencies by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/992
* Fix/implement the multiple dataset support for accuracy tests by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/998
* Feature/add support for other file formats by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/993
* Bug Fix: Generated results are none by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/1000
* Feature/implement load & save for benchmark reports by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/999
* Fix/bug fixes langtest 2 1 0 rc1 by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/1003
* website updates by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/1001
* Fix/bug fixes langtest 2 1 0 rc1 by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/1004
* Release/2.0.1 by @chakravarthik27 in https://github.com/JohnSnowLabs/langtest/pull/1005


**Full Changelog**: https://github.com/JohnSnowLabs/langtest/compare/2.0.0...2.1.0

## ⚒️ Previous Versions
</div>
{%- include docs-langtest-pagination.html -%}