Merged
31 commits
5a57bda
Updated translation notebook
RakshitKhajuria Aug 17, 2023
6f25690
Updated accuracy notebook
RakshitKhajuria Aug 17, 2023
fc2868e
Updated custom data notebook
RakshitKhajuria Aug 17, 2023
134bce3
Updated bias demo notebook
RakshitKhajuria Aug 17, 2023
79a9f06
Updated fairness demo notebook
RakshitKhajuria Aug 17, 2023
b097d7d
Updated representation demo notebook
RakshitKhajuria Aug 17, 2023
65427e3
Updated robustness demo notebook
RakshitKhajuria Aug 17, 2023
38e37b0
Updated markdown in test specific notebook
RakshitKhajuria Aug 17, 2023
54179f6
Updated ai21 demo notebook
RakshitKhajuria Aug 17, 2023
6c3c4d7
Updated LLM notebooks
RakshitKhajuria Aug 17, 2023
6cc335a
updated misc notebooks
Prikshit7766 Aug 17, 2023
c1d7c95
Merge branch 'notebook-website-update-harness' of https://github.com/…
Prikshit7766 Aug 17, 2023
d47eb51
updated dataset notebooks
Prikshit7766 Aug 17, 2023
fed7343
updated end-to-end-notebooks notebooks
Prikshit7766 Aug 17, 2023
b262596
Update compare model tutorial
ArshaanNazir Aug 18, 2023
2e8c5d2
updated parameter in the table
Prikshit7766 Aug 18, 2023
a420524
updated misc and test-specific-notebooks
Prikshit7766 Aug 18, 2023
328e9db
chore: prompt injection notebook
chakravarthik27 Aug 18, 2023
d3cad7a
merged: fix tutorials nb
chakravarthik27 Aug 18, 2023
151e7e1
update one-liner Prompt-Injection
ArshaanNazir Aug 18, 2023
1cb238a
update tasks and tutorial section
ArshaanNazir Aug 18, 2023
43daf02
Add Prompt-Injection Test to Website
ArshaanNazir Aug 18, 2023
22cb04a
Update tutorial links
ArshaanNazir Aug 18, 2023
c4465c0
Update one-liners
ArshaanNazir Aug 18, 2023
a7b775f
Update one-liners page
ArshaanNazir Aug 18, 2023
5967479
update landing page
ArshaanNazir Aug 18, 2023
4a3444e
update prompt-injection attack page
ArshaanNazir Aug 18, 2023
d1ba590
updated dataset-notebooks
Prikshit7766 Aug 18, 2023
167c85c
update model page
Prikshit7766 Aug 18, 2023
c297de7
add task
ArshaanNazir Aug 18, 2023
3d6cd16
Merge branch 'chore/Update_NBs_Website' of https://github.com/JohnSno…
ArshaanNazir Aug 18, 2023
Original file line number Diff line number Diff line change
@@ -111,13 +111,14 @@
"<br/>\n",
"\n",
"\n",
"| Parameter | Description | \n",
"| - | - | \n",
"|**task** |Task for which the model is to be evaluated (text-classification or ner)|\n",
"|**model** |PipelineModel or path to a saved model or pretrained pipeline/model from hub.\n",
"|**data** |Path to the data that is to be used for evaluation. Can be .csv or .conll file in the CoNLL format \n",
"|**config** |Configuration for the tests to be performed, specified in form of a YAML file.\n",
"|**hub** |model hub to load from the path. Required if model param is passed as path.|\n",
"\n",
"| Parameter | Description |\n",
"| ------------- | ----------- |\n",
"| **task** | Task for which the model is to be evaluated (text-classification or ner) |\n",
"| **model** | Specifies the model(s) to be evaluated. Can be a dictionary or a list of dictionaries; each dictionary must contain 'model' and 'hub' keys, including when 'model' is a path to a saved model. |\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li></ul> |\n",
"| **config** | Configuration for the tests to be performed, specified in the form of a YAML file. |\n",
"\n",
"\n",
"<br/>\n",
"<br/>"
@@ -549,7 +550,8 @@
},
"outputs": [],
"source": [
"h = Harness(task=\"ner\", model=\"trained_model\", hub=\"huggingface\", data=\"sample.conll\")"
"\n",
"h = Harness(task=\"ner\", model={\"model\": \"trained_model\", \"hub\": \"huggingface\"}, data={\"data_source\": \"sample.conll\"})"
]
},
{
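The hunks above switch `Harness` from flat `model`/`hub`/`data` keyword arguments to dictionary arguments. As a minimal sketch of the new argument shape (the `validate_*` helpers below are illustrative only, not part of langtest — the library performs its own validation):

```python
# Sketch of the dict-based Harness arguments this PR adopts.
# validate_model / validate_data are hypothetical helpers for illustration.

def validate_model(model):
    """Accept a dict or a list of dicts, each with 'model' and 'hub' keys."""
    entries = model if isinstance(model, list) else [model]
    for entry in entries:
        if not isinstance(entry, dict) or not {"model", "hub"} <= entry.keys():
            raise ValueError("each model entry needs 'model' and 'hub' keys")
    return entries

def validate_data(data):
    """'data_source' is mandatory; the other keys are optional."""
    if "data_source" not in data:
        raise ValueError("'data_source' is mandatory")
    allowed = {"data_source", "subset", "feature_column", "target_column", "split"}
    unknown = set(data) - allowed
    if unknown:
        raise ValueError(f"unknown keys: {unknown}")
    return data

# The argument shapes used throughout these notebooks:
model = {"model": "trained_model", "hub": "huggingface"}
data = {"data_source": "sample.conll"}
validate_model(model)
validate_data(data)
```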
@@ -109,13 +109,14 @@
"<br/>\n",
"\n",
"\n",
"| Parameter | Description |\n",
"| - | - |\n",
"|**task** |Task for which the model is to be evaluated (text-classification or ner)|\n",
"|**model** |PipelineModel or path to a saved model or pretrained pipeline/model from hub.\n",
"|**data** |Path to the data that is to be used for evaluation. Can be .csv or .conll file in the CoNLL format\n",
"|**config** |Configuration for the tests to be performed, specified in form of a YAML file.\n",
"|**hub** |model hub to load from the path. Required if model param is passed as path.|\n",
"\n",
"| Parameter | Description |\n",
"| ------------- | ----------- |\n",
"| **task** | Task for which the model is to be evaluated (text-classification or ner) |\n",
"| **model** | Specifies the model(s) to be evaluated. Can be a dictionary or a list of dictionaries; each dictionary must contain 'model' and 'hub' keys, including when 'model' is a path to a saved model. |\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li></ul> |\n",
"| **config** | Configuration for the tests to be performed, specified in the form of a YAML file. |\n",
"\n",
"\n",
"<br/>\n",
"<br/>"
@@ -409,7 +410,7 @@
},
"outputs": [],
"source": [
"harness = Harness(task=\"ner\", model=ner_model, data=\"sample.conll\")"
"harness = Harness(task=\"ner\", model={\"model\": ner_model, \"hub\": \"johnsnowlabs\"}, data={\"data_source\": \"sample.conll\"})"
]
},
{
@@ -109,13 +109,14 @@
"<br/>\n",
"\n",
"\n",
"| Parameter | Description | \n",
"| - | - | \n",
"|**task** |Task for which the model is to be evaluated (text-classification or ner)|\n",
"|**model** |PipelineModel or path to a saved model or pretrained pipeline/model from hub.\n",
"|**data** |Path to the data that is to be used for evaluation. Can be .csv or .conll file in the CoNLL format \n",
"|**config** |Configuration for the tests to be performed, specified in form of a YAML file.\n",
"|**hub** |model hub to load from the path. Required if model param is passed as path.|\n",
"\n",
"| Parameter | Description |\n",
"| ------------- | ----------- |\n",
"| **task** | Task for which the model is to be evaluated (text-classification or ner) |\n",
"| **model** | Specifies the model(s) to be evaluated. Can be a dictionary or a list of dictionaries; each dictionary must contain 'model' and 'hub' keys, including when 'model' is a path to a saved model. |\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li></ul> |\n",
"| **config** | Configuration for the tests to be performed, specified in the form of a YAML file. |\n",
"\n",
"\n",
"<br/>\n",
"<br/>"
@@ -242,7 +243,7 @@
},
"outputs": [],
"source": [
"harness = Harness(task=\"ner\", model=ner_model, data=\"sample.conll\", hub=\"johnsnowlabs\")"
"harness = Harness(task=\"ner\", model={\"model\": ner_model, \"hub\": \"johnsnowlabs\"}, data={\"data_source\": \"sample.conll\"})"
]
},
{
@@ -1284,7 +1285,7 @@
}
],
"source": [
"harness = Harness.load(\"saved_test_configurations\", model=augmented_ner_model, task=\"ner\")"
"harness = Harness.load(\"saved_test_configurations\",model=augmented_ner_model,task=\"ner\")"
]
},
{
@@ -89,13 +89,14 @@
"<br/>\n",
"\n",
"\n",
"| Parameter | Description | \n",
"| - | - | \n",
"|**task** |Task for which the model is to be evaluated (text-classification or ner)|\n",
"|**model** |PipelineModel or path to a saved model or pretrained pipeline/model from hub.\n",
"|**data** |Path to the data that is to be used for evaluation. Can be .csv or .conll file in the CoNLL format \n",
"|**config** |Configuration for the tests to be performed, specified in form of a YAML file.\n",
"|**hub** |model hub to load from the path. Required if model param is passed as path.|\n",
"\n",
"| Parameter | Description |\n",
"| ------------- | ----------- |\n",
"| **task** | Task for which the model is to be evaluated (text-classification or ner) |\n",
"| **model** | Specifies the model(s) to be evaluated. Can be a dictionary or a list of dictionaries; each dictionary must contain 'model' and 'hub' keys, including when 'model' is a path to a saved model. |\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li></ul> |\n",
"| **config** | Configuration for the tests to be performed, specified in the form of a YAML file. |\n",
"\n",
"\n",
"<br/>\n",
"<br/>"
@@ -416,7 +417,7 @@
},
"outputs": [],
"source": [
"h = Harness(task=\"ner\",model=spacy_model, data=\"/content/sample.conll\")"
"h = Harness(task=\"ner\", model={\"model\": spacy_model, \"hub\": \"spacy\"}, data={\"data_source\": \"/content/sample.conll\"})"
]
},
{
@@ -98,10 +98,9 @@
"| Parameter | Description | \n",
"| - | - | \n",
"|**task** |Task for which the model is to be evaluated (question-answering or summarization)|\n",
"|**model** |LLM model name (ex: text-davinci-002, command-xlarge-nightly etc.)|\n",
"|**data** |Benchmark dataset name (ex: BoolQ-test, XSum-test etc.)|\n",
"|**config** |Configuration for the tests to be performed, specified in form of a YAML file.|\n",
"|**hub** | Name of the hub (ex: openai, azure-openai, ai21, cohere etc.)|\n",
"| **model** | Specifies the model(s) to be evaluated. Can be a dictionary or a list of dictionaries; each dictionary must contain 'model' and 'hub' keys, including when 'model' is a path to a saved model. |\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li></ul> |\n",
"| **config** | Configuration for the tests to be performed, specified in the form of a YAML file. |\n",
"\n",
"<br/>\n",
"<br/>"
@@ -173,7 +172,7 @@
},
"outputs": [],
"source": [
"harness = Harness(task=\"question-answering\", hub=\"ai21\", model=\"j2-jumbo-instruct\", data='BoolQ-test-tiny')"
"harness = Harness(task=\"question-answering\", model={\"model\": \"j2-jumbo-instruct\", \"hub\":\"ai21\"}, data={\"data_source\": 'BoolQ-test-tiny'})"
]
},
{
@@ -1146,7 +1145,7 @@
},
"outputs": [],
"source": [
"harness = Harness(task=\"question-answering\", hub=\"ai21\", model=\"j2-jumbo-instruct\", data='NQ-open-test-tiny')"
"harness = Harness(task=\"question-answering\", model={\"model\": \"j2-jumbo-instruct\", \"hub\": \"ai21\"}, data={\"data_source\": 'NQ-open-test-tiny'})"
]
},
{
@@ -1819,7 +1818,7 @@
"metadata": {},
"outputs": [],
"source": [
"harness = Harness(task=\"summarization\", hub=\"ai21\", model=\"j2-jumbo-instruct\", data='XSum-test-tiny')"
"harness = Harness(task=\"summarization\", model={\"model\": \"j2-jumbo-instruct\", \"hub\": \"ai21\"}, data={\"data_source\": 'XSum-test-tiny'})"
]
},
{
@@ -3236,7 +3235,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
"version": "3.9.13"
}
},
"nbformat": 4,
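The LLM notebooks above apply the same migration: the old flat `hub=` and string `data=` kwargs become nested dicts. A hypothetical helper sketching that mapping (not part of langtest; `upgrade_kwargs` is an illustrative name):

```python
# Illustrative mapping from the pre-PR flat kwargs to the new dict-based form.
# 'upgrade_kwargs' is a hypothetical helper, not a langtest API.

def upgrade_kwargs(task, model, hub, data):
    """Convert Harness(task=..., model=..., hub=..., data=...) kwargs
    to the dict shape used after this PR."""
    return {
        "task": task,
        "model": {"model": model, "hub": hub},
        "data": {"data_source": data},
    }

# Old call: Harness(task="question-answering", hub="ai21",
#                   model="j2-jumbo-instruct", data="BoolQ-test-tiny")
new_kwargs = upgrade_kwargs(
    "question-answering", "j2-jumbo-instruct", "ai21", "BoolQ-test-tiny"
)
```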
@@ -92,10 +92,9 @@
"| Parameter | Description | \n",
"| - | - | \n",
"|**task** |Task for which the model is to be evaluated (question-answering or summarization)|\n",
"|**model** |LLM model name (ex: text-davinci-002, command-xlarge-nightly etc.)|\n",
"|**data** |Benchmark dataset name (ex: BoolQ-test, XSum-test etc.)|\n",
"|**config** |Configuration for the tests to be performed, specified in form of a YAML file.|\n",
"|**hub** | Name of the hub (ex: openai, azure-openai, ai21, cohere etc.)|\n",
"| **model** | Specifies the model(s) to be evaluated. Can be a dictionary or a list of dictionaries; each dictionary must contain 'model' and 'hub' keys, including when 'model' is a path to a saved model. |\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li></ul> |\n",
"| **config** | Configuration for the tests to be performed, specified in the form of a YAML file. |\n",
"\n",
"<br/>\n",
"<br/>"
@@ -173,7 +172,7 @@
},
"outputs": [],
"source": [
"harness = Harness(task=\"question-answering\", hub=\"azure-openai\", model=\"text-davinci-003\", data='BoolQ-test-tiny')"
"harness = Harness(task=\"question-answering\", model={\"model\": \"text-davinci-003\", \"hub\": \"azure-openai\"}, data={\"data_source\": 'BoolQ-test-tiny'})"
]
},
{
@@ -1131,7 +1130,8 @@
},
"outputs": [],
"source": [
"harness = Harness(task=\"question-answering\", hub=\"azure-openai\", model=\"text-davinci-003\", data='NQ-open-test-tiny')"
"harness = Harness(task=\"question-answering\", model={\"model\": \"text-davinci-003\", \"hub\": \"azure-openai\"}, data={\"data_source\": \n",
"'NQ-open-test-tiny'})"
]
},
{
@@ -1806,7 +1806,8 @@
"metadata": {},
"outputs": [],
"source": [
"harness = Harness(task='summarization', model={\"model\": 'text-davinci-003', \"hub\": \"azure-openai\"}, data={\"data_source\": \n",
"harness = Harness(task='summarization',model={\"model\": 'text-davinci-003', \"hub\": \"azure-openai\"}, data={\"data_source\": \n",
"'XSum-test-tiny'})"
]
},
{
11 changes: 5 additions & 6 deletions demo/tutorials/llm_notebooks/Clinical_Tests.ipynb
@@ -100,12 +100,11 @@
"\n",
"\n",
"| Parameter | Description | \n",
"| - | - |\n",
"|**task** |Task for which the model is to be evaluated (ex: clinical-tests)|\n",
"|**model** |LLM model name (ex: text-davinci-003)|\n",
"|**data** |dataset name (ex: Medical-files, Gastroenterology-files, Oromaxillofacial-files)|\n",
"|**config** |Configuration for the tests to be performed, specified in form of a YAML file.|\n",
"|**hub** | Name of the hub (ex: openai, azure-openai, ai21, cohere etc.)|\n",
"| - | - | \n",
"|**task** |Task for which the model is to be evaluated (ex: clinical-tests)|\n",
"| **model** | Specifies the model(s) to be evaluated. Can be a dictionary or a list of dictionaries; each dictionary must contain 'model' and 'hub' keys, including when 'model' is a path to a saved model. |\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li></ul> |\n",
"| **config** | Configuration for the tests to be performed, specified in the form of a YAML file. |\n",
"\n",
"<br/>\n",
"<br/>"
@@ -92,10 +92,9 @@
"| Parameter | Description | \n",
"| - | - | \n",
"|**task** |Task for which the model is to be evaluated (question-answering or summarization)|\n",
"|**model** |LLM model name (ex: text-davinci-002, command-xlarge-nightly etc.)|\n",
"|**data** |Benchmark dataset name (ex: BoolQ-test, XSum-test etc.)|\n",
"|**config** |Configuration for the tests to be performed, specified in form of a YAML file.|\n",
"|**hub** | Name of the hub (ex: openai, azure-openai, ai21, cohere etc.)|\n",
"| **model** | Specifies the model(s) to be evaluated. Can be a dictionary or a list of dictionaries; each dictionary must contain 'model' and 'hub' keys, including when 'model' is a path to a saved model. |\n",
"| **data** | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li></ul> |\n",
"| **config** | Configuration for the tests to be performed, specified in the form of a YAML file. |\n",
"\n",
"<br/>\n",
"<br/>"
@@ -176,7 +175,7 @@
},
"outputs": [],
"source": [
"harness = Harness(task=\"question-answering\", hub=\"cohere\", model=\"command-xlarge-nightly\", data='BoolQ-test-tiny')"
"harness = Harness(task=\"question-answering\", model={\"model\": \"command-xlarge-nightly\", \"hub\":\"cohere\"}, data={\"data_source\": 'BoolQ-test-tiny'})"
]
},
{
@@ -577,7 +576,8 @@
},
"outputs": [],
"source": [
"harness = Harness(task=\"question-answering\", hub=\"cohere\", model=\"command-xlarge-nightly\", data='NQ-open-test-tiny')"
"harness = Harness(task=\"question-answering\", model={\"model\": \"command-xlarge-nightly\", \"hub\": \"cohere\"}, data={\"data_source\": \n",
"'NQ-open-test-tiny'})"
]
},
{
@@ -716,7 +716,7 @@
"metadata": {},
"outputs": [],
"source": [
"harness = Harness(task='summarization',hub=\"cohere\", model=\"command-xlarge-nightly\", data='XSum-test-tiny')"
"harness = Harness(task='summarization', model={\"model\": \"command-xlarge-nightly\", \"hub\":\"cohere\"}, data={\"data_source\": 'XSum-test-tiny'})"
]
},
{
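The new parameter tables state that `model` "can be a dictionary or a list of dictionaries", which is what the "Update compare model tutorial" commit relies on for multi-model comparison. A sketch of that argument shape only (Harness itself is not invoked here; model names are taken from the diffs above):

```python
# Sketch: the 'model' parameter as a list of dicts, one entry per model to
# compare, plus a data dict using an optional key. Argument shape only.
models = [
    {"model": "j2-jumbo-instruct", "hub": "ai21"},
    {"model": "command-xlarge-nightly", "hub": "cohere"},
]
data = {"data_source": "XSum-test-tiny", "split": "test"}  # 'split' is optional

# Each entry carries its own hub, so models from different providers
# can be compared in one run.
hubs = [m["hub"] for m in models]
```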