Merged
59 commits
f331b69
Added: implemented the breaking sentence by newline in robustness.
chakravarthik27 Sep 14, 2024
f274765
refactor the add_new_lines and while random selection of number of ne…
chakravarthik27 Sep 14, 2024
0414f71
parameter: number_of_lines -> max_lines.
chakravarthik27 Sep 14, 2024
a3986b4
Merge pull request #1109 from JohnSnowLabs/feature/implement-the-addn…
chakravarthik27 Sep 14, 2024
3160b1d
Implemented the add_tabs test in robustness category
chakravarthik27 Sep 14, 2024
8179145
Merge remote-tracking branch 'origin/release/2.4.0' into feature/impl…
chakravarthik27 Sep 14, 2024
c8a9511
implemented: basic structure to handle visualQA
chakravarthik27 Sep 14, 2024
f7b53e6
Refactor VisualQASample class to include additional attributes and do…
chakravarthik27 Sep 14, 2024
6eec7ca
Refactor llm_modelhandler.py to include PretrainedModelForVisualQA class
chakravarthik27 Sep 14, 2024
b95ecf3
Refactor VisualQA class to fix typo in base class name
chakravarthik27 Sep 14, 2024
ca2f9d6
Merge pull request #1110 from JohnSnowLabs/feature/implement-the-addt…
chakravarthik27 Sep 15, 2024
adf18db
Merge remote-tracking branch 'origin/release/2.4.0' into feature/impl…
chakravarthik27 Sep 15, 2024
d3e6fa5
updated: image handling while loading dataset.
chakravarthik27 Sep 15, 2024
3ee5f8f
implemented the different tests under robustness category and support…
chakravarthik27 Sep 15, 2024
3dd6770
Refactor image handling in robustness tests
chakravarthik27 Sep 15, 2024
d95e558
Refactor image handling in robustness tests and add support for multi…
chakravarthik27 Sep 15, 2024
ebd7bfd
Refactor image handling in robustness tests and update VisualQASample…
chakravarthik27 Sep 15, 2024
4538490
Refactor image handling in robustness tests and exclude image-related…
chakravarthik27 Sep 15, 2024
41f0db2
fixed: format issues.
chakravarthik27 Sep 15, 2024
3521927
Refactor image handling in robustness tests and remove commented code
chakravarthik27 Sep 16, 2024
a87e96c
Refactor image handling in robustness tests and update VisualQASample…
chakravarthik27 Sep 16, 2024
04e18e3
- added new tests in image robustness.
chakravarthik27 Sep 16, 2024
8039ef8
Add pillow library to pyproject.toml
chakravarthik27 Sep 16, 2024
febf855
Update transformers version to 4.44.2
chakravarthik27 Sep 16, 2024
101305a
Update transformers version to 4.43.1
chakravarthik27 Sep 16, 2024
96cc4f1
Update pyproject.toml to force CPU installation of torch
chakravarthik27 Sep 16, 2024
d64312d
Update accelerate version to 0.22.0
chakravarthik27 Sep 16, 2024
4780cf0
Update accelerate version to 0.33.0 and pyproject.toml to force CPU i…
chakravarthik27 Sep 16, 2024
0c7c9b0
Now handles the multi-label in accuracy tests.
chakravarthik27 Sep 16, 2024
54f235d
Refactor accuracy tests to handle multi-label classification
chakravarthik27 Sep 16, 2024
a04eba6
Update mlflow version to 2.16.1 and add openpyxl and tables dependencies
chakravarthik27 Sep 16, 2024
9f7f73e
Merge pull request #1114 from JohnSnowLabs/fix/error-in-accuracy-test…
chakravarthik27 Sep 16, 2024
ac652cf
Update pydantic version to 1.10.11
chakravarthik27 Sep 17, 2024
2d0f0d8
Update transformers version to 4.44.2 and mlflow version to 2.16.2
chakravarthik27 Sep 17, 2024
3745e6a
Refactor calculate_f1_score function to handle different types of y_t…
chakravarthik27 Sep 17, 2024
bcdfc92
formatted.
chakravarthik27 Sep 17, 2024
b0a1a26
Merge pull request #1116 from JohnSnowLabs/fix/error-in-accuracy-test…
chakravarthik27 Sep 17, 2024
d3a4663
Merge pull request #1112 from JohnSnowLabs/update/fixing-security-issues
chakravarthik27 Sep 17, 2024
a5ae26a
Merge remote-tracking branch 'origin/release/2.4.0' into feature/impl…
chakravarthik27 Sep 17, 2024
10aa4b3
Refactor security.py to add new security checks
chakravarthik27 Sep 17, 2024
b29f9dd
resolve OutOfMemory issues
chakravarthik27 Sep 17, 2024
16a3aa5
updated the notebook
chakravarthik27 Sep 17, 2024
b337d2b
Update pillow version to 10.0.0 and make it a required dependency
chakravarthik27 Sep 17, 2024
67c641d
Merge pull request #1111 from JohnSnowLabs/feature/implement-the-supp…
chakravarthik27 Sep 17, 2024
62b77b1
Refactor typing imports in accuracy.py and safety.py
chakravarthik27 Sep 18, 2024
409cb96
Refactor prepare_model_response method to handle multi-label classifi…
chakravarthik27 Sep 18, 2024
d98a9d3
fixed: circular import errors
chakravarthik27 Sep 18, 2024
7a58067
Refactor test type in safety.py and add decimal formatting in output.py
chakravarthik27 Sep 18, 2024
5e482e1
Refactor multi-label handling in TestResultManager
chakravarthik27 Sep 18, 2024
e9c54e9
fixed: formatting issue
chakravarthik27 Sep 18, 2024
4664bbf
Merge pull request #1118 from JohnSnowLabs/fix/error-in-accuracy-test…
chakravarthik27 Sep 18, 2024
a90c932
Refactor PromptGuard class and related modules
chakravarthik27 Sep 19, 2024
092b3e9
Refactor fairness test to handle multi-label classification in text c…
chakravarthik27 Sep 19, 2024
f362a62
fixed: format and linting issues.
chakravarthik27 Sep 19, 2024
7e2b232
Merge pull request #1121 from JohnSnowLabs/fix/error-in-fairness-test…
chakravarthik27 Sep 19, 2024
d89477a
Merge pull request #1119 from JohnSnowLabs/feature/enhance-security-t…
chakravarthik27 Sep 19, 2024
da9f58b
Refactor security.py: Remove unused classes and methods
chakravarthik27 Sep 19, 2024
90e902f
update version to 2.4.0 in pyproject.toml for release
chakravarthik27 Sep 20, 2024
551cc12
jailbreak and injection tests supports for text-classification.
chakravarthik27 Sep 20, 2024
1 change: 1 addition & 0 deletions demo/tutorials/llm_notebooks/Visual_QA.ipynb

Large diffs are not rendered by default.

10 changes: 9 additions & 1 deletion langtest/datahandler/datasource.py
@@ -95,6 +95,12 @@
"anti-stereotype": ["anti-stereotype"],
"unrelated": ["unrelated"],
},
"visualqa": {
"image": ["image", "image_1"],
"question": ["question"],
"options": ["options"],
"answer": ["answer"],
},
}
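For illustration, the new `visualqa` entry in the column mapper can be exercised with a small resolver. This is a hypothetical sketch — the helper name `resolve_columns` is not part of the diff — showing how raw dataset columns such as `image_1` are matched back to canonical field names:

```python
# Hypothetical sketch of how a column mapping like the "visualqa" entry
# above can resolve raw dataset column names to canonical field names.
VISUALQA_MAPPING = {
    "image": ["image", "image_1"],
    "question": ["question"],
    "options": ["options"],
    "answer": ["answer"],
}

def resolve_columns(row_keys, mapping):
    """Return {canonical_name: actual_column} for aliases present in the row."""
    resolved = {}
    for canonical, aliases in mapping.items():
        for alias in aliases:
            if alias in row_keys:
                resolved[canonical] = alias
                break
    return resolved

row = {"image_1": "s3://bucket/cat.png", "question": "What animal?", "answer": "cat"}
print(resolve_columns(row.keys(), VISUALQA_MAPPING))
```

Fields with no matching alias (here, `options`) are simply absent from the result, which is why the loader falls back to defaults for missing columns.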


@@ -183,7 +189,7 @@ def __init__(self, file_path: Union[str, dict], task: TaskManager, **kwargs) ->
raise ValueError(Errors.E024)

if "data_source" not in file_path:
raise ValueError(Errors.E025)
raise ValueError(Errors.E025())
self._custom_label = file_path.copy()
self._file_path = file_path.get("data_source")
self._size = None
@@ -1246,6 +1252,7 @@ class HuggingFaceDataset(BaseDataset):
"summarization",
"ner",
"question-answering",
"visualqa",
]

LIB_NAME = "datasets"
@@ -1709,6 +1716,7 @@ class PandasDataset(BaseDataset):
"legal",
"factuality",
"stereoset",
"visualqa",
]
COLUMN_NAMES = {task: COLUMN_MAPPER[task] for task in supported_tasks}

6 changes: 6 additions & 0 deletions langtest/langtest.py
@@ -605,6 +605,7 @@ def generated_results(self) -> Optional[pd.DataFrame]:
"model_name",
"category",
"test_type",
"original_image",
"original",
"context",
"prompt",
@@ -613,8 +614,10 @@
"completion",
"test_case",
"perturbed_context",
"perturbed_image",
"perturbed_question",
"sentence",
"question",
"patient_info_A",
"patient_info_B",
"case",
@@ -838,6 +841,7 @@ def testcases(self, additional_cols=False) -> pd.DataFrame:
"model_name",
"category",
"test_type",
"original_image",
"original",
"context",
"original_context",
@@ -863,7 +867,9 @@
"correct_sentence",
"incorrect_sentence",
"perturbed_context",
"perturbed_image",
"perturbed_question",
"question",
"ground_truth",
"options",
"expected_result",
57 changes: 56 additions & 1 deletion langtest/modelhandler/llm_modelhandler.py
@@ -13,6 +13,7 @@
import logging
from functools import lru_cache
from langtest.utils.custom_types.helpers import HashableDict
from langchain.chat_models.base import BaseChatModel


class PretrainedModelForQA(ModelAPI):
@@ -80,7 +81,7 @@ def load_model(cls, hub: str, path: str, *args, **kwargs) -> "PretrainedModelFor
try:
cls._update_model_parameters(hub, filtered_kwargs)
if path in (
"gpt-4o",
"gpt-4o-mini",
"gpt-4",
"gpt-3.5-turbo",
"gpt-4-1106-preview",
@@ -452,3 +453,57 @@ class PretrainedModelForSycophancy(PretrainedModelForQA, ModelAPI):
"""

pass


class PretrainedModelForVisualQA(PretrainedModelForQA, ModelAPI):
"""A class representing a pretrained model for visual question answering.

Inherits:
PretrainedModelForQA: The base class for pretrained models.
"""

@lru_cache(maxsize=102400)
def predict(
self, text: Union[str, dict], prompt: dict, images: List[Any], *args, **kwargs
):
"""Perform prediction using the pretrained model.

Args:
text (Union[str, dict]): The input text or dictionary.
prompt (dict): The prompt configuration.
images (List[Any]): The list of images.
*args: Additional positional arguments.
**kwargs: Additional keyword arguments.

Returns:
dict: A dictionary containing the prediction result.
- 'result': The prediction result.
"""
try:
if not isinstance(self.model, BaseChatModel):
raise ValueError("visualQA task is only supported for chat models")

# prepare prompt
prompt_template = PromptTemplate(**prompt)
from langchain_core.messages import HumanMessage

images = [
{
"type": "image_url",
"image_url": {"url": image},
}
for image in images
]

messages = HumanMessage(
content=[
{"type": "text", "text": prompt_template.format(**text)},
*images,
]
)

response = self.model.invoke([messages])
return response.content

except Exception as e:
raise ValueError(Errors.E089(error_message=e))
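The multimodal payload that `predict` hands to the chat model can be sketched without LangChain at all. This is an illustrative helper (the name `build_visualqa_content` is an assumption, not part of the diff) that builds plain dicts mirroring the LangChain `HumanMessage` content shape used above:

```python
# Sketch of the content list assembled in predict() above: one text part
# (the formatted prompt) followed by one image_url part per image.
def build_visualqa_content(formatted_prompt, image_urls):
    """Combine a text prompt with image_url parts, as predict() does."""
    parts = [{"type": "text", "text": formatted_prompt}]
    parts += [
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    ]
    return parts

content = build_visualqa_content(
    "Question: What is shown?\nOptions:\nA. cat\nB. dog",
    ["https://example.com/img.png"],
)
print(len(content))  # one text part plus one image part
```

In the real handler this list becomes the `content` of a `HumanMessage` passed to `self.model.invoke`.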
128 changes: 128 additions & 0 deletions langtest/modelhandler/promptguard.py
@@ -0,0 +1,128 @@
class PromptGuard:
_instance = None

def __new__(cls, model_name: str = "meta-llama/Prompt-Guard-86M", device="cpu"):
if cls._instance is None:
cls._instance = super().__new__(cls)
cls._instance.model_name = model_name
cls._instance.device = device
(
cls._instance.model,
cls._instance.tokenizer,
) = cls._instance._load_model_and_tokenizer()
return cls._instance

def __init__(
self, model_name: str = "meta-llama/Prompt-Guard-86M", device="cpu"
) -> None:
self.model_name = model_name
self.device = device
self.model, self.tokenizer = self._load_model_and_tokenizer()

def _load_model_and_tokenizer(self):
"""
Load the model and tokenizer from Hugging Face.
"""
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(self.model_name).to(
self.device
)
tokenizer = AutoTokenizer.from_pretrained(self.model_name)
return model, tokenizer

def _preprocess_text(self, text):
"""
Preprocess the input text by removing spaces to mitigate prompt injection tactics.
"""
cleaned_text = "".join([char for char in text if not char.isspace()])
tokens = self.tokenizer.tokenize(cleaned_text)
result = " ".join(
[self.tokenizer.convert_tokens_to_string([token]) for token in tokens]
)
return result or text

def _get_class_probabilities(self, texts, temperature=1.0, preprocess=True):
"""
Internal method to get class probabilities for a single or batch of texts.
"""
import torch
from torch.nn.functional import softmax

if preprocess:
texts = [self._preprocess_text(text) for text in texts]

inputs = self.tokenizer(
texts, return_tensors="pt", padding=True, truncation=True, max_length=512
)
inputs = inputs.to(self.device)

with torch.no_grad():
logits = self.model(**inputs).logits

probabilities = softmax(logits / temperature, dim=-1)
return probabilities

def get_jailbreak_score(self, text, temperature=1.0, preprocess=True):
"""
Get jailbreak score for a single input text.
"""
probabilities = self._get_class_probabilities([text], temperature, preprocess)
return probabilities[0, 2].item()

def get_indirect_injection_score(self, text, temperature=1.0, preprocess=True):
"""
Get indirect injection score for a single input text.
"""
probabilities = self._get_class_probabilities([text], temperature, preprocess)
return (probabilities[0, 1] + probabilities[0, 2]).item()

def _process_text_batch(
self, texts, score_indices, temperature=1.0, max_batch_size=16, preprocess=True
):
"""
Internal method to process texts in batches and return scores.
"""
import torch

num_texts = len(texts)
all_scores = torch.zeros(num_texts)

for i in range(0, num_texts, max_batch_size):
batch_texts = texts[i : i + max_batch_size]
probabilities = self._get_class_probabilities(
batch_texts, temperature, preprocess
)
batch_scores = probabilities[:, score_indices].sum(dim=1).cpu()

all_scores[i : i + max_batch_size] = batch_scores

return all_scores.tolist()

def get_jailbreak_scores_for_texts(
self, texts, temperature=1.0, max_batch_size=16, preprocess=True
):
"""
Get jailbreak scores for a batch of texts.
"""
return self._process_text_batch(
texts,
score_indices=[2],
temperature=temperature,
max_batch_size=max_batch_size,
preprocess=preprocess,
)

def get_indirect_injection_scores_for_texts(
self, texts, temperature=1.0, max_batch_size=16, preprocess=True
):
"""
Get indirect injection scores for a batch of texts.
"""
return self._process_text_batch(
texts,
score_indices=[1, 2],
temperature=temperature,
max_batch_size=max_batch_size,
preprocess=preprocess,
)
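The scoring math in `PromptGuard` reduces to a temperature-scaled softmax over three class logits. A pure-Python sketch, assuming the index convention used above (class 2 is jailbreak, and indirect injection sums classes 1 and 2 — an inference from the code, not stated in the diff):

```python
import math

# Pure-Python sketch of PromptGuard's scoring: softmax over three logits,
# with jailbreak = P(class 2) and indirect injection = P(class 1) + P(class 2).
def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def jailbreak_score(logits, temperature=1.0):
    return softmax(logits, temperature)[2]

def indirect_injection_score(logits, temperature=1.0):
    probs = softmax(logits, temperature)
    return probs[1] + probs[2]

probs = softmax([0.0, 0.0, 0.0])  # equal logits give a uniform distribution
print(probs)
```

Raising `temperature` flattens the distribution toward uniform, which is why the class exposes it as a knob on every scoring method.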
41 changes: 41 additions & 0 deletions langtest/tasks/task.py
@@ -851,3 +851,44 @@ def create_sample(

class FillMask(BaseTask):
pass


class VisualQA(BaseTask):
_name = "visualqa"
_default_col = {
"image": ["image"],
"question": ["question"],
"answer": ["answer"],
}
sample_class = samples.VisualQASample

def create_sample(
cls,
row_data: dict,
image: str = "image_1",
question: str = "question",
options: str = "options",
answer: str = "answer",
dataset_name: str = "",
) -> samples.VisualQASample:
"""Create a sample."""
keys = list(row_data.keys())

# auto-detect the default column names from the row_data
column_mapper = cls.column_mapping(keys, [image, question, options, answer])

options = row_data.get(column_mapper.get(options, "-"), "-")

if len(options) > 3 and options[0] == "[" and options[-1] == "]":
options = ast.literal_eval(row_data[column_mapper["options"]])
options = "\n".join(
[f"{chr(65 + i)}. {option}" for i, option in enumerate(options)]
)

return samples.VisualQASample(
original_image=row_data[column_mapper[image]],
question=row_data[column_mapper[question]],
options=options,
expected_result=row_data[column_mapper[answer]],
dataset_name=dataset_name,
)
3 changes: 3 additions & 0 deletions langtest/transform/__init__.py
@@ -22,6 +22,8 @@
from langtest.transform.grammar import GrammarTestFactory
from langtest.transform.safety import SafetyTestFactory

from langtest.transform import image

# Fixing the asyncio event loop
nest_asyncio.apply()

@@ -47,4 +49,5 @@
SycophancyTestFactory,
GrammarTestFactory,
SafetyTestFactory,
image,
]