Scientific QA robustness evaluation pipeline for evidence-missing RAG scenarios on PeerQA, with EM/F1 reliability analysis.
Updated Mar 18, 2026 - Python
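The EM/F1 metrics mentioned above are the standard SQuAD-style answer-scoring pair: exact match after normalization, and token-level F1 overlap. A minimal sketch (function names are illustrative; the repo's actual implementation may differ):

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and
    articles (a/an/the), collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> int:
    """1 if normalized prediction equals normalized gold answer, else 0."""
    return int(normalize(pred) == normalize(gold))

def f1_score(pred: str, gold: str) -> float:
    """Token-level F1 between normalized prediction and gold answer."""
    pred_toks = normalize(pred).split()
    gold_toks = normalize(gold).split()
    if not pred_toks or not gold_toks:
        # Both empty counts as a match; one empty is a miss.
        return float(pred_toks == gold_toks)
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

In evidence-missing RAG scenarios, a reliability analysis would typically compare these scores between questions whose supporting evidence is retrievable and questions where it has been withheld.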
RevealVLLMSafetyEval is a comprehensive pipeline for evaluating Vision-Language Models (VLMs) on their compliance with harm-related policies. It automates the creation of adversarial multi-turn datasets and the evaluation of model responses, supporting responsible AI development and red-teaming efforts.
Multi-agent deep research engine with SIA (Semantic Intelligence Architecture): thermodynamic entropy control, adversarial critique, and multi-reactor swarm orchestration.
GuardMCP - Deterministic Runtime Semantic Enforcement for Agentic Tool Execution using Directional Intent–Action Alignment