kbugra/huntmcp

HuntMCP

HuntMCP is a portfolio-grade, deterministic-first SOC investigation prototype. It parses exported security logs, normalizes events, applies rule-based hunting logic, extracts IOCs, enriches them with CTI sources, and produces an analyst-style Markdown report.

The core idea is simple: the LLM is not the detection engine. HuntMCP detects suspicious activity with auditable rules first, then uses an OpenAI-compatible LLM only as an analyst assistant for explanation, severity reasoning, MITRE ATT&CK suggestions, false positive notes, and recommended next steps.

Project Status

This repository is intended as a cybersecurity portfolio project and advanced MVP, not as a production SIEM or a replacement for production detection.

Current validated state:

  • CLI pipeline works end to end.
  • FastAPI backend supports investigation jobs and persisted case summaries.
  • SQLite persistence stores cases, runs, events, findings, IOCs, enrichments, triage results, reports, and job state.
  • Deterministic tests run offline even when a local .env exists.
  • Current validation baseline: 302 passed, ruff check clean, ruff format clean.

What It Does

HuntMCP can:

  • Parse CSV, JSON, and JSONL log exports.
  • Normalize Windows Security, Sysmon, DNS, Proxy/Web, and generic CSV logs.
  • Run Sigma-inspired deterministic detection rules.
  • Extract IOCs such as IPv4 addresses, domains, URLs, hashes, CVEs, and emails.
  • Enrich IOCs with mock/local CTI and optional external CTI connectors.
  • Triage findings with an OpenAI-compatible LLM or a deterministic mock fallback.
  • Generate Markdown investigation reports.
  • Persist investigation output to SQLite for case review.
  • Expose a small FastAPI backend for uploads, jobs, cases, findings, IOCs, and reports.
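For example, the kinds of IOCs listed above can be matched with simple patterns. This is an illustrative sketch, not HuntMCP's actual extractor, which would also need defanging support and false-positive validation:

```python
import re

# Illustrative IOC patterns; a real extractor validates matches further.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}

def extract_iocs(text):
    """Return a dict mapping IOC type to all matches found in the text."""
    return {name: pattern.findall(text) for name, pattern in IOC_PATTERNS.items()}

iocs = extract_iocs("Seen 203.0.113.7 exploiting CVE-2021-44228")
```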

What It Is Not

  • HuntMCP is not a SIEM.
  • HuntMCP is not a production detection replacement.
  • HuntMCP does not perform actor attribution.
  • HuntMCP does not automatically prove malicious activity.
  • Generated reports require analyst review.
  • Public CTI sources can be noisy, stale, incomplete, or unavailable.

Architecture

The codebase is organized so modules can later be wrapped as MCP servers:

huntmcp.parsers      -> Log Parser MCP candidate
huntmcp.detection    -> Detection MCP candidate
huntmcp.ioc          -> IOC Extractor MCP candidate
huntmcp.enrichment   -> CTI Enrichment MCP candidate
huntmcp.llm          -> Analyst Triage MCP candidate
huntmcp.reporting    -> Report MCP candidate

High-level flow:

Raw logs
  -> Parser
  -> Normalized events
  -> Deterministic rule engine
  -> Findings
  -> IOC extractor
  -> CTI enrichment
  -> Redacted LLM triage
  -> Markdown report + SQLite case storage
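The flow above can be sketched as a chain of pure functions. All names here are hypothetical stand-ins for the real modules, and the rule and IOC logic is deliberately trivial:

```python
# Hypothetical pipeline sketch; function names and logic are illustrative,
# not HuntMCP's actual API.

def run_pipeline(raw_lines):
    # Parser: raw text -> normalized event dicts
    events = [{"message": line.strip()} for line in raw_lines]
    # Deterministic rule engine: flag events mentioning failed-logon event ID 4625
    findings = [e for e in events if "4625" in e["message"]]
    # IOC extractor: naive dotted-quad grab from finding text
    iocs = sorted({w for f in findings for w in f["message"].split() if w.count(".") == 3})
    return {"events": events, "findings": findings, "iocs": iocs}

result = run_pipeline(["4625 failed logon from 10.0.0.5", "4624 logon ok"])
```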

Supported Log Types

  • Windows Security
  • Sysmon
  • DNS
  • Proxy/Web
  • Generic CSV

Detection Rules

The default rule set includes:

  • Multiple failed logons
  • Failed logon followed by successful logon
  • Suspicious PowerShell command
  • New local admin user creation
  • DNS beaconing candidate
  • Known suspicious URL/domain access

The engine is deterministic and auditable. Chunked detection has parity tests to ensure correlation rules are not missed when the CLI is run with --chunk-size.
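As an illustration of the deterministic style, a minimal "multiple failed logons" rule might look like this. The threshold, field names, and finding shape are assumptions for the sketch, not the shipped rule:

```python
from collections import Counter

FAILED_LOGON_EVENT_ID = "4625"  # Windows Security: failed logon
THRESHOLD = 5                   # assumed default; a real rule makes this configurable

def multiple_failed_logons(events, threshold=THRESHOLD):
    """Emit one finding per user whose failed-logon count meets the threshold."""
    counts = Counter(
        e["user"] for e in events if e.get("event_id") == FAILED_LOGON_EVENT_ID
    )
    return [
        {"rule": "multiple_failed_logons", "user": user, "count": n}
        for user, n in counts.items()
        if n >= threshold
    ]

events = [{"event_id": "4625", "user": "alice"}] * 6 + [{"event_id": "4624", "user": "alice"}]
findings = multiple_failed_logons(events)
```

Because the rule is a pure function of its input events, the same events always produce the same findings, which is what makes chunked-versus-whole parity testable.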

CTI Enrichment

Supported CTI source names for --cti:

  • mock or mock-local-cti
  • urlhaus
  • abuseipdb
  • otx
  • virustotal

External CTI connectors are optional. If a key is missing, local/mock workflows can still run. URLhaus metadata lookups do not download malware samples.
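The lookup guardrails described under Security Considerations can be approximated with the standard ipaddress module. This is a sketch of the idea; the real connector logic may differ:

```python
import ipaddress

METADATA_IPS = {"169.254.169.254"}  # common cloud metadata endpoint

def is_lookup_allowed(ip_text):
    """Block localhost, private, link-local, multicast, and metadata IPs."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # not an IP literal; defer to domain/URL handling
    if ip_text in METADATA_IPS:
        return False
    return not (ip.is_loopback or ip.is_private or ip.is_link_local or ip.is_multicast)
```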

LLM Triage Workflow

  1. Rules create findings from normalized events.
  2. IOCs are extracted from those findings.
  3. CTI enrichment is attached.
  4. Sensitive values are redacted.
  5. Only selected suspicious context is sent to the LLM.
  6. Log content is explicitly treated as untrusted data.
  7. The model returns structured JSON:
    • verdict
    • severity
    • reason
    • mitre_attack
    • false_positive_notes
    • recommended_next_steps
    • analyst_summary

If OPENAI_API_KEY is not set, HuntMCP returns deterministic mock triage so tests and demos do not require secrets.
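A deterministic fallback of this shape might look like the following. The field names match the JSON schema listed above; the verdict, severity, and MITRE technique are illustrative placeholders, not HuntMCP's actual mock values:

```python
TRIAGE_FIELDS = (
    "verdict", "severity", "reason", "mitre_attack",
    "false_positive_notes", "recommended_next_steps", "analyst_summary",
)

def mock_triage(finding):
    """Deterministic stand-in used when no LLM key is configured."""
    return {
        "verdict": "suspicious",
        "severity": "medium",
        "reason": f"Rule {finding['rule']} matched deterministically.",
        "mitre_attack": ["T1110"],  # example technique; real mapping is rule-specific
        "false_positive_notes": "Mock triage: verify against baseline activity.",
        "recommended_next_steps": ["Review source host", "Check related logons"],
        "analyst_summary": "Mock triage result; no LLM was called.",
    }

result = mock_triage({"rule": "multiple_failed_logons"})
```

Because the output is a pure function of the finding, tests can assert on exact values without any network access.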

The default example model is:

LLM_MODEL=mimo-v2.5

Security Considerations

  • .env is ignored and must not be committed.
  • .env.example contains only empty/example values.
  • API keys are not logged or sent to the model.
  • Raw logs are treated as untrusted input.
  • Prompt templates warn the LLM not to follow instructions from log content.
  • Redaction supports usernames, internal IPs, hostnames, emails, tokens, and cookies.
  • CTI lookups use guardrails to block localhost, private ranges, link-local ranges, multicast, and metadata IPs.
  • Tests isolate secret environment variables to avoid accidental live API calls.
  • CORS origins are configurable with HUNTMCP_CORS_ALLOW_ORIGINS.
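Redaction of the categories listed above can be sketched with ordered substitution patterns. The patterns and placeholder tokens here are illustrative, not HuntMCP's exact rules:

```python
import re

# Illustrative redaction patterns; the real redactor covers more categories
# (usernames, hostnames, tokens, cookies) and uses stricter matching.
PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "<EMAIL>"),
    (re.compile(r"\b10\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"), "<INTERNAL_IP>"),
    (re.compile(r"\b192\.168\.\d{1,3}\.\d{1,3}\b"), "<INTERNAL_IP>"),
]

def redact(text):
    """Apply each pattern in order, replacing matches with a stable placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

sample = redact("logon by alice@corp.example from 10.1.2.3")
```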

Quick Start

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python huntmcp.py self-test

Expected self-test result:

ok: true
event_count: 5
finding_count: 5
enriched_count: 5
triage_count: 5

Configuration

Create a local .env from the example:

Copy-Item .env.example .env

Example variables:

OPENAI_API_KEY=
LLM_MODEL=mimo-v2.5
OPENAI_BASE_URL=
URLHAUS_AUTH_KEY=
ABUSEIPDB_API_KEY=
OTX_API_KEY=
VIRUSTOTAL_API_KEY=
LLM_TIMEOUT_SECONDS=30
HUNTMCP_MAX_UPLOAD_SIZE_BYTES=10485760
HUNTMCP_MAX_LLM_FINDINGS=50
HUNTMCP_CTI_LOOKUP_LIMIT=250
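These variables can be read with safe defaults so local runs work without secrets. A sketch of that pattern, using the defaults shown above (not HuntMCP's actual settings loader):

```python
import os

def load_settings(env=None):
    """Read HuntMCP-style settings with safe defaults; missing keys stay empty."""
    env = os.environ if env is None else env
    return {
        "openai_api_key": env.get("OPENAI_API_KEY", ""),
        "llm_model": env.get("LLM_MODEL", "mimo-v2.5"),
        "llm_timeout_seconds": int(env.get("LLM_TIMEOUT_SECONDS", "30")),
        "max_upload_size_bytes": int(env.get("HUNTMCP_MAX_UPLOAD_SIZE_BYTES", "10485760")),
        "cti_lookup_limit": int(env.get("HUNTMCP_CTI_LOOKUP_LIMIT", "250")),
    }

# Passing an empty mapping exercises the defaults without touching the real env.
settings = load_settings(env={})
```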

CLI Demo

Parse logs:

python huntmcp.py parse --input data/sample_logs/windows_security.csv --type windows_security

Run detections:

python huntmcp.py detect

Enrich findings:

python huntmcp.py enrich --cti mock

Run LLM/mock triage:

python huntmcp.py triage --limit 5

Generate report:

python huntmcp.py report

Run the persisted investigation workflow:

python huntmcp.py init-db
python huntmcp.py investigate --input data/sample_logs/windows_security.csv --type windows_security --cti mock --model mimo-v2.5 --case-id demo-windows --case-name demo-windows
python huntmcp.py case-summary --case-id demo-windows

Use multiple CTI sources:

python huntmcp.py enrich --cti mock,urlhaus,abuseipdb,otx,virustotal --cti-lookup-limit 250

--cti-lookup-limit caps unique remote IOC lookups for a run. This keeps large public datasets from spending minutes in third-party CTI calls before the LLM triage step starts. Use -1 only when you intentionally want unlimited live CTI lookups.
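The budget logic can be sketched as deduplicating IOCs and then applying the cap, with -1 meaning unlimited as described above (illustrative, not HuntMCP's actual implementation):

```python
def budget_lookups(iocs, limit):
    """Return unique IOCs to query remotely, honoring the cap (-1 = unlimited)."""
    unique = list(dict.fromkeys(iocs))  # dedupe, preserving first-seen order
    return unique if limit == -1 else unique[:limit]

to_query = budget_lookups(["a.example", "b.example", "a.example", "c.example"], limit=2)
```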

Fetch a small URLhaus-derived proxy sample:

python huntmcp.py fetch-urlhaus-sample --limit 25
python huntmcp.py parse --input data/public_samples/urlhaus_recent_proxy.csv --type proxy
python huntmcp.py detect

Public Dataset Validation

HuntMCP was also validated against URLhaus public recent URL metadata with live OpenAI-compatible LLM triage using deepseek-v4-flash. The portfolio-scale run processed 1000 URLhaus records, produced 1504 deterministic findings, enriched all 1504, and ran live DeepSeek triage on 200 findings for the final report.

See docs/deepseek_v4_flash_urlhaus_200_run.md for the commands, metrics, CTI lookup budget, verdict/severity distribution, and MITRE ATT&CK mapping summary. Generated JSON, SQLite, downloaded CSV, and report artifacts are intentionally excluded from git; this documentation file is the push-safe evidence artifact.

API Demo

Start the API:

uvicorn huntmcp.api:app --reload

Set explicit CORS origins if needed:

$env:HUNTMCP_CORS_ALLOW_ORIGINS="http://localhost:3000,http://localhost:5173"
uvicorn huntmcp.api:app --reload

Useful endpoints:

GET  /health
POST /investigate
POST /investigations
GET  /jobs/{job_id}
GET  /cases
GET  /cases/{case_id}/summary
GET  /findings/{case_id}
GET  /iocs/{case_id}
GET  /reports/{case_id}
GET  /dashboard

Testing

Run the full validation suite:

python -m pytest -q
python -m ruff check huntmcp huntmcp.py tests
python -m ruff format --check huntmcp huntmcp.py tests

Current local validation:

302 passed
All ruff checks passed
76 files already formatted

Example Output

A report includes:

  • Executive Summary
  • Findings
  • Timeline
  • IOC Table
  • MITRE ATT&CK Mapping
  • Detection Logic
  • False Positive Notes
  • Recommended Next Steps
  • Limitations

Reports are generated under reports/. Generated reports are ignored by git by default so the repository stays clean.

Repository Hygiene

The repository intentionally excludes:

  • .env
  • Python caches
  • pytest/ruff caches
  • SQLite databases
  • generated normalized events
  • generated findings
  • generated enrichment output
  • generated reports
  • downloaded public CTI datasets

Only source code, tests, docs, configs, and small sample logs should be pushed.

Limitations

  • This is a portfolio-grade prototype, not a production SOC platform.
  • The current detection set is intentionally small.
  • SQLite is suitable for local/single-user workflows, not multi-tenant production.
  • Very large datasets need a true stateful streaming detection engine.
  • External CTI quality depends on third-party availability and rate limits.
  • LLM output can be wrong and must be reviewed by an analyst.
  • API authentication, RBAC, audit logging, and deployment hardening are future work.

Future Work

  • True stateful streaming detection for very large datasets
  • More Sigma-compatible rule loading and rule metadata
  • Richer analyst UI for timeline, IOC pivoting, and finding review
  • API authentication and role-based access control
  • Job queue backend for longer investigations
  • More CTI connector response normalization and rate-limit handling
  • Docker deployment profiles and production secret management
  • Real MCP server wrappers for parser, detection, enrichment, triage, and reporting modules
