LLM-Driven Exploit Validation

Feed it a target. Watch it hunt. LLMtary autonomously discovers vulnerabilities, executes real commands, and delivers confirmed proof-of-exploitation.

Available for Windows, macOS, and Linux

What is LLMtary?

LLMtary is a Flutter desktop application that uses large language models to automate the full penetration testing lifecycle, from reconnaissance and vulnerability analysis through active exploit validation and professional report generation. It doesn't just suggest vulnerabilities; it proves them by executing real commands on your machine, evaluating the output, and iterating until each finding is confirmed or ruled out.

🔍

Autonomous Recon

An LLM-guided reconnaissance engine runs discovery commands (port scans, service banners, DNS enumeration, WAF detection) and builds enriched target JSON before analysis begins.

🧠

2-Phase Analysis Pipeline

Phase 1 builds context from CVE matching, DNS/OSINT, and network services. Phase 2 fires targeted web, Active Directory, and tech-specific analysis, each enriched with Phase 1 findings.

⚡

Active Exploit Testing Loop

Each finding goes through an autonomous agentic loop: plan, execute, evaluate, adapt. The LLM runs real shell commands against the target and iterates until the vulnerability is confirmed or ruled out.

🔗

Attack Chain Reasoning

When two or more vulnerabilities are confirmed, a BloodHound-style chain reasoning pass fires, identifying how individual findings can be combined into higher-impact multi-step attack paths.

🗝️

Credential Bank

Discovered credentials are collected session-wide and automatically reused when testing subsequent targets. Verified credentials trigger an authenticated re-analysis pass for deeper findings.
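
One way to picture the credential bank is as a small session-wide store that remembers which credentials have been verified and offers them first on the next target. The sketch below is illustrative only: the `Credential` and `CredentialBank` names, their methods, and the sample values are hypothetical and are not taken from LLMtary's code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Credential:
    """A discovered username/secret pair (hypothetical structure)."""
    username: str
    secret: str


@dataclass
class CredentialBank:
    # Maps each credential to the set of targets where it was verified.
    verified_on: dict[Credential, set[str]] = field(default_factory=dict)

    def add(self, username: str, secret: str) -> Credential:
        """Store a newly discovered credential for the rest of the session."""
        cred = Credential(username, secret)
        self.verified_on.setdefault(cred, set())
        return cred

    def mark_verified(self, cred: Credential, target: str) -> bool:
        """Record a working login. True means this is new for the target,
        which is when an authenticated re-analysis pass would be due."""
        targets = self.verified_on.setdefault(cred, set())
        first_time = target not in targets
        targets.add(target)
        return first_time

    def candidates(self) -> list[Credential]:
        """Credentials to try on the next target, verified ones first."""
        return sorted(self.verified_on, key=lambda c: not self.verified_on[c])
```

In this sketch, a credential that has worked anywhere moves to the front of `candidates()`, so later targets try the most promising logins first.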

🛡️

Safety Controls

Hard blocklist prevents dangerous commands (rm -rf /, format, fork bombs). Command approval mode lets you review every command before execution. Scope enforcement discards out-of-scope findings.
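
The three controls described here can be thought of as a single gate that every proposed command passes through. The following is a minimal sketch under stated assumptions: the regex patterns, the `gate` function, and its return values are hypothetical, and LLMtary's actual blocklist and scope logic may differ.

```python
import ipaddress
import re

# Hypothetical blocklist patterns; the real list is not documented here.
BLOCKLIST = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),          # recursive delete of the filesystem root
    re.compile(r"^\s*format\b", re.IGNORECASE),   # disk format commands
    re.compile(r":\(\)\s*\{.*\};\s*:"),           # classic bash fork bomb
]


def is_blocked(command: str) -> bool:
    """True if the command matches any hard-blocked pattern."""
    return any(pattern.search(command) for pattern in BLOCKLIST)


def in_scope(target: str, scope: list[str]) -> bool:
    """True if target is a listed hostname or an IP inside a listed CIDR."""
    for entry in scope:
        if target == entry:
            return True
        try:
            if ipaddress.ip_address(target) in ipaddress.ip_network(entry, strict=False):
                return True
        except ValueError:
            continue  # target or entry is a hostname, not an IP/CIDR
    return False


def gate(command: str, target: str, scope: list[str], approve=None) -> str:
    """Decide what happens to a proposed command before it runs."""
    if not in_scope(target, scope):
        return "out-of-scope"
    if is_blocked(command):
        return "blocked"
    if approve is not None and not approve(command):
        return "rejected"  # approval mode: the reviewer said no
    return "allowed"
```

Passing an `approve` callback models approval mode; leaving it as `None` models fully autonomous execution, where only the blocklist and scope checks apply.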

[Screenshot: Require approval toggle for reviewing commands before execution]

📋

Post-Exploitation Enumeration

When RCE or high-value access is confirmed, a post-exploitation loop automatically enumerates users, credentials, network interfaces, running services, and privilege escalation paths.

📄

Professional Reports

Generate HTML, Markdown, or CSV reports with AI-assisted executive summaries, full CVSS metadata, discovered credentials, and attack chain narratives. Export encrypted .penex project bundles.

Phased Engagement Architecture

LLMtary mirrors a real engagement workflow. Each phase enriches the next: passive recon feeds targeted analysis, confirmed findings feed attack chain reasoning, and credentials feed re-analysis.

Phase 1: Passive

Recon & Fingerprinting

CVE/version matching, network service analysis, DNS/OSINT, and email security checks. Results are compiled into a context block injected into every Phase 2 prompt.
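
As a rough illustration of what "a context block injected into every Phase 2 prompt" could look like, the sketch below compiles Phase 1 notes into one text block and prepends it to a task prompt. The function names, the block layout, and the sample data are assumptions made for this example; they do not describe LLMtary's real prompt format.

```python
def build_context_block(phase1_results: dict[str, list[str]]) -> str:
    """Compile Phase 1 notes, grouped by source, into one text block."""
    lines = ["PHASE 1 CONTEXT"]
    for source, notes in phase1_results.items():
        if not notes:
            continue  # skip sources that produced nothing
        lines.append(f"[{source}]")
        lines.extend(f"- {note}" for note in notes)
    return "\n".join(lines)


def build_phase2_prompt(task: str, context_block: str) -> str:
    """Prepend the shared context block to a Phase 2 analysis task."""
    return f"{context_block}\n\nTASK\n{task}"


# Made-up sample data, for illustration only.
context = build_context_block({
    "network services": ["22/tcp ssh", "443/tcp https"],
    "dns/osint": [],
})
prompt = build_phase2_prompt("Analyze the web application.", context)
```

Building the block once and reusing it means every Phase 2 pass starts from the same view of the target.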

Phase 2: Active Analysis

Full Vulnerability Discovery

Web app (4 passes), Active Directory (3 passes), SSL/TLS, privilege escalation, and 15+ technology deep-dives, each enriched with Phase 1 context for targeted accuracy.

Execution

Autonomous Exploit Loop

Each finding runs through an agentic loop: RECON → VERIFICATION → EXPLOITATION → CONFIRMATION. The LLM adapts its approach based on command output, detects rate-limits, and avoids repeating failed methods.
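
The loop can be sketched as a small state machine that advances through the four stages and stops on a verdict. This is an illustration under assumptions, not LLMtary's implementation: `run_loop`, the `plan`/`execute`/`evaluate` callbacks, and the "advance"/"retry"/"ruled_out" verdicts are hypothetical names, with the callbacks standing in for the LLM and the shell.

```python
from dataclasses import dataclass, field

PHASES = ["RECON", "VERIFICATION", "EXPLOITATION", "CONFIRMATION"]


@dataclass
class LoopState:
    finding: str
    phase_index: int = 0
    history: list[tuple[str, str]] = field(default_factory=list)  # (command, output)
    failed: set[str] = field(default_factory=set)                 # commands not to repeat


def run_loop(finding, plan, execute, evaluate, max_iterations=10) -> str:
    """Drive one finding through the stages until a verdict or the cap is hit."""
    state = LoopState(finding)
    for _ in range(max_iterations):
        phase = PHASES[state.phase_index]
        command = plan(state, phase)            # planner proposes the next command
        if command is None or command in state.failed:
            return "UNDETERMINED"               # planner has no new approach
        output = execute(command)
        state.history.append((command, output))
        verdict = evaluate(phase, command, output)
        if verdict == "ruled_out":
            return "NOT VULNERABLE"
        if verdict == "retry":
            state.failed.add(command)           # adapt: never repeat this command
            continue
        if state.phase_index == len(PHASES) - 1:
            return "CONFIRMED"
        state.phase_index += 1                  # "advance" to the next stage
    return "UNDETERMINED"


# Stub callbacks in place of a real model and shell.
status = run_loop(
    "example finding",
    plan=lambda state, phase: f"echo {phase.lower()}",
    execute=lambda command: "ok",
    evaluate=lambda phase, command, output: "advance",
)
print(status)  # CONFIRMED
```

The iteration cap and the `failed` set are the two pieces that keep a loop like this from spinning forever on an approach that is not working.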

Post-Execution

Chain Reasoning & Reporting

Confirmed findings are chained into multi-step attack paths. Post-exploitation enumeration documents the full impact. Reports are generated as HTML, Markdown, or CSV.

What the Reports Look Like

Every engagement produces a clean, self-contained HTML report.

🟥
Severity Badges

Critical, High, Medium, and Low: colour-coded, with CONFIRMED status on every finding.

🔢
CVSS Metadata

Full CVSS 3.1 vector strings, numeric scores, and CVE IDs pulled automatically for each finding.

💻
Proof Commands

The exact command used to confirm exploitation is embedded inline, copy-paste ready for client demos.

📊
Executive Summary

AI-generated narrative with a severity breakdown grid, ready for a non-technical stakeholder audience.

Using LLMtary

Navigate left to right through the four tabs. Each tab feeds the next.

SCOPE / RECON TAB

Create a project, add targets (hostname, FQDN, or IP), and configure scope. Run autonomous recon to collect scan data, or paste your own JSON directly. The built-in recon engine drives nmap, DNS enumeration, web fingerprinting, and WAF detection through an LLM-guided loop.


VULN / HUNT TAB

Click Analyze to fire the 2-phase analysis pipeline. Multiple LLM prompts run in parallel. Findings appear in the vulnerability table as they arrive, sorted by severity and confidence. Each finding includes CVSS metadata, evidence quotes, and business risk assessment.
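
Sorting "by severity and confidence" amounts to a two-part sort key. The sketch below assumes a hypothetical `Finding` record with a confidence score between 0 and 1; the field names and the ranking table are illustrative and not taken from LLMtary.

```python
from dataclasses import dataclass

# Lower rank sorts first; unknown severities fall to the bottom.
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass
class Finding:
    title: str
    severity: str
    confidence: float  # assumed to be in the range 0.0-1.0


def sort_findings(findings: list[Finding]) -> list[Finding]:
    """Order by severity first, then by descending confidence."""
    return sorted(
        findings,
        key=lambda f: (SEVERITY_RANK.get(f.severity, len(SEVERITY_RANK)), -f.confidence),
    )
```

Because `sorted` is stable, findings with equal severity and confidence keep their arrival order, which suits a table that fills in as results stream back.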


PROOF / EXPLOIT TAB

Select the findings you want to actively test and click Execute Selected. The autonomous exploit loop runs each finding through real command execution on your machine. Status updates in real time: CONFIRMED, NOT VULNERABLE, or UNDETERMINED. Enable Command Approval mode to review every command before it runs.


RESULT / REPORT TAB

Review the full findings summary with confirmed exploit counts, attack chains, and token usage by phase and target. Generate a professional report as HTML, Markdown, or CSV. AI-assisted generation creates the executive summary, methodology, and conclusion sections.

Supported AI Providers

Works with local models for air-gapped environments and cloud providers for maximum performance. Settings are saved per-provider: switching restores your previous API key, model, and base URL.

  • Ollama (Local): localhost:11434
  • LM Studio (Local): localhost:1234/v1
  • Claude (Anthropic · Cloud)
  • ChatGPT (OpenAI · Cloud)
  • Gemini (Google · Cloud)
  • OpenRouter (Multi-model · Cloud)

Local models require 14B+ parameters. Recommended: 32B+ (Q4_K_M or higher) for reliable multi-step reasoning. Cloud providers (Claude Opus/Sonnet, GPT-4o, Gemini 1.5 Pro) deliver the best results.
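
The per-provider settings behaviour can be modelled as a dictionary keyed by provider name, so that switching providers swaps the active entry instead of overwriting it. This is a minimal sketch with hypothetical names (`ProviderSettings`, `SettingsStore`) and a placeholder model name; the way LLMtary actually persists settings is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderSettings:
    base_url: str = ""
    api_key: str = ""
    model: str = ""


@dataclass
class SettingsStore:
    active: str = "Ollama"
    saved: dict[str, ProviderSettings] = field(default_factory=dict)

    def current(self) -> ProviderSettings:
        """Settings for the active provider, created empty on first use."""
        return self.saved.setdefault(self.active, ProviderSettings())

    def switch(self, provider: str) -> ProviderSettings:
        """Change provider; earlier values for it are restored untouched."""
        self.active = provider
        return self.current()


store = SettingsStore()
store.current().base_url = "localhost:11434"
store.switch("LM Studio").base_url = "localhost:1234/v1"
restored = store.switch("Ollama")  # base_url is still "localhost:11434"
```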

Built-in

LLM Setup Wizard

Getting started takes less than a minute. LLMtary's built-in setup wizard walks you through selecting your AI provider, entering your base URL and API key, and choosing a model, all on one screen.

  • Supports 6 providers out of the box, local and cloud
  • Settings are saved per-provider, so you can switch without re-entering credentials
  • Built-in Test button verifies your connection before you start
  • Configure temperature, max tokens, timeout, and iteration caps
  • Command whitelist controls exactly which tools LLMtary is allowed to run

[Screenshot: LLM provider setup wizard showing provider selection, base URL, model, and settings]

LLMtary vs. PenPeeper

Both tools are built for security professionals and share the same cross-platform Flutter foundation, but they serve different stages of the engagement.

Feature | LLMtary | PenPeeper
Primary Purpose | Autonomous exploit testing & validation | Engagement management & organization
User Role | Set targets and review confirmed results | Drive the workflow manually, step by step
Vulnerability Testing | Active: executes real shell commands, confirms or denies each finding | Passive: flags potential vulnerabilities for manual review
Recon | Autonomous LLM-guided recon loop | Built-in tool automation (nmap, nikto, etc.) with manual control
Report Output | HTML, Markdown, CSV with confirmed exploit proof | Polished PDF with custom graphics, rich text editor
Attack Chaining | Automatic: BloodHound-style chain reasoning on confirmed findings | Manual: user connects the dots between flagged items
AI Integration | Core engine: drives recon, analysis, and exploitation autonomously | Assistive: generates summaries and populates finding fields
Best For | Automated validation, large target sets, proof-of-exploitation | Organizing complex engagements, client-ready PDF reporting
Open Source | Open Source (GitHub) | Open Source (GitHub)

Also check out PenPeeper

The open source penetration testing engagement manager. Organize devices, run automated scans, flag vulnerabilities, and generate polished client-ready PDF reports, all in one workflow tool.

Visit PenPeeper.com