LLM Security Research · Lateos

Adversarial robustness assessments for language models.

Structured prompt injection susceptibility testing across frontier and edge LLMs. Findings are classified and quantified with the IPI Taxonomy, and the resulting attack surface is mapped to OWASP LLM categories. All findings follow responsible structural disclosure: no weaponized payloads are published.

Overview

What this research covers.

This research track assesses how well language models resist indirect prompt injection (IPI) — the class of attack where untrusted content in a model's context window attempts to redirect or override its operational behavior.

Each assessment runs a structured test harness against a target model. Tests are organized by IPI class (attack pattern) and delivery variant (direct, obfuscated, embedded), producing a quantified susceptibility rate against a reproducible taxonomy. Results are cross-referenced to the OWASP Top 10 for LLMs to give enterprise teams a compliance-mapped risk picture.
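
As a rough illustration of how per-class and per-variant susceptibility rates could be aggregated from harness output, here is a minimal Python sketch. The `CaseResult` record, its field names, and the `susceptibility_by` helper are assumptions made for this example, not the harness's actual schema, and the sample records are placeholders rather than data from any assessment. Inference failures are modeled as `None` and left out of the denominator, mirroring the "excluded from analysis" handling described in the findings.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaseResult:
    ipi_class: str            # hypothetical class label, e.g. "IPI-A"
    variant: str              # "direct", "obfuscated", or "embedded"
    injected: Optional[bool]  # True = model followed the injection; None = inference failure


def susceptibility_by(results, key):
    """Return {group: fraction of analyzed cases where the injection succeeded}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in results:
        if r.injected is None:   # inference failures are excluded from the denominator
            continue
        group = getattr(r, key)
        totals[group] += 1
        hits[group] += int(r.injected)
    return {g: hits[g] / totals[g] for g in totals}


# Placeholder records for illustration only; not data from any assessment.
results = [
    CaseResult("IPI-A", "direct", True),
    CaseResult("IPI-A", "obfuscated", False),
    CaseResult("IPI-A", "embedded", None),
    CaseResult("IPI-B", "direct", False),
    CaseResult("IPI-B", "embedded", False),
]

by_class = susceptibility_by(results, "ipi_class")
by_variant = susceptibility_by(results, "variant")
```

Grouping the same records by `variant` instead of `ipi_class` is what would show whether obfuscation or embedding shifts the rate for a given target.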

The goal is to give application developers and security teams architectural data — not exploit code. Knowing which attack classes a model resists vs. which it fails, and why, drives better middleware design, safer deployment patterns, and more informed model selection.

Methodology

Structural disclosure model.

All findings are published under a Structural Disclosure framework. This approach demonstrates technical validity through data flow analysis, architectural root cause identification, and abstract defensive unit-tests — without releasing the adversarial inputs themselves.

Withheld	Disclosed Instead	Operational Impact
Exact adversarial tokens & specific prompt strings	Data flow diagrams — the path of untrusted input through model context windows and tool boundaries	Protects production applications while exposing the underlying logic flaw
Weaponized payloads that trigger tool bypasses or arbitrary execution	OWASP LLM mapping — categorization via standard frameworks (LLM01 through LLM10)	Provides enterprise context for compliance and internal risk auditing
Live exploit scripts automating model exploitation	Validation unit-tests — safe, abstract dummy patterns allowing teams to verify if their middleware intercepts the class	Positions the research as builder-oriented, directly assisting defensive engineering teams
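
To make the "validation unit-tests" column concrete, the following is a minimal sketch of what an abstract test might look like, using Python's standard `unittest` module. Everything specific here is an assumption for illustration: `middleware_filter` is a hypothetical stand-in for a team's own middleware, and `DUMMY_MARKER` is a benign placeholder that represents an instruction-like span without containing any attack content. The published tests may be shaped differently.

```python
import unittest

# Benign placeholder standing in for an instruction-like span; it carries no attack content.
DUMMY_MARKER = "[[DUMMY-INSTRUCTION: placeholder]]"


def middleware_filter(untrusted_text: str) -> dict:
    """Hypothetical stand-in for your own middleware; replace with the real call."""
    flagged = "[[DUMMY-INSTRUCTION" in untrusted_text
    return {"flagged": flagged, "text": "" if flagged else untrusted_text}


class InterceptsEmbeddedInstructionClass(unittest.TestCase):
    def test_marker_in_retrieved_document_is_flagged(self):
        # The dummy pattern is embedded in otherwise ordinary untrusted content.
        doc = f"Quarterly summary. {DUMMY_MARKER} End of document."
        result = middleware_filter(doc)
        self.assertTrue(result["flagged"])
        self.assertNotIn(DUMMY_MARKER, result["text"])

    def test_clean_document_passes_through(self):
        # Content without the pattern should reach the model unchanged.
        doc = "Quarterly summary. End of document."
        result = middleware_filter(doc)
        self.assertFalse(result["flagged"])
        self.assertEqual(result["text"], doc)
```

Saved as a module, the tests can be run with `python -m unittest`. The second test matters as much as the first: a filter that flags everything would pass an interception check while breaking normal traffic.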

Responsible disclosure guardrail: If a finding affects a provider's live, shared production API layer rather than an architectural pattern unique to a specific model tier, coordinated disclosure or an official bug bounty submission precedes publication.

Findings Index

Published assessments.

IPI Assessment 2026-06-15 Critical findings

GPT-5 Nano — Prompt Injection Susceptibility Assessment

IPI Taxonomy v0.13 assessment across 210 test cases (21 classes × 10 variants; 9 inference failures excluded; 201 analyzed). Structural analysis of recursive instruction framing, MCP tool description poisoning, role hijacking, and steganographic injection. Full OWASP LLM mapping with defensive mitigations and validation unit-tests.

38.3% Overall susceptibility

210 Test cases (201 analyzed)

5 Classes >50% rate

IPI-010 Critical — 100% rate

Custom Testing

LLM adversarial testing available on request.

If you are building on top of a language model and need a structured adversarial assessment before shipping, or want to validate that your defensive middleware holds up against the IPI taxonomy, custom engagements are available. Testing is conducted against your target model or deployment configuration, not a generic benchmark environment.

What gets tested

Full IPI taxonomy coverage

Each engagement runs the full IPI Taxonomy test suite — 21 attack classes across 3 delivery variants (direct, obfuscated, embedded). Coverage spans prompt injection, steganographic payloads, tool-description poisoning, multi-agent propagation, unauthorized action induction, RAG corpus attacks, and role-boundary bypass patterns.

→ 21 IPI classes × 3 delivery variants
→ Rule-based verdict pass with LLM-judge escalation for ambiguous cases
→ Tested against your target — API endpoint, local deployment, or agent pipeline
→ Delivery variant breakdown identifies whether obfuscation or embedding changes the risk profile
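
The two-stage verdict flow in the list above (a rule-based pass, with escalation to an LLM judge only for ambiguous cases) might be sketched as follows. This is an illustrative assumption, not the harness's actual logic: the `Verdict` enum, the canary-string check, the refusal markers, and `stub_judge` are all hypothetical names chosen for the example, and a real judge would call a model instead of returning a fixed value.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    RESISTED = "resisted"
    INJECTED = "injected"
    AMBIGUOUS = "ambiguous"


def rule_based_verdict(output: str, canary: str, refusal_markers: tuple) -> Verdict:
    """Cheap first pass: decide only when the evidence points one way."""
    text = output.lower()
    has_canary = canary.lower() in text
    has_refusal = any(marker in text for marker in refusal_markers)
    if has_canary and not has_refusal:
        return Verdict.INJECTED
    if has_refusal and not has_canary:
        return Verdict.RESISTED
    return Verdict.AMBIGUOUS  # both signals present, or neither


def final_verdict(output: str, canary: str, judge: Callable[[str], Verdict],
                  refusal_markers: tuple = ("cannot comply",)) -> Verdict:
    """Run the rule pass and escalate to the judge only when it is ambiguous."""
    verdict = rule_based_verdict(output, canary, refusal_markers)
    if verdict is Verdict.AMBIGUOUS:
        return judge(output)
    return verdict


def stub_judge(output: str) -> Verdict:
    """Placeholder for an LLM-judge call; a real one would query a model."""
    return Verdict.RESISTED
```

Keeping the judge behind a plain callable means the rule pass can be tested deterministically, and the more expensive, less reproducible judge call is confined to the cases the rules cannot settle.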

What you receive

Structured findings report

The deliverable is a full structural disclosure report — the same format as the published assessments on this page. It quantifies susceptibility per class and severity tier, traces architectural root causes, maps findings to OWASP LLM Top 10, and provides immediately actionable mitigations for each failed class.

→ Susceptibility rate per class with severity breakdown (Critical / High / Medium)
→ Architectural root cause analysis for each failed class
→ OWASP LLM Top 10 cross-reference for compliance and audit readiness
→ Immediate and architectural mitigation recommendations per finding
→ Abstract validation unit-tests for your defensive middleware

Interested in an assessment? Reach out with your target model, deployment context, and any specific threat classes you want prioritized.

leo@lateos.ai