Testing Categories
Testing categories in Highflame RedTeam provide the organizational backbone for AI security testing. They define what is being tested, why it matters, and how attacks should be generated and evaluated. Rather than treating vulnerabilities as isolated test cases, RedTeam uses categories to group related risks into coherent security domains, enabling systematic, repeatable, and comprehensive assessments.
Unlike traditional red teaming tools that rely on static plugins, Highflame RedTeam uses a structured taxonomy that supports two complementary approaches: vulnerability-driven categories with predefined coverage, and engine-driven categories that dynamically generate attacks. This hybrid model allows teams to test both known failure modes and emergent behaviors using the same framework.
By organizing testing around categories, Highflame RedTeam enables a structured yet flexible approach to AI security. Teams can run targeted assessments, align testing with compliance requirements, and track risk trends over time—all while benefiting from dynamic attack generation driven by engines.
What Categories Represent
A category represents a security domain or risk area against which an AI application should be tested. Each category encompasses multiple vulnerabilities, and each vulnerability may map to several concrete attack types. This hierarchy allows RedTeam to reason about risk at different levels of abstraction—from individual failures to system-wide exposure.
Categories serve several purposes simultaneously. They guide attack generation, influence engine selection, structure reporting, and enable compliance mapping. For users, they provide an intuitive way to focus testing on specific concerns or to run broad assessments across all domains.
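The category → vulnerability → attack-type hierarchy described above can be sketched as a small data model. This is a minimal illustration only: the class names, fields, and example entries are assumptions for explanatory purposes, not the actual RedTeam API.

```python
from dataclasses import dataclass, field

# Illustrative model of the hierarchy: a category contains
# vulnerabilities, and each vulnerability maps to concrete attack types.

@dataclass
class Vulnerability:
    name: str
    attack_types: list[str] = field(default_factory=list)

@dataclass
class Category:
    name: str
    vulnerabilities: list[Vulnerability] = field(default_factory=list)

    def all_attack_types(self) -> list[str]:
        """Flatten the hierarchy down to the concrete attack types it covers."""
        return [a for v in self.vulnerabilities for a in v.attack_types]

# Example drawn from the Data Privacy category described below.
data_privacy = Category(
    name="Data Privacy",
    vulnerabilities=[
        Vulnerability("PII Leakage", ["direct disclosure", "session leakage"]),
        Vulnerability("Prompt Leakage", ["system prompt extraction"]),
    ],
)
```

Reasoning at different levels of abstraction then amounts to querying different depths of this structure: per-attack, per-vulnerability, or per-category.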
Category Types
Highflame RedTeam supports two primary category models:
Vulnerability-based categories define explicit sets of vulnerabilities and attack types. These categories are well suited for domains where risks are well understood and coverage can be enumerated in advance.
Engine-based categories focus on how attacks are generated rather than listing every possible failure. These categories rely on dynamic engine selection to explore the risk space more adaptively, making them particularly effective for areas such as prompt injection and jailbreaks, where attack patterns evolve rapidly.
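The practical difference between the two models can be illustrated with a configuration sketch. The keys, category identifiers, and engine names below are assumptions chosen to mirror the prose, not Highflame RedTeam's real schema.

```python
# Hypothetical configuration contrasting the two category models.

CATEGORIES = {
    # Vulnerability-based: coverage is enumerated in advance.
    "data_privacy": {
        "model": "vulnerability-based",
        "vulnerabilities": ["pii_leakage", "prompt_leakage", "memorization"],
    },
    # Engine-based: attacks are generated dynamically by selected engines.
    "prompt_injection": {
        "model": "engine-based",
        "engines": ["instruction_manipulation", "jailbreak"],
    },
}

def coverage_known_in_advance(category: str) -> bool:
    """Only vulnerability-based categories can enumerate coverage up front."""
    return CATEGORIES[category]["model"] == "vulnerability-based"
```

The distinction matters operationally: vulnerability-based categories support fixed checklists, while engine-based categories trade enumerability for adaptiveness.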
Core Security Categories
Data Privacy
The Data Privacy category focuses on preventing unintended disclosure of sensitive information. It evaluates whether an AI system leaks personally identifiable information, confidential data, or internal system details through its outputs.
Testing in this category covers risks such as direct disclosure, memorization, session leakage, social manipulation, and unauthorized access to stored data. It also includes prompt leakage scenarios where system instructions, credentials, or internal logic are exposed. These tests are critical for applications operating in regulated environments or handling user data.
Responsible AI
The Responsible AI category evaluates whether an AI system behaves ethically, fairly, and in alignment with expected social norms. It focuses on bias, toxicity, and failures in moral reasoning that may not constitute traditional security vulnerabilities but still pose serious business and reputational risks.
This category tests for discriminatory behavior across protected characteristics, the generation of harmful or offensive content, and failures in ethical decision-making. It is especially relevant for customer-facing applications and systems that support or moderate decisions.
Agent Security
The Agent Security category addresses technical vulnerabilities that arise when AI systems interact with tools, data, or infrastructure. It includes unauthorized access attempts, improper access control, and classic exploitation patterns such as injection attacks and server-side request forgery.
This category also covers supply chain risks, including compromised models, dependencies, or untrusted data sources, as well as data and model poisoning scenarios where training data is manipulated to introduce backdoors or biased behavior. Improper output handling—where raw model output is passed downstream without validation—is evaluated here as well.
Brand Image
The Brand Image category focuses on risks that can damage organizational reputation or competitive positioning. It evaluates whether an AI system produces misinformation, misrepresents expertise, discloses intellectual property, or performs actions beyond its intended scope.
This category also includes robustness testing, such as susceptibility to jailbreaks or context manipulation, and competitive risks, including scenarios in which generated content unfairly promotes or disparages competitors or reveals confidential strategy.
Illegal Risks
The Illegal Risks category ensures that AI systems do not generate content that facilitates illegal, violent, or otherwise harmful activities. It covers a broad range of scenarios, including criminal instruction, extremist content, explicit or graphic material, and behaviors that could endanger personal or public safety.
These tests are especially important for applications deployed at scale or exposed to untrusted user input, where misuse can carry legal and ethical consequences.
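The five core categories above can be used for either targeted or broad assessments. The sketch below shows one plausible way a run plan might select among them; the function and its behavior are illustrative assumptions, not the documented interface.

```python
# Core security categories as described in this section.
CORE_CATEGORIES = [
    "Data Privacy",
    "Responsible AI",
    "Agent Security",
    "Brand Image",
    "Illegal Risks",
]

def plan_run(targets=None):
    """Return the categories a run covers: the given targets, else all five.

    Hypothetical helper: validates requested categories against the
    core taxonomy before a targeted assessment.
    """
    if targets:
        unknown = [t for t in targets if t not in CORE_CATEGORIES]
        if unknown:
            raise ValueError(f"unknown categories: {unknown}")
        return list(targets)
    return list(CORE_CATEGORIES)
```

For example, a compliance-focused run might pass only `["Data Privacy", "Illegal Risks"]`, while omitting targets yields a full assessment.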
OWASP LLM Top 10 Categories
Highflame RedTeam includes dedicated categories aligned with the OWASP LLM Top 10 (2025). These categories are engine-driven and designed to dynamically explore high-risk failure modes identified by the broader security community.
Each OWASP category maps directly to one or more RedTeam vulnerabilities and leverages specialized engines to generate realistic attacks. For example, prompt injection categories emphasize instruction manipulation engines, whereas data poisoning categories focus on weaknesses in embedding and training data.
This alignment enables teams to demonstrate coverage against a recognized industry framework while leveraging RedTeam’s adaptive attack generation.
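A mapping from OWASP entries to attack engines might look like the sketch below. The OWASP entry titles are from the published 2025 list, but the engine names and the mapping structure itself are illustrative assumptions.

```python
# Hypothetical mapping from OWASP LLM Top 10 (2025) entries to the
# engines that would generate attacks for them.
OWASP_LLM_MAP = {
    "LLM01: Prompt Injection": ["instruction_manipulation"],
    "LLM02: Sensitive Information Disclosure": ["data_extraction"],
    "LLM04: Data and Model Poisoning": ["embedding_probe", "training_data_probe"],
}

def engines_for(owasp_entry: str) -> list[str]:
    """Return the engines hinted for an OWASP entry, or none if unmapped."""
    return OWASP_LLM_MAP.get(owasp_entry, [])
```

A mapping like this is what makes it possible to report coverage in OWASP terms while the underlying attacks remain engine-generated.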
Category Taxonomy and Engine Integration
Categories in RedTeam follow a hierarchical taxonomy. Each category contains multiple vulnerabilities, and each vulnerability may be expressed through different attack techniques. Categories also influence how attacks are generated by providing engine hints.
Engine hints guide RedTeam toward appropriate attack strategies without requiring manual configuration. For example, security-focused categories prioritize prompt injection and adversarial engines, while Responsible AI categories emphasize bias and reasoning engines. This engine selection logic is preconfigured to provide strong default coverage while remaining extensible.
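Engine-hint selection can be sketched as a lookup with optional overrides, matching the "preconfigured but extensible" behavior described above. All names here are assumptions; the real selection logic is more involved.

```python
# Hypothetical default engine hints per category, mirroring the examples
# in the text: security-focused categories favor injection/adversarial
# engines, Responsible AI favors bias and reasoning engines.
DEFAULT_ENGINE_HINTS = {
    "agent_security": ["prompt_injection", "adversarial"],
    "responsible_ai": ["bias", "reasoning"],
}

def select_engines(category, overrides=None):
    """Return the hinted engines for a category, allowing user extension."""
    hints = dict(DEFAULT_ENGINE_HINTS)
    if overrides:
        hints.update(overrides)  # extensibility point: add or replace hints
    return hints.get(category, [])
```

With no overrides, defaults apply; passing `overrides={"my_category": ["custom_engine"]}` would extend the logic without touching the defaults.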
Reporting and Analytics
Categories play a central role in how RedTeam results are analyzed and communicated. Findings are aggregated at the category level, allowing teams to quickly understand which risk areas are most exposed.
Reports include category summaries, vulnerability breakdowns, time-series trend analysis, and mappings to compliance and risk frameworks. This structure makes it easier to prioritize remediation efforts and communicate risk to both technical teams and leadership.
Categories also support risk-based prioritization, allowing organizations to focus first on high-impact domains such as Agent Security and Data Privacy, while still monitoring broader concerns like Responsible AI and Brand Image.
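Category-level aggregation, the basis of the reports described above, reduces to counting findings per category. The finding shape below is an assumed example, not RedTeam's actual result format.

```python
from collections import Counter

# Hypothetical raw findings as a run might produce them.
findings = [
    {"category": "Data Privacy", "severity": "high"},
    {"category": "Data Privacy", "severity": "low"},
    {"category": "Brand Image", "severity": "medium"},
]

def exposure_by_category(findings):
    """Count findings per category to surface the most exposed risk areas."""
    return dict(Counter(f["category"] for f in findings))
```

Sorting the resulting counts (optionally weighted by severity) yields the risk-based prioritization described above.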