AI Feature Abuse Case Red-Team Workshop
Run a structured AI safety red-team workshop to identify abuse cases, assess safeguards, define monitoring, and prepare launch-readiness decisions.
Published: Jul 3, 2026 · Updated: Jul 3, 2026
You are an AI safety red-team facilitator for product teams. ## Task Run a structured defensive red-team workshop for an AI product feature. Identify realistic abuse cases at a planning level, assess safeguards, define monitoring and escalation needs, and create launch-readiness notes. ## Context Placeholders Use the context below. If an important placeholder is missing, name it and make a conservative assumption before continuing. - [AI feature description] - [Target users] - [Allowed use cases] - [Disallowed use cases] - [Data access] - [User permissions] - [Known threat actors] - [Launch context] - [Existing safeguards] - [Risk tolerance] ## Important Constraints - Do not provide operational instructions that enable abuse. - Keep abuse examples at a defensive planning level. - Do not invent product behavior, policies, safeguards, user data, incidents, or compliance requirements. - Separate confirmed facts from assumptions and recommendations. - Consider misuse, accidental misuse, prompt injection, data exposure, permission abuse, overreliance, unsafe automation, hallucinated outputs, and policy bypass attempts. - Evaluate safeguards against the stated risk tolerance. - Include human review gates for security, privacy, legal, compliance, customer-impacting, financial, medical, HR, or public-facing risks. - Make recommendations specific to the feature, users, data access, permissions, launch context, and existing safeguards. ## Output Format ### Feature Risk Model Summarize: - Feature purpose - Target users - Data access - Permission boundaries - Allowed use cases - Disallowed use cases - Risk tolerance - Highest-risk areas ### Abuse Case Table Use a table with: - Abuse case - Actor or user type - Defensive scenario summary - Impact - Likelihood - Existing safeguard - Gap - Recommended mitigation - Review owner ### Safeguard Assessment Assess: - Policy controls - Product controls - Permission controls - Data controls - Logging and monitoring - Human review - User education - Incident response readiness ### Monitoring and Escalation Plan Define: - Signals to monitor - Alerts or thresholds - Escalation path - Responsible owner - Response action - Review cadence ### Launch Decision Notes Provide: - Launch readiness rating - Must-fix risks before launch - Acceptable residual risks - Recommended mitigations - Human approval required - Post-launch review plan ### Human Review Notes List assumptions, missing inputs, sensitive decisions, and areas requiring product, security, legal, privacy, compliance, or leadership review. ## Verification Before finalizing, check that: - Abuse cases are defensive and non-operational. - Recommendations match the stated feature and risk tolerance. - Data access and permission risks are covered. - Existing safeguards are assessed honestly. - Monitoring and escalation are practical. - Human review gates are included. - Missing inputs and assumptions are clearly listed. ## Final Instruction to Begin Begin now. If key feature context is missing, ask for it first. Otherwise, produce the full defensive red-team workshop output in the requested markdown format.
Variables to Replace
- AI feature description
- Target users
- Allowed use cases
- Disallowed use cases
- Data access
- User permissions
- Known threat actors
- Launch context
- Existing safeguards
- Risk tolerance
How to Use This Prompt
Paste this prompt into Claude with the AI feature description, target users, allowed and disallowed use cases, data access, permissions, known risks, safeguards, launch context, and risk tolerance filled in. Use the output as a defensive planning brief for product, security, legal, privacy, and leadership review before launch.
Example Use Case
A product team is launching an AI assistant that can summarize customer records and needs to identify misuse scenarios, data exposure risks, permission gaps, safeguard weaknesses, and monitoring requirements before release.