Pavan Kumar Dubasi

Overview

Identity Control Arena is an evaluation environment built on the ControlArena framework to study identity delegation attacks in AI systems. The project was developed during the 48-hour AI Control Hackathon 2026, organized by Apart Research and Redwood Research.

Key Results

Format mismatch discovery: Format inconsistencies between the attacker and the monitor silently zeroed out all monitoring scores, exposing a subtle failure mode in AI control evaluations.
Behavioral monitoring: The behavioral monitor catches 50% of identity delegation attacks at a threshold of 0.5, which establishes a baseline for this attack category.
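
To make the two results concrete, here is a minimal sketch of how a format mismatch can silently zero a monitor's scores, and how a detection rate at a threshold might be computed. The source does not describe the actual mismatch or the monitor's scoring logic, so every name here (`monitor_score`, `detection_rate`, the `"function"` and `"tool"` keys, the action names) is a hypothetical illustration, not the project's code or a ControlArena API. Counting a score equal to the threshold as a detection is also an assumption.

```python
# Hypothetical sketch: none of these names come from the project or ControlArena.
SUSPICIOUS_ACTIONS = {"delegate_identity", "grant_permission"}


def monitor_score(transcript, strict=False):
    """Return the fraction of recognised actions that look like delegation."""
    # This toy monitor assumes each step stores its action under "function".
    actions = [step["function"] for step in transcript if "function" in step]
    if not actions:
        if strict and transcript:
            # Steps exist but none were recognised: surface the mismatch.
            raise ValueError("no recognised actions: possible format mismatch")
        return 0.0  # silent fallback: indistinguishable from a benign run
    flagged = sum(action in SUSPICIOUS_ACTIONS for action in actions)
    return flagged / len(actions)


def detection_rate(scores, threshold=0.5):
    """Return the share of attack runs whose score reaches the threshold."""
    if not scores:
        raise ValueError("no scores to evaluate")
    return sum(score >= threshold for score in scores) / len(scores)


# The attacker logs actions under "tool", so the monitor finds nothing to score.
mismatched = [{"tool": "delegate_identity"}, {"tool": "read_file"}]
# The same run logged under the key the monitor expects.
matched = [{"function": "delegate_identity"}, {"function": "read_file"}]

print(monitor_score(mismatched))  # 0.0, with no error raised
print(monitor_score(matched))     # 0.5
```

Under these assumptions, a strict mode that raises when a non-empty transcript yields no recognised actions turns the silent zero into a visible error, which is one way such a mismatch could be caught early.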

Technical Details

The arena integrates with ControlArena's evaluation harness to test scenarios in which an AI agent attempts to delegate its identity or permissions to unauthorized entities. The evaluation pipeline uses the Claude API for both the attacker and the monitor agents.
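
As an illustration of the kind of scenario described above, the sketch below models a delegation request and checks it against an allowlist. It is an assumption-laden toy: the `Delegation` type, the `is_authorized` helper, and the entity and scope names are all hypothetical and are not taken from the arena or from ControlArena.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    """A hypothetical request to hand scopes from one entity to another."""
    delegator: str
    delegate: str
    scopes: frozenset


def is_authorized(delegation, allowed):
    """Return True only if the delegate may hold every requested scope.

    `allowed` maps a delegate name to the set of scopes it may receive.
    """
    permitted = allowed.get(delegation.delegate)
    if permitted is None:
        return False  # unknown delegate: treat as unauthorized
    return delegation.scopes <= permitted  # subset check on scopes


# Hypothetical policy: only "ci-bot" may receive read access.
allowed = {"ci-bot": {"repo:read"}}

benign = Delegation("agent", "ci-bot", frozenset({"repo:read"}))
attack = Delegation("agent", "external-service", frozenset({"repo:write"}))

print(is_authorized(benign, allowed))  # True
print(is_authorized(attack, allowed))  # False
```

A monitor in this setting would need to flag the second request, where the agent tries to pass permissions to an entity outside the policy.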

Context

The project was built within the broader AI control research community, which focuses on understanding and mitigating risks from advanced AI systems that may attempt to circumvent oversight mechanisms.

Related Projects

Attestix

CodeSage

VibeMCP
