Managing cloud AI agents

AI · Side Project · Website

Scope: Research, UX/UI Design, Vibe Coding
Role: Designer
Timeline: 2026
Read: 4 min

Intro

AI agents can generate code, but they don’t share the context developers rely on. Most of this context stays implicit and is not structured or accessible to the system.

As a result, agents produce output without understanding the goal, developers spend time correcting it, and trust breaks down. AI speeds up code generation, but it does not create shared understanding.

From prompts to context

To produce meaningful results, agents need access to the same context developers use. This includes the origin of the task, product constraints, expectations of what “good” looks like, and decisions made during development.

Today, this context is scattered across tools like documentation, chats, and tickets, and rarely forms a complete picture. To compensate for this, developers often have to manually reconstruct context in prompts, which takes time and still leaves room for missing details.

Agents need direct access to this ecosystem, not through prompts, but through structured context. Approaches like MCP enable this by connecting agents to the same tools developers use.
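As a rough illustration of the shift from prompts to structured context, here is a minimal TypeScript sketch. It does not use the real MCP SDK; every type and name below (TaskContext, startRun, the sample task) is a hypothetical stand-in for the idea of handing an agent the same structured context a developer would have.

```typescript
// Hypothetical sketch: structured context assembled from the tools
// developers already use, instead of a hand-written prompt.
// All types and names here are illustrative, not the real MCP SDK.

interface TaskContext {
  task: { id: string; title: string; origin: "slack" | "ticket" };
  constraints: string[];        // product constraints, e.g. "no breaking API changes"
  definitionOfDone: string[];   // what "good" looks like for this task
  decisions: string[];          // decisions already made during development
  repoState: { branch: string; headCommit: string };
}

// An agent run starts from structured context, not a prompt string.
async function startRun(agent: { execute(ctx: TaskContext): Promise<void> }) {
  const ctx: TaskContext = {
    task: { id: "TASK-142", title: "Add export to CSV", origin: "slack" },
    constraints: ["reuse the existing table component"],
    definitionOfDone: ["unit tests pass", "export matches visible columns"],
    decisions: ["CSV only for v1, no XLSX"],
    repoState: { branch: "main", headCommit: "abc123" },
  };
  await agent.execute(ctx);
}
```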

If a system works for engineers at scale, it should work for agents as well.

How it works

Core challenge

Designing how development teams manage fully autonomous AI agents, not just run them.

Research & Insights

Stripe's Minions

I started by exploring Stripe’s approach to cloud-based agents, known as Minions.

These agents can be launched from Slack or other internal tools, operate with access to context, and execute tasks in the background.

However, the interface reveals a limitation. It works well for individual runs, but breaks down when multiple agents are running in parallel.

At that point, it becomes difficult to understand what each agent is doing, how tasks are progressing, and how their outputs relate to each other.

This exposes a gap between execution and management. While agents can perform tasks, developers still lack a clear way to oversee them as a system.

Stripe's Devbox Interface

Agents are not roles

At first, I explored structuring agents as roles, similar to a real-world team: Frontend Engineer, Backend Engineer, CTO. Each agent would be responsible for a specific domain and run tasks in parallel. This mirrors Paperclip's approach.

However, after discussing this approach with developers, it became clear that this model does not reflect how real work happens. Developers don’t think in rigid role boundaries. A single task often spans multiple domains, from frontend to backend and beyond.

Treating agents as fixed roles introduces unnecessary constraints and limits their usefulness. Instead, the system should focus on tasks and context, letting agents adapt to the problem rather than being boxed into predefined categories.

First iterations

Final solution

Separating execution from observability

It also became clear that combining execution and observability layers in a single interface does not work. When everything is treated as equally important, nothing stands out.

The goal shifted towards clarity — helping developers immediately understand what requires their attention. At any moment, they need to know what agents are doing, which tasks are ready for review, and where intervention is required.

To support this, the primary unit of the interface became the run. Each run provides a clear snapshot of the task, including its current state, related code changes, execution context, and the next possible action.
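To make the run concrete, here is an illustrative TypeScript sketch of what such a snapshot could contain. The states and field names are my assumptions for illustration, not the product's actual data model.

```typescript
// Illustrative shape of a run snapshot; all names are assumptions.
type RunState =
  | "queued"
  | "running"
  | "awaiting_approval"
  | "ready_for_review"
  | "done"
  | "failed";

interface RunSnapshot {
  id: string;
  task: string;                 // what the agent was asked to do
  state: RunState;              // current position in the lifecycle
  changedFiles: string[];       // related code changes
  context: string[];            // sources the agent is working from
  nextAction?: "approve" | "review" | "create_pr"; // what a developer can do next
}
```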

Task status transitions

In some cases, an agent requires human approval before continuing its work. This can happen for various reasons, such as performing sensitive actions or making impactful changes.

Once approval is given, the agent resumes execution. At this point, it is important for developers to clearly see the transition from one state to another.

To support this, status changes are designed as smooth transitions, making it easier to follow what happened and understand the agent’s current state.
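The approval gate can be read as a small state machine. Building on the RunSnapshot sketch above, here is one way the allowed transitions could be encoded; the states and rules are assumptions about the design, not a spec.

```typescript
// Sketch of the status transitions described above, including the
// human-approval gate. State names match the RunSnapshot sketch.
const transitions: Record<RunState, RunState[]> = {
  queued: ["running"],
  running: ["awaiting_approval", "ready_for_review", "failed"],
  awaiting_approval: ["running", "failed"], // resumes only after approval
  ready_for_review: ["done"],
  done: [],
  failed: [],
};

function transition(run: RunSnapshot, next: RunState): RunSnapshot {
  if (!transitions[run.state].includes(next)) {
    throw new Error(`Invalid transition: ${run.state} -> ${next}`);
  }
  return { ...run, state: next };
}
```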

What is an "agent run"?

An agent run is a single execution of an autonomous agent that completes a task from start to finish.


Inside a run

Each run represents a single execution of an agent.

Instead of a chat interface, it operates with predefined context from Slack, so the task and expected outcome are already known.

The main view presents activity as a structured log, showing actions and resulting code changes.

Developers can still guide the process by adding instructions in the sidebar, while also accessing the original prompt, related sources, and repository state.

Once the run is complete, the system highlights what requires human review and provides a summary, allowing developers to move directly to creating a pull request.
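One plausible way to model that structured log, again with illustrative names only:

```typescript
// Illustrative log model: each entry is an agent action, a resulting
// code change, or a developer instruction, so the view can interleave
// them in a single timeline.
type LogEntry =
  | { kind: "action"; at: string; description: string }             // e.g. "ran tests"
  | { kind: "code_change"; at: string; file: string; diff: string }
  | { kind: "instruction"; at: string; author: string; text: string }; // added via sidebar

interface RunLog {
  runId: string;
  entries: LogEntry[];
  summary?: string;        // filled in when the run completes
  needsReview: string[];   // files or decisions highlighted for humans
}
```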

Create a pull request from a finished task

Once a run is complete, the developer reviews the changes produced by the agent.

If everything looks correct, they can create a pull request directly from the interface without leaving the workflow.

This keeps the transition from execution to review and shipping fast and uninterrupted.
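As a sketch of what this step could look like under the hood, here is a minimal example using Octokit, GitHub's REST client. The owner, repo, and branch naming scheme are placeholders, and it reuses the RunSnapshot sketch from above; error handling is omitted.

```typescript
import { Octokit } from "@octokit/rest";

// Minimal sketch of the "create pull request" step via GitHub's REST API.
// Owner, repo, and branch names are placeholders.
async function createPullRequestFromRun(run: RunSnapshot) {
  const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
  const { data: pr } = await octokit.rest.pulls.create({
    owner: "acme",
    repo: "web-app",
    title: run.task,
    head: `agent/${run.id}`,   // branch the agent pushed its changes to
    base: "main",
    body: `Automated changes from agent run ${run.id}`,
  });
  return pr.html_url;
}
```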
