Add unified CUA template with multi-provider fallback #143

Open
masnwilliams wants to merge 10 commits into main from hypeship/unified-cua-template

Conversation

Contributor

@masnwilliams masnwilliams commented Apr 8, 2026

Summary

  • Adds a new cua template (TypeScript + Python) that consolidates the separate anthropic-computer-use, openai-computer-use, and gemini-computer-use templates into a single multi-provider template
  • Provider selection via CUA_PROVIDER env var, with automatic fallback via CUA_FALLBACK_PROVIDERS
  • Each provider adapter is self-contained with full agent loop implementation
  • Shared browser session lifecycle with replay recording support
  • Registered as a new template in templates.go for both TypeScript and Python
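A minimal sketch of how the provider chain and fallback described above could be resolved. It assumes `CUA_FALLBACK_PROVIDERS` is a comma-separated list and that unknown names are skipped; `resolveProviderChain` and `runWithFallback` are hypothetical names for illustration, not necessarily what the template uses.

```typescript
type ProviderName = "anthropic" | "openai" | "gemini";
type Env = Record<string, string | undefined>;

const KNOWN: readonly ProviderName[] = ["anthropic", "openai", "gemini"];

function isProviderName(value: string): value is ProviderName {
  return (KNOWN as readonly string[]).indexOf(value) !== -1;
}

// Build the ordered chain: primary first, then fallbacks.
// Duplicates and unknown names are dropped (an assumption of this sketch).
function resolveProviderChain(env: Env): ProviderName[] {
  const raw = [env.CUA_PROVIDER ?? ""].concat(
    (env.CUA_FALLBACK_PROVIDERS ?? "").split(","),
  );
  const chain: ProviderName[] = [];
  for (const entry of raw) {
    const name = entry.trim().toLowerCase();
    if (isProviderName(name) && chain.indexOf(name) === -1) {
      chain.push(name);
    }
  }
  return chain;
}

// Try each provider in order and return the first successful result.
async function runWithFallback<T>(
  chain: ProviderName[],
  run: (provider: ProviderName) => Promise<T>,
): Promise<T> {
  const errors: string[] = [];
  for (const provider of chain) {
    try {
      return await run(provider);
    } catch (err) {
      errors.push(`${provider}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed (${errors.join("; ") || "none configured"})`);
}
```

Passing the environment in as an argument, rather than reading `process.env` inside the functions, keeps the resolution logic easy to exercise with an invalid primary key and a valid fallback.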

Structure

pkg/templates/{typescript,python}/cua/
  index.ts / main.py          — Kernel app entrypoint
  session.ts / session.py     — Browser session lifecycle
  providers/
    index.ts / __init__.py    — Provider factory + fallback
    anthropic.ts / .py        — Anthropic Claude adapter
    openai.ts / .py           — OpenAI GPT adapter
    gemini.ts / .py           — Google Gemini adapter

Test plan

  • go build ./... passes
  • go test ./pkg/create/... passes
  • kernel create shows "Unified CUA" template for both TS and Python
  • Deploy TS template with Anthropic provider and run a task
  • Deploy TS template with OpenAI provider and run a task
  • Deploy TS template with Gemini provider and run a task
  • Test fallback by setting an invalid primary key with valid fallback
  • Repeat above for Python template

🤖 Generated with Claude Code


Note

Medium Risk
Adds substantial new TypeScript/Python template code that orchestrates real browser automation via multiple LLM providers and environment-driven fallback, which can affect reliability and cost if misconfigured. Changes are largely additive and isolated to template scaffolding and template registration.

Overview
Adds a new Unified CUA template (cua) for both TypeScript and Python, including provider adapters for Anthropic/OpenAI/Gemini, a provider-resolution + fallback mechanism via CUA_PROVIDER/CUA_FALLBACK_PROVIDERS, and a shared browser session manager with optional replay recording.

Registers the new template in pkg/create/templates.go (including display ordering and sample deploy/invoke commands) and ships template project files (README, .env.example, dependency manifests, and gitignore) for both languages.

Reviewed by Cursor Bugbot for commit 143a6eb. Bugbot is set up for automated code reviews on this repo.

@kernel-internal
Contributor

🔧 CI Fix Available
I've pushed a fix for the CI failure by adding the missing _gitignore files for the new CUA templates.

@masnwilliams masnwilliams marked this pull request as ready for review April 8, 2026 20:09
Consolidates the separate anthropic-computer-use, openai-computer-use,
and gemini-computer-use templates into a single "cua" template that
supports all three providers with automatic fallback.

- TypeScript and Python templates with identical structure
- Provider selection via CUA_PROVIDER env var
- Optional fallback chain via CUA_FALLBACK_PROVIDERS
- Shared browser session lifecycle with replay support
- Each provider adapter is self-contained and customizable
- Registered as "cua" template in templates.go

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@masnwilliams masnwilliams force-pushed the hypeship/unified-cua-template branch from 99891de to 73255f9 Compare April 8, 2026 20:16
Provider resolution at module load crashes during Hypeman's build/discovery
phase when env vars aren't available. Use lazy initialization so providers
are resolved on first invocation instead.

Also fix TS type errors: narrow candidate.content in Gemini provider, cast
input items in OpenAI provider, simplify computer_call_output construction.

Made-with: Cursor
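The lazy-initialization fix in the commit above can be sketched roughly as follows. `lazy` and `resolveProvider` are hypothetical helpers for illustration; the point is only that nothing reads the environment until the first invocation.

```typescript
type Provider = { name: string };

// Hypothetical resolver: throws when the env var is missing, which is the
// kind of failure that would crash a build/discovery phase at module load.
function resolveProvider(env: Record<string, string | undefined>): Provider {
  const name = env.CUA_PROVIDER;
  if (!name) throw new Error("CUA_PROVIDER is not set");
  return { name };
}

// Wrap an initializer so it runs on first call and is cached afterwards.
// If the initializer throws, nothing is cached and the next call retries.
function lazy<T>(init: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (!cached) cached = { value: init() };
    return cached.value;
  };
}

// Module scope: safe even when env vars are absent, because init is deferred.
const getProvider = lazy(() => resolveProvider({ CUA_PROVIDER: "anthropic" }));
```

In a real entrypoint the initializer would read the process environment instead of the literal object used here, and the action handler would call `getProvider()` at the start of each invocation.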
…odel inputs

- Bump all TS and Python deps to latest versions
- Fix Anthropic computer use: use computer_20251124 with computer-use-2025-11-24
  beta flag (claude-sonnet-4-6 requires the newer tool version)
- Fix OpenAI: add missing screenshot action handler
- Fix Python: correct SDK API (kernel.App), fix session.delete call, add
  missing openai dependency
- Restore provider and model as per-request payload overrides (were dropped
  in rewrite). Provider uses a typed enum (anthropic | openai | gemini).

Made-with: Cursor
… API, session delete

- Add missing screenshot action handler in Python OpenAI provider
- Use Part.from_function_response() instead of FunctionResponsePart() in
  Python Gemini provider (pydantic extra_forbidden in google-genai >=1.71)
- Fix session cleanup: use delete_by_id() instead of delete()

Made-with: Cursor
@masnwilliams masnwilliams requested a review from dprevoznik April 8, 2026 22:14
…n Gemini

- TS Anthropic: move SYSTEM_PROMPT to getSystemPrompt() function so the
  date is computed per-request instead of freezing at module load
- Python Gemini: include screenshot data as inline_data Part alongside
  function responses so the model can see action results
- Remove unused PREDEFINED_ACTIONS list from Python Gemini

Made-with: Cursor
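For illustration, the difference between a module-level `SYSTEM_PROMPT` constant and a per-request `getSystemPrompt()` might look like the sketch below. The prompt wording and the optional `now` parameter are assumptions made for this example, not the template's actual text or signature.

```typescript
// Computed on every call, so a long-lived process never serves a stale date.
// A module-level constant would instead capture the date once, at import time.
function getSystemPrompt(now: Date = new Date()): string {
  const today = now.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `You are operating a browser. The current date is ${today}.`;
}
```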
…eout)

Add optional `browser` field to CUA payload for per-request browser
session configuration. Supports proxy_id, profile (id/name/save_changes),
extensions, and timeout_seconds. Viewport and stealth remain deploy-time
defaults since CUA providers depend on consistent viewport dimensions.

Made-with: Cursor
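A rough sketch of how per-request `browser` overrides could be merged with deploy-time defaults while keeping viewport and stealth fixed. The field names come from the commit message above; the shape of `extensions` and the `buildBrowserParams` helper are assumptions for illustration.

```typescript
interface BrowserOverrides {
  proxy_id?: string;
  profile?: { id?: string; name?: string; save_changes?: boolean };
  extensions?: { id?: string; name?: string }[]; // shape assumed for this sketch
  timeout_seconds?: number;
}

interface DeployDefaults {
  stealth: boolean;
  viewport: { width: number; height: number };
}

// Copy only the allowed per-request fields; anything else in the payload
// (including viewport or stealth) is ignored so the defaults always win.
function buildBrowserParams(
  defaults: DeployDefaults,
  overrides: BrowserOverrides = {},
): DeployDefaults & BrowserOverrides {
  const params: DeployDefaults & BrowserOverrides = { ...defaults };
  if (overrides.proxy_id !== undefined) params.proxy_id = overrides.proxy_id;
  if (overrides.profile !== undefined) params.profile = overrides.profile;
  if (overrides.extensions !== undefined) params.extensions = overrides.extensions;
  if (overrides.timeout_seconds !== undefined) {
    params.timeout_seconds = overrides.timeout_seconds;
  }
  return params;
}
```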
When session_id is provided in the payload, the CUA task uses that
existing browser session directly instead of creating a new one. The
caller is responsible for the session lifecycle. This lets users
pre-configure browsers with any settings and reuse sessions across tasks.

Made-with: Cursor
When an external session_id is provided, retrieve the browser's real
viewport dimensions via browsers.retrieve() instead of hardcoding
1280x800. This ensures coordinate mapping is correct regardless of
how the browser was created.

Made-with: Cursor
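The session-reuse and viewport-lookup behavior from the last two commits could be sketched as below. `BrowsersClient` is a hypothetical stand-in for the SDK client: the commit mentions `browsers.retrieve()`, but the response shape (`session_id`, `viewport`) and the `owned` flag are assumptions of this example.

```typescript
interface Viewport { width: number; height: number }
interface BrowserInfo { session_id: string; viewport?: Viewport }

// Hypothetical minimal client surface; the real SDK may differ.
interface BrowsersClient {
  create(): Promise<BrowserInfo>;
  retrieve(id: string): Promise<BrowserInfo>;
}

const DEFAULT_VIEWPORT: Viewport = { width: 1280, height: 800 };

interface AcquiredSession {
  sessionId: string;
  viewport: Viewport;
  owned: boolean; // true only when this task created the session
}

async function acquireSession(
  browsers: BrowsersClient,
  sessionId?: string,
): Promise<AcquiredSession> {
  if (sessionId) {
    // Caller-provided session: read its real viewport, never delete it.
    const info = await browsers.retrieve(sessionId);
    return { sessionId, viewport: info.viewport ?? DEFAULT_VIEWPORT, owned: false };
  }
  const created = await browsers.create();
  return {
    sessionId: created.session_id,
    viewport: created.viewport ?? DEFAULT_VIEWPORT,
    owned: true,
  };
}
```

Cleanup code would then delete the browser only when `owned` is true, which matches the commit's note that the caller is responsible for the lifecycle of an external session.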
// Shared interface every provider adapter must implement.
export interface TaskOptions {
  query: string;
  model?: string;
Contributor

Each provider hardcodes model-specific API features that will break if you swap in a different model. Sharing Cursor's analysis, which is pretty well aligned with what I experienced building other templates. I think it would be smart to at least document in the template which models from which providers are compatible. I don't think we need to lock it down to specific models, particularly if we have defaults.

Image

Cursor Reco
The model field should either be removed from the public payload (keep it as an env var only) or the template should validate model compatibility per provider before calling the API. At minimum, the README and input description should warn that only specific computer-use-capable models work. Right now a user seeing that free-text field in the dashboard will absolutely try claude-haiku-3-5 or gpt-4o and get a confusing 400 error.
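One way the per-provider validation suggested above might look, as a sketch only. The allowlist entries are illustrative placeholders: `claude-sonnet-4-6` is mentioned in this PR, while the OpenAI and Gemini prefixes are assumptions and would need to be replaced with the template's verified defaults.

```typescript
type ProviderName = "anthropic" | "openai" | "gemini";

// Illustrative prefixes only, not a verified compatibility matrix.
const COMPATIBLE_MODEL_PREFIXES: Record<ProviderName, string[]> = {
  anthropic: ["claude-sonnet-4-6"],
  openai: ["computer-use-preview"],
  gemini: ["gemini-2.5-computer-use"],
};

// Fail fast with a readable message instead of a confusing 400 from the API.
function assertModelCompatible(provider: ProviderName, model?: string): void {
  if (!model) return; // no override: the provider default is used
  const prefixes = COMPATIBLE_MODEL_PREFIXES[provider];
  const ok = prefixes.some((prefix) => model.indexOf(prefix) === 0);
  if (!ok) {
    throw new Error(
      `Model "${model}" is not a known computer-use model for ${provider}. ` +
        `Expected one starting with: ${prefixes.join(", ")}`,
    );
  }
}
```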

break;
}
case 'scroll_document':
case 'scroll_at': {
Contributor

The Gemini provider in both ts-cua and python-cua is likely not optimized for scroll behavior. In the standalone templates, we have logic in place (though not perfect) to handle the fact that Gemini reports magnitude as a value in pixels.

Cursor summary of issue:
It's missing the magnitude ÷ 60 pixel-to-notch conversion and the max(1, min(17, ...)) clamp. When Gemini asks to scroll with its default magnitude (~400 pixels), the unified template will fire 400 wheel notches instead of 7. Both TS and Python have the same bug — they're consistent with each other, but both wrong compared to the standalone template.

I'm getting API exhaustion errors with Gemini right now, but wanted to surface this. I put the ÷ 60 and clamp logic into the standalone templates; it could likely be improved.

Contributor

The behavior in the Anthropic and OpenAI providers matches the standalone templates right now.

Contributor

@dprevoznik dprevoznik left a comment

Left some comments. All three providers work on both the TS and Python versions, though a Gemini API exhaustion error meant I couldn't test the scroll logic I commented on.

Prefer PageUp and PageDown in the provider prompts so long-page navigation is more reliable across the unified CUA templates.

Made-with: Cursor
@masnwilliams masnwilliams requested a review from dprevoznik April 13, 2026 22:00
const deltaY = dir === 'up' ? -magnitude : dir === 'down' ? magnitude : 0;
const deltaX = dir === 'left' ? -magnitude : dir === 'right' ? magnitude : 0;
await computer.scroll(sessionId, { x, y, delta_x: deltaX, delta_y: deltaY });
break;

Gemini scroll missing pixel-to-notch conversion causes extreme scrolling

High Severity

The Gemini provider passes magnitude directly as the scroll delta, but Gemini reports magnitude in pixels (default ~400). The standalone gemini-computer-use template correctly converts pixels to notches via min(17, max(1, round(magnitude / 60))), yielding ~7 notches for a 400px scroll. The unified template skips this conversion entirely and also uses a wrong default of 3 instead of 400. When Gemini requests a scroll with its default magnitude (~400), the unified template fires 400 wheel notches instead of 7 — roughly 57× too much scrolling, making the Gemini provider essentially unusable for any scrolling task.
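The conversion quoted above, `min(17, max(1, round(magnitude / 60)))` with a 400-pixel default, can be sketched like this. The helper names and the `delta_x`/`delta_y` return shape are illustrative, mirroring the fields in the snippet under review.

```typescript
const PIXELS_PER_NOTCH = 60;
const MAX_NOTCHES = 17;
const DEFAULT_MAGNITUDE_PX = 400;

// Convert a pixel magnitude into wheel notches, clamped to [1, 17].
function pixelsToNotches(magnitudePx: number = DEFAULT_MAGNITUDE_PX): number {
  const notches = Math.round(magnitudePx / PIXELS_PER_NOTCH);
  return Math.min(MAX_NOTCHES, Math.max(1, notches));
}

type Direction = "up" | "down" | "left" | "right";

// Map a direction plus pixel magnitude to signed notch deltas.
function scrollDeltas(
  direction: Direction,
  magnitudePx?: number,
): { delta_x: number; delta_y: number } {
  const n = pixelsToNotches(magnitudePx);
  return {
    delta_x: direction === "left" ? -n : direction === "right" ? n : 0,
    delta_y: direction === "up" ? -n : direction === "down" ? n : 0,
  };
}
```

With these constants a 400-pixel request becomes 7 notches, a very small request still scrolls by 1, and anything above roughly 1,000 pixels is capped at 17.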

Additional Locations (1)
Reviewed by Cursor Bugbot for commit f9cb9c7.

Bring in main branch template updates and resolve the registry overlap by keeping both the unified CUA and Tzafon template entries.

Made-with: Cursor
@socket-security

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

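| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Added | npm/@onkernel/sdk@0.47.0 | 72 | 100 | 100 | 97 | 100 |
| Added | npm/@anthropic-ai/sdk@0.86.1 | 73 | 100 | 88 | 100 | 100 |
| Added | npm/openai@6.34.0 | 74 | 100 | 100 | 100 | 100 |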


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 143a6eb.

keys: keys.length > 0 ? keys : parts,
...(holdKeys.length > 0 ? { hold_keys: holdKeys } : {}),
});
break;

Gemini key_combination reads wrong argument name

High Severity

The key_combination handler reads args.key_combination (TS) / args.get("key_combination") (Python), but the Gemini API sends the key string in a parameter named keys. The standalone template correctly reads args.keys. This means combo will always be an empty string, and all keyboard shortcuts requested by the model will silently fail.
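A sketch of a handler that reads the `keys` argument named in the report. It assumes the combination arrives as a `+`-separated string such as `control+c` and that modifiers go into `hold_keys`; the modifier list and the `parseKeyCombination` name are illustrative, and the final `keys`/`hold_keys` selection mirrors the snippet under review.

```typescript
interface KeyPress {
  keys: string[];
  hold_keys?: string[];
}

// Assumed modifier names for this sketch; the template's list may differ.
const MODIFIERS = ["control", "ctrl", "shift", "alt", "meta"];

function isModifier(key: string): boolean {
  return MODIFIERS.indexOf(key.toLowerCase()) !== -1;
}

// Read args.keys (not args.key_combination) and split it into
// held modifiers and the keys to press.
function parseKeyCombination(args: { keys?: string }): KeyPress {
  const parts = (args.keys ?? "")
    .split("+")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  const holdKeys = parts.filter(isModifier);
  const keys = parts.filter((part) => !isModifier(part));
  const press: KeyPress = { keys: keys.length > 0 ? keys : parts };
  if (holdKeys.length > 0) press.hold_keys = holdKeys;
  return press;
}
```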

Additional Locations (1)
Reviewed by Cursor Bugbot for commit 143a6eb.

const ex = this.denormalize(args.end_x, width);
const ey = this.denormalize(args.end_y, height);
await computer.dragMouse(sessionId, { path: [[sx, sy], [ex, ey]] });
break;

Gemini drag_and_drop uses wrong parameter names

Medium Severity

The drag_and_drop handler reads start_x, start_y, end_x, end_y from the args, but the Gemini API sends x, y, destination_x, destination_y. The standalone template correctly reads the latter names. All four coordinates will resolve to 0 (the fallback for undefined), making every drag go from (0,0) to (0,0).
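A sketch of the corrected argument mapping using the parameter names from the report (`x`, `y`, `destination_x`, `destination_y`). It assumes coordinates are normalized to a 0 to 1000 range, which is an assumption of this example; `denormalize` here is a standalone stand-in for the method in the snippet under review.

```typescript
interface DragArgs {
  x?: number;
  y?: number;
  destination_x?: number;
  destination_y?: number;
}

// Scale a normalized coordinate (assumed 0..1000) to pixels; undefined maps to 0.
function denormalize(value: number | undefined, size: number): number {
  return Math.round(((value ?? 0) / 1000) * size);
}

// Build the two-point drag path in pixel coordinates.
function dragPath(args: DragArgs, width: number, height: number): [number, number][] {
  return [
    [denormalize(args.x, width), denormalize(args.y, height)],
    [denormalize(args.destination_x, width), denormalize(args.destination_y, height)],
  ];
}
```

Because undefined silently maps to 0, reading the wrong argument names produces a valid-looking path from (0,0) to (0,0) rather than an error, which is why the bug is easy to miss.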

Additional Locations (1)
Reviewed by Cursor Bugbot for commit 143a6eb.

2 participants