Running Claude Code on Vertex AI keeps inference inside your Google Cloud project, subject to your GCP data agreements. The setup is simpler than Bedrock: three environment variables and a gcloud auth call.
Step 1: Authenticate with Google Cloud
```bash
gcloud auth application-default login
```
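For a service account in CI, point both gcloud and application default credentials at the same key file:
```bash
gcloud auth activate-service-account --key-file=/path/to/service-account.json
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```
The export is what application default credentials read; activate-service-account configures only the gcloud CLI's own credentials.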
The principal needs the Vertex AI User role (roles/aiplatform.user) on your project.
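If the principal is a CI service account, the role can be granted with gcloud projects add-iam-policy-binding. The sketch below wraps that call. The function name grant_vertex_user, the DRY_RUN switch, and the project and account values in the usage line are hypothetical placeholders.

```bash
# Hypothetical helper: grant roles/aiplatform.user to a service account.
# Set DRY_RUN=1 to print the gcloud command instead of running it.
grant_vertex_user() {
  local project="$1" sa_email="$2"
  if [ -z "$project" ] || [ -z "$sa_email" ]; then
    echo "usage: grant_vertex_user PROJECT_ID SERVICE_ACCOUNT_EMAIL" >&2
    return 2
  fi
  local cmd=(gcloud projects add-iam-policy-binding "$project"
    "--member=serviceAccount:${sa_email}"
    "--role=roles/aiplatform.user")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"   # show the command only
  else
    "${cmd[@]}"        # run it for real
  fi
}
```

For example, `DRY_RUN=1 grant_vertex_user my-project ci-runner@my-project.iam.gserviceaccount.com` prints the command so you can review it before running it without DRY_RUN. For a human user, the member prefix is user: rather than serviceAccount:.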
Step 2: Point Claude Code at Vertex
```bash
export CLAUDE_CODE_USE_VERTEX=1
export CLOUD_ML_REGION=global
export ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
claude
```
CLOUD_ML_REGION=global uses Google’s global endpoint with automatic region routing. For a specific region, set it to us-central1, europe-west4, or whichever region has your model quota.
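When debugging routing, it can help to see which Vertex endpoint a CLOUD_ML_REGION value implies. The helper below is a hypothetical sketch that assumes the standard Vertex publisher path for Anthropic models (the rawPredict method under publishers/anthropic). The global endpoint has no region prefix on its hostname; regional endpoints do.

```bash
# Hypothetical helper: build the Vertex AI rawPredict URL that a project,
# region, and model ID resolve to. Assumes the publishers/anthropic path.
vertex_predict_url() {
  local project="$1" region="$2" model="$3"
  local host
  if [ "$region" = "global" ]; then
    host="aiplatform.googleapis.com"            # global endpoint: no region prefix
  else
    host="${region}-aiplatform.googleapis.com"  # regional endpoint
  fi
  echo "https://${host}/v1/projects/${project}/locations/${region}/publishers/anthropic/models/${model}:rawPredict"
}
```

Usage: `vertex_predict_url "$ANTHROPIC_VERTEX_PROJECT_ID" "$CLOUD_ML_REGION" claude-haiku-4-5@20251001`.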
Step 3: Configure model IDs (optional)
Claude Code picks sensible defaults. To pin specific model versions:
```bash
export ANTHROPIC_MODEL=claude-sonnet-4-6@20251001
export ANTHROPIC_SMALL_FAST_MODEL=claude-haiku-4-5@20251001
```
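Vertex model IDs put an @ before the version date where the Anthropic API uses a hyphen (claude-haiku-4-5@20251001 rather than claude-haiku-4-5-20251001), and they carry no Bedrock-style -v1:0 suffix.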
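If you keep model IDs in Anthropic API or Bedrock form elsewhere, a small conversion helper avoids hand-editing them. This is a hypothetical sketch, not part of Claude Code. It assumes the IDs differ only in the date separator, Bedrock's anthropic. prefix (with any regional prefix such as us.), and Bedrock's -v1:0 style suffix, so confirm the result against the model IDs listed in the Vertex AI Model Garden.

```bash
# Hypothetical helper: rewrite an Anthropic API or Bedrock-style model ID
# into Vertex form (date after "@", no "anthropic." prefix, no "-v1:0").
to_vertex_model_id() {
  local id="$1"
  id="${id##*anthropic.}"      # drop Bedrock "anthropic." and any prefix before it
  id="${id%-v[0-9]*:[0-9]*}"   # drop a Bedrock version suffix such as "-v1:0"
  # Replace a trailing "-YYYYMMDD" with "@YYYYMMDD"; other IDs pass through.
  printf '%s\n' "$id" | sed -E 's/-([0-9]{8})$/@\1/'
}
```

An ID that is already in Vertex form passes through unchanged, so the helper is safe to apply to mixed lists.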
Credential rotation
Access tokens issued through application default credentials expire after one hour unless refreshed. For sessions longer than an hour, configure a refresh command:
```json
{
  "gcpAuthRefresh": "gcloud auth application-default login --quiet"
}
```
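Set this in .claude/settings.json. Note that --quiet only disables gcloud's interactive prompts; application-default login still needs a browser to complete its OAuth flow, so this refresh works only on a machine where you can sign in. Service accounts do not use the browser flow at all: in CI, rely on the key file configured in Step 1 (gcloud auth activate-service-account plus GOOGLE_APPLICATION_CREDENTIALS) instead.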
Per-model region overrides
To route specific models to specific regions (useful when quota is region-restricted):
```bash
export VERTEX_REGION_CLAUDE_OPUS_4_6=us-central1
export VERTEX_REGION_CLAUDE_SONNET_4_6=europe-west4
export VERTEX_REGION_CLAUDE_HAIKU_4_5=us-east4
```
The naming convention is VERTEX_REGION_CLAUDE_ followed by the model name in SCREAMING_SNAKE_CASE. These override CLOUD_ML_REGION for that specific model only.
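The lookup order can be mirrored in a shell helper to check which region a model will land in before starting a long session. This is a hypothetical sketch of the convention as stated here, not Claude Code's own implementation; the function name vertex_region_for is illustrative, and it relies on bash indirect expansion.

```bash
# Hypothetical helper mirroring the naming convention: a
# VERTEX_REGION_CLAUDE_<MODEL> variable wins; otherwise CLOUD_ML_REGION applies.
vertex_region_for() {
  local model="$1"              # e.g. "sonnet-4-6" or "claude-sonnet-4-6@20251001"
  model="${model#claude-}"      # drop a leading "claude-"
  model="${model%%@*}"          # drop a Vertex "@version" suffix
  model="${model//[-.]/_}"      # "-" and "." become "_"
  local var="VERTEX_REGION_CLAUDE_$(printf '%s' "$model" | tr '[:lower:]' '[:upper:]')"
  if [ -n "${!var:-}" ]; then
    echo "${!var}"              # per-model override
  elif [ -n "${CLOUD_ML_REGION:-}" ]; then
    echo "$CLOUD_ML_REGION"     # fallback for every other model
  else
    echo "CLOUD_ML_REGION is not set" >&2
    return 1
  fi
}
```

With the exports above and CLOUD_ML_REGION=global, `vertex_region_for sonnet-4-6` prints europe-west4, while a model with no override prints global.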
Enable MCP tool search
MCP tool search is disabled on Vertex by default. To enable it, add this to .claude/settings.json:
```json
{
  "enableMcpToolSearch": true
}
```
It is off by default because tool search makes an extra API call per session to index available tools, adding latency and cost that may be undesirable in high-volume Vertex deployments. Enable it explicitly if you need Claude to discover and use MCP tools.
What Vertex does NOT support
| Feature | Available on Vertex |
|---|---|
| Fast mode | No |
| Ultraplan | No |
| Remote Control | No |
| Push notifications | No |
| MCP tool search (default) | Off (enable explicitly) |
Fast mode, Ultraplan, Remote Control, and push notifications require claude.ai infrastructure; on Vertex, Claude Code runs in local CLI mode only.
Footguns
CLOUD_ML_REGION is required. Claude Code expects the region as an explicit environment variable; without it, Claude Code cannot find the Vertex endpoint.
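A preflight check in a launch script or CI job turns this into an immediate, readable error. The function name require_vertex_env is hypothetical, and it only checks that the three Step 2 variables are non-empty, not that their values are valid.

```bash
# Hypothetical preflight check: report every missing Vertex variable and
# return non-zero, instead of letting Claude Code fail later.
require_vertex_env() {
  local missing=0 name
  for name in CLAUDE_CODE_USE_VERTEX CLOUD_ML_REGION ANTHROPIC_VERTEX_PROJECT_ID; do
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_vertex_env && claude`.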
gcloud auth application-default login requires browser access: the interactive flow hangs in a terminal without a display. In headless CI, use a service account with GOOGLE_APPLICATION_CREDENTIALS instead.
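In CI it is worth confirming which credential type application default credentials will pick up before Claude Code starts. The sketch below is hypothetical. It reads the type field that ADC JSON files carry (service_account for key files, authorized_user for files written by gcloud auth application-default login), looks only at GOOGLE_APPLICATION_CREDENTIALS and the default Linux/macOS path ~/.config/gcloud/application_default_credentials.json, and ignores other ADC sources such as the GCE metadata server.

```bash
# Hypothetical helper: print the "type" of the ADC file that would be used,
# or "none" (and return 1) if no readable file is found.
adc_credential_type() {
  local file="${GOOGLE_APPLICATION_CREDENTIALS:-$HOME/.config/gcloud/application_default_credentials.json}"
  if [ ! -r "$file" ]; then
    echo "none"
    return 1
  fi
  # Plain-text extraction of the first "type": "..." value; not a JSON parser.
  sed -nE 's/.*"type"[[:space:]]*:[[:space:]]*"([^"]+)".*/\1/p' "$file" | head -n 1
}
```

A CI job could then guard the launch with `[ "$(adc_credential_type)" = "service_account" ] || { echo "expected a service account key" >&2; exit 1; }`.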
Access tokens from application default credentials expire after one hour. A session running autonomously for 90 minutes will fail mid-task with an authentication error. Set gcpAuthRefresh before starting any session you expect to run longer than 45 minutes, which leaves a margin before the one-hour expiry.
The global endpoint routes to the nearest region, not necessarily where your quota is. If you have Sonnet quota in us-central1 but the global endpoint routes to europe-west4, requests fail with quota errors. Set CLOUD_ML_REGION to your quota region explicitly, or use VERTEX_REGION_CLAUDE_* overrides per model.
MCP tool search is silently off on Vertex. If your Claude Code sessions seem unable to find MCP tools that work fine on claude.ai, check whether enableMcpToolSearch is set. There is no warning when it is disabled.
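Because there is no warning, a quick check of the settings file can catch this. The sketch below is hypothetical and deliberately crude: it is a plain-text grep rather than a JSON parser, it assumes the key and value sit on one line, and it checks only the file you pass (by default the project's .claude/settings.json), not user-level or local settings files.

```bash
# Hypothetical check: succeed only if the given settings file sets
# "enableMcpToolSearch": true on a single line.
mcp_tool_search_enabled() {
  local settings="${1:-.claude/settings.json}"
  [ -r "$settings" ] &&
    grep -Eq '"enableMcpToolSearch"[[:space:]]*:[[:space:]]*true' "$settings"
}
```

Usage: `mcp_tool_search_enabled || echo "enableMcpToolSearch is not true in .claude/settings.json" >&2`.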
When NOT to use Vertex
- You need fast mode or Ultraplan. Use claude.ai directly.
- You need push notifications for mobile monitoring. Not available on Vertex.
- You are in a GCP region with no Claude model quota. Check quota in the Vertex AI console before deploying. Quota requests can take days to approve.
- Your team uses Bedrock already. Running two cloud providers doubles the configuration surface; pick one.