AnswerQA

How do I point Claude Code at our Google Cloud Vertex AI endpoint?

Answer

Set three environment variables, authenticate with gcloud, and configure gcpAuthRefresh for sessions longer than an hour. MCP tool search is off by default on Vertex. Per-model region overrides use VERTEX_REGION_CLAUDE_* variables.

By Kalle Lamminpää. Verified May 12, 2026.

Running Claude Code on Vertex AI keeps inference inside your Google Cloud project, subject to your GCP data agreements. The setup is simpler than Bedrock: three environment variables and a gcloud auth call.

Step 1: Authenticate with Google Cloud

gcloud auth application-default login

For service accounts in CI:

gcloud auth activate-service-account --key-file=/path/to/service-account.json
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

The principal needs the Vertex AI User role (roles/aiplatform.user) on your project.
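In CI, a missing or wrong key file tends to surface only as an authentication error once a session is already running. A minimal sketch of a check to run first follows. The function name check_sa_key is hypothetical and not part of gcloud or Claude Code; the only fact it relies on is that a service account key file is JSON containing "type": "service_account".

```shell
# Hypothetical helper: sanity-check the CI key file before starting Claude Code.
check_sa_key() {
  key="${GOOGLE_APPLICATION_CREDENTIALS:-}"
  if [ -z "$key" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$key" ]; then
    echo "key file is missing or unreadable: $key" >&2
    return 1
  fi
  # Service account keys are JSON with a "type": "service_account" field.
  if ! grep -q '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key"; then
    echo "file does not look like a service account key: $key" >&2
    return 1
  fi
}
```

The check only inspects the file. It does not confirm that the account holds roles/aiplatform.user, which still has to be verified in the project's IAM settings.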

Step 2: Point Claude Code at Vertex

export CLAUDE_CODE_USE_VERTEX=1
export CLOUD_ML_REGION=global
export ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
claude

CLOUD_ML_REGION=global uses Google’s global endpoint with automatic region routing. For a specific region, set it to us-central1, europe-west4, or whichever region has your model quota.
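Because all three variables must be set, a small preflight check can catch an omission before the CLI starts. This is a sketch only: check_vertex_env is a hypothetical helper name, and it tests solely whether the variables are non-empty, not whether their values are valid.

```shell
# Hypothetical preflight: confirm the three Vertex variables are set.
check_vertex_env() {
  missing=""
  for name in CLAUDE_CODE_USE_VERTEX CLOUD_ML_REGION ANTHROPIC_VERTEX_PROJECT_ID; do
    # Indirect lookup in POSIX sh: read the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing Vertex settings:$missing" >&2
    return 1
  fi
}
```

One way to use it is `check_vertex_env && claude`, so the CLI only launches when nothing is missing.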

Step 3: Configure model IDs (optional)

Claude Code picks sensible defaults. To pin specific model versions:

export ANTHROPIC_MODEL=claude-sonnet-4-6@20251001
export ANTHROPIC_SMALL_FAST_MODEL=claude-haiku-4-5@20251001

Vertex model IDs use @ as the version separator instead of - and, unlike Bedrock model IDs, do not include a v1:0 suffix.
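A Bedrock-style or hyphen-separated ID pasted into these variables is an easy mistake, so a shape check can help. The sketch below assumes the pattern seen in the two examples above (a claude- name, then @, then a numeric date stamp); is_vertex_model_id is a hypothetical helper, and IDs with a non-numeric version would need a looser pattern.

```shell
# Hypothetical check: does the ID follow the name@version shape shown above?
# Assumes the version is a numeric date stamp, as in the article's examples.
is_vertex_model_id() {
  printf '%s\n' "$1" | grep -Eq '^claude-[a-z0-9-]+@[0-9]+$'
}

# Example: warn before launching if ANTHROPIC_MODEL is set but malformed.
if [ -n "${ANTHROPIC_MODEL:-}" ] && ! is_vertex_model_id "$ANTHROPIC_MODEL"; then
  echo "ANTHROPIC_MODEL does not look like a Vertex model ID: $ANTHROPIC_MODEL" >&2
fi
```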

Credential rotation

GCP application default credentials expire after 1 hour unless refreshed. For sessions longer than an hour:

{
  "gcpAuthRefresh": "gcloud auth application-default login --quiet"
}

Set this in .claude/settings.json. The --quiet flag disables gcloud's interactive prompts, so the refresh command does not stall waiting for confirmation. Service accounts cannot complete a browser-based login; for those, use gcloud auth activate-service-account as the refresh command instead.
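For a fresh checkout, the setting can be written from a setup script. The following is a sketch under stated assumptions: write_gcp_refresh_setting is a hypothetical helper, it refuses to touch an existing settings file rather than merging into it, and it expects a refresh command without double quotes or backslashes, since it performs no JSON escaping.

```shell
# Hypothetical helper: create .claude/settings.json with a gcpAuthRefresh entry.
# $1 = project directory (default: current directory)
# $2 = refresh command (default: the command shown above)
write_gcp_refresh_setting() {
  dir="${1:-.}"
  cmd="${2:-gcloud auth application-default login --quiet}"
  file="$dir/.claude/settings.json"
  if [ -e "$file" ]; then
    # Merging into existing JSON safely needs a real JSON tool; bail out instead.
    echo "refusing to overwrite $file; add gcpAuthRefresh by hand" >&2
    return 1
  fi
  mkdir -p "$dir/.claude" || return 1
  # No JSON escaping is done: the command must not contain " or \ characters.
  printf '{\n  "gcpAuthRefresh": "%s"\n}\n' "$cmd" > "$file"
}
```

For a service account, the second argument could be a gcloud auth activate-service-account command with the appropriate --key-file path.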

Per-model region overrides

To route specific models to specific regions (useful when quota is region-restricted):

export VERTEX_REGION_CLAUDE_OPUS_4_6=us-central1
export VERTEX_REGION_CLAUDE_SONNET_4_6=europe-west4
export VERTEX_REGION_CLAUDE_HAIKU_4_5=us-east4

The naming convention is VERTEX_REGION_CLAUDE_ followed by the model name in SCREAMING_SNAKE_CASE. These override CLOUD_ML_REGION for that specific model only.
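The naming rule and the fallback to CLOUD_ML_REGION can be illustrated with a short sketch. Both functions are hypothetical helpers for checking your own configuration; they do not reproduce Claude Code's actual lookup, and dropping the @version suffix is an assumption based on the three variable names above.

```shell
# Hypothetical: build the override variable name for a model,
# e.g. claude-opus-4-6 -> VERTEX_REGION_CLAUDE_OPUS_4_6.
vertex_region_var() {
  model="${1%%@*}"          # drop any @version suffix (assumption)
  model="${model#claude-}"  # the prefix is re-added in upper case below
  suffix=$(printf '%s' "$model" | tr 'a-z-' 'A-Z_')
  printf 'VERTEX_REGION_CLAUDE_%s\n' "$suffix"
}

# Hypothetical: print the override if one is set, otherwise CLOUD_ML_REGION.
effective_region() {
  var=$(vertex_region_var "$1")
  case "$var" in
    # Guard the eval below against anything that is not a plain variable name.
    *[!A-Z0-9_]*) echo "unexpected characters in model name: $1" >&2; return 1 ;;
  esac
  eval "override=\${$var:-}"
  printf '%s\n' "${override:-${CLOUD_ML_REGION:-}}"
}
```

Running effective_region for each model you plan to use shows which ones would fall back to CLOUD_ML_REGION under this reading of the convention.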

MCP tool search

MCP tool search is disabled on Vertex by default. To enable it:

{
  "enableMcpToolSearch": true
}

The reason it is off by default: tool search makes an extra API call per session to index available tools, which adds latency and cost that may be undesirable in high-volume Vertex deployments. Enable it explicitly if you need Claude to discover and use MCP tools.

What Vertex does NOT support

Feature                      Available on Vertex
Fast mode                    No
Ultraplan                    No
Remote Control               No
Push notifications           No
MCP tool search (default)    Off (enable explicitly)

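The features marked No require claude.ai infrastructure, which is not available when Claude Code runs against Vertex in purely local CLI mode. MCP tool search is the exception: it works on Vertex once enabled.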

Footguns

CLOUD_ML_REGION is required. Unlike Bedrock, where the region is usually set via AWS config, Vertex needs it as an explicit environment variable. Without it, Claude Code fails to find the Vertex endpoint.

gcloud auth application-default login requires browser access. In a headless CI environment, use a service account with GOOGLE_APPLICATION_CREDENTIALS. The interactive browser flow hangs in a terminal without a display.

Application default credentials expire after 1 hour. A session running autonomously for 90 minutes will fail mid-task with an authentication error. Set gcpAuthRefresh before starting any session you expect to run longer than about 45 minutes, which leaves a margin before the one-hour expiry.

The global endpoint routes to the nearest region, not necessarily where your quota is. If you have Sonnet quota in us-central1 but the global endpoint routes to europe-west4, requests fail with quota errors. Set CLOUD_ML_REGION to your quota region explicitly, or use VERTEX_REGION_CLAUDE_* overrides per model.

MCP tool search is silently off on Vertex. If your Claude Code sessions seem unable to find MCP tools that work fine on claude.ai, check whether enableMcpToolSearch is set. There is no warning when it is disabled.

When NOT to use Vertex

  • You need fast mode or ultraplan. Use claude.ai directly.
  • You need push notifications for mobile monitoring. Not available on Vertex.
  • You are in a GCP region with no Claude model quota. Check quota in the Vertex AI console before deploying. Quota requests can take days to approve.
  • Your team already uses Bedrock. Running two cloud providers doubles the configuration surface; pick one.
