How do I point Claude Code at our Google Cloud Vertex AI endpoint?

Question

Kalle Lamminpää · Accepted Answer

Running Claude Code on Vertex AI keeps inference inside your Google Cloud project, subject to your GCP data agreements. The setup is simpler than Bedrock: three environment variables and a gcloud auth call.

Step 1: Authenticate with Google Cloud

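On a workstation, run gcloud auth application-default login. For service accounts in CI, point GOOGLE_APPLICATION_CREDENTIALS at the service account key file instead.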

The principal needs the Vertex AI User role (roles/aiplatform.user) on your project.
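A minimal sketch of this step, assuming a placeholder key path, project ID, and service account name (none of these values come from the answer itself; substitute your own):

```shell
# Interactive workstation login (opens a browser). Shown as a comment
# because it cannot run unattended:
#   gcloud auth application-default login

# CI: point Application Default Credentials at a service account key.
# The default path below is a placeholder, not a real location.
export GOOGLE_APPLICATION_CREDENTIALS="${GOOGLE_APPLICATION_CREDENTIALS:-/secrets/vertex-sa.json}"

# Placeholder identifiers for the role grant.
PROJECT_ID="my-gcp-project"
SA_EMAIL="claude-ci@${PROJECT_ID}.iam.gserviceaccount.com"

# Grant the Vertex AI User role to the principal. Guarded so the snippet
# is a no-op on machines without gcloud; the caller needs IAM admin rights.
if command -v gcloud >/dev/null 2>&1; then
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="serviceAccount:${SA_EMAIL}" \
    --role="roles/aiplatform.user" \
    || echo "Role grant failed; check your IAM permissions" >&2
fi
```

The role grant only needs to happen once per principal, so in practice it usually lives in infrastructure setup rather than in every CI run.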

Step 2: Point Claude Code at Vertex

CLOUD_ML_REGION=global uses Google's global endpoint with automatic region routing. For a specific region, set it to us-central1, europe-west4, or whichever region has your model quota.
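As a sketch of what the three variables typically look like: CLOUD_ML_REGION is the one named above, while CLAUDE_CODE_USE_VERTEX and ANTHROPIC_VERTEX_PROJECT_ID are the names I understand Claude Code to read, so check them against the current documentation. The project ID is a placeholder.

```shell
# Route Claude Code through Vertex AI instead of the Anthropic API.
export CLAUDE_CODE_USE_VERTEX=1

# "global" selects the global endpoint; use a region such as us-central1
# to pin requests to the region that holds your quota.
export CLOUD_ML_REGION=global

# Placeholder project ID; replace with the project that has Claude enabled.
export ANTHROPIC_VERTEX_PROJECT_ID="my-gcp-project"
```

Putting these in a shell profile or a CI secret store keeps them out of per-session commands.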

Step 3: Configure model IDs (optional)

Claude Code picks sensible defaults. To pin specific model versions:

Vertex model IDs use @ as the version separator instead of -, and they do not carry the v1:0 suffix that Bedrock model IDs use.
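A hedged example of pinning, assuming ANTHROPIC_MODEL and ANTHROPIC_SMALL_FAST_MODEL are the variables your Claude Code version reads. The IDs below only illustrate the @ format; use whichever versions are enabled in your project's Model Garden.

```shell
# Primary model. Note the @ before the version date.
export ANTHROPIC_MODEL='claude-sonnet-4@20250514'

# Smaller model used for lightweight background tasks.
export ANTHROPIC_SMALL_FAST_MODEL='claude-3-5-haiku@20241022'
```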

Credential rotation

GCP application default credentials expire after 1 hour unless refreshed. For sessions longer than an hour:

Set gcpAuthRefresh in .claude/settings.json. The --quiet flag skips the browser prompt, which is required for non-interactive refresh. For service accounts, which do not support browser-based refresh, use gcloud auth activate-service-account instead.
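A sketch of what the settings fragment might look like. The key name is the one used in this answer; the exact command string and the assumption that the value is a plain shell command are reconstructed from the description above, so treat both as unverified. The snippet only prints the fragment so you can review it before merging it into .claude/settings.json by hand.

```shell
# Build the fragment in a variable; nothing is written to disk.
# ASSUMPTION: the refresh command and its string form are illustrative.
settings_fragment=$(cat <<'EOF'
{
  "gcpAuthRefresh": "gcloud auth application-default login --quiet"
}
EOF
)

# Print for review.
printf '%s\n' "$settings_fragment"
```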

Per-model region overrides

To route specific models to specific regions (useful when quota is region-restricted):

The naming convention is VERTEX_REGION_CLAUDE_ followed by the model name in SCREAMING_SNAKE_CASE. These override CLOUD_ML_REGION for that specific model only.
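For illustration, assuming Haiku and Sonnet quota in two different regions (the region values are placeholders, and the exact variable names available depend on your Claude Code version):

```shell
# Default for every model without an override.
export CLOUD_ML_REGION=global

# Per-model overrides: VERTEX_REGION_CLAUDE_ + model name in SCREAMING_SNAKE_CASE.
export VERTEX_REGION_CLAUDE_3_5_HAIKU=us-east5
export VERTEX_REGION_CLAUDE_4_0_SONNET=europe-west1
```

Models without an override continue to use CLOUD_ML_REGION.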

Enable MCP tool search

MCP tool search is disabled on Vertex by default. To enable it, set enableMcpToolSearch.

The reason it is off by default: tool search makes an extra API call per session to index available tools, which adds latency and cost that may be undesirable in high-volume Vertex deployments. Enable it explicitly if you need Claude to discover and use MCP tools.

What Vertex does NOT support

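Fast mode, ultraplan, Remote Control, and push notifications are not available on Vertex because they require claude.ai infrastructure. Claude Code on Vertex runs in purely local CLI mode.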

Footguns

CLOUD_ML_REGION is required. Unlike Bedrock, where the region is usually set via AWS config, Vertex needs it as an explicit environment variable. Without it, Claude Code fails to find the Vertex endpoint.

gcloud auth application-default login requires browser access. In a headless CI environment, use a service account with GOOGLE_APPLICATION_CREDENTIALS. The interactive browser flow hangs in a terminal without a display.

Application default credentials expire after 1 hour. A session running autonomously for 90 minutes will fail mid-task with an authentication error. Set gcpAuthRefresh before starting any session you expect to run longer than 45 minutes.

The global endpoint routes to the nearest region, not necessarily where your quota is. If you have Sonnet quota in us-central1 but the global endpoint routes to europe-west4, requests fail with quota errors. Set CLOUD_ML_REGION to your quota region explicitly, or use VERTEX_REGION_CLAUDE_* overrides per model.

MCP tool search is silently off on Vertex. If your Claude Code sessions seem unable to find MCP tools that work fine on claude.ai, check whether enableMcpToolSearch is set. There is no warning when it is disabled.
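Several of these footguns come down to a variable that is quietly missing. One way to catch that early is a small pre-flight check before launching a session. This is a sketch, not part of Claude Code: check_vertex_env is a hypothetical helper, and apart from CLOUD_ML_REGION the variable names are the ones I assume Claude Code reads.

```shell
# Report any required Vertex variable that is unset or empty.
# Returns 0 when all are present, 1 otherwise.
check_vertex_env() {
  missing=0
  for name in CLAUDE_CODE_USE_VERTEX CLOUD_ML_REGION ANTHROPIC_VERTEX_PROJECT_ID; do
    # Indirect lookup of the variable named in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_vertex_env && claude in a wrapper script makes the session refuse to start instead of failing later with an endpoint error.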

When NOT to use Vertex

You need fast mode or ultraplan. Use claude.ai directly.
You need push notifications for mobile monitoring. Not available on Vertex.
You are in a GCP region with no Claude model quota. Check quota in the Vertex AI console before deploying. Quota requests can take days to approve.
Your team uses Bedrock already. Running two cloud providers doubles configuration surface; pick one.

Feature	Available on Vertex
Fast mode	No
Ultraplan	No
Remote Control	No
Push notifications	No
MCP tool search (default)	Off (enable explicitly)
