
Calling the Vertex AI Gemini API from PowerShell

4 min read

This post covers how to call Gemini models on Google Cloud's Vertex AI from PowerShell, using both the OpenAI-compatible endpoint and the native Gemini endpoint.

Authentication

Access token (gcloud auth)

No API key is required; your existing Google Cloud credentials are used.

$accessToken = (gcloud auth print-access-token)

API key

Alternatively, load an API key from an environment variable:

$apiKey = $env:VERTEX_API_KEY

Endpoints

OpenAI-compatible endpoint (gcloud auth)

https://{region}-aiplatform.googleapis.com/v1beta1/projects/{projectId}/locations/{region}/endpoints/openapi/chat/completions

The request and response formats are identical to the OpenAI API's. The model name requires a google/ prefix (e.g., google/gemini-2.5-flash-lite).

Native Gemini endpoint (API key)

https://{region}-aiplatform.googleapis.com/v1/projects/{projectId}/locations/{region}/publishers/google/models/{model}:generateContent

For streaming, use :streamGenerateContent.

Basic calls

OpenAI-compatible (gcloud auth)

$projectId   = "your-project-id"
$region      = "us-central1"
$model       = "google/gemini-2.5-flash-lite"
$accessToken = (gcloud auth print-access-token)

$body = @{
    model    = $model
    messages = @(
        @{
            role    = "user"
            content = "What is the population of Tokyo?"
        }
    )
} | ConvertTo-Json -Depth 10

$uri = "https://$region-aiplatform.googleapis.com/v1beta1/projects/$projectId/locations/$region/endpoints/openapi/chat/completions"

$response = Invoke-RestMethod `
    -Uri $uri `
    -Method Post `
    -ContentType "application/json" `
    -Headers @{ Authorization = "Bearer $accessToken" } `
    -Body $body

$response.choices[0].message.content
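For repeated calls, the boilerplate above can be wrapped in a small helper. This is a sketch, not part of the original article; Invoke-GeminiChat is an illustrative name, and it fetches a fresh access token on each call for simplicity.

```powershell
# Illustrative wrapper around the OpenAI-compatible endpoint.
function Invoke-GeminiChat {
    param(
        [Parameter(Mandatory)] [string]$Prompt,
        [string]$Model     = "google/gemini-2.5-flash-lite",
        [string]$ProjectId = "your-project-id",
        [string]$Region    = "us-central1"
    )

    $body = @{
        model    = $Model
        messages = @(@{ role = "user"; content = $Prompt })
    } | ConvertTo-Json -Depth 10

    $uri = "https://$Region-aiplatform.googleapis.com/v1beta1/projects/$ProjectId/locations/$Region/endpoints/openapi/chat/completions"

    $response = Invoke-RestMethod -Uri $uri -Method Post `
        -ContentType "application/json" `
        -Headers @{ Authorization = "Bearer $(gcloud auth print-access-token)" } `
        -Body $body

    # Return just the generated text.
    $response.choices[0].message.content
}

# Usage:
# Invoke-GeminiChat -Prompt "What is the population of Tokyo?"
```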

Native Gemini (API key)

$projectId = "your-project-id"
$region    = "us-central1"
$model     = "gemini-2.5-flash-lite"
$apiKey    = $env:VERTEX_API_KEY

$body = @{
    contents = @(
        @{
            role  = "user"
            parts = @(
                @{ text = "What is the population of Tokyo?" }
            )
        }
    )
} | ConvertTo-Json -Depth 10

$uri = "https://$region-aiplatform.googleapis.com/v1/projects/$projectId/locations/$region/publishers/google/models/${model}:generateContent?key=$apiKey"

$response = Invoke-RestMethod `
    -Uri $uri `
    -Method Post `
    -ContentType "application/json" `
    -Body $body

$response.candidates[0].content.parts[0].text

Response structure

OpenAI-compatible

$response.choices[0].message.content  # generated text
$response.usage.total_tokens          # total token count
$response.model                       # model used

Native Gemini

$response.candidates[0].content.parts[0].text  # generated text
$response.usageMetadata.totalTokenCount        # total token count
$response.modelVersion                         # model version used

For streaming (streamGenerateContent), an array of chunks is returned. Concatenate them to retrieve the full text.

$fullText = ($response | ForEach-Object {
    $_.candidates[0].content.parts[0].text
}) -join ""
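Putting this together, a full streaming request might look like the following sketch. It assumes the same $projectId, $region, $model, $apiKey, and $body variables as the basic native Gemini example above.

```powershell
# Streaming variant: only the endpoint method changes.
$uri = "https://$region-aiplatform.googleapis.com/v1/projects/$projectId/locations/$region/publishers/google/models/${model}:streamGenerateContent?key=$apiKey"

# The endpoint returns a JSON array of chunks, which
# Invoke-RestMethod deserializes into an array of objects.
$response = Invoke-RestMethod `
    -Uri $uri `
    -Method Post `
    -ContentType "application/json" `
    -Body $body

# Concatenate the per-chunk text fragments into the full answer.
$fullText = ($response | ForEach-Object {
    $_.candidates[0].content.parts[0].text
}) -join ""
$fullText
```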

Adding a system prompt

OpenAI-compatible

$body = @{
    model    = $model
    messages = @(
        @{
            role    = "system"
            content = "You are an AI assistant that responds in Japanese. Answer concisely."
        }
        @{
            role    = "user"
            content = "What is the speed of light?"
        }
    )
} | ConvertTo-Json -Depth 10

Native Gemini

$body = @{
    system_instruction = @{
        parts = @(
            @{ text = "You are an AI assistant that responds in Japanese. Answer concisely." }
        )
    }
    contents = @(
        @{
            role  = "user"
            parts = @(@{ text = "What is the speed of light?" })
        }
    )
} | ConvertTo-Json -Depth 10

Multi-turn conversation

Pass the conversation history as an array of messages to hold a multi-turn conversation.

OpenAI-compatible

$body = @{
    model    = $model
    messages = @(
        @{ role = "user"; content = "Do you prefer cats or dogs?" }
        @{ role = "assistant"; content = "I prefer cats." }
        @{ role = "user"; content = "Why is that?" }
    )
} | ConvertTo-Json -Depth 10

Native Gemini

In the native API, the assistant role is specified as "model".

$body = @{
    contents = @(
        @{
            role  = "user"
            parts = @(@{ text = "Do you prefer cats or dogs?" })
        }
        @{
            role  = "model"
            parts = @(@{ text = "I prefer cats." })
        }
        @{
            role  = "user"
            parts = @(@{ text = "Why is that?" })
        }
    )
} | ConvertTo-Json -Depth 10

Available models

Model                 | OpenAI-compatible name       | Description
gemini-2.5-flash-lite | google/gemini-2.5-flash-lite | Lightweight, fast, low-cost
gemini-2.5-flash      | google/gemini-2.5-flash      | Balanced
gemini-2.5-pro        | google/gemini-2.5-pro        | High-precision, for complex tasks

Which approach to use

Situation                                     | Recommended approach
GCP-authenticated environment (dev, CI, etc.) | OpenAI-compatible + gcloud auth
Only an API key available                     | Native Gemini
Migrating from OpenAI                         | OpenAI-compatible (minimizes code changes)
Streaming required                            | Native Gemini

Notes

  • Do not hardcode the API key in scripts; load it from an environment variable ($env:VERTEX_API_KEY).
  • With gcloud auth, tokens expire in about 1 hour. Long-running scripts should refresh the token as needed.
  • Each project has its own rate limits and quotas. Check them before sending large numbers of requests.
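One way to handle token expiry in a long-running script is to cache the token and refresh it before the roughly one-hour lifetime is up. This is a sketch under that assumption; Get-VertexAccessToken and the 50-minute threshold are illustrative choices, not part of the original article.

```powershell
# Cache the gcloud access token and refresh it after 50 minutes,
# comfortably inside the ~1 hour token lifetime.
$script:cachedToken    = $null
$script:tokenFetchedAt = $null

function Get-VertexAccessToken {
    $stale = (-not $script:cachedToken) -or
             (((Get-Date) - $script:tokenFetchedAt).TotalMinutes -gt 50)
    if ($stale) {
        $script:cachedToken    = (gcloud auth print-access-token)
        $script:tokenFetchedAt = Get-Date
    }
    $script:cachedToken
}

# In a request loop, rebuild the header with a fresh token each time:
# $headers = @{ Authorization = "Bearer $(Get-VertexAccessToken)" }
```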
