Skip to main content

One post tagged with "GCP"

View all tags

Calling the Vertex AI Gemini API from PowerShell

This covers how to call Gemini models via Google Cloud's Vertex AI from PowerShell. Both the OpenAI-compatible endpoint and the native Gemini endpoint are explained.

Authentication

No API key required. Uses your existing Google Cloud credentials.

$accessToken = (gcloud auth print-access-token)

API key

$apiKey = $env:VERTEX_API_KEY

Endpoints

OpenAI-compatible endpoint (gcloud auth)

https://{region}-aiplatform.googleapis.com/v1beta1/projects/{projectId}/locations/{region}/endpoints/openapi/chat/completions

The request and response format is identical to the OpenAI API. The model name requires a google/ prefix (e.g., google/gemini-2.5-flash-lite).

Native Gemini endpoint (API key)

https://{region}-aiplatform.googleapis.com/v1/projects/{projectId}/locations/{region}/publishers/google/models/{model}:generateContent

For streaming, use :streamGenerateContent.

Basic calls

OpenAI-compatible (gcloud auth)

$projectId = "your-project-id"
$region = "us-central1"
$model = "google/gemini-2.5-flash-lite"
$accessToken = (gcloud auth print-access-token)

$body = @{
model = $model
messages = @(
@{
role = "user"
content = "What is the population of Tokyo?"
}
)
} | ConvertTo-Json -Depth 10

$uri = "https://$region-aiplatform.googleapis.com/v1beta1/projects/$projectId/locations/$region/endpoints/openapi/chat/completions"

$response = Invoke-RestMethod `
-Uri $uri `
-Method Post `
-ContentType "application/json" `
-Headers @{ Authorization = "Bearer $accessToken" } `
-Body $body

$response.choices[0].message.content

Native Gemini (API key)

$projectId = "your-project-id"
$region = "us-central1"
$model = "gemini-2.5-flash-lite"
$apiKey = $env:VERTEX_API_KEY

$body = @{
contents = @(
@{
role = "user"
parts = @(
@{ text = "What is the population of Tokyo?" }
)
}
)
} | ConvertTo-Json -Depth 10

$uri = "https://$region-aiplatform.googleapis.com/v1/projects/$projectId/locations/$region/publishers/google/models/${model}:generateContent?key=$apiKey"

$response = Invoke-RestMethod `
-Uri $uri `
-Method Post `
-ContentType "application/json" `
-Body $body

$response.candidates[0].content.parts[0].text

Response structure

OpenAI-compatible

$response.choices[0].message.content # generated text
$response.usage.total_tokens # total token count
$response.model # model used

Native Gemini

$response.candidates[0].content.parts[0].text # generated text
$response.usageMetadata.totalTokenCount # total token count
$response.modelVersion # model version used

For streaming (streamGenerateContent), an array of chunks is returned. Concatenate them to retrieve the full text.

$fullText = ($response | ForEach-Object {
$_.candidates[0].content.parts[0].text
}) -join ""

Adding a system prompt

OpenAI-compatible

$body = @{
model = $model
messages = @(
@{
role = "system"
content = "You are an AI assistant that responds in Japanese. Answer concisely."
}
@{
role = "user"
content = "What is the speed of light?"
}
)
} | ConvertTo-Json -Depth 10

Native Gemini

$body = @{
system_instruction = @{
parts = @(
@{ text = "You are an AI assistant that responds in Japanese. Answer concisely." }
)
}
contents = @(
@{
role = "user"
parts = @(@{ text = "What is the speed of light?" })
}
)
} | ConvertTo-Json -Depth 10

Multi-turn conversation

Place the conversation history in an array to achieve multi-turn conversation.

OpenAI-compatible

$body = @{
model = $model
messages = @(
@{ role = "user"; content = "Do you prefer cats or dogs?" }
@{ role = "assistant"; content = "I prefer cats." }
@{ role = "user"; content = "Why is that?" }
)
} | ConvertTo-Json -Depth 10

Native Gemini

The assistant role is specified as "model".

$body = @{
contents = @(
@{
role = "user"
parts = @(@{ text = "Do you prefer cats or dogs?" })
}
@{
role = "model"
parts = @(@{ text = "I prefer cats." })
}
@{
role = "user"
parts = @(@{ text = "Why is that?" })
}
)
} | ConvertTo-Json -Depth 10

Available models

ModelOpenAI-compatible nameDescription
gemini-2.5-flash-litegoogle/gemini-2.5-flash-liteLightweight, fast, low-cost
gemini-2.5-flashgoogle/gemini-2.5-flashBalanced
gemini-2.5-progoogle/gemini-2.5-proHigh-precision, for complex tasks

Which approach to use

SituationRecommended approach
GCP-authenticated environment (dev, CI, etc.)OpenAI-compatible + gcloud auth
Only an API key availableNative Gemini
Migrating from OpenAIOpenAI-compatible (minimizes code changes)
Streaming requiredNative Gemini

Notes

  • Do not hardcode the API key in scripts; load it from an environment variable ($env:VERTEX_API_KEY).
  • With gcloud auth, tokens expire in about 1 hour. Long-running scripts should refresh the token as needed.
  • Each project has its own rate limits and quotas. Check them before sending large numbers of requests.