
Configuration of Passkeys (WebAuthn) using Amazon Cognito

I am using Amazon Cognito for user authentication in a file storage API built with AWS SAM. Recently, I added login via passkeys (WebAuthn), so I will summarize the configuration details.

Prerequisites: Required Cognito Settings for Passkeys

To use passkeys with Cognito, the following must all be in place:

| Requirement | Current Configuration |
| --- | --- |
| User pool tier | ESSENTIALS or higher |
| Managed Login | v2 (new login UI) |
| Custom domain | login.example.com (used as the Relying Party ID) |

Passkeys in Cognito are registered and used through the Managed Login v2 UI. WebAuthn cannot be used with the LITE tier (free), so the ESSENTIALS tier or higher is required.

Authentication Flow

Passkey Registration Flow

The first time, sign in with a password, then register a passkey from the account settings.

Passkey Login Flow

After registration, authentication can be done directly via the "Sign in with passkey" button.

Configuration Details

Adding passkey support required just six lines of changes to template.yaml (the SAM template).

Before Changes

UserPool:
  Type: AWS::Cognito::UserPool
  Properties:
    # ...
    Policies:
      PasswordPolicy:
        MinimumLength: 8
        # ...
    MfaConfiguration: "OFF"

After Changes

UserPool:
  Type: AWS::Cognito::UserPool
  Properties:
    # ...
    Policies:
      PasswordPolicy:
        MinimumLength: 8
        # ...
      SignInPolicy:
        AllowedFirstAuthFactors:
          - PASSWORD
          - WEB_AUTHN # ← Added passkey
    MfaConfiguration: "OFF"
    WebAuthnRelyingPartyID: login.example.com # ← Specify RP ID
    WebAuthnUserVerification: required # ← Require user verification (biometrics or PIN)

Explanation of Each Parameter

SignInPolicy.AllowedFirstAuthFactors

This is the list of authentication methods that can be used during the first authentication step. With only PASSWORD, it allows password-only authentication; adding WEB_AUTHN allows passkeys as an option.

WebAuthnRelyingPartyID

This is the Relying Party ID (RP ID) for WebAuthn. Passkeys are generated and stored associated with this domain, so it must match the domain serving the actual login page.

In this case, I have directly specified the custom domain login.example.com. If you are using the Cognito default domain (xxx.auth.ap-northeast-1.amazoncognito.com), specify that one.
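
To make the "must match the domain" rule concrete, here is a minimal sketch of the check a browser applies between the page's host and the RP ID. The function name isRpIdUsableFrom is hypothetical, and the check is simplified: real browsers also reject public suffixes such as "com" as an RP ID.

```javascript
// Simplified sketch of the WebAuthn RP ID rule (assumption: public-suffix checks omitted).
// An RP ID is usable when it equals the page's host or is a parent domain of it.
function isRpIdUsableFrom(pageUrl, rpId) {
  const host = new URL(pageUrl).hostname.toLowerCase();
  const id = rpId.toLowerCase();
  return host === id || host.endsWith("." + id);
}

console.log(isRpIdUsableFrom("https://login.example.com/login", "login.example.com")); // true
console.log(isRpIdUsableFrom("https://login.example.com/login", "example.com")); // true
console.log(isRpIdUsableFrom("https://app.example.org/", "login.example.com")); // false
```

If the login page later moves to a host that fails this check, passkeys registered under the old RP ID would no longer be offered there.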

WebAuthnUserVerification

This defines the required level of user verification when using passkeys.

| Value | Description |
| --- | --- |
| required | Requires user verification (biometric authentication or PIN) |
| preferred | Prefers user verification but allows sign-in without it |
| discouraged | Skips user verification (no biometrics, etc.) |

To enhance security, I chose required.

Managed Login UI

In the Managed Login v2 interface, after configuring the passkey, the "Sign in with passkey" button will be automatically added to the login screen. For initial registration, you can add a passkey from the account settings after logging in with a password.

Deployment

sam build
sam deploy --no-confirm-changeset

Since the stack name, region, and parameters are defined in samconfig.toml, there is no need to specify options each time.

Conclusion

The key points for enabling passkeys in Cognito are:

  1. Set to ESSENTIALS tier (LITE does not support WebAuthn)
  2. Use Managed Login v2
  3. Specify a custom domain (or the Cognito default domain) as the RP ID
  4. Add WEB_AUTHN to SignInPolicy.AllowedFirstAuthFactors
  5. Set WebAuthnUserVerification: required to make user verification (biometrics or PIN) mandatory

With just six lines of changes, passkey login is now available. A convenient aspect of Cognito is that you can move to passkeys gradually while keeping password sign-in.

Calling the Vertex AI Gemini API from PowerShell

This covers how to call Gemini models via Google Cloud's Vertex AI from PowerShell. Both the OpenAI-compatible endpoint and the native Gemini endpoint are explained.

Authentication

gcloud auth

No API key required. This method uses your existing Google Cloud credentials.

$accessToken = (gcloud auth print-access-token)

API key

$apiKey = $env:VERTEX_API_KEY

Endpoints

OpenAI-compatible endpoint (gcloud auth)

https://{region}-aiplatform.googleapis.com/v1beta1/projects/{projectId}/locations/{region}/endpoints/openapi/chat/completions

The request and response format is identical to the OpenAI API. The model name requires a google/ prefix (e.g., google/gemini-2.5-flash-lite).

Native Gemini endpoint (API key)

https://{region}-aiplatform.googleapis.com/v1/projects/{projectId}/locations/{region}/publishers/google/models/{model}:generateContent

For streaming, use :streamGenerateContent.
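
As a rough illustration of how the two URL patterns above are assembled, here is a small sketch. The function name vertexUrls is hypothetical; the URL templates are the ones listed above.

```javascript
// Builds the endpoint URLs from the templates above (helper name is an assumption).
function vertexUrls({ projectId, region, model }) {
  const host = `https://${region}-aiplatform.googleapis.com`;
  const location = `projects/${projectId}/locations/${region}`;
  const modelPath = `${host}/v1/${location}/publishers/google/models/${model}`;
  return {
    openAiCompatible: `${host}/v1beta1/${location}/endpoints/openapi/chat/completions`,
    generateContent: `${modelPath}:generateContent`,
    streamGenerateContent: `${modelPath}:streamGenerateContent`,
  };
}

const urls = vertexUrls({
  projectId: "your-project-id",
  region: "us-central1",
  model: "gemini-2.5-flash-lite",
});
console.log(urls.openAiCompatible);
console.log(urls.generateContent);
console.log(urls.streamGenerateContent);
```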

Basic calls

OpenAI-compatible (gcloud auth)

$projectId = "your-project-id"
$region = "us-central1"
$model = "google/gemini-2.5-flash-lite"
$accessToken = (gcloud auth print-access-token)

$body = @{
    model = $model
    messages = @(
        @{
            role = "user"
            content = "What is the population of Tokyo?"
        }
    )
} | ConvertTo-Json -Depth 10

$uri = "https://$region-aiplatform.googleapis.com/v1beta1/projects/$projectId/locations/$region/endpoints/openapi/chat/completions"

$response = Invoke-RestMethod `
    -Uri $uri `
    -Method Post `
    -ContentType "application/json" `
    -Headers @{ Authorization = "Bearer $accessToken" } `
    -Body $body

$response.choices[0].message.content

Native Gemini (API key)

$projectId = "your-project-id"
$region = "us-central1"
$model = "gemini-2.5-flash-lite"
$apiKey = $env:VERTEX_API_KEY

$body = @{
    contents = @(
        @{
            role = "user"
            parts = @(
                @{ text = "What is the population of Tokyo?" }
            )
        }
    )
} | ConvertTo-Json -Depth 10

$uri = "https://$region-aiplatform.googleapis.com/v1/projects/$projectId/locations/$region/publishers/google/models/${model}:generateContent?key=$apiKey"

$response = Invoke-RestMethod `
    -Uri $uri `
    -Method Post `
    -ContentType "application/json" `
    -Body $body

$response.candidates[0].content.parts[0].text

Response structure

OpenAI-compatible

$response.choices[0].message.content # generated text
$response.usage.total_tokens # total token count
$response.model # model used

Native Gemini

$response.candidates[0].content.parts[0].text # generated text
$response.usageMetadata.totalTokenCount # total token count
$response.modelVersion # model version used

For streaming (streamGenerateContent), an array of chunks is returned. Concatenate them to retrieve the full text.

$fullText = ($response | ForEach-Object {
    $_.candidates[0].content.parts[0].text
}) -join ""

Adding a system prompt

OpenAI-compatible

$body = @{
    model = $model
    messages = @(
        @{
            role = "system"
            content = "You are an AI assistant that responds in Japanese. Answer concisely."
        }
        @{
            role = "user"
            content = "What is the speed of light?"
        }
    )
} | ConvertTo-Json -Depth 10

Native Gemini

$body = @{
    system_instruction = @{
        parts = @(
            @{ text = "You are an AI assistant that responds in Japanese. Answer concisely." }
        )
    }
    contents = @(
        @{
            role = "user"
            parts = @(@{ text = "What is the speed of light?" })
        }
    )
} | ConvertTo-Json -Depth 10

Multi-turn conversation

Place the conversation history in an array to achieve multi-turn conversation.

OpenAI-compatible

$body = @{
    model = $model
    messages = @(
        @{ role = "user"; content = "Do you prefer cats or dogs?" }
        @{ role = "assistant"; content = "I prefer cats." }
        @{ role = "user"; content = "Why is that?" }
    )
} | ConvertTo-Json -Depth 10

Native Gemini

The assistant role is specified as "model".

$body = @{
    contents = @(
        @{
            role = "user"
            parts = @(@{ text = "Do you prefer cats or dogs?" })
        }
        @{
            role = "model"
            parts = @(@{ text = "I prefer cats." })
        }
        @{
            role = "user"
            parts = @(@{ text = "Why is that?" })
        }
    )
} | ConvertTo-Json -Depth 10
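
The difference between the two request shapes can be summarized as a conversion. The following is only a sketch, not part of either API: the function name toGeminiBody is hypothetical, and it assumes every message has plain string content.

```javascript
// Converts OpenAI-style messages into the native Gemini request body shown above.
// Assumptions: content is always a string; only system/user/assistant roles appear.
function toGeminiBody(messages) {
  const systemTexts = [];
  const contents = [];
  for (const { role, content } of messages) {
    if (role === "system") {
      // System prompts move out of the message list into system_instruction.
      systemTexts.push(content);
    } else {
      contents.push({
        role: role === "assistant" ? "model" : "user", // "assistant" becomes "model"
        parts: [{ text: content }],
      });
    }
  }
  const body = { contents };
  if (systemTexts.length > 0) {
    body.system_instruction = { parts: systemTexts.map((text) => ({ text })) };
  }
  return body;
}

console.log(JSON.stringify(toGeminiBody([
  { role: "system", content: "Answer concisely." },
  { role: "user", content: "Do you prefer cats or dogs?" },
  { role: "assistant", content: "I prefer cats." },
  { role: "user", content: "Why is that?" },
]), null, 2));
```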

Available models

| Model | OpenAI-compatible name | Description |
| --- | --- | --- |
| gemini-2.5-flash-lite | google/gemini-2.5-flash-lite | Lightweight, fast, low-cost |
| gemini-2.5-flash | google/gemini-2.5-flash | Balanced |
| gemini-2.5-pro | google/gemini-2.5-pro | High-precision, for complex tasks |

Which approach to use

| Situation | Recommended approach |
| --- | --- |
| GCP-authenticated environment (dev, CI, etc.) | OpenAI-compatible + gcloud auth |
| Only an API key available | Native Gemini |
| Migrating from OpenAI | OpenAI-compatible (minimizes code changes) |
| Streaming required | Native Gemini |

Notes

  • Do not hardcode the API key in scripts; load it from an environment variable ($env:VERTEX_API_KEY).
  • With gcloud auth, tokens expire in about 1 hour. Long-running scripts should refresh the token as needed.
  • Each project has its own rate limits and quotas. Check them before sending large numbers of requests.

What is CORS? An Explanation of Security for Beginners

This article explains CORS (Cross-Origin Resource Sharing), a web browser security feature, for beginners, covering "why it's necessary" and "what dangers it entails." Understanding it correctly will enable secure web development.

The Background of the Need for CORS: Same-Origin Policy

In the mid-1990s, when JavaScript was first incorporated into browsers, the concept of web security was almost nonexistent. Without restrictions, a malicious website could freely access data from other sites, making session hijacking and data theft easy.

To solve this problem, a restriction known as the Same-Origin Policy was introduced. This is a simple yet powerful rule that states, "JavaScript loaded from a web page cannot access data from a different origin."

For example, JavaScript loaded from a page at https://www.example.com cannot access data from https://www.bank.com. This ensures that even if a user accesses a malicious site while logged into their bank site, their banking information cannot be stolen.

What is an Origin?

An origin is determined by the following three factors:

  • Protocol (scheme): for example, http or https
  • Host (domain): for example, example.com or api.example.com
  • Port: for example, 80 or 8080

For example,

| URL | Protocol | Host | Port | Origin |
| --- | --- | --- | --- | --- |
| https://www.example.com/page | HTTPS | www.example.com | 443 (default) | https://www.example.com |
| https://api.example.com/data | HTTPS | api.example.com | 443 (default) | https://api.example.com |

Since these are different hosts, they are considered different origins.
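
The same comparison can be tried with the standard URL class, whose origin property combines protocol, host, and port. A minimal sketch (the helper name sameOrigin is just for illustration):

```javascript
// Two URLs share an origin when protocol, host, and port all match.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

console.log(new URL("https://www.example.com/page").origin); // https://www.example.com
console.log(sameOrigin("https://www.example.com/page", "https://www.example.com:443/other")); // true (443 is the https default)
console.log(sameOrigin("https://www.example.com/page", "https://api.example.com/data")); // false (different host)
console.log(sameOrigin("http://www.example.com/", "https://www.example.com/")); // false (different protocol)
```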

What the Same-Origin Policy Prevents

Requests to different origins via JavaScript's XHR (XMLHttpRequest) or Fetch API are restricted.

Example: A malicious script on evil.com

fetch('https://bank.example.com/api/transfer', {
  method: 'POST',
  body: JSON.stringify({ amount: 1000000 })
});

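Without the Same-Origin Policy, JavaScript from a malicious site (evil.com) could send a transfer request while the user is logged into their bank site and then read the response. The Same-Origin Policy stops the script from reading cross-origin responses. Note that the browser may still send simple requests like this one, which is why state-changing endpoints also need CSRF protection.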

Why CORS is Necessary

However, modern web design often involves cooperation among multiple origins.

  • Frontend: https://www.example.com
  • API Server: https://api.example.com
  • CDN / Static Files: https://cdn.example.com

These are operated by the same company and are legitimate communications. But if restricted by the Same-Origin Policy, the application would not function.

This is where CORS (Cross-Origin Resource Sharing) comes into play.

What is CORS: Explicitly Allowing Access

CORS is a mechanism by which the server explicitly declares, "requests from this origin are permitted."

When the server returns response headers such as the following, the browser relaxes the restriction.

Access-Control-Allow-Origin: https://www.example.com
Access-Control-Allow-Methods: GET, POST, PUT
Access-Control-Allow-Headers: Content-Type, Authorization

Unless the server says "allowed," the browser will not pass the results of the request back to JavaScript. This achieves cross-origin access while maintaining security.

Security Risks of CORS: Common Configuration Mistakes

Although CORS is convenient, incorrect settings can create security vulnerabilities.

Mistake: Allowing All Origins

Access-Control-Allow-Origin: *

This means, "Anyone from anywhere can access this server."

// JavaScript from https://evil.com
fetch('https://api.example.com/user/profile')
  .then(r => r.json())
  .then(data => {
    // Process to steal user profile information
    console.log(data);
  });

With this setting, a script on any site can read whatever the endpoint returns to a request sent without credentials. That is dangerous whenever the response is sensitive but not protected by cookies, for example an internal API that is reachable only from the user's network.

Caveat: * Cannot Be Used for Requests Including Cookies

fetch('https://api.example.com/user/profile', {
  credentials: 'include' // Include cookies
})

When a request includes authentication credentials, the browser rejects a response that carries Access-Control-Allow-Origin: *. You must always specify a specific origin.

Access-Control-Allow-Origin: https://www.example.com
Access-Control-Allow-Credentials: true

Mistake: Allowing User-Specified URLs Directly

// Dangerous implementation example (server-side)
const origin = request.headers.get('Origin');
response.headers.set('Access-Control-Allow-Origin', origin); // Return as is!

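With this code, a request from https://evil.com receives Access-Control-Allow-Origin: https://evil.com in the response, so every origin is effectively allowed. Combined with Access-Control-Allow-Credentials: true, this lets a malicious site read a logged-in user's data.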

The correct approach is to prepare a whitelist and allow only those origins.

const allowedOrigins = [
  'https://www.example.com',
  'https://admin.example.com'
];

if (allowedOrigins.includes(origin)) {
  response.headers.set('Access-Control-Allow-Origin', origin);
}
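
The allowlist check can be wrapped in a small framework-independent helper. This is a sketch under assumptions: the function name corsHeadersFor and its options are hypothetical. It also adds Vary: Origin, which tells caches that the response differs per requesting origin.

```javascript
// Allowlist of origins taken from the example above.
const allowedOrigins = new Set([
  "https://www.example.com",
  "https://admin.example.com",
]);

// Returns the CORS response headers for a given Origin request header.
// Unknown origins get no Access-Control-Allow-Origin header at all.
function corsHeadersFor(requestOrigin, { allowCredentials = false } = {}) {
  const headers = { Vary: "Origin" };
  if (requestOrigin && allowedOrigins.has(requestOrigin)) {
    headers["Access-Control-Allow-Origin"] = requestOrigin;
    if (allowCredentials) {
      headers["Access-Control-Allow-Credentials"] = "true";
    }
  }
  return headers;
}

console.log(corsHeadersFor("https://www.example.com", { allowCredentials: true }));
console.log(corsHeadersFor("https://evil.com", { allowCredentials: true }));
```

Because the header is simply omitted for unknown origins, the browser blocks the response without the server having to return an error.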

Mistake: Allowing All Headers

Access-Control-Allow-Headers: *

This means "any header is accepted," which widens the attack surface by letting cross-origin scripts send arbitrary custom headers.

List only the necessary headers.

Access-Control-Allow-Headers: Content-Type, Authorization

CORS Preflight Request: Browser's Prior Check

For requests that do not qualify as simple requests (roughly, GET, HEAD, or POST with only safelisted headers and a form-style or text/plain Content-Type), the browser automatically sends an OPTIONS request first to ask "Is this okay?" This is known as a preflight request.

1. JavaScript tries to send a PUT request

2. The browser automatically sends an OPTIONS preflight request

   OPTIONS /api/resource HTTP/1.1
   Origin: https://www.example.com
   Access-Control-Request-Method: PUT
   Access-Control-Request-Headers: Content-Type

3. The server responds with "OK"

   HTTP/1.1 200 OK
   Access-Control-Allow-Origin: https://www.example.com
   Access-Control-Allow-Methods: PUT
   Access-Control-Allow-Headers: Content-Type

4. The browser sends the actual PUT request

If the server does not support the OPTIONS method, the preflight will fail, and the actual request will not be sent.
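
To get a feel for which requests trigger a preflight, here is a simplified classifier. It is a sketch, not the browser's actual algorithm: the function name needsPreflight is hypothetical, and it ignores finer rules such as header value length limits and upload event listeners.

```javascript
const SIMPLE_METHODS = new Set(["GET", "HEAD", "POST"]);
const SAFELISTED_HEADERS = new Set([
  "accept",
  "accept-language",
  "content-language",
  "content-type",
]);
const SIMPLE_CONTENT_TYPES = new Set([
  "application/x-www-form-urlencoded",
  "multipart/form-data",
  "text/plain",
]);

// Returns true when the browser would send an OPTIONS preflight first (simplified).
function needsPreflight(method, headers = {}) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) return true;
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!SAFELISTED_HEADERS.has(lower)) return true;
    if (lower === "content-type") {
      // Only the media type matters; parameters such as charset are ignored.
      const mediaType = String(value).split(";")[0].trim().toLowerCase();
      if (!SIMPLE_CONTENT_TYPES.has(mediaType)) return true;
    }
  }
  return false;
}

console.log(needsPreflight("GET")); // false
console.log(needsPreflight("PUT", { "Content-Type": "application/json" })); // true
console.log(needsPreflight("POST", { "Content-Type": "application/json" })); // true
console.log(needsPreflight("POST", { "Content-Type": "text/plain;charset=UTF-8" })); // false
console.log(needsPreflight("GET", { Authorization: "Bearer token" })); // true
```

Note that a POST with a JSON body or an Authorization header is preflighted even though POST and GET are "simple" methods.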

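Key Points:

  • Explicitly specify allowed origins in a whitelist
  • For requests that include cookies, set credentials: 'include' on the client and return Access-Control-Allow-Credentials: true together with a specific origin
  • Allow only the necessary methods and headers
  • Make sure the server answers OPTIONS preflight requests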

Common Questions

Q. I encountered a CORS error. Can I allow all origins to resolve it?

A: No. It might work temporarily, but allowing * in a production environment poses a security risk. You need to revisit the server-side settings or redesign the API.

Q. I want to access origins at different ports during local development. Is that okay?

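A: Yes. Relaxing the CORS policy for the local development environment only (for example, allowing http://localhost:3000) is acceptable, as long as that setting never reaches production.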

Q. Does CORS matter when calling APIs from mobile apps?

A: CORS is a browser security feature, so it does not apply to mobile apps. Instead, you need to implement authentication and authorization using API keys or OAuth.

Summary

| Point | Explanation |
| --- | --- |
| Purpose of CORS | Allow cross-origin access while maintaining browser security |
| Dangerous Configuration | Using Access-Control-Allow-Origin: * for APIs requiring authentication |
| Correct Configuration | Explicitly specify allowed origins in the whitelist |
| When Including Cookies | Must specify Access-Control-Allow-Credentials: true and a specific origin |
| Preflight | Complex requests like PUT/DELETE require the browser to pre-check with OPTIONS |

CORS is not just a "cause of errors," but an important mechanism in web security. Misconfigurations can lead to security incidents, so it must be handled with care.


Development of the translation CLI tool translate-mcp supporting multiple languages using OpenAI API

translate-mcp is a translation tool that utilizes OpenAI's API. It supports both CLI mode and MCP server usage. It is useful in a wide range of scenarios, from translating an entire file to being integrated into AI tools.

What is translate-mcp

translate-mcp is a translation-specific tool using the OpenAI API. It is implemented in Python and has two usage modes.

  1. CLI Mode: Translate files directly from the command line.
  2. MCP Server Mode: Operates as a Model Context Protocol (MCP) server, integrating with AI tools.

Features

  • Multi-language support: Supports various languages.
  • Simple usage: Can be started with just one API key.
  • Two usage modes: Operates both as a CLI script and as an MCP server.
  • Lightweight: Relies solely on the OpenAI API without dependency on external libraries.
  • Error handling: In CLI mode, errors are returned in stderr, while in MCP mode, errors are returned in JSON format.

Setup

Prerequisites

  • Python must be installed.
  • An OpenAI API key must be obtained.

Installing uv

Since translate-mcp is managed by uv, you first need to install uv.

For installation instructions, refer to Installation | uv.

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Installing translate-mcp

You can install it using the following command with uv.

uv tool install git+https://github.com/Himeyama/translate-mcp

Alternatively, you can run it directly without installation using uvx.

uvx git+https://github.com/Himeyama/translate-mcp --help

How to Use

CLI Mode

To translate a file, run the following command:

translate --input blog/2026-04-02-example.md --from Japanese --to English

The result is written to standard output. To save it to a file, use shell redirection or the --output option:

translate \
  --input blog/2026-04-02-example.md \
  --from Japanese \
  --to English \
  --output i18n/en/blog/2026-04-02-example.md

Parameters

  • --mcp: MCP mode
  • --input: Path to the file to be translated
  • --from: Source language (e.g., Japanese, English)
  • --to: Target language (e.g., English, Taiwanese)
  • --output (optional): Destination to save the translated text
  • --model (optional): OpenAI model (e.g., gpt-5-mini)
  • --debug (optional): Debug mode

In case of errors

If an error occurs, it will be output to stderr.

MCP Server Mode

Start it as an MCP server, making it accessible from Claude Code and other AI tools.

translate --mcp

Practical Examples

This section describes how to translate a blog article from Japanese to English and Traditional Chinese (Taiwan).

Japanese Version (Original Article)

# Japanese article exists in blog/2026-04-02-example.md

Generate English Version

translate \
  --input blog/2026-04-02-example.md \
  --from Japanese \
  --to English > i18n/en/docusaurus-plugin-content-blog/2026-04-02-example.md

Generate Taiwan Version (Traditional Chinese)

translate \
  --input blog/2026-04-02-example.md \
  --from Japanese \
  --to Taiwanese > i18n/zh-TW/docusaurus-plugin-content-blog/2026-04-02-example.md
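
When there are several target languages, the two commands above can be generated from a list. The sketch below only builds the argument lists and prints them; the helper name translateCommands and the TARGETS table are assumptions for illustration, and it uses the --output option instead of redirection.

```javascript
// Target languages and their Docusaurus locale directories (taken from the examples above).
const TARGETS = [
  { language: "English", locale: "en" },
  { language: "Taiwanese", locale: "zh-TW" },
];

// Builds one translate invocation per target language for a given source file.
function translateCommands(inputPath, from = "Japanese") {
  const fileName = inputPath.split("/").pop();
  return TARGETS.map(({ language, locale }) => ({
    command: "translate",
    args: [
      "--input", inputPath,
      "--from", from,
      "--to", language,
      "--output", `i18n/${locale}/docusaurus-plugin-content-blog/${fileName}`,
    ],
  }));
}

for (const { command, args } of translateCommands("blog/2026-04-02-example.md")) {
  console.log(command, args.join(" "));
}
```

Each entry could then be passed to something like Node's child_process.execFileSync(command, args) to run the translations in sequence.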

Advantages and Disadvantages

Advantages

  • High accuracy: Uses OpenAI's high-quality models (such as GPT-4).
  • Tool integration: In MCP server mode, it can be integrated with AI tools such as ChatGPT and Claude.
  • Simplicity: Very easy to set up and use.
  • Customizable: The source code is open, allowing for customization.

Disadvantages

  • API costs: OpenAI API usage is billed according to translation volume.
  • Internet connection required: Cannot be used offline as API calls are necessary.
  • Rate limiting: Subject to OpenAI API's rate limits.

Conclusion

translate-mcp is a simple and high-quality translation tool that leverages the OpenAI API. It is effective in various scenarios, including multi-language support for blog articles, document translation, and integration into AI tools.

It is particularly effective as an automation script for multi-language support in static site generators like Docusaurus.


Comparing Anthropic API and AWS Bedrock Pricing

When using Claude via API, you have more than two options: in addition to calling the Anthropic API directly, you can also use it via AWS Bedrock, Google Vertex AI, or Microsoft Azure (Azure AI Foundry). Base pricing is the same across all routes, but there are differences in batch processing and cloud ecosystem integration.

Unit: USD / 1M tokens (MTok). Information as of March 2026.

On-Demand Base Pricing

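| Model | Type | Anthropic API | Bedrock | Vertex AI | Azure |
| --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 | Input | $5.00 | $5.00 | $5.00 | $5.00 |
| Claude Opus 4.6 | Output | $25.00 | $25.00 | $25.00 | $25.00 |
| Claude Sonnet 4.6 | Input | $3.00 | $3.00 | $3.00 | $3.00 |
| Claude Sonnet 4.6 | Output | $15.00 | $15.00 | $15.00 | $15.00 |
| Claude Haiku 4.5 | Input | $1.00 | $1.00 | $1.00 | $1.00 |
| Claude Haiku 4.5 | Output | $5.00 | $5.00 | $5.00 | $5.00 |
| Claude Sonnet 4.5 | Input | $3.00 | $3.00 | $3.00 | $3.00 |
| Claude Sonnet 4.5 | Output | $15.00 | $15.00 | $15.00 | $15.00 |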

Base pricing is identical across all routes.

Note that Vertex AI regional endpoints carry a 10% surcharge over global endpoint pricing. Bedrock offers Long Context variants as separate SKUs at the same price; on the Anthropic API, Long Context is integrated into the standard models.

Cache Pricing

Prompt Caching rates are also identical across all routes.

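| Model | Cache Type | Anthropic API | Bedrock | Vertex AI | Azure |
| --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 | 5-min cache write | $6.25 | $6.25 | $6.25 | $6.25 |
| Claude Opus 4.6 | 1-hour cache write | $10.00 | $10.00 | $10.00 | $10.00 |
| Claude Opus 4.6 | Cache read | $0.50 | $0.50 | $0.50 | $0.50 |
| Claude Sonnet 4.6 | 5-min cache write | $3.75 | $3.75 | $3.75 | $3.75 |
| Claude Sonnet 4.6 | 1-hour cache write | $6.00 | $6.00 | $6.00 | $6.00 |
| Claude Sonnet 4.6 | Cache read | $0.30 | $0.30 | $0.30 | $0.30 |
| Claude Haiku 4.5 | 5-min cache write | $1.25 | $1.25 | $1.25 | $1.25 |
| Claude Haiku 4.5 | 1-hour cache write | $2.00 | $2.00 | $2.00 | $2.00 |
| Claude Haiku 4.5 | Cache read | $0.10 | $0.10 | $0.10 | $0.10 |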

Cache writes come in two TTL tiers: 5-minute (short-term) and 1-hour (long-term). Longer TTL means higher write cost, but for applications with lengthy system prompts that are read repeatedly, the savings on read pricing more than compensate.
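
The break-even point can be checked with a little arithmetic. The sketch below uses the Claude Sonnet 4.6 prices from the tables above and compares the cost of sending the same 1 MTok prompt prefix repeatedly with and without caching. It assumes every reuse arrives within the TTL and hits the cache; the function name prefixCost is hypothetical.

```javascript
// Claude Sonnet 4.6 prices in USD per MTok, from the tables above.
const SONNET_46 = { input: 3.0, write5m: 3.75, write1h: 6.0, read: 0.3 };

// Cost of sending the same 1 MTok prefix `uses` times.
// Cached: one write, then (uses - 1) reads. Uncached: full input price every time.
function prefixCost(uses, writePrice, prices = SONNET_46) {
  return {
    uncached: uses * prices.input,
    cached: writePrice + (uses - 1) * prices.read,
  };
}

for (const uses of [1, 2, 3, 10]) {
  const fiveMin = prefixCost(uses, SONNET_46.write5m);
  const oneHour = prefixCost(uses, SONNET_46.write1h);
  console.log(
    `uses=${uses}`,
    `uncached=$${fiveMin.uncached.toFixed(2)}`,
    `5-min=$${fiveMin.cached.toFixed(2)}`,
    `1-hour=$${oneHour.cached.toFixed(2)}`
  );
}
```

Under these assumptions, the 5-minute cache is already cheaper on the second use ($4.05 vs. $6.00), while the 1-hour cache becomes cheaper from the third use ($6.60 vs. $9.00).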

Batch Processing Pricing

Bedrock, Vertex AI, and the Anthropic API all offer an asynchronous batch API at 50% off on-demand pricing. Azure does not explicitly list batch pricing at this time.

| Model | Batch Input | Batch Output |
| --- | --- | --- |
| Claude Opus 4.6 | $2.50 | $12.50 |
| Claude Sonnet 4.6 | $1.50 | $7.50 |
| Claude Haiku 4.5 | $0.50 | $2.50 |
| Claude Sonnet 4.5 | $1.50 | $7.50 |

For large-scale batch workloads (log analysis, embedding generation, etc.), any of these routes can cut costs in half.
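
As a worked example of the 50% discount, the following sketch estimates the cost of a job from token counts using the on-demand prices above. The workload size (200M input tokens, 20M output tokens) is made up for illustration, and the function name estimateCost is hypothetical.

```javascript
// On-demand prices in USD per MTok, from the base pricing table.
const PRICES = {
  "Claude Opus 4.6": { input: 5, output: 25 },
  "Claude Sonnet 4.6": { input: 3, output: 15 },
  "Claude Haiku 4.5": { input: 1, output: 5 },
  "Claude Sonnet 4.5": { input: 3, output: 15 },
};

// Estimates the cost in USD; batch jobs are billed at 50% of on-demand pricing.
function estimateCost(model, inputTokens, outputTokens, { batch = false } = {}) {
  const price = PRICES[model];
  if (!price) throw new Error(`Unknown model: ${model}`);
  const onDemand = (inputTokens * price.input + outputTokens * price.output) / 1e6;
  return batch ? onDemand * 0.5 : onDemand;
}

// Hypothetical workload: 200M input tokens and 20M output tokens on Claude Haiku 4.5.
console.log(estimateCost("Claude Haiku 4.5", 200e6, 20e6)); // 300
console.log(estimateCost("Claude Haiku 4.5", 200e6, 20e6, { batch: true })); // 150
```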

Ecosystem Comparison

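| Feature | Anthropic API | Bedrock | Vertex AI | Azure |
| --- | --- | --- | --- | --- |
| Base pricing | Same | Same | Same | Same |
| Regional surcharge | | | +10% (regional) | |
| Batch processing (50% off) | Yes | Yes | Yes | Not listed |
| Tokyo region | | | | |
| IAM / audit log integration | | AWS | Google Cloud | Azure |
| VPC / PrivateLink | | | | |
| Billing integration | Anthropic direct | AWS | Google Cloud | Azure |
| New feature rollout speed | Fastest | Delayed | Delayed | Delayed |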

New features (such as Extended Thinking) roll out to the Anthropic API first; Vertex AI, Bedrock, and Azure typically follow weeks later.

Which Should You Choose?

  • Simple setup / prototyping: Anthropic API requires just one API key and gets new features first.
  • Deep AWS integration: If you need IAM, CloudWatch, or VPC, Bedrock is the natural choice. Tokyo region supported.
  • Deep Google Cloud integration: Vertex AI fits right in. Note the 10% surcharge on regional endpoints.
  • Deep Azure integration: Available via Azure AI Foundry, integrated with Azure billing and management.
  • Heavy batch workloads: Bedrock, Vertex AI, and the Anthropic API all offer 50% off batch pricing.


Why Bold Text Uses Gothic (Sans-serif) Font in Japanese Typography

Even when body text uses Mincho (serif) typeface, bold text typically renders in Gothic (sans-serif). This is an intentional design choice rooted in readability, visual clarity, and historical printing conventions.

Readability

Simply bolding Mincho type creates a severe contrast between the serifs and main strokes, causing the counters (the enclosed or partially enclosed spaces within a character) to collapse — especially at small sizes on screens. The result is dense, hard-to-read text.

Gothic type lacks serifs entirely, so increasing the weight preserves counters and maintains legibility.

Visual Clarity

Switching font families entirely sends a clearer "this is emphasized" signal to the human visual system than merely darkening the same typeface. The contrast between Mincho body text and Gothic bold is immediately recognizable as intentional emphasis.

Historical Convention

In traditional Japanese letterpress printing, emphasis was expressed by switching typefaces rather than increasing weight. Casting heavier Mincho type posed physical constraints, so Gothic — a structurally distinct face — was used for emphasis instead. This convention carried over into digital typesetting and the web.

CSS Implementation

When using Mincho as the base typeface on the web, switching to Gothic for bold elements is straightforward:

body {
  font-family: "Noto Serif JP", serif;
}

strong, b, h1, h2, h3, h4, h5, h6 {
  font-family: "Noto Sans JP", sans-serif;
  font-weight: 700;
}

This approach also has a practical advantage: only two weights need to be loaded — Regular Mincho for body text and Bold Gothic for emphasis — reducing font delivery overhead.

Summary

Three reasons Japanese bold text uses Gothic type:

  1. Readability: Gothic maintains counters at higher weights, keeping text legible
  2. Visual clarity: A typeface switch communicates emphasis more effectively than weight alone
  3. Convention: The letterpress-era rule of "use a different face for emphasis" carried over into digital and web typography

This combination is rational from both a design and web performance perspective.

How I Achieved Near-Perfect Lighthouse Scores on a Docusaurus Blog — SEO, Performance & Accessibility

I improved this blog's mobile Lighthouse scores to Performance 99, Accessibility 100, Best Practices 100, and SEO 100. Here's what I did, broken down into SEO, performance, and accessibility improvements.

Problems Before the Improvements

Running Lighthouse on the Docusaurus blog revealed several issues:

  • SEO: No meta description, no OGP/Twitter Cards, no structured data, no sitemap priority
  • Performance: Synchronous Google Tag Manager (GTM) loading causing large unused JS, external CDN avatar fetching as a bottleneck
  • Accessibility: Primary color contrast ratio failing WCAG AA requirements

SEO Improvements

Adding meta description, OGP & Twitter Cards

I added default site-wide metadata to themeConfig.metadata in docusaurus.config.ts:

themeConfig: {
  metadata: [
    { name: 'description', content: "Hikari's tech notebook..." },
    { property: 'og:locale', content: 'ja_JP' },
    { name: 'twitter:card', content: 'summary_large_image' },
    { name: 'twitter:site', content: '@ptrqr' },
  ],
}

I also swizzled src/theme/Layout/index.tsx to provide locale-specific fallback descriptions for pages without their own description (blog listing, tag pages, etc.).

Adding robots.txt

Added static/robots.txt to explicitly point crawlers to the sitemap.

BlogPosting JSON-LD (Structured Data)

I initially swizzled src/theme/BlogPostItem/index.tsx to output BlogPosting JSON-LD on article pages with headline, datePublished, dateModified, and author.

Later I discovered that Docusaurus's built-in BlogPostPage/StructuredData already outputs equivalent data. I removed the custom JSON-LD and instead added a keywords fallback (frontMatter.keywords, falling back to tags) to the built-in component. Duplicate structured data can hurt SEO, so this cleanup was important.

WebSite JSON-LD

Added WebSite type JSON-LD in docusaurus.config.ts's headTags to help Google correctly identify the site name.

headTags: [
  {
    tagName: 'script',
    attributes: { type: 'application/ld+json' },
    innerHTML: JSON.stringify({
      '@context': 'https://schema.org',
      '@type': 'WebSite',
      name: 'ひかりの備忘録',
      url: 'https://www.hikari-dev.com/',
    }),
  },
],

Auto-Generated OGP Images for All Posts

Created scripts/generate-ogp.js to automatically generate OGP images with tag-based gradient backgrounds. This ensures every post has an eye-catching image when shared on social media. All posts now have an image: field in their frontmatter.

Sitemap Improvements

Used the createSitemapItems callback to set the homepage priority to 1.0 and blog posts to 0.8. Also added automatic lastmod extraction from the date in each URL.
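
The per-item logic might look roughly like the sketch below. This is not the blog's actual code: the function name decorateSitemapItem is hypothetical, and it assumes blog post URLs have the form /blog/YYYY/MM/DD/slug and that each item carries an absolute url.

```javascript
// Adjusts one sitemap item: homepage gets priority 1.0, dated blog posts get 0.8
// and a lastmod derived from the date in the URL (assumed /blog/YYYY/MM/DD/slug).
function decorateSitemapItem(item, siteUrl) {
  const path = new URL(item.url).pathname;
  const isHome = item.url.replace(/\/$/, "") === siteUrl.replace(/\/$/, "");
  const dated = path.match(/\/blog\/(\d{4})\/(\d{2})\/(\d{2})\//);
  return {
    ...item,
    priority: isHome ? 1.0 : dated ? 0.8 : item.priority,
    lastmod: dated ? `${dated[1]}-${dated[2]}-${dated[3]}` : item.lastmod,
  };
}

const site = "https://www.hikari-dev.com/";
console.log(decorateSitemapItem({ url: site, priority: 0.5 }, site));
console.log(decorateSitemapItem({ url: `${site}blog/2026/04/02/example`, priority: 0.5 }, site));
console.log(decorateSitemapItem({ url: `${site}blog/tags/aws`, priority: 0.5 }, site));
```

Inside createSitemapItems, a function like this could be mapped over the items returned by the default generator that the callback receives.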

hreflang x-default

In src/theme/Root.tsx, I inject an hreflang="x-default" <link> tag on every page, mapping English pages (/en/...) back to the default (Japanese) URL. This helps search engines correctly identify language variants.

const defaultPath = pathname.replace(/^\/en(?=\/|$)/, '') || '/';
const xDefaultUrl = `${siteConfig.url}${defaultPath}`;

<Head>
  <link rel="alternate" hreflang="x-default" href={xDefaultUrl} />
</Head>

Performance Improvements

Lazy-Loading GTM

I replaced @docusaurus/plugin-google-gtag with a custom src/clientModules/gtag.js that dynamically injects the GTM script after the window.load event. This significantly reduced unused JS blocking initial render.

function loadGtag() {
  const script = document.createElement('script');
  script.async = true;
  script.src = `https://www.googletagmanager.com/gtag/js?id=${GA_ID}`;
  document.head.appendChild(script);
}

window.addEventListener('load', loadGtag, { once: true });

SPA page transitions use Docusaurus's onRouteDidUpdate hook to manually call window.gtag. A further improvement defers loading to requestIdleCallback for even better idle-time utilization.
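
A minimal sketch of the requestIdleCallback deferral could look like this. The helper name scheduleAfterLoadWhenIdle and the 3-second timeout are assumptions; the window object is passed in as a parameter so the function can be exercised outside a browser.

```javascript
// Runs `task` once the page has loaded and the browser is idle.
// Falls back to setTimeout where requestIdleCallback is unavailable.
function scheduleAfterLoadWhenIdle(win, task, timeoutMs = 3000) {
  const run = () => {
    if (typeof win.requestIdleCallback === "function") {
      // The timeout guarantees the task still runs on a busy page.
      win.requestIdleCallback(task, { timeout: timeoutMs });
    } else {
      win.setTimeout(task, 1);
    }
  };
  if (win.document.readyState === "complete") {
    run(); // load event already fired
  } else {
    win.addEventListener("load", run, { once: true });
  }
}

// In the browser: scheduleAfterLoadWhenIdle(window, loadGtag);
```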

Self-Hosting & WebP Avatar

Moved the avatar image from GitHub's CDN (avatars.githubusercontent.com) to self-hosted. GitHub CDN has a 5-minute cache TTL, which Lighthouse flagged on every run.

Converted the avatar to WebP format, reducing file size from 34 KB (PNG) to 3.5 KB — roughly a 90% reduction.

Image Size Optimization & CLS Fix

  • Added ?size=64 to the GitHub avatar URL, shrinking from 460 px to 64 px (saving 33 KB)
  • Added width/height attributes to the navbar logo to fix CLS (Cumulative Layout Shift)
  • Added loading="lazy" to <img> tags

rspack / SWC

Introduced @docusaurus/faster, replacing webpack with rspack + SWC + lightningCSS:

future: {
  v4: true,
  experimental_faster: true,
},

This improved both build speed and bundle size.

Disabling Unused Plugins

Disabled the unused docs plugin to prevent unnecessary JS from being shipped to clients.

Mobile-Only Google Fonts

Google Fonts (Noto Sans JP) was only needed on mobile. Using matchMedia, the font stylesheet is now dynamically injected only on mobile devices, saving approximately 130 KB of unused CSS on desktop.

Accessibility Improvements

Fixing Contrast Ratios

Changed the primary color from #F15EB4 to #C82273, achieving a contrast ratio of 5.3:1 against white (WCAG AA compliant). Dark mode uses #F36AB2 (7.0:1 against the dark background).

Post date text color is now managed via the --post-date-color CSS variable: #595959 (7.0:1) in light mode, #9e9e9e in dark mode.
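
The contrast ratios quoted above can be reproduced with the WCAG relative luminance formula. The following is a small sketch (function names are just for illustration) that checks the light-mode colors against white:

```javascript
// WCAG relative luminance of an sRGB color given as "#RRGGBB".
function relativeLuminance(hex) {
  const channels = hex.replace("#", "").match(/../g).map((h) => parseInt(h, 16) / 255);
  const [r, g, b] = channels.map((c) =>
    c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4
  );
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05).
function contrastRatio(colorA, colorB) {
  const [lighter, darker] = [relativeLuminance(colorA), relativeLuminance(colorB)]
    .sort((x, y) => y - x);
  return (lighter + 0.05) / (darker + 0.05);
}

console.log(contrastRatio("#C82273", "#FFFFFF").toFixed(1)); // 5.3 (new primary color)
console.log(contrastRatio("#595959", "#FFFFFF").toFixed(1)); // 7.0 (post date color)
console.log(contrastRatio("#F15EB4", "#FFFFFF") >= 4.5); // false (old primary color fails AA)
```

WCAG AA requires at least 4.5:1 for normal-size text, which is the threshold used in the last line.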

Font Unification

Changed heading and <strong> fonts from Noto Serif JP to Noto Sans JP for consistency with body text.

Results

| Category | Score |
| --- | --- |
| Performance | 99 |
| Accessibility | 100 |
| Best Practices | 100 |
| SEO | 100 |

Near-perfect scores on mobile.

Summary

The three most impactful changes were:

  1. Lazy-loading GTM: Dramatically reduced unused JS and boosted performance scores
  2. OGP & structured data: Achieved SEO 100 and improved social media sharing appearance
  3. Contrast ratio fixes: WCAG AA compliance brought accessibility to 100

Docusaurus generates high-quality sites by default, but achieving near-perfect Lighthouse scores requires fine-tuning GTM loading strategy, metadata, and accessibility details. I hope this helps others working on similar improvements.

How to Create Japanese PDFs with pLaTeX

A guide to creating Japanese PDF documents with pLaTeX — from document structure and templates to compilation and common error fixes.

What is pLaTeX?

pLaTeX is a LaTeX implementation designed for Japanese typesetting. It compiles .tex files with the platex command and converts them to PDF using dvipdfmx.

It is included in TeX Live and ready to use once installed. For installation instructions, see Installing TeX Live 2026 on Linux.

Choosing a Document Class

Use the following classes for Japanese documents.

| Class | Use case |
| --- | --- |
| jarticle | Short documents such as papers and reports |
| jbook | Long documents with chapter structure |
| jreport | Technical reports (abstract + chapter structure) |
| beamer | Presentations |

Template (jarticle)

A comprehensive sample using jarticle is shown below.

% pLaTeX sample document — covers major features
\documentclass[dvipdfmx,a4paper,12pt]{jarticle} % dvipdfmx: driver option inherited by graphicx, xcolor, geometry

%% ========== Packages ==========
\usepackage{amsmath, amssymb} % Enhanced math
\usepackage{graphicx} % Include figures
\usepackage{color} % Color support
\usepackage{fancyhdr} % Headers and footers
\usepackage{geometry} % Page layout
\usepackage{enumerate} % Custom list labels
\usepackage{url} % URL formatting
\usepackage{multicol} % Multi-column layout
\usepackage{booktabs} % High-quality table rules
\usepackage{array} % Extended table column formats
\usepackage{verbatim} % Verbatim environment
\usepackage{ascmac} % Box environments (screen, itembox, etc.)
\usepackage{okumacro} % Ruby, kenten, etc. (bundled with jsclasses)
\usepackage{setspace} % Line spacing control
\usepackage{listings} % Source code listings
\usepackage{xcolor} % Colors (for listings)
\usepackage{caption} % Caption formatting

%% ========== Page Layout ==========
\geometry{top=25mm, bottom=25mm, left=25mm, right=25mm}
\setlength{\headheight}{17pt}
\addtolength{\topmargin}{-5pt}

%% ========== Headers and Footers ==========
\pagestyle{fancy}
\fancyhf{}
\lhead{pLaTeX Sample Document}
\rhead{\today}
\cfoot{\thepage}
\renewcommand{\headrulewidth}{0.4pt}

%% ========== listings Settings ==========
\lstset{
basicstyle=\ttfamily\small,
keywordstyle=\color{blue}\bfseries,
commentstyle=\color{green!50!black},
stringstyle=\color{red!70!black},
numbers=left,
numberstyle=\tiny\color{gray},
frame=single,
breaklines=true,
tabsize=4,
}

%% ========== Title Information ==========
\title{\textbf{pLaTeX Feature Showcase}\\[5pt]
\large --- Typical usage of Japanese \LaTeX{} ---}
\author{John Doe\thanks{Department of Computer Science, Sample University}
\and Jane Doe}
\date{\today}

%% ================================================================
\begin{document}
%% ================================================================

\maketitle
\thispagestyle{fancy}

\begin{abstract}
This document demonstrates the major features of pLaTeX.
It covers document class and package loading, Japanese typesetting features,
math environments, tables, figures, cross-references, footnotes,
ruby (furigana), kenten (emphasis dots), source code listings,
multi-column layout, and custom commands.
\end{abstract}

\tableofcontents
\newpage

%% ================================================================
\section{Japanese Typesetting Basics}
%% ================================================================

\subsection{Mixing Japanese and Latin Text}

pLaTeX handles Japanese and Latin text naturally in the same document.
ASCII characters such as ``Hello, World!'' and \texttt{LaTeX2e}
are automatically spaced appropriately.

Spacing between full-width and half-width characters is adjusted automatically:
JapaneseEnglishJapanese, numbers 123 Japanese, symbol \% Japanese.

\subsection{Ruby (Furigana)}

The \verb|\ruby| command from the \texttt{okumacro} package adds ruby annotations.

\begin{center}
\ruby{漢字}{かんじ}\ruby{情報処理}{じょうほうしょり}
\ruby{自然言語}{しぜんげんご}\ruby{処理}{しょり}
\end{center}

\subsection{Kenten (Emphasis Dots)}

The \verb|\kenten| command from \texttt{okumacro} adds emphasis dots above characters.

\begin{center}
\kenten{Important text} can be emphasized with kenten dots.
\end{center}

\subsection{Font Sizes}

{\tiny tiny} {\scriptsize scriptsize} {\footnotesize footnotesize} {\small small}
{\normalsize normalsize} {\large large} {\Large Large} {\LARGE LARGE}
{\huge huge} {\Huge Huge}

\subsection{Text Decoration}

\begin{itemize}
\item \textbf{Bold}
\item \textit{Italic}
\item \textsl{Slanted}
\item \textsc{Small Caps}
\item \texttt{Typewriter (monospace)}
\item \underline{Underlined text}
\item \textcolor{red}{Red text}
\item \textcolor{blue!70!black}{Blue text}
\item \colorbox{yellow}{Highlighted background}
\end{itemize}

%% ================================================================
\section{Document Structure}
%% ================================================================

\subsection{Heading Levels}

\texttt{jarticle} supports the following heading levels:
\verb|\section|, \verb|\subsection|, \verb|\subsubsection|,
\verb|\paragraph|, and \verb|\subparagraph|.

\subsubsection{Subsubsection Example}
This is a subsubsection.

\paragraph{Paragraph Heading}
This is an example of a paragraph heading. Body text follows without indentation.

\subsection{Footnotes}

Footnotes\footnote{This is the footnote text. It is placed automatically at the bottom of the page.}
can be added inline. Multiple footnotes\footnote{A second footnote. Numbers are assigned automatically.}
are supported.

\subsection{Cross-references}

Adding \verb|\label| allows referencing with \verb|\ref| and \verb|\pageref|.
Example: the next section is Section~\ref{sec:math} (page~\pageref{sec:math}).

%% ================================================================
\section{Math Environments}
\label{sec:math}
%% ================================================================

\subsection{Inline and Display Math}

Inline math: $E = mc^2$, $\alpha + \beta = \gamma$,
$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$.

Display math:
\[
\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}
\]

\subsection{equation and align Environments}

Numbered equation (\texttt{equation}):
\begin{equation}
\label{eq:euler}
e^{i\pi} + 1 = 0
\end{equation}
Equation~\eqref{eq:euler} is Euler's identity.

Multi-line alignment (\texttt{align}):
\begin{align}
(a+b)^2 &= a^2 + 2ab + b^2 \\
(a-b)^2 &= a^2 - 2ab + b^2 \\
(a+b)(a-b) &= a^2 - b^2
\end{align}

\subsection{Matrices and Vectors}

\begin{equation}
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},\quad
\det(A) = a_{11}a_{22} - a_{12}a_{21}
\end{equation}

\subsection{Cases, Fractions, and Limits}

\begin{equation}
f(x) = \begin{cases}
x^2 & (x \geq 0) \\
-x^2 & (x < 0)
\end{cases}, \qquad
\lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n = e
\end{equation}

%% ================================================================
\section{Tables (tabular Environment)}
%% ================================================================

\subsection{Basic Table}

\begin{table}[h]
\centering
\caption{Temperature Data by City (sample)}
\label{tab:temp}
\begin{tabular}{lrrr}
\toprule
City & High (°C) & Low (°C) & Avg (°C) \\
\midrule
Tokyo & 35.2 & 25.1 & 29.8 \\
Osaka & 36.4 & 26.3 & 31.0 \\
Sapporo & 28.7 & 19.5 & 23.9 \\
Naha & 32.1 & 27.4 & 29.8 \\
\bottomrule
\end{tabular}
\end{table}

\subsection{Complex Table (\texttt{array} Extension)}

\begin{table}[h]
\centering
\caption{Programming Language Comparison}
\label{tab:lang}
\begin{tabular}{|l|c|c|c|}
\hline
Language & Typing & Paradigm & Main Use \\
\hline\hline
Python & Dynamic & Multi & AI/Data Science \\
Rust & Static & Systems & Systems Programming \\
Haskell & Static & Functional & Research/Finance \\
JavaScript & Dynamic & Multi & Web Frontend \\
\hline
\end{tabular}
\end{table}

Tables~\ref{tab:temp} and~\ref{tab:lang} can be referenced using \verb|\label/\ref|.

%% ================================================================
\section{List Environments}
%% ================================================================

\subsection{Bullet List (itemize)}

\begin{itemize}
\item First item
\item Second item
\begin{itemize}
\item Nested item A
\item Nested item B
\end{itemize}
\item Third item
\end{itemize}

\subsection{Numbered List (enumerate)}

\begin{enumerate}[(1)] % Custom label format via enumerate package
\item First step
\item Next step
\item Final step
\end{enumerate}

\subsection{Description List (description)}

\begin{description}
\item[pLaTeX] Japanese-capable \LaTeX{} engine
\item[upLaTeX] Unicode-aware successor to pLaTeX
\item[LuaLaTeX] \LaTeX{} engine with Lua scripting support
\end{description}

%% ================================================================
\section{Verbatim and Source Code}
%% ================================================================

\subsection{verbatim Environment}

\begin{verbatim}
#include <stdio.h>
int main(void) {
    printf("Hello, pLaTeX!\n");
    return 0;
}
\end{verbatim}

\subsection{Syntax Highlighting with listings}

\begin{lstlisting}[language=Python, caption={Fibonacci sequence (Python)}]
def fibonacci(n: int) -> int:
    """Compute Fibonacci number recursively."""
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

# Print the first 10 terms
for i in range(10):
    print(f"F({i}) = {fibonacci(i)}")
\end{lstlisting}

%% ================================================================
\section{Box Environments}
%% ================================================================

The \texttt{ascmac} package provides \texttt{itembox}, \texttt{screen}, and similar environments.

\begin{itembox}[l]{Key Points}
\begin{itemize}
\item Use \texttt{jarticle} / \texttt{jbook} document classes for Japanese
\item \texttt{okumacro} provides ruby and kenten
\item Generate PDFs with \texttt{platex + dvipdfmx}
\end{itemize}
\end{itembox}

\vspace{5pt}

\begin{screen}
\texttt{screen} environment: a terminal-style box, useful for showing command output or code examples.
\end{screen}

%% ================================================================
\section{Multi-Column Layout}
%% ================================================================

The \texttt{multicol} package enables mid-document column switching.

\begin{multicols}{2}
\noindent
This is the left column of a two-column layout.
pLaTeX handles Japanese text in multi-column layouts naturally.
Long text is distributed evenly across columns automatically,
making it suitable for newspaper or magazine-style typesetting.

\columnbreak

\noindent
This is the right column.
Use \verb|\columnbreak| to force a column break.
Inline math such as $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$
works as expected.
\end{multicols}

%% ================================================================
\section{Spacing Adjustments}
%% ================================================================

\subsection{Horizontal Space}

Word spacing: a\,b (\verb|\,| thin space),
a\enspace b (\verb|\enspace|),
a\quad b (\verb|\quad|),
a\qquad b (\verb|\qquad|).

\subsection{Vertical Space}

Use \verb|\vspace| to insert vertical space.

\vspace{5mm}
Text after a 5mm vertical space.
\vspace{5mm}

\subsection{Line Spacing}

{\setstretch{1.8}
This paragraph uses \texttt{setstretch} from the \texttt{setspace} package
to set line spacing to 1.8x.
A spacing of 1.5 to 2.0 is generally considered readable for Japanese text.
\par} % End the paragraph inside the group so the stretched line spacing applies

%% ================================================================
\section{Custom Commands and Environments}
%% ================================================================

\subsection{Defining Custom Commands}

\newcommand{\R}{\mathbb{R}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\highlight}[1]{\colorbox{yellow!60}{#1}}
\newcommand{\term}[1]{\textbf{\textit{#1}}} % Term emphasis

Examples: $\R$ (real numbers), $\N$ (natural numbers).
\highlight{Highlighted text}.
\term{Machine learning} is a subfield of artificial intelligence.

\subsection{Defining Custom Environments}

\newenvironment{mybox}[1]{%
\begin{center}\begin{tabular}{|p{0.85\linewidth}|}
\hline\vspace{1pt}
\textbf{#1}\\[2pt]
}{%
\\\hline\end{tabular}\end{center}
}

\begin{mybox}{Custom Box}
Use \verb|\newenvironment| to define reusable environments.
They can accept arguments for flexible customization.
\end{mybox}

%% ================================================================
\section{References}
%% ================================================================

Cite references with the \verb|\cite| command~\cite{knuth1984,lamport1994,okumura2020}.
The reference list is output at the end of the document.

%% ================================================================
%% Reference list (thebibliography environment)
%% ================================================================
\begin{thebibliography}{99}

\bibitem{knuth1984}
D.~E. Knuth,
\textit{The \TeX book},
Addison-Wesley, 1984.

\bibitem{lamport1994}
L.~Lamport,
\textit{\LaTeX: A Document Preparation System}, 2nd ed.,
Addison-Wesley, 1994.

\bibitem{okumura2020}
H.~Okumura and Y.~Kuroki,
\textit{\LaTeXe{} Beautiful Document Creation Guide}, 8th ed.,
Gijutsu-Hyoronsha, 2020.

\end{thebibliography}

\end{document}

Compilation

Basic (platex + dvipdfmx)

Run platex twice to resolve cross-references and the table of contents, then convert the resulting DVI file to PDF with dvipdfmx.

platex -interaction=nonstopmode -kanji=utf8 document.tex
platex -interaction=nonstopmode -kanji=utf8 document.tex
dvipdfmx document.dvi
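
If you prefer to drive the same three steps from a script, the sequence can be sketched in Python as below. This is only an illustration: the function names are my own, and it assumes platex and dvipdfmx are on PATH and that the .dvi file is written to the current directory.

```python
import subprocess
from pathlib import Path


def build_commands(tex_file: str) -> list[list[str]]:
    """Return the platex, platex, dvipdfmx sequence for one document."""
    stem = Path(tex_file).stem
    platex = ["platex", "-interaction=nonstopmode", "-kanji=utf8", tex_file]
    # Two platex passes resolve cross-references and the table of contents.
    return [platex, platex, ["dvipdfmx", f"{stem}.dvi"]]


def build(tex_file: str) -> None:
    """Run each step in order; check=True stops at the first failing command."""
    for cmd in build_commands(tex_file):
        subprocess.run(cmd, check=True)


# Show what would be executed without actually running anything.
for cmd in build_commands("document.tex"):
    print(" ".join(cmd))
```

Calling build("document.tex") runs the commands for real. Keeping the command list separate from the runner makes it easy to inspect or log the steps first.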

ptex2pdf

ptex2pdf combines the platex + dvipdfmx two-step process into a single command.

Basic Command

ptex2pdf -l -ot "-kanji=utf8 -interaction=nonstopmode" document.tex

| Option | Meaning |
| --- | --- |
| -l | Use latex-based engine (platex) |
| -ot "..." | Options passed to platex |
| -kanji=utf8 | Set input encoding to UTF-8 |
| -interaction=nonstopmode | Continue processing without stopping on errors |

Steps

  1. Run ptex2pdf twice (to resolve cross-references and table of contents)

    ptex2pdf -l -ot "-kanji=utf8 -interaction=nonstopmode" document.tex
    ptex2pdf -l -ot "-kanji=utf8 -interaction=nonstopmode" document.tex
  2. Check the result

    • Success: document.pdf is generated
    • Error: Check lines starting with ! in the .log file and the 3 surrounding lines
    • Warnings only: Check for Overfull/Underfull hbox, undefined references, etc.
  3. When BibTeX is needed (when a .bib file exists or \bibliography is used)

    ptex2pdf -l -ot "-kanji=utf8 -interaction=nonstopmode" document.tex
    bibtex document
    ptex2pdf -l -ot "-kanji=utf8 -interaction=nonstopmode" document.tex
    ptex2pdf -l -ot "-kanji=utf8 -interaction=nonstopmode" document.tex

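    Running ptex2pdf twice after bibtex resolves the cross-references between the reference list and in-text citations. If the bibliography contains Japanese entries or uses a Japanese style such as jplain, run pbibtex instead of bibtex.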
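
Instead of always compiling a fixed number of times, a script can look at the log and repeat only while LaTeX asks for another pass. The sketch below assumes the log contains the standard "Rerun to get cross-references right" warning. compile_once is a hypothetical callable that runs one pass and returns the log text.

```python
from typing import Callable


def needs_rerun(log_text: str) -> bool:
    """True if the log asks for another pass to settle labels or the TOC."""
    return "Rerun to get" in log_text


def compile_until_stable(compile_once: Callable[[], str], max_passes: int = 4) -> int:
    """Call compile_once until no rerun is requested; return the pass count.

    max_passes caps the loop so a document that never settles cannot spin forever.
    """
    for passes in range(1, max_passes + 1):
        if not needs_rerun(compile_once()):
            return passes
    return max_passes


# Stand-in for real compilation: the first pass asks for a rerun, the second is clean.
fake_logs = iter([
    "LaTeX Warning: Label(s) may have changed. Rerun to get cross-references right.",
    "Output written on document.dvi (3 pages).",
])
print(compile_until_stable(lambda: next(fake_logs)))
```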

Common Errors and Fixes

Undefined control sequence

! Undefined control sequence.
l.42 \somecommand

An undefined command is used. The cause is usually a missing package or a typo.

Missing $ inserted

! Missing $ inserted.

A math symbol such as _ (subscript) or ^ (superscript) is used outside math mode. Wrap the expression with $...$.

File not found

! LaTeX Error: File `image.png' not found.

The path specified in \includegraphics is incorrect. Verify the path relative to the .tex file.

Overfull \hbox

Overfull \hbox (12.3pt too wide) in paragraph at lines 55--60

This is a warning, not an error. A line does not fit within the specified width. Long URLs or words are often the cause; insert \allowbreak at a suitable break point, or wrap URLs in \url{} from the url package.

Checking the Log

The .log file generated after compilation contains detailed error and warning information. Lines starting with ! are errors; lines containing Warning are warnings.

grep -n "^!" document.log # Extract error lines
grep -n "Warning" document.log # Extract warning lines
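
The same checks can be done in Python when you also want the lines that follow each error, such as the l.42 line showing where it occurred. This is a rough sketch with function names of my own; the sample log is made up for illustration.

```python
def find_errors(log_text: str, context: int = 3) -> list[str]:
    """Return each line starting with '!' plus the next `context` lines."""
    lines = log_text.splitlines()
    blocks = []
    for i, line in enumerate(lines):
        if line.startswith("!"):
            blocks.append("\n".join(lines[i:i + 1 + context]))
    return blocks


def find_warnings(log_text: str) -> list[str]:
    """Return every line that contains the word 'Warning'."""
    return [line for line in log_text.splitlines() if "Warning" in line]


# Made-up log excerpt for illustration only.
sample_log = """\
(./document.tex
! Undefined control sequence.
l.42 \\somecommand

LaTeX Warning: Reference `fig:x' undefined on input line 57.
"""

for block in find_errors(sample_log):
    print(block)
    print("-" * 20)
print(find_warnings(sample_log))
```

For a real build, read the file first, for example with open("document.log", errors="replace").read(), since logs can contain bytes that are not valid UTF-8.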

Cleaning Up Intermediate Files

Compilation generates many intermediate files. To clean up while keeping the PDF:

rm -f document.aux document.log document.dvi \
document.toc document.lof document.lot \
document.out document.bbl document.blg \
document.synctex.gz

To remove everything including the PDF, add document.pdf to the list.
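
For repeated use, the cleanup can be wrapped in a small helper. The sketch below covers the same extensions as the rm command above. The function name and the remove_pdf flag are my own choices.

```python
from pathlib import Path

# Same intermediate files as the rm command.
INTERMEDIATE_SUFFIXES = (
    ".aux", ".log", ".dvi", ".toc", ".lof", ".lot",
    ".out", ".bbl", ".blg", ".synctex.gz",
)


def clean(stem: str, directory: str = ".", remove_pdf: bool = False) -> list[str]:
    """Delete intermediate files for one document; return the names removed."""
    suffixes = INTERMEDIATE_SUFFIXES + ((".pdf",) if remove_pdf else ())
    removed = []
    for suffix in suffixes:
        path = Path(directory) / f"{stem}{suffix}"
        if path.is_file():  # skip files that were never generated
            path.unlink()
            removed.append(path.name)
    return removed
```

clean("document") keeps the PDF, and clean("document", remove_pdf=True) removes it as well. The .tex source is never touched because its extension is not in the list.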

Installing TeX Live 2026 on Linux

A step-by-step guide to installing TeX Live 2026 from an ISO image on RHEL-based Linux distributions.

Prerequisites

  • sudo privileges
  • At least 15 GB of free disk space (ISO ~6.4 GB + installation ~8 GB, both on disk at the same time)
  • Internet connection

Step 1: Download the ISO Image

Download the ISO from the RIKEN mirror site.

cd
curl -C - -O --progress-bar https://ftp.riken.jp/CTAN/systems/texlive/Images/texlive2026.iso

The -C - option resumes the download if it is interrupted.

After the download completes, verify the file size (~6.4 GiB):

ls -lh ~/texlive2026.iso
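
File size alone does not prove the download is intact. CTAN image directories normally publish a checksum file next to each ISO, so comparing SHA-512 digests is a stronger check. The sketch below assumes you have downloaded such a file (the name texlive2026.iso.sha512 is my assumption) and that it uses the usual "digest, then filename" layout.

```python
import hashlib


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a multi-GB ISO never sits in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, checksum_line: str) -> bool:
    """checksum_line is in sha512sum format: '<hex digest>  <filename>'."""
    expected = checksum_line.split()[0].lower()
    return sha512_of(path) == expected
```

Usage would look like matches_checksum("texlive2026.iso", open("texlive2026.iso.sha512").read()), run from the directory holding both files.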

Step 2: Mount the ISO Image

Create a mount point and mount the ISO as a loop device.

sudo mkdir -p /mnt/texlive
sudo mount -o loop,ro ~/texlive2026.iso /mnt/texlive

Verify the mount:

ls /mnt/texlive

You should see output like the following:

install-tl archive tlpkg README ...

Step 3: Run the Installer

sudo /mnt/texlive/install-tl

A text-based interactive menu will launch.

Key Controls

| Key | Action |
| --- | --- |
| S | Select installation scheme |
| D | Change installation directory |
| I | Start installation |
| Q | Quit (cancel) |

  • Scheme: scheme-full (includes all packages; contains collection-langjapanese required for Japanese LaTeX)
  • Installation directory: /usr/local/texlive/2026 (default)

After confirming the settings, press I to start the installation. A full installation typically takes tens of minutes.

Step 4: Configure PATH

Add the TeX Live binary path to ~/.bashrc.

echo 'export PATH="/usr/local/texlive/2026/bin/x86_64-linux:$PATH"' >> ~/.bashrc
source ~/.bashrc

Step 5: Fix Locale (RHEL-based Systems)

On RHEL-based Linux, lualatex may fail with an error if the locale is not configured. Run the following:

sudo dnf install -y glibc-langpack-en

Also set the locale on shell startup:

echo 'export LANG=C.UTF-8' >> ~/.bashrc
source ~/.bashrc

Step 6: Verify Installation

Check the version of each command:

tex --version
lualatex --version
platex --version

Expected output (exact version numbers depend on the release and may differ):

TeX 3.141592653 (TeX Live 2026)
LuaHBTeX, Version 1.24.0 (TeX Live 2026)
e-upTeX 3.141592653-p4.1.2-u2.02 (TeX Live 2026)
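
If a command is reported as "not found", the PATH setting from Step 4 is the usual suspect. A quick way to see which commands are missing is sketched below. The helper name is my own.

```python
import shutil


def missing_commands(names: list[str]) -> list[str]:
    """Return the commands that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


# An empty list means all three commands are reachable.
print(missing_commands(["tex", "lualatex", "platex"]))
```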

Step 7: Test Japanese Compilation

Create a test file:

cat > /tmp/test.tex << 'EOF'
\documentclass{jlreq}
\begin{document}
日本語のテスト。TeX Live 2026 による日本語組版のサンプルです。
\end{document}
EOF

Compile it:

cd /tmp && lualatex test.tex

If test.pdf is generated, the installation is successful.

Step 8: Cleanup (Optional)

After installation, you can remove the ISO and mount point:

sudo umount /mnt/texlive
sudo rmdir /mnt/texlive
rm ~/texlive2026.iso

Mirror Sites

If the download is slow, try another mirror:

| Mirror | URL |
| --- | --- |
| RIKEN (recommended) | https://ftp.riken.jp/CTAN/systems/texlive/Images/ |
| JAIST | https://ftp.jaist.ac.jp/pub/CTAN/systems/texlive/Images/ |
| Yamagata University | https://ftp.yz.yamagata-u.ac.jp/pub/CTAN/systems/texlive/Images/ |
| CTAN mirror | https://mirror.ctan.org/systems/texlive/Images/ |

What Are AI Agent Skills? How They Work, Explained Simply

Adding "skills" to an AI agent lets you extend its capabilities, just like installing a plugin for an app. This article explains how Agent Skills work and what an agent actually does internally when using them.

What Is an AI Agent?

First, an AI agent is an AI program that receives instructions and autonomously completes tasks.

Unlike a simple AI that just answers questions (like ChatGPT in basic use), an agent can:

  • Read and write files
  • Execute code and check results
  • Call external APIs and tools
  • Make decisions across multiple steps on its own

What Are Skills?

Agent Skills is a mechanism for giving agents new abilities and domain knowledge.

Think of it like handing a new employee a work manual. Once the agent reads the manual (the skill), it understands how to approach that task correctly.

Without skills: "Write a blog post" → Agent writes something generic
With skills: "Write a blog post" → Agent follows the manual and produces consistent, quality output

Skills are primarily written as Markdown files (SKILL.md) and can include:

  • Step-by-step procedures: What to do and in what order
  • Scripts: Automatable processes
  • Samples and config: Resources for the agent to reference

Why Are Skills Needed?

AI agents are extremely capable, but they don't know anything specific about your project.

For example:

  • "How does this team write commit messages?"
  • "What frontmatter format does this blog use?"
  • "Which commands are used for deployment?"

Without skills, agents can't know any of this. Skills let agents understand "the right way to do things" before acting.

How an Agent Processes a Skill

Let's look at what happens inside the agent. The key steps are:

1. Loading the Skill

The agent reads the skill at the start. The skill content is passed as part of the LLM's input (prompt). The LLM reads this and understands "the right approach for this task."

2. Breaking Down the Task

Based on the instructions, the LLM breaks the task into smaller steps: "Read 3 existing posts first," "then decide on a filename," "then write the frontmatter," and so on.

3. Calling Tools

At each step, the agent calls tools as needed — reading files, searching the web, executing code — following the procedure defined by the skill.

4. Feeding Back Results

Tool results are passed back to the LLM. The LLM looks at the results and decides what to do next, looping until the task is complete.
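
The four steps can be condensed into a toy loop. This is a simplified sketch, not how any particular agent is implemented: the message format, the action dictionaries, and the scripted_llm stand-in (which replaces a real model) are all hypothetical.

```python
from typing import Callable


def run_agent(
    skill_text: str,
    task: str,
    llm: Callable[[list[str]], dict],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> str:
    """Ask the LLM for the next action, run the tool, feed the result back."""
    # 1. Loading the skill: its text becomes part of the prompt.
    messages = [f"SKILL:\n{skill_text}", f"TASK:\n{task}"]
    for _ in range(max_steps):
        # 2. Breaking down the task: the LLM picks the next step.
        action = llm(messages)
        if action["type"] == "finish":
            return action["answer"]
        # 3. Calling tools.
        result = tools[action["tool"]](action["input"])
        # 4. Feeding back results so the next decision can use them.
        messages.append(f"RESULT of {action['tool']}: {result}")
    raise RuntimeError("step limit reached before the task finished")


def scripted_llm(messages: list[str]) -> dict:
    """Stand-in for a real model: read one file, then finish."""
    if not any(m.startswith("RESULT") for m in messages):
        return {"type": "tool", "tool": "read_file", "input": "posts/example.md"}
    return {"type": "finish", "answer": f"done after {len(messages) - 2} tool call(s)"}


tools = {"read_file": lambda path: f"<contents of {path}>"}
print(run_agent("Read existing posts before writing.", "Write a blog post", scripted_llm, tools))
```

The step limit is there because a real model may never decide it is finished. Without it, the loop could run indefinitely.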

Skill Commands

Skills can be invoked as slash commands (/command-name).

When a command is called, the corresponding Markdown file's content is expanded as a prompt, and the agent begins executing those steps.
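
Conceptually, the expansion is a file lookup followed by text substitution. The sketch below assumes each skill lives at <skills_dir>/<command-name>/SKILL.md. That layout and the "Arguments:" suffix are assumptions for illustration; actual tools may differ in where they look and how they pass arguments.

```python
from pathlib import Path


def expand_command(user_input: str, skills_dir: str) -> str:
    """Turn '/name args' into the text of the matching SKILL.md plus the args."""
    if not user_input.startswith("/"):
        return user_input  # ordinary message, nothing to expand
    name, _, args = user_input[1:].partition(" ")
    skill_file = Path(skills_dir) / name / "SKILL.md"  # hypothetical layout
    if not skill_file.is_file():
        raise FileNotFoundError(f"no skill named {name!r}")
    prompt = skill_file.read_text(encoding="utf-8")
    return f"{prompt}\n\nArguments: {args}" if args else prompt
```

The returned string is what the agent receives as its prompt, so from the LLM's point of view a slash command is just a longer instruction.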

Skills Are Growing

The Agent Skills format was developed and open-sourced by Anthropic and is now supported by many tools:

  • Claude Code
  • GitHub Copilot
  • Cursor
  • Gemini CLI
  • OpenAI Codex
  • VS Code

The biggest advantage is that the same skill can be reused across different tools.

Summary

  • Skills are a mechanism for giving agents specialized knowledge and procedures
  • You can create one by writing steps and rules in a Markdown file (SKILL.md)
  • The agent receives the skill as a prompt; the LLM interprets it and executes each step
  • It's an open standard supported by Claude Code, Cursor, GitHub Copilot, and many more

With skills, you no longer have to explain the same things to your AI every time — agents can perform tasks with consistent quality, exactly the way you want.