Is your YAML hiding something?

Infrastructure teams use YAML everywhere:

But YAML has a subtle security and reliability problem: Unicode characters that look harmless to humans, or that are invisible altogether, can cause surprisingly dangerous behavior.

This can lead to a variety of problems, such as broken deployments and silently ignored fields.


The problem

A simple example

Consider this YAML:

apiVersion​: v1
kind: ConfigMap

Or, if you prefer the command line:

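# needs a printf that expands \u escapes (e.g. bash 4.2+)
printf 'apiVersion\u200b: v1\nkind: ConfigMap\n' > bad.yaml
cat bad.yaml   # the zero-width space does not show up in the output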

The key visually appears as:

apiVersion

But the parser actually sees:

apiVersion\u200b

If your Go struct is:

type Config struct {
    APIVersion string `yaml:"apiVersion"`
}

then unmarshaling returns no error, but the field is never populated:

Config{
    APIVersion: "",
}
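To see why the field stays empty, it helps to remember that keys are matched as exact strings. The sketch that follows uses a deliberately naive, hypothetical line parser (parseKeys is not part of any YAML library) so it runs with the standard library alone; real YAML decoders do far more, but they also compare keys character for character.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKeys is a hypothetical, naive parser for flat "key: value" lines.
// It is NOT a YAML parser; it only shows that keys are compared exactly.
func parseKeys(doc string) map[string]string {
	out := map[string]string{}
	for _, line := range strings.Split(doc, "\n") {
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // skip blank lines and lines without a colon
		}
		out[key] = strings.TrimSpace(value)
	}
	return out
}

func main() {
	m := parseKeys("apiVersion\u200b: v1\nkind: ConfigMap\n")

	_, found := m["apiVersion"]
	fmt.Println(found) // false: the stored key has a trailing U+200B

	// Same visible text, different byte lengths (U+200B is 3 bytes in UTF-8).
	fmt.Println(len("apiVersion"), len("apiVersion\u200b")) // 10 13

	// %+q escapes non-ASCII runes, which makes the hidden character visible.
	fmt.Printf("%+q\n", "apiVersion\u200b") // "apiVersion\u200b"
}
```

Printing suspicious keys with %+q (or any escaping formatter) is a cheap way to surface invisible characters in logs and error messages.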

Homoglyphs

Unicode also enables homoglyph attacks, which use visually similar characters from different scripts.

Character   Code point   Script
a           U+0061       Latin
а           U+0430       Cyrillic

This YAML:

аpiVersion: v1

contains a Cyrillic а, not a Latin a.

Humans may never notice the difference.
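A machine, however, can notice. One common heuristic is to flag identifiers that mix scripts, a simplified form of the mixed-script checks described in Unicode Technical Standard #39. The sketch below is an assumption-laden illustration: scriptsIn is a hypothetical helper, and it only looks at three scripts. A key written entirely in Cyrillic would not be "mixed", which is one reason an ASCII-only rule for keys is simpler to enforce.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
)

// scriptsIn reports which of a few scripts appear in s, sorted by name.
// The three scripts checked here are an illustrative choice, not a full list.
func scriptsIn(s string) []string {
	tables := map[string]*unicode.RangeTable{
		"Cyrillic": unicode.Cyrillic,
		"Greek":    unicode.Greek,
		"Latin":    unicode.Latin,
	}
	seen := map[string]bool{}
	for _, r := range s {
		for name, table := range tables {
			if unicode.Is(table, r) {
				seen[name] = true
			}
		}
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort for stable output
	return names
}

func main() {
	key := "\u0430piVersion" // Cyrillic а followed by Latin "piVersion"

	fmt.Println(key == "apiVersion") // false
	fmt.Println(scriptsIn(key))      // [Cyrillic Latin]: more than one script

	// Print the code point of every non-ASCII rune in the key.
	for _, r := range key {
		if r > unicode.MaxASCII {
			fmt.Printf("%c is %U\n", r, r) // а is U+0430
		}
	}
}
```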


Why this happens

Unicode and UTF-8 are related but different concepts.

Concept   Meaning
Unicode   abstract character set
UTF-8     byte encoding of Unicode

Most risky YAML files are:

Humans review visually, and parsers compare exact characters.

That mismatch creates room for confusion and spoofing.


The risky Unicode characters

Invisible formatting characters

Character                Code point   Risk
Zero-width space         U+200B       invisible key corruption
Zero-width joiner        U+200D       rendering tricks
Byte order mark (BOM)    U+FEFF       parser inconsistencies
Right-to-left override   U+202E       visual spoofing

Confusing whitespace

Character            Code point
Non-breaking space   U+00A0
Thin space           U+2009

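These are especially dangerous in YAML indentation. YAML only counts the ASCII space (U+0020) as indentation, so a non-breaking space that looks like indentation is read as part of the key or value instead, or causes a parse error.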
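Rather than hard-coding every code point from the two tables, a checker can lean on Unicode general categories: the invisible characters above are all in category Cf (format), and the lookalike spaces are in Zs (space separator). The sketch below assumes that grouping is what you want; classify and its labels are hypothetical. Note that it says nothing about homoglyphs such as Cyrillic а, which is an ordinary letter.

```go
package main

import (
	"fmt"
	"unicode"
)

// classify sorts a rune into the risk buckets from the two tables above,
// using Unicode general categories rather than a hand-written list.
func classify(r rune) string {
	switch {
	case r == ' ':
		return "ok" // the ordinary ASCII space is also category Zs
	case unicode.Is(unicode.Cf, r):
		return "invisible" // format characters: U+200B, U+200D, U+FEFF, U+202E, ...
	case unicode.Is(unicode.Zs, r):
		return "lookalike-space" // space separators: U+00A0, U+2009, ...
	default:
		return "ok"
	}
}

func main() {
	for _, r := range []rune{0x200B, 0x200D, 0xFEFF, 0x202E, 0x00A0, 0x2009, ' ', 0x0430} {
		fmt.Printf("%U %s\n", r, classify(r))
	}
}
```

Category-based rules age better than a fixed list: new format characters added to Unicode are picked up when the Go unicode tables are updated.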


Defending your CI/CD pipelines

A layered approach works best:

Lint YAML

# .yamllint
---
extends: default

rules:
  document-start: enable
  key-duplicates: enable
  trailing-spaces: enable

Scan for Unicode

# GNU grep (-P needs PCRE support); lists every non-ASCII character
grep -rPn '[^\x00-\x7F]' . --include='*.yaml' --include='*.yml'
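A Go version of the same scan can cover both .yaml and .yml files, report the line, column, and code point of each hit, and exit non-zero when anything is found, so it can gate a pipeline step directly. This is a sketch: scanText and Finding are hypothetical names, and like the grep pattern it flags every non-ASCII character, including legitimate ones in values.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
	"unicode"
)

// Finding is one non-ASCII character and its position (hypothetical type).
type Finding struct {
	File      string
	Line, Col int // both 1-based; Col counts runes, not bytes
	Rune      rune
}

// scanText flags every non-ASCII rune in text, like the grep pattern.
// Invalid UTF-8 bytes decode as U+FFFD and are flagged too.
func scanText(name, text string) []Finding {
	var out []Finding
	for i, line := range strings.Split(text, "\n") {
		col := 0
		for _, r := range line {
			col++
			if r > unicode.MaxASCII {
				out = append(out, Finding{File: name, Line: i + 1, Col: col, Rune: r})
			}
		}
	}
	return out
}

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	total := 0
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		ext := filepath.Ext(path)
		if d.IsDir() || (ext != ".yaml" && ext != ".yml") {
			return nil
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		for _, f := range scanText(path, string(data)) {
			fmt.Printf("%s:%d:%d: %U\n", f.File, f.Line, f.Col, f.Rune)
			total++
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	if total > 0 {
		os.Exit(1) // non-zero exit fails the CI step
	}
}
```

The exit code matters in CI: grep exits 0 when it does find a match, so a pipeline step built on grep has to invert its status (for example with `! grep ...`).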

Use strict unmarshaling

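decoder := yaml.NewDecoder(r) // gopkg.in/yaml.v3; r is any io.Reader
decoder.KnownFields(true)     // error on keys with no matching struct field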

Validate schemas

Validate against:


A reasonable Unicode policy

Not all Unicode is dangerous.

For example, these are legitimate:

serviceName: 東京-api
description: 宮本武蔵のAPI

A practical policy:

Area              Policy
YAML keys         ASCII-only
Identifiers       restricted
Values/comments   Unicode allowed
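The "keys are ASCII-only, values may use Unicode" rule can be approximated in a few lines. This is a rough, line-based sketch under stated assumptions: checkKeysASCII and Violation are hypothetical, and it is not a YAML parser (it ignores flow mappings, multi-line scalars, and quoted keys containing colons). It accepts the legitimate Japanese values above while rejecting the homoglyph, zero-width, and non-breaking-space keys.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Violation is a key that breaks the ASCII-only policy (hypothetical type).
type Violation struct {
	Line int
	Key  string
}

// checkKeysASCII flags keys containing any non-ASCII rune; values are not checked.
func checkKeysASCII(doc string) []Violation {
	var out []Violation
	for i, line := range strings.Split(doc, "\n") {
		// Strip ASCII-space indentation only: a non-breaking space
		// used as indentation stays in the key and gets flagged.
		trimmed := strings.TrimLeft(line, " ")
		trimmed = strings.TrimPrefix(trimmed, "- ") // block sequence item
		if trimmed == "" || strings.HasPrefix(trimmed, "#") {
			continue // blank line or comment
		}
		key, _, ok := strings.Cut(trimmed, ":")
		if !ok {
			continue // not a key/value line
		}
		for _, r := range key {
			if r > unicode.MaxASCII {
				out = append(out, Violation{Line: i + 1, Key: key})
				break
			}
		}
	}
	return out
}

func main() {
	doc := "serviceName: 東京-api\ndescription: 宮本武蔵のAPI\n\u0430piVersion: v1\n"
	for _, v := range checkKeysASCII(doc) {
		fmt.Printf("line %d: non-ASCII key %+q\n", v.Line, v.Key)
	}
	// line 3: non-ASCII key "\u0430piVersion"
}
```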

Using yuc

Alternatively, you could use yuc, a configurable tool for checking your YAML. It lets you define the rules for checking Unicode in your YAML and the severity of each.

Install

Install yuc with a single command:

curl -fsSL https://raw.githubusercontent.com/harshadixit12/yuc/refs/heads/main/install.sh | bash

Source repository:

https://github.com/harshadixit12/yuc

Example:

printf 'apiVersion\u00a0: v1\nkind: ConfigMap\n' > bad.yaml
yuc bad.yaml