Infrastructure teams use YAML everywhere.
But YAML has a subtle security and reliability problem: Unicode characters that look harmless to humans, or that are invisible altogether, can cause surprisingly dangerous behavior.
This can lead to broken deployments and silently ignored fields.
Consider this YAML, where an invisible zero-width space (U+200B) sits between `apiVersion` and the colon:
```yaml
apiVersion​: v1
kind: ConfigMap
```
Or, if you speak command line (this relies on bash's built-in `printf`, which understands `\u` escapes):
```bash
printf 'apiVersion\u200b: v1\nkind: ConfigMap\n' > bad.yaml
cat bad.yaml
```
The key visually appears as `apiVersion`, but the parser actually sees `apiVersion\u200b`.
If your Go struct is:
```go
type Config struct {
	APIVersion string `yaml:"apiVersion"`
}
```
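then unmarshaling returns no error, but the field is silently left empty, because no field is tagged with the key the parser actually saw:
```go
Config{
	APIVersion: "",
}
```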
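To see why the struct tag never matches, here is a minimal standard-library sketch rather than a real YAML parser. The hypothetical helper `splitKey` cuts a simple `key: value` line at the first colon, roughly the way a parser isolates a plain mapping key, and compares the result with the tag value.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// splitKey returns the text before the first colon of a simple
// "key: value" line. It is an illustration only, not a YAML parser.
func splitKey(line string) string {
	key, _, _ := strings.Cut(line, ":")
	return key
}

func main() {
	line := "apiVersion\u200b: v1" // zero-width space just before the colon
	key := splitKey(line)

	fmt.Println(key == "apiVersion")                   // false: the yaml tag cannot match
	fmt.Println(len(key), utf8.RuneCountInString(key)) // 13 11: bytes vs code points
	fmt.Printf("%q\n", key)                            // "apiVersion\u200b"
}
```

The zero-width space adds three bytes and one code point to a key that still looks exactly like `apiVersion` on screen.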
Unicode also enables homoglyph attacks, which rely on visually similar characters from different scripts:
| Character | Code point | Script |
|---|---|---|
| a | U+0061 | Latin |
| а | U+0430 | Cyrillic |
In this YAML, the key starts with a Cyrillic `а` (U+0430), not a Latin `a`:
```yaml
аpiVersion: v1
```
Humans may never notice the difference.
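A rough way to catch this kind of spoof is to flag identifiers that mix scripts. The sketch below uses hypothetical helpers `scriptsIn` and `isMixedScript` and checks only Latin, Cyrillic and Greek with the standard `unicode` tables. A production check would cover more scripts and known confusable characters.

```go
package main

import (
	"fmt"
	"unicode"
)

// scriptsIn reports which of a few scripts appear in s.
// Only Latin, Cyrillic and Greek are checked in this sketch.
func scriptsIn(s string) map[string]bool {
	tables := map[string]*unicode.RangeTable{
		"Latin":    unicode.Latin,
		"Cyrillic": unicode.Cyrillic,
		"Greek":    unicode.Greek,
	}
	found := map[string]bool{}
	for _, r := range s {
		for name, table := range tables {
			if unicode.Is(table, r) {
				found[name] = true
			}
		}
	}
	return found
}

// isMixedScript flags identifiers that combine letters from more than one
// of the checked scripts, the typical shape of a homoglyph spoof.
func isMixedScript(s string) bool {
	return len(scriptsIn(s)) > 1
}

func main() {
	fmt.Println(isMixedScript("apiVersion"))      // false: all Latin
	fmt.Println(isMixedScript("\u0430piVersion")) // true: Cyrillic а plus Latin
}
```

A key written entirely in Cyrillic would pass this check, which is one reason an ASCII-only rule for keys is both stricter and simpler.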
Unicode and UTF-8 are related but different concepts.
| Concept | Meaning |
|---|---|
| Unicode | abstract character set |
| UTF-8 | byte encoding of Unicode |
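The difference is easy to see in Go, where a `rune` holds a Unicode code point and a `string` holds UTF-8 bytes. This sketch uses a hypothetical `encode` helper to print each character's code point next to its UTF-8 encoding:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode formats one code point and the UTF-8 bytes that represent it.
func encode(r rune) string {
	buf := make([]byte, utf8.RuneLen(r))
	utf8.EncodeRune(buf, r)
	return fmt.Sprintf("%U -> % X", r, buf)
}

func main() {
	fmt.Println(encode('a'))      // U+0061 -> 61
	fmt.Println(encode('\u0430')) // U+0430 -> D0 B0 (Cyrillic а)
	fmt.Println(encode('\u200b')) // U+200B -> E2 80 8B (zero-width space)
}
```

The Cyrillic `а` and the zero-width space both become multi-byte sequences. That is why a byte-oriented scan for anything above `0x7F` can find them.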
Most risky YAML files are:
Humans review visually, and parsers compare exact characters.
That mismatch creates room for confusion and spoofing.
| Character | Code point | Risk |
|---|---|---|
| Zero-width space | U+200B | invisible key corruption |
| Zero-width joiner | U+200D | rendering tricks |
| Byte order mark (BOM) | U+FEFF | parser inconsistencies |
| Right-to-left override (RLO) | U+202E | visual spoofing |
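Here is a minimal sketch of a scanner for exactly the characters in this table, assuming the whole file fits in memory. `scan` and `Finding` are hypothetical names, and columns count code points rather than bytes.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// suspicious maps the characters listed in the table to readable names.
var suspicious = map[rune]string{
	'\u200b': "zero-width space",
	'\u200d': "zero-width joiner",
	'\ufeff': "byte order mark",
	'\u202e': "right-to-left override",
}

// Finding records where a suspicious character was seen (1-based positions).
type Finding struct {
	Line, Col int
	Name      string
}

// scan reports every occurrence of a suspicious character in text.
func scan(text string) []Finding {
	var out []Finding
	sc := bufio.NewScanner(strings.NewReader(text))
	for line := 1; sc.Scan(); line++ {
		col := 0
		for _, r := range sc.Text() {
			col++ // counts code points, not bytes
			if name, ok := suspicious[r]; ok {
				out = append(out, Finding{line, col, name})
			}
		}
	}
	return out
}

func main() {
	doc := "apiVersion\u200b: v1\nkind: ConfigMap\n"
	for _, f := range scan(doc) {
		fmt.Printf("%d:%d %s\n", f.Line, f.Col, f.Name) // 1:11 zero-width space
	}
}
```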
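Some Unicode whitespace also looks exactly like an ordinary space:
| Character | Code point |
|---|---|
| Non-breaking space | U+00A0 |
| Thin space | U+2009 |
These are especially dangerous in YAML indentation. YAML indentation may consist only of ASCII spaces (U+0020), so to the parser a lookalike space is not indentation at all.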
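YAML indentation may consist only of ASCII spaces (U+0020), so one simple check is to look at each line's leading whitespace and report anything else. This sketch uses a hypothetical `badIndent` helper. It also flags tabs, which YAML forbids in indentation, and it strips a trailing `\r` so that CRLF files are not reported.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// badIndent returns the 1-based line numbers whose leading indentation
// contains whitespace other than U+0020, e.g. a tab, U+00A0 or U+2009.
func badIndent(text string) []int {
	var lines []int
	for i, line := range strings.Split(text, "\n") {
		line = strings.TrimSuffix(line, "\r") // tolerate CRLF line endings
		for _, r := range line {
			if r == ' ' {
				continue // ordinary indentation
			}
			if unicode.IsSpace(r) {
				lines = append(lines, i+1) // lookalike space inside indentation
			}
			break // the first non-space character ends the indentation
		}
	}
	return lines
}

func main() {
	doc := "metadata:\n\u00a0\u00a0name: demo\n  namespace: default\n"
	fmt.Println(badIndent(doc)) // [2]
}
```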
A layered approach works best.
Lint with yamllint, for example using a `.yamllint` config file:
```yaml
---
extends: default
rules:
  document-start: enable
  key-duplicates: enable
  trailing-spaces: enable
```
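Scan for non-ASCII characters with grep (`-P` needs GNU grep with PCRE support):
```bash
grep -rPn '[^\x00-\x7F]' . --include='*.yaml' --include='*.yml'
```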
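In Go, decode strictly with `gopkg.in/yaml.v3`, so that a key like `apiVersion\u200b` with no matching struct field produces an error instead of being dropped:
```go
decoder := yaml.NewDecoder(r) // r is any io.Reader, e.g. an opened file
decoder.KnownFields(true)     // unknown keys now make Decode return an error
```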
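`KnownFields` needs the yaml.v3 module. The idea behind it can be sketched with the standard library alone, and this is not yaml.v3's implementation. The hypothetical `unknownKeys` collects the `yaml` struct tags with `reflect` and reports any parsed key that matches none of them. It assumes the keys have already been extracted by a parser, and it ignores untagged fields.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Config mirrors the struct from the example, plus a Kind field.
type Config struct {
	APIVersion string `yaml:"apiVersion"`
	Kind       string `yaml:"kind"`
}

// knownKeys collects the yaml tag names of a struct type, dropping options
// such as ",omitempty". Untagged fields are ignored in this sketch.
func knownKeys(t reflect.Type) map[string]bool {
	keys := map[string]bool{}
	for i := 0; i < t.NumField(); i++ {
		name, _, _ := strings.Cut(t.Field(i).Tag.Get("yaml"), ",")
		if name != "" && name != "-" {
			keys[name] = true
		}
	}
	return keys
}

// unknownKeys returns the keys that match no tagged field of v's struct type.
func unknownKeys(v any, keys []string) []string {
	known := knownKeys(reflect.TypeOf(v))
	var out []string
	for _, k := range keys {
		if !known[k] {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	// Keys as a parser would report them for the bad.yaml created earlier.
	keys := []string{"apiVersion\u200b", "kind"}
	for _, k := range unknownKeys(Config{}, keys) {
		fmt.Printf("unknown key %q\n", k)
	}
}
```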
Validate against:
Not all Unicode is dangerous. These values, for example, are legitimate:
```yaml
serviceName: 東京-api
description: 宮本武蔵のAPI
```
A practical policy:
| Area | Policy |
|---|---|
| YAML keys | ASCII-only |
| Identifiers | restricted |
| Values/comments | Unicode allowed |
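The key rule of this policy can be approximated with a line-based check. This sketch works under a strong assumption: the input is simple block-style YAML with unquoted keys. Flow mappings, quoted keys and multi-line scalars are not handled. The `nonASCIIKeys` and `isASCII` helpers are hypothetical, and a real tool should walk the parsed document instead.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// isASCII reports whether every rune in s is in the ASCII range.
func isASCII(s string) bool {
	for _, r := range s {
		if r > unicode.MaxASCII {
			return false
		}
	}
	return true
}

// nonASCIIKeys returns keys that break the "keys are ASCII-only" rule,
// while leaving values and comments free to use any Unicode.
func nonASCIIKeys(text string) []string {
	var bad []string
	for _, line := range strings.Split(text, "\n") {
		trimmed := strings.TrimLeft(line, " ")
		trimmed = strings.TrimPrefix(trimmed, "- ") // list item holding a mapping
		if trimmed == "" || strings.HasPrefix(trimmed, "#") {
			continue // blank line or comment: Unicode allowed
		}
		key, _, found := strings.Cut(trimmed, ":")
		if found && !isASCII(key) {
			bad = append(bad, key)
		}
	}
	return bad
}

func main() {
	doc := "serviceName: 東京-api\ndescription: 宮本武蔵のAPI\n\u0430piVersion: v1\n"
	// Only the key with the Cyrillic а is reported; the Japanese values pass.
	fmt.Printf("%q\n", nonASCIIKeys(doc))
}
```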
Alternatively, you could use yuc, a configurable tool for checking Unicode in YAML files.
It lets you define the checking rules and their severity.
Install yuc with a single command:
```bash
curl -fsSL https://raw.githubusercontent.com/harshadixit12/yuc/refs/heads/main/install.sh | bash
```
Source repository: https://github.com/harshadixit12/yuc
Example, this time with a non-breaking space (U+00A0) hidden in the key:
```bash
printf 'apiVersion\u00a0: v1\nkind: ConfigMap\n' > bad.yaml
yuc bad.yaml
```