CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project overview
A Java library providing ASCII-safe Charset SPI
implementations that transliterate Unicode to ASCII subsets rather than
simply rejecting non-ASCII input. Published to GitHub Packages as
com.maybeitssquid:ascii-safe-charsets.
Commands
./gradlew build # compile, run tests, spotless check
./gradlew test # tests only
./gradlew spotlessApply # auto-format Java source (required before commit)
./gradlew javadoc # generate Javadoc
# Run a single test class
./gradlew test --tests "com.maybeitssquid.safeascii.CacheTest"

Build versions are timestamped (1.0.0-YYYYMMDDHHMMSS);
this is intentional for snapshot publishing.
Architecture
The library wires together two subsystems: a Charset
implementation and a configurable transliteration pipeline.
Charset layer
- TransliteratingASCIIProvider: CharsetProvider SPI entry point, registered via src/main/resources/META-INF/services/java.nio.charset.spi.CharsetProvider. Provides four charsets lazily:
  - ASCII-Printable: strict printable ASCII (0x20–0x7E only, controls blocked)
  - ASCII-Plain: same, but allows LF and normalises CRLF to LF
  - X-Transliterating: aggressive Unicode-to-ASCII transliteration
  - X-Transliterating-Single-Byte (alias ACH): same, but guarantees 1:1 character output
- TransliteratingASCII: extends java.nio.charset.Charset. Takes an IntFunction<CharSequence> transliterator at construction; the encoder/decoder delegate all codepoint mapping to it.
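
To make the delegation concrete, here is a minimal sketch of a Charset whose encoder hands every code point to an IntFunction<CharSequence>. It is not the library's TransliteratingASCII: the class name SketchCharset, the helper encodeToAscii, the 8-bytes-per-char bound, the choice to emit nothing when the transliterator returns "", and the reuse of the US-ASCII decoder are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.function.IntFunction;

// Hypothetical stand-in for a transliterating charset; not the library's class.
final class SketchCharset extends Charset {
  private final IntFunction<CharSequence> transliterator;

  SketchCharset(String name, IntFunction<CharSequence> transliterator) {
    super(name, null); // no aliases
    this.transliterator = transliterator;
  }

  @Override
  public boolean contains(Charset cs) {
    return cs.equals(this);
  }

  @Override
  public CharsetDecoder newDecoder() {
    // Assumption: output is plain ASCII, so the JDK's ASCII decoder is reused.
    return StandardCharsets.US_ASCII.newDecoder();
  }

  @Override
  public CharsetEncoder newEncoder() {
    // Assumption: no code point expands to more than 8 ASCII bytes per char.
    return new CharsetEncoder(this, 1.0f, 8.0f) {
      @Override
      protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
        while (in.hasRemaining()) {
          int start = in.position();
          char c = in.get();
          int cp = c;
          if (Character.isHighSurrogate(c)) {
            if (!in.hasRemaining()) {
              in.position(start); // wait for the low surrogate
              return CoderResult.UNDERFLOW;
            }
            char low = in.get();
            if (!Character.isLowSurrogate(low)) {
              in.position(start);
              return CoderResult.malformedForLength(1);
            }
            cp = Character.toCodePoint(c, low);
          }
          // All code point mapping is delegated to the transliterator.
          CharSequence mapped = transliterator.apply(cp);
          if (out.remaining() < mapped.length()) {
            in.position(start); // retry this code point with more room
            return CoderResult.OVERFLOW;
          }
          for (int i = 0; i < mapped.length(); i++) {
            out.put((byte) mapped.charAt(i));
          }
        }
        return CoderResult.UNDERFLOW;
      }
    };
  }

  // Convenience helper for the demo: encode, then read the bytes back as ASCII.
  static String encodeToAscii(String text, IntFunction<CharSequence> transliterator) {
    Charset cs = new SketchCharset("X-Sketch", transliterator);
    return new String(text.getBytes(cs), StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    // Toy transliterator: maps U+00E9 to "e", passes ASCII, drops the rest.
    IntFunction<CharSequence> toy =
        cp -> cp == 0xE9 ? "e" : (cp < 0x80 ? String.valueOf((char) cp) : "");
    System.out.println(encodeToAscii("caf\u00e9", toy)); // prints "cafe"
  }
}
```

With the library on the classpath, the registered charsets would presumably be obtained through Charset.forName with one of the names listed above rather than constructed directly.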
Transliterator pipeline
Each step implements IntFunction<CharSequence> and
chains to the next. The actual pipelines assembled by the provider
are:
ASCII-Printable / ASCII-Plain:
Cache → ASCIIFilter

X-Transliterating:
Cache → Decompose → Name → ASCIIFilter

X-Transliterating-Single-Byte:
Cache → SingleCharacterFilter → Decompose → Name → ASCIIFilter

- ASCIIFilter: terminal step; passes ASCII codepoints not in the blocked Unicode categories, rejects everything else with "".
- Categorize: maps Unicode categories (digits, spaces, dashes, brackets, quotes, etc.) to ASCII equivalents; passes ASCII straight to delegate.
- Name: extends Categorize; uses Character.getName() to match LATIN LETTERs, brackets, quotation marks, punctuation by name keyword.
- Decompose: extends Chainable; applies NFKD (or NFD) normalization before further processing; skips codepoints below U+00A0 as an optimization.
- Cache: extends Chainable; caches results in a CharSequence[128] array for ASCII and a HashMap for the rest; supports manual pre-population via cache(int, CharSequence).
- Chainable: abstract base; holds the delegate, implements apply() which calls process() then fans out the result’s codepoints through the delegate chain.
- SingleCharacterFilter: wraps another transliterator; returns "" for any input that produces a result length ≠ 1, ensuring length-preserving (fixed-width) output.
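
The chaining described above can be sketched with plain IntFunction<CharSequence> lambdas. This is a simplified illustration rather than the library's classes: PipelineSketch and its factory methods are hypothetical names, the Name and Categorize steps are left out, and the filter here simply keeps 0x20–0x7E instead of consulting Unicode categories.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical, simplified stand-ins for the pipeline steps.
final class PipelineSketch {

  // Terminal step: pass printable ASCII, reject everything else with "".
  static IntFunction<CharSequence> asciiFilter() {
    return cp -> (cp >= 0x20 && cp <= 0x7E) ? String.valueOf((char) cp) : "";
  }

  // NFKD-decompose, then fan each resulting code point out through the delegate.
  static IntFunction<CharSequence> decompose(IntFunction<CharSequence> delegate) {
    return cp -> {
      if (cp < 0xA0) {
        return delegate.apply(cp); // nothing below U+00A0 decomposes
      }
      String nfkd =
          Normalizer.normalize(new String(Character.toChars(cp)), Normalizer.Form.NFKD);
      StringBuilder sb = new StringBuilder();
      nfkd.codePoints().forEach(c -> sb.append(delegate.apply(c)));
      return sb.toString();
    };
  }

  // Keep only results that are exactly one character long.
  static IntFunction<CharSequence> singleCharacter(IntFunction<CharSequence> delegate) {
    return cp -> {
      CharSequence out = delegate.apply(cp);
      return out.length() == 1 ? out : "";
    };
  }

  // Memoize results: an array for ASCII input, a map for everything else.
  static IntFunction<CharSequence> cache(IntFunction<CharSequence> delegate) {
    CharSequence[] ascii = new CharSequence[128];
    Map<Integer, CharSequence> rest = new HashMap<>();
    return cp -> {
      if (cp >= 0 && cp < 128) {
        if (ascii[cp] == null) {
          ascii[cp] = delegate.apply(cp);
        }
        return ascii[cp];
      }
      return rest.computeIfAbsent(cp, delegate::apply);
    };
  }

  public static void main(String[] args) {
    IntFunction<CharSequence> translit = cache(decompose(asciiFilter()));
    IntFunction<CharSequence> single = cache(singleCharacter(decompose(asciiFilter())));

    System.out.println(translit.apply(0x00E9)); // e with acute: prints "e"
    System.out.println(translit.apply(0xFB01)); // fi ligature: prints "fi"
    System.out.println(single.apply(0xFB01).length()); // two chars rejected: prints 0
  }
}
```

Placing the single-character check directly behind the cache, as in the X-Transliterating-Single-Byte pipeline, means the length test sees the fully transliterated result, so a ligature that expands to two letters is dropped rather than widening the output.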
Code style
Spotless enforces Google Java Format. Run
./gradlew spotlessApply before committing. The formatter
excludes module-info.java.
Security patches
Transitive dependency CVEs are pinned in
gradle/libs.versions.toml as patch-* library
entries collected in the security-patches bundle.
build.gradle applies them as implementation
constraints. settings.gradle also loads them into the
buildscript classpath via regex. New CVE patches follow the same
patch-cve-XXXX-NNNN naming convention.
The OWASP dependency check plugin
(./gradlew dependencyCheckAnalyze) fails the build at CVSS
≥ 7.