Class TransliteratingASCII

java.lang.Object
java.nio.charset.Charset
com.maybeitssquid.safeascii.TransliteratingASCII
All Implemented Interfaces:
Comparable<Charset>

public class TransliteratingASCII extends Charset
A custom charset implementation that transliterates Unicode code points to ASCII characters using a configurable transliteration function.

This charset encodes and decodes between Unicode text and a restricted ASCII-based byte representation. A supplied transliterator function maps each Unicode code point to its ASCII equivalent, which may be zero or more characters.

  • Constructor Details

    • TransliteratingASCII

      protected TransliteratingASCII(IntFunction<CharSequence> transliterator, String... names)
      Initializes a new charset with the given transliterator, canonical name, and aliases.
      Parameters:
      transliterator - The function to convert a code point into zero or more characters
      names - The canonical name of this charset followed by any aliases
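      As an illustration of the transliterator parameter, the following is a minimal sketch of a function with the required IntFunction<CharSequence> shape. The class name TransliteratorSketch, the mappings, and the charset names in the comment are hypothetical and not part of this library.

```java
import java.util.function.IntFunction;

final class TransliteratorSketch {

    /**
     * Hypothetical transliterator: ASCII code points map to themselves,
     * a few Latin letters are folded to ASCII, and everything else maps
     * to the empty sequence (zero characters).
     */
    static final IntFunction<CharSequence> SIMPLE = codePoint -> {
        if (codePoint >= 0x00 && codePoint <= 0x7F) {
            return String.valueOf((char) codePoint);
        }
        switch (codePoint) {
            case 0x00E9: return "e";   // U+00E9 LATIN SMALL LETTER E WITH ACUTE
            case 0x00DF: return "ss";  // U+00DF LATIN SMALL LETTER SHARP S
            case 0x00C6: return "AE";  // U+00C6 LATIN CAPITAL LETTER AE
            default:     return "";
        }
    };

    // A subclass could pass such a function to the protected constructor,
    // for example (names are assumptions made for this sketch):
    //
    //   class MyAscii extends TransliteratingASCII {
    //       MyAscii() { super(TransliteratorSketch.SIMPLE, "X-MY-ASCII", "my-ascii"); }
    //   }

    public static void main(String[] args) {
        System.out.println(SIMPLE.apply(0x00DF));
    }
}
```

      Because the constructor is protected, the charset is meant to be instantiated through a subclass that supplies the function and names.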
  • Method Details

    • containsASCII

      public boolean containsASCII()
      Determines whether this charset provides identity mapping for all ASCII characters.

      This method verifies that the transliterator preserves every character in the ASCII range (0x00-0x7F) without modification. For each ASCII code point, it checks that:

      • The transliterator returns a non-null result
      • The result contains exactly one character
      • That character is identical to the input character

      This method is used internally by contains(Charset) to determine whether this charset can be considered to contain StandardCharsets.US_ASCII. According to the Charset specification, a charset C contains a charset D if, and only if, every character representable in D is also representable in C.
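      The three conditions above can be sketched as a standalone check. This is an illustration under assumptions, not the library's source; the class and method names (AsciiIdentityCheck, preservesAscii) are hypothetical.

```java
import java.util.function.IntFunction;

final class AsciiIdentityCheck {

    /**
     * Returns true only if every code point in 0x00-0x7F is transliterated
     * to a non-null, single-character result equal to the input character.
     */
    static boolean preservesAscii(IntFunction<CharSequence> transliterator) {
        for (int codePoint = 0x00; codePoint <= 0x7F; codePoint++) {
            CharSequence result = transliterator.apply(codePoint);
            if (result == null) {
                return false;                       // no mapping at all
            }
            if (result.length() != 1) {
                return false;                       // dropped or expanded
            }
            if (result.charAt(0) != (char) codePoint) {
                return false;                       // replaced by another character
            }
        }
        return true;
    }
}
```

      A transliterator that rewrites even one ASCII character, such as mapping '~' to '-', fails this check.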

      Returns:
      true if all ASCII characters (0x00-0x7F) are mapped to themselves by the transliterator; false otherwise
    • contains

      public boolean contains(Charset cs)
      Tests whether this charset contains the given charset.

      Behavior: returns true if cs is this instance and false if cs is null. When cs is StandardCharsets.US_ASCII, the result is delegated to containsASCII(). For any other charset, returns false.
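      The decision order described above could look roughly like the following sketch. It is written as a standalone helper so that it compiles on its own; ContainsSketch and its parameters are hypothetical, and the BooleanSupplier stands in for the call to containsASCII().

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

final class ContainsSketch {

    /**
     * @param self          the charset being asked (stand-in for "this")
     * @param cs            the charset to test, may be null
     * @param containsAscii stand-in for containsASCII()
     */
    static boolean contains(Charset self, Charset cs, BooleanSupplier containsAscii) {
        if (cs == self) {
            return cs != null;                      // same instance (null never qualifies)
        }
        if (cs == null) {
            return false;                           // null is never contained
        }
        if (StandardCharsets.US_ASCII.equals(cs)) {
            return containsAscii.getAsBoolean();    // delegate the ASCII question
        }
        return false;                               // every other charset
    }
}
```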

      Specified by:
      contains in class Charset
      Parameters:
      cs - the charset to test, may be null
      Returns:
      true if this charset contains cs as defined by Charset.contains(Charset); false otherwise
    • newDecoder

      public CharsetDecoder newDecoder()
      Creates a decoder that transliterates single bytes (0x00–0x7F) to characters. Only characters that are allowed by the configured transliteration function are returned.

      Recommendation: prefer standard decoders (e.g., UTF-8, US-ASCII, ISO-8859-1, or windows-1252) for general-purpose ASCII decoding; they handle a wider range of inputs. Use this decoder only when input handling must be rigid and only exactly compliant input is acceptable.
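      For comparison, a standard decoder can also be made strict through the regular java.nio.charset API. The sketch below uses only JDK classes; the helper names (StrictDecodeSketch, decodeStrict, isStrictlyDecodable) are hypothetical, and it does not reproduce this class's transliterator-based filtering.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

final class StrictDecodeSketch {

    /** Decodes the bytes, throwing instead of substituting on bad input. */
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    /** Convenience wrapper that reports success or failure as a boolean. */
    static boolean isStrictlyDecodable(byte[] bytes, Charset charset) {
        try {
            decodeStrict(bytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

      With StandardCharsets.US_ASCII, a byte such as 0xE9 is reported as malformed, while StandardCharsets.ISO_8859_1 decodes it without error.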

      Specified by:
      newDecoder in class Charset
      Returns:
      a CharsetDecoder that applies the transliterator to single bytes
    • newEncoder

      public CharsetEncoder newEncoder()
      Creates an encoder that maps Unicode code points to ASCII bytes using the transliterator.

      Behavior: the transliterator is applied per code point. If the result is empty, the input is reported as unmappable with the length of the consumed input (1 char for a BMP code point, 2 chars for a supplementary code point). If any resulting character is greater than 0x7F, the input is likewise reported as unmappable. If the output buffer lacks space, the encoder returns CoderResult.OVERFLOW. Valid transliterations are written as their low-7-bit byte values.

      The encoder consumes the correct input length for supplementary code points and uses '?' as the replacement byte for unmappable input.
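      The per-code-point behavior described above can be approximated on a whole string as follows. This is a simplified sketch, not the CharsetEncoder implementation: it has no buffer or overflow handling, it always substitutes '?' for unmappable input, and treating a null result as unmappable is an assumption. The names EncodeSketch, SAMPLE, isMappable, and encode are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.function.IntFunction;

final class EncodeSketch {

    /** Hypothetical transliterator: ASCII as-is, U+00DF to "ss", everything else to "". */
    static final IntFunction<CharSequence> SAMPLE = cp ->
            cp <= 0x7F ? String.valueOf((char) cp) : cp == 0x00DF ? "ss" : "";

    /** A result is usable only if it is non-empty and entirely within 0x00-0x7F. */
    static boolean isMappable(CharSequence mapped) {
        if (mapped == null || mapped.length() == 0) {
            return false;                  // null handling is an assumption of this sketch
        }
        for (int i = 0; i < mapped.length(); i++) {
            if (mapped.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    static byte[] encode(String input, IntFunction<CharSequence> transliterator) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int index = 0;
        while (index < input.length()) {
            int codePoint = input.codePointAt(index);
            index += Character.charCount(codePoint);   // 1 char for BMP, 2 for supplementary
            CharSequence mapped = transliterator.apply(codePoint);
            if (isMappable(mapped)) {
                for (int i = 0; i < mapped.length(); i++) {
                    out.write(mapped.charAt(i) & 0x7F); // low 7 bits as a single byte
                }
            } else {
                out.write('?');                         // one replacement per code point
            }
        }
        return out.toByteArray();
    }
}
```

      With the SAMPLE function, "Stra\u00DFe" encodes to the bytes of "Strasse", and a supplementary code point such as U+1F600 yields a single '?' even though it occupies two chars in the input.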

      Specified by:
      newEncoder in class Charset
      Returns:
      a CharsetEncoder that emits ASCII bytes per the transliterator