Class TransliteratingASCII
- All Implemented Interfaces:
Comparable<Charset>
This charset provides bidirectional encoding and decoding between Unicode text and a restricted ASCII-based byte representation. It uses a supplied transliterator function to map Unicode code points to their ASCII equivalents (which may be zero or more characters).
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | TransliteratingASCII(IntFunction<CharSequence> transliterator, String... names) | Initializes a new charset with the given canonical name and alias set.
-
Method Summary
Modifier and Type | Method | Description
boolean | contains(Charset cs) | Tests whether this charset contains the given charset.
boolean | containsASCII() | Determines whether this charset provides identity mapping for all ASCII characters.
CharsetDecoder | newDecoder() | Creates a decoder that transliterates single bytes (0x00–0x7F) to characters.
CharsetEncoder | newEncoder() | Creates an encoder that maps Unicode code points to ASCII bytes using the transliterator.
Methods inherited from class java.nio.charset.Charset
aliases, availableCharsets, canEncode, compareTo, decode, defaultCharset, displayName, displayName, encode, encode, equals, forName, forName, hashCode, isRegistered, isSupported, name, toString
-
Constructor Details
-
TransliteratingASCII
Initializes a new charset with the given canonical name and alias set.
- Parameters:
transliterator - the function that converts a code point into zero or more characters
names - the canonical name of this charset, followed by any aliases
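A transliterator of the shape this constructor expects could look like the following minimal sketch. The class name TransliteratorSketch, the special case for U+00DF, and the use of NFD decomposition are illustrative assumptions, not part of the documented API; the only requirement taken from the description above is that each code point maps to zero or more characters.

```java
import java.text.Normalizer;
import java.util.function.IntFunction;

// Hypothetical helper for illustration only; not part of TransliteratingASCII.
final class TransliteratorSketch {

    // Maps one code point to zero or more ASCII characters.
    static CharSequence transliterate(int codePoint) {
        if (codePoint <= 0x7F) {
            // ASCII code points map to themselves.
            return String.valueOf((char) codePoint);
        }
        if (codePoint == 0xDF) {
            // Assumed special case: LATIN SMALL LETTER SHARP S becomes "ss".
            return "ss";
        }
        // Decompose (for example U+00E9 becomes 'e' + U+0301) and keep only
        // the ASCII code points, which drops the combining marks.
        String decomposed = Normalizer.normalize(
                new String(Character.toChars(codePoint)), Normalizer.Form.NFD);
        StringBuilder ascii = new StringBuilder();
        decomposed.codePoints()
                .filter(cp -> cp <= 0x7F)
                .forEach(ascii::appendCodePoint);
        // Empty when the code point has no ASCII equivalent under this scheme.
        return ascii.toString();
    }

    // The same function in the IntFunction<CharSequence> shape.
    static final IntFunction<CharSequence> INSTANCE = TransliteratorSketch::transliterate;
}
```

A subclass would pass something like TransliteratorSketch.INSTANCE as the first constructor argument, followed by the canonical name and any aliases.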
-
-
Method Details
-
containsASCII
public boolean containsASCII()
Determines whether this charset provides an identity mapping for all ASCII characters.
This method verifies that the transliterator preserves every character in the ASCII range (0x00-0x7F) without modification. For each ASCII code point, it checks that:
- The transliterator returns a non-null result
- The result contains exactly one character
- That character is identical to the input character
This method is used internally by contains(Charset) to determine whether this charset can be considered to contain StandardCharsets.US_ASCII. According to the Charset specification, a charset C contains a charset D if, and only if, every character representable in D is also representable in C.
- Returns:
true if all ASCII characters (0x00-0x7F) are mapped to themselves by the transliterator; false otherwise
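The three checks listed for containsASCII() can be sketched as a standalone loop. This is an illustration written from the description above, not the library's actual source; the class and method names (AsciiIdentityCheck, preservesAscii) are hypothetical.

```java
import java.util.function.IntFunction;

// Hypothetical helper for illustration only.
final class AsciiIdentityCheck {

    // Returns true only if every code point in 0x00-0x7F maps to itself.
    static boolean preservesAscii(IntFunction<CharSequence> transliterator) {
        for (int cp = 0x00; cp <= 0x7F; cp++) {
            CharSequence result = transliterator.apply(cp);
            if (result == null            // check 1: non-null result
                    || result.length() != 1   // check 2: exactly one character
                    || result.charAt(0) != cp) { // check 3: same character as the input
                return false;
            }
        }
        return true;
    }
}
```

Under this sketch, a transliterator that rewrites even a single ASCII character (say, mapping '~' to '-') fails the check.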
-
contains
Tests whether this charset contains the given charset.
Behavior: returns true if cs is this instance; false if cs is null; delegates to containsASCII() when StandardCharsets.US_ASCII is supplied; otherwise returns false.
- Specified by:
contains in class Charset
- Parameters:
cs - the charset to test, may be null
- Returns:
true if this charset contains cs per Charset.contains(Charset)
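The decision order described for contains(Charset) can be sketched as a plain function. The names below (ContainsSketch, self, asciiPreserved) are hypothetical, and the sketch assumes "is this instance" means a reference comparison; the actual method may compare differently.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only.
final class ContainsSketch {

    // self: the charset being asked; cs: the candidate; asciiPreserved: stands in for containsASCII().
    static boolean contains(Charset self, Charset cs, BooleanSupplier asciiPreserved) {
        if (cs == null) {
            return false;                 // null is never contained
        }
        if (cs == self) {
            return true;                  // a charset contains itself
        }
        if (StandardCharsets.US_ASCII.equals(cs)) {
            return asciiPreserved.getAsBoolean(); // delegate to the ASCII identity check
        }
        return false;                     // every other charset is reported as not contained
    }
}
```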
-
newDecoder
Creates a decoder that transliterates single bytes (0x00–0x7F) to characters. Only characters that are allowed by the configured transliteration function are returned.
Recommendation: prefer standard decoders (e.g., UTF-8, US-ASCII, ISO-8859-1, or windows-1252) for general-purpose ASCII decoding; they handle a wider range of inputs. Use this decoder only when input must be validated strictly and only exactly compliant input is acceptable.
- Specified by:
newDecoder in class Charset
- Returns:
a CharsetDecoder that applies the transliterator to single bytes
-
newEncoder
Creates an encoder that maps Unicode code points to ASCII bytes using the transliterator.
Behavior: the transliterator is applied per code point. If the result is empty, the input is reported as unmappable for its length (1 char, or 2 chars for a supplementary code point). If any resulting character is greater than 0x7F, the input is likewise reported as unmappable. If the output buffer lacks space, the encoder returns CoderResult.OVERFLOW. Valid transliterations are written as their low-7-bit byte values.
The encoder consumes the correct input length for supplementary code points and uses '?' as the replacement byte for unmappable input.
- Specified by:
newEncoder in class Charset
- Returns:
a CharsetEncoder that emits ASCII bytes per the transliterator
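The encoding rules above can be sketched as a CharsetEncoder subclass. This is an illustration built from the description, not the library's implementation: the class name TransliteratingEncoderSketch is hypothetical, StandardCharsets.US_ASCII is used only as a placeholder owner charset, the 8-bytes-per-char bound is an assumption, and the handling of unpaired surrogates (reported as malformed) is a choice the documentation does not specify.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.function.IntFunction;

// Hypothetical stand-in for illustration only.
final class TransliteratingEncoderSketch extends CharsetEncoder {
    private final IntFunction<CharSequence> transliterator;

    TransliteratingEncoderSketch(IntFunction<CharSequence> transliterator) {
        // Placeholder charset; assumed average of 1 and maximum of 8 bytes per char.
        // The default replacement supplied by CharsetEncoder is the single byte '?'.
        super(StandardCharsets.US_ASCII, 1.0f, 8.0f);
        this.transliterator = transliterator;
    }

    @Override
    protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
        while (in.hasRemaining()) {
            int start = in.position();
            char first = in.get(start);      // absolute read; position is unchanged
            int codePoint = first;
            int length = 1;
            if (Character.isHighSurrogate(first)) {
                if (in.remaining() < 2) {
                    return CoderResult.UNDERFLOW; // need the low surrogate first
                }
                char second = in.get(start + 1);
                if (!Character.isLowSurrogate(second)) {
                    return CoderResult.malformedForLength(1); // assumed handling
                }
                codePoint = Character.toCodePoint(first, second);
                length = 2;                  // supplementary code point spans 2 chars
            } else if (Character.isLowSurrogate(first)) {
                return CoderResult.malformedForLength(1);     // assumed handling
            }
            CharSequence mapped = transliterator.apply(codePoint);
            if (mapped == null || mapped.length() == 0) {
                return CoderResult.unmappableForLength(length);
            }
            for (int i = 0; i < mapped.length(); i++) {
                if (mapped.charAt(i) > 0x7F) {
                    return CoderResult.unmappableForLength(length);
                }
            }
            if (out.remaining() < mapped.length()) {
                return CoderResult.OVERFLOW; // nothing consumed for this code point
            }
            for (int i = 0; i < mapped.length(); i++) {
                out.put((byte) mapped.charAt(i)); // already verified to be <= 0x7F
            }
            in.position(start + length);     // consume 1 or 2 chars
        }
        return CoderResult.UNDERFLOW;
    }
}
```

To see the '?' replacement in action, a caller would configure the encoder with onUnmappableCharacter(CodingErrorAction.REPLACE) before calling encode(CharBuffer); with the default REPORT action, unmappable input raises an UnmappableCharacterException instead.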
-