Package com.maybeitssquid.safeascii
Class Decompose
java.lang.Object
com.maybeitssquid.safeascii.Chainable
com.maybeitssquid.safeascii.Decompose
All Implemented Interfaces:
IntFunction<CharSequence>
A Chainable step that normalizes Unicode characters to a specific form.
This class uses Normalizer to decompose characters. By default, it uses
Normalizer.Form.NFKD (Compatibility Decomposition), which is useful for
converting compatibility characters (like ligatures or wide characters) into their base
components before further processing. Only decomposition is supported, not composition.
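To illustrate what compatibility decomposition does, here is a minimal sketch that calls java.text.Normalizer directly. It does not use Decompose itself; the class name NfkdDemo and the helper nfkd are hypothetical names introduced only for this example.

```java
import java.text.Normalizer;

public class NfkdDemo {
    // Hypothetical helper (not part of Decompose): normalize one codepoint with NFKD.
    public static String nfkd(int codepoint) {
        String single = new String(Character.toChars(codepoint));
        return Normalizer.normalize(single, Normalizer.Form.NFKD);
    }

    public static void main(String[] args) {
        // U+FB01 LATIN SMALL LIGATURE FI becomes the two letters "fi".
        System.out.println(nfkd(0xFB01));
        // U+FF21 FULLWIDTH LATIN CAPITAL LETTER A becomes plain "A".
        System.out.println(nfkd(0xFF21));
        // U+00E9 (e with acute) becomes "e" followed by U+0301, so the length is 2.
        System.out.println(nfkd(0x00E9).length());
    }
}
```

A later step in the chain can then drop or map the combining marks, which is presumably why decomposition runs before further processing.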
Field Summary
Fields
static final int LOWEST_COMPOSED_CODEPOINT
    The lowest codepoint value that has a decomposition, corresponding to NO-BREAK SPACE (U+00A0).
Constructor Summary
Constructors
Decompose(IntFunction<CharSequence> delegate)
    Creates a new Decompose instance using the default Normalizer.Form.NFKD normalization form.
Decompose(IntFunction<CharSequence> delegate, Normalizer.Form form)
    Creates a new Decompose instance with the specified normalization (decomposition) form.
Method Summary
CharSequence apply(int value)
    Applies normalization to the input value.
protected CharSequence process(int codepoint)
    Normalizes a single codepoint.
Field Details
LOWEST_COMPOSED_CODEPOINT
public static final int LOWEST_COMPOSED_CODEPOINT
The lowest codepoint value that has a decomposition, corresponding to NO-BREAK SPACE (U+00A0).
Constructor Details
Decompose
Decompose(IntFunction<CharSequence> delegate, Normalizer.Form form)
Creates a new Decompose instance with the specified normalization (decomposition) form.
Parameters:
delegate - the next step in the processing chain
form - the Normalizer.Form to use for character decomposition, which must be NFD or NFKD
Decompose
Decompose(IntFunction<CharSequence> delegate)
Creates a new Decompose instance using the default Normalizer.Form.NFKD normalization form.
Parameters:
delegate - the next step in the processing chain
Method Details
process
protected CharSequence process(int codepoint)
Normalizes a single codepoint.
apply
public CharSequence apply(int value)
Applies normalization to the input value.
As an optimization, this method skips normalization for codepoints below LOWEST_COMPOSED_CODEPOINT, because they are invariant under all normalization forms.
Specified by:
apply in interface IntFunction<CharSequence>
Overrides:
apply in class Chainable
Parameters:
value - the input codepoint
Returns:
the processed character sequence from the delegate chain
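The fast path described for apply can be sketched as follows. This is an illustration under assumptions, not the library's implementation: FastPathSketch and its normalize method are hypothetical names, the constant is assumed to equal U+00A0 as the field description suggests, and the hand-off to the delegate chain is left out because the source does not show how Chainable performs it.

```java
import java.text.Normalizer;

public class FastPathSketch {
    // Assumed value: U+00A0 NO-BREAK SPACE, per the field description.
    public static final int LOWEST_COMPOSED_CODEPOINT = 0x00A0;

    // Hypothetical stand-in for the normalization part of apply(int).
    public static String normalize(int value, Normalizer.Form form) {
        String single = new String(Character.toChars(value));
        if (value < LOWEST_COMPOSED_CODEPOINT) {
            // Fast path: codepoints below U+00A0 are unchanged by every
            // normalization form, so the Normalizer call can be skipped.
            return single;
        }
        return Normalizer.normalize(single, form);
    }

    public static void main(String[] args) {
        // 'A' takes the fast path and comes back unchanged.
        System.out.println(normalize('A', Normalizer.Form.NFKD));
        // U+00A0 is the first codepoint that changes: NFKD maps it to a plain space.
        System.out.println(normalize(0x00A0, Normalizer.Form.NFKD).equals(" "));
        // U+00C5 (A with ring above) decomposes under NFD into two chars.
        System.out.println(normalize(0x00C5, Normalizer.Form.NFD).length());
    }
}
```

Most text handled by an ASCII-oriented pipeline falls below the threshold, so a single integer comparison avoids a Normalizer call for the common case.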