Class Decompose

java.lang.Object
com.maybeitssquid.safeascii.Chainable
com.maybeitssquid.safeascii.Decompose
All Implemented Interfaces:
IntFunction<CharSequence>

public class Decompose extends Chainable
A Chainable step that decomposes Unicode characters using a specific normalization form.

This class uses Normalizer to decompose characters. By default, it uses Normalizer.Form.NFKD (Compatibility Decomposition), which is useful for converting compatibility characters (such as ligatures or fullwidth characters) into their base components before further processing. Only the decomposition forms NFD and NFKD are supported; the composition forms NFC and NFKC are not.
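The decomposition described above can be tried directly with java.text.Normalizer. The following is a minimal standalone sketch that does not use Decompose or Chainable; the class NfkdDemo and its decompose helper are hypothetical names used only for illustration.

```java
import java.text.Normalizer;

public class NfkdDemo {
    // Hypothetical helper: normalizes a string with the given (decomposition) form.
    public static String decompose(String s, Normalizer.Form form) {
        return Normalizer.normalize(s, form);
    }

    public static void main(String[] args) {
        String ligature = "\uFB01"; // LATIN SMALL LIGATURE FI
        String wideA = "\uFF21";    // FULLWIDTH LATIN CAPITAL LETTER A
        String eAcute = "\u00E9";   // LATIN SMALL LETTER E WITH ACUTE

        // NFKD splits compatibility characters into their plain components.
        System.out.println(decompose(ligature, Normalizer.Form.NFKD)); // fi
        System.out.println(decompose(wideA, Normalizer.Form.NFKD));    // A
        // Canonical decomposition: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
        System.out.println(decompose(eAcute, Normalizer.Form.NFKD).length()); // 2
        // NFD applies only canonical decompositions, so the ligature is left untouched.
        System.out.println(decompose(ligature, Normalizer.Form.NFD).equals(ligature)); // true
    }
}
```

The last line shows why NFKD is the more useful default for ASCII-oriented processing: NFD alone does not break up compatibility characters.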

  • Field Details

    • LOWEST_COMPOSED_CODEPOINT

      public static final int LOWEST_COMPOSED_CODEPOINT
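      The lowest codepoint that can change under normalization: U+00A0 NO-BREAK SPACE, which NFKD maps to an ordinary space. All codepoints below it are invariant under every normalization form.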
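      One way to sanity-check this threshold is to normalize every codepoint below it with each Normalizer.Form. This sketch assumes the constant equals 0x00A0 (the codepoint of NO-BREAK SPACE named above); the class LowestComposedCheck and its methods are hypothetical.

```java
import java.text.Normalizer;

public class LowestComposedCheck {
    // Assumed value: U+00A0 NO-BREAK SPACE, as described for LOWEST_COMPOSED_CODEPOINT.
    public static final int LOWEST_COMPOSED_CODEPOINT = 0x00A0;

    // Returns true if the single codepoint is unchanged by every normalization form.
    public static boolean invariantUnderAllForms(int codepoint) {
        String s = new String(Character.toChars(codepoint));
        for (Normalizer.Form form : Normalizer.Form.values()) {
            if (!Normalizer.normalize(s, form).equals(s)) {
                return false;
            }
        }
        return true;
    }

    // Returns true if every codepoint below the threshold is invariant under all forms.
    public static boolean allBelowThresholdInvariant() {
        for (int cp = 0; cp < LOWEST_COMPOSED_CODEPOINT; cp++) {
            if (!invariantUnderAllForms(cp)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allBelowThresholdInvariant()); // true
        // NFKD maps NO-BREAK SPACE to an ordinary SPACE (U+0020)...
        System.out.println(Normalizer.normalize("\u00A0", Normalizer.Form.NFKD).equals(" ")); // true
        // ...while NFD leaves it alone, because its decomposition is compatibility-only.
        System.out.println(Normalizer.normalize("\u00A0", Normalizer.Form.NFD).equals("\u00A0")); // true
    }
}
```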
  • Constructor Details

    • Decompose

      public Decompose(IntFunction<CharSequence> delegate, Normalizer.Form form)
      Creates a new Decompose instance with the specified normalization (decomposition) form.
      Parameters:
      delegate - the next step in the processing chain
      form - the Normalizer.Form to use for character decomposition, which must be NFD or NFKD.
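      The form parameter is restricted to the decomposition forms. The sketch below shows one hypothetical way such a restriction could be checked; whether Decompose validates the form like this, or what it does with an unsupported form, is an assumption and is not stated here. It also shows why a composition form would not help: NFC leaves a precomposed é as a single character.

```java
import java.text.Normalizer;

public class FormCheck {
    // Hypothetical helper mirroring the documented requirement that the form
    // be NFD or NFKD. Not taken from Decompose itself.
    public static Normalizer.Form requireDecomposition(Normalizer.Form form) {
        if (form != Normalizer.Form.NFD && form != Normalizer.Form.NFKD) {
            throw new IllegalArgumentException("Only NFD or NFKD is supported, got " + form);
        }
        return form;
    }

    public static void main(String[] args) {
        System.out.println(requireDecomposition(Normalizer.Form.NFKD)); // NFKD
        try {
            requireDecomposition(Normalizer.Form.NFC);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Only NFD or NFKD is supported, got NFC
        }
        // A composition form keeps the precomposed character intact...
        System.out.println(Normalizer.normalize("\u00E9", Normalizer.Form.NFC).length()); // 1
        // ...while a decomposition form splits it into base letter plus combining mark.
        System.out.println(Normalizer.normalize("\u00E9", Normalizer.Form.NFD).length()); // 2
    }
}
```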
    • Decompose

      public Decompose(IntFunction<CharSequence> delegate)
      Creates a new Decompose instance using the default Normalizer.Form.NFKD normalization form.
      Parameters:
      delegate - the next step in the processing chain
  • Method Details

    • process

      protected CharSequence process(int codepoint)
      Decomposes a single codepoint using the configured normalization form.
      Specified by:
      process in class Chainable
      Parameters:
      codepoint - the Unicode codepoint to process
      Returns:
      the normalized string representation of the codepoint
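      Normalizing one codepoint amounts to converting it to a String and passing it to Normalizer.normalize. The static method below is a hypothetical stand-in for process, not its actual source; Character.toChars is used so that supplementary codepoints (above U+FFFF) become a proper surrogate pair.

```java
import java.text.Normalizer;

public class ProcessSketch {
    // Hypothetical stand-in for Decompose.process: builds a String from one
    // codepoint (supplementary codepoints become a surrogate pair) and
    // normalizes it with the given form.
    public static CharSequence process(int codepoint, Normalizer.Form form) {
        String single = new String(Character.toChars(codepoint));
        return Normalizer.normalize(single, form);
    }

    public static void main(String[] args) {
        // U+01C4 LATIN CAPITAL LETTER DZ WITH CARON: 'D', 'Z', U+030C under NFKD.
        System.out.println(process(0x01C4, Normalizer.Form.NFKD).length()); // 3
        // U+1D400 MATHEMATICAL BOLD CAPITAL A (supplementary): plain 'A' under NFKD.
        System.out.println(process(0x1D400, Normalizer.Form.NFKD)); // A
    }
}
```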
    • apply

      public CharSequence apply(int value)
      Applies normalization to the input codepoint and returns the result produced by the delegate chain.

      As an optimization, normalization is skipped for codepoints below LOWEST_COMPOSED_CODEPOINT, because they are invariant under all normalization forms.

      Specified by:
      apply in interface IntFunction<CharSequence>
      Overrides:
      apply in class Chainable
      Parameters:
      value - the input codepoint
      Returns:
      the processed character sequence from the delegate chain
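      The fast path described for apply can be sketched as follows. ApplySketch is a hypothetical, simplified stand-in: it assumes the delegate is called once per codepoint of the normalized output and that the results are concatenated, which may not match how Chainable actually forwards values.

```java
import java.text.Normalizer;
import java.util.function.IntFunction;

// Hypothetical, simplified stand-in for Decompose.apply. Assumes the delegate
// sees each codepoint of the normalized output; the real Chainable may differ.
public class ApplySketch implements IntFunction<CharSequence> {
    static final int LOWEST_COMPOSED_CODEPOINT = 0x00A0; // NO-BREAK SPACE

    private final IntFunction<CharSequence> delegate;
    private final Normalizer.Form form;
    private int normalizerCalls = 0; // how often the slow path ran

    public ApplySketch(IntFunction<CharSequence> delegate, Normalizer.Form form) {
        this.delegate = delegate;
        this.form = form;
    }

    public int normalizerCalls() {
        return normalizerCalls;
    }

    @Override
    public CharSequence apply(int value) {
        if (value < LOWEST_COMPOSED_CODEPOINT) {
            // Fast path: codepoints below U+00A0 never change, so skip the Normalizer.
            return delegate.apply(value);
        }
        normalizerCalls++;
        String normalized = Normalizer.normalize(new String(Character.toChars(value)), form);
        StringBuilder out = new StringBuilder();
        // Forward each resulting codepoint to the next step and concatenate.
        normalized.codePoints().forEach(cp -> out.append(delegate.apply(cp)));
        return out;
    }

    public static void main(String[] args) {
        // Terminal delegate: keep ASCII, replace anything else with '?'.
        IntFunction<CharSequence> asciiOnly =
            cp -> cp < 0x80 ? new String(Character.toChars(cp)) : "?";
        ApplySketch step = new ApplySketch(asciiOnly, Normalizer.Form.NFKD);

        System.out.println(step.apply('a'));        // a
        System.out.println(step.apply(0x00E9));     // e?
        System.out.println(step.apply(0xFB01));     // fi
        System.out.println(step.normalizerCalls()); // 2
    }
}
```

Only the two codepoints at or above U+00A0 reach the Normalizer; 'a' goes straight to the delegate. The "?" after "e" is the combining acute accent that NFKD split off, replaced by the hypothetical ASCII-only delegate.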