com.maybeitssquid.safeascii.Name

All Implemented Interfaces: IntFunction<CharSequence>

public class Name extends Categorize

A transliteration step that converts Unicode characters to ASCII based on their Unicode names.

This class extends Categorize to provide more granular mappings for characters that cannot be mapped simply by category. It parses the Unicode name of a character (retrieved via Character.getName(int)) to find ASCII equivalents for:

Latin letters (including those with diacritics)
Bracket types (square, curly, angle)
Quotation marks
Various symbols and punctuation
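
To get a feel for the raw material this class works with, the following minimal sketch prints the Unicode names that Character.getName(int) reports for a few non-ASCII characters. It uses only the JDK; the class UnicodeNameDemo and its describe helper are hypothetical names chosen for illustration and are not part of this library.

```java
final class UnicodeNameDemo {
    // Formats a codepoint together with its Unicode name.
    // Character.getName returns null for unassigned codepoints.
    static String describe(int codepoint) {
        String name = Character.getName(codepoint);
        return String.format("U+%04X %s", codepoint, name);
    }

    public static void main(String[] args) {
        System.out.println(describe(0x00C0)); // U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
        System.out.println(describe(0xFF3B)); // U+FF3B FULLWIDTH LEFT SQUARE BRACKET
        System.out.println(describe(0x201C)); // U+201C LEFT DOUBLE QUOTATION MARK
    }
}
```

Each name carries enough information (a base letter, a bracket shape, a DOUBLE marker) to choose an ASCII stand-in, which is the idea the methods below build on.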

Field Summary

Fields

Modifier and Type

Field

Description

static final int

UNICODE_CIRCLED_C_WITH_OVERLAID_BACKSLASH

Unicode codepoint for CIRCLED C WITH OVERLAID BACKSLASH (U+1F16E), transliterates to C.

static final int

UNICODE_CIRCLED_DOLLAR_SIGN_WITH_OVERLAID_BACKSLASH

Unicode codepoint for CIRCLED DOLLAR SIGN WITH OVERLAID BACKSLASH (U+1F10F), transliterates to $.

static final int

UNICODE_CIRCLED_ZERO_WITH_SLASH

Unicode codepoint for CIRCLED ZERO WITH SLASH (U+1F10D), transliterates to 0.

static final int

UNICODE_COLON_EQUALS

Unicode codepoint for COLON EQUALS (U+2254), transliterates to :=.

static final int

UNICODE_COLON_SIGN

Unicode codepoint for COLON SIGN (U+20A1), the colón currency symbol (Costa Rica, El Salvador).

static final int

UNICODE_DOUBLE_SOLIDUS_OPERATOR

Unicode codepoint for DOUBLE SOLIDUS OPERATOR (U+2AFD), transliterates to //.

static final int

UNICODE_EQUALS_COLON

Unicode codepoint for EQUALS COLON (U+2255), transliterates to =:.

static final int

UNICODE_OCR_DOUBLE_BACKSLASH

Unicode codepoint for OCR DOUBLE BACKSLASH (U+244A), transliterates to \\.

static final int

UNICODE_TRIPLE_SOLIDUS_BINARY_RELATION

Unicode codepoint for TRIPLE SOLIDUS BINARY RELATION (U+2AFB), transliterates to ///.

Fields inherited from class com.maybeitssquid.safeascii.Categorize
identity, UNICODE_NEL, UNICODE_REPLACEMENT

Fields inherited from class com.maybeitssquid.safeascii.Chainable
ASCII, delegate

Constructor Summary

Constructors

Constructor

Description

Name()

Creates a new Name transliterator with an identity delegate and default line separator.

Name(IntFunction<CharSequence> delegate)

Creates a new Name transliterator with the specified delegate and default line separator.

Name(IntFunction<CharSequence> delegate, CharSequence lineSeparator)

Creates a new Name transliterator with the specified delegate and line separator.

Method Summary

Modifier and Type

Method

Description

protected CharSequence

byName(int codepoint)

Converts a Unicode codepoint to ASCII by analyzing its character name.

protected CharSequence

endPunctuation(int codepoint)

Maps end punctuation to ASCII brackets based on name.

protected CharSequence

equal(int codepoint)

Converts equality-related characters to ASCII equivalents.

protected CharSequence

lowercase(int codepoint)

Extracts the base ASCII character for a lowercase letter from its name.

protected CharSequence

process(int codepoint)

Transliterates a codepoint based on its type and name.

protected CharSequence

quotePunctuation(int codepoint)

Maps quote punctuation to ASCII quotes based on name.

protected CharSequence

solidus(String name, int codepoint)

Converts solidus (slash) and backslash characters to ASCII equivalents.

protected CharSequence

startPunctuation(int codepoint)

Maps start punctuation to ASCII brackets based on name.

protected CharSequence

titlecase(int codepoint)

Processes the titlecase characters.

protected CharSequence

uppercase(int codepoint)

Extracts the base ASCII character for an uppercase letter from its name.

Methods inherited from class com.maybeitssquid.safeascii.Categorize
apply, getLineSeparator

Methods inherited from class com.maybeitssquid.safeascii.Chainable
delegate

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- UNICODE_CIRCLED_ZERO_WITH_SLASH
  
  public static final int UNICODE_CIRCLED_ZERO_WITH_SLASH
  
  Unicode codepoint for CIRCLED ZERO WITH SLASH (U+1F10D), transliterates to 0.
  See Also:
  
  Constant Field Values
- UNICODE_TRIPLE_SOLIDUS_BINARY_RELATION
  
  public static final int UNICODE_TRIPLE_SOLIDUS_BINARY_RELATION
  
  Unicode codepoint for TRIPLE SOLIDUS BINARY RELATION (U+2AFB), transliterates to ///.
  See Also:
  
  Constant Field Values
- UNICODE_DOUBLE_SOLIDUS_OPERATOR
  
  public static final int UNICODE_DOUBLE_SOLIDUS_OPERATOR
  
  Unicode codepoint for DOUBLE SOLIDUS OPERATOR (U+2AFD), transliterates to //.
  See Also:
  
  Constant Field Values
- UNICODE_OCR_DOUBLE_BACKSLASH
  
  public static final int UNICODE_OCR_DOUBLE_BACKSLASH
  
  Unicode codepoint for OCR DOUBLE BACKSLASH (U+244A), transliterates to \\.
  See Also:
  
  Constant Field Values
- UNICODE_COLON_SIGN
  
  public static final int UNICODE_COLON_SIGN
  
  Unicode codepoint for COLON SIGN (U+20A1), the colón currency symbol (Costa Rica, El Salvador). Not transliterated to colon.
  See Also:
  
  Constant Field Values
- UNICODE_COLON_EQUALS
  
  public static final int UNICODE_COLON_EQUALS
  
  Unicode codepoint for COLON EQUALS (U+2254), transliterates to :=.
  See Also:
  
  Constant Field Values
- UNICODE_EQUALS_COLON
  
  public static final int UNICODE_EQUALS_COLON
  
  Unicode codepoint for EQUALS COLON (U+2255), transliterates to =:.
  See Also:
  
  Constant Field Values
- UNICODE_CIRCLED_DOLLAR_SIGN_WITH_OVERLAID_BACKSLASH
  
  public static final int UNICODE_CIRCLED_DOLLAR_SIGN_WITH_OVERLAID_BACKSLASH
  
  Unicode codepoint for CIRCLED DOLLAR SIGN WITH OVERLAID BACKSLASH (U+1F10F), transliterates to $.
  See Also:
  
  Constant Field Values
- UNICODE_CIRCLED_C_WITH_OVERLAID_BACKSLASH
  
  public static final int UNICODE_CIRCLED_C_WITH_OVERLAID_BACKSLASH
  
  Unicode codepoint for CIRCLED C WITH OVERLAID BACKSLASH (U+1F16E), transliterates to C.
  See Also:
  
  Constant Field Values
Constructor Details
- Name
  
  public Name(IntFunction<CharSequence> delegate, CharSequence lineSeparator)
  
  Creates a new Name transliterator with the specified delegate and line separator.
  
  Parameters:
  
  delegate - the next step in the processing chain
  
  lineSeparator - the string to use for line separators
- Name
  
  public Name(IntFunction<CharSequence> delegate)
  
  Creates a new Name transliterator with the specified delegate and default line separator.
  
  Parameters:
  
  delegate - the next step in the processing chain
- Name
  
  public Name()
  
  Creates a new Name transliterator with an identity delegate and default line separator.
Method Details
- process
  
  protected CharSequence process(int codepoint)
  
  Transliterates a codepoint based on its type and name.
  Overrides:
  
  process in class Categorize
  
  Parameters:
  
  codepoint - the Unicode codepoint to process
  
  Returns:
  
  the transliterated ASCII string, or the result of the superclass processing if no specific name-based rule applies
  
  See Also:
  
  Character.getType(int)
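
The dispatch on character type can be pictured with the sketch below. It is an assumption about how such routing might look, not the actual body of process: the class ProcessRoutingSketch and the route method are hypothetical, and the method only reports which handler name a codepoint would plausibly be sent to, based on Character.getType(int).

```java
final class ProcessRoutingSketch {
    // Returns the name of the handler a codepoint might be routed to.
    // The routing table is an illustrative guess, not the library's code.
    static String route(int codepoint) {
        switch (Character.getType(codepoint)) {
            case Character.UPPERCASE_LETTER:
                return "uppercase";
            case Character.LOWERCASE_LETTER:
                return "lowercase";
            case Character.TITLECASE_LETTER:
                return "titlecase";
            case Character.START_PUNCTUATION:
                return "startPunctuation";
            case Character.END_PUNCTUATION:
                return "endPunctuation";
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
                return "quotePunctuation";
            default:
                return "byName"; // symbols and other punctuation
        }
    }

    public static void main(String[] args) {
        System.out.println(route(0x00C0)); // LATIN CAPITAL LETTER A WITH GRAVE
        System.out.println(route(0x201C)); // LEFT DOUBLE QUOTATION MARK
        System.out.println(route(0x2254)); // COLON EQUALS
    }
}
```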
- uppercase
  
  protected CharSequence uppercase(int codepoint)
  
  Extracts the base ASCII character for an uppercase letter from its name.
  
  Parameters:
  
  codepoint - the uppercase codepoint to process
  
  Returns:
  
  the base letter if found in the name, otherwise an empty string
- lowercase
  
  protected CharSequence lowercase(int codepoint)
  
  Extracts the base ASCII character for a lowercase letter from its name.
  
  Parameters:
  
  codepoint - the lowercase codepoint to process
  
  Returns:
  
  the base letter converted to lowercase if found, otherwise an empty string
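
One way the base letter could be pulled out of a name such as LATIN CAPITAL LETTER A WITH GRAVE is sketched below. The regular expression, the class LatinLetterSketch, and the baseLetter method are assumptions made for illustration; the library's own parsing may differ, for example in how it treats ligatures such as AE.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LatinLetterSketch {
    // Matches "LATIN CAPITAL LETTER X" or "LATIN SMALL LETTER X" where X is
    // a single letter followed by a word boundary (so "AE" does not match).
    private static final Pattern LATIN =
            Pattern.compile("LATIN (CAPITAL|SMALL) LETTER ([A-Z])\\b");

    // Returns the ASCII base letter, or an empty string if none is found.
    static String baseLetter(int codepoint) {
        String name = Character.getName(codepoint);
        if (name == null) {
            return "";
        }
        Matcher m = LATIN.matcher(name);
        if (!m.find()) {
            return "";
        }
        String letter = m.group(2);
        return m.group(1).equals("SMALL") ? letter.toLowerCase(Locale.ROOT) : letter;
    }

    public static void main(String[] args) {
        System.out.println(baseLetter(0x00C0)); // A
        System.out.println(baseLetter(0x00E9)); // e
    }
}
```

Characters whose names are not Latin single letters, such as GREEK CAPITAL LETTER ALPHA, yield an empty string in this sketch, mirroring the documented return value.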
- titlecase
  
  protected CharSequence titlecase(int codepoint)
  
  Processes the titlecase characters. There are only four that transliterate to ASCII; the remainder are Greek. This method will have no effect if a prior processing step has already decomposed the codepoint.
  
  Parameters:
  
  codepoint - the titlecase codepoint to process
  
  Returns:
  
  the ASCII equivalent, or an empty string if not found
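
The claim that only four titlecase codepoints are Latin can be checked against the JDK's own character data. The sketch below scans every codepoint for the TITLECASE_LETTER type and keeps those whose name starts with LATIN; the class TitlecaseSurvey and its method are hypothetical and exist only for this check.

```java
import java.util.ArrayList;
import java.util.List;

final class TitlecaseSurvey {
    // Collects all titlecase codepoints whose Unicode name starts with "LATIN".
    static List<Integer> latinTitlecase() {
        List<Integer> found = new ArrayList<>();
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.getType(cp) == Character.TITLECASE_LETTER) {
                String name = Character.getName(cp);
                if (name != null && name.startsWith("LATIN")) {
                    found.add(cp);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (int cp : latinTitlecase()) {
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```

The four results are the digraphs U+01C5, U+01C8, U+01CB, and U+01F2; the remaining titlecase letters are Greek.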
- startPunctuation
  
  protected CharSequence startPunctuation(int codepoint)
  Maps start punctuation to ASCII brackets based on name.
  Detects:
  
  Square brackets: [
  Curly braces: {
  Angle brackets: <
  Others (default): (
  Overrides:
  
  startPunctuation in class Categorize
  
  Parameters:
  
  codepoint - the codepoint to process
  
  Returns:
  
  the corresponding ASCII opening bracket
- endPunctuation
  
  protected CharSequence endPunctuation(int codepoint)
  Maps end punctuation to ASCII brackets based on name.
  Detects:
  
  Square brackets: ]
  Curly braces: }
  Angle brackets: >
  Others (default): )
  Overrides:
  
  endPunctuation in class Categorize
  
  Parameters:
  
  codepoint - the codepoint to process
  
  Returns:
  
  the corresponding ASCII closing bracket
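
The bracket selection described for startPunctuation and endPunctuation can be sketched as a keyword test on the character name. The keywords SQUARE, CURLY, and ANGLE, along with the class BracketSketch and its methods, are assumptions for illustration; the library may match names differently.

```java
final class BracketSketch {
    // Picks one of four ASCII brackets depending on keywords in the name.
    private static String pick(int codepoint, String square, String curly,
                               String angle, String other) {
        String name = Character.getName(codepoint);
        if (name == null) {
            return other;
        }
        if (name.contains("SQUARE")) {
            return square;
        }
        if (name.contains("CURLY")) {
            return curly;
        }
        if (name.contains("ANGLE")) {
            return angle;
        }
        return other; // default: parenthesis
    }

    static String open(int codepoint) {
        return pick(codepoint, "[", "{", "<", "(");
    }

    static String close(int codepoint) {
        return pick(codepoint, "]", "}", ">", ")");
    }

    public static void main(String[] args) {
        System.out.println(open(0xFF3B) + close(0xFF3D)); // fullwidth square brackets
        System.out.println(open(0x3008) + close(0x3009)); // CJK angle brackets
    }
}
```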
- quotePunctuation
  
  protected CharSequence quotePunctuation(int codepoint)
  
  Maps quote punctuation to ASCII quotes based on name.
  Maps to " if the name contains "DOUBLE" or "DOTTED", otherwise maps to '.
  
  Overrides:
  
  quotePunctuation in class Categorize
  
  Parameters:
  
  codepoint - the codepoint to process
  
  Returns:
  
  the corresponding ASCII quote character
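
The DOUBLE/DOTTED rule stated above translates almost directly into code. The following is a minimal sketch under that rule, not the library's implementation; QuoteSketch and its quote method are hypothetical names.

```java
final class QuoteSketch {
    // Double quote if the name mentions DOUBLE or DOTTED, otherwise single.
    static String quote(int codepoint) {
        String name = Character.getName(codepoint);
        if (name != null && (name.contains("DOUBLE") || name.contains("DOTTED"))) {
            return "\"";
        }
        return "'";
    }

    public static void main(String[] args) {
        System.out.println(quote(0x201C)); // LEFT DOUBLE QUOTATION MARK
        System.out.println(quote(0x2018)); // LEFT SINGLE QUOTATION MARK
    }
}
```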
- solidus
  
  protected CharSequence solidus(String name, int codepoint)
  
  Converts solidus (slash) and backslash characters to ASCII equivalents.
  
  Parameters:
  
  name - the Unicode name of the character
  
  codepoint - the codepoint to process
  
  Returns:
  
  the ASCII equivalent: \ for reverse solidus/backslash, / for solidus/slash, with special handling for double variants
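
A plausible shape for this method is to check the special-case codepoints first and only then fall back to the name. The sketch below takes the special cases and their outputs from the field descriptions on this page; the control flow, the class SolidusSketch, and the literal hex values used in place of the named constants are assumptions.

```java
final class SolidusSketch {
    // Special cases are resolved by codepoint before the name is consulted.
    static String solidus(String name, int codepoint) {
        switch (codepoint) {
            case 0x2AFB: return "///";  // TRIPLE SOLIDUS BINARY RELATION
            case 0x2AFD: return "//";   // DOUBLE SOLIDUS OPERATOR
            case 0x244A: return "\\\\"; // OCR DOUBLE BACKSLASH (two backslashes)
            case 0x1F10D: return "0";   // CIRCLED ZERO WITH SLASH
            case 0x1F10F: return "$";   // CIRCLED DOLLAR SIGN WITH OVERLAID BACKSLASH
            case 0x1F16E: return "C";   // CIRCLED C WITH OVERLAID BACKSLASH
            default: break;
        }
        if (name.contains("REVERSE SOLIDUS") || name.contains("BACKSLASH")) {
            return "\\";
        }
        return "/";
    }

    public static void main(String[] args) {
        int fullwidthSolidus = 0xFF0F;
        System.out.println(solidus(Character.getName(fullwidthSolidus), fullwidthSolidus));
        int ocrDoubleBackslash = 0x244A;
        System.out.println(solidus(Character.getName(ocrDoubleBackslash), ocrDoubleBackslash));
    }
}
```

Checking codepoints before names matters here because the circled symbols contain BACKSLASH or SLASH in their names but should not become a slash.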
- equal
  
  protected CharSequence equal(int codepoint)
  
  Converts equality-related characters to ASCII equivalents.
  
  Parameters:
  
  codepoint - the codepoint to process
  
  Returns:
  
  := for COLON EQUALS, =: for EQUALS COLON, or = for other equality symbols
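
The three documented outcomes fit in a single switch. This is a minimal sketch assuming the two special cases are matched by codepoint; EqualSketch is a hypothetical class, and the hex literals stand in for UNICODE_COLON_EQUALS and UNICODE_EQUALS_COLON.

```java
final class EqualSketch {
    static String equal(int codepoint) {
        switch (codepoint) {
            case 0x2254: return ":="; // COLON EQUALS
            case 0x2255: return "=:"; // EQUALS COLON
            default:     return "=";  // any other equality symbol
        }
    }

    public static void main(String[] args) {
        System.out.println(equal(0x2254)); // :=
        System.out.println(equal(0xFF1D)); // = (FULLWIDTH EQUALS SIGN)
    }
}
```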
- byName
  
  protected CharSequence byName(int codepoint)
  Converts a Unicode codepoint to ASCII by analyzing its character name.
  This method uses Character.getName(int) to retrieve the Unicode character name and matches it against known naming patterns to determine the appropriate ASCII equivalent. It handles common punctuation marks and symbols by checking if their names contain specific keywords.
  Recognized patterns include:
  
  LATIN [CAPITAL|SMALL] LETTER → corresponding letter
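  REVERSE SOLIDUS, BACKSLASH → \ (except OCR DOUBLE BACKSLASH → \\, CIRCLED DOLLAR SIGN WITH OVERLAID BACKSLASH → $, and CIRCLED C WITH OVERLAID BACKSLASH → C)
  SOLIDUS, SLASH → / (except DOUBLE SOLIDUS OPERATOR → //, TRIPLE SOLIDUS BINARY RELATION → ///, and CIRCLED ZERO WITH SLASH → 0)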
  EQUAL → = (special cases for colon equals and equals colon)
  AMPERSAND → &
  FULL STOP → . (does not handle composed "[DIGIT|NUMBER] x FULL STOP")
  APOSTROPHE → '
  EXCLAMATION MARK → !
  QUESTION → ?
  INTERROBANG → ?!
  ASTERISK → *
  SEMICOLON → ;
  PERCENT → %
  PLUS SIGN → +
  MULTIPLICATION → X
  COMMA → ,
  COLON → : (except COLON SIGN, U+20A1, the colón currency symbol)
  TILDE → ~
  Parameters:
  
  codepoint - the Unicode codepoint to process
  
  Returns:
  
  the ASCII equivalent based on name patterns, or the original character via Categorize.identity if no pattern matches
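
The keyword portion of the list above can be modeled as an ordered lookup table. The sketch below is an illustration under assumptions, not the library's code: the class ByNameSketch is hypothetical, it covers only the simple keyword rows (letters, solidus, and equality are left out), and it returns the original character as a string where the real method defers to Categorize.identity. Order matters in this sketch because SEMICOLON contains COLON and must be tested first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ByNameSketch {
    private static final int COLON_SIGN = 0x20A1; // currency symbol, not a colon

    // LinkedHashMap keeps insertion order, so earlier keywords win.
    private static final Map<String, String> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("AMPERSAND", "&");
        KEYWORDS.put("FULL STOP", ".");
        KEYWORDS.put("APOSTROPHE", "'");
        KEYWORDS.put("EXCLAMATION MARK", "!");
        KEYWORDS.put("QUESTION", "?");
        KEYWORDS.put("INTERROBANG", "?!");
        KEYWORDS.put("ASTERISK", "*");
        KEYWORDS.put("SEMICOLON", ";");
        KEYWORDS.put("PERCENT", "%");
        KEYWORDS.put("PLUS SIGN", "+");
        KEYWORDS.put("MULTIPLICATION", "X");
        KEYWORDS.put("COMMA", ",");
        KEYWORDS.put("COLON", ":");
        KEYWORDS.put("TILDE", "~");
    }

    static String byName(int codepoint) {
        String original = new String(Character.toChars(codepoint));
        String name = Character.getName(codepoint);
        if (name == null || codepoint == COLON_SIGN) {
            return original; // no name, or the documented colon exception
        }
        for (Map.Entry<String, String> entry : KEYWORDS.entrySet()) {
            if (name.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return original; // no pattern matched
    }

    public static void main(String[] args) {
        System.out.println(byName(0xFF1B)); // FULLWIDTH SEMICOLON
        System.out.println(byName(0x203D)); // INTERROBANG
        System.out.println(byName(0x00D7)); // MULTIPLICATION SIGN
    }
}
```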
