Class OkapiWordBreaker


  • public final class OkapiWordBreaker
    extends Object
    Returns positions where inline codes can be safely inserted in text. Uses ICU4J word breaking for proper language support, augmented with positions after punctuation and whitespace.
    Includes:
      - Position 0 (start of text)
      - All ICU4J word boundaries (language-aware)
      - After each punctuation and whitespace character
      - Position text.length() (end of text)
    Examples:
     "Hello, world!" → [0, 5, 6, 7, 12, 13]
     "日本語のテスト" → [0, 3, 4, 7] (CJK word breaks)
     
    • Method Detail

      • getWordBreakPositions

        public static List<Integer> getWordBreakPositions(String text,
                                                          net.sf.okapi.common.LocaleId locId)
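
        The rule set described in the class summary can be approximated with a minimal sketch. This is not the Okapi implementation: it swaps in the JDK's java.text.BreakIterator for ICU4J so that it runs without extra dependencies, and the class name WordBreakSketch, the method sketchBreakPositions, and the helper isPunctuation are hypothetical names chosen for illustration. The punctuation test (Unicode P* general categories) is also an assumption. Because the JDK iterator is not the ICU4J one, results for CJK text may differ from the example in the class summary.

        ```java
        import java.text.BreakIterator;
        import java.util.ArrayList;
        import java.util.List;
        import java.util.Locale;
        import java.util.TreeSet;

        // Hypothetical sketch; NOT the Okapi implementation.
        // Assumption: java.text.BreakIterator stands in for ICU4J word breaking.
        class WordBreakSketch {

            static List<Integer> sketchBreakPositions(String text, Locale locale) {
                // A sorted set removes duplicates and keeps positions ascending.
                TreeSet<Integer> positions = new TreeSet<>();
                positions.add(0);             // start of text
                positions.add(text.length()); // end of text

                // Locale-aware word boundaries from the JDK iterator.
                BreakIterator it = BreakIterator.getWordInstance(locale);
                it.setText(text);
                for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
                    positions.add(b);
                }

                // Add the position immediately after each punctuation or
                // whitespace code point.
                for (int i = 0; i < text.length(); ) {
                    int cp = text.codePointAt(i);
                    i += Character.charCount(cp);
                    if (Character.isWhitespace(cp) || isPunctuation(cp)) {
                        positions.add(i);
                    }
                }
                return new ArrayList<>(positions);
            }

            // Assumption: "punctuation" means any Unicode P* general category.
            static boolean isPunctuation(int cp) {
                switch (Character.getType(cp)) {
                    case Character.CONNECTOR_PUNCTUATION:
                    case Character.DASH_PUNCTUATION:
                    case Character.START_PUNCTUATION:
                    case Character.END_PUNCTUATION:
                    case Character.INITIAL_QUOTE_PUNCTUATION:
                    case Character.FINAL_QUOTE_PUNCTUATION:
                    case Character.OTHER_PUNCTUATION:
                        return true;
                    default:
                        return false;
                }
            }

            public static void main(String[] args) {
                // prints [0, 5, 6, 7, 12, 13]
                System.out.println(sketchBreakPositions("Hello, world!", Locale.ENGLISH));
            }
        }
        ```

        Collecting positions in a sorted set means a boundary reported by both the word iterator and the punctuation/whitespace pass appears only once, which matches the duplicate-free lists shown in the class summary.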