Package com.acumenvelocity.ath.common
Class OkapiWordBreaker
java.lang.Object
    com.acumenvelocity.ath.common.OkapiWordBreaker
public final class OkapiWordBreaker extends Object
Returns positions where inline codes can be safely inserted in text. Uses ICU4J word breaking for proper language support, augmented with positions after punctuation and whitespace.

Includes:
- Position 0 (start of text)
- All ICU4J word boundaries (language-aware)
- After each punctuation and whitespace character
- Position text.length() (end of text)

Examples:
"Hello, world!" → [0, 5, 6, 7, 12, 13]
"日本語のテスト" → [0, 3, 4, 7] (CJK word breaks)
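The rule set described above can be sketched as follows. This is a minimal illustration, not the OkapiWordBreaker source: the names WordBreakSketch, wordBreakPositions, and isPunctuation are hypothetical, the method takes a java.util.Locale rather than an Okapi LocaleId, and it uses the JDK's java.text.BreakIterator as a stand-in for ICU4J so that it runs without extra dependencies. The definition of "punctuation" (Unicode punctuation categories) is also an assumption. Because the JDK iterator has no dictionary-based segmentation for Japanese, results for CJK text may differ from those the class documents.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical sketch of the documented rules; not the actual implementation.
final class WordBreakSketch {

    // Returns sorted, de-duplicated UTF-16 offsets where a code could be inserted.
    static List<Integer> wordBreakPositions(String text, Locale locale) {
        TreeSet<Integer> positions = new TreeSet<>();

        // Start and end of text are always included.
        positions.add(0);
        positions.add(text.length());

        // Locale-aware word boundaries (JDK stand-in for ICU4J, an assumption).
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        for (int b = words.first(); b != BreakIterator.DONE; b = words.next()) {
            positions.add(b);
        }

        // The offset immediately after each punctuation or whitespace code point.
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp) || isPunctuation(cp)) {
                positions.add(i);
            }
        }

        return new ArrayList<>(positions);
    }

    // Assumption: "punctuation" means the Unicode punctuation general categories.
    private static boolean isPunctuation(int cp) {
        switch (Character.getType(cp)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }
}
```

Calling WordBreakSketch.wordBreakPositions("Hello, world!", Locale.ENGLISH) yields [0, 5, 6, 7, 12, 13], matching the first documented example. Collecting into a TreeSet keeps the positions sorted and removes the duplicates that arise when a word boundary and an after-punctuation position coincide.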
Method Summary
Modifier and Type       Method
static List<Integer>    getWordBreakPositions(String text, net.sf.okapi.common.LocaleId locId)