Class HeuristicAligner


  • public class HeuristicAligner
    extends Object
    Final, production-ready heuristic aligner:
      - Computes similarity on plain text (no inline codes) for highest accuracy
      - Preserves all inline codes in the output
      - Fixes broken PDFs via translation-aware re-segmentation
      - Uses heuristics for both paragraph and sentence matching
    • Field Detail

      • PARAGRAPH_MATCH_THRESHOLD

        public static final double PARAGRAPH_MATCH_THRESHOLD
        See Also:
        Constant Field Values
    • Constructor Detail

      • HeuristicAligner

        public HeuristicAligner()
    • Method Detail

      • setTranslationAwareResegmentation

        public void setTranslationAwareResegmentation(boolean enabled)
      • calculateParagraphSimilarity

        public double calculateParagraphSimilarity(String src,
                                                   String trg,
                                                   net.sf.okapi.common.LocaleId srcLocale,
                                                   net.sf.okapi.common.LocaleId trgLocale)
        Calculates paragraph-level similarity using heuristics combined with back-translation. This is the main method used for paragraph matching in performImprovedAlignment().
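
        The heuristics themselves are not documented on this page. As a rough illustration only, the sketch below scores a source paragraph against a back-translated target paragraph with a Dice coefficient over word tokens. The class name ParagraphSimilaritySketch, the methods tokens and dice, and the choice of Dice overlap are all assumptions made for this example, not the actual implementation of calculateParagraphSimilarity.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not part of HeuristicAligner.
class ParagraphSimilaritySketch {

    // Splits text into a set of lowercase word tokens (Unicode letters and digits).
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Dice coefficient over token sets: 2 * |A ∩ B| / (|A| + |B|), in [0, 1].
    // Assumed usage: 'a' is the source paragraph and 'b' is the target
    // paragraph after back-translation into the source language.
    static double dice(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0; // two empty paragraphs are treated as identical
        }
        Set<String> common = new HashSet<>(ta);
        common.retainAll(tb);
        return 2.0 * common.size() / (ta.size() + tb.size());
    }
}
```

        In a sketch like this, a caller would compare the returned score against a cutoff such as PARAGRAPH_MATCH_THRESHOLD; the value of that constant is not shown here.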
      • batchBackTranslate

        public Map<String,String> batchBackTranslate(List<String> texts,
                                                           net.sf.okapi.common.LocaleId from,
                                                           net.sf.okapi.common.LocaleId to)
        Back-translates multiple texts in a single batch (used for sentence-level alignment).
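
        One plausible shape for a batch call of this kind is sketched below: duplicate inputs are removed, the remaining texts are sent to a translator in one request, and the results are returned keyed by the original text. The class BatchBackTranslateSketch and the translator function are hypothetical stand-ins (the translator is assumed to be already bound to a locale pair), so this is not how batchBackTranslate is necessarily implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not part of HeuristicAligner.
class BatchBackTranslateSketch {

    // 'translator' is an assumed callback that translates a whole list in one
    // call and returns results in the same order as its input.
    static Map<String, String> batchBackTranslate(
            List<String> texts,
            Function<List<String>, List<String>> translator) {
        // Remove duplicates while keeping first-seen order.
        List<String> unique = new ArrayList<>(new LinkedHashSet<>(texts));
        List<String> translated = translator.apply(unique);
        if (translated.size() != unique.size()) {
            throw new IllegalStateException("translator returned "
                    + translated.size() + " results for " + unique.size() + " inputs");
        }
        // Map each original text to its back-translation.
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < unique.size(); i++) {
            result.put(unique.get(i), translated.get(i));
        }
        return result;
    }
}
```

        Returning a map keyed by the input text lets the caller look up a back-translation for any sentence without tracking list positions.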
      • alignSentencesInTu

        public List<net.sf.okapi.steps.heuristicaligner.HeuristicSentenceAlignerStep.SentenceMatch> alignSentencesInTu(net.sf.okapi.common.resource.ITextUnit tu,
                                                                                                                       net.sf.okapi.common.LocaleId sourceLocale,
                                                                                                                       net.sf.okapi.common.LocaleId targetLocale)
        Aligns sentences within a text unit using dynamic programming (DP) and the similarity heuristics.
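
        To illustrate what a DP sentence alignment can look like, the sketch below finds an order-preserving set of 1-to-1 sentence pairs that maximizes total similarity, skipping sentences that have no good partner. It takes a precomputed similarity matrix and returns index pairs rather than SentenceMatch objects. The class SentenceDpSketch, the minScore parameter, and the restriction to 1-to-1 matches are assumptions made for this example; the actual method may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not part of HeuristicAligner.
class SentenceDpSketch {

    // sim[i][j] is the similarity of source sentence i and target sentence j.
    // Pairs scoring below minScore are never matched.
    // Returns {sourceIndex, targetIndex} pairs in document order.
    static List<int[]> align(double[][] sim, double minScore) {
        int n = sim.length;
        int m = (n == 0) ? 0 : sim[0].length;

        // best[i][j] = best total score using the first i source
        // and the first j target sentences.
        double[][] best = new double[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double skip = Math.max(best[i - 1][j], best[i][j - 1]);
                double s = sim[i - 1][j - 1];
                double match = (s >= minScore)
                        ? best[i - 1][j - 1] + s
                        : Double.NEGATIVE_INFINITY;
                best[i][j] = Math.max(skip, match);
            }
        }

        // Walk back from the bottom-right corner to recover the chosen pairs.
        List<int[]> pairs = new ArrayList<>();
        int i = n;
        int j = m;
        while (i > 0 && j > 0) {
            double s = sim[i - 1][j - 1];
            if (s >= minScore && best[i][j] == best[i - 1][j - 1] + s) {
                pairs.add(new int[] {i - 1, j - 1});
                i--;
                j--;
            } else if (best[i][j] == best[i - 1][j]) {
                i--; // source sentence i-1 left unmatched
            } else {
                j--; // target sentence j-1 left unmatched
            }
        }
        Collections.reverse(pairs);
        return pairs;
    }
}
```

        Because the alignment is order-preserving, two sentence pairs that cross each other cannot both be selected; the DP keeps whichever combination gives the higher total score.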