Class HeuristicAligner
- java.lang.Object
-
- net.sf.okapi.steps.heuristicaligner.HeuristicAligner
-
public class HeuristicAligner extends Object
Final, production-ready heuristic aligner:
- Plain-text similarity (no inline codes) for highest accuracy
- Preserves all inline codes in output
- Fixes broken PDFs via translation-aware re-segmentation
- Uses sophisticated heuristics for both paragraph and sentence matching
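As a rough illustration of the "plain-text similarity, codes preserved in output" idea, the sketch below compares text with inline codes removed while leaving the original string untouched. It is not the aligner's implementation: the class name `InlineCodeStripSketch`, the method `plainText`, and the assumption that inline codes look like XML-style tags are all hypothetical.

```java
import java.util.regex.Pattern;

final class InlineCodeStripSketch {
    // Assumption: inline codes look like XML-style tags such as <b> or <x id="1"/>.
    private static final Pattern TAG = Pattern.compile("<[^<>]+>");

    // Returns a comparison-only copy: tags removed, whitespace collapsed and trimmed.
    // The caller keeps the original string (with codes) for the aligned output.
    static String plainText(String withCodes) {
        String noTags = TAG.matcher(withCodes).replaceAll("");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String original = "Click <b>Save</b> to  continue.<br/>";
        System.out.println(plainText(original)); // text used for similarity scoring
        System.out.println(original);            // text kept for output
    }
}
```

Scoring on the stripped copy keeps markup differences between source and target from lowering the similarity, while the untouched original still carries every code into the result.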
-
-
Field Summary
Fields
Modifier and Type    Field                        Description
static double        PARAGRAPH_MATCH_THRESHOLD
-
Constructor Summary
Constructors
Constructor           Description
HeuristicAligner()
-
Method Summary
Modifier and Type, Method, Description
List<net.sf.okapi.steps.heuristicaligner.HeuristicSentenceAlignerStep.SentenceMatch> alignSentencesInTu(net.sf.okapi.common.resource.ITextUnit tu, net.sf.okapi.common.LocaleId sourceLocale, net.sf.okapi.common.LocaleId targetLocale)
    Align sentences within a text unit using DP and sophisticated similarity
Map<String,String> batchBackTranslate(List<String> texts, net.sf.okapi.common.LocaleId from, net.sf.okapi.common.LocaleId to)
    Batch back-translate multiple texts (used for sentence-level alignment)
double calculateParagraphSimilarity(String src, String trg, net.sf.okapi.common.LocaleId srcLocale, net.sf.okapi.common.LocaleId trgLocale)
    Calculate paragraph-level similarity using sophisticated heuristics with back-translation.
void setTranslationAwareResegmentation(boolean enabled)
-
-
-
Field Detail
-
PARAGRAPH_MATCH_THRESHOLD
public static final double PARAGRAPH_MATCH_THRESHOLD
- See Also:
- Constant Field Values
-
-
Method Detail
-
setTranslationAwareResegmentation
public void setTranslationAwareResegmentation(boolean enabled)
-
calculateParagraphSimilarity
public double calculateParagraphSimilarity(String src, String trg, net.sf.okapi.common.LocaleId srcLocale, net.sf.okapi.common.LocaleId trgLocale)
Calculate paragraph-level similarity using sophisticated heuristics with back-translation. This is the main method used for paragraph matching in performImprovedAlignment().
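This page does not document the heuristics themselves or the value of PARAGRAPH_MATCH_THRESHOLD. As a minimal sketch of how a score in [0, 1] might be compared against such a threshold, the example below uses a Dice coefficient over word sets as a stand-in for the real heuristics. The class `ParagraphSimilaritySketch`, its methods, and the threshold value 0.5 are assumptions made for illustration only.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class ParagraphSimilaritySketch {
    // Hypothetical value; the real PARAGRAPH_MATCH_THRESHOLD is not shown on this page.
    static final double ASSUMED_THRESHOLD = 0.5;

    // Splits text into a set of lowercase word tokens (letters and digits only).
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Dice coefficient: 2 * |A intersect B| / (|A| + |B|), always in [0, 1].
    static double diceSimilarity(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0; // two empty texts are treated as identical
        }
        Set<String> common = new HashSet<>(ta);
        common.retainAll(tb);
        return 2.0 * common.size() / (ta.size() + tb.size());
    }

    // Compares a source paragraph with the back-translated target paragraph.
    static boolean isMatch(String source, String backTranslatedTarget) {
        return diceSimilarity(source, backTranslatedTarget) >= ASSUMED_THRESHOLD;
    }

    public static void main(String[] args) {
        String source = "The cat sat on the mat";
        String backTranslated = "The cat sat on a mat";
        System.out.println(diceSimilarity(source, backTranslated));
        System.out.println(isMatch(source, backTranslated));
    }
}
```

Comparing the source against a back-translation of the target puts both texts in the same language, which is what makes a simple word-overlap measure usable here.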
-
batchBackTranslate
public Map<String,String> batchBackTranslate(List<String> texts, net.sf.okapi.common.LocaleId from, net.sf.okapi.common.LocaleId to)
Batch back-translate multiple texts (used for sentence-level alignment).
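The returned Map<String,String> suggests a mapping from each input text to its back-translation, though this page does not confirm the key and value semantics. Under that assumption, a batch helper could be sketched as follows. The class `BatchBackTranslateSketch` and the `translator` function (standing in for whatever translation service is used) are hypothetical, and locale handling is omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class BatchBackTranslateSketch {
    // Sends each distinct text to the translator in a single call and
    // returns a map from input text to translated text (assumed semantics).
    static Map<String, String> batchBackTranslate(
            List<String> texts, Function<List<String>, List<String>> translator) {
        // Deduplicate while keeping first-seen order, so repeated sentences cost one lookup.
        List<String> unique = new ArrayList<>(new LinkedHashSet<>(texts));
        List<String> translated = translator.apply(unique);
        if (translated.size() != unique.size()) {
            throw new IllegalStateException("Translator returned a different number of results");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < unique.size(); i++) {
            result.put(unique.get(i), translated.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Stub translator for demonstration: upper-cases instead of translating.
        Function<List<String>, List<String>> stub = batch -> {
            List<String> out = new ArrayList<>();
            for (String s : batch) {
                out.add(s.toUpperCase());
            }
            return out;
        };
        System.out.println(batchBackTranslate(Arrays.asList("hallo", "welt", "hallo"), stub));
    }
}
```

Batching matters mainly when the translator is a remote service: one request for all sentences of a text unit is usually cheaper than one request per sentence.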
-
alignSentencesInTu
public List<net.sf.okapi.steps.heuristicaligner.HeuristicSentenceAlignerStep.SentenceMatch> alignSentencesInTu(net.sf.okapi.common.resource.ITextUnit tu, net.sf.okapi.common.LocaleId sourceLocale, net.sf.okapi.common.LocaleId targetLocale)
Align sentences within a text unit using dynamic programming (DP) and a heuristic similarity measure.
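The DP formulation used by alignSentencesInTu is not described on this page. One common shape for such an alignment is a monotone 1-1 matching that maximizes total similarity and lets unmatched sentences be skipped, sketched below. The class `DpSentenceAlignSketch`, the `minSim` cutoff, and the pluggable `sim` function are assumptions; the real method returns SentenceMatch objects rather than index pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class DpSentenceAlignSketch {
    // Returns {sourceIndex, targetIndex} pairs of an order-preserving 1-1 alignment
    // that maximizes the summed similarity; pairs scoring below minSim are never matched.
    static List<int[]> align(List<String> src, List<String> trg,
                             ToDoubleBiFunction<String, String> sim, double minSim) {
        int n = src.size();
        int m = trg.size();
        double[][] score = new double[n][m];
        double[][] best = new double[n + 1][m + 1]; // best[i][j]: first i source, first j target
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double s = sim.applyAsDouble(src.get(i - 1), trg.get(j - 1));
                score[i - 1][j - 1] = s;
                double match = s >= minSim ? best[i - 1][j - 1] + s : Double.NEGATIVE_INFINITY;
                double skip = Math.max(best[i - 1][j], best[i][j - 1]);
                best[i][j] = Math.max(match, skip);
            }
        }
        // Trace back from the bottom-right corner to recover the chosen pairs.
        List<int[]> pairs = new ArrayList<>();
        int i = n;
        int j = m;
        while (i > 0 && j > 0) {
            double s = score[i - 1][j - 1];
            if (s >= minSim && best[i][j] == best[i - 1][j - 1] + s) {
                pairs.add(new int[] {i - 1, j - 1});
                i--;
                j--;
            } else if (best[i][j] == best[i - 1][j]) {
                i--; // source sentence i-1 left unmatched
            } else {
                j--; // target sentence j-1 left unmatched
            }
        }
        Collections.reverse(pairs);
        return pairs;
    }

    public static void main(String[] args) {
        List<String> src = Arrays.asList("a b c", "d e f", "g h i");
        List<String> trg = Arrays.asList("a b c", "x y z", "g h i");
        // Toy similarity: 1.0 for identical strings, 0.0 otherwise.
        for (int[] p : align(src, trg, (a, b) -> a.equals(b) ? 1.0 : 0.0, 0.5)) {
            System.out.println(p[0] + " <-> " + p[1]);
        }
    }
}
```

In practice the similarity function would compare a source sentence with the back-translated target sentence, and a fuller variant would also allow 1-2 and 2-1 merges for sentences that were split differently in translation.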
-
-