Class PdfUtil


  • public final class PdfUtil
    extends Object
    • Method Summary

      Modifier and Type Method Description
      static InputStream convertDocxToPdf(InputStream docxInputStream, net.sf.okapi.common.LocaleId locale)
      Converts a DOCX document to PDF using Adobe PDF Services.
      static InputStream convertPdfToDocx(InputStream pdfInputStream, net.sf.okapi.common.LocaleId locale, OcrMode ocrMode)
      Converts a PDF document to DOCX using Adobe PDF Services.
      static com.adobe.pdfservices.operation.pdfjobs.params.createpdf.word.DocumentLanguage getDocumentLanguage(net.sf.okapi.common.LocaleId locale)
      Converts an Okapi LocaleId to Adobe DocumentLanguage.
      static com.adobe.pdfservices.operation.PDFServices getPdfServices()
      Returns the PDF Services instance for the writer to use.
      static void init()
      static boolean needsOcr(File pdfFile)
      Determines whether the given PDF file likely needs OCR (i.e., contains no selectable text).
      static com.adobe.pdfservices.operation.pdfjobs.params.exportpdf.ExportOCRLocale toAdobeLocale(net.sf.okapi.common.LocaleId locale)
      Converts an Okapi LocaleId to Adobe ExportOCRLocale.
    • Method Detail

      • init

        public static void init()
      • needsOcr

        public static boolean needsOcr(File pdfFile)
        Determines whether the given PDF file likely needs OCR (i.e., contains no selectable text).

        Works with PDFBox 3.x (uses Loader.loadPDF()).

        Parameters:
        pdfFile - the local PDF file
        Returns:
        true if the PDF appears image-only (no selectable text), false otherwise
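The check described above comes down to a heuristic over extracted text. The sketch below assumes the text of each page has already been extracted (for example with PDFBox's PDFTextStripper after Loader.loadPDF()) and shows only the decision step. The class OcrHeuristic, the method looksImageOnly and the minChars threshold are hypothetical and are not part of PdfUtil.

```java
import java.util.List;

/** Hypothetical helper: decides "image-only" from per-page extracted text. */
final class OcrHeuristic {

    private OcrHeuristic() {}

    /**
     * Returns true when all pages together contain fewer than {@code minChars}
     * non-whitespace characters, i.e. there is (almost) no selectable text.
     * {@code minChars} is assumed to be at least 1; 1 means "no text at all".
     */
    static boolean looksImageOnly(List<String> pageTexts, int minChars) {
        int visible = 0;
        for (String text : pageTexts) {
            if (text == null) {
                continue; // a page whose text could not be extracted counts as empty
            }
            for (int i = 0; i < text.length(); i++) {
                if (!Character.isWhitespace(text.charAt(i))) {
                    visible++;
                    if (visible >= minChars) {
                        return false; // enough selectable text: OCR not needed
                    }
                }
            }
        }
        return true;
    }
}
```

A threshold above 1 would tolerate stray characters, such as page numbers that some scanning tools embed as text; whether needsOcr applies any threshold is not stated here.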
      • toAdobeLocale

        public static com.adobe.pdfservices.operation.pdfjobs.params.exportpdf.ExportOCRLocale toAdobeLocale(net.sf.okapi.common.LocaleId locale)
        Converts an Okapi LocaleId to Adobe ExportOCRLocale. Falls back to EN_US with a warning if the locale is not supported.
        Parameters:
        locale - the Okapi LocaleId to convert
        Returns:
        the corresponding ExportOCRLocale, or EN_US as fallback
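The fallback behaviour can be sketched with a lookup table keyed on BCP 47 tags. This is a hedged sketch, not the actual implementation: OcrLocale is a hypothetical stand-in for ExportOCRLocale with a few illustrative constants, a plain String tag stands in for the Okapi LocaleId, and the real supported set and mapping may differ.

```java
import java.util.Locale;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical stand-in for Adobe's ExportOCRLocale; constants are illustrative only. */
enum OcrLocale { EN_US, FR_FR, DE_DE, JA_JP }

final class OcrLocaleMapper {

    private static final Logger LOG = Logger.getLogger(OcrLocaleMapper.class.getName());

    // Keys are lower-cased BCP 47 tags such as "fr-fr".
    private static final Map<String, OcrLocale> BY_TAG = Map.of(
            "en-us", OcrLocale.EN_US,
            "fr-fr", OcrLocale.FR_FR,
            "de-de", OcrLocale.DE_DE,
            "ja-jp", OcrLocale.JA_JP);

    private OcrLocaleMapper() {}

    /** Maps a tag to an OCR locale, falling back to EN_US with a warning. */
    static OcrLocale toOcrLocale(String bcp47Tag) {
        // Map.of rejects null keys on lookup, so guard before calling get()
        OcrLocale match = bcp47Tag == null ? null : BY_TAG.get(bcp47Tag.toLowerCase(Locale.ROOT));
        if (match == null) {
            LOG.warning("Unsupported OCR locale '" + bcp47Tag + "', falling back to EN_US");
            return OcrLocale.EN_US;
        }
        return match;
    }
}
```

An explicit table makes the supported set visible and lets several tags map to the same constant, at the cost of keeping it in sync with the SDK enum.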
      • getDocumentLanguage

        public static com.adobe.pdfservices.operation.pdfjobs.params.createpdf.word.DocumentLanguage getDocumentLanguage(net.sf.okapi.common.LocaleId locale)
        Converts an Okapi LocaleId to Adobe DocumentLanguage. Falls back to EN_US with a warning if the locale is not supported.
        Parameters:
        locale - the Okapi LocaleId to convert
        Returns:
        the corresponding DocumentLanguage, or EN_US as fallback
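If the target enum's constant names follow a LANGUAGE_REGION pattern, the table can instead be replaced by deriving the constant name from the tag. The sketch assumes that naming pattern, which is not confirmed for DocumentLanguage: DocLanguage is a hypothetical stand-in for it, and the helper EnumLocaleLookup.lookup is illustrative only.

```java
import java.util.Locale;
import java.util.logging.Logger;

/** Hypothetical stand-in for Adobe's DocumentLanguage; constants are illustrative only. */
enum DocLanguage { EN_US, EN_GB, FR_FR, PT_BR }

final class EnumLocaleLookup {

    private static final Logger LOG = Logger.getLogger(EnumLocaleLookup.class.getName());

    private EnumLocaleLookup() {}

    /**
     * Turns a BCP 47 tag such as "pt-BR" into a constant name ("PT_BR") and resolves it
     * with Enum.valueOf, returning {@code fallback} with a warning if no constant matches.
     */
    static <E extends Enum<E>> E lookup(Class<E> type, String bcp47Tag, E fallback) {
        if (bcp47Tag != null && !bcp47Tag.isBlank()) {
            String name = bcp47Tag.replace('-', '_').toUpperCase(Locale.ROOT);
            try {
                return Enum.valueOf(type, name);
            } catch (IllegalArgumentException e) {
                // no constant with that name: fall through to the fallback
            }
        }
        LOG.warning("Unsupported locale '" + bcp47Tag + "' for " + type.getSimpleName()
                + ", falling back to " + fallback);
        return fallback;
    }
}
```

Name derivation needs no table maintenance but depends silently on the enum's naming, and a language-only tag such as "fr" will not match a FR_FR-style constant.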
      • convertDocxToPdf

        public static InputStream convertDocxToPdf(InputStream docxInputStream,
                                                   net.sf.okapi.common.LocaleId locale)
                                            throws Exception
        Converts a DOCX document to PDF using Adobe PDF Services.
        Throws:
        Exception
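Because convertDocxToPdf returns an InputStream and declares throws Exception, the caller owns the result stream and must handle a broad exception. A minimal caller-side sketch, assuming only that contract: the DocxToPdf interface is a hypothetical stand-in for PdfUtil.convertDocxToPdf (a String tag replaces LocaleId so the example runs without the Okapi and Adobe jars), and convertToFile is an illustrative helper.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical functional stand-in for PdfUtil.convertDocxToPdf(InputStream, LocaleId). */
@FunctionalInterface
interface DocxToPdf {
    InputStream convert(InputStream docx, String localeTag) throws Exception;
}

final class ConversionCaller {

    private ConversionCaller() {}

    /**
     * Converts the DOCX at {@code docxPath} and writes the PDF to {@code pdfPath}.
     * try-with-resources closes both the input stream and the returned PDF stream.
     */
    static long convertToFile(DocxToPdf converter, Path docxPath, Path pdfPath, String localeTag)
            throws Exception {
        try (InputStream docx = Files.newInputStream(docxPath);
             InputStream pdf = converter.convert(docx, localeTag)) {
            // Files.copy returns the number of bytes written to pdfPath
            return Files.copy(pdf, pdfPath, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With the real method, the call would look like convertToFile((in, tag) -> PdfUtil.convertDocxToPdf(in, LocaleId.fromBCP47(tag)), ...), assuming Okapi's LocaleId.fromBCP47 factory is available in the version in use.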
      • getPdfServices

        public static com.adobe.pdfservices.operation.PDFServices getPdfServices()
        Returns the PDF Services instance for the writer to use.
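The pairing of init() and getPdfServices() suggests a client that is created once and then reused, but how PdfUtil actually builds and stores it is not documented here. Under that assumption, the sketch shows one thread-safe lazy-initialization pattern. SharedClient is a hypothetical holder, and the Supplier stands in for whatever code constructs a PDFServices instance from credentials.

```java
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical holder that creates an expensive client once, on first use. */
final class SharedClient<T> {

    private final Supplier<? extends T> factory;
    private volatile T instance; // volatile so the published instance is visible to all threads

    SharedClient(Supplier<? extends T> factory) {
        this.factory = Objects.requireNonNull(factory, "factory");
    }

    /** Eager initialization, analogous to calling init() at startup. */
    void init() {
        get();
    }

    /** Returns the shared instance, creating it on the first call (double-checked locking). */
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = Objects.requireNonNull(factory.get(), "factory returned null");
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

In a static utility class the same pattern would apply to a static field; an initialization-on-demand holder class is a simpler alternative when no explicit init() call is needed.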