Class SolrTmFilter

  • All Implemented Interfaces:
    AutoCloseable, Iterator<net.sf.okapi.common.Event>, net.sf.okapi.common.filters.IFilter

    public class SolrTmFilter
    extends net.sf.okapi.common.filters.AbstractFilter
    Streaming Solr translation memory filter for large-scale TM operations. It uses Solr's cursor-based deep paging (cursor marks) to process millions of translation units without exhausting heap memory, and converts Solr query results into Okapi event streams for use in translation processing pipelines. Documents are retrieved in pages of configurable size and processed incrementally, so the filter suits production environments with large translation memory databases.
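The cursor-mark paging loop described above can be modeled without a live Solr server. This is a minimal sketch, not SolrTmFilter's actual implementation. `CursorPage`, `CursorPageSource`, `CursorPagingIterator` and `InMemoryCursorSource` are hypothetical names, and the in-memory source stands in for Solr. The sketch relies on two documented Solr behaviors: the first request uses the cursor mark `*`, and the end of the result set is reached when the returned next cursor mark equals the one that was sent.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical model of one Solr response page: the documents it returned plus
// the cursor mark to send with the next request.
final class CursorPage {
    final List<String> docs;
    final String nextCursorMark;

    CursorPage(List<String> docs, String nextCursorMark) {
        this.docs = docs;
        this.nextCursorMark = nextCursorMark;
    }
}

// Hypothetical abstraction over "run the query with this cursorMark and page size".
interface CursorPageSource {
    CursorPage fetch(String cursorMark, int rows);
}

// Lazily walks every page; only the current page is held in memory.
final class CursorPagingIterator implements Iterator<String> {
    static final String CURSOR_START = "*"; // Solr's documented initial cursorMark

    private final CursorPageSource source;
    private final int rows;
    private String cursorMark = CURSOR_START;
    private Iterator<String> page = Collections.emptyIterator();
    private boolean exhausted = false;

    CursorPagingIterator(CursorPageSource source, int rows) {
        this.source = source;
        this.rows = rows;
    }

    @Override
    public boolean hasNext() {
        // Fetch pages until one yields a document or the cursor stops advancing.
        while (!page.hasNext() && !exhausted) {
            CursorPage next = source.fetch(cursorMark, rows);
            page = next.docs.iterator();
            // Solr signals the end of results by returning the cursor mark it was sent.
            exhausted = next.nextCursorMark.equals(cursorMark);
            cursorMark = next.nextCursorMark;
        }
        return page.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException("no more documents");
        }
        return page.next();
    }
}

// Hypothetical in-memory stand-in for Solr: the cursor mark is just the next offset.
final class InMemoryCursorSource implements CursorPageSource {
    private final List<String> docs;
    int requests = 0;

    InMemoryCursorSource(List<String> docs) {
        this.docs = docs;
    }

    @Override
    public CursorPage fetch(String cursorMark, int rows) {
        requests++;
        int from = CursorPagingIterator.CURSOR_START.equals(cursorMark) ? 0 : Integer.parseInt(cursorMark);
        int to = Math.min(from + rows, docs.size());
        return new CursorPage(docs.subList(from, to), String.valueOf(to));
    }
}
```

With five documents and a page size of 2, the iterator makes four requests: three pages with data and a final one whose cursor mark does not advance. Against a real Solr server, each fetch would send the `cursorMark` parameter (`CursorMarkParams.CURSOR_MARK_PARAM` in SolrJ) and read `QueryResponse.getNextCursorMark()`. Solr also requires the sort to include the collection's uniqueKey field for cursors to work.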
    • Field Summary

      • Fields inherited from interface net.sf.okapi.common.filters.IFilter

        SUB_FILTER
    • Constructor Summary

      Constructors 
      Constructor Description
      SolrTmFilter(org.apache.solr.client.solrj.SolrClient solrClient, String tmCollection, UUID tmId)
      Constructs a streaming filter for a specific translation memory.
    • Method Summary

      Modifier and Type Method Description
      void close()
      Terminates the filter and releases associated resources.
      long estimateTotalSegments()
      Queries Solr for the total count of matching segments without retrieval.
      org.apache.solr.client.solrj.SolrQuery getQuery()
      Provides read access to the query configuration.
      String getTmCollection()
      Returns the collection name being queried.
      UUID getTmId()
      Returns the translation memory identifier.
      boolean hasNext()
      Indicates whether more events are available in the processing stream.
      boolean isActive()
      Indicates whether the filter is currently active.
      net.sf.okapi.common.Event next()
      Retrieves the next event from the processing stream.
      void open(net.sf.okapi.common.resource.RawDocument input)
      Activates the filter and establishes the streaming connection to Solr.
      • Methods inherited from class net.sf.okapi.common.filters.AbstractFilter

        addConfiguration, cancel, createFilterWriter, createSkeletonWriter, getConfiguration, getConfigurations, getDisplayName, getDocumentId, getDocumentName, getEncoderManager, getEncoding, getMimeType, getName, getNewlineType, getParameters, getParameters, getParametersClassName, getParentId, getSrcLoc, getTrgLoc, isCanceled, isGenerateSkeleton, isMultilingual, open, removeConfiguration, setFilterConfigurationMapper, setMimeType, setOptions, setParameters, setParentId, setSrcLoc, setTrgLoc
      • Methods inherited from interface net.sf.okapi.common.filters.IFilter

        stream
    • Constructor Detail

      • SolrTmFilter

        public SolrTmFilter(org.apache.solr.client.solrj.SolrClient solrClient,
                            String tmCollection,
                            UUID tmId)
        Constructs a streaming filter for a specific translation memory.
        Parameters:
        solrClient - Connection to the Solr instance
        tmCollection - Target translation memory collection
        tmId - Translation memory identifier to filter by
        Throws:
        IllegalArgumentException - if any required parameter is null
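The constructor contract above (all three arguments required, with null rejected as `IllegalArgumentException`) can be sketched as follows. `TmFilterArgs` and `requireArg` are hypothetical names, and a plain `Object` stands in for `SolrClient` so the sketch compiles without SolrJ. The documentation does not say whether a blank collection name is also rejected, so only null is checked here.

```java
import java.util.UUID;

// Hypothetical sketch of the documented constructor contract: every argument
// is required, and a null is rejected with IllegalArgumentException.
final class TmFilterArgs {
    final Object solrClient; // stands in for org.apache.solr.client.solrj.SolrClient
    final String tmCollection;
    final UUID tmId;

    TmFilterArgs(Object solrClient, String tmCollection, UUID tmId) {
        this.solrClient = requireArg(solrClient, "solrClient");
        this.tmCollection = requireArg(tmCollection, "tmCollection");
        this.tmId = requireArg(tmId, "tmId");
    }

    // Objects.requireNonNull would throw NullPointerException instead; this
    // helper matches the IllegalArgumentException listed under Throws.
    private static <T> T requireArg(T value, String name) {
        if (value == null) {
            throw new IllegalArgumentException(name + " must not be null");
        }
        return value;
    }
}
```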
    • Method Detail

      • open

        public void open(net.sf.okapi.common.resource.RawDocument input)
        Activates the filter and establishes the streaming connection to Solr. Validates connectivity before initializing the document iterator.
        Parameters:
        input - Raw document wrapper providing filter context
        Throws:
        net.sf.okapi.common.exceptions.OkapiException - if Solr connectivity fails
      • hasNext

        public boolean hasNext()
        Indicates whether more events are available in the processing stream.
        Returns:
        true if additional events can be retrieved
      • next

        public net.sf.okapi.common.Event next()
        Retrieves the next event from the processing stream. Emits document boundary markers and text unit events in sequence.
        Returns:
        The next available event
        Throws:
        NoSuchElementException - when the stream is depleted
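The `hasNext()`/`next()` contract above can be modeled as a wrapper that brackets a stream of translation units with document boundary events and throws `NoSuchElementException` once the stream is depleted. This is a sketch under assumptions. `SketchEventType`, `SketchEvent` and `BoundedEventStream` are hypothetical stand-ins for Okapi's `Event` and `EventType`. The exact sequence SolrTmFilter emits (START_DOCUMENT, one TEXT_UNIT per unit, END_DOCUMENT) is assumed from the description "document boundary markers and text unit events".

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical event kinds named after Okapi EventType constants; the real
// filter returns net.sf.okapi.common.Event objects.
enum SketchEventType { START_DOCUMENT, TEXT_UNIT, END_DOCUMENT }

final class SketchEvent {
    final SketchEventType type;
    final String payload; // translation unit content, or null for boundary events

    SketchEvent(SketchEventType type, String payload) {
        this.type = type;
        this.payload = payload;
    }
}

// Wraps a stream of translation units in document boundary events.
final class BoundedEventStream implements Iterator<SketchEvent> {
    private final Iterator<String> units;
    private boolean started = false;
    private boolean ended = false;

    BoundedEventStream(Iterator<String> units) {
        this.units = units;
    }

    @Override
    public boolean hasNext() {
        return !ended; // END_DOCUMENT is pending until it has been returned
    }

    @Override
    public SketchEvent next() {
        if (ended) {
            throw new NoSuchElementException("event stream is depleted");
        }
        if (!started) {
            started = true;
            return new SketchEvent(SketchEventType.START_DOCUMENT, null);
        }
        if (units.hasNext()) {
            return new SketchEvent(SketchEventType.TEXT_UNIT, units.next());
        }
        ended = true;
        return new SketchEvent(SketchEventType.END_DOCUMENT, null);
    }
}
```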
      • close

        public void close()
        Terminates the filter and releases associated resources. The Solr client is not closed, because it is managed externally by the caller.
        Specified by:
        close in interface AutoCloseable
        Specified by:
        close in interface net.sf.okapi.common.filters.IFilter
        Overrides:
        close in class net.sf.okapi.common.filters.AbstractFilter
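Because the Solr client is externally managed, a caller typically opens the client once, runs one or more filters against it, and closes the client last. The sketch below models that ownership split. `SharedClient` and `BorrowingFilter` are hypothetical classes, not the real `SolrClient` or SolrTmFilter. The active flag mirrors the documented `isActive()` method under the assumption that "active" means opened and not yet closed.

```java
// Hypothetical stand-in for an externally managed client such as SolrClient.
final class SharedClient implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical filter that borrows the client: close() releases the filter's
// own state but deliberately leaves the borrowed client open.
final class BorrowingFilter implements AutoCloseable {
    private final SharedClient client;
    private boolean active = false;

    BorrowingFilter(SharedClient client) {
        this.client = client;
    }

    void open() {
        active = true;
    }

    boolean isActive() {
        return active;
    }

    @Override
    public void close() {
        active = false; // the client is owned by the caller and is not closed here
    }
}
```

In use, a try-with-resources block around each filter ends that filter's session, while the client, declared in an outer try-with-resources, stays usable for further filters until the outer block exits.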
      • estimateTotalSegments

        public long estimateTotalSegments()
                                   throws net.sf.okapi.common.exceptions.OkapiException
        Queries Solr for the total count of matching segments without retrieval. Useful for displaying progress indicators or estimating resource needs.
        Returns:
        Total segment count matching the query
        Throws:
        net.sf.okapi.common.exceptions.OkapiException - if the count operation fails
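The count returned by `estimateTotalSegments()` can drive a progress indicator. In SolrJ, a common way to count matches without retrieving them is a query with `setRows(0)`, reading `getResults().getNumFound()` from the response. Whether SolrTmFilter does exactly this is not stated. `SegmentProgress` below is a hypothetical helper, and it clamps at 100% on the assumption that the index may change between counting and streaming.

```java
// Hypothetical progress helper fed with the value of estimateTotalSegments().
final class SegmentProgress {
    private final long estimatedTotal;
    private long processed = 0;

    SegmentProgress(long estimatedTotal) {
        if (estimatedTotal < 0) {
            throw new IllegalArgumentException("estimatedTotal must be >= 0");
        }
        this.estimatedTotal = estimatedTotal;
    }

    // Call once per TEXT_UNIT event consumed from the filter.
    void recordSegment() {
        processed++;
    }

    // Whole-number percentage, clamped to 100 in case more segments arrive
    // than were counted when the estimate was taken.
    int percent() {
        if (estimatedTotal == 0) {
            return 100;
        }
        return (int) Math.min(100, processed * 100 / estimatedTotal);
    }
}
```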
      • getTmCollection

        public String getTmCollection()
        Returns the collection name being queried.
        Returns:
        Solr collection identifier
      • getTmId

        public UUID getTmId()
        Returns the translation memory identifier.
        Returns:
        TM ID being filtered
      • getQuery

        public org.apache.solr.client.solrj.SolrQuery getQuery()
        Provides read access to the query configuration. The returned query is a copy, so modifying it does not change the query the filter uses.
        Returns:
        Copy of the configured query
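Returning a copy protects the filter's paging query from callers. The sketch below shows the defensive-copy pattern with a parameter map standing in for `SolrQuery`. `QueryHolder` and the `tmId:` filter-query field name are hypothetical. SolrJ's `SolrQuery` provides its own `getCopy()` method, which an implementation could use instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: query parameters held as a map instead of a SolrQuery.
final class QueryHolder {
    private final Map<String, String> params = new LinkedHashMap<>();

    QueryHolder(String tmIdFilter) {
        params.put("q", "*:*");
        params.put("fq", tmIdFilter); // e.g. "tmId:<uuid>" (hypothetical field name)
    }

    // Returns a copy so callers cannot alter the query the filter pages with.
    Map<String, String> getQuery() {
        return new LinkedHashMap<>(params);
    }
}
```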
      • isActive

        public boolean isActive()
        Indicates whether the filter is currently active.
        Returns:
        true if the filter has been opened and not yet closed