Class PipedRDFIterator<T>

  • Type Parameters:
    T - The type of the RDF primitive, should be one of Triple, Quad, or Tuple<Node>
    All Implemented Interfaces:
    java.util.Iterator<T>, org.apache.jena.atlas.lib.Closeable

    public class PipedRDFIterator<T>
    extends java.lang.Object
    implements java.util.Iterator<T>, org.apache.jena.atlas.lib.Closeable

    A PipedRDFIterator should be connected to a PipedRDFStream implementation; the piped iterator then provides whatever RDF primitives are written to the PipedRDFStream

    Typically, data is read from a PipedRDFIterator by one thread (the consumer) and data is written to the corresponding PipedRDFStream by some other thread (the producer). Attempting to use both objects from a single thread is not recommended, as it may deadlock the thread. The PipedRDFIterator contains a buffer, decoupling read operations from write operations, within limits.

    Inspired by Java's PipedInputStream and PipedOutputStream

    See Also:
    PipedTriplesStream, PipedQuadsStream, PipedTuplesStream
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int DEFAULT_BUFFER_SIZE
      Constant for default buffer size
      static int DEFAULT_MAX_POLLS
      Constant for max number of failed poll attempts before the producer will be declared as dead
      static int DEFAULT_POLL_TIMEOUT
      Constant for default poll timeout in milliseconds, used to stop the consumer deadlocking in certain circumstances
    • Constructor Summary

      Constructors 
      Constructor Description
      PipedRDFIterator()
      Creates a new piped RDF iterator with the default buffer size of DEFAULT_BUFFER_SIZE.
      PipedRDFIterator​(int bufferSize)
      Creates a new piped RDF iterator
      PipedRDFIterator​(int bufferSize, boolean fair)
      Creates a new piped RDF iterator
      PipedRDFIterator​(int bufferSize, boolean fair, int pollTimeout, int maxPolls)
      Creates a new piped RDF iterator
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void close()
      May be called by the consumer when it is finished reading from the iterator, if the producer thread has not finished it will receive an error the next time it tries to write to the iterator
      java.lang.String getBaseIri()
      Gets the most recently seen Base IRI
      PrefixMap getPrefixes()
      Gets the prefix map which contains the prefixes seen so far in the stream
      boolean hasNext()  
      T next()  
      void remove()  
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface java.util.Iterator

        forEachRemaining
    • Field Detail

      • DEFAULT_BUFFER_SIZE

        public static final int DEFAULT_BUFFER_SIZE
        Constant for default buffer size
        See Also:
        Constant Field Values
      • DEFAULT_POLL_TIMEOUT

        public static final int DEFAULT_POLL_TIMEOUT
        Constant for default poll timeout in milliseconds, used to stop the consumer deadlocking in certain circumstances
        See Also:
        Constant Field Values
      • DEFAULT_MAX_POLLS

        public static final int DEFAULT_MAX_POLLS
        Constant for max number of failed poll attempts before the producer will be declared as dead
        See Also:
        Constant Field Values
    • Constructor Detail

      • PipedRDFIterator

        public PipedRDFIterator()
        Creates a new piped RDF iterator with the default buffer size of DEFAULT_BUFFER_SIZE.

        Buffer size must be chosen carefully in order to avoid performance problems, if you set the buffer size too low you will experience a lot of blocked calls so it will take longer to consume the data from the iterator. For best performance the buffer size should be at least 10% of the expected input size though you may need to tune this depending on how fast your consumer thread is.

      • PipedRDFIterator

        public PipedRDFIterator​(int bufferSize)
        Creates a new piped RDF iterator

        Buffer size must be chosen carefully in order to avoid performance problems, if you set the buffer size too low you will experience a lot of blocked calls so it will take longer to consume the data from the iterator. For best performance the buffer size should be roughly 10% of the expected input size though you may need to tune this depending on how fast your consumer thread is.

        Parameters:
        bufferSize - Buffer size
      • PipedRDFIterator

        public PipedRDFIterator​(int bufferSize,
                                boolean fair)
        Creates a new piped RDF iterator

        Buffer size must be chosen carefully in order to avoid performance problems, if you set the buffer size too low you will experience a lot of blocked calls so it will take longer to consume the data from the iterator. For best performance the buffer size should be roughly 10% of the expected input size though you may need to tune this depending on how fast your consumer thread is.

        The fair parameter controls whether the locking policy used for the buffer is fair. When enabled this reduces throughput but also reduces the chance of thread starvation. This likely need only be set to true if there will be multiple consumers.

        Parameters:
        bufferSize - Buffer size
        fair - Whether the buffer should use a fair locking policy
      • PipedRDFIterator

        public PipedRDFIterator​(int bufferSize,
                                boolean fair,
                                int pollTimeout,
                                int maxPolls)
        Creates a new piped RDF iterator

        Buffer size must be chosen carefully in order to avoid performance problems, if you set the buffer size too low you will experience a lot of blocked calls so it will take longer to consume the data from the iterator. For best performance the buffer size should be roughly 10% of the expected input size though you may need to tune this depending on how fast your consumer thread is.

        The fair parameter controls whether the locking policy used for the buffer is fair. When enabled this reduces throughput but also reduces the chance of thread starvation. This likely need only be set to true if there will be multiple consumers.

        The pollTimeout parameter controls how long each poll attempt waits for data to be produced. This prevents the consumer thread from blocking indefinitely and allows it to detect various potential deadlock conditions e.g. dead producer thread, another consumer closed the iterator etc. and errors out accordingly. It is unlikely that you will ever need to adjust this from the default value provided by DEFAULT_POLL_TIMEOUT.

        The maxPolls parameter controls how many poll attempts will be made by a single consumer thread within the context of a single call to hasNext() before the iterator declares the producer to be dead and errors out accordingly. You may need to adjust this if you have a slow producer thread or many consumer threads.

        Parameters:
        bufferSize - Buffer size
        fair - Whether the buffer should use a fair locking policy
        pollTimeout - Poll timeout in milliseconds
        maxPolls - Max poll attempts
    • Method Detail

      • hasNext

        public boolean hasNext()
        Specified by:
        hasNext in interface java.util.Iterator<T>
      • next

        public T next()
        Specified by:
        next in interface java.util.Iterator<T>
      • remove

        public void remove()
        Specified by:
        remove in interface java.util.Iterator<T>
      • getBaseIri

        public java.lang.String getBaseIri()
        Gets the most recently seen Base IRI
        Returns:
        Base IRI
      • getPrefixes

        public PrefixMap getPrefixes()
        Gets the prefix map which contains the prefixes seen so far in the stream
        Returns:
        Prefix Map
      • close

        public void close()
        May be called by the consumer when it is finished reading from the iterator, if the producer thread has not finished it will receive an error the next time it tries to write to the iterator
        Specified by:
        close in interface org.apache.jena.atlas.lib.Closeable