Class StreamingDataset

  • All Implemented Interfaces:
    Dataset, StreamingDatasetWriter, java.io.Serializable

    public class StreamingDataset
    extends java.lang.Object
    implements Dataset, StreamingDatasetWriter
    This is an implementation of Dataset that received data in a streaming manner. All of the functions will block until the information requested is available, dictated by an overall timeout that can be set.

    The dataset can operate as a normal dataset, or if "transientDataMode" is true, will discard data after it is read, freeing up memory when using a large dataset for data processing purposes.

    Since the streaming dataset won't normally know how many rows to expect (unless it was told the number in the initialize), the interaction with it is a little different than normal. Instead of calling getRowCount(), which will return -1 if the number of rows isn't known, it's better to call hasRow(x) (with x being the next row you hope to get) until it returns False.

    If you need to treat this dataset as a normal dataset, you can call readFully() to block and retrieve all data. After that, it will behave like a normal dataset, with getRowCount() returning the correct value.

    See Also:
    Serialized Form
    • Constructor Summary

      Constructors 
      Constructor Description
      StreamingDataset()  
      StreamingDataset​(long waitTimeout, boolean transientDataMode)
      Constructs the dataset with the given timeout and mode.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      int binarySearch​(int column, java.lang.Object key)
      Performs a binary search on the specified column, looking for the specified key.
      protected void checkError()  
      void finish()
      Notifies the writer to close- all data has been written.
      void finishWithError​(java.lang.Exception e)
      Finishes the stream while indicating an error occurred.
      int getColumnCount()
      Returns the number of columns.
      int getColumnIndex​(java.lang.String name)
      Returns the index of the given column, case insensitive
      java.lang.String getColumnName​(int col)
      Returns the name of the given column.
      java.util.List<java.lang.String> getColumnNames()
      Returns an unmodifiable list of this dataset's column names, in order.
      java.lang.Class<?> getColumnType​(int col)
      Returns the type of the given column.
      java.util.List<java.lang.Class<?>> getColumnTypes()
      Returns an unmodifiable list of this dataset's column types, in order.
      int getInternalRowCount()
      Mostly for unit testing
      double getPrimitiveValueAt​(int row, int col)
      If the given column is a numeric type or a Date, then the value will be returned as a double.
      QualityCode getQualityAt​(int row, int col)
      Returns the quality of the value at the given location.
      StreamingDataset.Row getRow​(int row)  
      int getRowCount()
      Return the number of rows in this dataset
      java.lang.Object getValueAt​(int row, int col)
      Returns the value in the dataset at the given location.
      java.lang.Object getValueAt​(int row, java.lang.String colName)
      Returns the value at the given row and at a column named colName.
      boolean hasRow​(int row)
      This is somewhat similar to isClosed, but will potentially wait for more data to come in, or for the dataset to close.
      void initialize​(java.lang.String[] columnNames, java.lang.Class<?>[] columnTypes, boolean supportsQuality, int expectedRows)
      Initializes the streaming dataset with important information, primarily the column names and types.
      boolean isClosed()
      Returns whether this dataset has been closed to further writing.
      static StreamingDataset newWriter()
      Returns a new streaming dataset that is only intended to write, and not read.
      void readFully()
      Blocks until all of the data has been read.
      void setBlockingWrites​(boolean value)
      If true, the dataset will block on writes until the previously written row is read (standard producer/consumer).
      void setTransientMode​(boolean transientMode)  
      protected StreamingDataset.Row waitFor​(int row)  
      protected void waitForColumns()  
      void write​(StreamingDataset.Row row)  
      void write​(java.lang.Object[] data, QualityCode[] quality)
      Writes a row with the given data.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • StreamingDataset

        public StreamingDataset()
      • StreamingDataset

        public StreamingDataset​(long waitTimeout,
                                boolean transientDataMode)
        Constructs the dataset with the given timeout and mode. "Transient Data Mode" means that data will be discarded after it is read, so all requests to getData() should only move forward in row index. This is useful for minimizing memory usage with very large datasets when this object is only being used for data processing.
    • Method Detail

      • setTransientMode

        public void setTransientMode​(boolean transientMode)
      • newWriter

        public static StreamingDataset newWriter()
        Returns a new streaming dataset that is only intended to write, and not read. Essentially is a plain streaming dataset, with blocking writes enabled.
      • setBlockingWrites

        public void setBlockingWrites​(boolean value)
        If true, the dataset will block on writes until the previously written row is read (standard producer/consumer). This prevents a build up in memory of unread rows.
      • readFully

        public void readFully()
        Blocks until all of the data has been read. Will throw a StreamingException if the overall timeout is exceeded.
      • isClosed

        public boolean isClosed()
        Returns whether this dataset has been closed to further writing.
      • hasRow

        public boolean hasRow​(int row)
        This is somewhat similar to isClosed, but will potentially wait for more data to come in, or for the dataset to close. Will return immediately if we know there is a row greater than the requested row available. In general, this function should be called to wait for new data before each call to getValueAt.
      • checkError

        protected void checkError()
      • waitForColumns

        protected void waitForColumns()
      • getColumnNames

        public java.util.List<java.lang.String> getColumnNames()
        Description copied from interface: Dataset
        Returns an unmodifiable list of this dataset's column names, in order.
        Specified by:
        getColumnNames in interface Dataset
      • getColumnTypes

        public java.util.List<java.lang.Class<?>> getColumnTypes()
        Description copied from interface: Dataset
        Returns an unmodifiable list of this dataset's column types, in order.
        Specified by:
        getColumnTypes in interface Dataset
      • getColumnCount

        public int getColumnCount()
        Description copied from interface: Dataset
        Returns the number of columns.
        Specified by:
        getColumnCount in interface Dataset
      • getRowCount

        public int getRowCount()
        Description copied from interface: Dataset
        Return the number of rows in this dataset
        Specified by:
        getRowCount in interface Dataset
      • getColumnIndex

        public int getColumnIndex​(java.lang.String name)
        Description copied from interface: Dataset
        Returns the index of the given column, case insensitive
        Specified by:
        getColumnIndex in interface Dataset
        Parameters:
        name - the name of the column to look up
        Returns:
        the index of the column
      • getColumnName

        public java.lang.String getColumnName​(int col)
        Description copied from interface: Dataset
        Returns the name of the given column.
        Specified by:
        getColumnName in interface Dataset
      • getColumnType

        public java.lang.Class<?> getColumnType​(int col)
        Description copied from interface: Dataset
        Returns the type of the given column.
        Specified by:
        getColumnType in interface Dataset
      • getValueAt

        public java.lang.Object getValueAt​(int row,
                                           int col)
        Description copied from interface: Dataset
        Returns the value in the dataset at the given location.
        Specified by:
        getValueAt in interface Dataset
      • getQualityAt

        public QualityCode getQualityAt​(int row,
                                        int col)
        Description copied from interface: Dataset
        Returns the quality of the value at the given location.
        Specified by:
        getQualityAt in interface Dataset
      • getValueAt

        public java.lang.Object getValueAt​(int row,
                                           java.lang.String colName)
        Description copied from interface: Dataset
        Returns the value at the given row and at a column named colName. Column name matching is case insensitive.
        Specified by:
        getValueAt in interface Dataset
      • getPrimitiveValueAt

        public double getPrimitiveValueAt​(int row,
                                          int col)
        Description copied from interface: Dataset
        If the given column is a numeric type or a Date, then the value will be returned as a double. (Charts uses this functionality to provide a seamless interface with certain optimized dataset implementations.
        Specified by:
        getPrimitiveValueAt in interface Dataset
      • binarySearch

        public int binarySearch​(int column,
                                java.lang.Object key)
        Description copied from interface: Dataset
        Performs a binary search on the specified column, looking for the specified key. Column MUST be sorted in ascending order. Dataset provides an inefficient default implementation of binary search that allocates a new array list for the entire column.
        Specified by:
        binarySearch in interface Dataset
        Returns:
        index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size(), if all elements in the list are less than the specified key. Note that this guarantees the return value will be greater than or equal to 0, but only if the key is found.
      • initialize

        public void initialize​(java.lang.String[] columnNames,
                               java.lang.Class<?>[] columnTypes,
                               boolean supportsQuality,
                               int expectedRows)
        Description copied from interface: StreamingDatasetWriter
        Initializes the streaming dataset with important information, primarily the column names and types. If supportsQuality is true, it is expected that every value added has an associated quality. expectedRows gives an idea as to how many rows will be returned- useful for progress indication. Should be -1 if the number cannot be known in advance.
        Specified by:
        initialize in interface StreamingDatasetWriter
        expectedRows - -1 if not known, otherwise the number of rows that will be in the dataset.
      • write

        public void write​(java.lang.Object[] data,
                          QualityCode[] quality)
                   throws java.io.IOException
        Description copied from interface: StreamingDatasetWriter
        Writes a row with the given data. Bounds and order of arrays must match that used to call initialize. quality may be null if dataset does not support quality.
        Specified by:
        write in interface StreamingDatasetWriter
        Throws:
        java.io.IOException
      • finishWithError

        public void finishWithError​(java.lang.Exception e)
        Description copied from interface: StreamingDatasetWriter
        Finishes the stream while indicating an error occurred. Either this, or the successful finish(), must be called.
        Specified by:
        finishWithError in interface StreamingDatasetWriter
      • getInternalRowCount

        public int getInternalRowCount()
        Mostly for unit testing