Class DataSource

  • Direct Known Subclasses:
    ByteArrayDataSource, FileDataSource, ProcessDataSource, ResourceDataSource, URLDataSource

    public abstract class DataSource
    extends java.lang.Object
    Represents a stream-like source of data. Instances of this class can be used to encapsulate the data available from a stream. The underlying stream is expected to return the same sequence of bytes each time it is opened.

    As well as the ability to return a stream, a DataSource may also have a position, which corresponds to the 'ref' or 'frag' part of a URL (the bit after the #). This is an indication of a location in the stream; it is a string, and its interpretation is entirely up to the application (though it may be specified by the documentation of specific DataSource subclasses).

    As well as providing the facility for several different objects to get their own copy of the underlying input stream, this class also handles decompression of the stream. Compression types are as understood by the associated Compression class.

    For efficiency, a buffer of the bytes at the start of the stream, called the 'intro buffer', is recorded the first time that the stream is read. This can then be used for magic number queries cheaply, without having to open a new input stream. If the whole input stream is shorter than the intro buffer, the underlying input stream never has to be read again.

    Any implementation which implements getRawInputStream() in such a way as to return different byte sequences on different occasions may lead to unpredictable behaviour from this class.

    Author:
    Mark Taylor (Starlink)
    See Also:
    Compression
    • Constructor Summary

      Constructors 
      Constructor Description
      DataSource()
      Constructs a DataSource with a default size of intro buffer.
      DataSource​(int introLimit)
      Constructs a DataSource with a given size of intro buffer.
    • Method Summary

      Modifier and Type Method Description
      void close()
      Closes any open streams owned and not yet dispatched by this DataSource.
      DataSource forceCompression​(Compression compress)
      Returns a DataSource representing the same underlying stream, but with a forced compression mode compress.
      Compression getCompression()
      Returns an object which will handle any required decompression for this stream.
      java.io.InputStream getHybridInputStream()
      Returns an input stream which appears just the same as the one returned by getInputStream(), but only incurs the expense of obtaining an actual input stream (by calling getRawInputStream()) if more bytes are read than the cached magic number.
      java.io.InputStream getInputStream()
      Returns an InputStream containing the whole of this DataSource.
      static java.io.InputStream getInputStream​(java.lang.String location, boolean allowSystem)
      Returns an input stream based on the given location string.
      byte[] getIntro()
      Returns the intro buffer, first reading it if this hasn't been done before.
      int getIntroLimit()
      Returns the maximum length of the intro buffer.
      long getLength()
      Returns the length of the stream returned by getInputStream in bytes, if known.
      static boolean getMarkWorkaround()
      Returns true if we are working around potential bugs in the InputStream.mark(int)/InputStream.reset() methods (common, including in J2SE classes).
      java.lang.String getName()
      Returns a name for this source.
      java.lang.String getPosition()
      Returns the position associated with this source.
      protected abstract java.io.InputStream getRawInputStream()
      Provides a new InputStream for this data source.
      long getRawLength()
      Returns the length in bytes of the stream returned by getRawInputStream, if known.
      java.lang.String getSystemId()
      Returns a System ID for this DataSource; this is a string representation of a file name or URL, as used by Source and friends.
      java.net.URL getURL()
      Returns a URL which corresponds to this data source, if one exists.
      static DataSource makeDataSource​(java.lang.String loc)
      Attempts to make a source given a string identifying its location as a file, URL or system command output.
      static DataSource makeDataSource​(java.lang.String loc, boolean allowSystem)
      Attempts to make a source given a string identifying its location as a file, URL or optionally a system command output.
      static DataSource makeDataSource​(java.net.URL url)
      Makes a source from a URL.
      void setCompression​(Compression compress)
      Sets the compression to be associated with this data source.
      void setIntroLimit​(int limit)
      Sets the maximum size of the intro buffer to a new value.
      static void setMarkWorkaround​(boolean workaround)
      Sets whether we want to work around bugs in InputStream mark/reset methods.
      void setName​(java.lang.String name)
      Sets the name of this source.
      void setPosition​(java.lang.String position)
      Sets the position associated with this source.
      java.lang.String toString()
      Returns a short description of this source (name plus compression type).
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • MARK_WORKAROUND_PROPERTY

        public static final java.lang.String MARK_WORKAROUND_PROPERTY
        See Also:
        Constant Field Values
    • Constructor Detail

      • DataSource

        public DataSource​(int introLimit)
        Constructs a DataSource with a given size of intro buffer.
        Parameters:
        introLimit - the maximum number of bytes in the intro buffer
      • DataSource

        public DataSource()
        Constructs a DataSource with a default size of intro buffer.
    • Method Detail

      • getRawInputStream

        protected abstract java.io.InputStream getRawInputStream()
                                                          throws java.io.IOException
        Provides a new InputStream for this data source. This method should be implemented by subclasses to provide a new InputStream giving the raw content of the source each time it is called. The general contract of this method is that each time it is called it will return a stream with the same content.
        Returns:
        an InputStream containing the data of this source
        Throws:
        java.io.IOException
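        The contract above can be illustrated without the library. The following is a minimal sketch using only the Java standard library; the class RepeatableBytes and its openRaw() method are hypothetical stand-ins for a DataSource subclass and getRawInputStream(), not part of this API. Each call returns a fresh stream over the same bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical stand-in for a DataSource subclass; not part of the library.
final class RepeatableBytes {
    private final byte[] data;

    RepeatableBytes(byte[] data) {
        // Copy so later changes to the caller's array cannot alter the content.
        this.data = data.clone();
    }

    // Plays the role of getRawInputStream(): a new stream per call,
    // always positioned at the start of the same byte sequence.
    InputStream openRaw() {
        return new ByteArrayInputStream(data);
    }

    // Reads two independent streams fully and checks that they agree.
    static boolean sameContentTwice(RepeatableBytes src) {
        try (InputStream a = src.openRaw(); InputStream b = src.openRaw()) {
            return Arrays.equals(a.readAllBytes(), b.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

        A source backed by something that changes between calls (for instance the output of a command that is not deterministic) would break this property, which is the situation the class description warns about.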
      • getURL

        public java.net.URL getURL()
        Returns a URL which corresponds to this data source, if one exists. A URL.openConnection() call on the URL returned by this method should provide a stream with the same content as the getRawInputStream() method of this data source. If no such URL exists or is known, then null should be returned.

        If this source has a non-null position value, it will be appended to the main part of the URL after a '#' character (as the URL's ref part).

        Returns:
        a URL corresponding to this source, or null
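        How a position ends up as the ref part of a URL can be sketched with the standard java.net classes. The helper class UrlRefSketch and the example location are illustrative assumptions; this is not the library's implementation, and it assumes the position contains only characters that are legal in a URI fragment.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper, for illustration only.
final class UrlRefSketch {
    // Appends the position after a '#' so that it becomes the URL's ref part.
    // A null position leaves the base location unchanged.
    static URL withPosition(String base, String position) {
        String spec = position == null ? base : base + "#" + position;
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

        With this sketch, withPosition("file:/data/cat.fits", "3").getRef() yields the string "3", while a null position gives a URL whose getRef() is null.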
      • getIntroLimit

        public int getIntroLimit()
        Returns the maximum length of the intro buffer.
        Returns:
        maximum length of the intro buffer
      • setIntroLimit

        public void setIntroLimit​(int limit)
        Sets the maximum size of the intro buffer to a new value. Setting the intro limit to a new value will discard any state which this source has, so for reasons of efficiency it's not a good idea to call this method except immediately after the source has been constructed and before any reads have taken place.
        Parameters:
        limit - the new maximum length of the intro buffer
      • getRawLength

        public long getRawLength()
        Returns the length in bytes of the stream returned by getRawInputStream, if known. If the length is not known then -1 should be returned. The implementation of this method in DataSource returns -1; subclasses should override it if they can determine their length.
        Returns:
        the length of the raw input stream, or -1
      • getLength

        public long getLength()
        Returns the length of the stream returned by getInputStream in bytes, if known. A return value of -1 indicates that the length is unknown. The return value of this method may change from -1 to a positive value during the life of this object, once the length has been determined.
        Returns:
        the length of the stream in bytes, or -1
      • getName

        public java.lang.String getName()
        Returns a name for this source. This name is mainly intended as a label identifying the source for use in informational messages; it is not in general intended to be used to provide an absolute reference to the source. Thus, for instance, if the source references a file, its name might be a relative pathname or simple filename, rather than its absolute pathname. To identify the source absolutely, the getURL() method (or some suitable class-specific method) should be used. If this source has a position, it should probably form part of this name.
        Returns:
        a name
      • setName

        public void setName​(java.lang.String name)
        Sets the name of this source.
        Parameters:
        name - a name
        See Also:
        getName()
      • getPosition

        public java.lang.String getPosition()
        Returns the position associated with this source. It is a string giving an indication of the part of the stream which is of interest. Its interpretation is up to the application.
        Returns:
        the position string, or null
      • setPosition

        public void setPosition​(java.lang.String position)
        Sets the position associated with this source. It is a string giving an indication of the part of the stream which is of interest. Its interpretation is up to the application.
        Parameters:
        position - the new position (may be null)
      • getSystemId

        public java.lang.String getSystemId()
        Returns a System ID for this DataSource; this is a string representation of a file name or URL, as used by Source and friends. The return value may be null if none is known. This does not contain any reference to the position.
        Returns:
        the System ID string for this source, or null
      • getCompression

        public Compression getCompression()
                                   throws java.io.IOException
        Returns an object which will handle any required decompression for this stream. A raw data stream is read and its magic number (first few bytes) matched against known patterns to determine if any known compression method is in use. If no known compression is being used, the value Compression.NONE is returned.
        Returns:
        a Compression object encoding this stream
        Throws:
        java.io.IOException
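        Magic number matching of the kind described above can be sketched as follows. The byte patterns used are the well-known signatures of gzip, bzip2 and Unix compress; which formats the Compression class actually recognises is not stated in this page, so the enum and class names here are illustrative assumptions rather than the library's code.

```java
// Hypothetical sketch of magic-number matching; not the library's Compression class.
final class MagicSketch {
    enum Kind { NONE, GZIP, BZIP2, COMPRESS }

    // Matches the first few bytes of a stream against known signatures.
    static Kind detect(byte[] intro) {
        if (startsWith(intro, 0x1f, 0x8b)) return Kind.GZIP;      // gzip
        if (startsWith(intro, 'B', 'Z', 'h')) return Kind.BZIP2;  // bzip2
        if (startsWith(intro, 0x1f, 0x9d)) return Kind.COMPRESS;  // Unix compress
        return Kind.NONE;
    }

    // True if buf begins with the given unsigned byte values.
    private static boolean startsWith(byte[] buf, int... magic) {
        if (buf.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((buf[i] & 0xff) != magic[i]) return false;
        }
        return true;
    }
}
```

        Because only the leading bytes are inspected, a check like this can run against a cached prefix of the stream without opening a new one.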
      • getIntro

        public byte[] getIntro()
                        throws java.io.IOException
        Returns the intro buffer, first reading it if this hasn't been done before. The intro buffer will contain the first few bytes of the decompressed stream. The number of bytes it contains (the size of the returned byte[] array) will be the smaller of introLimit and the length of the underlying uncompressed stream.

        The returned buffer is the original, not a copy; do not change its contents.

        Returns:
        the first few bytes of the uncompressed stream, up to a limit of introLimit
        Throws:
        java.io.IOException
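        The sizing rule (the smaller of introLimit and the stream length) can be sketched with InputStream.readNBytes(int), available from Java 11. The class IntroSketch and its methods are hypothetical and only illustrate the rule; they are not how this class is necessarily implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch of reading an intro buffer; not the library's code.
final class IntroSketch {
    // Reads at most introLimit bytes from the start of the stream.
    // The returned array is shorter if the stream ends first.
    static byte[] readIntro(InputStream in, int introLimit) throws IOException {
        return in.readNBytes(introLimit);
    }

    // Convenience for in-memory content: length of the intro that would be read.
    static int introLength(byte[] content, int introLimit) {
        try (InputStream in = new ByteArrayInputStream(content)) {
            return readIntro(in, introLimit).length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

        Since getIntro() hands back the original buffer, a caller that needs to modify the bytes would work on a copy, for example intro.clone().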
      • setCompression

        public void setCompression​(Compression compress)
        Sets the compression to be associated with this data source. In general it will not be necessary or advisable to call this method, since this object will figure it out using magic numbers of the underlying stream. It can be used if the compression method is known, or to force use of a particular compression; in particular setCompression(Compression.NONE) can be used to force direct examination of the underlying stream without decompression, even if the underlying stream is in fact compressed.

        The effects of setting a compression to a mode (other than NONE) which does not match the actual compression mode of the underlying stream are undefined, so this method should be used with care.

        Parameters:
        compress - the compression mode encoding the underlying stream
      • forceCompression

        public DataSource forceCompression​(Compression compress)
        Returns a DataSource representing the same underlying stream, but with a forced compression mode compress. The returned DataSource object may be the same object as this one, but if this one has a compression mode different from compress, a new one will be created. As with setCompression(Compression), the consequences of using a value of compress other than the correct one (except Compression.NONE) are unpredictable.
        Parameters:
        compress - the compression mode to be used for the returned data source
        Returns:
        a data source with the same underlying stream as this, but a compression mode given by compress
      • getInputStream

        public java.io.InputStream getInputStream()
                                           throws java.io.IOException
        Returns an InputStream containing the whole of this DataSource. If compression is detected in the underlying stream, it will be decompressed. The returned stream should be closed by the user when no longer required.
        Returns:
        an input stream that reads from the beginning of the underlying data source, decompressing it if appropriate
        Throws:
        java.io.IOException
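        Transparent decompression of this kind can be sketched for the gzip case using only java.util.zip. The class AutoGunzipSketch is a hypothetical illustration: it peeks at the first two bytes via mark/reset on a BufferedInputStream and wraps the stream in a GZIPInputStream only when the gzip signature is present. Other compression formats, and the way this class actually performs the detection, are outside the sketch.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of detect-then-decompress; handles gzip only.
final class AutoGunzipSketch {
    // Returns a stream of uncompressed bytes, whether or not raw is gzipped.
    static InputStream open(InputStream raw) throws IOException {
        BufferedInputStream buf = new BufferedInputStream(raw);
        buf.mark(2);              // remember the start
        int b0 = buf.read();
        int b1 = buf.read();
        buf.reset();              // rewind so no bytes are lost
        boolean gzip = b0 == 0x1f && b1 == 0x8b;
        return gzip ? new GZIPInputStream(buf) : buf;
    }

    // Test helper: gzip-compresses a byte array in memory.
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Test helper: reads everything from open(...) over in-memory content.
    static byte[] readDecoded(byte[] raw) {
        try (InputStream in = open(new ByteArrayInputStream(raw))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

        As with the stream returned by getInputStream(), the caller is responsible for closing the result; try-with-resources is the simplest way to do so.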
      • getHybridInputStream

        public java.io.InputStream getHybridInputStream()
                                                 throws java.io.IOException
        Returns an input stream which appears just the same as the one returned by getInputStream(), but only incurs the expense of obtaining an actual input stream (by calling getRawInputStream()) if more bytes are read than the cached magic number. This is an efficient way to read if you need an InputStream but may only end up reading the first few bytes of it.
        Returns:
        an input stream that reads from the beginning of the underlying data source, decompressing it if appropriate
        Throws:
        java.io.IOException
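        The idea of a hybrid stream can be sketched as an InputStream that serves bytes from a cached prefix and opens the real stream only when a read goes past that prefix. The class LazyTailStream, its Opener interface and the opensAfterReading helper are all hypothetical; this is one plausible way to get the described behaviour, not the library's implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical sketch: serve a cached intro first, open the real stream lazily.
final class LazyTailStream extends InputStream {
    interface Opener { InputStream open() throws IOException; }

    private final byte[] intro;
    private final boolean introIsWholeStream;
    private final Opener opener;
    private int pos = 0;
    private InputStream tail;   // opened only when needed

    LazyTailStream(byte[] intro, int introLimit, Opener opener) {
        this.intro = intro;
        // An intro shorter than the limit means the stream ended inside it.
        this.introIsWholeStream = intro.length < introLimit;
        this.opener = opener;
    }

    @Override
    public int read() throws IOException {
        if (pos < intro.length) return intro[pos++] & 0xff;
        if (introIsWholeStream) return -1;
        if (tail == null) {
            tail = opener.open();
            tail.readNBytes(intro.length);  // discard bytes already served
        }
        return tail.read();
    }

    @Override
    public void close() throws IOException {
        if (tail != null) tail.close();
    }

    // Demo helper: reads n bytes and reports how often the real stream was opened.
    static int opensAfterReading(byte[] content, int introLimit, int n) {
        int[] opens = {0};
        byte[] intro = Arrays.copyOf(content, Math.min(introLimit, content.length));
        try (LazyTailStream in = new LazyTailStream(intro, introLimit, () -> {
                opens[0]++;
                return new ByteArrayInputStream(content);
            })) {
            for (int i = 0; i < n; i++) {
                int expected = i < content.length ? content[i] & 0xff : -1;
                if (in.read() != expected) throw new IllegalStateException("mismatch at " + i);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return opens[0];
    }
}
```

        In this sketch, reading only as many bytes as the intro holds never opens the underlying stream, and a stream shorter than the intro limit is served entirely from the cache.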
      • close

        public void close()
        Closes any open streams owned and not yet dispatched by this DataSource. Should be called if this object is no longer required, or if it may not be required for some while. Calling this method does not prevent any other method being called on this object in the future. This method throws no checked exceptions; any IOException thrown while closing owned streams is simply discarded.
      • toString

        public java.lang.String toString()
        Returns a short description of this source (name plus compression type).
        Overrides:
        toString in class java.lang.Object
        Returns:
        description of this DataSource
      • makeDataSource

        public static DataSource makeDataSource​(java.lang.String loc)
                                         throws java.io.IOException
        Attempts to make a source given a string identifying its location as a file, URL or system command output. This may be one of the following options:
        • filename
        • URL
        • a string preceded by "<" or followed by "|", giving a shell command line (may not work on all platforms)

        If a '#' character exists in the string, text after it will be interpreted as a position value. Otherwise, the position is considered to be null.

        Note: this method presents a security risk if the loc string is vulnerable to injection. Consider using the variant method makeDataSource(loc,false) in such cases. This method just calls makeDataSource(loc,true).

        Parameters:
        loc - the location of the data, with optional position
        Returns:
        a DataSource based on the data at loc
        Throws:
        java.io.IOException - if loc does not name an existing readable file or valid URL
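        Separating a position from the rest of a location string can be sketched as below. The class LocationSketch is hypothetical, and splitting at the first '#' is an assumption: this page does not say which '#' is used if the string contains more than one.

```java
// Hypothetical sketch of splitting "location#position"; not the library's parser.
final class LocationSketch {
    // Returns {main part, position}; the position is null if there is no '#'.
    // Assumption: the split happens at the first '#' in the string.
    static String[] splitPosition(String loc) {
        int hash = loc.indexOf('#');
        if (hash < 0) {
            return new String[] {loc, null};
        }
        return new String[] {loc.substring(0, hash), loc.substring(hash + 1)};
    }
}
```

        For example, splitPosition("cat.fits#3") gives the main part "cat.fits" and the position "3", while splitPosition("cat.fits") gives a null position.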
      • makeDataSource

        public static DataSource makeDataSource​(java.lang.String loc,
                                                boolean allowSystem)
                                         throws java.io.IOException
        Attempts to make a source given a string identifying its location as a file, URL or optionally a system command output.

        The supplied loc may be one of the following:

        • filename
        • URL
        • only if allowSystem=true: a string preceded by "<" or followed by "|", giving a shell command line (may not work on all platforms)

        If a '#' character exists in the string, text after it will be interpreted as a position value. Otherwise, the position is considered to be null.

        Note: setting allowSystem=true may introduce a security risk if the loc string is vulnerable to injection.

        Parameters:
        loc - the location of the data, with optional position
        allowSystem - whether to allow system commands using the format above
        Returns:
        a DataSource based on the data at loc
        Throws:
        java.io.IOException - if loc does not name an existing readable file or valid URL
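        The role of allowSystem can be sketched as a gate in front of the command-line forms listed above. The class SystemGateSketch is hypothetical. Returning null when commands are disallowed (so that the caller falls back to treating loc as a file or URL) is an assumption made for the sketch, since this page does not describe what happens to such strings when allowSystem is false.

```java
// Hypothetical sketch of gating shell-command locations; not the library's code.
final class SystemGateSketch {
    // Returns the command line encoded in loc ("<cmd" or "cmd|"),
    // or null if loc is not a command form or commands are not allowed.
    static String extractCommand(String loc, boolean allowSystem) {
        if (!allowSystem) {
            return null;  // never interpret loc as a command
        }
        if (loc.startsWith("<")) {
            return loc.substring(1).trim();
        }
        if (loc.endsWith("|")) {
            return loc.substring(0, loc.length() - 1).trim();
        }
        return null;
    }
}
```

        When loc comes from an untrusted source, passing allowSystem=false means that a string such as "<rm -rf dir" is never handed to a shell.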
      • makeDataSource

        public static DataSource makeDataSource​(java.net.URL url)
        Makes a source from a URL. If url is a file-protocol URL referencing an existing file then a FileDataSource will be returned, otherwise it will be a URLDataSource. Under certain circumstances, it may be more efficient to use a FileDataSource than a URLDataSource, which is why this method may be worth using.
        Parameters:
        url - location of the data stream
        Returns:
        data source which returns the data at url
      • getInputStream

        public static java.io.InputStream getInputStream​(java.lang.String location,
                                                         boolean allowSystem)
                                                  throws java.io.IOException
        Returns an input stream based on the given location string. The content of the stream may be compressed or uncompressed data; the returned stream will be an uncompressed version. The following options are allowed for the location:
        • filename
        • URL
        • "-" meaning standard input
        • only if allowSystem=true: a string preceded by "<" or followed by "|", giving a shell command line (may not work on all platforms)

        Note: setting allowSystem=true may introduce a security risk if the location string is vulnerable to injection.

        Parameters:
        location - URL, filename, "cmdline|"/"<cmdline", or "-"
        allowSystem - whether to allow system commands using the format above
        Returns:
        uncompressed stream containing the data at location
        Throws:
        java.io.FileNotFoundException - if location cannot be interpreted as a source of bytes
        java.io.IOException - if there is an error obtaining the stream
      • getMarkWorkaround

        public static boolean getMarkWorkaround()
        Returns true if we are working around potential bugs in the InputStream.mark(int)/InputStream.reset() methods (common, including in J2SE classes). The return value is dependent on the system property named MARK_WORKAROUND_PROPERTY.
        Returns:
        true iff we are working around mark/reset bugs
      • setMarkWorkaround

        public static void setMarkWorkaround​(boolean workaround)
        Sets whether we want to work around bugs in InputStream mark/reset methods.
        Parameters:
        workaround - true to employ the workaround