WARC version 0.0.1
==================

!!! This release is an "open beta"; see limitations below. !!!

The WARC library provides Perl support for accessing Web ARChive files.
The WARC format is, as of this writing, the generally accepted standard for
archiving documents obtained by crawling the World Wide Web.

This distribution focuses on the basic, low-level interfaces for reading
records from WARC files and building WARC files.

This distribution contains:

    - WARC
	The convenience loader, Single Point Of Truth for $VERSION, and
	overview POD for the WARC reader support.

    - WARC::Builder
	The basic interface for writing WARC files.

    - other modules mentioned in those POD pages

Rationale for Placing WARC at Top-level
---------------------------------------

I had initially planned to put this library in the Archive::WARC::
namespace, but eventually decided to move it to top-level because it did
not seem to fit in the Archive:: namespace.

Other packages in Archive:: generally map string-like file names to archive
members, with varying levels of functionality associated with those archive
members.  This model fits WARC only in the simplest uses, but
Archive::WARC:: could be a useful future interface for some of those cases.

While Archive::Web:: could be reasonable, considering that the WARC format
is literally named "Web Archive", people will most likely be searching for
the keyword "WARC", so the name needs to include it.  I considered
Archive::Web::WARC:: and Archive::WWW::WARC::, but those violate what I call
the "namespace branching rule": each label in a hierarchical namespace
should plausibly have multiple immediate children.  The "Web" and "WWW"
labels are unlikely to have children other than "WARC", and eliminating
them lands us right back at Archive::WARC::.  The WARC::Alike::* namespace
envisioned in this package is specifically intended to support other
similar formats as nearly transparently as possible.

Following the examples of HTTP::* and LWP::*, which are also top-level, I
have decided to go through with placing WARC::* at top-level.  I hope that
this library will live up to the promise of broad usefulness that this
placement implies.

Limitations in this release
---------------------------

This is an "open beta" release and some features are still incomplete.

Most notably:

    - writing WARC volumes is not yet implemented but some supporting APIs
      are available and subject to change as needed or convenient

    - HTTP transfer decoding and content decoding are not yet implemented

    - the lack of HTTP transfer decoding means that WARC-Payload-Digest
      values can be calculated accurately only in some cases at this time

      - the WARC::Record::Sponge API will change to accommodate future HTTP
        decoding support

    - support for SDBM indexes is planned but not yet implemented

The support for reading WARC volumes is mostly complete aside from the
aforementioned lack of payload decoding.
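
To illustrate why missing transfer decoding affects payload digests, here
is a minimal sketch.  It assumes, as the limitation above implies, that the
digest is meant to cover the body after transfer decoding.  The
decode_chunked helper is hypothetical and not part of this library, and the
digests are printed in hex for brevity rather than in base-32.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper, not part of this library: undo HTTP/1.1 "chunked"
# transfer coding.  Trailers and malformed input are not handled.
sub decode_chunked {
    my ($body) = @_;
    my $payload = '';
    # Each chunk is: hex size, optional extensions, CRLF, data, CRLF.
    while ($body =~ /\G([0-9A-Fa-f]+)[^\r\n]*\r\n/gc) {
        my $size = hex $1;
        last if $size == 0;              # a zero-size chunk ends the body
        $payload .= substr($body, pos($body), $size);
        pos($body) += $size + 2;         # skip the data and its CRLF
    }
    return $payload;
}

# The same message body, as stored on the wire and after decoding.
my $stored  = "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n";
my $decoded = decode_chunked($stored);

print "stored:  sha1 ", sha1_hex($stored),  "\n";
print "decoded: sha1 ", sha1_hex($decoded), "\n";
```

The two digests differ, so a digest computed over the stored bytes will not
match one computed over the decoded body whenever a transfer coding was
applied.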

INSTALLATION
------------

Even non-wizards should find the following incantation useful:

   perl Makefile.PL
   make
   make test
   make install

DEPENDENCIES
------------

The WARC library requires:

    - At least version 5.8.1 of perl, due to bugs in tied file handle
      support in 5.8.0 that affect IO::Uncompress::Gunzip.  Perl 5.8.1 is
      ancient as of this writing, so this requirement is not expected to
      cause problems.

    - Support for "version" objects, either built-in or using the "version"
      pragmatic module available from CPAN.

    - IO::Uncompress::Gunzip, since most WARC files are written as .warc.gz.

    - IO::Compress::Gzip, for writing WARC files as .warc.gz.  Version
      2.024 or later is required to ensure that we can record the zlib
      version in the "warcinfo" record.

    - LWP, for the base classes for the HTTP objects that can be replayed
      from request and response records.

    - MIME::Base32, for base-32 encodings of WARC record digests.

    - Scalar::Util, specifically the XS version, for Scalar::Util::weaken,
      used to support caching anonymous tied aggregates for WARC::Fields.

    - Time::Local, for translating timestamps from string form to epoch
      time in WARC::Date.
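
The requirements above can be checked before installing with a short
script along these lines.  This is only a sketch: the check_module helper
is hypothetical, and Makefile.PL remains the authoritative check.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Modules from the list above; the value is a minimum version (0 = any).
my %required = (
    'version'                => 0,
    'IO::Uncompress::Gunzip' => 0,
    'IO::Compress::Gzip'     => '2.024',
    'LWP'                    => 0,
    'MIME::Base32'           => 0,
    'Scalar::Util'           => 0,
    'Time::Local'            => 0,
);

# Hypothetical helper: returns '' if $module loads and is at least
# $minimum, otherwise a short description of the problem.
sub check_module {
    my ($module, $minimum) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return 'not installed' unless eval { require $file; 1 };
    return "older than $minimum"
        if $minimum && !eval { $module->VERSION($minimum); 1 };
    return '';
}

printf "%-24s %s\n", 'perl >= 5.8.1', ($] >= 5.008001 ? 'ok' : 'too old');
for my $module (sort keys %required) {
    my $problem = check_module($module, $required{$module});
    printf "%-24s %s\n", $module, ($problem eq '' ? 'ok' : $problem);
}

# weaken() may not be usable without the XS build of Scalar::Util,
# so actually calling it is the most direct test.
my $weak_ok = eval {
    require Scalar::Util;
    my $ref  = [];
    my $copy = $ref;
    Scalar::Util::weaken($copy);
    1;
};
printf "%-24s %s\n", 'Scalar::Util::weaken', ($weak_ok ? 'ok' : 'unavailable');
```

Each line of output names one requirement followed by "ok" or a short
description of what is missing.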

COPYRIGHT AND LICENCE
---------------------

Copyright (C) 2019, 2020 Jacob Bachmeyer

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.