arXivBib manual (for arXivBib version 1.00)
arXivBib Summary
arXivBib, licensed under the GPL, retrieves abstract pages from arXiv.org and reformats them as BibTeX entries. It saves lots of typing if you have many such references to cite, but isn't worth the trouble if you only have a few.
An input file containing lines in the form

   quant-ph/9812037 \\ cite=Aha:99b \\ entry=article
   math.OA/0404553 \\ cite = Kribs:04 \\ etc = etc
   quant-ph/0506082 \\ ?publisher=unpublished \\ +note=additional comments
   etc.

specifies the abstracts retrieved.
Each input entry consists of one or more fields separated by \\. The first field is usually the arXiv reference, as illustrated above. Subsequent fields are optional, and all have the form name = value. Whitespace before and after \\ and = is optional and ignored. Any number of fields, and any name and value whatsoever are permitted. (How your fields are used, or if they're even used at all, depends on your entry-type definitions.)
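To make the field syntax concrete, here is a minimal C sketch (C being the language arXivBib is written in) of splitting one entry into its fields. The helper names trim, split_fields and split_name_value are hypothetical, not functions from arxivbib.c, and the sketch assumes continuation lines have already been joined into one string.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: trim leading and trailing whitespace in place
   and return a pointer to the first non-blank character. */
static char *trim(char *s)
{
    char *end;
    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Split one logical entry on the two-character separator \\ .
   fields[] receives pointers into the (modified) entry string.
   Returns the number of fields found, at most maxfields. */
int split_fields(char *entry, char *fields[], int maxfields)
{
    int n = 0;
    char *p = entry;
    while (n < maxfields) {
        char *sep = strstr(p, "\\\\");   /* C spelling of \\ */
        if (sep != NULL)
            *sep = '\0';
        fields[n++] = trim(p);
        if (sep == NULL)
            break;
        p = sep + 2;
    }
    return n;
}

/* Split "name = value" at the first '='.  Returns 0 when the field has
   no '=' at all, e.g. the leading arXiv reference. */
int split_name_value(char *field, char **name, char **value)
{
    char *eq = strchr(field, '=');
    if (eq == NULL)
        return 0;
    *eq = '\0';
    *name = trim(field);
    *value = trim(eq + 1);
    return 1;
}
```

For input like quant-ph/9812037 \\ cite=Aha:99b, split_fields yields the reference and cite=Aha:99b, and split_name_value then separates cite from Aha:99b. Any leading ? or + stays on the name for the merge step.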
\\ at the end of an input line signals that another field for the same entry continues on the next line. A single \ at the end of a line signals that the same field continues on the next line. Otherwise, lines ending with neither \ nor \\ signal the end of that entry.
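The end-of-line rules can be sketched as a small classifier. This is an illustrative assumption about how a reader might implement them, not arXivBib's code: a trailing \\ is left in place so the joined text still contains the field separator, while a single trailing \ is removed before the next line is appended to the same field.

```c
#include <string.h>

/* How an input line ends, following the continuation rules. */
enum line_end { END_OF_ENTRY, NEXT_FIELD, SAME_FIELD };

/* Classify a line after stripping its trailing newline and blanks.
   NEXT_FIELD:   line ends with \\ (kept, so the joined text still
                 contains the field separator).
   SAME_FIELD:   line ends with a single \, which is removed so the next
                 line can be appended to the same field value.
   END_OF_ENTRY: neither; this line finishes the entry. */
enum line_end classify_line_end(char *line)
{
    size_t len = strlen(line);
    while (len > 0 && strchr(" \t\r\n", line[len - 1]) != NULL)
        line[--len] = '\0';
    if (len >= 2 && line[len - 1] == '\\' && line[len - 2] == '\\')
        return NEXT_FIELD;
    if (len >= 1 && line[len - 1] == '\\') {
        line[len - 1] = '\0';
        return SAME_FIELD;
    }
    return END_OF_ENTRY;
}
```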
A cite= field, if present, specifies the LaTeX \cite{key} for the entry, and defaults to your arXiv reference, e.g., quant-ph/9812037, if not present. An entry= field, if present, specifies the BibTeX entry-type, e.g., entry=book for @book, and usually defaults to @article or @unpublished if not present.
Additional fields are merged with the abstract data read from arXiv.org. For example, author=John Doe replaces the paper's true author(s). But the leading ? on ?publisher=unpublished (in the example input above) means that field is used _only_ if the abstract contains no publisher field. Similarly, the leading + on +note=additional comments means its value is concatenated to the abstract's note.
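The three merge behaviors (plain replace, ? for default-only, + for append) can be sketched as follows. The function name merge_field is hypothetical, and the single blank inserted between the abstract's note and your added text is an assumption; the manual doesn't say what separator arXivBib uses.

```c
#include <stdio.h>
#include <string.h>

/* Merge one user-supplied field with the value parsed from the abstract.
   name is the field name as written in the input, possibly prefixed by
   ? or +; abstract_value is "" when the abstract has no such field.
   The merged value is written to out (at most outsize bytes). */
void merge_field(const char *name, const char *user_value,
                 const char *abstract_value, char *out, size_t outsize)
{
    if (name[0] == '?') {
        /* default only: keep the abstract's value if it has one */
        snprintf(out, outsize, "%s",
                 abstract_value[0] != '\0' ? abstract_value : user_value);
    } else if (name[0] == '+') {
        /* append: assumed blank separator between the two values */
        if (abstract_value[0] != '\0')
            snprintf(out, outsize, "%s %s", abstract_value, user_value);
        else
            snprintf(out, outsize, "%s", user_value);
    } else {
        /* plain name=value replaces whatever the abstract had */
        snprintf(out, outsize, "%s", user_value);
    }
}
```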
Lines beginning with (i.e., whose first non-whitespace characters are) @string or @preamble are written unchanged to the output file, so these directives are simply passed on to BibTeX. Similarly, an arXivBib comment is any line that begins with #, and comments are also written unchanged to the output file.
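A sketch of that pass-through test, assuming a case-sensitive match (the manual doesn't say whether @STRING or @Preamble are also recognized). is_passthrough_line is a hypothetical name.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if a line should be copied to the output unchanged: its first
   non-whitespace characters are @string or @preamble, or it is a #
   comment.  Case-sensitive matching is an assumption of this sketch. */
int is_passthrough_line(const char *line)
{
    while (isspace((unsigned char)*line))
        line++;
    return line[0] == '#'
        || strncmp(line, "@string", 7) == 0
        || strncmp(line, "@preamble", 9) == 0;
}
```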
Papers submitted to arXiv.org are accompanied by formatted abstracts parsed by arXivBib, which uses the following abstract-to-entry field mapping:
   arXiv abstract                  | BibTeX entry
   --------------------------------+--------------------
   Authors:                        | author =
   lines comprising the title      | title =
   Date: (becomes both)            | year = and month =
   Journal-ref:                    | journal =
   Report-no:                      | report =
   Comments:                       | note =
   Subj-class:                     | subj-class =
   MSC-class:                      | msc-class =
   ACM-class:                      | acm-class =
   DOI:                            | doi =
   lines comprising the abstract   | abstract =
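The same mapping, written as a C lookup table for illustration. The table and the bibtex_fields_for function are hypothetical. The title and abstract aren't introduced by a label line, so they aren't keyed here, and Date: appears twice because it feeds two BibTeX fields.

```c
#include <stddef.h>
#include <string.h>

/* One row of the abstract-to-entry mapping. */
struct field_map { const char *arxiv_label; const char *bibtex_field; };

static const struct field_map FIELD_MAP[] = {
    { "Authors:",     "author"     },
    { "Date:",        "year"       },
    { "Date:",        "month"      },
    { "Journal-ref:", "journal"    },
    { "Report-no:",   "report"     },
    { "Comments:",    "note"       },
    { "Subj-class:",  "subj-class" },
    { "MSC-class:",   "msc-class"  },
    { "ACM-class:",   "acm-class"  },
    { "DOI:",         "doi"        },
};

/* Store the BibTeX field name(s) for an arXiv label in out[] and return
   how many there are (0 if the label isn't mapped). */
int bibtex_fields_for(const char *label, const char *out[], int max)
{
    int n = 0;
    size_t i;
    for (i = 0; i < sizeof FIELD_MAP / sizeof FIELD_MAP[0] && n < max; i++)
        if (strcmp(FIELD_MAP[i].arxiv_label, label) == 0)
            out[n++] = FIELD_MAP[i].bibtex_field;
    return n;
}
```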
Parsed abstract information is output as BibTeX entries in a more-or-less obvious manner. Fields like report=, subj-class=, msc-class=, acm-class= and doi= are ignored by every BibTeX .bst style I'm aware of.
Future versions of arXivBib will parse abstracts more carefully. For example, the arXiv Journal-ref: field frequently contains volume, number, pages, year, etc, that arXivBib can try to interpret as separate BibTeX fields. (In the particular case of Journal-ref:, authors usually write standard references like Phys.Lett. B630 (2005) 68-72, which are typeset nicely in your bibliography even when formatted as single BibTeX fields.)
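As an illustration of the kind of parsing a future version might do, here is a hedged sketch that handles only the single shape journal volume (year) pages, as in Phys.Lett. B630 (2005) 68-72. The struct and function names are hypothetical, and many real Journal-ref: strings won't match; returning 0 lets the caller fall back to one journal= field, which is what arXivBib 1.00 does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of a Journal-ref: string. */
struct journal_ref {
    char journal[64];
    char volume[16];
    int  year;
    char pages[32];
};

/* Try to split "<journal> <volume> (<year>) <pages>".
   Returns 1 on success, 0 if the string has any other shape. */
int parse_journal_ref(const char *ref, struct journal_ref *jr)
{
    char head[96];
    char *space;
    if (sscanf(ref, " %95[^(] (%d) %31s", head, &jr->year, jr->pages) != 3)
        return 0;
    /* head is "<journal> <volume> ": drop trailing blanks, then split
       at the last remaining blank */
    space = head + strlen(head);
    while (space > head && space[-1] == ' ')
        *--space = '\0';
    space = strrchr(head, ' ');
    if (space == NULL)
        return 0;
    *space = '\0';
    snprintf(jr->journal, sizeof jr->journal, "%s", head);
    snprintf(jr->volume, sizeof jr->volume, "%s", space + 1);
    return 1;
}
```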
Instead of arXiv references like quant-ph/9812037, if your first field begins with an @, like @article (but not @string or @preamble, mentioned above), it's taken to be a BibTeX entry-type. In this case, arXivBib doesn't retrieve any abstract, but merely reformats your remaining fields as a BibTeX entry. For example,
   @book \\ cite = nielsen:2000 \\
      title = Quantum \
      Computation and Quantum Information \\
      author = Michael A. \
      Nielsen, Isaac L. Chuang \\
      publisher = Cambridge U.P. \\
      year = 2000 \\
      ISBN = 0-521-63503-9
just produces a BibTeX @book entry
   @book{nielsen:2000,
      author = "Michael A. Nielsen and Isaac L. Chuang",
      title = "Quantum Computation and Quantum Information",
      year = 2000,
      publisher = "Cambridge U.P.",
      isbn = "0-521-63503-9" }
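In the example above, the comma-separated author list became BibTeX's and-separated form. That conversion can be sketched as follows. The rule and the name commas_to_and are assumptions for illustration; in particular, a name written as Last, First would be mis-split.

```c
#include <string.h>

/* Rewrite "A, B, C" as BibTeX's "A and B and C".  An Oxford-comma
   ", and " collapses to a single " and ".  Assumes no author name
   itself contains a comma. */
void commas_to_and(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    if (outsize == 0)
        return;
    while (*in != '\0' && n + 1 < outsize) {
        if (*in == ',') {
            in++;
            while (*in == ' ')
                in++;
            if (strncmp(in, "and ", 4) == 0)
                in += 4;
            if (n + 5 >= outsize)    /* no room for " and " plus '\0' */
                break;
            memcpy(out + n, " and ", 5);
            n += 5;
        } else {
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
}
```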
In principle, you can write all your .bib files this way, even without any abstract retrievals at all, and just use arXivBib as a simple reformatting tool.
Any @entry-type in the first field (or entry=entry-type in a subsequent field when retrieving arXiv abstracts) is permitted, whether or not BibTeX defines it. ArXivBib simply outputs every field it has data for. Input like quant-ph/9812037 \\ entry=anytype \\ etc simply formats a BibTeX @anytype{etc...} entry containing all fields parsed from the arXiv abstract, merged with any additional fields from the input.
Similarly, as already mentioned above, arXivBib's optional name = value fields may contain any name and value whatsoever. Silly example input like

   @anytype \\ cite=tester \\ salutation=Hello

just outputs

   @anytype{tester, salutation = "Hello" }
Note that arXivBib wasn't written as a reformatting tool, but thinking about it that way may help clarify its intended purpose: to parse arXiv abstracts, merge that data with any additional fields you supply, and output a BibTeX-formatted entry.
After preparing your input file, you can issue the following command from the Unix shell prompt:

   nohup ./arxivbib < inputfile > outputfile &

The nohup...& runs arXivBib in the background, because it waits 15 seconds between abstract retrievals to avoid tripping arXiv's robot detection. The program eventually runs to completion, producing an output file that contains (by default) BibTeX-formatted entries. Some other (non-default) output options are discussed below.
Abstracts on arXiv.org aren't required to comply with BibTeX or LaTeX formatting rules. Authors separated by commas, names like Schr\"odinger (which BibTeX sees as the beginning of an unmatched "..." string), math-mode expressions not surrounded by $...$, etc., are very common. arXivBib tries to remediate these problems and generate output that's accepted by both BibTeX and LaTeX. That doesn't always work, and it sometimes results in over-aggressive editing. For example, authors often use a \symbol for which they have a \newcommand, so arXivBib rewrites it as $\symbol$, but it's still undefined in your bibliography.
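One common remedy for the Schr\"odinger problem is to wrap each accent in braces, writing Schr{\"o}dinger, because BibTeX doesn't treat a " inside braces as the end of a "..." field value. The sketch below shows that single rule. Since arXivBib's own rewrite rules aren't documented, treat it as an assumption about one possible rule; brace_umlauts is a hypothetical name.

```c
#include <string.h>

/* Wrap each \"x accent in braces (Schr\"odinger -> Schr{\"o}dinger).
   Accents already written as {\"o} are copied unchanged. */
void brace_umlauts(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    const char *start = in;
    if (outsize == 0)
        return;
    while (*in != '\0' && n + 1 < outsize) {
        int already_braced = (in > start && in[-1] == '{');
        if (in[0] == '\\' && in[1] == '"' && in[2] != '\0' && !already_braced) {
            if (n + 5 >= outsize)    /* no room for {\"x} plus '\0' */
                break;
            out[n++] = '{';
            out[n++] = '\\';
            out[n++] = '"';
            out[n++] = in[2];
            out[n++] = '}';
            in += 3;
        } else {
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
}
```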
You should thus anticipate some errors from BibTeX and/or LaTeX when using arXivBib. Just edit its output .bib file and manually correct any problems arXivBib couldn't handle.
In this early release, arXivBib's preliminary rewrite rules are subject to change, so I won't document them in detail. But you can turn off all arXivBib edits with the -e command-line switch. For example,

   nohup ./arxivbib -e < inputfile > outputfile &

makes no changes to text from retrieved abstracts, authors, titles, etc.
By default, arXivBib's output consists of BibTeX entries in the form (the quant-ph/9812037 example is illustrated)
   @article{Aha:99b,
      author = "Dorit Aharonov",
      month = "December",
      note = "77 pages, figures included in the ps file. To appear in:
         Annual Reviews of Computational Physics, ed. Dietrich Stauffer,
         World Scientific, vol VI, 1998. The paper can be down loaded
         also from this http URL",
      title = "Quantum Computation",
      year = 1998,
      abstract = "In the last few years, theoretical study of quantum
         systems serving as computational devices has achieved tremendous
         progress. ...etc... Quantum algorithms, including Shor's
         factorization algorithm and Grover's algorithm for",
      abstract1 = "searching databases, are explained. ...etc...
         discussing the possible implications of quantum computation on
         fundamental physical questions, such as the transition from
         quantum to classical physics.",
      date = "Tue, 15 Dec 1998",
      eprint = "quant-ph/9812037" }
   @etc
If your input contains no entry= field, then arXivBib defaults to @article if the abstract contains a Journal-ref: field, or to @unpublished if it doesn't. You can change one or both defaults by using the -t and/or -u switches when you run arXivBib. For example,

   nohup ./arxivbib -t techreport -u misc < inputfile > arxivoutputfile &

formats a @techreport entry if the abstract contains a Journal-ref: field, or a @misc entry if it doesn't.
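The default selection amounts to a two-way choice, sketched here with a hypothetical choose_entry_type function. journal_default and unpub_default stand for whatever -t and -u set, initially article and unpublished.

```c
#include <string.h>

/* Pick the entry type for one retrieved abstract.  An explicit entry=
   field always wins; otherwise the choice depends on whether the
   abstract has a Journal-ref: field. */
const char *choose_entry_type(const char *entry_field, int has_journal_ref,
                              const char *journal_default,
                              const char *unpub_default)
{
    if (entry_field != NULL && entry_field[0] != '\0')
        return entry_field;
    return has_journal_ref ? journal_default : unpub_default;
}
```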
The text of the abstract is also retrieved by arXivBib, and placed in the abstract= field. BibTeX seems to have a 1000-character field length limit, so arXivBib breaks long fields into several shorter ones, as illustrated above. Had the example abstract been even longer, you'd see an abstract2 field, etc. How that appears on LaTeX output is up to your .bst style file, but at least you'll see the abstract in your personal .bib file.
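The splitting can be sketched as follows. The function name write_split_field is hypothetical. Breaking at the last blank that fits is an assumption suggested by the example above, where abstract ends at "Grover's algorithm for" and abstract1 resumes at "searching databases". maxlen plays the role of the MAXFLDLEN compile switch, whose default value isn't stated.

```c
#include <stdio.h>
#include <string.h>

/* Write name = "value" as one or more BibTeX fields of at most maxlen
   characters each, named name, name1, name2, ...  Each chunk breaks at
   the last blank that fits, or hard at maxlen if there is none. */
void write_split_field(FILE *fp, const char *name, const char *value,
                       size_t maxlen)
{
    int part = 0;
    while (*value != '\0') {
        size_t len = strlen(value);
        size_t cut = len;
        if (len > maxlen) {
            cut = maxlen;
            while (cut > 0 && value[cut] != ' ')
                cut--;
            if (cut == 0)            /* no blank found: hard break */
                cut = maxlen;
        }
        if (part == 0)
            fprintf(fp, "   %s = \"%.*s\",\n", name, (int)cut, value);
        else
            fprintf(fp, "   %s%d = \"%.*s\",\n", name, part, (int)cut, value);
        part++;
        value += cut;
        while (*value == ' ')        /* drop the blank at the break */
            value++;
    }
}
```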
Several (two at this time) output formats are available from arXivBib, selected with the optional -t1 or -t2 switch on the command line. We just described the default -t2 BibTeX format (you needn't write the -t switch at all if you want the default). Format -t1, for which you can just write -t (without an entry-type), produces output in arXivBib's input format, described above. Use this format together with the -e switch to produce unedited output that arXivBib can re-read on subsequent runs to format BibTeX-readable entries without re-querying arXiv.org for the same information.
For example,

   nohup ./arxivbib -e -t < inputfile > arxivoutputfile &

produces arxivoutputfile containing unedited arXiv.org abstract data, merged with any optional fields you provided, in the form (for the same quant-ph/9812037 illustrated above)
   quant-ph/9812037 \\ cite = Aha:99b \\
      author = Dorit Aharonov \\
      month = December \\
      note = 77 pages, figures included in the ps file. To appear in: \
      Annual Reviews of Computational Physics, ed. Dietrich Stauffer, \
      World Scientific, vol VI, 1998. The paper can be down loaded \
      also from this http URL \\
      title = Quantum Computation \\
      year = 1998 \\
      abstract = In the last few years, theoretical study of quantum \
      systems serving as computational devices has achieved tremendous \
      progress. ...etc... In the end of this review I make these \
      connections explicit, discussing the possible implications of \
      quantum computation on fundamental physical questions, such as \
      the transition from quantum to classical physics. \\
      date = Tue, 15 Dec 1998 \\
      eprint = quant-ph/9812037
The arxivoutputfile produced by this first run of arXivBib can now be used as input to a second run,

   ./arxivbib -x < arxivoutputfile > finaloutputfile

which produces finaloutputfile containing edited, default -t2 BibTeX-formatted entries. Particularly note the -x switch on this second run. It completely turns off all arXiv.org abstract retrievals, along with the accompanying 15-second delays (so nohup...& isn't necessary with -x). This second run thus uses only data from its input file, and that's fine, since that file already contains the fields derived from the arXiv.org abstracts.
Thus, this finaloutputfile is byte-for-byte identical to the BibTeX output described above. So why use a more complicated two-step process to obtain exactly the same result? Several reasons:
Very quickly: download arxivbib.zip to any Unix/Linux box and then type
I've built and run arXivBib under Linux and NetBSD using gcc. The source code is ANSI-standard C, and should compile and run under any Unix-like environment without change. During execution, arXivBib issues system( ) commands of the form

   lynx -dump http://arxiv.org/abs/quant-ph/0506082 > tempfile

so the lynx browser must be on your path.
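Here's a sketch of one polite retrieval in C, assuming the command shape shown above and the 15-second pause between retrievals. build_lynx_command and fetch_abstract are hypothetical names. The character check on the arXiv reference is an added precaution in this sketch (so nothing unexpected reaches the shell), not something this manual says arXivBib does, and sleep() comes from POSIX unistd.h rather than ANSI C.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>   /* sleep(): POSIX, not ANSI C */

#define RETRIEVAL_DELAY 15   /* seconds between abstract retrievals */

/* Build "lynx -dump http://arxiv.org/abs/<ref> > <tempfile>".
   Returns 0 if ref holds characters outside [A-Za-z0-9./-] or if the
   command doesn't fit in cmd. */
int build_lynx_command(const char *ref, const char *tempfile,
                       char *cmd, size_t cmdsize)
{
    int len;
    if (strspn(ref, "abcdefghijklmnopqrstuvwxyz"
                    "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./-") != strlen(ref))
        return 0;
    len = snprintf(cmd, cmdsize, "lynx -dump http://arxiv.org/abs/%s > %s",
                   ref, tempfile);
    return len > 0 && (size_t)len < cmdsize;
}

/* Retrieve one abstract into tempfile, then pause so consecutive calls
   stay RETRIEVAL_DELAY seconds apart.  Returns system()'s status, or -1
   if the command couldn't be built. */
int fetch_abstract(const char *ref, const char *tempfile)
{
    char cmd[512];
    int status;
    if (!build_lynx_command(ref, tempfile, cmd, sizeof cmd))
        return -1;
    status = system(cmd);
    sleep(RETRIEVAL_DELAY);
    return status;
}
```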
The steps needed to download and compile arXivBib are
   README         | arXivBib release notes
   LICENSE        | GPL license, under which you may use arXivBib
   arxivbib.c     | arXivBib source program and all required functions
   arxivbib.dat   | sample arXivBib input file
   arxivbib.tex   | sample TeX document with a \nocite{*} to format arXivBib's output
   abstract.bst   | BibTeX style that formats abstracts, from CTAN
   arxivbib.html  | this file, the arXivBib user's manual
That's all there is to compiling arXivBib. You may also optionally include the following -D switches on the compile line, whose functionality is as follows...
For example,

   cc -DXFILE=\"tempfile\" -DMAXFLDLEN=9999 arxivbib.c -o arxivbib

compiles arxivbib so that it writes abstracts to a file called tempfile in the current working directory from which it was launched, and with a very large maximum field length (effectively turning off the splitting of long fields).
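Compile-time switches like these usually override defaults that the source guards with #ifndef, as sketched below. The default values shown (the /tmp/arxivbib.tmp file name and 1000 characters) and the two accessor functions are assumptions for illustration; arxivbib.c's real defaults may differ.

```c
#include <string.h>

/* Defaults replaced at compile time by -DXFILE=\"...\" or -DMAXFLDLEN=n.
   Both default values here are hypothetical. */
#ifndef XFILE
#define XFILE "/tmp/arxivbib.tmp"   /* hypothetical default temp file */
#endif

#ifndef MAXFLDLEN
#define MAXFLDLEN 1000              /* assumed to match BibTeX's limit */
#endif

/* Hypothetical accessors so the rest of the program sees one value. */
const char *temp_file_name(void) { return XFILE; }
long max_field_length(void) { return MAXFLDLEN; }
```

Compiling this sketch with -DMAXFLDLEN=9999 would make max_field_length() return 9999 instead of the assumed 1000.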
arXivBib runs from the command line on Unix-compliant boxes. It sleeps 15 seconds between abstract retrievals, as recommended by mjf at www-admin@arXiv.org, to avoid tripping arXiv's robot detection. Because of this sleep delay, arXivBib should be run either using nohup, or as a daemon using the -d switch on its command line. In the simplest case, this would look either like

   nohup ./arxivbib < inputfile > outputfile &

or like

   ./arxivbib -d -f inputfile -o outputfile

Stdin and stdout cannot be redirected when arXivBib runs as a daemon, so use the provided -f and -o switches instead, as illustrated. The examples in this document all use the nohup...& form, but both forms are equivalent, and you can use whichever is more convenient (e.g., some systems don't permit ordinary users to run nohup).
In addition to the preceding -d, -f and -o switches, arXivBib provides various other command-line switches (most of which have already been mentioned), as follows...
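The -t switch does double duty, selecting either an output format (-t1, -t2, or a bare -t meaning -t1) or an entry type (-t techreport). One way to tell these apart, sketched under that reading of this manual, is to look at the character after -t and at the next argument. parse_t_switch is a hypothetical helper, not arxivbib.c's parser.

```c
#include <ctype.h>
#include <string.h>

/* Interpret argv[i], which starts with "-t".
   -t1 / -t2      -> *format = 1 or 2
   -t <word>      -> *journal_type = <word> (word not starting with '-')
   bare -t        -> *format = 1
   Returns the number of argv slots consumed (1 or 2). */
int parse_t_switch(char *argv[], int i, int argc,
                   int *format, const char **journal_type)
{
    const char *arg = argv[i];
    if (isdigit((unsigned char)arg[2])) {      /* -t1 or -t2 */
        *format = arg[2] - '0';
        return 1;
    }
    if (i + 1 < argc && argv[i + 1][0] != '-') {
        *journal_type = argv[i + 1];           /* -t techreport */
        return 2;
    }
    *format = 1;                               /* bare -t */
    return 1;
}
```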
arXivBib's copyright is registered by me with the US Copyright Office, and I hereby license it to you under the terms and conditions of the GPL. There is no official support of any kind whatsoever, and you use arXivBib entirely at your own risk, with no guarantee of any kind, in particular with no warranty of merchantability.
By using arXivBib, you warrant that you have read, understood and agreed to these terms and conditions, and that you possess the legal right and ability to enter into this agreement and to use arXivBib in accordance with it.
Hopefully, the law and ethics regarding computer programs will evolve to make this kind of obnoxious banter unnecessary. In the meantime, please forgive me my paranoia.
To protect your own intellectual property, I recommend Copyright Basics from The Library of Congress, and similarly, Copyright Basics from The American Bar Association. Very briefly, download Form TX and follow the included instructions. In principle, you automatically own the copyright to anything you write the moment it's on paper. In practice, if the matter comes under dispute, the courts look _very_ favorably on you for demonstrating your intent by registering the copyright.
I hope you find arXivBib useful. If so, a contribution to your country's TeX Users Group, or to the GNU project, is suggested.
email: john@forkosh.com