What's New in Rev.3.4

Julius/Julian rev.3.4 brings substantial progress, including the integration of the grammar-based recognizer Julian, real-time word confidence scoring, support for class N-gram, and improved stability. The changes from rev.3.3p4 to 3.4 are described in detail below.

Julian (grammar-based recognizer) has been integrated

The grammar-based two-pass speech recognizer "Julian" has been integrated into Julius. Julius and Julian are now distributed together in one package as open-source software.

Julian is speech recognition software based on a deterministic finite state automaton (DFA) grammar. Unlike Julius, which uses a statistical word N-gram as its language model, Julian uses a hand-written (or auto-generated) task grammar as its linguistic constraint. A task grammar is a set of rules and patterns describing the acceptable words and word sequences. Because the allowed hypotheses are strictly defined by the grammar, Julian is efficient for small-vocabulary recognition systems (e.g. voice command, isolated word recognition, or spoken dialogue systems for small tasks).

Julian is derived from Julius and applies the grammar constraint directly; most of the code is shared with Julius. In fact, Julian can be compiled from the Julius source code simply by specifying "--enable-julian" to configure. Since the major speech recognition techniques in Julius are incorporated into Julian, it also performs very well. For example, it can recognize a vocabulary of over a thousand words in real time on a machine slower than a Pentium III 300MHz.

Julian was originally a product of the Continuous Speech Recognition Consortium (CSRC), Japan, and was distributed only to CSRC members. Now that the consortium has successfully completed its three years of activity, Julian is freely available as of rev.3.4.

The Julius-3.4 archive also includes Julian and several grammar construction tools. Please see the documents below for their usage.

License Modified to Conform to Open-Source Requirements

The license terms of Julius/Julian have been modified to conform to the definition of "open-source software."

ABOUT LICENSE: the original license terms are in Japanese; they are summarized below for convenience and quick understanding. Please consult the original Japanese LICENSE file in the source archive for precise details.

Generally, the license of Julius is similar to the BSD license.
There is NO obligation to make your source code free as under the GPL, and NO
restriction on its usage, even for commercial purposes.

Re-distribution and modification of all or part of Julius is also
permitted, provided that you attach the copyright notice below to your
package, along with the original Japanese license document in the package
(LICENSE.txt):

  Copyright (c) 1991-2004 Kawahara Lab., Kyoto University
  Copyright (c) 2000-2004 Shikano Lab., Nara Institute of Science and Technology

Word Confidence Scoring Support

Julius/Julian can now annotate recognized words with so-called "confidence scores" (also known as confidence measures).

General description of confidence scoring

A word confidence score indicates how "confident" the recognizer is about each recognized word. It is usually computed from how strongly the word stands out among the many competing word hypotheses generated during the recognition process. The confidence score can be regarded as the reliability of the recognizer's output, so it can be used to reject out-of-vocabulary or out-of-grammar words by applying a threshold.
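
The threshold-based rejection described above could be sketched as follows. This is only an illustration: the function name, the placeholder string, the threshold of 0.5, and the sample scores are all assumptions made for this example, not part of Julius/Julian.

```python
from typing import List, Tuple


def reject_low_confidence(
    words: List[Tuple[str, float]],
    threshold: float = 0.5,          # hypothetical threshold, tune per task
    placeholder: str = "<rejected>",  # hypothetical marker for rejected words
) -> List[str]:
    """Replace every word whose confidence is below the threshold."""
    return [word if score >= threshold else placeholder for word, score in words]


# Illustrative (made-up) recognition result: (word, confidence score in 0.0-1.0).
hypothesis = [("turn", 0.93), ("on", 0.88), ("the", 0.71), ("lamp", 0.21)]
filtered = reject_low_confidence(hypothesis, threshold=0.5)
print(filtered)
```

A suitable threshold depends on the task and is best chosen on held-out data rather than fixed in advance.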

Algorithm used in Julius/Julian

The confidence scoring algorithm in Julius/Julian is a form of word posterior probability based scoring. It uses an originally developed algorithm that relies on search-time heuristic scores for faster, light-weight, real-time computation. For more details, please see the paper below:

How to use

Confidence scoring is enabled by default. The scores are output under the "cmscore" heading along with the other recognition results. Values range from 0.0 to 1.0, and a higher value means higher confidence.

In module mode, the confidence score is given by the "CM" attribute of the WHYPO tag. To output the confidence score in module mode, add "C" to the argument of the "-outcode" option. The example below tells Julius/Julian to output recognized words ("W"), their LM entries ("L"), phoneme sequences ("P"), scores ("S"), and confidence scores ("C") to a client in module mode:

 % julius .... -outcode WLPSC -module
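
On the client side, the "CM" attribute of a WHYPO tag could be read roughly as below. This is a minimal sketch: the sample line and its attribute names other than "CM" are illustrative assumptions, and the helper name is hypothetical. Check the actual output of your Julius/Julian build before relying on any attribute name.

```python
import re

# Matches NAME="value" pairs inside a tag.
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def parse_whypo(line):
    """Return the attributes of a WHYPO tag as a dict, or None for other lines."""
    line = line.strip()
    if not line.startswith("<WHYPO"):
        return None
    return dict(ATTRIBUTE.findall(line))


# Illustrative line only; the real attribute set depends on "-outcode".
sample = '<WHYPO WORD="lamp" PHONE="l ae m p" CM="0.213"/>'
attrs = parse_whypo(sample)
confidence = float(attrs["CM"])
print(attrs["WORD"], confidence)
```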

Configuration

To fine-tune the scoring accuracy, the smoothing coefficient alpha can be changed with the runtime option "-cmalpha". This coefficient smooths and compensates for the dynamic range of hypothesis likelihoods when computing word confidence. The default value is 0.05; a smaller value (closer to zero) levels the overall distribution of confidence scores toward the middle (0.5). The quality of the confidence scores may vary with this value, and optimizing it for the target set may improve scoring accuracy. (However, leaving it at the default may work well in most cases.)
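
The effect of the smoothing coefficient can be illustrated with a toy posterior computation. Note that this is NOT the algorithm implemented in Julius/Julian (which works on search-time heuristic scores); it is a generic scaled-softmax sketch over two competing hypotheses with made-up log scores, meant only to show why a smaller alpha pushes confidence toward 0.5.

```python
import math


def scaled_posteriors(log_scores, alpha):
    """Posterior of each competing hypothesis after scaling log scores by alpha."""
    scaled = [alpha * s for s in log_scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up log scores of two competing word hypotheses.
scores = [-4000.0, -4040.0]
for alpha in (0.05, 0.005, 0.0005):
    best, _ = scaled_posteriors(scores, alpha)
    print(f"alpha={alpha}: confidence of best word = {best:.3f}")
```

With only two competitors, the posterior of the better hypothesis approaches 0.5 as alpha approaches zero, which mirrors the leveling behaviour described above.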

To disable confidence measuring, specify "--disable-cm" to configure.

Class N-gram support

Class N-gram is newly supported in Julius. The following describes the format of a class N-gram for Julius.

How to write a class N-gram for Julius

In Julius, a class N-gram should be written in two parts:
  1. Inter-class N-gram connection probabilities in the N-gram file, and
  2. Intra-class word probabilities in the word dictionary.
The inter-class N-gram connection probabilities should be written in the same way as a word N-gram, using the class names as the entries.

The intra-class word probability, i.e. the probability of a word appearing within the class it belongs to, should be written as an additional field in the word dictionary. The normal word dictionary is written in the following style:

WordName [OutputString] phone1 phone2 ...

When using a class N-gram, you should insert the entry name of the class the word belongs to and the intra-class probability of the word at the beginning. The probability should be written in log10, preceded by the indicator "@".

ClassName @IntraClassLogProb WordName [OutputString] phone1 phone2 ...
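
As a rough illustration of the dictionary format above, the sketch below derives log10 intra-class probabilities from word counts and formats the entries. The class name, words, phone symbols, counts, and the four-decimal precision are all made-up assumptions for the example, and the helper is hypothetical rather than a tool shipped with Julius.

```python
import math


def class_dictionary_lines(class_name, entries):
    """Format dictionary lines for one class.

    entries: list of (word_name, output_string, phones, count_in_class).
    """
    total = sum(count for _, _, _, count in entries)
    lines = []
    for word, output, phones, count in entries:
        # Intra-class probability of the word, written in log10 after "@".
        log_prob = math.log10(count / total)
        lines.append(
            f"{class_name} @{log_prob:.4f} {word} [{output}] {' '.join(phones)}"
        )
    return lines


# Made-up class with two member words and their counts.
entries = [
    ("kyoto", "Kyoto", ["k", "y", "o", "t", "o"], 75),
    ("nara", "Nara", ["n", "a", "r", "a"], 25),
]
lines = class_dictionary_lines("CITY", entries)
for line in lines:
    print(line)
```

In the usual class N-gram formulation, the log probability of a word in context is the sum of the inter-class N-gram log probability of its class and this intra-class log probability.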

The table below shows the correspondence between word N-gram and class N-gram.

Files            Word N-gram    Class N-gram
N-gram file(s)   Word N-gram    Inter-class N-gram
Dictionary       Word entry     Word entry + Intra-class probability

Configuration

Class N-gram support is enabled by default. When you are not using a class N-gram, recognition speed will not degrade. However, the storage reserved for the intra-class probability goes unused, wasting 4 bytes of memory per word. If you want to disable class N-gram support for memory efficiency, please specify "--disable-class-ngram" to configure.

Recording format changed to .wav

The speech/sound recording format in Julius/Julian (-record), adintool, and adinrec has been changed to Microsoft WAVE format (.wav). You can still record in RAW format by specifying "-raw" to adintool and adinrec.
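
Because a WAVE file carries its sampling parameters in the header, a recording can be checked with standard tools. Below is a minimal sketch using Python's standard wave module. The demo file is synthetic silence written by the script itself, and its parameters (mono, 16-bit, 16 kHz) are assumptions chosen for the example, not values taken from Julius.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Read the basic header parameters of a WAVE file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_sec": frames / rate,
        }


# Write one second of silence as a stand-in for a real recording.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)       # assumed: mono
    wav.setsampwidth(2)       # assumed: 16-bit samples
    wav.setframerate(16000)   # assumed: 16 kHz
    wav.writeframes(b"\x00\x00" * 16000)

info = describe_wav(demo_path)
print(info)
```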

Bug Fixes

Fixed bug in A/D-in input detection

The following bugs relating to speech input and detection have been fixed:

Fixed a computation for a tied-mixture AM

Fixed a miscomputation of acoustic likelihood that occurred when all of the following conditions held: a tied-mixture acoustic model is used, the model has some missing mixtures in the mixture PDF, and Gaussian pruning is disabled with "-gprune none".

Fixed message bug in module mode

Others

List of Modified Options

configure options: Runtime options:
$Id: WhatsNew_3.4.html,v 1.1.1.1 2005/11/17 11:11:49 sumomo Exp $