What's New in Rev.3.4.1

Julius rev.3.4.1 is a minor revision of rev.3.4, but it includes many improvements: fixes to the recognition algorithm, more stable network I/O, support for a binary HMM format, and a new visualization feature. The changes from rev.3.4 are detailed below.

Some Fixes on Search Algorithm

Two improvements to the Julius search algorithm have been implemented. They may improve accuracy under narrow beam-width conditions. Both are enabled by default, but either can be disabled. Their details and effects are summarized below.

Fix double LM scoring of expanded words at the 2nd pass

In the previous version, both the left-to-right 2-gram probability from the 1st pass and the right-to-left 3-gram probability from the 2nd pass were added to each expanded word during recognition, so the LM score was counted twice. This double LM scoring problem has been fixed in this version.

Fixing this problem slightly improved accuracy. The following table shows experimental results on the IPA'98 testset (Japanese newspaper reading task, 46 speakers, 200 sentences, PTM(GD)). The fixed decoder ("3.4+LMfix") improves word accuracy from 90.6% to 91.2%.

Revision      Word %Correct   Word Accuracy (%)
3.4           92.1            90.6
3.4+LMfix     92.3            91.2

This fix is enabled by default. To disable it and make the search algorithm the same as in previous versions, pass "--disable-lmfix" to configure.

New inter-word triphone computation

When computing the acoustic likelihood of cross-word triphones at word edges on the 1st pass, previous versions of Julius used the maximum likelihood among all possible triphones sharing the same biphone context. This version adds another method that uses the average of the N-best likelihoods.

The new method can be specified by the option "-iwcd1 best N", where N is the number of best triphone scores to be averaged.

As of rev.3.4.1, this new method is the default, but the previous methods are kept for backward compatibility. As a result, rev.3.4.1 offers three options for inter-word triphone computation.

"-iwcd1 best 1" is equivalent to "-iwcd1 max", and "-iwcd1 best [maximum_value]" is equivalent to "-iwcd1 average".

Experimental results for this new method on the same testset are shown in the table below. A small improvement was observed.

Revision             Word %Correct   Word Accuracy (%)
3.4                  92.1            90.6
3.4+iwcdbest(N=3)    92.2            91.2
To disable this change and make the search algorithm the same as in previous versions, specify "-iwcd1 max" as a runtime option.

Evaluation summary

With these two improvements, the recognition performance of rev.3.4.1 improved over rev.3.4 on our testset, as summarized in the next figure.

The improvement was especially notable under the narrow beam condition (fast). Tuning the language model weights and insertion penalty ("3.4.1-LMtuned" in the figure) gave a further improvement.

Faster MFCC computation

The MFCC feature extraction routine now uses precomputed trigonometric tables to speed up computation. The tables occupy only about 8 to 12 kBytes. The computed results are unchanged.

To disable this, pass "--disable-mfcc-table" to configure.

Binary HMM support

Julius/Julian now support their own binary HMM format, which makes startup faster. A conversion tool called "mkbinhmm" is also included in the distribution. You can convert an ASCII hmmdefs file in HTK format to the binary format like this:
 % mkbinhmm hmmdefs binary_HMM
To use the binary HMM with Julius/Julian, specify it with the "-h" option in the same way as an ASCII one. Julius/Julian detect the format automatically.
 % julius ... -h binary_HMM

This binary format is not compatible with the binary format used in HTK.

Remove input DC offset

The new option "-zmean" removes the DC offset of the speech input. When "-zmean" is specified, the estimated offset is subtracted from the input. For file input, the offset is computed over the whole input. For microphone input, the first 3 seconds after system startup, which are expected to be silence, are used to estimate the offset.

Search Space Visualization Option

From this version, Julius/Julian can display the recognition process using X11/GTK. For each input, all word hypotheses generated on the 1st pass and the hypothesis expansion process on the 2nd pass are displayed dynamically on the X Window System. When the word graph options are enabled ("configure --enable-word-graph --enable-wpair"), the word graph is also displayed. You can also play back the recognized input with sox.

This option is disabled by default. To enable it, pass "--enable-visualize" to configure. The GTK library is required for compilation.

 % ./configure --enable-visualize
The figure below shows a screenshot.

Output Input Length in Module Mode

In module mode ("-module"), the input length is now sent to the client in the following format:
<INPUTPARAM FRAMES="frame_length" MSEC="length_in_msec">

Bug Fixes

Networked speech input/output fixed

Several bugs related to networked speech input/output have been fixed.

Successive Decoding updated

Many bugs related to the successive decoding option (--enable-sp-segment) have been fixed.

Memory leak with alignment fixed

Memory leaks in the word/phone/state-level alignment options ("-walign", "-palign", "-salign") have been fixed.

CMN can now be off

Previously, CMN was always applied even if the acoustic model did not require it. It is now switched on or off correctly according to the properties of the acoustic model.

Other fixes

Fixed reading of WAV files created by Windows Sound Recorder.

"Sample.jconf" and "Sample-julian.jconf" have been updated for the current version.

A manual for the grammar construction tool has been added.

List of Modified Options

configure options: --disable-lmfix, --disable-mfcc-table, --enable-visualize
Runtime options: -iwcd1 best N, -zmean
$Id: WhatsNew_3.4.1.html,v 1.1 2004/04/28 08:02:13 ri Exp $