What's New in Rev.3.4.1
Julius rev.3.4.1 is a minor revision of rev.3.4, but it brings many improvements, including recognition algorithm fixes, stable network I/O, support for binary HMM files, and a new visualization feature. The changes from rev.3.4 are:
- Improvements
- New features
- Bug fixes, mainly for module mode and adinnet network speech input
Details are given below.
Two improvements to the search algorithm of Julius have been implemented. They may improve accuracy under narrow beam-width conditions. Both are enabled by default, but you can still disable them. The details and their effects are summarized below.
In the previous version, both the left-to-right 2-gram probability from the 1st pass and the right-to-left 3-gram probability from the 2nd pass were added to expanded words during recognition, so the language model score was counted twice. This double LM scoring problem has been fixed in this version.
Fixing this problem resulted in a slight improvement in accuracy. The following table shows an experimental result on the IPA'98 testset (Japanese newspaper reading task of 46 speakers, 200 sentences, PTM(GD)). The fixed decoder ("3.4+LMfix") improves accuracy by a small amount.
Revision | Word %Correct | Word Accuracy |
3.4 | 92.1 | 90.6 |
3.4+LMfix | 92.3 | 91.2 |
This fix is enabled by default. To disable it and make the search algorithm behave as in previous versions, specify "--disable-lmfix" to configure.
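To put the table above in perspective, the gain can be expressed as a relative reduction in word error rate. The sketch below assumes that word error rate equals 100 minus Word Accuracy (the usual HTK-style definition); the helper name is hypothetical and not part of Julius.

```python
def relative_error_reduction(acc_before, acc_after):
    """Relative word error reduction, assuming error = 100 - Word Accuracy."""
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return (err_before - err_after) / err_before

# Word Accuracy figures taken from the table: 3.4 vs. 3.4+LMfix
print(f"{relative_error_reduction(90.6, 91.2):.1%}")  # prints 6.4%
```

In other words, a 0.6 point gain in Word Accuracy corresponds to removing roughly one in sixteen of the remaining errors.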
When computing the acoustic likelihood of a cross-word triphone at a word edge on the 1st pass, Julius previously used the maximum likelihood over all possible triphones that share the same bi-phone context. This version adds another method that uses the average of the N-best likelihoods.
The new method can be selected with the option "-iwcd1 best N", where N is the number of best triphone scores to be averaged.
From rev.3.4.1, this new method is the default, but the previous methods remain available for backward compatibility. As a result, rev.3.4.1 has three options for inter-word triphone computation:
- "-iwcd1 best N": average of the top N scores (default in Julius-3.4.1 and later, with N=3)
- "-iwcd1 max": maximum score (default in Julius-3.4 and earlier)
- "-iwcd1 avg": average score (default in Julian)
"-iwcd1 best 1" is equivalent to "-iwcd1 max", and "-iwcd1 best [maximum_value]" is equivalent to "-iwcd1 avg".
The experimental result of this new method on the same testset is shown in the table below. A small improvement was observed on this testset.
Revision | Word %Correct | Word Accuracy |
3.4 | 92.1 | 90.6 |
3.4+iwcdbest(N=3) | 92.2 | 91.2 |
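The three "-iwcd1" modes described above differ only in how the candidate triphone scores are combined. The following is a minimal sketch of that combination step, not Julius's actual code: the function name, the argument names, and the example log-likelihood values are all made up for illustration.

```python
def combine_iwcd_scores(scores, method="best", n=3):
    """Combine candidate triphone log-likelihoods into one word-edge score.

    Hypothetical helper: 'max', 'avg' and 'best' mirror the three
    "-iwcd1" modes described in the text.
    """
    if not scores:
        raise ValueError("need at least one candidate score")
    ranked = sorted(scores, reverse=True)  # highest log-likelihood first
    if method == "max":
        return ranked[0]
    if method == "avg":
        return sum(ranked) / len(ranked)
    if method == "best":
        # Clamp n to the valid range 1..len(ranked) before averaging.
        top = ranked[:max(1, min(n, len(ranked)))]
        return sum(top) / len(top)
    raise ValueError(f"unknown method: {method}")

# Made-up log-likelihoods of triphones sharing the same bi-phone context
candidates = [-42.0, -40.5, -47.0, -41.5, -55.0]
print(combine_iwcd_scores(candidates, "max"))      # -40.5
print(combine_iwcd_scores(candidates, "best", 3))  # about -41.33
print(combine_iwcd_scores(candidates, "avg"))      # -45.2
```

With N=1 the "best" mode reduces to "max", and with N equal to the number of candidates it reduces to "avg", which matches the equivalences stated above.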
To disable this method and make the search algorithm behave as in previous versions, specify "-iwcd1 max" as a runtime option.
Evaluation summary
With these two improvements, the recognition performance of rev.3.4.1 improved over rev.3.4 on our testset, as summarized in the next figure.
The improvement was most noticeable under the narrow beam condition (fast). Optimizing the language model weights and insertion penalty ("3.4.1-LMtuned" in the figure) resulted in further improvement.
The feature extraction routine for MFCC parameters now uses a trigonometric table to speed up computation. The table uses only about 8 to 12 kBytes of memory. The computation result is unchanged.
If you want to disable this, specify "--disable-mfcc-table" to configure.
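The idea behind a trigonometric table can be illustrated on the DCT step of MFCC extraction: the cosine terms depend only on the indices, so they can be computed once and reused for every frame. The sketch below is an assumption-laden illustration, not Julius's implementation. The source does not describe the table layout, and the function names and the sizes (12 cepstra, 24 filterbank channels) are assumed values.

```python
import math

def build_dct_table(num_cepstra, num_filters):
    """Precompute cos(pi * i * (j + 0.5) / num_filters) for all (i, j)."""
    return [
        [math.cos(math.pi * i * (j + 0.5) / num_filters) for j in range(num_filters)]
        for i in range(1, num_cepstra + 1)
    ]

def dct_with_table(log_fbank, table):
    """Cepstra from log filterbank energies using the precomputed table."""
    scale = math.sqrt(2.0 / len(log_fbank))
    return [scale * sum(c * e for c, e in zip(row, log_fbank)) for row in table]

def dct_direct(log_fbank, num_cepstra):
    """Same transform, calling cos() again for every term of every frame."""
    n = len(log_fbank)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(e * math.cos(math.pi * i * (j + 0.5) / n)
                    for j, e in enumerate(log_fbank))
        for i in range(1, num_cepstra + 1)
    ]

# Assumed sizes: 12 cepstral coefficients from 24 filterbank channels
table = build_dct_table(12, 24)
frame = [0.1 * k for k in range(24)]  # made-up log filterbank energies
with_table = dct_with_table(frame, table)
direct = dct_direct(frame, 12)
print(max(abs(a - b) for a, b in zip(with_table, direct)) < 1e-12)  # True
```

Both routines produce the same cepstra; the table version simply trades a small amount of memory for fewer trigonometric calls per frame.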
Julius/Julian now supports its own binary HMM format, which makes startup faster. A conversion tool called "mkbinhmm" is also included in the distribution. You can convert an ASCII hmmdefs file in HTK format to the binary format like this:
% mkbinhmm hmmdefs binary_HMM
To use the binary HMM with Julius/Julian, specify it in the same way as an ASCII file, with the "-h" option. The format is detected automatically by Julius/Julian.
% julius ... -h binary_HMM
This binary format is not compatible with the binary format used in HTK.
A new option, "-zmean", removes the DC offset from speech input. If you specify "-zmean", the computed offset is subtracted from the input.
For file input, the offset will be computed using the whole input. For microphone input, the first 3 seconds of silence from system startup are used to determine the offset.
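The two ways of estimating the offset described above can be sketched as follows. This is an illustration of the general technique only, not Julius's code: the function names and the default 16 kHz sampling rate are assumptions, and the sample values are made up.

```python
def remove_dc_offset(samples, offset=None):
    """Subtract a DC offset; by default use the mean of the whole input (file case)."""
    if not samples:
        return []
    if offset is None:
        offset = sum(samples) / len(samples)
    return [s - offset for s in samples]

def estimate_offset_from_head(samples, sample_rate=16000, seconds=3.0):
    """Estimate the offset from the first few seconds of input (microphone case)."""
    head = samples[: int(sample_rate * seconds)]
    if not head:
        raise ValueError("no samples available for offset estimation")
    return sum(head) / len(head)

# File input: the offset is the mean over the whole input.
print(remove_dc_offset([105, 95, 110, 90]))  # [5.0, -5.0, 10.0, -10.0]

# Microphone input: the offset comes from the leading silence only.
# A tiny sample rate (4 Hz) keeps the example short: 3 seconds = 12 samples.
stream = [20] * 12 + [520, -480]
offset = estimate_offset_from_head(stream, sample_rate=4, seconds=3.0)
print(remove_dc_offset(stream, offset)[-2:])  # [500.0, -500.0]
```

Estimating from the leading silence lets a live stream be corrected without waiting for the whole utterance, at the cost of assuming the offset stays constant afterwards.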
From this version, Julius/Julian can show the recognition process with X11/GTK. For each input, all word hypotheses generated in the 1st pass and the hypothesis expansion process on the 2nd pass are displayed dynamically on the X11 Window System. When the word graph option is enabled ("configure --enable-word-graph --enable-wpair"), the word graph is also displayed. You can also play back the recognized input with sox.
This feature is disabled by default. To enable it, specify "--enable-visualize" to configure. The GTK library is required for compilation.
% ./configure --enable-visualize
The figure below shows a screenshot.
The input length is now output to the client when using module mode ("-module"). The output format is as follows:
<INPUTPARAM FRAMES="frame_length" MSEC="length_in_msec">
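A client can pick the two values out of this message with a small parser. The sketch below is a hypothetical client-side helper, not part of Julius; it assumes both attributes are non-negative integers, and the numbers in the example are made up.

```python
import re

# Matches the INPUTPARAM message; tolerates an optional "/" before ">".
_INPUTPARAM = re.compile(r'<INPUTPARAM\s+FRAMES="(\d+)"\s+MSEC="(\d+)"\s*/?>')

def parse_inputparam(line):
    """Return {'frames': int, 'msec': int}, or None if the line does not match."""
    match = _INPUTPARAM.search(line)
    if match is None:
        return None
    return {"frames": int(match.group(1)), "msec": int(match.group(2))}

# Made-up example values
print(parse_inputparam('<INPUTPARAM FRAMES="249" MSEC="2490">'))
# {'frames': 249, 'msec': 2490}
```

Returning None for non-matching lines lets the same function be applied to every line received from the server without special-casing other message types.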
Networked speech input/output fixed
The following bugs related to networked speech input/output have been fixed.
- Recognition did not start immediately when speech began to arrive.
- With "-module", the PAUSE and TERMINATE commands from the client did not work.
- Multiple forks on connection.
- Memory leak in adin-cut.c.
- adintool was unable to send multiple files.
Successive Decoding updated
Many bugs related to the successive decoding option (--enable-sp-segment) have been fixed.
- Microphone input is now supported, fixing two bugs:
  - Immediate segmentation after resuming.
  - Mis-computation of acoustic likelihood for the first several frames.
- Fixed wrong word hypothesis passing between segments at sentence end.
- CMN is now applied on a per-detection basis, not a per-segment basis.
- The short pause model used to detect segmentation points can be specified with the "-spmodel" option.
- File counter now works correctly.
- More messages about segment length and retract point are output.
- Works with "-record".
Memory leak with alignment fixed
Memory leaks in the word/phone/state-level alignment options ("-walign", "-palign", "-salign") have been fixed.
CMN can now be off
Previously, CMN was always applied even if the acoustic model did not require it. It is now switched on or off correctly according to the properties of the acoustic model.
Other fixes
Fixed reading of wav files made by Windows Sound Recorder.
"Sample.jconf" and "Sample-julian.jconf" have been updated for the recent version.
A manual for the grammar construction tool has been added.
configure option:
Runtime options:
$Id: WhatsNew_3.4.1.html,v 1.1.1.1 2005/11/17 11:11:49 sumomo Exp $