HOWTO: Make festival TTS use better voices (MBROLA / CMU / HTS)

**Rasa1111** · June 26th, 2011

Not sure what to do there, sorry mate.

However, since moving to 11.04 I have discovered a program called "Gespeaker".
It's got a nice GUI, and you can choose voices, languages, etc.
Just type your text in and press play.

Check out "Gespeaker" in the Software Center.
Pretty decent!

**frytek** · August 3rd, 2011

On this forum there's also a script for converting text to speech with espeak (+ mbrola):

http://ubuntuforums.org/showthread.p...eech+synthesis

**andreasvc** · August 20th, 2011

I succeeded in getting festival to work with the HTS voices on Ubuntu 11.04. I simply installed the festival package from lucid [1] and followed the procedure for installing the HTS 2.1 voices. You need one additional dependency from lucid, libestools1.2. Simply download these two packages and install the .debs manually with dpkg -i.

[1] https://launchpad.net/ubuntu/+source...+build/1335123

**gefthebest** · September 30th, 2011

Originally Posted by redaxe

I am new to Festival and new to the Unix/Linux environment.

I have set up Festival 2.1 on Ubuntu 10.10 successfully with the instructions given in the INSTALL file.

I am also looking for some help with those versions of Ubuntu and Festival...

Thanks

**Calrama** · October 15th, 2011

Hi,

I got one of the HTS 2.2 voices (arctic_slt_hts) compiled, trained and working with Festival 2.1. Compilation took about 8 hours, training about 30 hours. From my perspective it was worth it. I did not use Ubuntu for that but Arch Linux (no flames, please); still, the resulting voice should be usable on Ubuntu as well: just drop the tarball's contents into /usr/share/festival/voices/us/ and you should be good to go.

You can get the tarball with the voice here:
http://dl.dropbox.com/u/1845335/rele...tic_hts.tar.gz
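Installing the downloaded voice as described above can be sketched as a small shell function. This is a minimal sketch under stated assumptions: `install_voice_tarball` is a hypothetical name, the tarball is assumed to be gzip-compressed with a single top-level voice directory (such as `cmu_us_slt_arctic_hts/`) that contains `festvox/*.scm`, and the default destination is the Ubuntu path given in the post.

```shell
#!/bin/sh
# Sketch: unpack a prebuilt Festival voice tarball into Festival's voice
# directory. install_voice_tarball is a hypothetical helper name.
# Assumption: the tarball holds one top-level voice directory
# (e.g. cmu_us_slt_arctic_hts/) containing festvox/<voice>.scm.
install_voice_tarball() {
    tarball=$1
    dest=${2:-/usr/share/festival/voices/us}
    [ -f "$tarball" ] || { echo "no such file: $tarball" >&2; return 1; }
    # Use sudo unless the destination already exists and is writable.
    if [ -w "$dest" ]; then run=; else run=sudo; fi
    $run mkdir -p "$dest" && $run tar -xzf "$tarball" -C "$dest" || return 1
    # List the voice definition files now present, as a quick sanity check.
    ls "$dest"/*/festvox/*.scm
}
```

For example `install_voice_tarball ~/Downloads/<tarball>.tar.gz` (placeholder name, since the real file name is cut off above). With the voice in place, something like `festival -b '(voice_cmu_us_slt_arctic_hts)' '(SayText "Hello world")'` should speak with it.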

If you want to hear the difference first, I had the old version (the latest prebuilt one from Nitech HTS) and the new version read the first two paragraphs of this article:
http://en.wikinews.org/wiki/Eyewitne...fatal_protests

You can get the two mp3s here:
Old: http://dl.dropbox.com/u/1845335/release/news_old.mp3
New: http://dl.dropbox.com/u/1845335/release/news_new.mp3

Personally I think the new version sounds much smoother (it was compiled and trained with the default options).

What this boils down to is that you will most likely no longer need to install Festival from source: you can use the normal packages and still have access to the newest Nitech voices (provided you are willing to compile and train them yourself, or use the one I posted).

**DocFreed0** · October 25th, 2011

Hi Calrama:

Nice job.

The voice works well on Ubuntu Oneiric 11.10

To make this the default voice I did:
$ sudo gedit /usr/share/festival/voices.scm

(defvar default-voice-priority-list
'(;nitech_us_slt_arctic_hts ; [Error: HTS_Model_load_pdf: Failed to load header of pdfs.]
;kal_diphone
;cmu_us_bdl_arctic_hts
;cmu_us_jmk_arctic_hts
cmu_us_slt_arctic_hts ; Custom compile
;cmu_us_awb_arctic_hts
; cstr_rpx_nina_multisyn ; restricted license (lexicon)
; ...

Thanks again
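If you would rather not edit the package-owned voices.scm (a package upgrade may overwrite it), a per-user alternative is to set Festival's `voice_default` variable in `~/.festivalrc`, which Festival loads at start-up. A minimal sketch, assuming the voice name `cmu_us_slt_arctic_hts` from the list above; `set_default_voice` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: make a voice the per-user default by appending
#   (set! voice_default 'voice_<name>)
# to ~/.festivalrc. set_default_voice is a hypothetical helper name.
set_default_voice() {
    voice=$1                        # e.g. cmu_us_slt_arctic_hts
    rc=${2:-$HOME/.festivalrc}
    line="(set! voice_default 'voice_$voice)"
    touch "$rc" || return 1
    # Append only once, so re-running the function does not duplicate it.
    grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}
```

Usage: `set_default_voice cmu_us_slt_arctic_hts`, after which `echo "Hello" | festival --tts` should use that voice. If the file already sets `voice_default` to another voice, the line appended last should take effect, since the file is evaluated top to bottom.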

**CHaoSlayeR** · November 25th, 2011

@Calrama: Thanks so much for investing the time and resources, and for sharing the voice with all of us!

To all of those who get the following message:

Code:

Error: HTS_Model_load_pdf: Failed to load header of pdfs.

The Debian guys already have a "fix" for this by including the old HTS engine and publishing it as a module named "hts21compat".

You can find all information in this bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=589614

So, if no matching bug has been filed for Ubuntu yet, you might want to go ahead and file one asking for that patch to be included. That way the existing pre-trained HTS 2.1 voices would keep working, while the newer ones also work with the new hts_engine.

**SwedishWings** · December 2nd, 2011

That is good news, thanks for the heads-up.

Has anyone managed to build festival from the Debian patch? If so, a short how-to would be much appreciated!

Thanks,
Mike

EDIT: I managed to build it myself; it goes something like this:

Code:

$ mkdir whatever
$ cd whatever
$ git clone git://anonscm.debian.org/tts/festival.git
$ tar cvf festival_2.1~release.orig.tar festival
$ gzip festival_2.1~release.orig.tar
$ cd festival
$ debuild
$ cd ..
$ sudo dpkg --install festival_2.1~release-2.2_i386.deb

That's it. At the end of the build you get some errors about signing, which can be ignored.

You have to install some dependencies of course, but it was easy.
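The `tar`/`gzip` step relies on Debian's naming rule for the upstream tarball, `<source>_<upstream-version>.orig.tar.gz`, placed next to the source directory. A sketch of that step as a function, assuming GNU tar (for `--exclude-vcs`, which leaves the clone's `.git` directory out of the tarball) and that the directory name matches the source package name; `make_orig_tarball` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: create the "<source>_<upstream-version>.orig.tar.gz" file that
# debuild expects beside the unpacked source tree.
# make_orig_tarball is a hypothetical helper; GNU tar is assumed.
make_orig_tarball() {
    src_dir=$1                      # e.g. festival (the git clone)
    version=$2                      # upstream version, e.g. 2.1~release
    name=$(basename "$src_dir")     # assumed to equal the source package name
    parent=$(dirname "$src_dir")
    tarball="$parent/${name}_${version}.orig.tar.gz"
    # --exclude-vcs skips .git and similar version-control directories.
    tar -C "$parent" --exclude-vcs -czf "$tarball" "$name" || return 1
    echo "$tarball"
}
```

From the directory containing the clone: `make_orig_tarball festival 2.1~release`, then `cd festival && debuild -us -uc`. The `-us -uc` options are passed on to dpkg-buildpackage and skip signing the source package and .changes file, which should avoid the signing errors mentioned above. For the dependencies, `sudo apt-get build-dep festival` should cover most of them, though the git version may need a few more.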

Read the bug report linked above carefully; it explains how to change the files

Code:

/usr/share/festival/voices/us/nitech_us_XXX_arctic_hts/festvox/nitech_us_XXX_arctic_hts.scm

so that they work with the backward-compatibility module that was added.

Cheers,
Mike

**mymiasma** · February 18th, 2012

I wasn't able to find some of the festvox files at the location given in the HOWTO, but I was able to find them here:

http://www.speech.cs.cmu.edu/festiva...estival/1.4.0/

Hope this helps someone out.

Otherwise, despite its age, a very helpful article.

**amanisdude** · March 9th, 2012

Wow. I spent over 40 hours in total compiling this (HTS-demo_CMU-ARCTIC-SLT) over and over, working out the kinks, only to realize that these are not the same as the Nitech voices**.

At any rate, one of the errors I kept getting was the same as mrplow's (quoted below). As it turns out, SoX dropped the deprecated '-w' switch in version 14.1.0, resulting in the help-text output and overall failure of 'Training.pl'. (See http://sox.git.sourceforge.net/git/g...8b7df334606820.)

In essence, there is one more dependency that the 'INSTALL' file does not mention: SoX 14.0.1 or earlier. (Go figure.) I suppose this could also be fixed by editing 'Config.pm' in the 'scripts' directory to use the newer '-2' switch instead of '-w', as mrplow suggests, but this is how I fixed it:
____________

First, download the source for SoX 14.0.1 (the latest version that still supports the '-w' switch):

Code:

cd ~/Downloads
wget -c http://sourceforge.net/projects/sox/files/sox/14.0.1/sox-14.0.1.tar.gz

Then, extract its contents, configure it, make it, and install it:

Code:

tar xvf sox-14.0.1.tar.gz
cd sox-14.0.1/
./configure
make
sudo make install

This should have installed the binaries to '/usr/local/bin', BUT SoX will fail to link to its library file 'libsox.so.0' in '/usr/local/lib'! To fix this, run:

Code:

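# ${VAR:+...} avoids a trailing ':' (which would add the current directory) when LD_LIBRARY_PATH is empty
LD_LIBRARY_PATH=/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export LD_LIBRARY_PATH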

(You'll have to do this in every new shell session in which you build the voice.)
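To avoid retyping this, a small idempotent helper can go in `~/.bashrc`; it also covers the PATH requirement mentioned in the next paragraph. Separately, on Ubuntu `/usr/local/lib` is normally already listed in `/etc/ld.so.conf.d/libc.conf`, so running `sudo ldconfig` once after `sudo make install` may make the environment variable unnecessary. The sketch below assumes a POSIX shell; `prepend_path` is a hypothetical name, not a standard command.

```shell
#!/bin/sh
# Sketch: prepend a directory to a colon-separated search-path variable
# (LD_LIBRARY_PATH, PATH, ...) and export it, without adding a duplicate
# entry or a trailing ':' when the variable starts out empty.
# prepend_path is a hypothetical helper name.
prepend_path() {
    _pp_var=$1
    _pp_dir=$2
    # Read the current value of the variable whose *name* is in $_pp_var.
    eval "_pp_cur=\${$_pp_var-}"
    case ":$_pp_cur:" in
        *":$_pp_dir:"*) ;;  # already present: leave the variable alone
        *) eval "$_pp_var=\"\$_pp_dir\${_pp_cur:+:\$_pp_cur}\"" ;;
    esac
    eval "export $_pp_var"
}

prepend_path LD_LIBRARY_PATH /usr/local/lib
prepend_path PATH /usr/local/bin
```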

Now you should have the right version of SoX installed. If it still isn't using the right version of SoX, uninstall any other versions of SoX on your system (sudo apt-get remove sox) and make sure '/usr/local/bin' is set in the PATH variable. Heck, I recommend you just uninstall any other versions of SoX before you begin compiling, just in case. (It's not worth it to build for 20+ hours to run into the same error again.) Besides, you can always re-install the official Ubuntu SoX package when you're done.
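Before starting a 20+ hour build, it may be worth checking which `sox` the shell will actually run. A minimal sketch, assuming the source build installed to `/usr/local/bin` as above; `check_sox` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report which sox binary comes first in PATH and warn if it is not
# the one in the expected directory (default /usr/local/bin, where the
# SoX 14.0.1 source build installs). check_sox is a hypothetical helper.
check_sox() {
    expected_dir=${1:-/usr/local/bin}
    sox_path=$(command -v sox) || { echo "sox is not in PATH" >&2; return 1; }
    echo "sox found at: $sox_path"
    if [ "$sox_path" != "$expected_dir/sox" ]; then
        echo "warning: expected $expected_dir/sox; check PATH or remove the other sox" >&2
        return 2
    fi
    # Print the version string so you can confirm it is 14.0.1.
    "$sox_path" --version
}
```

If you remove or install a sox while a terminal is open, bash may still remember the old location; `hash -r` clears that cache before you run `check_sox`.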

You should now be able to compile HTS-demo_CMU-ARCTIC-SLT, though, as I said before, the CMU Arctic voices are not the same as the Festvox Nitech voices (which I find to be superior, and which I believe are the ones mrplow was really looking for). As far as I know, there are no Nitech voices built for Festival 2.X**. If you want to use the Nitech voices, you should probably just grit your teeth and use an older version of Festival (1.96 or earlier)**.

Happy compiling!

**EDIT: Gaah! Shame on me for trying to write a forum post at 2 in the morning. If you want to use the Nitech voices with Festival 2.0.96 or later, see CHaoSlayeR's post above (literally, just scroll up). You'll need to build a compatibility patch into Festival; for how to do that, see SwedishWings' post right below it. I need some coffee.
____________

Originally Posted by digitaltoast

I had this same problem, until I found a post suggesting that it's because Festival 2.0.95 requires HTS 2.1.1 voices, which can be found here:
http://hts.sp.nitech.ac.jp/archives/2.1.1/

But it's not straightforward! The whole Festival system seems to be designed to be complicated and keep non-geeks out!

Want to try the 2.1.1 voices? You need to do this:

Code:

* Installation of HTS-demo_CMU-ARCTIC-SLT
==========================================

1. HTS-demo_CMU-ARCTIC-SLT requires Festival, SPTK-3.3, HTS-2.1.1, hts_engine API-1.03, and OpenFst-1.1.
   Please install them before running this demo.
   You can download them from the following websites:

   Festival: http://www.cstr.ed.ac.uk/projects/festival/
   SPTK: http://sp-tk.sourceforge.net/
   HTS: http://hts.sp.nitech.ac.jp/
   hts_engine API: http://hts-engine.sourceforge.net/
   OpenFst: http://www.openfst.org/

   In HTS-demo_CMU-ARCTIC-SLT, a simple F0 extraction script written in Tcl/Tk is included.
   This script calls get_f0 function implemented in the open-source speech toolkit Snack.
   Therefore, HTS-demo_CMU-ARCTIC-SLT also requires Tcl/Tk with Snack.
   ActiveState (http://www.activestate.com/) provides a Tcl/Tk distribution named ActiveTcl
   for many platforms.  You can download it from

   ActiveTcl: http://downloads.activestate.com/ActiveTcl/

   The above distribution includes Snack and it is easy to install and use.
   We recommend you to use this to run this demonstration
   (Of course you can use your own tcl/tk with Snack).
   Note that ActiveTcl 8.5 doesn't include Snack, please use ActiveTcl 8.4.


2. Setup HTS-demo_CMU-ARCTIC-SLT by running configure script:

   % cd HTS-demo_CMU-ARCTIC-SLT
   % ./configure --with-tcl-search-path=/usr/local/ActiveTcl/bin \
                 --with-fest-search-path=/usr/local/festival/examples \
                 --with-sptk-search-path=/usr/local/SPTK-3.3/bin \
                 --with-hts-search-path=/usr/local/HTS-2.1.1_for_HTK-3.4.1/bin \
                 --with-hts-engine-search-path=/usr/local/hts_engine_API-1.03/bin \
                 --with-openfst-search-path=/usr/local/openfst-1.1/bin

   Please adjust the above directories for your environment.
   Note that you should specify festival/examples rather than festival/bin.

   You can change various parameters such as speech analysis conditions and model training conditions
   through ./configure arguments.  For example

   % ./configure MGCORDER=24 GAMMA=0 FREQWARP=0.0              (24-th order cepstrum)
   % ./configure MGCORDER=24 GAMMA=0 FREQWARP=0.42             (24-th order Mel-cepstrum)

   % ./configure MGCORDER=12 GAMMA=1 FREQWARP=0.0  LNGAIN=0    (12-th order LSP,     linear gain)
   % ./configure MGCORDER=12 GAMMA=1 FREQWARP=0.0  LNGAIN=1    (12-th order LSP,     log gain)
   % ./configure MGCORDER=12 GAMMA=1 FREQWARP=0.42 LNGAIN=1    (12-th order Mel-LSP, log gain)
   % ./configure MGCORDER=12 GAMMA=3 FREQWARP=0.42 LNGAIN=1    (12-th order MGC-LSP, log gain)

   % ./configure NSTATE=7 NITER=10 WFLOOR=5   (# of HMM states=7, # of EM iterations=10, mix weight floor=5)

   Please refer to the help message for details:

   % ./configure --help


3. Start running demonstration as follows:

   % cd HTS-demo_CMU-ARCTIC-SLT
   % make

   After composing training data, HMMs are estimated and speech waveforms are synthesized.
   It takes about 12 to 18 hours :-)

12 to 18 HOURS??? And I don't even know what I'm going to end up with. What does "DEMO" mean? Does it just say something and stop? Also, do I want
http://hts.sp.nitech.ac.jp/archives/...-ADAPT.tar.bz2
or
http://hts.sp.nitech.ac.jp/archives/...RAIGHT.tar.bz2
?

It's not the 492 MB download for each file I mind; it's the idea of spending 12-18 hours building one only to find I wanted the other one!

The only manual I can find for Festival is here:
http://www.cstr.ed.ac.uk/projects/festival/manual/
Dated 1999, for version 1.4

I sometimes feel like I've missed the basics somewhere.
Were it not for threads like this I'd be completely lost!

Originally Posted by mrplow

Well, that was fun. I'm not sure how far I got, but I eventually ran into this error 70 hours into compiling:

Code:

====================================================================================
Start synthesizing waveforms (speaker independent) at Thu Nov 11 18:38:44 PST 2010
====================================================================================

Processing directory /home/mrplow/Desktop/HTS/HTS-demo_CMU-ARCTIC-ADAPT/HTS-demo_CMU-ARCTIC-ADAPT/gen/qst001/ver1/SI/0:
 Synthesizing a speech waveform from cmu_us_arctic_slt_alice01.mgc and cmu_us_arctic_slt_alice01.lf0.../usr/bin/sox: invalid option -- w
/usr/bin/sox FAIL sox: invalid option

/usr/bin/sox: SoX v14.3.1

Usage summary: [gopts] [[fopts] infile]... [fopts] outfile [effect [effopt]]...

SPECIAL FILENAMES (infile, outfile):
-                        Pipe/redirect input/output (stdin/stdout); may need -t
-d, --default-device     Use the default audio device (where available)
-n, --null               Use the `null' file handler; e.g. with synth effect
-p, --sox-pipe           Alias for `-t sox -'

SPECIAL FILENAMES (infile only):
"|program [options] ..." Pipe input from external program (where supported)
http://server/file       Use the given URL as input file (where supported)

GLOBAL OPTIONS (gopts) (can be specified at any point before the first effect):
--buffer BYTES           Set the size of all processing buffers (default 8192)
--clobber                Don't prompt to overwrite output file (default)
--combine concatenate    Concatenate all input files (default for sox, rec)
--combine sequence       Sequence all input files (default for play)
-D, --no-dither          Don't dither automatically
--effects-file FILENAME  File containing effects and options
-G, --guard              Use temporary files to guard against clipping
-h, --help               Display version number and usage information
--help-effect NAME       Show usage of effect NAME, or NAME=all for all
--help-format NAME       Show info on format NAME, or NAME=all for all
--i, --info              Behave as soxi(1)
--input-buffer BYTES     Override the input buffer size (default: as --buffer)
--no-clobber             Prompt to overwrite output file
-m, --combine mix        Mix multiple input files (instead of concatenating)
-M, --combine merge      Merge multiple input files (instead of concatenating)
--magic                  Use `magic' file-type detection
--multi-threaded         Enable parallel effects channels processing (where
                         available)
--norm                   Guard (see --guard) & normalise
--play-rate-arg ARG      Default `rate' argument for auto-resample with `play'
--plot gnuplot|octave    Generate script to plot response of filter effect
-q, --no-show-progress   Run in quiet mode; opposite of -S
--replay-gain track|album|off  Default: off (sox, rec), track (play)
-R                       Use default random numbers (same on each run of SoX)
-S, --show-progress      Display progress while processing audio data
--single-threaded        Disable parallel effects channels processing
--temp DIRECTORY         Specify the directory to use for temporary files
--version                Display version number of SoX and exit
-V[LEVEL]                Increment or set verbosity level (default 2); levels:
                           1: failure messages
                           2: warnings
                           3: details of processing
                           4-6: increasing levels of debug messages
FORMAT OPTIONS (fopts):
Input file format options need only be supplied for files that are headerless.
Output files will have the same format as the input file where possible and not
overriden by any of various means including providing output format options.

-v|--volume FACTOR       Input file volume adjustment factor (real number)
--ignore-length          Ignore input file length given in header; read to EOF
-t|--type FILETYPE       File type of audio
-s/-u/-f/-U/-A/-i/-a/-g  Encoding type=signed-integer/unsigned-integer/floating
                         point/mu-law/a-law/ima-adpcm/ms-adpcm/gsm-full-rate
-e|--encoding ENCODING   Set encoding (ENCODING in above list)
-b|--bits BITS           Encoded sample size in bits
-1/-2/-3/-4/-8           Encoded sample size in bytes
-N|--reverse-nibbles     Encoded nibble-order
-X|--reverse-bits        Encoded bit-order
--endian little|big|swap Encoded byte-order; swap means opposite to default
-L/-B/-x                 Short options for the above
-c|--channels CHANNELS   Number of channels of audio data; e.g. 2 = stereo
-r|--rate RATE           Sample rate of audio
-C|--compression FACTOR  Compression factor for output format
--add-comment TEXT       Append output file comment
--comment TEXT           Specify comment text for the output file
--comment-file FILENAME  File containing comment text for the output file
--no-glob                Don't `glob' wildcard match the following filename

AUDIO FILE FORMATS: 8svx aif aifc aiff aiffc al amb amr-nb amr-wb anb au avr awb caf cdda cdr cvs cvsd cvu dat dvms f32 f4 f64 f8 fap flac fssd gsm gsrt hcom htk ima ircam la lpc lpc10 lu mat mat4 mat5 maud nist ogg paf prc pvf raw s1 s16 s2 s24 s3 s32 s4 s8 sb sd2 sds sf sl smp snd sndfile sndr sndt sou sox sph sw txw u1 u16 u2 u24 u3 u32 u4 u8 ub ul uw vms voc vorbis vox w64 wav wavpcm wv wve xa xi
PLAYLIST FORMATS: m3u pls
AUDIO DEVICE DRIVERS: alsa

EFFECTS: allpass band bandpass bandreject bass bend biquad chorus channels compand contrast crop+ dcshift deemph delay dither divide+ earwax echo echos equalizer fade filter* fir firfit+ flanger gain highpass input# key* ladspa loudness lowpass mcompand mixer noiseprof noisered norm oops output# overdrive pad pan* phaser pitch polyphase* rabbit* rate remix repeat resample* reverb reverse riaa silence sinc spectrogram speed splice stat stats stretch swap synth tempo treble tremolo trim vad vol
  * Deprecated effect    + Experimental effect    # LibSoX-only effect
EFFECT OPTIONS (effopts): effect dependent; see --help-effect
Error in /usr/local/SPTK/bin/excite -p 80 /home/mrplow/Desktop/HTS/HTS-demo_CMU-ARCTIC-ADAPT/HTS-demo_CMU-ARCTIC-ADAPT/gen/qst001/ver1/SI/0/cmu_us_arctic_slt_alice01.pit | /usr/local/SPTK/bin/mglsadf -m 24 -p 80 -a 0.42 -c 0 /home/mrplow/Desktop/HTS/HTS-demo_CMU-ARCTIC-ADAPT/HTS-demo_CMU-ARCTIC-ADAPT/gen/qst001/ver1/SI/0/cmu_us_arctic_slt_alice01.mgc | /usr/local/SPTK/bin/x2x +fs | /usr/bin/sox -c 1 -s -w -t raw -r 16000 - -c 1 -s -w -t wav -r 16000 /home/mrplow/Desktop/HTS/HTS-demo_CMU-ARCTIC-ADAPT/HTS-demo_CMU-ARCTIC-ADAPT/gen/qst001/ver1/SI/0/cmu_us_arctic_slt_alice01.wav

It can probably be fixed by changing line 248 of scripts/Config.pm:
$SOXOPTION = 'w';
but I've spent enough time on this, so I'll wait until someone tries out the new Festival and reports back.
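The Config.pm change suggested above can be sketched with sed. This assumes, as the failing command line suggests, that the scripts build the option as `-$SOXOPTION`, so switching `'w'` to `'2'` yields `-s -2` (16-bit signed samples), which SoX 14.3.1 accepts according to its help text. `patch_soxoption` is a hypothetical helper name; it matches the line by content rather than by number, since line 248 may differ between HTS demo versions.

```shell
#!/bin/sh
# Sketch: replace the SoX '-w' option (dropped in SoX 14.1.0) with '-2'
# (2-byte samples) in the HTS demo's generated scripts/Config.pm.
# patch_soxoption is a hypothetical helper name; the only assumption about
# Config.pm is the line quoted above:  $SOXOPTION = 'w';
patch_soxoption() {
    config=${1:-scripts/Config.pm}
    if [ ! -f "$config" ]; then
        echo "not found: $config" >&2
        return 1
    fi
    # Keep a backup (Config.pm.bak) and rewrite only the SOXOPTION value.
    sed -i.bak "s/\(SOXOPTION[[:space:]]*=[[:space:]]*\)'w'/\1'2'/" "$config"
    # Show the resulting line so the change can be checked.
    grep -n 'SOXOPTION' "$config"
}
```

Run it from the demo directory (after `./configure`, which generates Config.pm) as `patch_soxoption`; if `grep` still shows `'w'`, the line in your copy has a different form and needs editing by hand.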
