I had this same problem - until I found a post that suggested that it's because festival 2.095 requires HTS 2.1.1 voices, which can be found here:
http://hts.sp.nitech.ac.jp/archives/2.1.1/
But it's not straightforward! The whole Festival system seems to be designed to be complicated and keep non-geeks out!
Want to try the 2.1.1 voices? You need to do this:
Code:
* Installation of HTS-demo_CMU-ARCTIC-SLT
==========================================
1. HTS-demo_CMU-ARCTIC-SLT requires Festival, SPTK-3.3, HTS-2.1.1, hts_engine API-1.03, and OpenFst-1.1.
Please install them before running this demo.
You can download them from the following websites:
Festival: http://www.cstr.ed.ac.uk/projects/festival/
SPTK: http://sp-tk.sourceforge.net/
HTS: http://hts.sp.nitech.ac.jp/
hts_engine API: http://hts-engine.sourceforge.net/
OpenFst: http://www.openfst.org/
In HTS-demo_CMU-ARCTIC-SLT, a simple F0 extraction script written in Tcl/Tk is included.
This script calls get_f0 function implemented in the open-source speech toolkit Snack.
Therefore, HTS-demo_CMU-ARCTIC-SLT also requires Tcl/Tk with Snack.
ActiveState (http://www.activestate.com/) provides a Tcl/Tk distribution named ActiveTcl
for many platforms. You can download it from
ActiveTcl: http://downloads.activestate.com/ActiveTcl/
The above distribution includes Snack and it is easy to install and use.
We recommend you to use this to run this demonstration
(Of course you can use your own tcl/tk with Snack).
Note that ActiveTcl 8.5 doesn't include Snack, please use ActiveTcl 8.4.
2. Setup HTS-demo_CMU-ARCTIC-SLT by running configure script:
% cd HTS-demo_CMU-ARCTIC-SLT
% ./configure --with-tcl-search-path=/usr/local/ActiveTcl/bin \
--with-fest-search-path=/usr/local/festival/examples \
--with-sptk-search-path=/usr/local/SPTK-3.3/bin \
--with-hts-search-path=/usr/local/HTS-2.1.1_for_HTK-3.4.1/bin \
--with-hts-engine-search-path=/usr/local/hts_engine_API-1.03/bin \
--with-openfst-search-path=/usr/local/openfst-1.1/bin
Please adjust the above directories for your environment.
Note that you should specify festival/examples rather than festival/bin.
You can change various parameters such as speech analysis conditions and model training conditions
through ./configure arguments. For example
% ./configure MGCORDER=24 GAMMA=0 FREQWARP=0.0 (24-th order cepstrum)
% ./configure MGCORDER=24 GAMMA=0 FREQWARP=0.42 (24-th order Mel-cepstrum)
% ./configure MGCORDER=12 GAMMA=1 FREQWARP=0.0 LNGAIN=0 (12-th order LSP, linear gain)
% ./configure MGCORDER=12 GAMMA=1 FREQWARP=0.0 LNGAIN=1 (12-th order LSP, log gain)
% ./configure MGCORDER=12 GAMMA=1 FREQWARP=0.42 LNGAIN=1 (12-th order Mel-LSP, log gain)
% ./configure MGCORDER=12 GAMMA=3 FREQWARP=0.42 LNGAIN=1 (12-th order MGC-LSP, log gain)
% ./configure NSTATE=7 NITER=10 WFLOOR=5 (# of HMM states=7, # of EM iterations=10, mix weight floor=5)
Please refer to the help message for details:
% ./configure --help
3. Start running demonstration as follows:
% cd HTS-demo_CMU-ARCTIC-SLT
% make
After composing training data, HMMs are estimated and speech waveforms are synthesized.
It takes about 12 to 18 hours :-)
12 to 18 HOURS??? And I don't even know what I'm going to end up with. What does "DEMO" mean? Does it just say something and stop? Also, do I want
http://hts.sp.nitech.ac.jp/archives/...-ADAPT.tar.bz2
or
http://hts.sp.nitech.ac.jp/archives/...RAIGHT.tar.bz2
?
It's not the 492Mb of each file I mind, it's the idea of spending 12-18 hours building one to find I wanted the other one!
The only manual I can find for Festival is here:
http://www.cstr.ed.ac.uk/projects/festival/manual/
Dated 1999, for version 1.4
I sometimes feel like I've missed the basics somewhere.
Were it not for threads like
this I'd be completely lost!