Thread: HOWTO: Make festival TTS use better voices (MBROLA / CMU / HTS)

  1. #1
    Join Date
    Oct 2007
    Beans
    130

    HOWTO: Make festival TTS use better voices (MBROLA / CMU / HTS)

    Introduction

    Festival is a Text-To-Speech synthesis system developed at the University of Edinburgh. It can be used with several different voices, which are the models and data it uses to convert typed text into audible speech.

    This HOWTO is meant to be a centralized collection of information about how to install the currently available voices for the Festival TTS. While sections of this HOWTO may apply to installing voices for other languages, it is primarily concerned with the English language voices.

    The layout of this HOWTO is as follows:

    • Installing the standard Festvox diphone voices
    • Installing the enhanced MBROLA voices
    • Installing the enhanced CMU Arctic voices
    • Installing the enhanced Nitech HTS voices
    • Testing voices and choosing a default voice
    • Installing Festival 1.96 from source


    Before we get started, make sure all of the preliminary packages are installed on your system with the following command:

    Code:
    sudo apt-get install festival festlex-cmu festlex-poslex festlex-oald libestools1.2 unzip
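
A quick way to confirm that the command-line tools used in the rest of this HOWTO are on your PATH is sketched below. The helper name missing_tools is made up for illustration, and the list of commands checked (including wget and tar, which the apt-get line above does not install) is an assumption about what the later steps call.

```shell
# Hypothetical helper: print every command name given as an argument
# that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Prints nothing if everything is available.
missing_tools festival unzip wget tar
```

Anything it prints still needs to be installed before the later sections will work.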

    Installing the standard Festvox diphone voices

    These are the voices that are supplied by the Festvox project, which is run by the Carnegie Mellon University speech group. See the voice demo page (kal, ked, don and rab are the voices of interest). Of all of the voices we are concerned with, these take up the smallest size on disk, and are the only voices currently in the Ubuntu package tree. All of the other voices have to be installed manually. However, these are also the poorest quality voices and currently, on my computer, they cause festival to segfault (though I have used them successfully with other set-ups in the past). YMMV.

    Some of the voices have 8k and 16k versions. This is the sample rate (8 kHz or 16 kHz) of the synthesized audio. The 16k versions have higher quality output and sound better. The four English voices are:

    festvox-don
    festvox-rablpc[8|16]k
    festvox-kallpc[8|16]k
    festvox-kdlpc[8|16]k
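
The [8|16] notation above just means each of those three packages comes in two variants. A small sketch that expands the names for one sample rate, assuming the naming pattern shown in the list (festvox_packages is a hypothetical helper name):

```shell
# Hypothetical helper: print the English diphone package names
# for a given sample rate (8 or 16), following the list above.
festvox_packages() {
    rate="$1"
    printf 'festvox-don'
    for v in rab kal kd; do
        printf ' festvox-%slpc%sk' "$v" "$rate"
    done
    printf '\n'
}

festvox_packages 16
```

The printed names could then be handed to apt-get install as usual.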

    Searching for other voices to install

    To find all the available festvox voices, execute the following command:

    Code:
    apt-cache search '^festvox-'

    Installing the voices

    Once you've decided on the voices to install, just install them as you would any other apt package. For example, this command will install all of the English voices using the higher quality 16k voices:

    Code:
    sudo apt-get install festvox-don festvox-rablpc16k festvox-kallpc16k festvox-kdlpc16k

    Installing the enhanced MBROLA voices

    These voices are provided by the MBROLA project, run by the TCTS Lab of the Faculté Polytechnique de Mons in Belgium. They offer several voices, in a variety of languages, which sound much better than the Festvox diphone voices. The database of voices can be viewed at the project's download page. See the voice demo page (the us1, us2 and us3 are the voices of interest). To use the MBROLA voices we need three parts: (1.) the mbrola binary program that parses a tokenstream the festival program feeds it and returns audio data back to festival, (2.) the MBROLA voices, and (3.) the Festvox wrappers to let the festival program use the voices. This may sound scary, but it's really very easy to do.

    Downloading the voices, binary and wrappers

    We will download everything we need for the English voices into a temporary directory (total download size is approximately twenty megs):

    Code:
    mkdir mbrola_tmp
    cd mbrola_tmp/
    wget http://tcts.fpms.ac.be/synthesis/mbrola/bin/pclinux/mbrola3.0.1h_i386.deb
    wget -c http://tcts.fpms.ac.be/synthesis/mbrola/dba/us1/us1-980512.zip
    wget -c http://tcts.fpms.ac.be/synthesis/mbrola/dba/us2/us2-980812.zip
    wget -c http://tcts.fpms.ac.be/synthesis/mbrola/dba/us3/us3-990208.zip
    wget -c http://www.festvox.org/packed/festival/latest/festvox_us1.tar.gz
    wget -c http://www.festvox.org/packed/festival/latest/festvox_us2.tar.gz
    wget -c http://www.festvox.org/packed/festival/latest/festvox_us3.tar.gz

    Installing the binary, unpacking the voices and wrappers

    Since the MBROLA project has kindly provided us with a binary deb, we can skip the step of building the binary from source and just use dpkg to install it:

    Code:
    sudo dpkg -i mbrola3.0.1h_i386.deb

    If you're paranoid about installing a deb from a third-party like the MBROLA project, you can grab the binary in a zip file from their download page and install it manually. There doesn't appear to be any source package released to the public, so if you're extremely paranoid, you may just want to avoid using the MBROLA voices altogether. I've personally never had a problem using the binary either from the zip or the deb.

    Next we'll unpack the voices and the wrappers:

    Code:
    unzip us1-980512.zip
    unzip us2-980812.zip
    unzip us3-990208.zip
    tar xvf festvox_us1.tar.gz
    tar xvf festvox_us2.tar.gz
    tar xvf festvox_us3.tar.gz

    Installing the voices and wrappers

    First we'll make the directories where the voices will be installed:

    Code:
    sudo mkdir -p /usr/share/festival/voices/english/us1_mbrola/
    sudo mkdir -p /usr/share/festival/voices/english/us2_mbrola/
    sudo mkdir -p /usr/share/festival/voices/english/us3_mbrola/

    Then we can install the voices and wrappers there:

    Code:
    sudo mv us1 /usr/share/festival/voices/english/us1_mbrola/
    sudo mv us2 /usr/share/festival/voices/english/us2_mbrola/
    sudo mv us3 /usr/share/festival/voices/english/us3_mbrola/
    sudo mv festival/lib/voices/english/us1_mbrola/* /usr/share/festival/voices/english/us1_mbrola/
    sudo mv festival/lib/voices/english/us2_mbrola/* /usr/share/festival/voices/english/us2_mbrola/
    sudo mv festival/lib/voices/english/us3_mbrola/* /usr/share/festival/voices/english/us3_mbrola/

    Tidy up

    Lastly, we can remove the temporary directory:

    Code:
    cd ../
    rm -rf mbrola_tmp/

    If all went well, you now have a set of working MBROLA voices installed. See the section below on how to test that the voices are working properly.


    Installing the enhanced CMU Arctic voices

    These voices were developed by the Language Technologies Institute at Carnegie Mellon University. They sound much better than both the diphone and the MBROLA voices. See the information page and voice demo page (the *_arctic_cg are the voices of interest). The drawback is that each voice takes over a hundred megs on disk. With six English voices to choose from, that is a lot of bandwidth to download, and depending on how much disk space you have to work with, six-hundred-plus megs might be a bit much for voice data. However, the HTS voices discussed in the next section may in fact provide equal or better quality synthesis at less than 2% of the size.
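
Before committing to a download of that size, it may be worth checking free disk space. The following is only a sketch: has_free_mb is a hypothetical helper, and the 600 MB figure is the rough total quoted above, not a measured value.

```shell
# Hypothetical helper: succeed if the filesystem holding directory $1
# has at least $2 megabytes available.
has_free_mb() {
    dir="$1"
    need_mb="$2"
    # df -P gives one line per filesystem; -k reports 1024-byte blocks.
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge $((need_mb * 1024)) ]
}

if has_free_mb . 600; then
    echo "enough room here for all six CMU Arctic archives"
else
    echo "less than 600 MB free here"
fi
```

Keep in mind that the unpacked voices also need room under /usr/share/festival, so the same check could be run against that directory.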

    Downloading the voices

    We will download everything we need for the English voices into a temporary directory (total download size is approximately six-hundred megs — you might want to go brew some coffee or something, lots of it...we might be here a while):

    Code:
    mkdir cmu_tmp
    cd cmu_tmp/
    wget -c http://www.speech.cs.cmu.edu/cmu_arctic/packed/cmu_us_awb_arctic-0.90-release.tar.bz2
    wget -c http://www.speech.cs.cmu.edu/cmu_arctic/packed/cmu_us_bdl_arctic-0.95-release.tar.bz2
    wget -c http://www.speech.cs.cmu.edu/cmu_arctic/packed/cmu_us_clb_arctic-0.95-release.tar.bz2
    wget -c http://www.speech.cs.cmu.edu/cmu_arctic/packed/cmu_us_jmk_arctic-0.95-release.tar.bz2
    wget -c http://www.speech.cs.cmu.edu/cmu_arctic/packed/cmu_us_rms_arctic-0.95-release.tar.bz2
    wget -c http://www.speech.cs.cmu.edu/cmu_arctic/packed/cmu_us_slt_arctic-0.95-release.tar.bz2
    Note: You can add the option "--limit-rate" to wget to set a maximum transfer speed (e.g., "wget -c --limit-rate=60K ..." to limit the download rate to 60KB/s).

    Unpacking the voices

    Due to the size of the archives, this will probably take a few minutes as well (Maybe get a slice of cake this time? Also, note that tar is run non-verbosely so we don't have seven-million lines flooding the terminal):

    Code:
    for t in cmu_*.tar.bz2 ; do tar xf "$t" ; done
    rm *.bz2

    Installing the voices

    Now we can install the voices:

    Code:
    sudo mkdir -p /usr/share/festival/voices/english/
    sudo mv * /usr/share/festival/voices/english/

    The voices are now installed, but Festival requires them to have slightly different directory names, so we'll rename the directories as needed.

    Code:
    for d in /usr/share/festival/voices/english/cmu_us_*_arctic ; do
    sudo mv "$d" "${d}_clunits"
    done
    Note: We could just use symlinks, but then the list of installed voices (see section below on testing voices) would show entries for the actual directories and the symlinks, though only the symlinks would work properly.


    Tidy up

    Lastly, we can remove the temporary directory:

    Code:
    cd ../
    rm -rf cmu_tmp/

    If all went well, you now have a set of working CMU Arctic voices installed. See the section below on how to test that the voices are working properly.


    Installing the enhanced Nitech HTS voices

    Note: Unfortunately, these voices require at least Festival 1.95, but only 1.4 is available on Ubuntu prior to Hardy (8.04). This means that if you want to use these voices on any earlier release (Gutsy, Feisty...), you need to compile Festival from source. This is relatively easy to do, and a section at the bottom of this HOWTO will guide you through the process. Don't waste your time trying to install these voices on Festival versions older than 1.95; they just won't work.

    These voices are produced by the HTS working group hosted at the Nagoya Institute of Technology. They have produced excellent quality voices which take up very little disk space. In terms of quality and size, these are probably the best (non-commercial) English voices available for Festival. See the voice demo page (the *_arctic_hts are the voices of interest). Highly recommended. The voices are available on their download page.

    Downloading the voices

    We will download everything we need for the English voices into a temporary directory (total download size is approximately ten megs):

    Code:
    mkdir hts_tmp
    cd hts_tmp/
    wget -c http://hts.sp.nitech.ac.jp/archives/2.1/festvox_nitech_us_awb_arctic_hts-2.1.tar.bz2
    wget -c http://hts.sp.nitech.ac.jp/archives/2.1/festvox_nitech_us_bdl_arctic_hts-2.1.tar.bz2
    wget -c http://hts.sp.nitech.ac.jp/archives/2.1/festvox_nitech_us_clb_arctic_hts-2.1.tar.bz2
    wget -c http://hts.sp.nitech.ac.jp/archives/2.1/festvox_nitech_us_rms_arctic_hts-2.1.tar.bz2
    wget -c http://hts.sp.nitech.ac.jp/archives/2.1/festvox_nitech_us_slt_arctic_hts-2.1.tar.bz2
    wget -c http://hts.sp.nitech.ac.jp/archives/2.1/festvox_nitech_us_jmk_arctic_hts-2.1.tar.bz2
    wget -c http://hts.sp.nitech.ac.jp/archives/1.1.1/cmu_us_kal_com_hts.tar.gz
    wget -c http://hts.sp.nitech.ac.jp/archives/1.1.1/cstr_us_ked_timit_hts.tar.gz
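
The six 2.1 downloads above differ only in the speaker name, so the list can also be generated with a loop. A sketch, assuming the URL pattern stays exactly as in the wget lines (hts_urls is a hypothetical helper name; it only prints, it does not download anything):

```shell
# Hypothetical helper: print the download URLs of the six
# Nitech HTS 2.1 voices, one per line.
hts_urls() {
    base="http://hts.sp.nitech.ac.jp/archives/2.1"
    for spk in awb bdl clb rms slt jmk; do
        echo "$base/festvox_nitech_us_${spk}_arctic_hts-2.1.tar.bz2"
    done
}

hts_urls
# To actually fetch them, the list can be piped to wget:
#   hts_urls | wget -c -i -
```

The two 1.1.1 archives follow a different naming scheme and are left out of the loop.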

    Unpacking the voices

    Next we'll unpack the voices:

    Code:
    for t in *.tar.bz2 *.tar.gz ; do tar xvf "$t" ; done

    Installing the voices

    Now we can install the voices:

    Code:
    sudo mkdir -p /usr/share/festival/voices/us
    sudo mv lib/voices/us/* /usr/share/festival/voices/us/
    sudo mv lib/hts.scm /usr/share/festival/hts.scm

    Tidy up

    Lastly, we can remove the temporary directory:

    Code:
    cd ../
    rm -rf hts_tmp/

    If all went well, you now have a set of working Nitech HTS voices installed. See the section below on how to test that the voices are working properly.


    Testing voices and choosing a default voice

    Now that you have some voices to play with, you may want to try out different voices to suit your taste, or just to make sure things are working properly. To get a list of all the voices that are installed, you can simply look at the directories under /usr/share/festival/voices:

    Code:
    for d in /usr/share/festival/voices/*/ ; do ls "$d" ; done

    Or you can run the festival program, and, at the prompt, enter (voice.list). The result will look something like this:

    Code:
    festival> (voice.list)
    (us1_mbrola
     us3_mbrola
     us2_mbrola
     nitech_us_slt_arctic_hts
     nitech_us_jmk_arctic_hts
     nitech_us_clb_arctic_hts
     nitech_us_rms_arctic_hts
     nitech_us_bdl_arctic_hts
     nitech_us_awb_arctic_hts)
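
If a voice you installed is missing from that list, the directory name may not match the voice definition file inside it. The sketch below assumes the usual layout of voices/LANGUAGE/NAME/festvox/NAME.scm, which is the reason the CMU directories had to be renamed earlier. check_voices is a hypothetical helper, and the demo runs on a scratch directory rather than the real voice tree.

```shell
# Hypothetical helper: for every voice directory under $1, report whether
# it contains festvox/<directory name>.scm (assumed Festival layout).
check_voices() {
    root="$1"
    for dir in "$root"/*/*/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        if [ -f "$dir/festvox/$name.scm" ]; then
            echo "ok $name"
        else
            echo "missing $name"
        fi
    done
}

# Demo on a scratch tree: one well-formed voice, one not yet renamed.
scratch=$(mktemp -d)
mkdir -p "$scratch/english/us1_mbrola/festvox" "$scratch/english/cmu_us_awb_arctic"
: > "$scratch/english/us1_mbrola/festvox/us1_mbrola.scm"
check_voices "$scratch"
rm -rf "$scratch"
```

On a real system the call would be check_voices /usr/share/festival/voices.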

    Playing with the voices

    To select a voice, add the prefix "voice_" to the voice name, and surround it by parentheses:

    Code:
    festival> (voice_us2_mbrola)

    Then you can test it using (SayText "The text") to speak a single line of text, or (tts "somefile.txt" nil) to process an entire file. You can also hear a short introduction about Festival with (intro).

    Code:
    festival> (SayText "Hello from Ubuntu")
    festival> (tts "story.txt" nil)
    festival> (intro)
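
The same commands can be sent to festival from a script instead of typing them at the prompt. A minimal sketch: say_command is a hypothetical helper that only builds the Scheme text, and it assumes the text to speak contains no double quotes.

```shell
# Hypothetical helper: print the Festival commands that select voice $1
# and speak the text in $2.
say_command() {
    voice="$1"
    text="$2"
    printf '(voice_%s)\n(SayText "%s")\n' "$voice" "$text"
}

say_command us2_mbrola "Hello from Ubuntu"
# To hear it, pipe the commands into festival:
#   say_command us2_mbrola "Hello from Ubuntu" | festival --pipe
```

This makes it easy to loop over several installed voices and compare them with the same sentence.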

    Selecting a default voice

    You can edit the file /etc/festival.scm to select a default voice. Open the file in your favorite editor (you will need super-user privileges, so run it with sudo or gksu) and add the line:

    Code:
    (set! voice_default 'voice_nitech_us_rms_arctic_hts)

    Simply replace "voice_nitech_us_rms_arctic_hts" with whatever your favorite voice is.

    Making it work with PulseAudio, ESD or ALSA

    By default, festival tries to synthesize speech to /dev/dsp using OSS. To make it work with ESD, PulseAudio or ALSA you have two options. The first is to use a wrapper to call festival, and the second is to configure festival itself to use one of them for output. The advantage of the first method is that you can use the esd, pulseaudio or alsa wrapper at any time, without having to edit the config file. The advantage of the second method is that you don't have to remember to call the wrapper: running festival directly just works.

    The available wrappers I know of are: the aoss wrapper program from the alsa-oss package, the padsp wrapper from pulseaudio-utils, and the esddsp wrapper from esound-clients — but caveat emptor — there is a bug (#214465) in the esddsp script right now (see this post for a quick-fix).
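
Which wrapper is installed differs from box to box, so a script can pick whichever one is present. A sketch, with first_available as a hypothetical helper name and the preference order (padsp, then aoss, then esddsp) chosen arbitrarily:

```shell
# Hypothetical helper: print the first command from the argument list
# that exists on the PATH; fail if none of them does.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}

first_available padsp aoss esddsp || echo "no OSS wrapper found"
# Usage idea:
#   wrapper=$(first_available padsp aoss esddsp) && "$wrapper" festival
```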

    To use the internal method, see the instructions for editing /etc/festival.scm at the community docs on TextToSpeech.

    An example /etc/festival.scm

    This is a drop-in /etc/festival.scm file. It covers all of the options mentioned above for selecting a default voice and using PulseAudio, ESD or ALSA for audio output. The ";;" are comment specifiers, so to use any of the options simply uncomment them by removing the ";;" in front of them, and add ";;" in front of whatever you want to deactivate. It doesn't matter whether you have two different options uncommented at the same time (e.g., two different voice_default options); Festival will simply use whichever one is defined last in the file. However, it may be a little bit slower to start up in such a case, so it's better to just uncomment the one you want to use.

    Code:
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;;;;
    ;;;; setup audio output
    ;;;;
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    
    ;;;; defaults to oss output
    
    ;;;; use pulseaudio to output sound
    ;;(Parameter.set 'Audio_Command "paplay $FILE") 
    ;;(Parameter.set 'Audio_Method 'Audio_Command)
    ;;(Parameter.set 'Audio_Required_Format 'snd)
    
    ;;;; use esd to output sound
    ;;(Parameter.set 'Audio_Command "esdplay $FILE") 
    ;;(Parameter.set 'Audio_Method 'Audio_Command)
    ;;(Parameter.set 'Audio_Required_Format 'snd)
    
    ;;;; use alsa to output sound
    ;;(Parameter.set 'Audio_Command "aplay -D plug:dmix -q -c 1 -t raw -f s16 -r $SR $FILE") 
    ;;(Parameter.set 'Audio_Method 'Audio_Command)
    ;;(Parameter.set 'Audio_Required_Format 'snd)
    
    
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;;;;
    ;;;; setup voices
    ;;;;
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    
    ;;;; Festvox voices
    ;;(set! voice_default 'voice_rab_diphone)
    ;;(set! voice_default 'voice_don_diphone)
    ;;(set! voice_default 'voice_kal_diphone)
    ;;(set! voice_default 'voice_ked_diphone)
    
    ;;;; MBROLA voices
    ;;(set! voice_default 'voice_us1_mbrola)
    ;;(set! voice_default 'voice_us2_mbrola)
    ;;(set! voice_default 'voice_us3_mbrola)
    
    ;;;; CMU Arctic voices
    ;;(set! voice_default 'voice_cmu_us_rms_arctic_clunits)
    ;;(set! voice_default 'voice_cmu_us_bdl_arctic_clunits)
    ;;(set! voice_default 'voice_cmu_us_slt_arctic_clunits)
    ;;(set! voice_default 'voice_cmu_us_clb_arctic_clunits)
    ;;(set! voice_default 'voice_cmu_us_awb_arctic_clunits)
    ;;(set! voice_default 'voice_cmu_us_jmk_arctic_clunits)
    
    ;;;; Nitech HTS voices
    (set! voice_default 'voice_nitech_us_rms_arctic_hts)
    ;;(set! voice_default 'voice_nitech_us_bdl_arctic_hts)
    ;;(set! voice_default 'voice_nitech_us_slt_arctic_hts)
    ;;(set! voice_default 'voice_nitech_us_clb_arctic_hts)
    ;;(set! voice_default 'voice_nitech_us_awb_arctic_hts)
    ;;(set! voice_default 'voice_nitech_us_jmk_arctic_hts)
    ;;(set! voice_default 'voice_cmu_us_kal_com_hts)
    ;;(set! voice_default 'voice_cstr_us_ked_timit_hts)
    
    
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    ;;;;
    ;;;; Advanced voice configuration
    ;;;;
    ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
    
    ;;;; Slow the HTS speech down.
    ;;(set! hts_duration_stretch 0.1)
    
    ;;;; Slow the standard voices down
    ;;(Parameter.set 'Duration_Stretch 2.5)
    
    ;;;; Set volume.
    ;;(set! default_after_synth_hooks
    ;;    (list (lambda (utt) (utt.wave.rescale utt 2.0 t))))
    Note: Advanced voice config based on this page
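
Since the last uncommented definition wins, it can be handy to see which default voice a config file really ends up with. A sketch, where active_default_voice is a hypothetical helper and the sample file is made up for the demo:

```shell
# Hypothetical helper: print the last uncommented voice_default line in
# config file $1, i.e. the one that takes effect.
active_default_voice() {
    grep '^[[:space:]]*(set! voice_default ' "$1" | tail -n 1
}

# Demo on a throwaway file with one commented and two active entries.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
;;(set! voice_default 'voice_us1_mbrola)
(set! voice_default 'voice_us2_mbrola)
(set! voice_default 'voice_nitech_us_rms_arctic_hts)
EOF
active_default_voice "$cfg"
rm -f "$cfg"
```

On a real system the file to inspect would be /etc/festival.scm.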


    Installing Festival 1.96 from source

    Note: If you have any errors that stop the build process, please report them here so I can update the HOWTO. Even if it's something trivial like a broken path, and you fix it yourself, others may not know how to fix it.

    The HTS voices above require at least Festival 1.95. Here we will go through the steps necessary to build and install Festival 1.96. This method was tested on the Dapper 6.06.1 VMplayer appliance using VirtualBox, so it should work on all Ubuntu releases. We will build a backport using official Ubuntu sources, so you don't need to remove any existing festival installation; apt will upgrade it for you.

    First we need some development tools, libraries and headers. Use the following command to install the build requirements:

    Code:
    sudo apt-get install autotools-dev debhelper gawk libesd0-dev libncurses5-dev quilt g++ texinfo dpkg-dev devscripts fakeroot

    Say yes to whatever dependencies are required.

    Downloading the source packages

    We will download everything we need for building Festival 1.96 into a temporary directory (total download size is approximately five megs):

    Code:
    mkdir fest_tmp
    cd fest_tmp/
    wget -c http://archive.ubuntu.com/ubuntu/pool/universe/s/speech-tools/speech-tools_1.2.96~beta-2.dsc
    wget -c http://archive.ubuntu.com/ubuntu/pool/universe/s/speech-tools/speech-tools_1.2.96~beta.orig.tar.gz
    wget -c http://archive.ubuntu.com/ubuntu/pool/universe/s/speech-tools/speech-tools_1.2.96~beta-2.diff.gz
    wget -c http://archive.ubuntu.com/ubuntu/pool/universe/f/festival/festival_1.96~beta-7ubuntu1.dsc
    wget -c http://archive.ubuntu.com/ubuntu/pool/universe/f/festival/festival_1.96~beta.orig.tar.gz
    wget -c http://archive.ubuntu.com/ubuntu/pool/universe/f/festival/festival_1.96~beta-7ubuntu1.diff.gz

    Unpacking the sources

    Next we'll unpack the sources and fix some versioning stuff so we can backport:

    Code:
    sed -i -e "s/(>= 5)/(>= 4.1.75)/" festival_1.96~beta-7ubuntu1.dsc
    sed -i -e "s/(>= 5)/(>= 4)/" speech-tools_1.2.96~beta-2.dsc
    dpkg-source -x festival_1.96~beta-7ubuntu1.dsc
    dpkg-source -x speech-tools_1.2.96~beta-2.dsc
    sed -i -e "s/(>= 5)/(>= 4)/" -e "s/\${binary:Version}/\${Source-Version}/g" speech-tools-1.2.96~beta/debian/control
    sed -i -e "s/(>= 5)/(>= 4.1.75)/" -e "s/ (>= 3.105)//" -e "s/ (>= 3.0-10)//" -e "s/ (>= 2.86.ds1)//" -e "s/\${binary:Version}/\${Source-Version}/g" festival-1.96~beta/debian/control
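
To see what those sed expressions do without touching the real files, the same substitution can be run on a sample line. This is only an illustration: the Build-Depends line below is invented, and relax_debhelper is a hypothetical name for the first expression used above.

```shell
# Hypothetical helper: lower a "(>= 5)" version requirement to "(>= 4)",
# reading from stdin and writing to stdout (no -i, so nothing is modified).
relax_debhelper() {
    sed -e 's/(>= 5)/(>= 4)/'
}

echo 'Build-Depends: debhelper (>= 5), quilt' | relax_debhelper
# prints: Build-Depends: debhelper (>= 4), quilt
```

In sed's basic regular expressions the parentheses are literal characters, which is why they need no escaping here.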

    Compiling and installing the sources

    Now we can compile the sources. Firstly we need to compile the Speech Tools. I'm using debuild because it makes it easier.

    Code:
    cd speech-tools-1.2.96~beta/
    debuild binary
    cd ../

    This should have created three deb files, which we need to install before we can build and install Festival:

    Code:
    sudo dpkg -i libestools1.2_1.2.96~beta-2_i386.deb libestools1.2-dev_1.2.96~beta-2_i386.deb speech-tools_1.2.96~beta-2_i386.deb

    Now we can build Festival, but first we should add a patch for HTS voices (patch derived from Nitech hts.scm file included in their voices). This step is optional, but recommended, as it can improve the synthesis in some circumstances. Save the following file to festival-1.96~beta/debian/patches/lib_hts.scm.diff:

    Code:
    --- a/lib/hts.scm
    +++ b/lib/hts.scm
    @@ -108,7 +108,7 @@
       (format ofd "+%s" (if (string-equal "0" (item.feat s "n.name"))
     			"x" (item.feat s "n.name")))
     ;  nn.name
    -  (format ofd "+%s" (if (string-equal "0" (item.feat s "n.n.name"))
    +  (format ofd "=%s" (if (string-equal "0" (item.feat s "n.n.name"))
     			"x" (item.feat s "n.n.name")))
     
     ;  position in syllable (segment)
    @@ -299,7 +299,7 @@
     	      (item.feat s "R:SylStructure.parent.parent.R:Word.content_words_out")))
     
     ;  distance from content word in phrase
    -  (format ofd ";%s" 
    +  (format ofd "#%s" 
     	  (if (string-equal "pau" (item.feat s "name"))
     	      "x"
     	      (item.feat s "R:SylStructure.parent.parent.R:Word.lisp_distance_to_p_content")))
    @@ -377,7 +377,7 @@
     	      (item.feat s "R:SylStructure.parent.parent.R:Phrase.parent.n.lisp_num_syls_in_phrase")))
     
     ;  length of next phrase (word)
    -  (format ofd "=:%s" 
    +  (format ofd "=%s" 
     	  (if (string-equal "pau" (item.feat s "name"))
     	      (item.feat s "n.R:SylStructure.parent.parent.R:Phrase.parent.lisp_num_words_in_phrase")
     	      (item.feat s "R:SylStructure.parent.parent.R:Phrase.parent.n.lisp_num_words_in_phrase")))
    Note: This file needs to preserve formatting and tab characters! If you get an error about patching failing in the patch command below, try downloading the file from here (right-click save-as) and re-running the patch command. Alternately, you could grab the hts.scm file from a Nitech HTS voice (see section above) and replace the festival-1.96~beta/lib/hts.scm file with it, and skip the patch step.

    Now for the actual build:

    Code:
    cd festival-1.96~beta/
    patch -p1 -i debian/patches/lib_hts.scm.diff
    debuild binary
    cd ../

    And now the install:

    Code:
    sudo dpkg -i festival_1.96~beta-7ubuntu1_i386.deb festival-dev_1.96~beta-7ubuntu1_i386.deb
    sudo apt-get install festlex-cmu festlex-poslex

    Tidy up

    Lastly, we can remove the temporary directory:

    Code:
    cd ../
    rm -rf fest_tmp/

    If all went well, you now have Festival 1.96 installed.


    That's it! Have fun!
    Last edited by MonkeeSage; February 16th, 2009 at 09:40 AM. Reason: pulseaudio support
    [People] are usually satisfied with bad argument only when their convictions rest on other grounds. (John Oman, Grace and Personality [New York: Macmillan, 1925], p. 38).

  2. #2
    Join Date
    Jul 2007
    Beans
    1
    Distro
    Hardy Heron (Ubuntu Development)

    Re: HOWTO: Make festival TTS use better voices (MBROLA / CMU / HTS)

    Thanks for the post. Definitely made it easy installing the other voices.

    Any reason to do your configuration in /etc/festival.scm vs ~/.festivalrc ?
    I currently have alsa and my default configured in ~/.festivalrc and was wondering if I'm missing any options that could be set in the other location?

  3. #3
    Join Date
    Oct 2007
    Beans
    130

    Re: HOWTO: Make festival TTS use better voices (MBROLA / CMU / HTS)

    Originally posted by datarez:
    Thanks for the post. Definitely made it easy installing the other voices.

    Any reason to do your configuration in /etc/festival.scm vs ~/.festivalrc ?
    I currently have alsa and my default configured in ~/.festivalrc and was wondering if I'm missing any options that could be set in the other location?
    No problem.

    The possible advantage to having your configuration in /etc/festival.scm versus having it in ~/.festivalrc is that /etc/festival.scm applies to all users, so if you have more than one user account on your computer and festival is used by multiple people, you can prevent duplicating a user config for each user and just put everything in /etc/festival.scm. Or you could put the sound configuration stuff in /etc/festival.scm and let each user select a default voice through ~/.festivalrc.

    In terms of being able to set more options and such, I don't think there is any practical difference between using the global or local config files -- they are both loaded in the exact same way at the end of /usr/share/festival/init.scm (lines 138 and 146 respectively).
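
For the per-user case, a default voice can be appended to a config file from a script. A minimal sketch, assuming the same voice_default line used in the HOWTO works in ~/.festivalrc. set_default_voice is a hypothetical helper, and the demo writes to a temporary file rather than a real config.

```shell
# Hypothetical helper: append a voice_default line for voice $2 to file $1
# unless that exact line is already there.
set_default_voice() {
    rc="$1"
    voice="$2"
    line="(set! voice_default 'voice_${voice})"
    grep -qxF "$line" "$rc" 2>/dev/null || printf '%s\n' "$line" >> "$rc"
}

# Demo on a throwaway file; the second call adds nothing.
rc=$(mktemp)
set_default_voice "$rc" nitech_us_rms_arctic_hts
set_default_voice "$rc" nitech_us_rms_arctic_hts
cat "$rc"
rm -f "$rc"
```

A real call would look like set_default_voice "$HOME/.festivalrc" nitech_us_rms_arctic_hts. Because the last definition wins, appending works even if the file already names a different voice.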

  4. #4
    Join Date
    Jan 2006
    Beans
    24

    Re: HOWTO: Make festival TTS use better voices (MBROLA / CMU / HTS)

    Regarding the HTS voices, after installing (a single) one of the voices I receive the following error starting festival:
    SIOD ERROR: module hts_engine required, but not compiled in this installation

  5. #5
    Join Date
    Sep 2007
    Location
    Sweden
    Beans
    14
    Distro
    Ubuntu 8.10 Intrepid Ibex

    Re: HOWTO: Make festival TTS use better voices (MBROLA / CMU / HTS)

    Kick-*** how to tut. for adding better voices to festival. Thank you so much MonkeeSage!

    The only problem I ran into was trying to use the Nitech HTS voices.

    SIOD ERROR: module hts_engine required, but not compiled in this installation

  6. #6
    Join Date
    Oct 2007
    Beans
    130

    Re: HOWTO: Make festival TTS use better voices (MBROLA / CMU / HTS)

    @yosemite610,
    @Debuggern

    Sounds like the festival version on your box is not recent enough to support the HTS voices.

    What do you get when you run this command:

    Code:
    festival --version
    I'm pretty sure that they work with 1.95, but I haven't verified that. It's been a couple years since I've been running 1.96 (on different boxes). But I thought that 1.95 was recent enough.
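
Once you have the version number, comparing it against the 1.95 requirement can be scripted. A sketch only: version_ge is a hypothetical helper that compares plain dotted numbers such as 1.4.3 or 1.96, and pulling that number out of whatever festival --version prints is left to the reader.

```shell
# Hypothetical helper: succeed if dotted version $1 is greater than or
# equal to dotted version $2, comparing field by field as numbers.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, ".")
        m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            p = (i <= n) ? x[i] + 0 : 0
            q = (i <= m) ? y[i] + 0 : 0
            if (p > q) exit 0
            if (p < q) exit 1
        }
        exit 0
    }'
}

if version_ge 1.96 1.95; then
    echo "1.96 is new enough for the HTS voices"
fi
```

Note that the comparison is per field, so 1.4 counts as older than 1.95 because 4 is less than 95.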

  7. #7
    Join Date
    Oct 2007
    Beans
    130

    Re: HOWTO: Make festival TTS use better voices (MBROLA / CMU / HTS)

    PS. In case anyone is interested, I'm writing a very loose front-end to festival/speech-dispatcher that can load and speak text files. This is VERY, very beta, and it comes with no support or promise that it will work (at all! for any reason!). But, in case you are interested, here you go:

    gspeak.py & gspeak.glade

    ---->%----
    Code:
    #!/usr/bin/python
    # -*- ts=4:sw=4:noexpandtab -*-
    
    import os
    import sys
    
    try:
    	import pygtk
    	pygtk.require("2.0")
    except:
    	pass
    try:
    	import gtk
    	import gtk.glade
    except:
    	print 'This program requires pygtk\n' \
    				'http://www.pygtk.org/'
    	sys.exit(1)
    
    class SpeechKit:
    
    	def __init__(self):
    		self.use_spd = True
    		self.active_file = None
    		self.pwd = os.path.dirname(sys.argv[0])
    
    		self.glade  = gtk.glade.XML('%s/gspeak.glade' % self.pwd, 'main_window')
    		self.window = self.glade.get_widget('main_window')
    		self.view   = self.glade.get_widget('text')
    
    		self.glade.signal_autoconnect({
    			'on_main_quit'   : self.on_main_quit,
    			'on_open'        : self.open_file,
    			'on_speak'       : self.start_speech,
    			'on_stop'        : self.stop_speech,
    			'on_choose_fest' : self.on_choose_fest,
    			'on_choose_spd'  : self.on_choose_spd,
    			'on_about'       : self.show_about
    		})
    
    	def run(self):
    		gtk.main()
    
    	def open_file(self, widget=None):
    		self.glade = gtk.glade.XML('%s/gspeak.glade' % self.pwd, 'filechooser_dialog')
    		dlg   = self.glade.get_widget('filechooser_dialog')
    		dlg.connect('file_activated', self.choose_file)
    		dlg.connect('response', self.close_chooser)
    
    	def close_chooser(self, widget, args=None):
    		widget.destroy()
    
    	def choose_file(self, widget, args=None):
    		name = widget.get_filename()
    		fh   = open(name)
    		data = fh.read()
    		fh.close()
    		self.view.get_buffer().set_text(data)
    		self.window.set_title(os.path.basename(name))
    
    	def start_speech(self, widget=None):
    		buffer = self.view.get_buffer()
    		text = buffer.get_text(*buffer.get_bounds())
    		if self.use_spd:
    			pipe = os.popen('spd-say -e 1 > /dev/null', 'w')
    			pipe.write(text)
    			pipe.close()
    		else:
    			fh = open('/tmp/foobarbaz', 'w')
    			fh.write(text)
    			fh.close()
    			os.system(r'echo "(tts \"/tmp/foobarbaz\" nil)" | aoss festival &')
    
    	def stop_speech(self, widget=None):
    		if self.use_spd:
    			os.system('spd-say -S')
    			os.system('spd-say -C')
    		else:
    			pipe = os.popen('pidof festival', 'r')
    			pid = pipe.read().strip()
    			while not pid == "":
    				os.system('killall -9 festival')
    				os.system('killall -9 /usr/lib/festival/audsp')
    				pid = pipe.read().strip()
    			pipe.close()
    
    	def on_choose_fest(self, widget=None):
    		self.use_spd = False
    
    	def on_choose_spd(self, widget=None):
    		self.use_spd = True
    
    	def show_about(self, widget=None):
    		self.glade = gtk.glade.XML('%s/gspeak.glade' % self.pwd, 'about_dialog')
    		dlg = self.glade.get_widget('about_dialog')
    		dlg.connect('response', self.about_quit)
    
    	def about_quit(self, widget, args=None):
    		widget.destroy()
    
    	def on_main_quit(self, widget, args=None):
    		self.stop_speech()
    		gtk.main_quit()
    
    if __name__ == '__main__':
    	sk = SpeechKit()
    	if len(sys.argv) > 1:
    		if sys.argv[1] == '-s':
    			pass # use default spd-say synthesis
    		elif sys.argv[1] == '-f':
    			sk.use_spd = False
    		for file in sys.argv[2:]:
    			sk.active_file = file
    			break
    	sk.run()
    ---->%----

    Code:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!DOCTYPE glade-interface SYSTEM "glade-2.0.dtd">
    <!--*- mode: xml -*-->
    <glade-interface>
      <widget class="GtkWindow" id="main_window">
        <property name="visible">True</property>
        <property name="title" translatable="yes">Gspeak</property>
        <property name="window_position">GTK_WIN_POS_CENTER</property>
        <property name="default_width">640</property>
        <property name="default_height">480</property>
        <property name="icon_name">stock_headphones</property>
        <signal name="delete_event" handler="on_main_quit"/>
        <child>
          <widget class="GtkVBox" id="vbox1">
            <property name="visible">True</property>
            <child>
              <widget class="GtkMenuBar" id="menubar">
                <property name="visible">True</property>
                <child>
                  <widget class="GtkMenuItem" id="file_menu">
                    <property name="visible">True</property>
                    <property name="label" translatable="yes">_File</property>
                    <property name="use_underline">True</property>
                    <child>
                      <widget class="GtkMenu" id="file_menu_menu">
                        <child>
                          <widget class="GtkImageMenuItem" id="open">
                            <property name="visible">True</property>
                            <property name="label">gtk-open</property>
                            <property name="use_underline">True</property>
                            <property name="use_stock">True</property>
                            <signal name="activate" handler="on_open"/>
                          </widget>
                        </child>
                        <child>
                          <widget class="GtkSeparatorMenuItem" id="separator">
                            <property name="visible">True</property>
                          </widget>
                        </child>
                        <child>
                          <widget class="GtkImageMenuItem" id="quit">
                            <property name="visible">True</property>
                            <property name="label">gtk-quit</property>
                            <property name="use_underline">True</property>
                            <property name="use_stock">True</property>
                            <signal name="activate" handler="on_main_quit"/>
                          </widget>
                        </child>
                      </widget>
                    </child>
                  </widget>
                </child>
                <child>
                  <widget class="GtkMenuItem" id="_Tools">
                    <property name="visible">True</property>
                    <property name="label" translatable="yes">_Tools</property>
                    <property name="use_underline">True</property>
                    <child>
                      <widget class="GtkMenu" id="menu1">
                        <property name="visible">True</property>
                        <child>
                          <widget class="GtkMenuItem" id="tool_fest">
                            <property name="visible">True</property>
                            <property name="label" translatable="yes">Festival</property>
                            <property name="use_underline">True</property>
                            <signal name="activate" handler="on_choose_fest"/>
                          </widget>
                        </child>
                        <child>
                          <widget class="GtkMenuItem" id="tool_spd">
                            <property name="visible">True</property>
                            <property name="label" translatable="yes">Speechd</property>
                            <property name="use_underline">True</property>
                            <signal name="activate" handler="on_choose_spd"/>
                          </widget>
                        </child>
                      </widget>
                    </child>
                  </widget>
                </child>
                <child>
                  <widget class="GtkMenuItem" id="help_menu">
                    <property name="visible">True</property>
                    <property name="label" translatable="yes">_Help</property>
                    <property name="use_underline">True</property>
                    <child>
                      <widget class="GtkMenu" id="help_menu_menu">
                        <child>
                          <widget class="GtkImageMenuItem" id="about">
                            <property name="visible">True</property>
                            <property name="label" translatable="yes">_About</property>
                            <property name="use_underline">True</property>
                            <signal name="activate" handler="on_about"/>
                            <child internal-child="image">
                              <widget class="GtkImage" id="image4">
                                <property name="visible">True</property>
                                <property name="stock">gtk-about</property>
                                <property name="icon_size">1</property>
                              </widget>
                            </child>
                          </widget>
                        </child>
                      </widget>
                    </child>
                  </widget>
                </child>
              </widget>
              <packing>
                <property name="expand">False</property>
                <property name="fill">False</property>
                <property name="padding">2</property>
              </packing>
            </child>
            <child>
              <widget class="GtkScrolledWindow" id="scrolled">
                <property name="visible">True</property>
                <property name="can_focus">True</property>
                <property name="hscrollbar_policy">GTK_POLICY_AUTOMATIC</property>
                <property name="vscrollbar_policy">GTK_POLICY_AUTOMATIC</property>
                <property name="shadow_type">GTK_SHADOW_IN</property>
                <child>
                  <widget class="GtkTextView" id="text">
                    <property name="visible">True</property>
                    <property name="can_focus">True</property>
                    <property name="wrap_mode">GTK_WRAP_WORD</property>
                    <property name="left_margin">4</property>
                    <property name="right_margin">4</property>
                  </widget>
                </child>
              </widget>
              <packing>
                <property name="padding">2</property>
                <property name="position">1</property>
              </packing>
            </child>
            <child>
              <widget class="GtkHBox" id="hbox1">
                <property name="visible">True</property>
                <child>
                  <widget class="GtkButton" id="stop_button">
                    <property name="visible">True</property>
                    <property name="can_focus">True</property>
                    <property name="response_id">0</property>
                    <signal name="clicked" handler="on_stop"/>
                    <child>
                      <widget class="GtkAlignment" id="alignment1">
                        <property name="visible">True</property>
                        <property name="xscale">0</property>
                        <property name="yscale">0</property>
                        <child>
                          <widget class="GtkHBox" id="hbox2">
                            <property name="visible">True</property>
                            <property name="spacing">2</property>
                            <child>
                              <widget class="GtkImage" id="image1">
                                <property name="visible">True</property>
                                <property name="stock">gtk-no</property>
                              </widget>
                              <packing>
                                <property name="expand">False</property>
                                <property name="fill">False</property>
                              </packing>
                            </child>
                            <child>
                              <widget class="GtkLabel" id="label1">
                                <property name="visible">True</property>
                                <property name="label" translatable="yes">Stop</property>
                                <property name="use_underline">True</property>
                              </widget>
                              <packing>
                                <property name="expand">False</property>
                                <property name="fill">False</property>
                                <property name="position">1</property>
                              </packing>
                            </child>
                          </widget>
                        </child>
                      </widget>
                    </child>
                  </widget>
                  <packing>
                    <property name="padding">2</property>
                  </packing>
                </child>
                <child>
                  <widget class="GtkButton" id="speak_button">
                    <property name="visible">True</property>
                    <property name="can_focus">True</property>
                    <property name="response_id">0</property>
                    <signal name="clicked" handler="on_speak"/>
                    <child>
                      <widget class="GtkAlignment" id="alignment2">
                        <property name="visible">True</property>
                        <property name="xscale">0</property>
                        <property name="yscale">0</property>
                        <child>
                          <widget class="GtkHBox" id="hbox3">
                            <property name="visible">True</property>
                            <property name="spacing">2</property>
                            <child>
                              <widget class="GtkImage" id="image2">
                                <property name="visible">True</property>
                                <property name="stock">gtk-apply</property>
                              </widget>
                              <packing>
                                <property name="expand">False</property>
                                <property name="fill">False</property>
                              </packing>
                            </child>
                            <child>
                              <widget class="GtkLabel" id="label2">
                                <property name="visible">True</property>
                                <property name="label" translatable="yes">Speak</property>
                                <property name="use_underline">True</property>
                              </widget>
                              <packing>
                                <property name="expand">False</property>
                                <property name="fill">False</property>
                                <property name="position">1</property>
                              </packing>
                            </child>
                          </widget>
                        </child>
                      </widget>
                    </child>
                  </widget>
                  <packing>
                    <property name="padding">2</property>
                    <property name="position">1</property>
                  </packing>
                </child>
              </widget>
              <packing>
                <property name="expand">False</property>
                <property name="fill">False</property>
                <property name="padding">2</property>
                <property name="position">2</property>
              </packing>
            </child>
          </widget>
        </child>
      </widget>
      <widget class="GtkAboutDialog" id="about_dialog">
        <property name="visible">True</property>
        <property name="destroy_with_parent">True</property>
        <property name="type_hint">GDK_WINDOW_TYPE_HINT_NORMAL</property>
        <property name="copyright" translatable="yes">MonkeeSage, 2007</property>
        <property name="comments" translatable="yes">Simple GTK+ frontend to spd-say (part of Speech-Dispatcher).</property>
        <property name="license" translatable="yes">GSpeak 1.0
    Copyright (C) 2007 MonkeeSage
    
    This program is free software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License
    as published by the Free Software Foundation; either version 2
    of the License, or (at your option) any later version.
    
    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.
    
    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.</property>
        <property name="authors">Jordan Callicoat &lt;MonkeeSage@gmail.com&gt;</property>
        <property name="translator_credits" translatable="yes" comments="TRANSLATORS: Replace this string with your names, one name per line.">translator-credits</property>
        <child internal-child="vbox">
          <widget class="GtkVBox" id="dialog-vbox1">
            <child internal-child="action_area">
              <widget class="GtkHButtonBox" id="dialog-action_area1">
              </widget>
              <packing>
                <property name="expand">False</property>
                <property name="pack_type">GTK_PACK_END</property>
              </packing>
            </child>
          </widget>
        </child>
      </widget>
      <widget class="GtkFileChooserDialog" id="filechooser_dialog">
        <property name="visible">True</property>
        <property name="title" translatable="yes">Open File....</property>
        <property name="modal">True</property>
        <property name="window_position">GTK_WIN_POS_CENTER</property>
        <property name="destroy_with_parent">True</property>
        <property name="type_hint">GDK_WINDOW_TYPE_HINT_DIALOG</property>
        <property name="show_hidden">True</property>
        <child internal-child="vbox">
          <widget class="GtkVBox" id="dialog-vbox1">
            <property name="visible">True</property>
            <property name="spacing">24</property>
            <child internal-child="action_area">
              <widget class="GtkHButtonBox" id="dialog-action_area1">
                <property name="visible">True</property>
                <property name="layout_style">GTK_BUTTONBOX_END</property>
                <child>
                  <widget class="GtkButton" id="button1">
                    <property name="visible">True</property>
                    <property name="can_focus">True</property>
                    <property name="can_default">True</property>
                    <property name="label">gtk-cancel</property>
                    <property name="use_stock">True</property>
                    <property name="response_id">-6</property>
                  </widget>
                </child>
                <child>
                  <widget class="GtkButton" id="button2">
                    <property name="visible">True</property>
                    <property name="can_focus">True</property>
                    <property name="can_default">True</property>
                    <property name="has_default">True</property>
                    <property name="label">gtk-open</property>
                    <property name="use_stock">True</property>
                    <property name="response_id">-5</property>
                  </widget>
                  <packing>
                    <property name="position">1</property>
                  </packing>
                </child>
              </widget>
              <packing>
                <property name="expand">False</property>
                <property name="pack_type">GTK_PACK_END</property>
              </packing>
            </child>
          </widget>
        </child>
      </widget>
    </glade-interface>
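
If you edit the glade file by hand, it is easy to end up with a signal handler name that the Python side does not provide. The following is a rough sketch (Python 3, standard library only) that lists the handler names in a glade file and reports which ones are missing from an object. The helper names and the tiny SAMPLE document are hypothetical, and it assumes handlers are looked up as methods by name, which may not be how your script connects them.

```python
import xml.etree.ElementTree as ET

# A cut-down stand-in for gspeak.glade, for illustration only.
SAMPLE = """<glade-interface>
  <widget class="GtkWindow" id="main_window">
    <signal name="delete_event" handler="on_main_quit"/>
    <child>
      <widget class="GtkButton" id="speak_button">
        <signal name="clicked" handler="on_speak"/>
      </widget>
    </child>
  </widget>
</glade-interface>"""


def signal_handlers(glade_xml):
    """Return the sorted, de-duplicated handler names used in the XML."""
    root = ET.fromstring(glade_xml)
    return sorted({sig.get("handler") for sig in root.iter("signal")})


def missing_handlers(glade_xml, obj):
    """Return handler names that `obj` has no callable attribute for."""
    return [
        name
        for name in signal_handlers(glade_xml)
        if not callable(getattr(obj, name, None))
    ]
```

For the real file you would pass its contents as the first argument (for example, the text read from gspeak.glade) and your application object as the second.
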
    [People] are usually satisfied with bad argument only when their convictions rest on other grounds. (John Oman, Grace and Personality [New York: Macmillan, 1925], p. 38).

  8. #8
    Join Date
    Jun 2007
    Location
    Bochum, Germany
    Beans
    23
    Distro
    Kubuntu 9.10 Karmic Koala

    Re: HOWTO: Make festival TTS use better voices (MBROLA / CMU / HTS)

    Thanks for this excellent post, it made my life a hell of a lot easier.
    I still have some questions. I would like to install a female voice package, but so far I have only found male voices... don't they think that there are men who give their workstations female names and would like to give them a female voice?
    My second question is: is there a German voice package as well? I've seen Finnish, French and English, but no German. I would really appreciate it if you could point me in the right direction.
    Thanks a lot again.

  9. #9
    Join Date
    Sep 2007
    Location
    Sweden
    Beans
    14
    Distro
    Ubuntu 8.10 Intrepid Ibex

    Re: HOWTO: Make festival TTS use better voices (MBROLA / CMU / HTS)

    Yup, MonkeeSage. I had an older version, "1.4.2", which I installed with apt-get on Kubuntu Feisty. I have the latest tar.gz packages from Festival's website, http://festvox.org/, and I need to compile them. Do you know of a good guide to compiling them? I know the INSTALL documents describe everything, but I have problems compiling the "speech tools":

    gcc -O3 -Wall -o ch_lab ch_lab_main.o -L../lib -lestools -L../lib -lestbase -L../lib -leststring -lcurses -ldl -lncurses -lm -lstdc++ -lgcc
    /usr/bin/ld: cannot find -lcurses
    collect2: ld returned 1 exit status
    make[1]: *** [ch_lab] Error 1
    Thanks again for the how to!

    Cheers!

  10. #10
    Join Date
    Oct 2007
    Beans
    130

    Re: HOWTO: Make festival TTS use better voices (MBROLA / CMU / HTS)

    Quote Originally Posted by Cyberponcho View Post
    Thanks for this excellent post, it made my life a hell of a lot easier.
    I still have some questions. I would like to install a female voice package, but so far I have only found male voices... don't they think that there are men who give their workstations female names and would like to give them a female voice?
    My second question is: is there a German voice package as well? I've seen Finnish, French and English, but no German. I would really appreciate it if you could point me in the right direction.
    Thanks a lot again.
    Well, it depends on which voices you're using. There are female voices among the CMU and HTS voices (see their information page links in the HOWTO for which ones; I believe the *_slt_* voice is female, and there is another one or maybe two IIRC).

    As for German, I believe the MBROLA page has a German voice, see the link from the HOWTO. Installing it should be just like installing the English MBROLA voices. Let me know if you need more help.


    Quote Originally Posted by Debuggern View Post
    Yup, MonkeeSage. I had an older version, "1.4.2", which I installed with apt-get on Kubuntu Feisty. I have the latest tar.gz packages from Festival's website, http://festvox.org/, and I need to compile them. Do you know of a good guide to compiling them? I know the INSTALL documents describe everything, but I have problems compiling the "speech tools":

    [. . .]

    Thanks again for the how to!

    Cheers!
    I totally forgot to check what versions shipped with stable Ubuntu!

    There are no backports for festival 1.96. I'll add a section to the HOWTO for compiling from source.
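
In the meantime, the "cannot find -lcurses" line in the build output means the linker could not find the curses library it was asked to link against. Here is a small sketch (Python 3) that pulls the missing library names out of a saved build log; the function name and the idea of scanning the log this way are mine, not part of the speech tools build. On Ubuntu the curses development files are usually provided by the libncurses5-dev package, but treat that package name as an assumption and check it for your release.

```python
import re

# Matches GNU ld messages such as "/usr/bin/ld: cannot find -lcurses".
MISSING_LIB = re.compile(r"cannot find -l([\w.+-]+)")


def missing_libraries(linker_output):
    """Return library names the linker could not find, in first-seen order."""
    seen = []
    for name in MISSING_LIB.findall(linker_output):
        if name not in seen:
            seen.append(name)
    return seen


# The relevant lines of build output quoted in the post above.
BUILD_LOG = """/usr/bin/ld: cannot find -lcurses
collect2: ld returned 1 exit status
make[1]: *** [ch_lab] Error 1
"""
```

Running missing_libraries(BUILD_LOG) on the output quoted above gives a list containing only "curses", which tells you which development package to look for before re-running make.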
