
Thread: cat Japanese text file

  1. #11
    Join Date
    May 2012
    Location
    ザ・ワ&
    Beans
    152
    Distro
    Xubuntu 12.04 Precise Pangolin

    Re: cat Japanese text file

    Originally Posted by Vaphell:
    not that i converted anything in my life using that tool, but short test shows that it merely prints out the converted content and the original file stays intact.

    so what do you want to achieve exactly with these synonyms? you want to create symlinks pointing to real .wav files or what?
    Basically, the way it works is that you place notes on the screen, either hiragana or romaji (Roman letters), and tell it which voicebank (a collection of *.wav files plus an oto.ini) to use. If you place hiragana notes but use a voicebank with romaji *.wav files, a properly aliased oto.ini should point あ to a.wav (and likewise for romaji notes with a hiragana voicebank, where a points to あ.wav), so there is no need to create Linux symlinks.

    The reason I needed to figure this out properly is that I want to create an automatic aliaser script, probably using sed. See, a good voicebank's oto.ini can have upwards of 300 lines, and manually aliasing them all is quite tedious. There are already Windows batch files that do this, but I am developing tools for UTAU users on Ubuntu. I wrote a tutorial on how to install UTAU (a Japanese Windows .exe) on a non-Japanese Ubuntu install, and on how to properly unzip voicebanks with Japanese folder and/or file names (using the default file-roller or archive mounter gives gibberish names).

  2. #12
    Join Date
    Jul 2007
    Location
    Poland
    Beans
    4,243
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: cat Japanese text file

    so what's the input for that automatic aliaser script and what's the expected output?
    if your question is answered, mark the thread as [SOLVED]. Thx.
    To post code or command output, use [code] tags.
    Check your bash script here // BashFAQ // BashPitfalls

  3. #13
    Join Date
    May 2012
    Location
    ザ・ワ&
    Beans
    152
    Distro
    Xubuntu 12.04 Precise Pangolin

    Re: cat Japanese text file

    Input 1 (there are going to be 2, I think; one to do this)
    Code:
    あ.wav=,0,0,0,0,0
    い.wav=,0,0,0,0,0
    う.wav=,0,0,0,0,0
    え.wav=,0,0,0,0,0
    お.wav=,0,0,0,0,0
    Output 1
    Code:
    あ.wav=a,0,0,0,0,0
    い.wav=i,0,0,0,0,0
    う.wav=u,0,0,0,0,0
    え.wav=e,0,0,0,0,0
    お.wav=o,0,0,0,0,0
    Input 2 (and one to do this)
    Code:
    a.wav=,0,0,0,0,0
    i.wav=,0,0,0,0,0
    u.wav=,0,0,0,0,0
    e.wav=,0,0,0,0,0
    o.wav=,0,0,0,0,0
    Output 2

    Code:
    a.wav=あ,0,0,0,0,0
    i.wav=い,0,0,0,0,0
    u.wav=う,0,0,0,0,0
    e.wav=え,0,0,0,0,0
    o.wav=お,0,0,0,0,0
    I think
    Code:
    sed 's/a.wav=/a.wav=あ/'
    and so on would do the trick: collect all the equivalents in a sedscript, and then running
    Code:
    sed -f sedscript oto.ini
    would be what is called for, yes?

    This may not seem like too much work to do manually, but I am only showing a small portion of what a full oto.ini file would contain. As I said, a good bank would have at least 120-ish lines just to cover Japanese syllables, and a few are multilingual, so lists of over 300 are not uncommon.
    Last edited by ntzrmtthihu777; November 19th, 2012 at 12:00 AM. Reason: syntax

  4. #14
    Join Date
    Jul 2007
    Location
    Poland
    Beans
    4,243
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: cat Japanese text file

    but i don't see from where the script should get the info that a=あ, i=い, etc
    if you had a tidy list of synonyms, it would be rather easy to generate these files.

  5. #15
    Join Date
    May 2012
    Location
    ザ・ワ&
    Beans
    152
    Distro
    Xubuntu 12.04 Precise Pangolin

    Re: cat Japanese text file

    Originally Posted by Vaphell:
    but i don't see from where the script should get the info that a=あ, i=い, etc
    if you had a tidy list of synonyms, it would be rather easy to generate these files.
    I was thinking along these lines, have 2 files:

    hira_roma containing:
    Code:
    s/あ.wav=/あ.wav=a/g
    s/い.wav=/い.wav=i/g
    s/う.wav=/う.wav=u/g
    s/え.wav=/え.wav=e/g
    s/お.wav=/お.wav=o/g
    ...
    roma_hira containing:
    Code:
    s/a.wav=/a.wav=あ/g
    s/i.wav=/i.wav=い/g
    s/u.wav=/u.wav=う/g
    s/e.wav=/e.wav=え/g
    s/o.wav=/o.wav=お/g
    ...
    And running
    Code:
    sed -f roma_hira oto.ini
    or
    Code:
    sed -f hira_roma oto.ini
    as needed, assuming oto.ini is in the same dir.
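
    One thing to watch with rules written like these: the pattern is not anchored and the dot matches any character, so the rule for a would also fire on a line such as ka.wav=. Below is a small sketch of that effect and of an anchored variant. The ka.wav line is made up for illustration and is not from a real voicebank.

```shell
#!/bin/bash
# Hypothetical two-line oto.ini fragment; ka.wav is invented for this demo.
oto=$(printf '%s\n' 'a.wav=,0,0,0,0,0' 'ka.wav=,0,0,0,0,0')

# Unanchored rule: "a.wav=" also matches the tail of "ka.wav=".
loose=$(printf '%s\n' "$oto" | sed 's/a.wav=/a.wav=あ/')

# Anchored rule: ^ pins the match to the start of the line,
# and [.] matches a literal dot only.
strict=$(printf '%s\n' "$oto" | sed 's/^a[.]wav=/a.wav=あ/')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

    With the loose rule both lines end up aliased to あ; with the anchored rule only a.wav is touched. The /g flag is also not needed, since each line holds one file name.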

  6. #16
    Join Date
    Jul 2007
    Location
    Poland
    Beans
    4,243
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: cat Japanese text file

    awk would be much better

    consider this example:
    Code:
    $ cat syn.txt 
    a あ
    i い
    u う
    e え
    o お
    $ awk '{ printf("%s.wav=%s,0,0,0,0,0\n", $1, $2); }' syn.txt
    a.wav=あ,0,0,0,0,0
    i.wav=い,0,0,0,0,0
    u.wav=う,0,0,0,0,0
    e.wav=え,0,0,0,0,0
    o.wav=お,0,0,0,0,0
    $ awk '{ printf("%s.wav=%s,0,0,0,0,0\n", $2, $1); }' syn.txt
    あ.wav=a,0,0,0,0,0
    い.wav=i,0,0,0,0,0
    う.wav=u,0,0,0,0,0
    え.wav=e,0,0,0,0,0
    お.wav=o,0,0,0,0,0
    pipe to iconv to convert to SHIFT_JIS, dump the result to a file and it's done
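
    a rough sketch of that iconv step, assuming syn.txt is saved as UTF-8. the scratch directory and the name oto_new.ini are placeholders picked for this example only.

```shell
#!/bin/bash
# Work in a scratch directory so no real file gets overwritten.
tmp=$(mktemp -d)
printf '%s\n' 'a あ' 'i い' 'u う' > "$tmp/syn.txt"   # sample synonym list (UTF-8)

# Build the lines with awk, then re-encode UTF-8 -> Shift_JIS.
awk '{ printf("%s.wav=%s,0,0,0,0,0\n", $1, $2) }' "$tmp/syn.txt" |
  iconv -f UTF-8 -t SHIFT_JIS > "$tmp/oto_new.ini"

# Round-trip check: decoding the result should give readable text again.
roundtrip=$(iconv -f SHIFT_JIS -t UTF-8 "$tmp/oto_new.ini")
printf '%s\n' "$roundtrip"
```

    the file on disk is Shift_JIS, so cat on a UTF-8 terminal will show garbage for it; the round trip through iconv is only there to confirm nothing was lost.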

    even pure bash can do it:
    Code:
    $ while read -r a b; do echo "$a.wav=$b,0,0,0,0,0"; done < syn.txt
    a.wav=あ,0,0,0,0,0
    i.wav=い,0,0,0,0,0
    u.wav=う,0,0,0,0,0
    e.wav=え,0,0,0,0,0
    o.wav=お,0,0,0,0,0
    $ while read -r a b; do echo "$b.wav=$a,0,0,0,0,0"; done < syn.txt
    あ.wav=a,0,0,0,0,0
    い.wav=i,0,0,0,0,0
    う.wav=u,0,0,0,0,0
    え.wav=e,0,0,0,0,0
    お.wav=o,0,0,0,0,0
    Last edited by Vaphell; November 19th, 2012 at 12:18 AM.

  7. #17
    Join Date
    May 2012
    Location
    ザ・ワ&
    Beans
    152
    Distro
    Xubuntu 12.04 Precise Pangolin

    Re: cat Japanese text file

    Very interesting... I have used awk for a few personal projects, very nice use here. But, I have just considered what may be a hitch using my old scheme and was about to post it, but then I saw yours and it may have a similar problem...

    Suppose this oto.ini is already partially aliased, say:


    Code:
     
    a.wav=あ,0,0,0,0,0
    i.wav=,0,0,0,0,0
    u.wav=う,0,0,0,0,0
    e.wav=,0,0,0,0,0
    o.wav=お,0,0,0,0,0
    Wouldn't using either of our scripts give us

    Code:
     
    a.wav=ああ,0,0,0,0,0
    i.wav=い,0,0,0,0,0
    u.wav=うう,0,0,0,0,0
    e.wav=え,0,0,0,0,0
    o.wav=おお,0,0,0,0,0

    ?
    I was thinking of extending the sed to:
    Code:
    s/a.wav=*,/a.wav=あ,/g
    unless you know of a better solution.

    Also, an actual oto.ini would have numbers other than 0 depending on the frequency of the sound, length of the consonant or vowel, and other info, so merely creating them out of thin air with the awk example would only be useful when creating a brand new bank, which defaults to ,0,0,0,0,0 or ,,,,, anyway. And each bank has its own more or less unique oto.ini. Creating the initial one is done by the program itself; it's just the aliasing that takes a while. I am looking to modify an existing oto.ini, sorry if I was not 100% clear on that from the start.
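
    A note on the =* in the extended pattern above: in sed, * is not a shell-style wildcard but means "zero or more of the preceding item", so =*, matches a run of = followed directly by a comma. A minimal sketch of the difference, using [^,]* for "anything up to the next comma"; the stale alias A is made up for the demo.

```shell
#!/bin/bash
aliased='a.wav=A,54,105,348,36,17'   # already has some (stale) alias
empty='a.wav=,54,105,348,36,17'      # not aliased yet

# "=*," only matches when the comma follows the "=" directly,
# so the already-aliased line is left as it was.
old_aliased=$(printf '%s\n' "$aliased" | sed 's/a.wav=*,/a.wav=あ,/')
old_empty=$(printf '%s\n' "$empty" | sed 's/a.wav=*,/a.wav=あ,/')

# "[^,]*" matches any run of non-comma characters (including none),
# so whatever sits between "=" and the first comma gets replaced.
new_aliased=$(printf '%s\n' "$aliased" | sed 's/^a[.]wav=[^,]*,/a.wav=あ,/')
new_empty=$(printf '%s\n' "$empty" | sed 's/^a[.]wav=[^,]*,/a.wav=あ,/')

printf '%s\n' "$old_aliased" "$old_empty" "$new_aliased" "$new_empty"
```

    Only the first result keeps the stale alias; the other three come out as a.wav=あ with the numbers untouched.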

    Last edited by ntzrmtthihu777; November 19th, 2012 at 12:45 AM.

  8. #18
    Join Date
    Jul 2007
    Location
    Poland
    Beans
    4,243
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: cat Japanese text file

    my code created the output from scratch using template line
    PUT_STUFF_HERE.wav=PUT_STUFF_HERE,0,0,0,0,0
    but those changing numbers you speak of make that approach go out the window

    duplicated symbols you mentioned are easy to fix - simply strip anything between = and , before doing substitutions.

    can you give a few example lines of real data (or even the full oto.ini) so i can get the full picture?
    Last edited by Vaphell; November 19th, 2012 at 12:39 AM.

  9. #19
    Join Date
    May 2012
    Location
    ザ・ワ&
    Beans
    152
    Distro
    Xubuntu 12.04 Precise Pangolin

    Re: cat Japanese text file

    Code:
    a.wav=あ,54,105,348,36,17
    ad.wav=,8,133,155,79,39 
    ah.wav=,80,97,318,39,14 
    ai.wav=,64,132,284,57,31 
    al.wav=,303,109,152,33,10 
    all.wav=,270,118,168,32,18 
    am.wav=,27,74,19,23,11 
    an.wav=,63,75,307,26,10 
    and.wav=,209,80,297,33,13 
    ang.wav=,110,74,352,24,11
    A few of these will not have a hiragana equivalent, but again some of these banks are designed to be multilingual.
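
    For lines like these, where only some names have an equivalent, one possible approach is to load the synonym list into an awk array and rewrite only the alias field of names it knows, leaving every other line as it is. This is just a sketch: syn.txt, oto.txt (a UTF-8 copy of oto.ini) and the ai entry are assumptions made for the example.

```shell
#!/bin/bash
tmp=$(mktemp -d)
printf '%s\n' 'a あ' 'ai あい' > "$tmp/syn.txt"             # hypothetical synonym list
printf '%s\n' 'a.wav=,54,105,348,36,17' \
              'ad.wav=,8,133,155,79,39' \
              'ai.wav=,64,132,284,57,31' > "$tmp/oto.txt"   # UTF-8 copy of oto.ini

result=$(awk '
  NR == FNR { map[$1] = $2; next }     # first file: remember name -> alias
  {
    eq = index($0, "=")                # position of the first "="
    name = substr($0, 1, eq - 1)       # e.g. "a.wav"
    rest = substr($0, eq + 1)          # alias field plus the numbers
    sub(/[.]wav$/, "", name)           # -> "a"
    comma = index(rest, ",")
    if (eq && comma && (name in map))  # known name: replace the alias field only
      print name ".wav=" map[name] substr(rest, comma)
    else                               # unknown name or odd line: keep unchanged
      print
  }' "$tmp/syn.txt" "$tmp/oto.txt")
printf '%s\n' "$result"
```

    Because the names are compared as plain strings, nothing in syn.txt is interpreted as a regular expression, and an existing alias is overwritten rather than doubled.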
    Last edited by ntzrmtthihu777; November 19th, 2012 at 12:42 AM.

  10. #20
    Join Date
    Jul 2007
    Location
    Poland
    Beans
    4,243
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: cat Japanese text file

    i think sed -f is a good approach but i'd generate these sed files too, based on the clean list to make it easier to introduce changes, should the need arise.

    Code:
    #!/bin/bash
    
    while read -r a b
    do
      echo "s/^$a[.]wav=[^,]*,/$a.wav=$b,/"
    done < syn.txt > sed1.txt
    
    while read -r a b
    do
      echo "s/^$a[.]wav=[^,]*,/$b.wav=$a,/"
    done < syn.txt > sed2.txt
    
    echo
    echo "Sed #1"
    sed -f sed1.txt oto.txt # | iconv -f UTF-8 -t SHIFT_JIS > output1.txt
    echo "Sed #2"
    sed -f sed2.txt oto.txt # | iconv -f UTF-8 -t SHIFT_JIS > output2.txt
    example, using trash data
    Code:
    $ cat syn.txt
    a XoXo
    ad !!!
    ah ###
    ai ===
    al ---
    all @@@
    am FFUU-
    an -_-
    and o.O
    ang >_<
    $ ./jp.sh 
    Sed #1
    a.wav=XoXo,54,105,348,36,17 
    ad.wav=!!!,8,133,155,79,39 
    ah.wav=###,80,97,318,39,14 
    ai.wav====,64,132,284,57,31 
    al.wav=---,303,109,152,33,10 
    all.wav=@@@,270,118,168,32,18 
    am.wav=FFUU-,27,74,19,23,11 
    an.wav=-_-,63,75,307,26,10 
    and.wav=o.O,209,80,297,33,13 
    ang.wav=>_<,110,74,352,24,11
    Sed #2
    XoXo.wav=a,54,105,348,36,17 
    !!!.wav=ad,8,133,155,79,39 
    ###.wav=ah,80,97,318,39,14 
    ===.wav=ai,64,132,284,57,31 
    ---.wav=al,303,109,152,33,10 
    @@@.wav=all,270,118,168,32,18 
    FFUU-.wav=am,27,74,19,23,11 
    -_-.wav=an,63,75,307,26,10 
    o.O.wav=and,209,80,297,33,13 
    >_<.wav=ang,110,74,352,24,11
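
    one caveat with generating sed rules from a list: whatever lands on the replacement side is read by sed, so an alias containing /, & or \ would break the rule or change its meaning (& inserts the matched text). a rough sketch that skips such entries instead of emitting a broken rule; the helper name safe_replacement and the "R&B" entry are made up for this example.

```shell
#!/bin/bash
# Return 0 if $1 contains none of the characters that are special on the
# replacement side of a sed s/// command using "/" as the delimiter.
safe_replacement() {
  case $1 in
    *'/'*|*'&'*|*'\'*) return 1 ;;
    *) return 0 ;;
  esac
}

tmp=$(mktemp -d)
printf '%s\n' 'a あ' 'and R&B' 'i い' > "$tmp/syn.txt"   # "R&B" is a deliberately bad entry

while read -r a b
do
  if safe_replacement "$b"
  then echo "s/^$a[.]wav=[^,]*,/$a.wav=$b,/"
  else echo "skipping '$a $b': alias would need escaping" >&2
  fi
done < "$tmp/syn.txt" > "$tmp/sed1.txt"

rules=$(cat "$tmp/sed1.txt")
printf '%s\n' "$rules"
```

    the same kind of check would be needed for the name on the pattern side, where regex characters such as . * [ ] ^ $ matter as well.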
