
Thread: cat Japanese text file

  1. #21
    Join Date
    May 2012
    Location
    ザ・ワ&
    Beans
    152
    Distro
    Xubuntu 12.04 Precise Pangolin

    Re: cat Japanese text file

    Hmm... yes I just ran my example on a dupe oto.ini so as not to ruin the real thing if it goes horridly wrong >.< and it would have, lol.
    There are a number of Japanese characters whose romaji equivalent would match a few search strings, such as
    Code:
    sha.wav=,0,0,0,0,0
    matches all three of these search strings:
    Code:
    s/a.wav=,/a.wav=あ,/g
    s/ha.wav=,/ha.wav=は,/g
    s/sha.wav=,/sha.wav=しゃ,/g
    So how could confusion be prevented in this matter?

    [NOTE]
    The reverse is not totally true, the same example (but instead from hiragana to romaji) would not have many issues:
    Code:
    しゃ.wav=,0,0,0,0,0

    matches only the last of these three:
    Code:
    s/あ.wav=,/あ.wav=a,/g
    s/は.wav=,/は.wav=ha,/g
    s/しゃ.wav=,/しゃ.wav=sha,/g
    Even though these are the same rules in reverse.
    Last edited by ntzrmtthihu777; November 19th, 2012 at 01:38 AM.

  2. #22
    Join Date
    Jul 2007
    Location
    Poland
    Beans
    4,499
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: cat Japanese text file

    add ^ to the start of your regex to signal you want to match from the start of the line, not anywhere in it
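A minimal sketch of the difference, using the sha.wav line from the post above. The sample line and the three rules are taken from that post; the anchored variants (with the dot written as [.]) are an assumption about how the fixed rules would look, not output from any script in this thread.

```shell
# Sample line in the oto.ini layout shown earlier in the thread.
line='sha.wav=,0,0,0,0,0'

# Unanchored rules: the first rule matches the tail of "sha.wav=,"
# and fills in the wrong alias before the correct rule is reached.
unanchored=$(printf '%s\n' "$line" | sed \
  -e 's/a.wav=,/a.wav=あ,/' \
  -e 's/ha.wav=,/ha.wav=は,/' \
  -e 's/sha.wav=,/sha.wav=しゃ,/')

# Anchored rules: ^ pins each pattern to the start of the line,
# so only the rule for the full file name can match.
anchored=$(printf '%s\n' "$line" | sed \
  -e 's/^a[.]wav=,/a.wav=あ,/' \
  -e 's/^ha[.]wav=,/ha.wav=は,/' \
  -e 's/^sha[.]wav=,/sha.wav=しゃ,/')

printf '%s\n' "$unanchored" "$anchored"
```

The first result carries the あ alias and the second carries しゃ, which is the one wanted here.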

    did you try my script? finetuning s/// expressions by hand for a few hundred pairs would be ungodly tedious.
    if your question is answered, mark the thread as [SOLVED]. Thx.
    To post code or command output, use [code] tags.
    Check your bash script here // BashFAQ // BashPitfalls

  3. #23
    Join Date
    May 2012
    Location
    ザ・ワ&
    Beans
    152
    Distro
    Xubuntu 12.04 Precise Pangolin

    Re: cat Japanese text file


    No, I have not tried it yet, in the middle of a couple of things, >.<
    I actually intend to use your original awk script to generate the s/// expressions for me, as it can easily be done with:
    Code:
    $ cat syn.txt
    a あ
    i い
    u う
    e え
    o お
    $ awk '{ printf("s/%s.wav=,/%s.wav=%s,/g\n", $1, $1, $2); }' syn.txt
    $ awk '{ printf("s/%s.wav=,/%s.wav=%s,/g\n", $2, $2, $1); }' syn.txt


    If my grasp of awk is correct this should give me the expressions I need, correct?
    Also, where exactly would the ^ go? Still a bit of a Linux n00b, but loving everything I learn, lol.

  4. #24
    Join Date
    Jul 2007
    Location
    Poland
    Beans
    4,499
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: cat Japanese text file

    oh ok, my script uses pure bash to generate but the result will be pretty much identical.

    Code:
    printf("s/^%s.wav=,/%s.wav=%s,/g\n", $1, $1, $2);
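For reference, here is a rough sketch of how the anchored printf could generate both rule files in one go. The temporary directory, the three sample pairs, and the [.] escaping of the dot are assumptions for illustration; the output names roma_hira and hira_roma are only placeholders for the two sed files.

```shell
# Work in a scratch directory so no real oto.ini or syn.txt is touched.
workdir=$(mktemp -d)

# Small sample mapping: romaji in column 1, hiragana in column 2.
printf '%s\n' 'a あ' 'i い' 'sha しゃ' > "$workdir/syn.txt"

# Romaji file name -> hiragana alias.
awk '{ printf("s/^%s[.]wav=,/%s.wav=%s,/\n", $1, $1, $2) }' \
  "$workdir/syn.txt" > "$workdir/roma_hira"

# Hiragana file name -> romaji alias (columns swapped).
awk '{ printf("s/^%s[.]wav=,/%s.wav=%s,/\n", $2, $2, $1) }' \
  "$workdir/syn.txt" > "$workdir/hira_roma"

cat "$workdir/roma_hira" "$workdir/hira_roma"
```

The g flag is left out in this sketch because a pattern anchored with ^ can match at most once per line.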

  5. #25
    Join Date
    May 2012
    Location
    ザ・ワ&
    Beans
    152
    Distro
    Xubuntu 12.04 Precise Pangolin

    Re: cat Japanese text file

    Quote Originally Posted by Vaphell View Post
    oh ok, my script uses pure bash to generate but the result will be pretty much identical.

    Code:
    printf("s/^%s.wav=,/%s.wav=%s,/g\n", $1, $1, $2);

    Ah, thank you so much! I basically want it the way I was describing (two files you use for sed) to mirror an existing Windows alias tool in order to be more familiar to UTAU users. Many thanks, my friend.
    Technomancy
    The old ways are not the only ways. We study the mysteries of laser and circuit, crystal and scanner. Holographic daemons and invocations of equations. These are the tools we employ, and we know many things

  6. #26
    Join Date
    May 2012
    Location
    ザ・ワ&
    Beans
    152
    Distro
    Xubuntu 12.04 Precise Pangolin

    Re: cat Japanese text file

    Having an issue with my script and yours, Vaphell.
    The iconv is returning an error message, yours at position 6 and mine at 78** or some such number. Any tips?

  7. #27
    Join Date
    Jul 2007
    Location
    Poland
    Beans
    4,499
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: cat Japanese text file

    i guess the problem is that the original file is in SHIFT_JIS and the script doesn't take that into account

    try this:
    Code:
    iconv -f SHIFT_JIS -t UTF-8 oto.txt | sed -f sed1.txt | iconv -f UTF-8 -t SHIFT_JIS > output1.txt
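A quick way to try that pipeline without touching a real oto.ini is to build a tiny Shift_JIS file first. This is only a sketch: the two sample lines and the two rules are made up for the test, and it assumes an iconv that knows the SHIFT_JIS name (the glibc one does).

```shell
workdir=$(mktemp -d)

# Build a small Shift_JIS input file from UTF-8 text (sample content).
printf '%s\n' 'a.wav=,0,0,0,0,0' 'あ.wav=,0,0,0,0,0' \
  | iconv -f UTF-8 -t SHIFT_JIS > "$workdir/oto.txt"

# The sed rules themselves are stored as UTF-8.
printf '%s\n' 's/^a[.]wav=,/a.wav=あ,/' 's/^あ[.]wav=,/あ.wav=a,/' \
  > "$workdir/sed1.txt"

# Decode to UTF-8, apply the rules, encode back to Shift_JIS.
iconv -f SHIFT_JIS -t UTF-8 "$workdir/oto.txt" \
  | sed -f "$workdir/sed1.txt" \
  | iconv -f UTF-8 -t SHIFT_JIS > "$workdir/output1.txt"

# Decode the result again only to inspect it in a UTF-8 terminal.
result=$(iconv -f SHIFT_JIS -t UTF-8 "$workdir/output1.txt")
printf '%s\n' "$result"
```

Converting on both sides means sed only ever sees UTF-8, while the file written out stays in the encoding of the original.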
    Last edited by Vaphell; November 19th, 2012 at 06:47 AM.

  8. #28
    Join Date
    May 2012
    Location
    ザ・ワ&
    Beans
    152
    Distro
    Xubuntu 12.04 Precise Pangolin

    Re: cat Japanese text file

    Quote Originally Posted by Vaphell View Post
    i guess the problem is that the original file is in SHIFT_JIS and the script doesn't take that into account

    try this:
    Code:
    iconv -f SHIFT_JIS -t UTF-8 oto.txt | sed -f sed1.txt | iconv -f UTF-8 -t SHIFT_JIS > output1.txt

    Ah, just figured part of it out, lol. Mine stopped later because I was running it against "test", an oto.ini I had removed all the aliases from (or so I thought; apparently I missed one or two), but yours was targeting the raw oto.ini, which still had aliases, and stopped short on the first one because of its あ alias. Also, the $a and $b orders were wrong, so even when I cleaned out all the aliases it did nothing, lol. Should have been:

    Code:
    while read -r a b
    do
      echo "s/^$b[.]wav=[^,]*,/$b.wav=$a,/"
    done < syn.txt > hira_roma
    
    while read -r a b
    do
      echo "s/^$a[.]wav=[^,]*,/$a.wav=$b,/"
    done < syn.txt > roma_hira
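A small check of what the [^,]* part buys, sketched on made-up data: the two mapping pairs and the "old" alias below are hypothetical, and the loop is the roma_hira one from above pointed at a scratch directory.

```shell
workdir=$(mktemp -d)

# Sample mapping: romaji first, hiragana second.
printf '%s\n' 'a あ' 'ka か' > "$workdir/syn.txt"

# Same loop shape as above: romaji file name gets the hiragana alias.
while read -r a b
do
  echo "s/^$a[.]wav=[^,]*,/$a.wav=$b,/"
done < "$workdir/syn.txt" > "$workdir/roma_hira"

# One line with an empty alias and one with a stale alias.
result=$(printf '%s\n' 'a.wav=,0,0,0,0,0' 'ka.wav=old,1,2,3,4,5' \
  | sed -f "$workdir/roma_hira")
printf '%s\n' "$result"
```

Because [^,]* matches whatever sits between the = and the first comma, including nothing, lines that already carry an alias get overwritten instead of being skipped.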

  9. #29
    Join Date
    May 2012
    Location
    ザ・ワ&
    Beans
    152
    Distro
    Xubuntu 12.04 Precise Pangolin

    Re: cat Japanese text file

    Aha, absolutely done! I can now do as needed, thank you so much for your help Vaphell!
    Last edited by ntzrmtthihu777; November 25th, 2012 at 04:42 AM.
