
Thread: sed command - help please

  1. #1
    Join Date
    Sep 2008
    Location
    Real Ale Republic
    Beans
    143
    Distro
    Ubuntu Mate 20.04 Focal Fossa

    sed command - help please

    I have been trying to use sed to remove lines of what I suspect were Chinese or Japanese translations from text documents. These lines appear to consist mostly of non-ASCII characters like "Šš¥ß€FŠ".

    I was hoping that this would delete lines containing non-ASCII characters:

    sed -e '/^[:ascii:]/d' < test1.txt

    but this doesn't appear to do anything.

    I tried this:

    sed -e '/^[\x00 - \x7F]/d' < test1.txt

    but again, no luck.
    I realise that this is probably because it is finding some ASCII characters in the lines I wish to discard?

    Any ideas?

  2. #2
    Join Date
    Jan 2009
    Beans
    Hidden!

    Re: sed command - help please

    Are you trying to delete all lines that have one or more non-ascii characters in them? In that case, I think you want the '^' symbol inside the '[...]' expression, as in:
    Code:
    sed -e '/[^:ascii:]/d'
    Outside of the character class, the '^' matches on start-of-line. If it's the first character inside, it inverts the specified character class.
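
    A note on that pattern, offered as a sketch rather than a definitive answer: as far as I know POSIX defines no 'ascii' class, and a class name is only recognised in the form '[[:name:]]', inside an outer pair of brackets. Written as '[^:ascii:]', the expression is just a negated set of the five literal characters ':', 'a', 's', 'c' and 'i', so it matches almost every line (some newer GNU sed releases reportedly reject the single-bracket form outright). The demonstration below uses the equivalent set '[^acis:]' and made-up sample lines:

```shell
# Hypothetical sample: only the second line consists solely of ':', 'a', 's', 'c', 'i'.
sample() {
    printf '%s\n' 'plain text' 'a::sic' 'more text'
}

# '[^acis:]' matches any character that is NOT one of a, c, i, s or ':'.
# The d command therefore deletes every line holding at least one other character.
negated_set_demo() {
    sample | LC_ALL=C sed -e '/[^acis:]/d'
}

negated_set_demo
```

    Only 'a::sic' survives, which is consistent with a real text file losing practically all of its lines.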

  3. #3
    Join Date
    Sep 2006
    Beans
    8,627
    Distro
    Ubuntu 14.04 Trusty Tahr

    awk or grep

    Maybe awk or egrep would be more appropriate.

    Do you want to keep the line, but leave it empty, or eliminate the offending line completely?
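
    For the grep route, a minimal sketch, assuming GNU grep and that "non-ASCII" means any byte outside printable ASCII and whitespace. LC_ALL=C makes grep work byte by byte, -a stops it from treating the input as binary, and -v prints only the lines that do NOT match. The function name and the sample bytes are invented for illustration:

```shell
# Print only lines made up entirely of printable ASCII and whitespace.
# Reads the named files, or standard input when none are given.
keep_ascii_lines() {
    LC_ALL=C grep -av '[^[:print:][:space:]]' "$@"
}

# Sample input: the middle line starts with the UTF-8 bytes for "Šß" (octal escapes).
printf 'keep me\n\305\240\303\237 junk\nkeep me too\n' | keep_ascii_lines
```

    This removes the offending lines completely rather than leaving them empty.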

  4. #4
    Join Date
    Jan 2010
    Location
    Lillehammer, Norway
    Beans
    23
    Distro
    Kubuntu 9.10 Karmic Koala

    Re: sed command - help please

    Try 'tr'... perhaps with the -d option.
    Note: tr reads only standard input and takes no file arguments, so it must be *in* a pipeline (use 'cat' first) or have the file redirected into it with '<'.
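
    One caveat with tr, sketched below: it deletes characters, not lines, so the offending lines survive with their non-ASCII bytes stripped out. That may or may not be what is wanted here. The set keeps tab, newline, carriage return and printable ASCII (octal 11, 12, 15 and 40-176); -c complements the set and -d deletes the complement. The function name is made up:

```shell
# Delete every byte that is not tab, LF, CR or printable ASCII.
# The input arrives on standard input (pipe or redirection).
strip_non_ascii_bytes() {
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# Sample: "café au lait" in UTF-8; the two bytes of "é" are removed, the line stays.
printf 'caf\303\251 au lait\n' | strip_non_ascii_bytes
```

    With a real file this would be run as: strip_non_ascii_bytes < test1.txt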

  5. #5
    Join Date
    Sep 2008
    Location
    Real Ale Republic
    Beans
    143
    Distro
    Ubuntu Mate 20.04 Focal Fossa

    Re: sed command - help please

    Thanks for the replies.

    Quote Originally Posted by Brandon Williams View Post
    Are you trying to delete all lines that have one or more non-ascii characters in them?
    Yes.

    Quote Originally Posted by Brandon Williams View Post
    In that case, I think you want the '^' symbol inside the '[...]' expression, as in:
    Code:
    sed -e '/[^:ascii:]/d'
    Outside of the character class, the '^' matches on start-of-line. If it's the first character inside, it inverts the specified character class.
    I tried

    sed -e '/[^:ascii:]/d' < test1.txt

    and there is no output - it seems to delete every line...



    Quote Originally Posted by Lars Noodén
    Do you want to keep the line, but leave it empty, or eliminate the offending line completely?
    I want to remove it completely.

    ETA
    I have also tried:

    sed -e '/[\x80 - \xFF]/d' < test1.txt

    hoping that this would catch "extended" ASCII characters, but this seems to delete all the lines I actually want!

    This seems to be a typical "simple 5-minute job"!

    I'll have a look at awk, egrep and tr.
    Last edited by Timothy Taylor; February 4th, 2010 at 02:27 PM.
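
    A likely reason the two hex-range attempts misbehave: inside '[...]' a space is an ordinary character, so '[\x80 - \xFF]' is not a range from \x80 to \xFF. It is the set of \x80, \xFF and the one-character range from space to space, which means it matches any line containing a space. Likewise '^[\x00 - \x7F]' only matches lines that begin with a space (or with one of the two listed control characters). The sketch below assumes GNU sed (for the \xHH escapes) and uses invented sample lines and function names:

```shell
# With spaces around the hyphen the set contains a literal space,
# so every line that has a space in it is deleted.
spaced_range() {
    LC_ALL=C sed -e '/[\x80 - \xFF]/d'
}

# Without the spaces it is a genuine byte range, \x80 through \xFF.
# LC_ALL=C makes sed compare single bytes rather than multibyte characters.
tight_range() {
    LC_ALL=C sed -e '/[\x80-\xFF]/d'
}

# Two ASCII lines, then a line starting with the UTF-8 bytes for "Šß".
sample() {
    printf 'two words\noneword\n\305\240\303\237 junk\n'
}

sample | spaced_range   # only "oneword" is left
sample | tight_range    # "two words" and "oneword" are left
```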

  6. #6
    Join Date
    May 2008
    Beans
    Hidden!

    Re: sed command - help please

    Quote Originally Posted by Timothy Taylor View Post
    Thanks for the replies.

    Yes.

    I tried

    sed -e '/[^:ascii:]/d' < test1.txt

    and there is no output - it seems to delete every line...

    I want to remove it completely.
    You almost have it, try:
    Code:
    sed -e '/[^[:print:]]/d' test1.txt
    [:print:] is for printable characters and there's no need for the < character.

  7. #7
    Join Date
    Sep 2008
    Location
    Real Ale Republic
    Beans
    143
    Distro
    Ubuntu Mate 20.04 Focal Fossa

    Re: sed command - help please

    Quote Originally Posted by mobilediesel View Post
    You almost have it, try:
    Code:
    sed -e '/[^[:print:]]/d' test1.txt
    [:print:] is for printable characters and there's no need for the < character.
    No, that doesn't work either - no output.

  8. #8
    Join Date
    May 2008
    Beans
    Hidden!

    Re: sed command - help please

    Quote Originally Posted by Timothy Taylor View Post
    No, that doesn't work either - no output.
    That suggests that the file is all one line or every line contains non-printable characters.
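
    One common way for every line to contain a non-printable character is DOS/Windows line endings: each line then ends in a carriage return, which is not in [:print:]. Tabs have the same effect. This is only a guess about test1.txt; the sketch below shows the effect on made-up input, plus a hypothetical helper that counts lines ending in CR:

```shell
# Count lines that end in a carriage return (CRLF line endings).
count_crlf_lines() {
    cr=$(printf '\r')
    LC_ALL=C grep -c "${cr}\$" "$@"
}

# A perfectly ordinary ASCII line with a CRLF ending is still deleted,
# because the trailing CR is not a printable character.
printf 'hello world\r\n' | LC_ALL=C sed -e '/[^[:print:]]/d'

# Adding [:space:] to the allowed set tolerates CR and tab, so the line is kept.
printf 'hello world\r\n' | LC_ALL=C sed -e '/[^[:print:][:space:]]/d'
```

    To check the real file, count_crlf_lines test1.txt gives the number of CRLF lines, and od -c test1.txt | head shows the raw bytes with \r and \t spelled out.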

  9. #9
    Join Date
    Sep 2008
    Location
    Real Ale Republic
    Beans
    143
    Distro
    Ubuntu Mate 20.04 Focal Fossa

    Re: sed command - help please

    Aye, something weird is going on...

    I tried

    sed -e '/[:ascii:]/d' test1.txt

    and was surprised to get a list of paragraph numbers and some lines of the garbage text out.
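
    That result fits the pattern being read as a plain set: '/[:ascii:]/d' deletes every line containing ':', 'a', 's', 'c' or 'i', which leaves bare numbers and whatever garbage happens to lack those letters. For the original goal, a minimal sketch, assuming GNU sed and that a line should go if it holds any byte outside printable ASCII and whitespace (the function name is made up):

```shell
# Delete lines containing any byte that is not printable ASCII or whitespace.
# LC_ALL=C forces byte-wise matching, so the multibyte characters of a
# UTF-8 locale are not counted as printable.
drop_non_ascii_lines() {
    LC_ALL=C sed -e '/[^[:print:][:space:]]/d' "$@"
}

# Sample: a paragraph number, a line of English text, then a "garbage" line
# (UTF-8 bytes for "Šš¥ß" followed by " F").
printf '12\nSome English text.\n\305\240\305\241\302\245\303\237 F\n' |
    drop_non_ascii_lines
```

    On the real file this would be drop_non_ascii_lines test1.txt, with the output redirected to a new file once it looks right.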
