Thread: regex to match lines containing one or more characters

  1. #1
    Join Date
    Apr 2020
    Location
    where the work takes me
    Beans
    243
    Distro
    Ubuntu 20.04 Focal Fossa

    regex to match lines containing one or more characters

    Hello all,

    I'm trying to write a sed expression that inserts a tab at the start of each line that has some content, i.e. I don't want to simply insert a tab at the beginning of every line. I've tried the following, but it doesn't seem to match...
    Code:
    sed s/^.+$/\\t&/g [input file]
    There seem to be two problems: first, as mentioned, the regex doesn't match anything; second, according to the sed info pages...
    Code:
    the REPLACEMENT can contain unescaped '&' characters which reference the whole matched portion of the pattern space
    but when I try this as in my example above, it results in the subsequent "/g" being interpreted as a filename

    Any pointers would be much appreciated!

  2. #2
    Join Date
    Mar 2011
    Location
    U.K.
    Beans
    Hidden!
    Distro
    Ubuntu 20.04 Focal Fossa

    Re: regex to match lines containing one or more characters

    There is a tool named SearchMonkey in Ubuntu.
    In Advanced tab there is a RegEx Expression Builder.
    You can use the builder to test regex expressions on a sample file.

  3. #3
    Join Date
    Mar 2010
    Location
    Squidbilly-Land
    Beans
    Hidden!
    Distro
    Ubuntu

    Re: regex to match lines containing one or more characters

    The requirements as written are ambiguous.
    Code:
    This matches
       This matches
         
       # this matches
    #$ this matches
    Line 3 has spaces which can be considered "content", so it gets a tab added too.
    The last line is empty, but exists. No tab is added.

    Must the 1st column always have a non-space character or not?
    If it was me and I didn't have perl, I'd use a 2-stage solution.
    1. add the tab to the beginning of every line -e s/^/\t/g
    2. remove the tab on all lines that don't have any other content -e s/^\t$//g


    That would look like:
    Code:
    $ sed -e 's/^/\t/g' -e 's/^\t$//g' {input_file}
    I tested the above input file with that and it appeared to work to me.
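
    For comparison, the same result can be had in one pass by limiting the substitution to lines that match /./ (at least one character). This is only a sketch: it assumes GNU sed (for \t in the pattern and replacement), and make_input and its sample text are made up for illustration.

```shell
# Hypothetical sample: text, a spaces-only line, an empty line, a comment.
make_input() { printf 'text\n   \n\n# comment\n'; }

# Two-stage version from the post: tab every line, then strip lone tabs.
two_stage=$(make_input | sed -e 's/^/\t/' -e 's/^\t$//')

# One-pass version: only lines matching /./ (one or more characters) get the tab.
one_pass=$(make_input | sed '/./s/^/\t/')

# The 'l' command prints tabs as \t and marks each line end with $.
printf '%s\n' "$one_pass" | sed -n l
```

    Both variants treat the spaces-only line as content, so it still gets a tab.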

    With perl, we can do smarter matching and grouping so the prior match isn't just removed, but captured for use later. Very handy when we need to swap columns in code. Of course, swapping columns is trivial in vim.

    If you want to include whitespace in the definition of content, but only at the front of the line, then questions about the requirements still exist. In perl, a little function that removes all leading and trailing whitespace is handy:
    Code:
     $line =~ s/^\s+|\s+$//go;
    I don't think sed supports \s portably (GNU sed does accept it as an extension). Maybe there is a -E option for extended regex support in sed? Seems so from the sed manpage:
    Code:
           -E, -r, --regexp-extended
    
                  use extended regular expressions in the script (for portability use
                  POSIX -E).
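
    If whitespace-only lines should not count as content, a POSIX character class sidesteps the \s question altogether. A minimal sketch, where $tab, make_input and the sample text are illustrative names rather than anything from the thread:

```shell
tab=$(printf '\t')   # literal tab, so the script does not rely on \t support in sed

# Hypothetical sample: text, a spaces-only line, an empty line, indented text.
make_input() { printf 'text\n   \n\n  indented\n'; }

# Indent only lines that contain at least one non-whitespace character.
indented=$(make_input | sed "/[^[:space:]]/s/^/$tab/")

# The 'l' command prints tabs as \t and marks each line end with $.
printf '%s\n' "$indented" | sed -n l
```

    Here the spaces-only line and the empty line are both left untouched.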
    Last edited by TheFu; February 18th, 2021 at 02:38 AM. Reason: added code-tag

  4. #4
    Join Date
    Aug 2010
    Location
    Lancs, United Kingdom
    Beans
    1,573
    Distro
    Ubuntu Mate 16.04 Xenial Xerus

    Re: regex to match lines containing one or more characters

    Originally posted by jcdenton1995:
    Code:
    sed s/^.+$/\\t&/g [input file]
    There are 2 problems with that.
    1. & has meaning to the shell. An incomplete sed command is backgrounded then another command beginning with /g is attempted to be executed. You need to prevent the shell from interpreting the &.
    2. + only has the meaning of "one or more" if the -E switch is used.
    Hence:
    Code:
    sed -E 's/^.+$/\t&/g' [input file]
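
    To see both points in action, here is a small sketch, assuming GNU sed (for \t in the replacement). The second form spells "one or more" with the POSIX interval \{1,\}, so it needs no -E; make_input and its sample text are made up.

```shell
# Hypothetical sample: two text lines separated by an empty line.
make_input() { printf 'first\n\nsecond\n'; }

# Extended syntax: + means "one or more"; single quotes hide & from the shell.
with_E=$(make_input | sed -E 's/^.+$/\t&/')

# Basic syntax: \{1,\} is the interval spelling of "one or more".
without_E=$(make_input | sed 's/^.\{1,\}$/\t&/')

# Basic syntax with a bare +: the + is a literal plus sign, so nothing matches here.
literal_plus=$(make_input | sed 's/^.+$/\t&/')

printf '%s\n' "$with_E" | sed -n l
```

    The first two commands give the same output; the third leaves the input unchanged, which is why the original attempt appeared not to match.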

  5. #5
    Join Date
    Apr 2020
    Location
    where the work takes me
    Beans
    243
    Distro
    Ubuntu 20.04 Focal Fossa

    Re: regex to match lines containing one or more characters

    Originally posted by TheFu:
    Must the 1st column always have a non-space character or not?
    In a manner. I'm just trying to find a way of applying indentation to all the lines within a given range that actually contain something. There are also lines that contain nothing at all (no spaces, tabs or the like), which I don't want to prepend with a tab, mainly because it's messy and not really the correct way, I guess.
    If it was me and I didn't have perl, I'd use a 2-stage solution.
    1. add the tab to the beginning of every line -e s/^/\t/g
    2. remove the tab on all lines that don't have any other content -e s/^\t$//g


    That would look like:
    $ sed -e 's/^/\t/g' -e 's/^\t$//g' {input_file}
    I had to chuckle at this because I never considered doing it in two stages. I'll give it a try tomorrow when I'm not falling asleep. As for the Perl, I'll steer clear of that for now; I don't think I'm equipped to learn regular expressions and Perl at the same time! Thanks.
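
    For the "given range" part mentioned above, sed can restrict a command to a span of line numbers. A rough sketch, where the range 2,4, $tab, make_input and the sample text are all assumptions chosen for illustration:

```shell
tab=$(printf '\t')   # literal tab, so the script does not rely on \t support in sed

# Hypothetical five-line sample with an empty third line.
make_input() { printf 'one\ntwo\n\nfour\nfive\n'; }

# Lines 2 to 4 only; within that range, only non-empty lines get a tab.
ranged=$(make_input | sed "2,4{/./s/^/$tab/;}")

# The 'l' command prints tabs as \t and marks each line end with $.
printf '%s\n' "$ranged" | sed -n l
```

    Lines outside the range and the empty line inside it come through unchanged.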

  6. #6
    Join Date
    Apr 2020
    Location
    where the work takes me
    Beans
    243
    Distro
    Ubuntu 20.04 Focal Fossa

    Re: regex to match lines containing one or more characters

    Originally posted by spjackson:
    There are 2 problems with that.
    1. & has meaning to the shell. An incomplete sed command is backgrounded then another command beginning with /g is attempted to be executed. You need to prevent the shell from interpreting the &.
    2. + only has the meaning of "one or more" if the -E switch is used.
    Hence:
    Code:
    sed -E 's/^.+$/\t&/g' [input file]
    Thanks, good to know about the -E (for extended regular expressions?). Also confusing is how the sed documentation says the '&' can be unescaped, but I suppose that might be like saying that as far as 'sed' is concerned it can be unescaped, but it makes no assumptions about which shell you are using and how that will interpret the ampersand.
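
    That reading can be checked with a tiny sketch: once single quotes keep the shell away from the &, sed expands an unescaped & to the matched text, and a backslash turns it into a literal ampersand. The sample string abc is made up.

```shell
# Inside single quotes the shell leaves & alone, and sed expands it to the match.
whole_match=$(printf 'abc\n' | sed 's/b/[&]/')

# A backslash makes sed treat & as a literal ampersand instead.
literal_amp=$(printf 'abc\n' | sed 's/b/[\&]/')

printf '%s\n' "$whole_match" "$literal_amp"
# prints:
# a[b]c
# a[&]c
```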
