
Thread: rename and regex

  1. #11
    Join Date
    Jul 2007
    Location
    Poland
    Beans
    4,499
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: rename and regex

    Code:
    $ rename -nv 's/.*([a-zA-Z][0-9]+[a-zA-Z][0-9]+).*[.]([^.]+)$/\L$1.$2/' really*
    really.long.filename.with.extra.stuff.A12B33.and.more.stuff.extension renamed as a12b33.extension
    if your question is answered, mark the thread as [SOLVED]. Thx.
    To post code or command output, use [code] tags.
    Check your bash script here // BashFAQ // BashPitfalls

  2. #12
    Join Date
    Dec 2007
    Beans
    716

    Re: rename and regex

    Quote: Originally posted by Vaphell
    Code:
    $ rename -nv 's/.*([a-zA-Z][0-9]+[a-zA-Z][0-9]+).*[.]([^.]+)$/\L$1.$2/' really*
    really.long.filename.with.extra.stuff.A12B33.and.more.stuff.extension renamed as a12b33.extension
    Wow... regex may be REGULAR, but it makes my head hurt! OK, so...

    I understand the .*([a-zA-Z][0-9]+[a-zA-Z][0-9]+).* part, but why the [.] right after it? And how does [^.]+ match the extension? If I understand regex correctly (and I do not!), [^.] matches any character that is not a . (which is confusing, because ^ outside the brackets indicates the start of the line!). Would that not be every alphanumeric character, including the extension?

    I promise I am not looking JUST to have this written for me. I really do want to understand it so I can do this on my own next time. Yeah, I know... rather presumptuous of me.

    But I do appreciate all of the help!

  3. #13
    Join Date
    Jul 2007
    Location
    Poland
    Beans
    4,499
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: rename and regex

    [^.]+ = not-dot, 1 or more times
    general idea:
    anything(char-digits-char-digits)anything[dot](not-dots)end-of-line => $1.$2
    Obviously the 2nd parenthesis can only hold the extension, because it has to cover everything between the last dot and the end of the line. In the replacement you recall the content of that parenthesis with $2.

    Your current code simply finds the episode number and takes everything after it verbatim. That's why any garbage that happens to follow the number ends up in the final name: the part up to the number gets transformed, but anything after it gets through untouched.

    (something.s01e01 => s01e01).garbage.ext (the parentheses show the scope of your regex substitution)


    In my regex I match the whole name from start to end, with .*[.] consuming any garbage up to the last dot, leaving only the extension to be captured and used to construct the final name.

    something.s01e01.garbage.ext => s01e01.ext
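
A rough sketch of the two scopes described above, written in Python rather than Perl. Assumptions: Python's re module treats these constructs the same way rename's Perl regex does here; the `partial` pattern is only a hypothetical stand-in for a substitution that stops at the episode number; and str.lower() takes the place of \L, which Python does not have.

```python
import re

# Hypothetical file name, modelled on the one used in the thread.
name = "really.long.filename.with.extra.stuff.A12B33.and.more.stuff.extension"

# Whole-name pattern: anything, (letter digits letter digits), anything,
# a literal dot, then (non-dots) up to the end of the string.
whole = re.compile(r".*([a-zA-Z][0-9]+[a-zA-Z][0-9]+).*[.]([^.]+)$")

def new_name(old):
    # Rebuild the name from the two captured groups only.
    return whole.sub(lambda m: f"{m.group(1).lower()}.{m.group(2)}", old)

# Hypothetical partial pattern: it stops at the episode number,
# so everything after the number is left in place.
partial = re.compile(r".*([a-zA-Z][0-9]+[a-zA-Z][0-9]+)")

def partial_name(old):
    return partial.sub(lambda m: m.group(1).lower(), old, count=1)

print(new_name(name))      # a12b33.extension
print(partial_name(name))  # a12b33.and.more.stuff.extension
```

Because the whole-name pattern consumes the entire string, nothing outside the two groups can leak into the result.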
    Last edited by Vaphell; June 3rd, 2013 at 07:50 PM.

  4. #14
    Join Date
    Dec 2007
    Beans
    716

    Re: rename and regex

    I think what I don't understand is why .*[.] stops at the LAST [.] instead of the first [.] it comes to...

  5. #15
    Join Date
    Aug 2011
    Location
    47°9′S 126°43W
    Beans
    2,172
    Distro
    Ubuntu 16.04 Xenial Xerus

    Re: rename and regex

    Quote: Originally posted by rebeltaz
    I think what I don't understand is why .*[.] stops at the LAST [.] instead of the first [.] it comes to...
    Because that is the normal "greedy" behavior in regular expressions. In "aaabbbcccaaabbbccc" you have three possible matches for "aaa.*ccc": the first "aaabbbccc", the second one, or the whole string. I know, you are going to ask, "But why is matching the whole string the default behavior?" And the answer is: because in that case it is a lot easier to prevent that behavior[*] and write an expression that matches only the first or last small string than it would be, if the "frugal" behavior were the default, to construct a regexp that matches the whole string.

    Some regexp syntaxes have a modifier (*?, +?) that lets you ask for the shortest match. But Real Men don't use it.
    [*] "aaa[^a]*ccc"
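
To see the three candidate matches side by side, here is a small sketch in Python (an assumption: its re module is greedy by default and supports *? just like Perl). The `.*(aaa.*ccc)` pattern for grabbing the last small string is my own illustration, not something from the posts above.

```python
import re

text = "aaabbbcccaaabbbccc"

# Greedy default: .* grabs as much as it can, so the whole string matches.
greedy = re.search(r"aaa.*ccc", text).group(0)

# Non-greedy modifier: .*? stops at the first "ccc" it can reach.
lazy = re.search(r"aaa.*?ccc", text).group(0)

# The footnote's alternative: [^a]* cannot run past the second "aaa".
no_a = re.search(r"aaa[^a]*ccc", text).group(0)

# Last small string: a leading greedy .* pushes the group as far right as it goes.
last = re.search(r".*(aaa.*ccc)", text)

print(greedy)         # aaabbbcccaaabbbccc
print(lazy)           # aaabbbccc
print(no_a)           # aaabbbccc
print(last.start(1))  # 9, i.e. the second "aaabbbccc"
```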

  6. #16
    Join Date
    Jul 2007
    Location
    Poland
    Beans
    4,499
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: rename and regex

    Just like ofnuts said, by default regexes try to consume as much as possible. Besides, when you write the regex .*[.][^.]+$ there is no other option. The $ settles it: the line has to end with [.][^.]+. If only non-dots are allowed after that dot, then it has to be the last one.

    Code:
    abc.def.ghi / .*[.][^.]+$
    with [.] on the first dot: no way it will ever match, the second dot would have to be consumed by [^.]+, which is impossible

    Code:
    abc.def.ghi / .*[.][^.]+$
    with [.] on the last dot: everything is fine
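
One way to check which dot the [.] lands on is to add capture groups around the pieces. This is only a sketch in Python (assumed to behave like Perl for these constructs); the extra parentheses and the `^[^.]*` variant that pins [.] to the first dot are my additions for illustration.

```python
import re

s = "abc.def.ghi"

# Same pattern as above, with groups added to expose the split point.
m = re.search(r"(.*)[.]([^.]+)$", s)
print(m.group(1))  # abc.def  (everything before the last dot)
print(m.group(2))  # ghi      (the extension)

# Pin [.] to the FIRST dot by forbidding dots in front of it.
# [^.]+ would then have to swallow ".ghi", which it cannot do.
first = re.search(r"^[^.]*[.][^.]+$", s)
print(first)       # None
```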
    Last edited by Vaphell; June 3rd, 2013 at 10:31 PM.

  7. #17
    Join Date
    Dec 2007
    Beans
    716

    Re: rename and regex

    Oh! I didn't see the dollar sign at the end of that equation. Now I get it.

    Aren't the two examples above (abc.def.ghi / .*[.][^.]+$) the same?

    Thank you all. You have been a great help!

  8. #18
    Join Date
    Feb 2009
    Beans
    1,469

    Re: rename and regex

    Quote: Originally posted by rebeltaz
    Oh! I didn't see the dollar sign at the end of that equation. Now I get it.
    That's good, but it's important to note that the pattern will match (for the example strings) in exactly the same way without the dollar sign, because of the greediness of the * and + quantifiers and because [^.]+ matches only non-dots.
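
A quick check of that claim, sketched in Python (assumed to match Perl's behavior for these constructs). The helper name `match_info` and the input ending in a dot are hypothetical additions; the latter is there to show that the equivalence holds for the example strings, not for every possible name.

```python
import re

def match_info(pattern, s):
    """Return (matched text, captured extension), or None if there is no match."""
    m = re.search(pattern, s)
    return (m.group(0), m.group(1)) if m else None

anchored = r".*[.]([^.]+)$"
unanchored = r".*[.]([^.]+)"

# For the example strings, both patterns match the same text.
for s in ["abc.def.ghi", "something.s01e01.garbage.ext"]:
    print(s, match_info(anchored, s) == match_info(unanchored, s))  # True

# Hypothetical name ending in a dot: here the two patterns disagree.
print(match_info(anchored, "abc.def."))    # None
print(match_info(unanchored, "abc.def."))  # ('abc.def', 'def')
```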

    Aren't the two examples above (abc.def.ghi / .*[.][^.]+$) the same?
    What Vaphell was pointing out was that some parts of the pattern match different parts of the string, which affects the substrings ($1 and $2) captured by the parentheses () in the original pattern.

    E.g. when matching against the string "hello, world":

    1. /([a-z]*)/ will match the first 5 characters of the string, putting 'hello' into $1;
    2. /.*([a-z]*)/ will match the whole string, putting '' (the empty string) into $1;
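    3. /.*?([a-z]*)/ will match the first 5 characters of the string, putting 'hello' into $1 (the non-greedy .*? is happy to match nothing, so [a-z]* starts right at the beginning);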
    4. /([a-z]*)$/ will match the last 5 characters of the string, putting 'world' into $1.


    All four patterns succeed on every string, since every quantified part is allowed to match nothing -- it's not possible to construct a string for which one of them succeeds but another fails. (The fourth pattern only matches at the end of the string, so against a string like "hello, world4" all it can grab is the empty string at the very end.) The differences lie in which parts of the string they match first, and how quickly that happens. (In many cases, /.*PATTERN/ matches the same thing as /PATTERN$/, but is likely to do it faster -- sometimes much faster.)
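
Running the four patterns through an engine is a quick way to check which part of the string each one matches and what ends up in the first group. The sketch below uses Python's re module, on the assumption that it agrees with Perl for these simple patterns; `results` and `tail` are names made up for the example.

```python
import re

text = "hello, world"
patterns = [r"([a-z]*)", r".*([a-z]*)", r".*?([a-z]*)", r"([a-z]*)$"]

# Map each pattern to (whole match, contents of group 1).
results = {}
for p in patterns:
    m = re.search(p, text)
    results[p] = (m.group(0), m.group(1))
    print(f"{p:14} matched={m.group(0)!r:16} group1={m.group(1)!r}")

# ([a-z]*)     -> matched 'hello',        group1 'hello'
# .*([a-z]*)   -> matched 'hello, world', group1 ''
# .*?([a-z]*)  -> matched 'hello',        group1 'hello'
# ([a-z]*)$    -> matched 'world',        group1 'world'

# With a trailing digit, the anchored pattern still succeeds,
# but only on the empty string at the very end.
tail = re.search(r"([a-z]*)$", "hello, world4")
print(tail.span(), repr(tail.group(1)))  # (13, 13) ''
```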

    This is stuff I picked up from the Camel book, and happens to apply because rename is written in Perl. Other languages and regex engines have slightly different rules, syntaxes and performance profiles, but the general concepts (like greediness) are the same.

  9. #19
    Join Date
    Dec 2007
    Beans
    716

    Re: rename and regex

    I think I understand, but I may need to take a college course on regex if I ever attempt this again!

  10. #20
    Join Date
    Aug 2011
    Location
    47°9′S 126°43W
    Beans
    2,172
    Distro
    Ubuntu 16.04 Xenial Xerus

    Re: rename and regex

    No need for a college course. Everything is there: http://www.amazon.com/Mastering-Regu...dp/0596528124/
