Code:
$ rename -nv 's/.*([a-zA-Z][0-9]+[a-zA-Z][0-9]+).*[.]([^.]+)$/\L$1.$2/' really*
really.long.filename.with.extra.stuff.A12B33.and.more.stuff.extension renamed as a12b33.extension
if your question is answered, mark the thread as [SOLVED]. Thx.
To post code or command output, use [code] tags.
Check your bash script here // BashFAQ // BashPitfalls
Wow... regex may be REGULAR, but it makes my head hurt! OK, so...
I understand the .*([a-zA-Z][0-9]+[a-zA-Z][0-9]+).* but why the next [.] after that? How does [^.]+ match the extension? If I understand regex correctly (and I do not!) [^.] matches any character that is not a . (which is confusing because ^ without the brackets indicates the start of the line! ). Would that not be every alphanumeric character including the extension?
I promise I am not looking JUST to have this written for me. I really do want to understand it so I can do this on my own next time. Yeah, I know... rather presumptuous of me
But I do appreciate all of the help!
[^.]+ = not-dot 1-or-more times
general idea:
anything(char-digits-char-digits)anything[dot](not-dots)end-of-line => $1.$2
obviously the 2nd parenthesis can only capture the extension, because the pattern forces it to cover everything between the last dot and end-of-line. In the replacement you invoke the content of that parenthesis with $2
your current code simply finds the episode number and takes everything after it verbatim, that's why any garbage that happens to be there after the number gets to the final name (first part up to the number gets transformed but anything after gets through).
(something.s01e01 => s01e01).garbage.ext (the parenthesized part shows the scope of your regex substitution)
In my regex I match the whole name from start to end, with .*[.] consuming any garbage up to the last dot, leaving only the extension to be captured and used to construct the final name.
something.s01e01.garbage.ext => s01e01.ext
Last edited by Vaphell; June 3rd, 2013 at 07:50 PM.
I think what I don't understand is why .*[.] stops at the LAST [.] instead of the first [.] it comes to...
Because that's the normal "greedy" behavior in regular expressions. In "aaabbbcccaaabbbccc" you have three possible matches for "aaa.*ccc": the first "aaabbbccc", the second one, or the whole string. I know, you are going to ask, "But why on earth is the default behavior to match the whole string?" And the answer is: because with a greedy default it is a lot easier to prevent that behavior[*] and write an expression that matches only the first or last small string than it would be, if the "frugal" behavior were the default, to construct a regexp that matches the whole string.
Some regexp syntaxes have a modifier (*?, +?) that lets you specify the shortest match. But Real Men don't use it
[*] "aaa[^a]*ccc"
just like ofnuts said, by default regexes try to consume as much as possible. Besides, when you write the regex .*[.][^.]+$ there is no other option; the $ makes it explicit: the line has to end with [.][^.]+. If only non-dots are allowed after that dot, then it has to be the last one.
[.] lined up with the first dot: no way it will ever match, the second dot would have to be consumed by non-dot+, which is impossible
Code:
abc.def.ghi / .*[.][^.]+$
[.] lined up with the last dot: everything is fine
Code:
abc.def.ghi / .*[.][^.]+$
Last edited by Vaphell; June 3rd, 2013 at 10:31 PM.
Oh! I didn't see the dollar sign at the end of that equation. Now I get it.
Aren't the two examples above (abc.def.ghi / .*[.][^.]+$) the same?
Thank you all. You have been a great help!
That's good, but it's important to note that the pattern will match (for the example strings) in exactly the same way without the dollar sign, because of the greediness of the * and + quantifiers and because [^.]+ matches only non-dots.
"Aren't the two examples above (abc.def.ghi / .*[.][^.]+$) the same?"
What Vaphell was pointing out was that some parts of the pattern match different parts of the string, which affects the substrings ($1 and $2) captured by the parentheses () in the original pattern.
E.g. when matching against the string "hello, world":
- /([a-z]*)/ will match the first 5 characters of the string, putting 'hello' into $1;
- /.*([a-z]*)/ will match the whole string, putting '' (the empty string) into $1;
- /.*?([a-z]*)/ will match the first 5 characters of the string, putting 'hello' into $1, because the non-greedy .*? is happy to match nothing at all;
- /([a-z]*)$/ will match the last 5 characters of the string, putting 'world' into $1.
The first three patterns match all the same strings -- it's not possible to construct a string for which one of them succeeds but another fails. (The fourth pattern is anchored to the end of the string, so against a string like "hello, world4" the only thing it can match is the empty string at the very end, leaving $1 empty.) The differences lie in which parts of the string they match first, and how quickly that happens. (In many cases, /.*PATTERN/ matches the same thing as /PATTERN$/, but is likely to do it faster -- sometimes much faster.)
This is stuff I picked up from the Camel book, and happens to apply because rename is written in Perl. Other languages and regex engines have slightly different rules, syntaxes and performance profiles, but the general concepts (like greediness) are the same.
I think I understand, but I may need to take a college course on regex if I ever attempt this again!
No need for a college course. Everything is there: http://www.amazon.com/Mastering-Regu...dp/0596528124/