Thread: Need a REGEX to increment the file number of a pdf file

  1. #1
    Join Date
    Mar 2011
    Beans
    30

    Need a REGEX to increment the file number of a pdf file

    Hello,

    I have a few thousand .pdf files in various folders, each with a naming scheme like this:

    006_-_Titled_Document_#34_-_September-25-2011-side-1.pdf

    In each folder, the number system starts at 001 (as you see on the far left of the file name), and then ends at 999 (maximum .pdf files).

    Somewhere in the collection, say at .pdf #286, I have the same number twice (a duplicate), which screws up my numbering system.

    I need a REGEX that I can enter into the shell from the .pdf directory that starts at the second duplicate #286 and increments it and all the numbers after it by 1, so that they are all renamed appropriately.

    This way I don't have to go in there and rename each .pdf file manually one by one.

    Unfortunately, I am not very good with REGEX and haven't had much need for it in the past until now.

    Would anyone know the best way to automate this .pdf renumbering task? I would appreciate any constructive thoughts on how to accomplish this.

    Thank you.

  2. #2

    Re: Need a REGEX to increment the file number of a pdf file

    You'll need more than a regex to solve this problem... here, have some Perl.

    I'd do something like this:

    Code:
    $ cat >fixnames.pl <<"EOF"
    #!/usr/bin/perl -n
    
    ($old, $num) = /(?:.*\/)((\d{3})[^\/]*\.pdf)$/;
    next unless $old;
    $new = $old;
    $new =~ s/^$num/sprintf "%03d", $num+1/e;   # anchor to the leading prefix
    if ($num > 286) {                           # compare the number, not the file name
        print "rename '$old', '$new'\n";
    }
    EOF
    $ chmod +x fixnames.pl
    $ find . -name '*.pdf' | ./fixnames.pl
    Change the 'find' command if necessary to get all the files you want without any false positives. Run it as-is first, and when you're sure it does what you want, edit fixnames.pl and change the "print" line to just
    Code:
        rename $old, $new;
    It won't rename the extra #286, though; you'll have to fix that by hand. It's a feature, really.

  3. #3
    Join Date
    Aug 2011
    Location
    47°9′S 126°43W
    Beans
    2,000
    Distro
    Kubuntu 12.10 Quantal Quetzal

    Re: Need a REGEX to increment the file number of a pdf file

    No need for a regex to do this... The real question is whether you want to renumber everything (filling any gaps), or whether the sequence should be altered only when finding a duplicate. Renumbering everything bluntly is likely going to be a bit easier:
    Code:
    #! /bin/bash
    
    count=0;
    for file in *
    do 
        count=$(( $count + 1 ))
        count0="000$count"
        renumbered="${count0: -3}_${file#???_}"
        [[ "$file" != "$renumbered" ]] && echo mv -v "$file" "$renumbered"
    done
    This code won't behave correctly if some non-numbered files start with three characters and an underscore. Remove the "echo" when you are confident it fits your needs.

  4. #4
    Join Date
    Feb 2013
    Beans
    Hidden!

    Re: Need a REGEX to increment the file number of a pdf file

    Well, I guess something like what trent.josephsen suggested can also be done with the rename command alone, as it is actually a Perl script:
    Code:
    $ ls
    001_Document0.pdf  004_Document3.pdf  006_Document6.pdf  009_Document9.pdf
    002_Document1.pdf  004_Document4.pdf  007_Document7.pdf  010_Document10.pdf
    003_Document2.pdf  005_Document5.pdf  008_Document8.pdf
    $ rename -vn 's/^\d{3}/sprintf("%03d",$&+1)/e if substr($_,0,3)>4' *
    005_Document5.pdf renamed as 006_Document5.pdf
    006_Document6.pdf renamed as 007_Document6.pdf
    007_Document7.pdf renamed as 008_Document7.pdf
    008_Document8.pdf renamed as 009_Document8.pdf
    009_Document9.pdf renamed as 010_Document9.pdf
    010_Document10.pdf renamed as 011_Document10.pdf
    Last edited by schragge; March 28th, 2013 at 11:45 PM.
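
    For readers who do not have the Perl rename command, the same shift can be sketched in plain bash. This is only a sketch under stated assumptions: the function name shift_from and its two arguments (directory, threshold) are hypothetical and not from the thread, every target file is assumed to start with a three-digit prefix, and the function only prints the mv commands. It walks the names from the highest prefix down, so a renamed file should never land on a name that is still waiting to be moved.

```shell
#!/bin/bash
# Sketch: print the mv commands that add 1 to every three-digit prefix
# greater than a threshold. shift_from is a hypothetical helper name.
shift_from() {
    local dir=$1 threshold=$2 f base num new i
    local files=()
    for f in "$dir"/[0-9][0-9][0-9]*; do
        [[ -f $f ]] && files+=("$f")
    done
    # Glob results come back sorted ascending; iterate backwards (highest first).
    for (( i = ${#files[@]} - 1; i >= 0; i-- )); do
        base=${files[i]##*/}
        num=$(( 10#${base:0:3} ))   # 10# forces decimal, so 019 is not read as octal
        (( num > threshold )) || continue
        printf -v new '%03d%s' $(( num + 1 )) "${base:3}"
        echo mv -- "$dir/$base" "$dir/$new"   # dry run: remove "echo" to rename
    done
}
```

    Called as "shift_from . 4" in the example directory above, it prints one mv line for each of the six files from 010 down to 005.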

  5. #5

    Re: Need a REGEX to increment the file number of a pdf file

    Ah, I realize now that I misunderstood the problem... I was thinking you had many files with sequential names spread throughout a directory hierarchy, instead of a set of sequential names all in one folder. So I wrote a script that would handle whole paths when all that was really necessary was something much simpler, like what schragge did.

  6. #6
    Join Date
    Feb 2009
    Location
    Dallas, TX
    Beans
    6,894
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: Need a REGEX to increment the file number of a pdf file

    Hi Marcus Aurelius.

    I had a similar challenge some time ago. What I did was create a script that reports the integrity of the directories, so that I could design a better solution.

    This would be that reporting script adapted to your case:
    Code:
    #!/bin/bash
    
    # Base directory containing other directories with pdfs.
    PROJECT_DIR="/path/to/project/"
    
    # Cycle through all subdirectories
    while IFS= read -d '' dir; do
        echo "$dir"
    
        # Assume the directory is fine until a problem is found.
        all_ok=true
    
        # cycle through all prefixes: 001...999
        for index in $(seq -f "%03.0f" 1 999); do
    
            # Get all files that start with prefix "$index".
            unset list i
            while IFS= read -d '' file; do
                list[i++]="$file"
            done< <(find "$dir" -maxdepth 1 -type f -name "${index}*" -print0)
    
            count=${#list[@]}
    
            # Missing file.
            if [ $count -eq 0 ]; then
                echo "    missing file: $index"
                all_ok=false
    
            # Perfect case: 1 file per index, nothing to report.
            elif [ $count -eq 1 ]; then
                : #echo "    index $index OK: ${list[0]}"
    
            # Duplicate case: more than one file shares this prefix.
            else
                echo "    $index prefix: $count duplicate files:"
                for f in "${list[@]}"; do
                    echo "      $f"
                done
                all_ok=false
            fi
        done
    
        # Summary message in case no errors.
        if $all_ok; then
            echo "    All OK."
        fi
    
    done< <(find "$PROJECT_DIR" -mindepth 1 -type d -print0)
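
    If only the duplicates matter, a much shorter check is possible. The following is a minimal sketch, not part of the script above: find_duplicate_prefixes is a hypothetical name, and the function assumes that the shell glob returns names in sorted order (bash does this), so files sharing a three-digit prefix sit next to each other.

```shell
#!/bin/bash
# Sketch: print each three-digit prefix that is used by more than one file.
# find_duplicate_prefixes is a hypothetical helper name.
find_duplicate_prefixes() {
    local dir=$1 f base prefix prev="" reported=""
    for f in "$dir"/[0-9][0-9][0-9]*; do
        [[ -f $f ]] || continue
        base=${f##*/}
        prefix=${base:0:3}
        # Same prefix as the previous file means a duplicate; report it once.
        if [[ $prefix == "$prev" && $prefix != "$reported" ]]; then
            echo "$prefix"
            reported=$prefix
        fi
        prev=$prefix
    done
}
```

    Run against the test directory described later in the thread (two files starting with 017), it would print 017.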

  7. #7
    Join Date
    Mar 2011
    Beans
    30

    Re: Need a REGEX to increment the file number of a pdf file

    Quote Originally Posted by schragge View Post
    Well, I guess something like what trent.josephsen suggested can also be done with the rename command alone, as it is actually a Perl script:
    Code:
    $ ls
    001_Document0.pdf  004_Document3.pdf  006_Document6.pdf  009_Document9.pdf
    002_Document1.pdf  004_Document4.pdf  007_Document7.pdf  010_Document10.pdf
    003_Document2.pdf  005_Document5.pdf  008_Document8.pdf
    $ rename -vn 's/^\d{3}/sprintf("%03d",$&+1)/e if substr($_,0,3)>4' *
    005_Document5.pdf renamed as 006_Document5.pdf
    006_Document6.pdf renamed as 007_Document6.pdf
    007_Document7.pdf renamed as 008_Document7.pdf
    008_Document8.pdf renamed as 009_Document8.pdf
    009_Document9.pdf renamed as 010_Document9.pdf
    010_Document10.pdf renamed as 011_Document10.pdf


    OK, thank you all for the input.
    Here is what I have tried so far:
    #1: I set up some test files in a new directory. Here's the original ls: the files run from #015 to #023, and #017 is duplicated (note there is no #018).

    $ ls
    015_-_Test_File_#34_-_September-28-2011-side-1.pdf 020_-_Test_File_#34_-_September-30-2011.pdf
    016_-_Test_File_#34_-_September-28-2011-side-2.pdf 021_-_Test_File_#34_-_October-1-2011.pdf
    017_-_Test_File_#34_-_September-28-2011-side-3.pdf 022_-_Test_File_#34_-_October-1-2011-side-2.pdf
    017_-_Test_File_#34_-_September-29-2011.pdf 023_-_Test_File_#34_-_October-1-2011-side-3.pdf
    019_-_Test_File_#34_-_September-29-2011-side-1.pdf


    If I run the following using # 019, I get an illegal octal digit error:

    $ rename -vn 's/^\d{3}/sprintf("%03d",$&+1)/e if substr($_,0,3)>019' *
    Illegal octal digit '9' at (eval 1) line 1, at end of line


    AGAIN:
    If I choose #017 instead of #019, this is the output. However, the 016 file is renamed to 017 as well. Shouldn't only the numbers higher than 017 change?

    $ rename -vn 's/^\d{3}/sprintf("%03d",$&+1)/e if substr($_,0,3)>017' *
    016_-_Test_File_#34_-_September-28-2011-side-2.pdf renamed as 017_-_Test_File_#34_-_September-28-2011-side-2.pdf
    017_-_Test_File_#34_-_September-28-2011-side-3.pdf renamed as 018_-_Test_File_#34_-_September-28-2011-side-3.pdf
    017_-_Test_File_#34_-_September-29-2011.pdf renamed as 018_-_Test_File_#34_-_September-29-2011.pdf
    019_-_Test_File_#34_-_September-29-2011-side-1.pdf renamed as 020_-_Test_File_#34_-_September-29-2011-side-1.pdf
    020_-_Test_File_#34_-_September-30-2011.pdf renamed as 021_-_Test_File_#34_-_September-30-2011.pdf
    021_-_Test_File_#34_-_October-1-2011.pdf renamed as 022_-_Test_File_#34_-_October-1-2011.pdf
    022_-_Test_File_#34_-_October-1-2011-side-2.pdf renamed as 023_-_Test_File_#34_-_October-1-2011-side-2.pdf
    023_-_Test_File_#34_-_October-1-2011-side-3.pdf renamed as 024_-_Test_File_#34_-_October-1-2011-side-3.pdf


    I appreciate all the info so far.

  8. #8
    Join Date
    Aug 2011
    Location
    47°9′S 126°43W
    Beans
    2,000
    Distro
    Kubuntu 12.10 Quantal Quetzal

    Re: Need a REGEX to increment the file number of a pdf file

    Don't use '019' but '19'. When you write '019' it is assumed to be a number in octal notation (in which '9' isn't a valid digit)...

    May I point out that the code by my esteemed co-posters requires you to first spot any duplicates, while my suggestion can be applied blindly?
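
    The octal rule also explains the second surprise: in the Perl expression above the literal 017 is octal for decimal 15, so the 016 file counted as "greater than 017" too. Bash arithmetic treats a leading zero the same way, which matters when a zero-padded prefix is pulled out of a file name. A small sketch, where to_decimal is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: convert a zero-padded prefix to a decimal number.
# to_decimal is a hypothetical helper name.
to_decimal() {
    # The 10# prefix tells bash arithmetic to read the digits in base 10.
    echo $(( 10#$1 ))
}

echo $(( 017 ))   # a leading zero means octal: prints 15
to_decimal 017    # prints 17
to_decimal 019    # prints 19; a bare $(( 019 )) is an error instead
```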

  9. #9
    Join Date
    Feb 2013
    Beans
    Hidden!

    Re: Need a REGEX to increment the file number of a pdf file

    +1 to ofnuts on both counts. Applying his solution to the test case from my previous post:
    Code:
    $ c=0;for f in *;{ printf -vn "%03d${f#???}" $((++c));[[ $f != $n ]]&&echo "$f -> $n";}
    004_Document4.pdf -> 005_Document4.pdf
    005_Document5.pdf -> 006_Document5.pdf
    006_Document6.pdf -> 007_Document6.pdf
    007_Document7.pdf -> 008_Document7.pdf
    008_Document8.pdf -> 009_Document8.pdf
    009_Document9.pdf -> 010_Document9.pdf
    010_Document10.pdf -> 011_Document10.pdf

  10. #10
    Join Date
    Apr 2012
    Beans
    5,574

    Re: Need a REGEX to increment the file number of a pdf file

    ^^^ that's almost exactly what I came up with as well, but since the OP mentioned regexes I wondered if it would be worth adding a regex check on the 3-digit prefix, something like

    Code:
    i=0; for file in *.pdf; do if [[ "$file" =~ ^[0-9]{3} ]]; then printf -v newfile "%03d%s" $((++i)) "${file:3}"; echo mv "$file" "$newfile"; fi; done
