Results 1 to 10 of 14

Thread: command line file editing

  1. #1
    Join Date
    Dec 2005
    Beans
    49

    command line file editing

    I have a list of 1,600 URLs like this:
    http://www.website.com/2014/04/03/some-page-name.aspx

    I want to make a list of 301 redirects with a structure that doesn't have dates or file extensions

    I need the file to be like this
    Redirect 301 http://www.website.com/2014/04/03/some-page-name.aspx http://www.website.com/some-page-name

    I can make a script to remove any bit of that, and even add the Redirect 301, but I can't seem to make the script take a line like this:
    http://www.website.com/2014/04/03/some-page-name.aspx

    and make it this:
    http://www.website.com/2014/04/03/some-page-name.aspx http://www.website.com/2014/04/03/some-page-name.aspx

    any sed, awk, or other scriptaculous things I can do?

  2. #2
    Join Date
    Dec 2009
    Beans
    166

    Re: command line file editing help

    Quote Originally Posted by jtpratt View Post
    I have a list of 1,600 URLs like this:
    http://www.website.com/2014/04/03/some-page-name.aspx

    I want to make a list of 301 redirects with a structure that doesn't have dates or file extensions

    I need the file to be like this
    Redirect 301 http://www.website.com/2014/04/03/some-page-name.aspx http://www.website.com/some-page-name

    [...]
    One way with awk:

    Code:
    awk '{ rec=$0; gsub(/[0-9]+\/[0-9]+\/[0-9]+\/|\.[^.]+$/,"") ; print "Redirect 301 " rec " " $0 }'  filename
    The code is based on your posted input, however the regex part inside gsub can be made stricter if needed.
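
    As a rough sketch of what "stricter" could look like: with the pattern above, a URL that has no file extension (for example www.website.com/2014/4/30/this-is-my-page) can lose everything after the last dot in the host name, because \.[^.]+$ matches from ".com" to the end of the line. The version below assumes the date is always a 4-digit year followed by a 1-2 digit month and day, and that an extension contains only letters and digits. The function name make_redirects is made up for illustration.

```shell
# make_redirects: hypothetical helper. Reads one URL per line on stdin and
# prints "Redirect 301 OLD NEW". The patterns are assumptions about the data.
make_redirects() {
    awk '{
        old = $0
        # Replace the first /yyyy/m/d/ date path with a single slash.
        sub(/\/[0-9][0-9][0-9][0-9]\/[0-9][0-9]?\/[0-9][0-9]?\//, "/")
        # Drop a trailing extension: a dot followed only by letters or
        # digits up to the end of the line.
        sub(/\.[A-Za-z0-9]+$/, "")
        print "Redirect 301 " old " " $0
    }'
}
```

    It could be run as make_redirects < urls2.txt > urlsfinal.txt. Repeated character classes are used instead of {4} so that awk variants without interval expressions should also accept it.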

  3. #3
    Join Date
    Dec 2005
    Beans
    49

    Re: command line file editing help

    Quote Originally Posted by erind View Post
    One way with awk:

    Code:
    awk '{ rec=$0; gsub(/[0-9]+\/[0-9]+\/[0-9]+\/|\.[^.]+$/,"") ; print "Redirect 301 " rec " " $0 }'  filename
    The code is based on your posted input, however the regex part inside gsub can be made stricter if needed.
    that was great, thanks! It only printed to the screen, so I updated like this to print to a new file:

    awk '{ rec=$0; gsub(/[0-9]+\/[0-9]+\/[0-9]+\/|\.[^.]+$/,"") ; print "Redirect 301 " rec " " $0 }' urls2.txt > urls3.txt && mv urls3.txt urlsfinal.txt

  4. #4
    Join Date
    Dec 2005
    Beans
    49

    Re: command line file editing help

    I guess what I thought would work at first needs some modification. For instance, let's say I have a line like this:
    www.website.com/2014/4/30/this-is-my-page

    I'm building a file of 301 redirects (1,600 of them).
    so if I use that code I'll get
    Redirect www.website.com/2014/4/30/this-is-my-page www.website.com/2014/4/30/this-is-my-page

    so let's say I find and replace the dates out - I'll remove them from both entries.

    What I need in the end is for each line to have dates in the first entry, and no dates in the second like this:
    Redirect www.website.com/2014/4/30/this-is-my-page www.website.com/this-is-my-page

    The URLs all vary in length.

    1. is there a way I can remove the date sub-dirs from only the second URL?

    2. if not I can build 2 files, one with dates and one not. Is there a way to merge the 2 files together where the right lines line up?

  5. #5
    Join Date
    Apr 2012
    Beans
    5,136

    Re: command line file editing help

    You could use awk to split the URL into /-delimited fields, then print out the ones you want to keep e.g. if

    Code:
    url='www.website.com/2014/4/30/this-is-my-page'
    then
    Code:
    $ awk -F/ '{print "Redirect",$0,$1"/"$NF}' <<< "$url"
    Redirect www.website.com/2014/4/30/this-is-my-page www.website.com/this-is-my-page
    or something like this with sed, which just deletes anything matching (more-or-less) /yyyy/mm/dd?

    Code:
    $ echo "Redirect $url" $(sed -r 's|/[0-9]+/[0-9]+/[0-9]+||' <<< "$url")
    Redirect www.website.com/2014/4/30/this-is-my-page www.website.com/this-is-my-page
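
    The same sed idea can probably be pointed at the whole file rather than a single $url. This is only a sketch: it assumes one URL per line, a date of the form /yyyy/m/d, and the helper name redirects_from_file is invented here. In the replacement, & stands for the whole matched line (the original URL), while \1 and \2 are the parts before and after the date.

```shell
# redirects_from_file: hypothetical helper; $1 is a file with one URL per line.
redirects_from_file() {
    # \1 = everything before the date, \2 = the slash and everything after it,
    # &  = the whole matched line, i.e. the untouched original URL.
    # Lines without a date path do not match and are printed unchanged.
    sed -E 's|^(.*)/[0-9]{4}/[0-9]{1,2}/[0-9]{1,2}(/.*)$|Redirect 301 & \1\2|' "$1"
}
```

    For example, redirects_from_file urls2.txt > urlsfinal.txt. With GNU sed, -r could be used in place of -E.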

  6. #6
    Join Date
    Dec 2005
    Beans
    49

    Re: command line file editing help

    well, here's the thing. I can remove dates from a line no problem. The rewrite rules are basically redirect this URL to that URL
    My original URL is this:
    www.website.com/2014/4/30/this-is-my-page

    The rewrite rules need to be redirect this URL to the new URL like this:
    Redirect 301 www.website.com/2014/4/30/this-is-my-page www.website.com/this-is-my-page

    Notice the URL appears twice in that line - that's the end product of what I need.

    If you look through the posts here since the beginning I got some code to repeat the same URL on a line.
    so I can turn this:
    www.website.com/2014/4/30/this-is-my-page

    into this:
    www.website.com/2014/4/30/this-is-my-page www.website.com/2014/4/30/this-is-my-page

    and I can prepend all the lines with Redirect 301 like this:
    Redirect 301 www.website.com/2014/4/30/this-is-my-page www.website.com/2014/4/30/this-is-my-page

    What I cannot do is remove the date from only one of the two URLs. The date sub-dirs appear twice (/2014/4/30). I want the date to appear in the first URL (the original location of the page), but be removed in the second URL (the new location of the page).
    so if I have this:
    Redirect 301 www.website.com/2014/4/30/this-is-my-page www.website.com/2014/4/30/this-is-my-page

    how do I remove the dates from only the second URL in 1,600 lines? The URLs vary in length, so it's not like I can count characters. As I mentioned, maybe an option is to make 2 files of URLs, one with dates and one without, and merge them together? I just need a way to merge matching lines, separated by a space.

    Again - thx for everyone's help, my regex and find / replace skills are really lacking (even though I've been on Ubuntu for 8-9 years).
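
    For reference, the two-file merge described above is roughly what paste does. A minimal sketch, assuming both files have the same number of lines in the same order; the helper name merge_columns and the file names in the usage line are made up:

```shell
# merge_columns: hypothetical helper. Joins line N of file $1 with line N of
# file $2, separated by one space, then prefixes each line with "Redirect 301".
merge_columns() {
    paste -d ' ' "$1" "$2" | sed 's/^/Redirect 301 /'
}
```

    Usage would be something like merge_columns with_dates.txt without_dates.txt > urlsfinal.txt.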

  7. #7
    Join Date
    Apr 2012
    Beans
    5,136

    Re: command line file editing help

    isn't that exactly what I posted above?

    or are you asking about how to read the URL lines from the file?

  8. #8
    Join Date
    Dec 2005
    Beans
    49

    Re: command line file editing help

    I'm not sure - sorry I'm a webdev with some hacking skills going back to Perl and some shell, but I don't totally get what you posted. In the first one you assign a URL to something. In the second you prepend the line - I get that, but then the second line has this:
    Redirect www.website.com/2014/4/30/this-is-my-page www.website.com/this-is-my-page

    so I don't get that...I don't get what the $0, $1, and $NF vars come from.

    the last one:
    $ echo "Redirect $url" $(sed -r 's|/[0-9]+/[0-9]+/[0-9]+||' <<< "$url")

    you said it removes /yyyy/mm/dd, but wouldn't that remove it from both URLs? How do I limit it to just the last one?

    I guess the main thing is - I didn't understand if your approach was to remove the dates from a file with 2 URL's per line, or 2 files with single rows of URL's that get merged together.

    Sorry to be such a PITA rear here, my knowledge in this area stopped at command line find and grep operations a decade back.

  9. #9
    Join Date
    Apr 2012
    Beans
    5,136

    Re: command line file editing help

    in

    Code:
    echo "Redirect $url" $(sed -r 's|/[0-9]+/[0-9]+/[0-9]+||' <<< "$url")
    the $url part is the original, unmodified URL string, and the $(sed -r 's|/[0-9]+/[0-9]+/[0-9]+||' <<< "$url") part is the same string after stripping the date. sed only ever sees the second copy, so the first one keeps its date.
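
    If the file already holds doubled lines (Redirect 301 OLD OLD), one possible way to touch only the second URL is to anchor the match to the end of the line. This is a sketch under the assumption that the URLs contain no spaces; strip_second_date is a made-up name.

```shell
# strip_second_date: hypothetical helper; reads "Redirect 301 OLD OLD" lines
# on stdin and removes the date path from the last URL only.
strip_second_date() {
    # [^ ]* cannot cross a space and $ pins the match to the end of the line,
    # so only the last space-separated word (the second URL) can match.
    sed -E 's|/[0-9]+/[0-9]+/[0-9]+(/[^ ]*)$|\1|'
}
```

    For example, strip_second_date < doubled.txt > urlsfinal.txt, where doubled.txt is a placeholder file name.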

  10. #10
    Join Date
    Sep 2006
    Beans
    7,227
    Distro
    Lubuntu Development Release

    Re: command line file editing help

    Quote Originally Posted by jtpratt View Post
    so I don't get that...I don't get what the $0, $1, and $NF vars come from.
    They are built into awk. $0 stands for the whole input record, which is usually a single line. $1 stands for the first field, as split by the field separator FS (set to / by -F/ in the command above). $2 would be the second field, $3 the third field, and so on. NF is one of several built-in variables; it contains the number of fields found in the current record (line), so $NF is the last field.
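
    A quick way to see this is to have awk print those values for one of the URLs from this thread. The helper name show_fields is only for illustration:

```shell
# show_fields: hypothetical helper; prints how awk splits one URL on "/".
show_fields() {
    printf '%s\n' "$1" | awk -F/ '{
        print "$0  = " $0      # the whole line
        print "NF  = " NF      # number of /-separated fields
        print "$1  = " $1      # first field (the host, when there is no http://)
        print "$NF = " $NF     # last field (the page name)
    }'
}
show_fields 'www.website.com/2014/4/30/this-is-my-page'
```

    This should print:
    $0  = www.website.com/2014/4/30/this-is-my-page
    NF  = 5
    $1  = www.website.com
    $NF = this-is-my-page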

    You mentioned Perl. If that is easier for you to work with, something like this might work:

    Code:
    perl -w -p -e 's|^((?:.*//)?[^/]+/)(\d{4}/\d{1,2}/\d{1,2}/)(.*)$|Redirect 301 $1$2$3 $1$3|' < yourfile
    I'm not sure that's clearer to read though.
