
Thread: merging html files with wget

  1. #1

    merging html files with wget

    Hi all,

    With the command:

    wget -i list.txt --recursive --convert-links --page-requisites --html-extension --restrict-file-names=windows --no-clobber
    I am able to get HTML files for each URL contained in list.txt. But what if I want only one big HTML file for each URL?

    I know that wget can merge files with the -O file option, but how can I do that for each URL separately?

    How can I use the --domains option in this case?

    Thanks in advance.

  2. #2

    Re: merging html files with wget

    Welcome to Ubuntu Forums, nickeforos!

    Looking at the wget man page, there seems to be no built-in option to do what you want, so you will have to loop the wget command over the URLs individually.

    For example, assuming your URL list file contains one URL per line, you can run the following loop:

    count=1
    while read -r URL; do
        wget --recursive --page-requisites --html-extension \
             --restrict-file-names=windows -O "file-$count" "$URL"
        count=$((count + 1))
    done < list.txt

    Note that --convert-links had to be dropped: wget refuses to combine -k with -O when -r or -p is used. --no-clobber is dropped as well, since with -O the output file name is fixed anyway. The above loop merges each URL's downloaded content into "file-<the URL's line no. in the list file>" (e.g. file-1, file-2, file-3, etc.). If you need a more meaningful name for each file, a relevant function to do that can be included in the loop (along with a check to make sure no files are overwritten in case of duplicate names).
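    As a sketch of that idea, the helpers below (hypothetical names, not part of wget) turn a URL into a readable file name and append a numeric suffix if that name is already taken. This assumes a POSIX shell and sed:

    ```shell
    #!/bin/sh
    # safe_name: strip the URL scheme and replace characters that are
    # awkward in file names with underscores.
    safe_name() {
        printf '%s' "$1" | sed -e 's|^[a-zA-Z]*://||' -e 's|[/?&=:]|_|g'
    }

    # unique_name: add .html, and append -2, -3, ... if a file with
    # that name already exists in the current directory.
    unique_name() {
        base=$(safe_name "$1")
        name=$base.html
        n=1
        while [ -e "$name" ]; do
            n=$((n + 1))
            name=$base-$n.html
        done
        printf '%s\n' "$name"
    }
    ```

    The -O file-$count part of the loop could then become -O "$(unique_name "$URL")" to get names like example.com_page.html instead of file-1.
    
    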
