Thread: Convert from html to text

  1. #1
    Join Date
    Aug 2008
    Location
    India
    Beans
    219
    Distro
    Ubuntu 9.04 Jaunty Jackalope

    Convert from html to text

    I have 2 GB of HTML files and I need to convert all of them to text files at once. Is there any way to do it? Converting each one manually is a real pain, so I would like to convert them all with a single command.

    Can anyone help?

  2. #2
    Join Date
    Mar 2010
    Location
    Minnesota
    Beans
    857
    Distro
    Ubuntu 11.04 Natty Narwhal

    Re: Convert from html to text

    Um, you could open them in, say, Firefox and copy the text? I don't know if I am understanding you.

  3. #3
    Join Date
    Apr 2009
    Beans
    264
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Convert from html to text

    You could find or write a script, perhaps. Do you just need plain text?

  4. #4
    Join Date
    Apr 2006
    Location
    Fresno CA
    Beans
    2,790
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Convert from html to text

    You can open any HTML file with OpenOffice and then save it as Text, which strips all the markup and images. Some editing may be needed afterwards, since it saves the link text but not the URL of any navigation link on the HTML page.

  5. #5
    Join Date
    Nov 2006
    Location
    Craggy Island.
    Beans
    Hidden!
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Convert from html to text

    I used to have to convert a lot of raw XML billing data to CSV format.
    I used an excellent program called xml2csv. There was a Linux and a Windows version of that software.
    Because the XML tags were somewhat unique, I could filter what I wanted to see and how I wanted to see it.

    I believe that program also contained an html2csv
    (I'm not sure about text; however, a simple vi would remove any commas).
    Sorting was hard, as the HTML had repetitive tags.

    Anyway, I'm babbling. If you look around for html2csv, and maybe even html to txt, you may find something.
    Here is a Python script I found from a quick Google search;
    note it's from 2002, and not the one I think I used.

    Failing all that, and if you're feeling lazy,
    you could put your data up on a web server and use something like w3m to see only the data without the formatting?
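
    For what it's worth, w3m can also render a local file straight to plain text, so a web server may not be needed at all. A minimal sketch, assuming w3m is installed; the helper name html_to_text and the W3M override variable are made up for illustration and are not part of w3m:

```shell
#!/bin/sh
# Render one local HTML file as plain text with w3m; no web server needed.
# html_to_text is a hypothetical helper name, not something w3m provides.
# W3M may be set to another command (handy for testing); default is w3m.
html_to_text() {
  # -dump prints the rendered page to stdout; -T forces the input type.
  "${W3M:-w3m}" -dump -T text/html "$1"
}

# Usage: html_to_text page.html > page.txt
```

    This only handles one file per call, so it would still need to be wrapped in a loop to cover a whole directory.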

  6. #6
    Join Date
    May 2008
    Beans
    142

    Re: Convert from html to text

    Quote, originally posted by roshanjose:
    I have 2 GB of HTML files and I need to convert all of them to text files at once. Is there any way to do it? Converting each one manually is a real pain, so I would like to convert them all with a single command.

    Can anyone help?

    You might try running lynx inside a shell script, maybe something like:

    Code:
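    #!/bin/sh
    
    # Create the output directory if it does not already exist.
    if [ ! -d text ]; then
      echo "Creating text subdir."
      mkdir text
    fi
    
    for file in *.html
    do
      [ -e "$file" ] || continue   # skip if no .html files matched the glob
      # Strip only the trailing .html, so names with extra dots stay intact.
      FNAME=${file%.html}
      lynx -dump "$file" >"text/$FNAME.txt"
    done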
    (This would dump the converted files into a "text" subdirectory.)
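
    With 2 GB of files, the pages may well be spread over subdirectories or have spaces in their names. A rough variation on the same idea that walks a whole tree, assuming lynx is installed; convert_tree and the HTML2TEXT override variable are hypothetical names picked for this sketch:

```shell
#!/bin/sh
# convert_tree SRC DEST: write a .txt copy of every .html file under SRC
# into DEST, keeping the directory layout. (convert_tree is a made-up name.)
# HTML2TEXT names the converter command and defaults to lynx. Assumption:
# the converter accepts "-dump FILE" and prints plain text on stdout.
convert_tree() {
  src=$1
  dest=$2
  find "$src" -type f -name '*.html' | while IFS= read -r file; do
    rel=${file#"$src"/}            # path relative to SRC
    out=$dest/${rel%.html}.txt     # same relative path, .txt extension
    mkdir -p "$(dirname "$out")"
    # stdin is redirected so the converter cannot swallow the file list
    "${HTML2TEXT:-lynx}" -dump "$file" > "$out" </dev/null
  done
}

# Usage: convert_tree . text
```

    File names containing newlines would still trip this up, but spaces and extra dots in names should be fine.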
