Thread: Convert doc to xml

  1. #1
    Join Date
    Jun 2011
    Beans
    62

    Convert doc to xml

    I have a bunch of .doc documents (the old binary Microsoft Word format, not .docx, so they can't be opened in an archive manager). I want to convert them to XML in a batch.

    It is important that the XML document contains information about:

    • all the text
    • new lines
    • tabs
    • styles like
      • small capitals
      • italics
      • bold


    Other than that, I would like an XML format that is not too complicated.

    Any suggestions?
    Last edited by anemone42; June 5th, 2012 at 06:14 PM. Reason: Clarification

  2. #2
    Join Date
    Oct 2005
    Location
    Al Ain
    Beans
    8,574

    Re: Convert doc to xml

    You can run LibreOffice Writer from the command line to do that. Go to their website and have a look around.

  3. #3
    Join Date
    Jun 2011
    Beans
    62

    Re: Convert doc to xml

    Okay, it seems this is on the right track.

    Code:
    $ libreoffice --headless --invisible --convert-to xml test.doc
    My guess is that I will have to tweak either the infilter or the outfilter to keep all the formatting.

    New lines are preserved, and so is all the text. Each tab is converted into a couple of spaces. Small caps, bold and italics are simply lost.

    So, any suggestions for what I should use as filters (in and out) are welcome.
    Last edited by anemone42; June 6th, 2012 at 07:15 AM. Reason: Clarification.
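
    One direction that may be worth trying for the batch case: instead of the plain xml target, ask for flat OpenDocument XML (fodt). In ODF, tabs are stored as <text:tab/> elements, and bold, italics and small caps as text properties, so that format is able to carry the formatting listed in post #1. The sketch below is untested against the documents in this thread. It assumes the installed LibreOffice accepts fodt as a --convert-to target and supports --outdir. The SOFFICE variable, the convert_docs function and the directory names are made up for illustration.

```shell
#!/bin/sh
# Batch-convert every .doc file in a directory to flat ODF XML (.fodt).
# Assumption: the binary is called "libreoffice"; override SOFFICE if yours
# is named differently (for example "soffice").
SOFFICE="${SOFFICE:-libreoffice}"

# convert_docs IN_DIR OUT_DIR  (hypothetical helper, not part of LibreOffice)
convert_docs() {
    in_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    for f in "$in_dir"/*.doc; do
        # Skip the literal pattern when the directory has no .doc files
        [ -e "$f" ] || continue
        "$SOFFICE" --headless --convert-to fodt --outdir "$out_dir" "$f"
    done
}
```

    Usage would be something like convert_docs ./docs ./xml. If the result is still missing styles, the next thing to check is probably the input filter, as guessed in post #3.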