[SOLVED] Tracker search extract filter service for wordperfect


lizardsigh
September 4th, 2009, 02:57 PM
Here are the steps I took to get the default install of Jaunty's Tracker index/search tool and daemon to extract and index the contents of various versions of old WordPerfect files. WordPerfect files usually have the *.wpd extension. I have hundreds of WordPerfect files where the secretary used her own file-naming convention: she changed the DOS filename extension from the default wp* to whatever type of document she was typing, so my collection has *.ltr files, etc.

I could not easily discover from the documentation how new filters are added, but I eventually figured it out. Here's what you do:



Install libwpd-tools (it provides the wpd2text converter used by the filter below).
Edit /usr/share/tracker/services/default.service (as root, e.g. with sudo).
This is the file that needs to know the MIME type of WordPerfect files.
Go down to the [Documents] section and add "application/vnd.wordperfect;" to the end of the Mimes= line, as shown below:

[Documents]
DisplayName=Documents
Description=Office and PDF based files
PropertyPrefix=Doc
Parent=Files
UIVisible=true
Icon=x-office-document
KeyMetadata1=Doc:Title
KeyMetadata2=Doc:Author
KeyMetadata3=Doc:Created
TabularMetadata=File:Name;File:Mime;Doc:Title;Doc:Author;File:Size;File:Modified;Doc:Created;
TileMetadata=Doc:Title;Doc:Subject;Doc:Author;Doc:Created;Doc:PageCount;File:Size;
ContentMetadata=File:Contents
Mimes=application/rtf;text/richtext;application/msword;application/pdf;application/postscript;application/x-dvi;application/vnd.ms-excel;vnd.ms-powerpoint;application/x-abiword;text/html;text/sgml;text/x-tex;application/x-mswrite;application/x-applix-word;application/docbook+xml;application/x-kword;application/x-kword-crypt;application/x-lyx;application/vnd.lotus-1-2-3;application/x-applix-spreadsheet;application/x-gnumeric;application/x-kspread;application/x-kspread-crypt;application/x-quattropro;application/x-sc;application/x-siag;application/x-magicpoint;application/x-kpresenter;application/illustrator;application/vnd.corel-draw;application/vnd.stardivision.draw;application/vnd.oasis.opendocument.graphics;application/x-dia-diagram;application/x-karbon;application/x-killustrator;application/x-kivio;application/x-kontour;application/x-wpg;application/vnd.wordperfect;
MimePrefixes=application/vnd.oasis.opendocument;application/vnd.sun.xml;application/vnd.stardivision;
ShowServiceFiles=true
ShowServiceDirectories=true
HasMetadata=true
HasFullText=true
HasThumbs=true
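If you would rather not hand-edit that very long Mimes= line, the same change can be made with GNU sed. This is a minimal sketch, assuming the stock layout of default.service shown above (a [Documents] header followed by a single Mimes= line); add_wpd_mime is a hypothetical helper name, and it keeps a .bak copy of the file it edits.

```shell
#!/bin/sh
# Hypothetical helper: append application/vnd.wordperfect; to the Mimes= line
# of the [Documents] section only, and do nothing if it is already listed.
# Assumes GNU sed (the Ubuntu default) for in-place editing with -i.
add_wpd_mime() {
    file="$1"
    # Address range: from the [Documents] header up to the next [Section] header.
    # On the Mimes= line inside it, unless the WordPerfect type is already there,
    # collapse any trailing semicolons into one and append the new type.
    sed -i.bak '/^\[Documents\]/,/^\[/{
        /^Mimes=/{
            /application\/vnd\.wordperfect;/!s/;*$/;application\/vnd.wordperfect;/
        }
    }' "$file"
}
```

Source the function and call it as root on /usr/share/tracker/services/default.service, then compare the result with default.service.bak before going on. Running it twice is harmless, because the line is left alone once the type is present.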
"application/vnd.wordperfect" is the MIME type. Now create a filter so that Tracker can extract the contents and index them. The easiest way is to start with a copy of an existing filter and modify it. cd to the following directory:
/usr/lib/tracker/filters/application/
and copy pdf_filter like so (you will need sudo, since this is a system directory; name the copy after the MIME subtype, the same way pdf_filter handles application/pdf):
"sudo cp pdf_filter vnd.wordperfect_filter"


Now edit vnd.wordperfect_filter: delete the line that came from pdf_filter ("nice -n19 pdftotext -enc UTF-8 -q -nopgbrk "$1" -"), so that the script is simply the following:

#!/bin/sh
nice -n19 wpd2text "$1" -


Now you have to re-index everything from scratch. Simply choosing Re-index from the Tracker applet won't be enough; you must delete the existing index database. From your home directory: "rm .local/share/tracker/data/common.db"
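Rather than deleting common.db outright, you may prefer to move it aside so it can be restored if something goes wrong. A minimal sketch, assuming the Jaunty-era path given above and assuming it is safest to quit the Tracker applet first; reset_tracker_db is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: move Tracker's index database aside instead of deleting it.
# The default path is the one from the post; pass another path to override it.
reset_tracker_db() {
    db="${1:-$HOME/.local/share/tracker/data/common.db}"
    if [ ! -f "$db" ]; then
        echo "no index database at $db" >&2
        return 1
    fi
    # Timestamped backup name, e.g. common.db.20090904-145700.bak
    backup="$db.$(date +%Y%m%d-%H%M%S).bak"
    mv "$db" "$backup" && echo "moved $db to $backup"
}
```

Once the new index works, the .bak file can be deleted to reclaim the space.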
Now right-click the Tracker magnifying-glass applet, make sure the directories where your WordPerfect files live have been added, and then choose Re-index. When it's done, you should be able to search the contents of all your WordPerfect files.

kowy
January 7th, 2011, 10:41 AM
Thank you. It helped me a lot.

It's a pity this very useful hint is missing from the Tracker documentation wiki.

forliberty
February 22nd, 2011, 12:48 AM
Thank you, lizardsigh; this is a life-saver!! I had been trying to get recoll to index WordPerfect files, but even with compiling the libwpd helper app from source, could not get it to work.

I would add a very simple correction to your HOWTO: you said to add the following line to vnd.wordperfect_filter:

nice -n19 wpd2text "$1" -

But the hyphen at the end of the line must be deleted, or it won't work. In other words, the line should be:

nice -n19 wpd2text "$1"
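Putting lizardsigh's steps together with this correction, the filter can also be written directly instead of copying pdf_filter. A minimal sketch, assuming Tracker finds the filter as application/vnd.wordperfect_filter under /usr/lib/tracker/filters (the layout described in the HOWTO); install_wpd_filter is a hypothetical name, and the target directory can be overridden for testing. Because the file is created fresh rather than copied, it has to be made executable explicitly.

```shell
#!/bin/sh
# Hypothetical helper: write the WordPerfect filter for Tracker and mark it executable.
install_wpd_filter() {
    dir="${1:-/usr/lib/tracker/filters/application}"
    filter="$dir/vnd.wordperfect_filter"
    # Quoted 'EOF' keeps "$1" literal so it ends up in the filter script itself.
    cat > "$filter" <<'EOF' || return 1
#!/bin/sh
# Print the plain text of the WordPerfect file named in $1 to standard output.
# Do not add a trailing "-" argument; the filter does not work with it.
nice -n19 wpd2text "$1"
EOF
    chmod 755 "$filter"
}
```

Run it as root (for example via sudo) so it can write to /usr/lib/tracker/filters/application.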

medoc92
March 12th, 2011, 10:55 AM
[QUOTE=forliberty;10482046]Thank you, lizardsigh; this is a life-saver!! I had been trying to get recoll to index WordPerfect files, but even with compiling the libwpd helper app from source, could not get it to work.
[/QUOTE]

Just for the record and for the benefit of future search-engine travellers: on Ubuntu, you need to install libwpd-tools to get the wpd2html command (used by Recoll to extract text from WordPerfect files). This works for me with Recoll 1.15.5 on Lucid.
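To confirm that libwpd-tools is installed and its converters are on your PATH (wpd2text for the Tracker filter, wpd2html for Recoll), a quick check helps. A minimal sketch using the POSIX command -v builtin; check_tools is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 only if every command was found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, "check_tools wpd2text wpd2html"; if either is reported missing, install libwpd-tools first.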

forliberty
March 12th, 2011, 03:13 PM
I'm glad recoll works for you, medoc92, but even with libwpd-tools installed from the libraries, it would not work for me. That's why I tried compiling the latest version of the libwpd helper app from source, to see if maybe it would work then. It did not.
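One way to tell whether the problem lies in libwpd itself or in how the indexer calls it is to run the converter by hand on a few of the problem files. A minimal sketch, assuming wpd2text from libwpd-tools is on your PATH; try_wpd is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: run wpd2text by hand on each file and report whether it
# produced any text. A failure here points at the converter, not the indexer.
try_wpd() {
    status=0
    for f in "$@"; do
        if out=$(wpd2text "$f" 2>/dev/null) && [ -n "$out" ]; then
            # Show the first line of extracted text as a spot check.
            printf 'ok: %s: %s\n' "$f" "$(printf '%s\n' "$out" | head -n 1)"
        else
            printf 'failed: %s\n' "$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

If the files convert cleanly here, the converter is fine and the indexer configuration is the place to look.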