[ubuntu] How to copy text from a PDF in an Asian language and paste it into LibreOffice



jamesbon
February 3rd, 2012, 11:08 AM
I have a PDF in an Asian language. The PDF can be seen here:
https://rapidshare.com/files/833850670/lalita_sahasranama17_pages.pdf

I open this PDF in Document Viewer, select all, press Ctrl+C, and open LibreOffice. On pasting, I see random, garbled, meaningless text
such as:


AÉoÉÉsÉaÉÉåmÉÌuÉÌSiÉÉ xÉuÉÉïlÉÑssÉÇbrÉzÉÉxÉlÉÉ
́ÉÏcÉ¢üUÉeÉÌlÉsÉrÉÉ ́ÉÏqÉiÉç-̧ÉmÉÑUxÉÑlSUÏ - 182
́ÉÏ ÍzÉuÉÉ ÍzÉuÉ-zÉYirÉæYrÉ-ÃÌmÉhÉÏ sÉÍsÉiÉÉÎqoÉMüÉ
LãuÉÇ ́ÉÏ sÉÍsÉiÉÉ SãurÉÉ lÉÉqlÉÉÇ xÉÉWûxÉëMüqÉç eÉaÉÑÈ - 183
CÌiÉ ́ÉÏ oÉë1⁄4ÉhQûmÉÑUÉhÉå E ̈ÉUZÉÉhQåû ́ÉÏ WûrÉaÉëÏuÉÉaÉxirÉ xÉçÇuÉÉSå
́ÉÏ sÉÍsÉiÉÉxÉWûxÉëlÉÉqÉ xiÉÉå§ÉÇ xÉqmÉÔhÉïqÉç ||

How can I see the text correctly in LibreOffice?
A correct document should appear like this:
https://picasaweb.google.com/107404068162388981296/UnknownAsianLanguage#5704850858093630402

I have ibus running so I can easily type text in Hindi as shown here
http://www.youtube.com/watch?v=LL7icGNhIfI
but I cannot copy and paste from the PDF into LibreOffice. What is missing here?
I use Ubuntu 11.10 with the GNOME interface.

andrewc
February 3rd, 2012, 11:44 PM
EDIT: Sorry, I didn't read your post properly. So you can enter Hindi text in LibreOffice properly, but copying and pasting doesn't work?

EDIT2: Does copying and pasting into Abiword work properly? While googling for LibreOffice mangling UTF text, I came across a post suggesting that OO.org can mangle UTF text sent through the clipboard, so it's possible that LibreOffice suffers from the same bug.

This is the post: http://user.services.openoffice.org/en/forum/viewtopic.php?f=25&t=42580

It's the only mention of anything similar I could locate in a quick search, though.

EDIT 3: I know this is overkill, but have you tried installing the LibreOffice PDF import filter, importing the PDF and working from there?

ORIGINAL POST BELOW
First, do you have a font installed on your system and selected in LibreOffice that can display the Asian language? Ubuntu should have suitable fonts preinstalled. The Liberation fonts can display most Asian alphabets, and there are specific fonts installed for others, like Hindi and Khmer.

Secondly, AFAIK, LibreOffice sometimes has issues displaying some East Asian alphabets correctly. This is a problem it inherited from OpenOffice.org, something which I had hoped would be fixed by now.

Your best solution is probably to install Abiword (if it isn't already installed), which should work fine with all Asian alphabets, assuming you have the correct fonts installed and selected.

grahammechanical
February 4th, 2012, 02:13 AM
That PDF has two embedded TrueType fonts, one of which is BRH Devanagari. Have you tried finding a copy of that font and installing it on your system?
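
For anyone wanting to check which fonts a PDF refers to, here is a rough Python sketch. It simply scans the raw file bytes for /BaseFont and /FontName entries, so it only finds fonts whose dictionaries are stored uncompressed; that is an assumption about the file, not a guarantee. The sample bytes below are a hypothetical stand-in, not taken from the PDF in this thread. The pdffonts tool from poppler-utils, or the Fonts tab under Properties in Document Viewer, should be more reliable.

```python
import re

# Matches "/BaseFont /Name" or "/FontName /Name" in raw PDF bytes.
FONT_NAME = re.compile(rb"/(?:BaseFont|FontName)\s*/([^\s/<>\[\]()]+)")
# Subsetted embedded fonts carry a six-letter tag, e.g. "ABCDEF+FontName".
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def font_names(pdf_bytes):
    """Return the sorted, de-duplicated font names found in raw PDF bytes."""
    names = set()
    for match in FONT_NAME.finditer(pdf_bytes):
        name = match.group(1).decode("latin-1")
        names.add(SUBSET_PREFIX.sub("", name))
    return sorted(names)


# Hypothetical fragment standing in for a real file's font dictionary.
sample = b"<< /Type /Font /Subtype /TrueType /BaseFont /ABCDEF+BRHDevanagari >>"
print(font_names(sample))
```

For a real file, something like font_names(open(path, "rb").read()) would be the equivalent call, where path is whatever PDF you want to inspect.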

We can type in different languages in Ubuntu because we have Unicode fonts installed that have character sets for many languages. So, we are able to set different language keyboard layouts.

If the author of that document had used a Unicode font to create those characters, you might not have this problem. Instead, he used a particular font that has been embedded into the PDF document so that the document can be read even on computers that do not have that font installed.
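
A quick way to see the difference is to look at which Unicode blocks the pasted characters belong to. The Python sketch below is only an illustration: script_summary is a made-up helper name, and the "pasted" string is a fragment copied from the first post. If the count for the Devanagari block (U+0900 to U+097F) is zero, the clipboard holds Latin code points that merely look like Devanagari when drawn with the embedded font.

```python
import unicodedata


def script_summary(text):
    """Count non-space characters in rough script buckets."""
    counts = {"devanagari": 0, "latin": 0, "other": 0}
    for ch in text:
        if ch.isspace():
            continue
        if "\u0900" <= ch <= "\u097f":
            # Unicode Devanagari block
            counts["devanagari"] += 1
        elif unicodedata.name(ch, "").startswith("LATIN"):
            counts["latin"] += 1
        else:
            counts["other"] += 1
    return counts


pasted = "xÉuÉÉïlÉÑssÉÇbrÉzÉÉxÉlÉÉ"  # fragment of the pasted text from the first post
typed = "\u0936\u094d\u0930\u0940"   # Devanagari text as an input method would produce it

print(script_summary(pasted))
print(script_summary(typed))
```

Installing the font would only change how those Latin code points are drawn; it would not turn them into Unicode Devanagari text.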

I do not know if this is the answer. I have tried to find this font to download and test out this suggestion but I do not know if I am downloading the right one.

Regards.

jamesbon
February 4th, 2012, 04:03 AM
EDIT 3: I know this is overkill, but have you tried installing the LibreOffice PDF import filter, importing the PDF and working from there?
Yes, I tried this, and the characters I see look like this. You can try it on your machine with the PDF I shared to confirm:


#��#�k�Jյ���y�1�'��*�#�N#��U����#�o�>�!
��=�0[ң��#NA;b9v�>���T��X��j��##�l�#??����U�/e�7#�dS��n:#�f���7:�]K��PT
-t#�l#H#V�WKc��2r�ܧ2��;��x�#e��-#�q5#ӠqYf�6c)#�v-G�n��#�os��ϗI��L �#�<紽�%#@��-�yzL<#.
*U��#���##*e����r7j�:�Kz�d���#���|��l��#Z%��+� .#<#�


That PDF has two embedded TrueType fonts one of which is BRH Devanagari. Have you tried finding a copy of that font and installing it on your system.

How did you find out that the PDF has the font you mentioned?