[ubuntu] How to copy text from a PDF in an Asian language and paste it into LibreOffice



jamesbon
February 3rd, 2012, 11:08 AM
I have a PDF in an Asian language. The PDF can be seen here:
https://rapidshare.com/files/833850670/lalita_sahasranama17_pages.pdf

I open this PDF in Document Viewer, select all, press Ctrl+C, and then open LibreOffice. On pasting, I see random, garbled, meaningless text such as:


A蒾缮s蒩缮錷商u商Si缮 x蓇缮飈裳ss汕br蓏缮x蒷缮
́上c散黆蒭商l蓅蓃缮 ́上q蒳社-抬蒻裳Ux裳lSU - 182
́上 蛕蓇缮 蛕蓇-z蒠ir涉Yr-锰m蒱上 s赏s蒳缮蝢o蒑
L鉼汕 ́上 s赏s蒳缮 S鉼r缮 l缮ql缮 x缮W鹸呻M黴社 e蒩裳 - 183
C蘨 ́上 o呻1⁄4蒱Q鹠裳U蒱慑 E ̈蒛Z缮hQ妍 ́上 W鹯蒩呻蟯缮a蓌ir x社莡缮S
́上 s赏s蒳缮x蒞鹸呻l缮q xi缮濮汕 x蓂m稍h娠q社 ||

How can I see the text correctly in LibreOffice?
A correct document should look like this:
https://picasaweb.google.com/107404068162388981296/UnknownAsianLanguage#5704850858093630402

I have ibus running, so I can easily type text in Hindi, as shown here:
http://www.youtube.com/watch?v=LL7icGNhIfI
but I cannot copy and paste from the PDF into LibreOffice. What is missing here?
I use Ubuntu 11.10 with the GNOME interface.

andrewc
February 3rd, 2012, 11:44 PM
EDIT: Sorry, I didn't read your post properly. So you can enter Hindi text in LibreOffice properly, but copying and pasting doesn't work?

EDIT 2: Does copying and pasting work properly in AbiWord? While googling for LibreOffice mangling UTF text, I came across a post suggesting that OO.org can mangle UTF text sent through the clipboard, so it's possible that LibreOffice suffers from the same bug.

This is the post: http://user.services.openoffice.org/en/forum/viewtopic.php?f=25&t=42580

It's the only mention of anything similar I could find in a quick search, though.
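For readers unfamiliar with what "mangled UTF text" usually means, here is a minimal Python sketch of the general effect: UTF-8 bytes decoded with the wrong codec. It is not a reproduction of the clipboard bug mentioned in the linked post, and the sample word is only an illustration.

```python
# Generic encoding-mismatch demo, not the specific OO.org/LibreOffice bug.
original = "नमस्ते"                      # sample Hindi word (illustrative)
utf8_bytes = original.encode("utf-8")   # the bytes a program would send

# A receiver that wrongly assumes Latin-1 turns every byte into one character.
mangled = utf8_bytes.decode("latin-1")

# This kind of damage is reversible because no bytes were lost.
restored = mangled.encode("latin-1").decode("utf-8")

print(repr(mangled))
print(restored)
```

If pasted text can be repaired by a round trip like this, the problem is an encoding mismatch. If it cannot, the cause is more likely in the PDF itself.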

EDIT 3: I know this is overkill, but have you tried installing the LibreOffice PDF import filter, importing the PDF and working from there?

ORIGINAL POST BELOW
First, do you have a font installed on your system and selected in LibreOffice that can display the Asian language? Ubuntu should have suitable fonts preinstalled. The Liberation fonts can display most Asian alphabets, and there are specific fonts installed for others, like Hindi and Khmer.

Secondly, AFAIK, LibreOffice sometimes has issues displaying some East Asian alphabets correctly. This is a problem it inherited from OpenOffice.org, something which I hoped had been fixed by now.

Your best solution is probably to install AbiWord (if it isn't already installed), which should work fine with all Asian alphabets, assuming you have the correct fonts installed and selected.

grahammechanical
February 4th, 2012, 02:13 AM
That PDF has two embedded TrueType fonts, one of which is BRH Devanagari. Have you tried finding a copy of that font and installing it on your system?

We can type in different languages in Ubuntu because we have Unicode fonts installed that have character sets for many languages. So, we are able to set different language keyboard layouts.

If the author of that document had used a Unicode font to create those characters, you might not have this problem. Instead, he used a particular font that has been embedded into the PDF document so that the document can be read even on computers that do not have that font installed.
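To illustrate why a non-Unicode font could produce this kind of result, here is a minimal Python sketch. It assumes the font draws Devanagari glyphs at Latin character codes, so copy and paste returns the stored codes and not the characters shown on screen. The table `LEGACY_TO_UNICODE` and the function `legacy_to_unicode` are hypothetical names, and the pairs in the table are invented for illustration; they are not the real BRH Devanagari layout.

```python
# Hypothetical mapping from legacy font codes to the Unicode characters
# their glyphs depict. Invented for illustration only; NOT the actual
# BRH Devanagari table.
LEGACY_TO_UNICODE = {
    "l": "\u0928",       # glyph drawn as NA
    "q": "\u092E",       # glyph drawn as MA
    "x": "\u0938",       # glyph drawn as SA
    "\u00C9": "\u093E",  # Latin "É" slot, glyph drawn as the AA vowel sign
}


def legacy_to_unicode(text, table=LEGACY_TO_UNICODE):
    """Replace each legacy code with the Unicode character its glyph shows.

    Codes missing from the table are passed through unchanged.
    """
    return "".join(table.get(ch, ch) for ch in text)


stored = "l\u00C9q"                # what the PDF stores, and what a paste returns
shown = legacy_to_unicode(stored)  # what the embedded font draws on screen

print(stored)
print(shown)
```

A real converter would need the font's full table and would also have to reorder some vowel signs and rebuild conjuncts, so a one-to-one lookup like this is only a starting point.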

I do not know if this is the answer. I have tried to find this font to download and test out this suggestion, but I do not know if I am downloading the right one.

Regards.

jamesbon
February 4th, 2012, 04:03 AM
Quote (andrewc): EDIT 3: I know this is overkill, but have you tried installing the LibreOffice PDF import filter, importing the PDF and working from there?

Yes, I tried this, and the characters I see look like the ones below. You can try it on your machine with the PDF I shared to confirm.


#��#�k�Jյ���y�1�'��*�#�N#��U����#�o�>�!
��=�0[ң��#NA;b9v�>���T��X��j��##�l�#??����U�/e�7#�dS��n:#�f���7:�]K��PT
-t#�l#H#V�WKc��2r�ܧ2��;��x�#e��-#�q5#ӠqYf�6c)#�v-G�n��#�os��ϗI��L �#�<紽�%#@��-�yzL<#.
*U��#���##*e����r7j�:�Kz�d���#���|��l��#Z%��+� .#<#�


Quote (grahammechanical): That PDF has two embedded TrueType fonts one of which is BRH Devanagari. Have you tried finding a copy of that font and installing it on your system.

How did you find out that the PDF has the font you mentioned?
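One possible way to check this from a script, not necessarily how it was found in this thread, is the `pdffonts` tool from poppler-utils. The Python sketch below assumes `pdffonts` is installed and that font names contain no spaces. `parse_pdffonts`, `list_fonts` and the `SAMPLE` text are hypothetical illustrations; the sample rows are made up and are not the output for the PDF discussed here.

```python
import subprocess

# Made-up example of the table pdffonts prints (not from the PDF in this thread).
SAMPLE = """\
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
ABCDEF+ExampleLegacyFont             TrueType          WinAnsi          yes yes no       8  0
GHIJKL+ExampleOtherFont              CID TrueType      Identity-H       yes yes yes     12  0
"""


def parse_pdffonts(output):
    """Parse pdffonts output into a list of dicts, one per font."""
    fonts = []
    for line in output.splitlines()[2:]:  # skip the header row and dashed rule
        tokens = line.split()
        if len(tokens) < 8:
            continue  # not a complete font row
        fonts.append({
            "name": tokens[0],
            # The type can be several words, e.g. "CID TrueType".
            "type": " ".join(tokens[1:-6]),
            "encoding": tokens[-6],
            "embedded": tokens[-5] == "yes",
            "subset": tokens[-4] == "yes",
            # "uni" says whether the font carries a ToUnicode map.
            "to_unicode": tokens[-3] == "yes",
        })
    return fonts


def list_fonts(pdf_path):
    """Run pdffonts on a file and return the parsed font list."""
    result = subprocess.run(
        ["pdffonts", pdf_path], capture_output=True, text=True, check=True
    )
    return parse_pdffonts(result.stdout)


if __name__ == "__main__":
    for font in parse_pdffonts(SAMPLE):
        print(font["name"], font["type"], "ToUnicode:", font["to_unicode"])
```

A font listed with "no" in the `uni` column has no ToUnicode map, which is one common reason copied text does not match what is displayed.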