Thread: how to copy text from a pdf in Asian language and paste in libreoffice

  1. #1
    Join Date
    Jun 2010
    Beans
    380

    how to copy text from a pdf in Asian language and paste in libreoffice

    I have a PDF in an Asian language. The PDF can be seen here:
    https://rapidshare.com/files/8338506...ma17_pages.pdf

    I open this PDF in Document Viewer, select all, press Ctrl+C, and then open LibreOffice. When I paste there, I see random, garbled, meaningless text
    such as:

    Code:
    A蒾缮s蒩缮錷商u商Si缮 x蓇缮飈裳ss汕br蓏缮x蒷缮
     ́上c散黆蒭商l蓅蓃缮  ́上q蒳社-抬蒻裳Ux裳lSU - 182
     ́上 蛕蓇缮 蛕蓇-z蒠ir涉Yr-锰m蒱上 s赏s蒳缮蝢o蒑
    L鉼汕  ́上 s赏s蒳缮 S鉼r缮 l缮ql缮 x缮W鹸呻M黴社 e蒩裳 - 183
    C蘨  ́上 o呻1⁄4蒱Q鹠裳U蒱慑 E ̈蒛Z缮hQ妍  ́上 W鹯蒩呻蟯缮a蓌ir x社莡缮S
     ́上 s赏s蒳缮x蒞鹸呻l缮q xi缮濮汕 x蓂m稍h娠q社 ||
    How can I see the text correctly in LibreOffice?
    A correct document should appear like this:
    https://picasaweb.google.com/1074040...50858093630402

    I have ibus running, so I can easily type text in Hindi, as shown here:
    http://www.youtube.com/watch?v=LL7icGNhIfI
    but I cannot copy and paste from the PDF into LibreOffice. What is missing here?
    I use Ubuntu 11.10 with the GNOME interface.
    Last edited by jamesbon; February 3rd, 2012 at 11:26 AM.

  2. #2
    Join Date
    Jan 2005
    Location
    The Pudding Isles
    Beans
    5
    Distro
    Kubuntu 7.04 Feisty Fawn

    Re: how to copy text from a pdf in Asian language and paste in libreoffice

    EDIT: Sorry, I didn't read your post properly. So you can enter Hindi text in LibreOffice properly, but copying and pasting doesn't work?

    EDIT2: Does copying and pasting into Abiword work properly? While googling for LibreOffice mangling UTF text, I came across a post suggesting that OO.org can mangle UTF text sent through the clipboard, so it's possible that LibreOffice suffers from the same bug.

    This is the post http://user.services.openoffice.org/...p?f=25&t=42580

    It's the only mention of anything similar I could locate in a quick search, though.

    EDIT 3: I know this is overkill, but have you tried installing the LibreOffice PDF import filter, importing the PDF and working from there?

    ORIGINAL POST BELOW
    First, do you have a font installed on your system and selected in LibreOffice that can display the Asian language? Ubuntu should have suitable fonts preinstalled: there are specific fonts installed for scripts such as Hindi and Khmer.

    Secondly, AFAIK, LibreOffice sometimes has issues displaying some East Asian alphabets correctly. This is a problem it inherited from OpenOffice.org, one which I had hoped would be fixed by now.

    Your best solution is probably to install Abiword (if it isn't installed already), which should work fine with all Asian alphabets, assuming you have the correct fonts installed and selected.
    Last edited by andrewc; February 4th, 2012 at 12:59 AM.

  3. #3
    Join Date
    Jun 2010
    Location
    London, England
    Beans
    6,938
    Distro
    Ubuntu Development Release

    Re: how to copy text from a pdf in Asian language and paste in libreoffice

    That PDF has two embedded TrueType fonts, one of which is BRH Devanagari. Have you tried finding a copy of that font and installing it on your system?
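
    For anyone wanting to check which fonts a PDF declares, here is a rough Python sketch. It is only a heuristic: it scans the raw file bytes for /BaseFont entries, so it will miss fonts whose dictionaries sit inside compressed object streams. The function name list_base_fonts is made up for this example. A dedicated tool such as pdffonts from poppler-utils is likely to be more reliable.

```python
import re
from typing import List

# Matches "/BaseFont /SomeName" or "/BaseFont/SomeName" in raw PDF bytes.
BASEFONT_RE = re.compile(rb"/BaseFont\s*/([^\s/<>\[\]()]+)")
# PDF names may escape characters as #xx (for example #20 for a space).
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")
# Subset fonts carry a six-letter tag followed by "+", e.g. "ABCDEF+Name".
SUBSET_PREFIX_RE = re.compile(r"^[A-Z]{6}\+")


def list_base_fonts(pdf_bytes: bytes) -> List[str]:
    """Return the distinct /BaseFont names found in uncompressed PDF data."""
    names: List[str] = []
    for raw in BASEFONT_RE.findall(pdf_bytes):
        # Undo #xx escapes, then drop any subset tag.
        unescaped = HEX_ESCAPE_RE.sub(
            lambda m: bytes([int(m.group(1), 16)]), raw
        )
        name = SUBSET_PREFIX_RE.sub("", unescaped.decode("latin-1"))
        if name not in names:
            names.append(name)
    return names
```

    To try it on a real file, read the file in binary mode and pass the bytes in, for example list_base_fonts(open("some.pdf", "rb").read()), where some.pdf is a placeholder name.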

    We can type in different languages in Ubuntu because we have Unicode fonts installed that have character sets for many languages. So, we are able to set different language keyboard layouts.

    If the writer of that document had used a Unicode font to create those characters, you might not have this problem. Instead, he has used a particular font that has been embedded into the PDF document so that the document can be read even on computers that do not have that font installed.
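
    As an illustration of why text stored in a non-Unicode font can paste as nonsense, the Python sketch below shows the same bytes turning into unrelated characters when they are read with the wrong encoding. The sample string and the choice of Latin-1 and GBK are assumptions made purely for the demonstration; I have not confirmed that this is what happens with this particular PDF.

```python
def misread(text: str, written_as: str = "latin-1", read_as: str = "gbk") -> str:
    """Encode text one way and decode the bytes another way (mojibake)."""
    return text.encode(written_as).decode(read_as, errors="replace")


def undo_misread(garbled: str, written_as: str = "latin-1", read_as: str = "gbk") -> str:
    """Reverse misread() when every byte pair decoded cleanly."""
    return garbled.encode(read_as).decode(written_as)


# Hypothetical sample: accented Latin characters, the kind of codes a
# legacy (non-Unicode) font might reuse to draw Devanagari glyphs.
sample = "\u00c9i\u00c9\u00e5"

garbled = misread(sample)
print(garbled)                # unrelated CJK characters
print(undo_misread(garbled))  # the original Latin-1 string again
```

    The point is that the characters only look right when the reader uses the same font or encoding as the writer; the underlying codes say nothing about Devanagari on their own.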

    I do not know if this is the answer. I have tried to find this font so that I could download it and test this suggestion, but I do not know if I am downloading the right one.

    Regards.
    It is a machine. It is more stupid than we are. It will not stop us from doing stupid things.
    Ubuntu user #33,200. Linux user #530,530


  4. #4
    Join Date
    Jun 2010
    Beans
    380

    Re: how to copy text from a pdf in Asian language and paste in libreoffice

    Quote Originally Posted by andrewc View Post
    EDIT 3: I know this is overkill, but have you tried installing the LibreOffice PDF import filter, importing the PDF and working from there?
    Yes, I tried this, and the characters I see look like the ones below. You can try it on your machine with the PDF I shared to confirm.
    Code:
    #��#�k�Jյ���y�1�'��*�#�N#��U����#�o�>�!
    ��=�0[ң��#NA;b9v�>���T��X��j��##�l�#??����U�/e�7#�dS��n:#�f���7:�]K��PT
    -t#�l#H#V�WKc��2r�ܧ2��;��x�#e��-#�q5#ӠqYf�6c)#�v-G�n��#�os��ϗI��L �#�<紽�%#@��-�yzL<#.
    *U��#���##*e����r7j�:�Kz�d���#���|��l��#Z%��+�.#<#�
    Quote Originally Posted by grahammechanical View Post
    That PDF has two embedded TrueType fonts, one of which is BRH Devanagari. Have you tried finding a copy of that font and installing it on your system?
    How did you find out that the PDF has the font you mentioned?
