Thread: Unable to read Cyrillic/Russian file names mounting with ext2

  1. #1
    Join Date
    May 2007
    Beans
    22

    Unable to read Cyrillic/Russian file names mounting with ext2

    Hi all, I have the following challenge:

    I have to copy files from a hard drive that came out of a NAS.
    It's formatted ext2.
    I can mount the drive perfectly using:
    mount -t auto /dev/md0 /media/mynewdriv

    But when I scan through the folders, Cyrillic (Russian) filenames are shown as:
    _______ _______.doc
    ~$????? ???????.doc
    ~$_____ _______.doc
    ??????? ???????.doc

    I can't find a way to mount ext2 as unicode, so what are my options to view these files properly?

  2. #2
    Join Date
    May 2007
    Beans
    22

    Re: Unable to read Cyrillic/Russian file names mounting with ext2

    Bump?

  3. #3
    Join Date
    Apr 2006
    Location
    London
    Beans
    212
    Distro
    Ubuntu

    Re: Unable to read Cyrillic/Russian file names mounting with ext2

    Some investigation is needed. Things to find out include:

    - do Unicode characters work in your system generally, i.e. Cyrillic fonts?

    - are the filenames wrong in all applications or only some?

    - does changing the font used to a known good Unicode font (e.g. Arial Unicode MS) help?

    Since UTF-8/Unicode normally works without any setup on Ubuntu, and this hard disk is from a NAS that presumably used Samba to serve Windows clients, my suspicion is that it includes filenames that are non-Unicode but are being interpreted as Unicode (not by the ext2 filesystem code, but by applications).
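
To illustrate that suspicion, here is a minimal Python sketch. It assumes, purely for illustration, that a name was written to disk in CP1251; the file name is hypothetical and the real on-disk encoding is not yet known at this point in the thread.

```python
# Hypothetical example name; the real names on the disk are unknown.
name = "Привет.doc"

# The bytes a CP1251-writing server would store in the filesystem.
raw = name.encode("cp1251")

# A UTF-8 application cannot decode these bytes, so each invalid byte
# is shown as the replacement character U+FFFD.
shown = raw.decode("utf-8", errors="replace")

print(raw)    # b'\xcf\xf0\xe8\xe2\xe5\xf2.doc'
print(shown)  # six replacement characters followed by .doc
```

Terminals and file managers often draw such undecodable bytes as `?` or a box instead of U+FFFD, but the cause is the same mismatch.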

    See http://www.samba.org/samba/docs/man/...n/unicode.html for some details from a Samba centric perspective on how to configure Unicode etc. If your filenames/filesystem are in a pre-UTF8 character set, it's best to convert them - see the convmv tool in particular to do this in-place. Also, see http://tldp.org/HOWTO/Unicode-HOWTO.html - the main Linux Unicode HOWTO. Rather old but has some good pointers.

    Please back up all your data before using convmv (ideally with an image backup, e.g. Clonezilla Live CD - otherwise you may lose the files that have odd names if the backup program doesn't like these). If you do use another program (sbackup is very easy to set up, just do sudo aptitude install sbackup), please double check that the files with odd characters in the name were backed up OK (do a test restore of a few of them).
    Last edited by Cato2; September 10th, 2009 at 02:16 PM. Reason: clarification - ext2/ext3 doesn't have a mount as utf8 option

  4. #4
    Join Date
    May 2007
    Beans
    22

    Re: Unable to read Cyrillic/Russian file names mounting with ext2

    Hi! Thanks for this great reply!
    It gives me something to hold on to.

    First some background-info:
    These drives come from a 2-drive NAS (DNS-323).
    They're formatted ext2 in a JBOD config.
    I've attached them to my Ubuntu live session, used mdadm to rebuild the array (md0), and then mounted this array with [mount -t ext2 /dev/md0 /media/mynewdrive]

    While these drives were in my NAS, I could mount them with cifs iocharset=UTF8, and cyrillic characters were no problem.

    But mounting these drives directly with ext2, gives me unusable filenames.
    Does this put a focus on the solution?

    By the way: How can you mount ext2 as utf8? afaik, it can't be done with ext2.
    Last edited by Bl4deRunner; September 7th, 2009 at 11:26 PM.

  5. #5
    Join Date
    Apr 2006
    Location
    London
    Beans
    212
    Distro
    Ubuntu

    Re: Unable to read Cyrillic/Russian file names mounting with ext2

    ext2 and ext3 don't care at all about filename encodings, unlike NTFS which uses a superset of UTF-16 (basically Unicode as 1 or more 16-bit words, whereas UTF-8 is 1 or more 8-bit bytes). Generally UTF-8 is used as a network format and by Samba/SMB, while Mac HFS+ uses a weirdly normalized variant of UTF-16 I think.

    All 'native' Linux filesystems (i.e. not VFAT, NTFS, HFS+, etc) can use UTF-8 or any other filename encoding - it's completely up to the applications using the filesystem.
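
As a small illustration of the 8-bit versus 16-bit point, this Python sketch encodes one Cyrillic letter both ways. The letter is an arbitrary example, not taken from the disk in question.

```python
ch = "Я"  # CYRILLIC CAPITAL LETTER YA, U+042F

utf8 = ch.encode("utf-8")       # one or more 8-bit bytes per character
utf16 = ch.encode("utf-16-le")  # one or more 16-bit words per character

print(utf8.hex(" "))   # d0 af
print(utf16.hex(" "))  # 2f 04

# A character outside the Basic Multilingual Plane needs two 16-bit words.
print(len("\U0001F600".encode("utf-16-le")))  # 4
```

A filesystem that just stores bytes will happily hold either sequence; only the program reading the name decides how to interpret it.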

    You don't need to mount ext2 as utf8, as it simply passes the bytes through. There are a few possibilities:

    1. your filenames are corrupted (hence you should do a backup first from a system or live CD that can see the correct filenames, or an image backup). Or simply mount the filesystem read-only (see 'man mount') to protect it.

    2. the application (e.g. terminal, ls, etc) doesn't know the filenames are utf8 encoded - see HOWTOs on locale setup, e.g. http://lists.debian.org/debian-user/.../msg00243.html

    3. the application is using a font that doesn't include Cyrillic characters for the Unicode codepoints (character values) used in the utf8 filenames
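
One way to take locale and font out of the picture when checking these possibilities is to look at the names as raw bytes. A minimal Python sketch, assuming a Linux system; the temporary directory and the file name are hypothetical and exist only for the demo:

```python
import os
import tempfile


def raw_names(directory):
    """Return directory entries as undecoded bytes, sorted."""
    # Passing a bytes path makes os.listdir return bytes names,
    # so no decoding happens on the way.
    return sorted(os.listdir(os.fsencode(directory)))


with tempfile.TemporaryDirectory() as tmp:
    # Create one demo file whose name is stored as UTF-8 bytes.
    path = os.path.join(os.fsencode(tmp), "Привет.doc".encode("utf-8"))
    open(path, "wb").close()
    names = raw_names(tmp)

for n in names:
    print(n)
```

Pointing `raw_names` at a directory on the mounted disk would show exactly which bytes ext2 is handing back.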

    Please answer every one of my other questions, they will help diagnose what's happening here.

    One question: which types of clients originally created the files on the NAS - Windows, Mac, Linux?

    Also, the output of the following commands would be useful

    1. mount

    2. ls (on a directory with these files)

    3. ls >/tmp/test.txt; file /tmp/test.txt

    The last one will say what encoding you actually have.
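
For reference, the kind of check `file` performs can be approximated in a few lines of Python. This is only a rough sketch, not how `file` is actually implemented, and the sample byte strings are made up:

```python
def classify(data: bytes) -> str:
    """Roughly classify a byte string as ASCII, UTF-8, or other 8-bit data."""
    if all(b < 0x80 for b in data):
        return "ASCII"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return "non-UTF-8 8-bit data"


print(classify(b"??????? ???????.doc\n"))   # ASCII
print(classify("Niña".encode("utf-8")))      # UTF-8
print(classify("Niña".encode("latin-1")))    # non-UTF-8 8-bit data
```

An "ASCII" result means every byte is below 0x80, i.e. no accented or Cyrillic characters are present in the data at all.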
    Last edited by Cato2; September 8th, 2009 at 08:46 AM.

  6. #6
    Join Date
    May 2007
    Beans
    22

    Re: Unable to read Cyrillic/Russian file names mounting with ext2

    Thnx for your time, Cato.
    This is as much as I can tell you from work; tonight I can tell you more when I'm at my system:

    These files were created either using Ubuntu 8.10 through a mounted NAS using cifs & iocharset=UTF8, or Windows XP.
    It's hard to tell now which system created what file, since we use both systems simultaneously at home.

    Right now I'm using Ubuntu 9.04 Live (on USB) to recover these files & put them on my NAS (DNS-323 with new harddrives).

    To view the files I use Nautilus & Terminal. Nautilus supports Cyrillic (UTF-8) for sure, and both show the same filenames:

    ??????? ???????.doc
    _______ _______.doc
    ~$????? ???????.doc
    ~$_____ _______.doc

    I should note that, after opening them, these two files appear to be identical:
    _______ _______.doc
    ??????? ???????.doc

    I didn't check:
    ~$????? ???????.doc
    ~$_____ _______.doc
    But I guess these files are left after a crashed OpenOffice session.

    I'll post the results to your questions when I'm at home.

  7. #7
    Join Date
    May 2007
    Beans
    22

    Re: Unable to read Cyrillic/Russian file names mounting with ext2

    1) mount
    Code:
    proc on /proc type proc (rw)
    sysfs on /sys type sysfs (rw)
    tmpfs on /lib/modules/2.6.28-11-generic/volatile type tmpfs (rw,mode=0755)
    tmpfs on /lib/modules/2.6.28-11-generic/volatile type tmpfs (rw,mode=0755)
    tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
    varrun on /var/run type tmpfs (rw,nosuid,mode=0755)
    varlock on /var/lock type tmpfs (rw,noexec,nosuid,nodev,mode=1777)
    udev on /dev type tmpfs (rw,mode=0755)
    tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
    devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
    rootfs on / type rootfs (rw)
    /dev/sdc1 on /cdrom type vfat (ro,noatime,fmask=0022,dmask=0022,codepage=cp437,iocharset=iso8859-1)
    /dev/loop0 on /rofs type squashfs (ro,noatime)
    fusectl on /sys/fs/fuse/connections type fusectl (rw)
    tmpfs on /tmp type tmpfs (rw,nosuid,nodev)
    binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
    //192.168.0.2/Volume_2 on /media/nas2 type cifs (rw,mand)
    //192.168.0.2/Volume_1 on /media/nas1 type cifs (rw,mand)
    /dev/md0 on /media/disk type ext2 (rw,nosuid,nodev,uhelper=hal)
    gvfs-fuse-daemon on /home/ubuntu/.gvfs type fuse.gvfs-fuse-daemon (rw,nosuid,nodev,user=ubuntu)
    2)ls
    Code:
    ~$_____ _______.doc  _______ __ ______.doc  ??????? ?? ??????.doc  Vyazanie - model platjev.doc
    ~$????? ???????.doc  _______ _______.doc    ??????? ???????.doc
    3)ls >/tmp/test.txt; file /tmp/test.txt
    Code:
    /tmp/test.txt: ASCII text
    Does it help?

  8. #8
    Join Date
    Apr 2006
    Location
    London
    Beans
    212
    Distro
    Ubuntu

    Re: Unable to read Cyrillic/Russian file names mounting with ext2

    Thanks for that - the last one is interesting as it implies any high-bit characters (8th bit set) are being lost completely somewhere.

    Can you also answer these questions from an earlier post?

    - do Unicode characters work in your system generally, i.e. Cyrillic fonts?

    - are the filenames wrong in all applications or only some?

    - does changing the font used to a known good Unicode font (e.g. Arial Unicode MS) help?

    I think the key thing is to understand which character encoding (UTF-8, KOI8-R, ISO-8859-x, or a DOS/Windows codepage such as CP85x or CP125x) is used by these files. It may well vary between different directories and files, depending on the client and how it mounted the Samba volume on the NAS.
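
A quick way to compare candidate encodings is to decode the same raw bytes under each one and see which result reads as real text. A minimal Python sketch; the candidate list and the sample name are assumptions for illustration only:

```python
# Candidate encodings to try; this list is an assumption, extend as needed.
CANDIDATES = ["utf-8", "cp1251", "koi8-r", "cp866", "iso8859-5"]


def try_decodings(raw: bytes, encodings=CANDIDATES):
    """Decode raw under each encoding; None means the bytes are invalid there."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# Hypothetical sample: a name as a KOI8-R system would have stored it.
sample = "Вязание".encode("koi8-r")
results = try_decodings(sample)

for enc, text in results.items():
    print(f"{enc:10} {text!r}")
```

Only a human reader can pick the winner: single-byte encodings accept almost any byte, so several candidates will decode without error but produce gibberish.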

    Experimenting with different locales (at the level of your terminal, so ls will pick this up) is the best thing to try in my view.

    What's your current locale? Look in output of 'env' for LANG, LC_*, etc.

    To add a new locale in Ubuntu, try the following (use whatever locale makes sense for you, and check that it exists in Ubuntu, see 2nd step):

    1. sudo locale-gen de_DE@euro

    2. A list of possible values (instead of de_DE@euro) can be looked up in the file /usr/share/i18n/SUPPORTED.

    3. Then sudo dpkg-reconfigure locales

    4. Then test with ls, less, etc. Also try Nautilus perhaps.
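
To see what locale a program actually ends up with after these steps, the relevant variables can also be printed from Python. A small sketch; `locale_vars` is a helper name made up for this example:

```python
import locale
import os
import sys


def locale_vars(environ):
    """Pick LANG, LANGUAGE and LC_* variables out of an environment mapping."""
    return {k: v for k, v in environ.items()
            if k in ("LANG", "LANGUAGE") or k.startswith("LC_")}


for key, value in sorted(locale_vars(os.environ).items()):
    print(f"{key}={value}")

# The encodings Python derives from those settings.
print("preferred encoding: ", locale.getpreferredencoding(False))
print("filesystem encoding:", sys.getfilesystemencoding())
```

Running it once per terminal after changing LANG or LC_ALL shows whether the change was picked up.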
    Last edited by Cato2; September 9th, 2009 at 12:32 PM.

  9. #9
    Join Date
    May 2007
    Beans
    22

    Re: Unable to read Cyrillic/Russian file names mounting with ext2

    - do Unicode characters work in your system generally, i.e. Cyrillic fonts?
    Yes, I can manually rename the files with Cyrillic letters, and they then show properly in Nautilus as well as in the terminal.

    - are the filenames wrong in all applications or only some?
    All applications I use show the same when I "browse" for these files. Cyrillic fonts aren't a problem anywhere in Ubuntu 9.04.
    And I ~could~ access these files properly when the disks were in a NAS mounted through cifs & iocharset=utf8. But now, when mounting the drives directly through mdadm & mount ext2, the names are unreadable.

    I just discovered that it isn't only Cyrillic characters:
    I had a folder with the name Niña Pastori, but now it shows Ni?a Pastori.

    - does changing the font used to a known good Unicode font (e.g. Arial Unicode MS) help?
    Well, standard Ubuntu fonts work fine showing Cyrillic and other special characters. It's just the files on this mounted hard drive...

    Through Ubuntu settings I've already added Russian language support. I'll try the other suggestions later when I'm at home again.

    I suspect D-Link did something funny with their firmware to support Cyrillic characters, because they weren't supported initially.

  10. #10
    Join Date
    Apr 2006
    Location
    London
    Beans
    212
    Distro
    Ubuntu

    Re: Unable to read Cyrillic/Russian file names mounting with ext2

    See http://www.j3e.de/linux/convmv/man/ - this is a very useful tool to migrate filenames from one encoding to another, but please back up first! It has an interactive mode so you can see if it makes sense.

    In particular, see the section on Samba at the end - this may be what has happened, e.g. the client is working as utf8 on the wire but Samba is doing (perhaps) CP850 or some other single-byte character set in the filesystem. Once the disk is not accessed via Samba you get these issues.
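
The idea behind such a conversion can be sketched as a dry run in Python. This is not how convmv itself is implemented; `plan_renames` is a hypothetical helper, and the source encoding is something you would have to establish first:

```python
def plan_renames(raw_names, source_encoding):
    """Return (old_bytes, new_bytes) pairs for names that are not valid UTF-8.

    Nothing is renamed here; the result is only a plan to review.
    """
    plan = []
    for old in raw_names:
        try:
            old.decode("utf-8")
            continue  # already valid UTF-8 (plain ASCII included): leave alone
        except UnicodeDecodeError:
            pass
        # Reinterpret the bytes in the assumed source encoding, store as UTF-8.
        new = old.decode(source_encoding).encode("utf-8")
        plan.append((old, new))
    return plan


# Made-up sample names: ASCII, ISO-8859-1 and UTF-8.
names = [
    b"readme.txt",
    "Niña Pastori".encode("iso-8859-1"),
    "Привет.doc".encode("utf-8"),
]
plan = plan_renames(names, "iso-8859-1")
for old, new in plan:
    print(old, "->", new)
```

Reviewing the plan before touching anything plays the same role as a test run with the real tool, and it does not replace the backup.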

    For the 'Niña Pastori' folder, can you do this and show the results?

    ls Ni*astori | od -c -td1

    This may help establish what character set is in use, but most likely you have >1 single byte character sets - e.g. ISO-8859-1 for this one, and CP1251 for the Cyrillic filenames (see http://www.fingertipsoft.com/ref/cyrillic/cp1251.html )
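
For comparison, this is what the signed byte values would look like for the two most likely encodings of that folder name. A minimal Python sketch; it mimics only the -td1 part of the od output:

```python
def signed_bytes(raw: bytes):
    """Byte values as signed 8-bit integers, the way od -td1 prints them."""
    return [b - 256 if b > 127 else b for b in raw]


latin1 = "Niña".encode("iso-8859-1")  # ñ is the single byte 0xF1
utf8 = "Niña".encode("utf-8")         # ñ is the two bytes 0xC3 0xB1

print(signed_bytes(latin1))  # [78, 105, -15, 97]
print(signed_bytes(utf8))    # [78, 105, -61, -79, 97]
```

A single negative value in place of the ñ points to a single-byte character set, two negative values in a row point to UTF-8, and a plain 63 (the ASCII code for `?`) would mean the original character is no longer in the name at all.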

    Doing this for a Cyrillic filename would be good, as long as you can paste in the 'real' filename in UTF8 here.

    See also http://ubuntuforums.org/showthread.php?t=123350 - not very relevant but people in that thread had similar problems I think.
    Last edited by Cato2; September 9th, 2009 at 01:22 PM.
