View Full Version : Vala: text file needs to be UTF-8
kumoshk
November 26th, 2009, 08:19 AM
I get the following error,
(reader:8018): Gtk-CRITICAL **: gtk_text_buffer_emit_insert: assertion `g_utf8_validate (text, len, NULL)' failed
while running the following line of code:
string text;
FileUtils.get_contents (filename, out text, null);
It seems that I'm getting this because the file isn't UTF-8 and it has non-ascii characters (it's ISO-8859-1).
How do I convert it to UTF-8?
LKjell
November 26th, 2009, 08:43 AM
Open the file in gedit and use save as to save. Set encoding to utf-8. You might have to edit the file afterwards so convert those character to utf-8
kumoshk
November 26th, 2009, 08:46 AM
Open the file in gedit and use save as to save. Set encoding to utf-8. You might have to edit the file afterwards so convert those character to utf-8
Thanks for the advice—but, I need to do it in the program (in a cross-platform manner). So this won't work. I want the program itself to be able to handle all files in iso-8859-1 encoding (by converting it to UTF-8, since Vala doesn't do anything but UTF-8 itself).
I know I can change it manually, is what I'm saying.
LKjell
November 26th, 2009, 08:56 AM
If I remember correctly you need iconv then. It should convert from one encoding to another.
kumoshk
November 26th, 2009, 11:56 AM
Thanks! That looks like it should work, as soon as I can figure out the documentation on the library for it (which does exist for Vala, thankfully).
kumoshk
December 3rd, 2009, 07:34 AM
How do I get the encoding of a .txt file?
To convert my file to UTF-8 properly, I need to know what its original encoding is. So, without that information, it looks like I have to choose one encoding for my program to convert, and forget about the rest (unless I allow the user to select the encoding manually).
Arndt
December 3rd, 2009, 08:35 AM
How do I get the encoding of a .txt file?
To convert my file to UTF-8 properly, I need to know what its original encoding is. So, without that information, it looks like I have to choose one encoding for my program to convert, and forget about the rest (unless I allow the user to select the encoding manually).
The program 'file' at least can distinguish between UTF-8, UTF-16 and Latin-1, probably many more.
Interactively, you can give the text file to Emacs or Firefox and they will try to select the correct encoding.
kumoshk
December 3rd, 2009, 08:56 AM
The program 'file' at least can distinguish between UTF-8, UTF-16 and Latin-1, probably many more.
Interactively, you can give the text file to Emacs or Firefox and they will try to select the correct encoding.
I believe I need a solution that could be cross-platform more easily. It didn't recognize the charactersets I tried, though, but it could be cool if I need to distinguish between Unicode formats.
Until I have a solution, I'll just let my saved configuration file specify a secondary characterset—and tell people in the help that they can change it there (or I'll make a menu option); UTF-8 is the default, so that's why I say a secondary characterset (although that does require them to know what it is, unfortunately).
LKjell
December 3rd, 2009, 09:07 AM
You should be able to distinguish between utf-8 and utf-16 since there are some bytes that are illegal in utf-8 but legal in utf-16.
kumoshk
December 3rd, 2009, 10:02 AM
You should be able to distinguish between utf-8 and utf-16 since there are some bytes that are illegal in utf-8 but legal in utf-16.
Yeah. For this particular project I don't need to convert between Unicode formats, since they can all display pretty much the same stuff anyway. I just need to convert non-Unicode formats to UTF-8 (Vala's default encoding).
At least, I think I don't need to worry about that. I actually haven't tried loading a utf-16 or 32 file yet.
EDIT: Yep, I need to worry about it. GLib.convert doesn't seem to work on it, either (although it's worked on all the non-Unicode stuff). I'll have to try Iconv.
Powered by vBulletin® Version 4.2.2 Copyright © 2024 vBulletin Solutions, Inc. All rights reserved.