  1. #1
    hazel (Linux Engineer)
    Join Date: May 2004
    Location: Harrow, UK
    Posts: 1,199

    Problem with character conversion


    This blew up because I was trying to display the contents of a file in a GTK window and it kept telling me that the text wasn't a valid UTF-8 string. I was puzzled because the text was plain ASCII (no high-order ANSI codes) and ASCII is supposed to be a subset of UTF-8.

    So I wrote a little test program that writes a line into a file, reads it back into a buffer, and tests it. g_utf8_validate() continued to reject it and g_get_charset() said the charset was ANSI_X3.4-1968.

    OK, I thought, I'll use one of glib's conversion functions to turn it into utf8. g_locale_to_utf8() seemed like the right one. Here is the code:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <alloca.h>
    #include <glib.h>
    
    int main (int argc, char *argv[])
    {
    	FILE *in, *out;
    	gchar *buffer, *newbuffer; 
    	const char *codeset;
    	gsize bytes_in, bytes_out;
    	GError *oops = NULL;
    
    	out = fopen ("testfile", "w");
    	fputs ("This is the first line", out);
    	fclose (out);
    
    	in = fopen ("testfile", "r");
    	buffer = (gchar *)alloca(80);
    	fgets (buffer, 80, in);
    
    	fclose (in);
    	
    	g_get_charset (&codeset);
    	printf ("You are using %s\n", codeset);
    	
    	/* g_locale_to_utf8() returns a newly allocated string, so no buffer is reserved here */
    	newbuffer = g_locale_to_utf8 (buffer, -1, &bytes_in, &bytes_out, &oops);
    /* Check for correct conversion */	
    	printf ("g_locale_to_utf read %" G_GSIZE_FORMAT " bytes and wrote %" G_GSIZE_FORMAT " bytes\n",
    		       	bytes_in, bytes_out);
    	if (oops) 
    	{
    		puts (oops->message);
    	}
    	else
    	{
    		printf ("New buffer contains converted text: %s\n", newbuffer);
    	}
    
    	if (g_utf8_validate (newbuffer, 80, NULL))
    	{
    		printf ("%s is a valid utf string\n", newbuffer);
    	}
    	else
    	{
    		printf ("\"%s\" is not a valid utf string\n", newbuffer);
    	}
    	exit (0);
    }
    and here is the output:

    You are using ANSI_X3.4-1968
    g_locale_to_utf read 22 bytes and wrote 22 bytes
    New buffer contains converted text: This is the first line
    "This is the first line" is not a valid utf string

    As you can see, the conversion looks as if it has gone through successfully: the right number of characters has been processed, the return buffer contains the text and there is no error message. Yet g_utf8_validate() still says it's not valid.

    What am I doing wrong?
    "I'm just a little old lady; don't try to dazzle me with jargon!"

  2. #2
    hazel
    Solved it! g_utf8_validate() and the functions that use it (like gtk_text_buffer_set_text()) reject the text if you give them an explicit length that is longer than the actual string, because the null terminator then falls inside the range being checked and counts as invalid. You need to pass -1 as the length so that they stop at the null terminator instead. My file contents display now, without needing any conversion.
    "I'm just a little old lady; don't try to dazzle me with jargon!"

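    For anyone who wants to see the length rule in isolation, here is a minimal sketch in plain C. The function ascii_validate() is a hypothetical stand-in, not part of GLib: it only accepts 7-bit ASCII and copies just the length convention described above (a negative length means "stop at the terminator", an explicit length means "check exactly this many bytes, and a null byte among them is invalid"). The real g_utf8_validate() also checks multi-byte sequences, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for g_utf8_validate(), limited to 7-bit ASCII.
 * It imitates only the length convention:
 *   max_len <  0 : scan up to the first null terminator
 *   max_len >= 0 : examine exactly max_len bytes; a null byte among
 *                  them makes the string invalid
 */
bool ascii_validate(const char *str, long max_len)
{
    assert(str != NULL);

    if (max_len < 0) {
        /* Null-terminated mode: the terminator ends the scan. */
        for (const unsigned char *p = (const unsigned char *)str; *p != 0; p++) {
            if (*p > 0x7F)
                return false;
        }
        return true;
    }

    /* Explicit-length mode: every byte in the range must be a
     * non-null ASCII character. */
    for (long i = 0; i < max_len; i++) {
        unsigned char c = (unsigned char)str[i];
        if (c == 0 || c > 0x7F)
            return false;
    }
    return true;
}
```

    With an 80-byte buffer holding "This is the first line", ascii_validate(buf, -1) and ascii_validate(buf, 22) both accept the text, while ascii_validate(buf, 80) rejects it because byte 22 is the terminator. In the test program above, the corresponding change would be calling g_utf8_validate (newbuffer, -1, NULL) instead of passing 80.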