Unicode to binary.



Godd
July 30th, 2009, 02:08 AM
I need to be able to convert Unicode text to binary for a program I'm writing.

I've looked through the entire binascii library but I can't find what I need.

I can go from Unicode to hex to binary, but the binary comes out as something unusable.

\xf0Y\xfa\xc3_\x8bn/\x8d>\x88\xd7\xe2

Something along the lines of that. What I need is just ones and zeros for on and off status.

I think I can do it by using a list as a lookup table for the numerical values, but a function would be much nicer and faster.
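For reference, here is a minimal sketch of the hex route described above, assuming Python (binascii is a Python module). Presumably the escaped output shown comes from converting the hex back into raw bytes, which is what binascii.unhexlify returns. Expanding each hex digit into four binary digits gives plain ones and zeros instead. The helper name hex_to_bits and the sample text are made up for illustration.

```python
import binascii

def hex_to_bits(hex_text):
    """Expand a string of hex digits into '0'/'1' characters, 4 per hex digit."""
    return ''.join(format(int(digit, 16), '04b') for digit in hex_text)

data = u'on'.encode('utf-8')                       # the text as raw bytes
hex_text = binascii.hexlify(data).decode('ascii')  # '6f6e'
print(hex_to_bits(hex_text))                       # 0110111101101110
```

Every byte becomes exactly eight digits this way, so the result can be split back into bytes later.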

soltanis
July 30th, 2009, 04:47 AM
You just want to convert Unicode text (e.g. \u0065) to a binary number (in 0 and 1 format)?

There is a non-portable extension to C's printf, the %b format specifier, which prints the binary representation of a number. You could do this with:



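#include <stdio.h>
#include <wchar.h>

static char buffer[1024 * 5];

char *u2bin(const wchar_t *text)
{
    size_t pos = 0;  /* next free position in buffer */

    buffer[0] = '\0';
    for (size_t ctr = 0; text[ctr] != L'\0'; ctr++) {
        /* %b is non-portable; the cast is there because it expects an unsigned int */
        int n = snprintf(buffer + pos, sizeof buffer - pos, "%b", (unsigned int)text[ctr]);
        if (n < 0 || (size_t)n >= sizeof buffer - pos) {
            buffer[pos] = '\0';  /* out of room: drop the partial output */
            break;
        }
        pos += (size_t)n;
    }

    return buffer;
}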


I *think* that works, although there is some type coercion going on there (wchar_t to an int, I think, though on UNIX these should be the same).
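The general idea, independent of C: take each character, look up its numeric code point, and write that number in base 2. A rough Python sketch of the same idea follows. The function name u2bin is borrowed from the C snippet, and the digits are not padded, so each character produces a variable number of digits.

```python
def u2bin(text):
    """Concatenate the binary digits of each character's code point."""
    return ''.join(format(ord(ch), 'b') for ch in text)

print(u2bin(u'\u0065'))  # code point 0x65 prints as 1100101
```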

Godd
July 30th, 2009, 06:07 AM
Ha. I've never even begun to code in C.

I have a friend who may be able to help. But could someone explain the general idea?

hyperAura
July 30th, 2009, 10:44 PM
So what programming language are you using to write the program? Or is it just a shell script you are trying to create?

Godd
August 2nd, 2009, 06:15 AM
Sorry about reanimating a dead thread, but I wasn't able to get to my comp for a few days.

I can't believe I didn't post the language I was programming in on the initial post.

That would be Python. Sorry about that.

unutbu
August 2nd, 2009, 05:03 PM
Is this close?


#!/usr/bin/env python

# Note: this shadows the built-in bin() of Python 2.6+, which returns a
# '0b'-prefixed string such as '0b11111111'.
def bin(num):
    """
    __source__='http://mail.python.org/pipermail/python-list/2000-March/028379.html'
    __author__='wheineman 1wheinemanNOwhSPAM at uconect.net.invalid'
    Returns a string binary representation of a number. For example
    bin(255) returns '11111111'.
    """
    returnValue = ''
    mask = 0
    i = 0
    while True:
        mask = 1 << i
        # Stop once the mask is past the highest set bit (always emit bit 0)
        if (i != 0) and (mask > num):
            break
        if (mask & num):
            returnValue = '1' + returnValue
        else:
            returnValue = '0' + returnValue
        i = i + 1
    return returnValue

astr = u'Unicode to binary'
bin_list = [bin(ord(elt)) for elt in astr]
print(bin_list)

# ['1010101', '1101110', '1101001', '1100011', '1101111', '1100100', '1100101', '100000', '1110100', '1101111', '100000', '1100010', '1101001', '1101110', '1100001', '1110010', '1111001']

print(''.join(bin_list))
# 101010111011101101001110001111011111100100110010110000011101001101111100000110001011010011101110110000111100101111001
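
One caveat with the joined string: the entries have different lengths (the space is '100000', six digits, while the letters are seven), so the result cannot be split back into characters unambiguously. If that matters, a possible variation is to encode the text to bytes first and pad every byte to eight digits. The sketch below assumes UTF-8 is an acceptable encoding. The names text_to_bits and bits_to_text are made up for illustration.

```python
def text_to_bits(text, encoding='utf-8'):
    """Encode text to bytes, then write each byte as exactly 8 binary digits."""
    # bytearray yields integers on both Python 2 and Python 3
    return ''.join(format(byte, '08b') for byte in bytearray(text.encode(encoding)))

def bits_to_text(bits, encoding='utf-8'):
    """Inverse of text_to_bits: read 8 digits at a time back into bytes."""
    data = bytearray(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return bytes(data).decode(encoding)

bits = text_to_bits(u'to')
print(bits)                # 0111010001101111
print(bits_to_text(bits))  # to
```

Because every byte takes a fixed eight digits, each position in the string maps directly to one on/off state, and the original text can be recovered.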