Verminox
May 12th, 2008, 02:39 PM
The Challenge
Write a program implementing methods to serialize and unserialize (http://en.wikipedia.org/wiki/Serialization) data, specifically, Associative Arrays.
Description
You have to be able to take any variable, whether integer, float, string, or array, and serialize it to a string which can be stored somewhere and later unserialized to construct the same variable again.
Although this would seem simple for primitive data types, the main challenge is that your variable might very well be a complex array structure. For example it could be a 2D associative array with each array element having a varying data type.
Your serialize() function should create the string in such a way that your unserialize() function can restore the variable exactly as it was, including the original data types.
For an example, see PHP's built-in serialize() function. The two calls below produce the two output lines that follow them.
echo serialize(5) . "\n"; // Integer
echo serialize( array( 'foo' => 'bar', 'eggs' => 'spam' ) ); // Associative array
i:5;
a:2:{s:3:"foo";s:3:"bar";s:4:"eggs";s:4:"spam";}
Of course, your format does not have to be like the one above. You can let your serialized string be of any format as long as you can safely unserialize it (see next section).
Needless to say, you can't use built-in functions such as the one above.
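To make the PHP-style format above concrete, here is a minimal sketch in Python (my choice for illustration only, since any language is allowed). The function name serialize() and the set of supported types are assumptions; it only mimics the type-tag and length-prefix idea, not every detail of PHP's real implementation.

```python
def serialize(value):
    """Turn an int, float, str, or dict into a PHP-style tagged string."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int):
        return "i:%d;" % value
    if isinstance(value, float):
        return "d:%r;" % value
    if isinstance(value, str):
        # The length prefix (in bytes) lets a reader slice the payload
        # without scanning for a closing quote.
        return 's:%d:"%s";' % (len(value.encode("utf-8")), value)
    if isinstance(value, dict):
        # Keys and values are serialized recursively, one after another.
        body = "".join(serialize(k) + serialize(v) for k, v in value.items())
        return "a:%d:{%s}" % (len(value), body)
    raise TypeError("unsupported type: %s" % type(value).__name__)


print(serialize(5))
print(serialize({"foo": "bar", "eggs": "spam"}))
```

The two print calls reproduce the two output lines shown above. Because every value carries its own type tag, nesting falls out of the recursion for free.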
String Format
There are two paths you can take when creating your own format to store the serialized variables.
1. Compact. Reduce the size of the output string as much as possible. Basically, optimize the serialization process. This strategy is useful for large amounts of data. Your format does not have to be human-readable at all. In fact, if you want to go extreme you can even work on the bit level if you are good with binary data handling.
2. Human Readable. If you choose this category your string format has to be neat and readable so that one can understand what the data is without even having to use the unserialize() function (you still have to implement that though :p). Here string size is not of importance. Human readable format includes anything from a simple comma-separated list to a full-fledged XML document. It's your choice, go wild!
Choose any one path and go with it. No technique is given special preference. But whichever technique you use, do it well :)
The PHP format as shown above is a hybrid. It tries to be human readable (but it's really not). It tries to be compact (but it's really not). You can use this format if you are not able to make your own but you will lose points for that.
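To illustrate the difference between the two paths, here is a rough Python sketch that writes the same value both ways. Both formats and both function names (to_readable, to_compact) are made up for this example, and floats are left out to keep it short. The compact version assumes integers fit in 4 signed bytes and that strings and arrays have fewer than 256 bytes/entries.

```python
def to_readable(value, indent=0):
    """Readable path: one entry per line, nesting shown by indentation."""
    pad = "  " * indent
    if isinstance(value, dict):
        lines = []
        for key, item in value.items():
            if isinstance(item, dict):
                lines.append("%s%s:" % (pad, key))
                lines.append(to_readable(item, indent + 1))
            else:
                lines.append("%s%s = %s" % (pad, key, to_readable(item)))
        return "\n".join(lines)
    # Scalars carry their type name so the type can be recovered later.
    return "%s %r" % (type(value).__name__, value)


def to_compact(value):
    """Compact path: one tag byte, then a fixed-size or length-prefixed payload."""
    if isinstance(value, int):
        return b"i" + value.to_bytes(4, "big", signed=True)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return b"s" + bytes([len(raw)]) + raw
    if isinstance(value, dict):
        body = b"".join(to_compact(k) + to_compact(v) for k, v in value.items())
        return b"a" + bytes([len(value)]) + body
    raise TypeError("unsupported type: %s" % type(value).__name__)


sample = {"squares": {"one": 1, "two": 4}}
print(to_readable(sample))
print(to_compact(sample))
```

The readable output can be understood without a decoder, while the compact one drops all separators and relies purely on tags and lengths. Neither is a finished entry; a real compact format would likely go further (variable-length integers, bit packing).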
Judging
Each entry will be graded for:
1. The Format. You can choose either of the above two techniques, no partiality here. But the strength of your technique gets you points. If you choose compact, it has to be really efficient. If you choose readable, it has to be clear and legible.
2. Coding style. If your code is neat and tidy and maybe documented with comments then you are at an advantage.
Bonus Points: (Optional, Not Required)
1. Can handle serialization of Objects. This is easy to do in languages such as PHP which support variable variables for dynamic data member access, but in some languages it might just be impossible.
2. Error reporting for badly formatted strings.
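For the second bonus item, one possible approach is to have the parser raise an error that says what went wrong and where. The sketch below, in Python, handles only a PHP-style integer token such as i:5; and the names FormatError and parse_int are hypothetical, chosen just for this example.

```python
class FormatError(ValueError):
    """Raised when the input does not follow the expected format."""

    def __init__(self, message, position):
        super().__init__("%s at offset %d" % (message, position))
        self.position = position


def parse_int(text, pos=0):
    """Parse 'i:<digits>;' starting at pos and return (value, next_pos)."""
    if text[pos:pos + 2] != "i:":
        raise FormatError("expected 'i:'", pos)
    end = text.find(";", pos)
    if end == -1:
        raise FormatError("missing ';' terminator", len(text))
    digits = text[pos + 2:end]
    try:
        # int() is a primitive typecast; note it also tolerates
        # surrounding whitespace, which a stricter parser might reject.
        value = int(digits)
    except ValueError:
        raise FormatError("invalid integer %r" % digits, pos + 2) from None
    return value, end + 1


for sample in ("i:42;", "i:42", "i:4x2;"):
    try:
        print(parse_int(sample))
    except FormatError as err:
        print("error:", err)
```

Carrying the offset in the exception means the caller can point at the exact character that broke the input instead of just saying "bad string".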
Rules
1. You cannot use external libraries or inbuilt functions for serializing, unserializing, or parsing formats (if you choose XML, parse it yourself!)
2. You may use an external library for implementing associative arrays if your language does not support it (for example, usage of std::map in C++ is allowed)
3. You may use inbuilt functions for typecasting between primitives and strings (such as float to string, and string to float for the reverse process). But if a function exists to typecast arrays to strings and vice versa, you may not use that, as that would defeat the point of making your own string format.
4. Your program must pass the primary test (see below). If it fails the test, it is disqualified regardless of how good the format is.
Primary Test
Your program must be able to serialize the following array structure and later unserialize it to recover the array structure as it was previously (complete with the same data types).
For dynamically typed programming languages (2D associative hybrid structure):
array (
    'squares' => array (
        'one' => 1,
        'two' => 4,
        'three' => 9
    ),
    'roots' => array (
        'four' => 2.0,
        'three' => 1.732
    ),
    'foo' => array (
        'bar' => 'baz',
        'eggs' => 'spam'
    )
);
For statically typed languages (array of array of integers):
array (
    'squares' => array (
        'one' => 1,
        'two' => 4,
        'three' => 9
    ),
    'roots' => array (
        'four' => 2,
        'nine' => 3
    )
);
Presentation
1. Mention which technique you are using, compact or readable.
2. Post your code that defines the serialize() and unserialize() functions (of course you can name them differently if you want).
3. Post the test script.
3.1. Construct the array of the primary test (above).
3.2. Call the serialize() method and store the string in a variable.
3.3. Print the serialized string to standard output.
3.4. Call unserialize() on the string which should return the same array that you created.
3.5. Prove that the returned array is the same as before, either by calling a variable dump function if one exists (such as var_dump() in PHP) or by printing some test calls (for example, print $array['squares']['two'] should print 4).
4. Post the output of the test script. This will contain the serialized string (for me to judge your format) and the variable dump (for me to be convinced that you passed the test).
5. If you implemented any of the Bonus features or if you put some of your own extra goodies into it then show an example of that too (with output).
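The test script steps above could look roughly like the following Python sketch. To stay short it reuses the PHP-style format, which would cost points as an actual entry, and it does no error checking beyond unknown tags and trailing data. The helper _parse and the choice to count string lengths in characters are assumptions of this sketch.

```python
def serialize(value):
    """PHP-style tagged serialization of int, float, str, and dict."""
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int):
        return "i:%d;" % value
    if isinstance(value, float):
        return "d:%r;" % value
    if isinstance(value, str):
        return 's:%d:"%s";' % (len(value), value)
    if isinstance(value, dict):
        body = "".join(serialize(k) + serialize(v) for k, v in value.items())
        return "a:%d:{%s}" % (len(value), body)
    raise TypeError("unsupported type: %s" % type(value).__name__)


def _parse(text, pos):
    """Parse one value starting at pos; return (value, next_pos)."""
    tag = text[pos]
    if tag in "id":
        end = text.index(";", pos)
        raw = text[pos + 2:end]
        return (int(raw) if tag == "i" else float(raw)), end + 1
    if tag == "s":
        colon = text.index(":", pos + 2)
        length = int(text[pos + 2:colon])
        start = colon + 2                      # skip ':' and the opening quote
        return text[start:start + length], start + length + 2  # skip '";'
    if tag == "a":
        colon = text.index(":", pos + 2)
        count = int(text[pos + 2:colon])
        pos = colon + 2                        # skip ':' and '{'
        result = {}
        for _ in range(count):
            key, pos = _parse(text, pos)
            val, pos = _parse(text, pos)
            result[key] = val
        return result, pos + 1                 # skip '}'
    raise ValueError("unknown type tag %r at offset %d" % (tag, pos))


def unserialize(text):
    value, pos = _parse(text, 0)
    if pos != len(text):
        raise ValueError("trailing data at offset %d" % pos)
    return value


# 3.1 Construct the array of the primary test.
data = {
    "squares": {"one": 1, "two": 4, "three": 9},
    "roots": {"four": 2.0, "three": 1.732},
    "foo": {"bar": "baz", "eggs": "spam"},
}
# 3.2 and 3.3 Serialize, keep the string, and print it.
encoded = serialize(data)
print(encoded)
# 3.4 Unserialize the string.
restored = unserialize(encoded)
# 3.5 Show that structure and types survived the round trip.
print(restored == data)
print(restored["squares"]["two"], type(restored["roots"]["four"]).__name__)
```

One thing to watch in step 3.5: in Python 2.0 == 2 is true, so an equality check alone does not prove that 'four' came back as a float. That is why the last line prints the type as well.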
Languages
You can use any language you want since you will be providing the output also. But don't try to cheat, as I will also test whatever I can myself. Just because I can't compile it doesn't mean I can't get a friend to do it ;)
Deadline
The deadline for this challenge is:
May 20, 2008, 12:00 (noon) GMT (5:30 PM Indian Standard Time). I will then judge for the rest of the evening and announce the winner.
You may post queries or clarifications if required. Happy coding :D
Notes:
If you post one version first, then maybe fix some bugs or something and repost your updated code, please edit your previous post and mention at the top that there is a newer version so that I don't judge the wrong code.