nvteighen
April 5th, 2010, 01:54 PM
Hi there!
As we all know, all input and output is character-based. This may seem obvious and natural, but it has some counter-intuitive consequences: the "newline" concept is itself abstracted as a character ('\n'), and so are the "backspace" and "carriage return" concepts ('\b' and '\r', respectively). A good example of how counter-intuitive this is: the many times we see people in these forums wondering why getchar() is also reading the '\n'...
So, since '\b' (backspace) is also a character, I asked myself to what extent the act of deleting a character is itself represented as a character. In other words, is backspace really the '\b' character all the time? If it were, it might be possible to overflow an input buffer by hitting backspace repeatedly.
With that idea in mind, I wrote the following test program in C. I chose C because it's the lowest-level language I know and because this is essentially a low-level issue.
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *string_1 = "hello?";
    const char *string_2 = "hello!\b?";

    printf("%d\n", strcmp(string_1, string_2));
    printf("\"hello?\": %s\n", string_1);
    printf("'hello!\\b?': %s\n", string_2); /* The \b is 'shown'. */

    return 0;
}
If you run the program, you'll see that '\b' is a character like any other in the string: strcmp() returns a nonzero value, so string_1 and string_2 are different. They only look the same when printed because of the '\b' "illusion": on a typical terminal the cursor moves back one column and the '?' is drawn over the '!'. You could say that the absence of '!' in string_2's output is the mark of the presence of '\b'.
So, the next step was to test input. Here's the test program:
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *test1 = "hello?\n";
    const char *test2 = "hello!\b?\n";
    char input[10]; /* 9 characters of input plus the terminating '\0' */

    memset(input, '\0', sizeof(input));
    printf("Input \"hello!\\b?\\n\": ");
    fflush(stdout); /* the prompt has no '\n', so flush it explicitly */
    if (fgets(input, sizeof(input), stdin) == NULL)
        return 1; /* EOF or read error */
    printf("input: %s\n", input);
    printf("test 1 (= \"hello?\\n\"): %d\n", strcmp(test1, input));
    printf("test 2 (= \"hello!\\b?\\n\"): %d\n", strcmp(test2, input));

    return 0;
}
The idea is to compare the user's input against two test strings, in order to see how a typed 'hello!\b?\n' (that is: hello!, the Backspace key, ?, Enter) compares to a 'hello!\b?\n' string literal. The result is strikingly odd... the input is equal to 'hello?\n'. The '\b' "illusion" seen in output doesn't exist during input: the character before the backspace is actually deleted, not just hidden on screen, and is therefore never stored in the program's input buffer.
The input test makes it clear there's no chance of overflowing a text buffer with '\b', but it is quite a big inconsistency with, say, '\n', which is recorded from input when present. It suggests that abstracting backspace as '\b' on input would be insecure/useless/inappropriate, and that instead of taking another approach, someone just implemented a "half-baked" solution... which is good, but it leads to this weird result where two strings you'd expect to be equal turn out unequal.
Thoughts?