Thread: size of a 1-character file == 2

  1. #1
    Join Date
    Aug 2012
    Beans
    623

    size of a 1-character file == 2

    Hello,
    I see that the size of a file which has only 1 character is 2 bytes. However, I would like to know what the second character is. I thought it would be EOF or something.
    Code:
    IAMTubby@IAMTubby-Inspiron-1545:~/Desktop$ cat onecharacter.txt
    1
    IAMTubby@IAMTubby-Inspiron-1545:~/Desktop$ ls -l onecharacter.txt 
    -rw-rw-r-- 1 IAMTubby IAMTubby 2 May 26 18:23 onecharacter.txt
    But on printing the file in hex, I see that the one-character file has 0A once, whereas a file with a character followed by a newline has two 0As. Below are the hexdumps of the files, whose names are self-explanatory:
    Code:
    IAMTubby@IAMTubby-Inspiron-1545:~/Desktop$ hexedit Justonecharacter.txt 
    00000000   31 0A                                                                                                                1.
    
    IAMTubby@IAMTubby-Inspiron-1545:~/Desktop$ hexedit onecharacterPlusEnter.txt 
    00000000   31 0A 0A                                                                                                             1..
    I was almost beginning to feel EOF == 2 consecutive EOFs, but the hexdump of 2ConsecutiveEnter.txt leaves me still confused about what the EOF really is and how it's arrived at.
    Last edited by IAMTubby; May 26th, 2014 at 02:26 PM.

  2. #2
    Join Date
    Aug 2010
    Location
    Lancs, United Kingdom
    Beans
    1,588
    Distro
    Ubuntu Mate 16.04 Xenial Xerus

    Re: size of a 1-character file == 2

    A file on a Linux filesystem does not have any EOF marker as such. In your first file, you have the character '1' and a linefeed (end of line marker), and the second one has '1' followed by 2 linefeeds. You can create a file without an end of line marker thus:
    Code:
    echo -n 1 > justone.txt
    and its size will be precisely 1 byte.
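
    The same point can be sketched in C. This is only an illustration under stated assumptions: the helper name write_and_measure and the file names used with it are hypothetical and do not come from the thread. The helper writes exactly the bytes it is given, then reopens the file and reports its size.

```c
#include <assert.h>
#include <stdio.h>

/* Write `text` to `path` byte for byte, then return the resulting file
 * size in bytes, or -1 on error. Nothing is appended: no newline and no
 * end-of-file marker. (Hypothetical helper, for illustration only.) */
long write_and_measure(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    /* Reopen and seek to the end; the offset there is the file size.
     * This works for regular files on Linux. */
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long size = -1;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);
    fclose(fp);
    return size;
}
```

    Calling write_and_measure("justone_demo.txt", "1") should return 1, and passing "1\n" instead should return 2: the only bytes in the file are the ones that were written.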

  3. #3
    Join Date
    Aug 2012
    Beans
    623

    Re: size of a 1-character file == 2

    Quote Originally Posted by spjackson View Post
    In your first file, you have the character '1' and a linefeed (end of line marker), and the second one has '1' followed by 2 linefeeds.
    Thanks spjackson,
    But I did not enter the linefeed in my first file. Is it inserted by default?

    echo -n 1 > justone.txt
    Tried and works!

    A file on a Linux filesystem does not have any EOF marker as such.
    How then does the system come to know that it's the end of the file ?

  4. #4
    Join Date
    Aug 2010
    Location
    Lancs, United Kingdom
    Beans
    1,588
    Distro
    Ubuntu Mate 16.04 Xenial Xerus

    Re: size of a 1-character file == 2

    I'm an application programmer and not an expert on system internals but I think it is determined from the size recorded in the inode.
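
    For what it's worth, the recorded size can be read from a C program with the POSIX stat() call, which fills in st_size from the file's metadata. A minimal sketch, assuming a hypothetical helper name inode_size:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Return the size in bytes that the file system has recorded for `path`,
 * or -1 if stat() fails. No file content is read to obtain it.
 * (Hypothetical helper, for illustration only.) */
long long inode_size(const char *path)
{
    assert(path != NULL);

    struct stat st;
    if (stat(path, &st) != 0) {
        perror(path);            /* e.g. the file does not exist */
        return -1;
    }
    return (long long)st.st_size;
}
```

    For a file holding '1' and a linefeed this should return 2, matching what ls -l reports.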

  5. #5
    Join Date
    Aug 2011
    Location
    47°9′S 126°43W
    Beans
    2,172
    Distro
    Ubuntu 16.04 Xenial Xerus

    Re: size of a 1-character file == 2

    Quote Originally Posted by IAMTubby View Post
    But I did not enter the linefeed in my first file. Is it inserted by default?
    Depends on the text editor you used. Note that "wc -l" actually counts the linefeeds, so the "1" file above (the one created with echo -n) has zero lines... Edit: and this makes perfect sense: if you append a one-line file to it, there is still only one linefeed in the whole result, which is therefore considered to have only one line. So the line count of a concatenation is the sum of the line counts of its constituents.
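
    The counting rule is easy to sketch in C. This is an illustration of the rule described above, not the actual source of wc, and the name count_linefeeds is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Count the '\n' bytes in buf[0..len). Under the rule described above,
 * this is the number of lines. (Hypothetical helper, illustration only.) */
size_t count_linefeeds(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            n++;
    }
    return n;
}
```

    With this rule "1" has 0 lines, "line\n" has 1, and their concatenation "1line\n" has 0 + 1 = 1, so the counts add up.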

    Quote Originally Posted by IAMTubby View Post
    How then does the system come to know that it's the end of the file ?
    The file descriptor contains the length in bytes. Some older file systems don't support files bigger than 2 GiB or 4 GiB because they store this size in 31 or 32 bits. In some very old file systems (CP/M, IIRC) the file system kept the size in sectors. This wasn't a problem for binary files (which usually have internal structures that indicate the size), while text files used a specific character (Ctrl-Z) to mark the end of the actual text in the last sector.
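
    On Linux, a program sees the end of a file as a read() call that returns 0 bytes, not as a special byte in the data. A minimal sketch, assuming a hypothetical helper name count_bytes_until_end and ignoring interrupted reads for simplicity:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Read `path` with POSIX read() until it returns 0, which is how the
 * kernel reports that there are no more bytes. Returns the total number
 * of bytes read, or -1 on error. (Hypothetical helper, illustration only.) */
long count_bytes_until_end(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char buf[4096];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;              /* n bytes of real file content */

    close(fd);
    return n == 0 ? total : -1;  /* n < 0 means a read error */
}
```

    For a regular file the total should equal the size shown by ls -l, since the kernel stops handing out bytes once the recorded length is reached.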
    Last edited by ofnuts; May 26th, 2014 at 06:39 PM.
    Warning: unless noted otherwise, code in my posts should be understood as "coding suggestions", and its use may require more neurones than the two necessary for Ctrl-C/Ctrl-V.

  6. #6
    Join Date
    Aug 2012
    Beans
    623

    Re: size of a 1-character file == 2

    Quote Originally Posted by ofnuts View Post
    The file descriptor contains the length in bytes.
    Thanks ofnuts.

    But I need some more help here. I've used EOF inside C programs often enough that I've almost started to believe it's what signals the end of the road. Consider the code below, assuming "test.txt" contains just the digits 0-9. That's it, no newline, just digits.
    Code:
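    #include <stdio.h>
    
    int main(void)
    {
     FILE* fp;
     int c; /* int, not char: getc() returns int so EOF differs from every byte */
    
     fp = fopen("test.txt","r");
     if(fp == NULL)
     {
      perror("test.txt");
      return 1;
     }
    
     while(1)
     {
      c = getc(fp);
    
      if(c == EOF)
      {
       printf("EOF == [%d] reached, breaking now\n",c);
       break;
      }
      printf("read : [%c]==[%d]\n",c,c);
     }
    
     fclose(fp);
     return 0;
    }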
    And this is the output
    Code:
    read : [0]==[48]
    read : [1]==[49]
    read : [2]==[50]
    read : [3]==[51]
    read : [4]==[52]
    read : [5]==[53]
    read : [6]==[54]
    read : [7]==[55]
    read : [8]==[56]
    read : [9]==[57]
    read : [
    ]==[10]
    EOF == [-1] reached, breaking now
    This basically shows me that there are two characters which I haven't entered: 1. a linefeed (ASCII == 10) and 2. EOF, I guess (value == -1).

    Can you tell me something about each of these two characters ? I thought about it a bit and came up with these possible answers.
    1.(the linefeed) - as you said, probably, the text editor inserts this. But why ?
    2.I'm basically interested in knowing when the EOF is inserted.
    - Is it like, as you said, the inode tells the filesize, after which an EOF character(-1) is inserted so that C programs know when to stop reading ? OR
    - Is it like, newline and EOF are two characters that are always there in any file, by default, and these are concatenated to the file contents when you do a :wq! in vi. ?


    Thanks.
    Last edited by IAMTubby; May 28th, 2014 at 06:11 AM.

  7. #7
    Join Date
    Aug 2012
    Beans
    623

    Re: size of a 1-character file == 2

    Quote Originally Posted by spjackson View Post
    I think it is determined from the size recorded in the inode.
    Thanks spjackson.

  8. #8
    Join Date
    Aug 2011
    Location
    47°9′S 126°43W
    Beans
    2,172
    Distro
    Ubuntu 16.04 Xenial Xerus

    Re: size of a 1-character file == 2

    • Linefeed: it is actually part of the file. Either you do something that your editor interprets as a linefeed, or the editor adds one by default at the end of the file (I cannot tell you why; the editor I use, KDE's Kate, doesn't). Two images: in the first, the file is saved without an LF in the editor; notice the size and how the second "echo" output appears on the same line as the "cat" output. In the second, the file is saved with an LF in the editor, with the corresponding outputs.


    [Attached images: showlf-4.png, showlf-5.png]


    • The EOF isn't really a character, it's a special value used to tell you there are no more characters, a bit like NULL is returned by other calls to tell you that you can't have what you are asking (more memory or else).
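
    A small C sketch of why EOF has to be a value outside the range of real bytes. The helper name count_bytes_with_getc is hypothetical. The result of getc() is kept in an int, so a byte such as 0xFF arrives as 255 and cannot be mistaken for EOF, which is negative.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Count the bytes in `path` using getc(). The result is stored in an int
 * so that every byte value stays distinct from EOF. Returns -1 on error.
 * (Hypothetical helper, for illustration only.) */
long count_bytes_with_getc(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long count = 0;
    int c;                                   /* int, not char */
    while ((c = getc(fp)) != EOF) {
        assert(c >= 0 && c <= UCHAR_MAX);    /* real bytes are never negative */
        count++;
    }

    int failed = ferror(fp);                 /* EOF is also returned on read errors */
    fclose(fp);
    return failed ? -1 : count;
}
```

    Had c been declared as a plain char, a 0xFF byte could compare equal to EOF on platforms where char is signed, and the loop would stop too early.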

  9. #9
    Join Date
    Aug 2012
    Beans
    623

    Re: size of a 1-character file == 2

    Quote Originally Posted by ofnuts View Post
    • Linefeed: I cannot tell you why, the editor I use (KDE's kate) doesn't).


    • The EOF isn't really a character, it's a special value used to tell you there are no more characters, a bit like NULL is returned by other calls to tell you that you can't have what you are asking (more memory or else).
    I was wondering how EOF==-1 can be a negative ascii value.
    Thank you ofnuts, clears it.

  10. #10
    Join Date
    Aug 2011
    Location
    47°9′S 126°43W
    Beans
    2,172
    Distro
    Ubuntu 16.04 Xenial Xerus

    Re: size of a 1-character file == 2

    Quote Originally Posted by IAMTubby View Post
    I was wondering how EOF==-1 can be a negative ascii value.
    Thank you ofnuts, clears it.
    ... and also how it can be a 32-bit ASCII value...
