
Thread: filenames ending with whitespace character

  1. #1
    Join Date
    Jan 2010
    Location
    Wheeling WV USA
    Beans
    2,053
    Distro
    Xubuntu 20.04 Focal Fossa

    filenames ending with whitespace character

    is any file installed by any package in the Ubuntu repository allowed to have a name ending with a whitespace character, especially the space character (ASCII 32)? by "allowed" i mean: is there a policy against distributing a package that installs such files?

    the reason for asking is to correctly parse the lists of file paths found in /var/lib/apt/lists/*.lz4 after decompression. parsing these lists requires knowing the length of each path. that length cannot be determined if the name ends in a space character, because a varying number of spaces follows some paths in these lists. a name ending in a space character would be very unusual but is not impossible on the usual file systems seen in Linux or Unix. depending on how the parsing code is written, other whitespace characters could also affect the parsing.
    Mask wearer, Social distancer, System Administrator, Programmer, Linux advocate, Command Line user, Ham radio operator (KA9WGN/8, tech), Photographer (hobby), occasional tweetXer

  2. #2
    currentshaft
    Join Date
    May 2024
    Beans
    Hidden!

    Re: filenames ending with whitespace character

    I can't answer the first question, but there are likely better ways to parse file listings than delimiting on a whitespace character.

    You should post the code and what problem it is meant to solve for an improved solution.

  3. #3
    Join Date
    Jan 2010
    Location
    Wheeling WV USA
    Beans
    2,053
    Distro
    Xubuntu 20.04 Focal Fossa

    Re: filenames ending with whitespace character

    there is no code, yet. the problem was mostly described. the goal is to make a database in which i can look up a file name or path, or part of one, to find the full paths it could match and the packages they could be in. these list files, which are lz4 compressed (bz2 would make them about half the size and thus faster to download, especially on a sub-broadband connection), contain lists of paths with the path in column 1 and the package name in column 2. between the columns are one or more space characters, so a path can be followed by a variable number of spaces.
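A minimal sketch of reading one line of such a list in Python. It assumes (this is not confirmed anywhere in this thread) that the package column itself never contains whitespace, so the line can be split from the right and spaces in the middle of a path survive. The function name `split_entry` and the sample lines are made up for illustration.

```python
def split_entry(line):
    """Split one 'path<spaces>package' line into (path, package).

    Assumes the package column contains no whitespace, so the last
    run of whitespace on the line is the column separator.
    """
    line = line.rstrip("\n")
    parts = line.rsplit(None, 1)  # split once, at the last whitespace run
    if len(parts) != 2:
        raise ValueError(f"not a two-column line: {line!r}")
    path, package = parts
    return path, package


# Hypothetical sample lines; the second path has a space in the middle.
print(split_entry("usr/bin/foo          utils/foo\n"))
print(split_entry("usr/share/doc/my file.txt  doc/mydocs\n"))
```

Note that a path that really ended in a space would come back without it, which is exactly the ambiguity raised in the first post.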

  4. #4
    Join Date
    Aug 2011
    Location
    52.5° N 6.4° E
    Beans
    6,853
    Distro
    Xubuntu 22.04 Jammy Jellyfish

    Re: filenames ending with whitespace character

    Like in the dpkg --search command or the web interface at packages.ubuntu.com, right? I've never seen a file name in a package ending in whitespace, but I don't know whether this is official policy.

  5. #5
    Join Date
    Mar 2010
    Location
    Squidbilly-Land
    Beans
    Hidden!
    Distro
    Ubuntu

    Re: filenames ending with whitespace character

    You can name files anything you like on Unix systems with Unix/Linux file systems. Typically, filenames with extra spaces are accidental and a mistake. If they are set on purpose, I would suspect nefarious reasons.

    I've never seen an official package name with a space in it either.

  6. #6
    currentshaft
    Join Date
    May 2024
    Beans
    Hidden!

    Re: filenames ending with whitespace character

    Again, files can be listed in a way that does not require you to delimit the output by spaces to parse, for example:

    $ touch test1 'test 2' 'test3 '
    $ python
    >>> import os
    >>> os.listdir(".")
    ['test3 ', 'test 2', 'test1']

  7. #7
    Join Date
    Jan 2010
    Location
    Wheeling WV USA
    Beans
    2,053
    Distro
    Xubuntu 20.04 Focal Fossa

    Re: filenames ending with whitespace character

    Quote Originally Posted by TheFu View Post
    You can name files anything you like on Unix systems with Unix/Linux file systems. Typically, filenames with extra spaces are accidental and a mistake. If they are set on purpose, I would suspect nefarious reasons.

    I've never seen an official package name with a space in it either.
    actually, many installable files have spaces in the middle of the name. counting paths that contain a space anywhere, including in a directory name (a simple grep over paths cannot tell whether a space comes after the last '/' or not), there are over 900,000 candidates. but these spaces are in the middle. i did not try to check for spaces at the end because of the spaces that are already there in the format these files use. those "lz4" files (a bad choice for compressing distributed files) have the path followed by the package, with multiple spaces between them if the path is quite short (there is no real need for this padding given the purpose of these files). if a file path really is short and has a space at the end, that space would be followed by the padding space(s) and the whole thing would look like a shorter path. long paths have just one separator, so there would be two spaces (the one at the end of the path and the one that separates the path from the package).

    after decompressing those files, try swapping the columns so each line reads package then path instead of path then package. it's not so easy (and impossible if paths can end with a space).

    i'd show things but the code tag in this forum still doesn't work right with regard to multiple spaces and text with variable width characters.
    Last edited by Skaperen; July 7th, 2024 at 11:31 PM.
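The ambiguity described in this post can be shown with two made-up lines. This sketch assumes the padded two-column layout described in the thread and a package column without whitespace. `swap_columns` is a hypothetical helper, the padding width of 12 is arbitrary, and tab is used as the output separator only on the assumption that it does not occur in paths.

```python
def swap_columns(line):
    """Turn 'path<spaces>package' into 'package<TAB>path'."""
    path, package = line.rstrip("\n").rsplit(None, 1)
    return f"{package}\t{path}"


# Two different (made-up) paths: one ends in a space, one does not.
padded = "etc/x".ljust(12) + "admin/pkg"     # path 'etc/x' plus padding
trailing = "etc/x ".ljust(12) + "admin/pkg"  # path 'etc/x ' plus padding

# Both lines are character-for-character identical once padded,
# so no parser can recover which path was meant.
print(padded == trailing)
print(repr(swap_columns(padded)))
```

Swapping the columns works for paths with spaces in the middle, but the trailing-space case is lost before any parsing starts.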

  8. #8
    Join Date
    Jan 2010
    Location
    Wheeling WV USA
    Beans
    2,053
    Distro
    Xubuntu 20.04 Focal Fossa

    Re: filenames ending with whitespace character

    Quote Originally Posted by currentshaft View Post
    Again, files can be listed in a way that does not require you to delimit the output by spaces to parse, for example:

    $ touch test1 'test 2' 'test3 '
    $ python
    >>> import os
    >>> os.listdir(".")
    ['test3 ', 'test 2', 'test1']
    try doing that with the existing files:
    Code:
    time (cat /var/lib/apt/lists/*.lz4|lz4 -d|wc -lc)
    the above command will take a while to run, depending on your device speeds, from a few seconds to a few minutes. then it will show you the total number of lines and the total number of bytes. on my 20.04 (focal fossa) system i got 140,272,319 lines and 15,526,111,353 bytes in 5 seconds. fortunately, lz4 supports concatenating compressed files together.

    trying to convert those lists to python lists will need a lot of RAM. even as tuples it will need a lot. good luck parsing these files (after decompression) to create a list of 140+ million items containing 15+ billion characters. you'll need at least a 32GB machine. have fun!
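The memory concern in this post applies to holding everything in Python lists at once. A minimal sketch of an alternative, assuming the goal is the lookup database described earlier in the thread: read the decompressed text one line at a time and insert it into SQLite, so the full data set is never held in memory. The table and column names and the functions `parse`, `build_index` and `find` are invented for illustration, and the right-split again assumes the package column has no whitespace.

```python
import sqlite3


def parse(lines):
    """Yield (path, package) pairs lazily, skipping lines without two columns."""
    for line in lines:
        parts = line.rstrip("\n").rsplit(None, 1)
        if len(parts) == 2:
            yield parts[0], parts[1]


def build_index(lines, db_path=":memory:"):
    """Stream lines into an SQLite table and return the open connection."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS files (path TEXT, package TEXT)")
    # executemany pulls rows from the generator one at a time.
    con.executemany("INSERT INTO files VALUES (?, ?)", parse(lines))
    con.commit()
    return con


def find(con, fragment):
    """Return (path, package) rows whose path contains the given fragment."""
    # instr() is a plain substring test, so '%' and '_' need no escaping.
    query = "SELECT path, package FROM files WHERE instr(path, ?) > 0 ORDER BY path"
    return con.execute(query, (fragment,)).fetchall()


# Made-up sample input standing in for the decompressed lists.
sample = [
    "usr/bin/foo          utils/foo\n",
    "usr/share/doc/my file.txt  doc/mydocs\n",
]
con = build_index(sample)
print(find(con, "my file"))
```

On the real data, `lines` could be any iterable of text lines, for example a file object over the decompressed output. A substring search like this scans the whole table, so a real tool would probably want a smarter index.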

  9. #9
    currentshaft
    Join Date
    May 2024
    Beans
    Hidden!

    Re: filenames ending with whitespace character

    I do not have any lz4 files in /var/lib/apt/lists, nor do I have any idea why you're trying to decompress all of them.

    Your question is like asking how to fit a circular window without giving any information about the construction of the rest of the house.

    Instead of asking us to guess what you're ultimately trying to do, can you just tell us in an unambiguous way, with examples of the input and output of the desired program?
    Last edited by currentshaft; July 8th, 2024 at 12:41 AM.

  10. #10
    Join Date
    Mar 2010
    Location
    Squidbilly-Land
    Beans
    Hidden!
    Distro
    Ubuntu

    Re: filenames ending with whitespace character

    If the "installable files" have spaces in them, I'd first wonder where you are finding these. I just searched my APT cache which has all the deb packages for 6 OSes inside it - NONE, ZERO, NADA have spaces in any filenames. NONE.

    I can't remember ever seeing official package filenames with spaces inside them. Additionally, on my systems, I don't allow spaces in file names. I use a little rename script to fix filenames with spaces, to prevent it. If you use tab completion, it is pretty clear when a filename has odd characters, like a space, because the entire filename will be quoted automatically.

    The worst places where I have to deal with spaces in filenames are video and audio content. As those files arrive, I fix the filenames with the script. A number of us in these forums have posted our file renaming scripts multiple times. Mine just uses the perl 'rename' command a few times to replace spaces with '_' characters, then remove duplicates, and if there's a '_-_', change that into a '-'. Also, since I record OTA TV and those recordings have timestamps in their filenames, my script cleans those up into SxxExx instead by doing some fancy grep/cut from text files holding that information.

    Allowing spaces in filenames breaks lots of my scripts, so it is just best for me to prevent them from the start. It is mostly automatic at this point, unless there are conflicting final names that need manual resolution.
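The renaming steps described in this post could look roughly like this in Python. This is only a sketch of the three rules as stated, not the actual script (which uses the perl 'rename' command). It reads "remove duplicates" as collapsing repeated '_' characters, which is an assumption, and `clean_name` is a made-up name. It only computes the new name and does not touch any files.

```python
import re


def clean_name(name):
    """Apply the three described rules to a file name and return the result."""
    name = name.replace(" ", "_")       # spaces become underscores
    name = re.sub(r"_{2,}", "_", name)  # assumed meaning of "remove duplicates"
    name = name.replace("_-_", "-")     # '_-_' becomes '-'
    return name


print(clean_name("My Show - Pilot  Episode.mkv"))
```

Two different original names can map to the same cleaned name, which matches the post's remark about conflicting final names needing manual resolution.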
