Daem0hn
March 12th, 2008, 02:16 PM
Hello All,
I'm hoping that this is the right forum to post this question in; if not, I apologise.
The question:
Would increasing NR_FILE and NR_OPEN to 500,000+ (up from 8192 and 1024 respectively) cause issues if the file handles were fully utilised?
Is there a quick, one-off way of increasing them (similar to how echo xxxxxxx > /proc/sys/kernel/shmmax temporarily increases SHMMAX, or altering kern.maxfiles in /boot/loader.conf on BSD), or do they require a kernel recompile? (All the googling I have done indicates the latter.)
I've never run into this problem before: the file handle limits imposed on processes (and on the system in general) are too small for what I need. I only require a one-time increase, although the increase is quite significant.
The problem occurs because I have 350,000,000+ lines of randomly organised data (100 chars per line) that need to be appended to 480,000 files. Each line of data contains a flag indicating which file it should be appended to.
Fairly simple code, yes.
Read a line, decide which file to append the data to, open the file for append, append, close the file, move on to the next line.
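For anyone who wants to see the shape of that loop, here is a minimal sketch. The line layout is an assumption: it treats the first whitespace-delimited token as the flag and maps it to a hypothetical file name out_dir/<flag>.txt. The names target_for and append_each_line are made up for illustration; the real format and naming will differ.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Assumed layout: the first whitespace-delimited token of each line is the
// flag that names the target file. The real data format may differ.
std::string target_for(const std::string& line, const std::string& out_dir) {
    std::istringstream fields(line);
    std::string flag;
    fields >> flag;
    return out_dir + "/" + flag + ".txt";
}

// Open, append, close for every single line. Simple, but every line pays
// for a full open/close cycle. Returns the number of lines written.
std::size_t append_each_line(std::istream& in, const std::string& out_dir) {
    std::size_t written = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::ofstream out(target_for(line, out_dir), std::ios::app);
        if (!out) continue;  // skip lines whose target cannot be opened
        out << line << '\n';
        ++written;
    }  // 'out' is closed at the end of each iteration
    return written;
}
```

Calling append_each_line(std::cin, "some_dir") would process the input stream one line at a time.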
Now this works. I have a program which does just that. The issue is the speed of access.
As more lines of data are appended, the files become longer (obviously), and hence the 'open for append' operation takes longer to seek to the end of the file and become ready for append. After running this program for a day, I calculated from the rate at which it was slowing down that it would take 8.5 days to complete, so I decided to seek another solution.
The easiest solution I thought of (without considering the implications of open file limits) was to create an array of 480,000 fstreams, each one remaining open for append. This would mean no seeking after the initial open.
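A rough sketch of that idea follows, assuming the files can be addressed by a numeric index and named out_dir/<index>.txt (both placeholders, as is the function name open_all). It stops at the first open that fails, so the size of the returned vector shows how many streams the process was actually allowed to hold open.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Open up to n targets once, in append mode, and keep them all open.
// The naming scheme out_dir/<index>.txt is a placeholder.
std::vector<std::ofstream> open_all(std::size_t n, const std::string& out_dir) {
    std::vector<std::ofstream> streams;
    streams.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        streams.emplace_back(out_dir + "/" + std::to_string(i) + ".txt",
                             std::ios::app);
        if (!streams.back()) {
            // Most likely the open file limit was reached: drop the failed
            // stream and stop, leaving only the successfully opened ones.
            streams.pop_back();
            break;
        }
    }
    return streams;
}
```

With the streams held in a vector, appending a line becomes streams[index] << line << '\n' with no open or close per line.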
Obviously I considered holding the data in RAM and dumping it, however there is too much data to fit into RAM (I have 4 gigs; the minimum size the data would take is 6 gigs).
The problem I have is that the default Ubuntu kernel comes with a per-process limit of #define NR_OPEN 1024 (as per limits.h) and a system-wide limit of #define NR_FILE 8192 (as per fs.h).
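The header constants may not match what a running system actually enforces, so it is worth checking the live values too. Below is a small sketch that reads the per-process descriptor limit with getrlimit(RLIMIT_NOFILE) and the system-wide value from /proc/sys/fs/file-max. It assumes Linux with /proc mounted; the FileLimits struct and query_file_limits name are made up for illustration.

```cpp
#include <sys/resource.h>

#include <fstream>

// Limits as seen by the running process. system_max stays 0 when
// /proc/sys/fs/file-max cannot be read (for example, /proc not mounted).
struct FileLimits {
    rlim_t soft;                    // current per-process limit
    rlim_t hard;                    // ceiling the soft limit may be raised to
    unsigned long long system_max;  // system-wide open file limit
};

FileLimits query_file_limits() {
    FileLimits out{0, 0, 0};

    struct rlimit rl{};
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
        out.soft = rl.rlim_cur;
        out.hard = rl.rlim_max;
    }

    std::ifstream file_max("/proc/sys/fs/file-max");
    unsigned long long value = 0;
    if (file_max >> value) {
        out.system_max = value;
    }
    return out;
}
```

Printing the three fields before attempting to open a large number of files gives a quick idea of how far the current settings are from what is needed.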
In my time using Linux, I've done plenty of kernel recompiles (my first Linux experience was a Gentoo '03 stage 1 install; boy, that was a steep learning curve), but I've never played around with NR_FILE and NR_OPEN. From your experience (if anyone has any), would increasing these values to, say, 500,000 cause problems, assuming the 500,000 per-process limit is fully utilised?
Also, is there a faster way of temporarily altering these values (like echoing to /proc/sys/kernel/shmmax to change shared memory, or editing /boot/loader.conf on BSD), or do they require a kernel recompile? (All the Google results I have found indicate the latter.) The only reason I ask is that this processing only needs to happen once.
Thanks in advance for your help.
If anyone has any other suggestions as to how I can (in a timely fashion) process the data, please let me know.
Cheers,
Mike