This is actually a harder question for large files than it is for many small files. For lots of small files, my general solution is to use xargs or parallel to farm out the smaller jobs.
For example, say I want to find all .gz files that contain the word "ubuntuforums".
The slow way (this runs a separate zgrep on each .gz file, one at a time):
Code:
find . -name "*.gz" -exec zgrep -l "ubuntuforums" {} \;
But since I don't care about the order of the returned list, I can use xargs to split the work across both cores, which in your case should be about twice as fast:
Code:
find . -name "*.gz" | xargs -P 2 -n 10 zgrep -l "ubuntuforums"
This splits the list of .gz files into chunks of 10 and starts two instances of zgrep, passing 10 files to each; when an instance of zgrep finishes, xargs starts a new one. The OS takes care of scheduling one on each core.
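One caveat: the plain pipe above will trip over file names that contain spaces. If that matters to you, a slightly safer variant (assuming GNU find and xargs, which support NUL-delimited lists) is:
Code:
find . -name "*.gz" -print0 | xargs -0 -P 2 -n 10 zgrep -l "ubuntuforums"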
And finally, you can use parallel to accomplish the same thing:
Code:
parallel zgrep -l "ubuntuforums" -- `find . -name "*.gz"`
Parallel is cool in that it figures out how many jobs to start on its own, and it treats each argument as a separate job (at least by default).
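Depending on which parallel you have installed, the argument handling differs a bit; with GNU parallel, for example, the usual form is to feed the file list on stdin (by default it starts one job per core):
Code:
find . -name "*.gz" | parallel zgrep -l "ubuntuforums"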
If you're running a script over one large file, you could plausibly split the file in two, run the command on each half, and then combine the results; a rough sketch of that idea is shown below. But that can trade your old problem (slow) for a new one (how to merge the results from the two halves).

If you're thinking of something like image processing, the software itself has to be written to take advantage of multiple cores; there's no way to spread a single-threaded process across cores from the outside. So short of getting your hands dirty (as in absolutely filthy, disassembling and reassembling the program), there's no easy way.
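Here's a minimal sketch of the split-and-combine idea, assuming the work is a simple line-oriented grep so the two result sets can just be concatenated. The file name big.log and the part_ prefix are made up for the example, and the -n l/2 option is GNU split:
Code:
# split big.log into two halves without breaking lines (GNU split)
split -n l/2 big.log part_
# search each half as its own process, letting the OS put one on each core
grep "ubuntuforums" part_aa > out_aa &
grep "ubuntuforums" part_ab > out_ab &
wait
# combine the results and clean up
cat out_aa out_ab > results.txt
rm part_aa part_ab out_aa out_ab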
And the simple answer to your second question: there's nothing wrong with putting all your cores to work. Your fan might spin up pretty high, but it shouldn't be an issue.