
[Python - Newbie] is this correct?



baskar007
November 29th, 2009, 08:30 PM
I have a list of part files (that is, many parts of a single file).

I want to append all the parts into a single file to rebuild the full file.

I used this code:



import os

l = ["home/baskar/1.part", "home/baskar/2.part", "home/baskar/3.part"]

source_file = "/home/baskar/source.file"

re_size = 0
buffer = 10 * 1024  # 10 KiB
for x in range(len(l)):
    target_size = os.path.getsize(l[x])
    part = open(l[x], "r")
    while 1:
        output = part.read(buffer)
        re_size = re_size + len(output)
        F = open(source_file, "ab")
        F.write(output)
        F.close()
        if re_size == target_size:
            break
        else:
            print "Appending file", l[x]



This code works, but it takes far too long to append the files: even a small file of about 5 MB takes 3 minutes to complete.

Is there a faster way to append the files, or is there something wrong with my code?

If anyone knows, please tell me.

benj1
November 29th, 2009, 08:49 PM
Why not this?

l = ["home/baskar/1.part", "home/baskar/2.part", "home/baskar/3.part"]

for part in l:
    open("/home/baskar/source.file", "a").write(open(part).read())

You certainly don't need the while loop and the associated checks.

snova
November 29th, 2009, 09:31 PM
Why not this?

l = ["home/baskar/1.part", "home/baskar/2.part", "home/baskar/3.part"]

for part in l:
    open("/home/baskar/source.file", "a").write(open(part).read())

You certainly don't need the while loop and the associated checks.

I think the point is to avoid reading the files all at once.

I'm only guessing at the bottlenecks here, but I suspect a significant problem is that you reopen the output file every time you read a block.

Your for loop can be significantly improved by iterating over the filenames themselves rather than over the indexes. You aren't using the index anyway. (See benj1's code for an example of this.)

Your method of testing for the end of the file can be simplified: just compare the length of the string that read() returned to the size you asked for, and if it returned less, it has hit EOF.

In addition, I think 10 KB is a rather small block size.

My version:



import os.path

Files = ["1.part", "2.part", "3.part"]
OutFile = "source"

ReadSize = 1024 * 1024  # 1 MB

out = open(OutFile, "a")

for filename in Files:
    # Context manager (with statement). Ensures the file is closed without doing it myself.
    with open(filename) as file:
        while True:
            tmp = file.read(ReadSize)
            out.write(tmp)
            if len(tmp) != ReadSize:
                break

Can+~
November 29th, 2009, 11:46 PM
If the files are small, you could load them into main memory and write them out all at once.
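
A minimal sketch of that whole-file approach (the filenames are just placeholders, and it assumes every part fits comfortably in memory):

parts = ["1.part", "2.part", "3.part"]

out = open("source", "ab")
for name in parts:
    # read() with no size argument pulls the whole part into memory at once
    out.write(open(name, "rb").read())
out.close()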

baskar007
November 30th, 2009, 06:21 AM
while loop
If the file is too large, doesn't reading the whole file at once take a lot of memory?

baskar007
November 30th, 2009, 06:25 AM
I think the point is to avoid reading the files all at once.

I'm only guessing at the bottlenecks here, but I suspect a significant problem is that you reopen the output file every time you read a block.

Your for loop can be significantly improved by iterating over the filenames themselves rather than over the indexes. You aren't using the index anyway. (See benj1's code for an example of this.)

Your method of testing for the end of the file can be simplified: just compare the length of the string that read() returned to the size you asked for, and if it returned less, it has hit EOF.

In addition, I think 10 KB is a rather small block size.

My version:



import os.path

Files = ["1.part", "2.part", "3.part"]
OutFile = "source"

ReadSize = 1024 * 1024  # 1 MB

out = open(OutFile, "a")

for filename in Files:
    # Context manager (with statement). Ensures the file is closed without doing it myself.
    with open(filename) as file:
        while True:
            tmp = file.read(ReadSize)
            out.write(tmp)
            if len(tmp) != ReadSize:
                break

Thank you,
now I've got the idea.

I have one more question:
Is there any other way to combine all the files into a single file?

snova
November 30th, 2009, 06:42 AM
while loop
If the file is too large, doesn't reading the whole file at once take a lot of memory?

A plain read() call will pull the entire file into memory. Memory is getting cheap and files aren't generally huge, but it's not the best idea.


Thank you,
now I've got the idea.

I have one more question:
Is there any other way to combine all the files into a single file?

How about cat? It just occurred to me:


cat 1.part 2.part 3.part > source
# Or even this:
cat *.part > source
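
Or, staying in Python, something like this should work (just a sketch using the standard library's shutil.copyfileobj; the filenames are placeholders):

import shutil

parts = ["1.part", "2.part", "3.part"]

out = open("source", "wb")
for name in parts:
    f = open(name, "rb")
    shutil.copyfileobj(f, out)  # copies in fixed-size chunks, so memory use stays small
    f.close()
out.close()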

baskar007
December 2nd, 2009, 09:06 AM
A plain read() call will pull the entire file into memory. Memory is getting cheap and files aren't generally huge, but it's not the best idea.



How about cat? It just occurred to me:


cat 1.part 2.part 3.part > source
# Or even this:
cat *.part > source
Is *cat* Python code?

nvteighen
December 2nd, 2009, 02:53 PM
Is *cat* Python code?

No, it's a command-line utility.

DaithiF
December 2nd, 2009, 11:27 PM
Is *cat* Python code?
+1 for cat. If it has to be Python, you could cheat by wrapping Python around cat :p

import subprocess
subprocess.Popen('cat %s > newfile.txt' % ' '.join(files), shell=True)

though cats don't normally like being wrapped by pythons ;) melon helmets are fine, though.

fiddler616
December 3rd, 2009, 04:06 AM
though cats don't normally like being wrapped by pythons
Classy.

Does subprocess.Popen(str) arbitrarily run str through a bash interpreter? Are there any limits? I feel like this could get out of hand, although I guess bash has:

python -c "<insert code here>"

DaithiF
December 3rd, 2009, 10:08 PM
Classy.

Does subprocess.Popen(str) arbitrarily run str through a bash interpreter?

No, not unless you pass True for the shell parameter.
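
For illustration, a rough sketch (reusing the part filenames from earlier in the thread): with shell=True the command string is handed to /bin/sh, so redirection works; without it you pass an argument list and handle the output file yourself.

import subprocess

# shell=True: the string is interpreted by /bin/sh, so the > redirection works
subprocess.Popen('cat 1.part 2.part 3.part > source', shell=True).wait()

# no shell: pass an argument list and do the redirection in Python
out = open('source', 'wb')
subprocess.Popen(['cat', '1.part', '2.part', '3.part'], stdout=out).wait()
out.close()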