
Python: identical strings don't match, sometimes



Frozen Forest
September 22nd, 2012, 06:58 PM
I have a text file containing a list of folder and file names, and a file system containing these items along with other files and folders. The script I'm writing reads the file and folder names from the text file, then searches the file system for them. The strange thing is that I only get a match sometimes.

Example

text file:

    file_a
    file_c

file system:

    /some/path/file_a
    /some/path/file_b
    /some/path/file_c
    /some/path/file_d

output from script:

    found:
    /some/path/file_a

it should have been:

    found:
    /some/path/file_a
    /some/path/file_c
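A minimal sketch of the matching the example describes, assuming the text file holds one name per line and that a "match" means a file or folder whose name is exactly one of those names (the code posted later in the thread uses substring tests, `p in ...`, instead). `load_names` and `find_items` are hypothetical helper names. Stripping each line before comparing removes the trailing newline that would otherwise make the strings differ.

```python
import os


def load_names(list_path):
    """Read one name per line; strip() drops the trailing newline (and any \\r)."""
    with open(list_path) as f:
        return [line.strip() for line in f if line.strip()]


def find_items(root, names):
    """Return sorted paths under root whose last component equals one of names."""
    wanted = set(names)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Check both subdirectory and file names, since the list mixes the two.
        for name in dirnames + filenames:
            if name in wanted:
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Usage (hypothetical paths): find_items("/some/path", load_names("names.txt"))
```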

juancarlospaco
September 22nd, 2012, 07:05 PM
Show code . . .

hakermania
September 22nd, 2012, 07:12 PM
Yes, show us your code. I suspect that the strings are NOT identical, and that the cause is that the contents of the first file are actually:

    a
    b
    c
    d

and the other's are:

    a
    b
    c
    d (\n)

So there's a trailing \n.
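To illustrate the trailing-newline point: iterating over a file object keeps each line's "\n", so only a last line without a newline compares equal to the bare name. This sketch uses `io.StringIO` as a stand-in for the real text file; `read_names` and `actual` are hypothetical, and the output comments are for Python 3.

```python
import io


def read_names(f, strip=True):
    """Return the lines of f, stripped of surrounding whitespace unless strip=False."""
    return [line.strip() if strip else line for line in f]


actual = ["a", "b", "c", "d"]  # names as they would come back from os.listdir

# io.StringIO stands in for the text file; its last line has no newline.
raw = read_names(io.StringIO(u"a\nb\nc\nd"), strip=False)
print(raw)                                 # ['a\n', 'b\n', 'c\n', 'd']
print([n for n in raw if n in actual])     # ['d']

clean = read_names(io.StringIO(u"a\nb\nc\nd"))
print([n for n in clean if n in actual])   # ['a', 'b', 'c', 'd']
```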

Frozen Forest
September 24th, 2012, 05:06 PM

see below

trent.josephsen
September 24th, 2012, 06:09 PM
That's... not how you use os.path.walk... is it?

    >>> os.path.walk('/home')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: walk() takes exactly 3 arguments (1 given)

os.path.walk is deprecated and has been for quite some time. Use os.walk instead.
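For context, Python 2's `os.path.walk(path, visit, arg)` calls `visit(arg, dirname, names)` for each directory, which is why a single argument fails; `os.walk` is a generator yielding `(dirpath, dirnames, filenames)` tuples instead. A sketch, assuming the goal is the same directory listing with an ignore list as the `Tree` class posted later in the thread: with `topdown=True` (the default), pruning `dirnames` in place stops `os.walk` from descending into those directories. `walk_dirs` is a hypothetical name.

```python
import os


def walk_dirs(top, ignore=()):
    """Yield every directory path under top, skipping any directory named in ignore."""
    for dirpath, dirnames, filenames in os.walk(top):
        # With topdown=True (the default), replacing dirnames' contents in place
        # prevents os.walk from descending into the removed directories.
        dirnames[:] = [d for d in dirnames if d not in ignore]
        yield dirpath
```

This roughly mirrors what `Tree(top, ignore=...).to_list()` collects, although the order of the directories may differ.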

Frozen Forest
September 24th, 2012, 06:52 PM

You are correct. I was using the walk function from the beginning and it showed the same thing; the point was the same, but it was os.walk, not os.path.walk. What I actually use is a recursive tree structure. Here is the code. The problem is still the same: 'if p in os.path.basename(tree_node.dir):' returns False. I'm starting to guess it has something to do with the encoding.
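One way to test the encoding guess is to print both strings with escapes, so invisible differences show up: a "\r" left by Windows line endings, a UTF-8 byte-order mark at the start of the file, or the same accented name stored in composed (NFC) versus decomposed (NFD) Unicode form. A sketch for Python 3 (`ascii()` is a Python 3 builtin; `repr()` plays the same role on Python 2 byte strings). `visible`, `normalize_name` and the two sample strings are hypothetical.

```python
import unicodedata


def visible(s):
    """Escape every non-ASCII and control character so hidden differences show up."""
    return ascii(s)


def normalize_name(s):
    """Drop a leading BOM and surrounding whitespace (incl. \\r, \\n), then NFC-normalize."""
    return unicodedata.normalize("NFC", s.lstrip(u"\ufeff").strip())


from_list = u"\ufeffcafe\u0301\r\n"   # hypothetical line read from the text file
on_disk = u"caf\u00e9"                # hypothetical name returned by os.listdir

print(from_list == on_disk)                                   # False
print(visible(from_list), visible(on_disk))
print(normalize_name(from_list) == normalize_name(on_disk))   # True
```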



import os


class Tree(object):
    """One directory: its path, the files directly inside it, and child Tree nodes."""

    def __init__(self, dirpath, parent=None, ignore=()):
        self.parent = parent
        self.dir = dirpath
        tmp_list = [os.path.join(dirpath, path) for path in os.listdir(dirpath)]
        subdirs = [path for path in tmp_list if os.path.isdir(path)]
        self.files = [path for path in tmp_list if os.path.isfile(path)]
        self.subdirs = []
        if subdirs:
            self.__scan__(subdirs, ignore)

    def __scan__(self, subdirs, ignore):
        # Build a child node for every subdirectory whose name is not in ignore.
        self.subdirs = []
        for subdir in subdirs:
            if os.path.basename(subdir) not in ignore:
                self.subdirs.append(Tree(subdir, self, ignore))

    def __to_list__(self, dir_list):
        # Pre-order: this directory first, then each subtree.
        dir_list.append(self.dir)
        for d in self.subdirs:
            d.__to_list__(dir_list)
        return dir_list

    def delete(self):
        # Detach this node from its parent; the root has no parent.
        if self.parent is not None:
            self.parent.subdirs.remove(self)

    def to_list(self):
        dir_list = []
        self.__to_list__(dir_list)
        return dir_list

    def p(self):
        print(self.dir)
        for d in self.subdirs:
            d.p()


def scan(tree_node, find_other):
    # Delete tree_node if its directory name, or any of its file paths,
    # contains one of the strings in find_other; otherwise recurse.
    for p in find_other:
        if p in os.path.basename(tree_node.dir):
            tree_node.delete()
            return
        for x in tree_node.files:
            if p in x:
                tree_node.delete()
                return
    # Loop over a copy: delete() removes nodes from tree_node.subdirs,
    # and removing items from the list being iterated skips elements.
    for node in list(tree_node.subdirs):
        scan(node, find_other)
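Separately from the string comparison, scan() removes nodes from tree_node.subdirs through delete(). If the caller is looping over that same list directly, Python skips the element right after each removal, which would also make matches look intermittent. A minimal illustration with plain lists instead of Tree nodes; both function names are hypothetical.

```python
def remove_matching_buggy(items, bad):
    # Removing from the list being iterated shifts later items left,
    # so the element right after each removal is never examined.
    for item in items:
        if item in bad:
            items.remove(item)
    return items


def remove_matching_safe(items, bad):
    # Iterating over a copy leaves the loop unaffected by the removals.
    for item in list(items):
        if item in bad:
            items.remove(item)
    return items


print(remove_matching_buggy(["a", "b", "c", "d"], {"a", "b"}))  # ['b', 'c', 'd']
print(remove_matching_safe(["a", "b", "c", "d"], {"a", "b"}))   # ['c', 'd']
```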

trent.josephsen
September 24th, 2012, 08:00 PM
I'm not 100% sure, but os.path.basename may return the empty string when called on a directory -- could that be your mistake? You don't mention whether the bug affects only directory names, only file names, or both, which seems to me very relevant.
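For reference, os.path.basename only returns what follows the last separator, so it gives an empty string when a directory path ends in a slash, not for every directory. Paths built with os.path.join(dirpath, name), as in Tree.__init__, have no trailing slash, but a root path passed in as "/some/path/" would. A sketch of one workaround, normalizing the path first; `dir_name` is a hypothetical helper.

```python
import os


def dir_name(path):
    """Last component of a directory path, tolerating a trailing separator."""
    # normpath removes the trailing separator, so basename sees "path", not "".
    return os.path.basename(os.path.normpath(path))


print(repr(os.path.basename("/some/path/")))  # ''
print(dir_name("/some/path/"))                # path
print(dir_name("/some/path"))                 # path
```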