[SOLVED] [Regular Expression] Matching the same character in different spots?


xelapond
April 19th, 2008, 10:16 AM
I am just learning regular expressions and was wondering if there is a way to match the same character in different spots. Take, for example, the following:

aaathdfjehsdyaaa

This would match three of the same character at the beginning, a random sequence of characters in the middle, and then three more of the same character (the same one that was matched at the beginning) at the end.

Also, this may have to match when the pattern sits inside a longer string:
ajkl;dilgjvafaaahhgydhaaaiorehdfk

Thanks,

Alex

aks44
April 19th, 2008, 10:28 AM
The concept you're looking for is: back-references (which go together with captures).
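
As a minimal sketch of what a back-reference looks like in Python's re module (the pattern below is an illustration, not something posted in this thread): a capture group remembers what it matched, and \1 requires that exact text again. The lazy .+? for the middle part is an assumption about what is wanted; a greedy .+ would stretch to the last possible closing run instead.

```python
import re

# (.) captures a single character, \1{2} requires two more copies of it,
# .+? lazily matches the middle, and \1{3} requires the same character
# three more times at the end.
pattern = re.compile(r"(.)\1{2}.+?\1{3}")

for text in ("aaathdfjehsdyaaa", "ajkl;dilgjvafaaahhgydhaaaiorehdfk"):
    m = pattern.search(text)
    # search() returns None when nothing matches, so check before using it
    print(m.group() if m else None)
```

On the two sample strings from the first post this prints the whole first string and then aaahhgydhaaa.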

ghostdog74
April 19th, 2008, 10:33 AM
what happens when you have this

ajkl;dilgjvafaaahhgydhaaaiorehdfkdfjfdnidaaafdfdsfaaa

what would your result look like?

xelapond
April 19th, 2008, 10:50 AM
I would expect it to return aaafdfdsfaaa.

ghostdog74
April 19th, 2008, 11:37 AM
so you only want the very last set of patterns to be displayed. Say you have a file like this

aaa12345aaaxxxxxxaaawerttyewweerwerwerwerwersfdsfsdfsaaadfdfsdfdsfsdf

the part you want to get is "aaawerttyewweerwerwerwerwersfdsfsdfsaaa". You can wait for someone to give you a regexp solution; in the meantime, here's a non-regexp solution that splits on the pattern.

#!/bin/bash
awk 'BEGIN{OFS=FS="aaa"}{ print OFS $(NF-1) OFS}' file
output:

# ./test.sh
aaawerttyewweerwerwerwerwersfdsfsdfsaaa


you can use the split on pattern method with other languages as well.
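
Since the original poster is using Python, here is a rough equivalent of the awk one-liner above. The helper name last_enclosed is made up for illustration, and the sketch assumes the marker is the fixed string "aaa", as in the awk version.

```python
def last_enclosed(line, marker="aaa"):
    """Return the last segment enclosed by two markers, markers included."""
    parts = line.split(marker)
    # Two markers produce at least three parts; with fewer, nothing is enclosed.
    if len(parts) < 3:
        return None
    # parts[-2] plays the role of awk's $(NF-1)
    return marker + parts[-2] + marker


line = "aaa12345aaaxxxxxxaaawerttyewweerwerwerwerwersfdsfsdfsaaadfdfsdfdsfsdf"
print(last_enclosed(line))
```

For the sample line this prints aaawerttyewweerwerwerwerwersfdsfsdfsaaa, the same as the awk output.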

xelapond
April 19th, 2008, 11:46 AM
Oh, sorry, I did not realize there were two sets in there. It would be nice if it could return both.

Here is what I came up with, but it doesn't work:
r'(.{3}).+/1'

It gives the following output:
>>> string = 'jksdfajsdhfjtrklsdfjaaajjdhjaaa'
>>> s = re.search(r'(.{3}).+/1', string)
>>> s.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

I am using Python as the programming language.

aks44
April 19th, 2008, 12:44 PM
r'(.{3}).+/1'

The forward slash ( /1 ) should be a backslash ( \1 ).


EDIT: FWIW that "AttributeError: 'NoneType' object has no attribute 'group'" error means that the regex didn't match at all. You have to ensure the object resulting from the search() call is not null (or whatever that concept is called in Python) before actually using it.

EDIT2:
import re

string = 'jksdfajsdhfjtrklsdfjaaajjdhjaaa'
s = re.search(r'(.{3}).+\1', string)
# search() returns None when there is no match
if s is not None:
    print(s.group())
else:
    print('Error')

xelapond
April 19th, 2008, 03:12 PM
Thanks a ton,

Now I have this:
>>> string = 'jksdfajsdhfjtrklsdfjaaajjdhjaaa'
>>> s = re.search(r'(.{3}).+\1', string)
>>> s.group()
'sdfajsdhfjtrklsdf'
It appears to be taking only the first one in the sequence. How do I make it return all of them? Do I have to remove that from the string and run it again?

Thanks,

Alex

aks44
April 19th, 2008, 04:03 PM
How do I make it return all of them?

Read the manual or search the web? FWIW here's what I found after about 20 seconds : http://www.amk.ca/python/howto/regex/regex.html#SECTION000430000000000000000

And no, I don't know anything about Python...
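
One way to collect every non-overlapping match in Python is re.finditer. The sketch below also swaps in a different pattern from the one posted above: (.{3}) accepts any three characters, which is why 'sdf...sdf' was returned, whereas (.)\1{2} insists on three copies of the same character. The second sample string is invented for illustration.

```python
import re

# Same character three times, a lazy middle, then the same character three times again
pattern = re.compile(r"(.)\1{2}.+?\1{3}")

samples = [
    "jksdfajsdhfjtrklsdfjaaajjdhjaaa",  # string from the thread
    "xaaa1aaayybbb22bbbz",              # hypothetical string with two matches
]

results = []
for text in samples:
    # finditer yields one match object per non-overlapping match, left to right
    found = [m.group() for m in pattern.finditer(text)]
    results.append(found)
    print(found)
```

This prints ['aaajjdhjaaa'] and then ['aaa1aaa', 'bbb22bbb']. Matches that share a run of characters are not both reported this way, because each match consumes the text it covers.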

ghostdog74
April 19th, 2008, 08:31 PM
you want to return them all? that means from this :

aaa12345aaaxxxxxxaaawerttyewweerwerwerwerwersfdsfs dfsaaadfdfsdfdsfsdf

you want to return

1) aaa12345aaa
2) aaaxxxxxxaaa
3) aaawerttyewweerwerwerwerwersfdsfsdfsaaa
right?

you can use split method like i suggested

for lines in open("file"):
    # replace the marker with "|" and split on it
    lines = lines.replace("aaa","|").split("|")
    print(lines)

output:

# ./test1.py
['', '12345', 'xxxxxx', 'werttyewweerwerwerwerwersfdsfs dfs', 'dfdfsdfdsfsdf']

i leave it to you to join each element back with "aaa". You are using Python, so there's really no need to waste your time setting up regexp. If you are still bent on it,

import re
# the lookahead (?=...) consumes nothing, so matches may share an "aaa"
pat = re.compile(r"(?=aaa(.*?)aaa)")
data = open("file").read()
print(pat.findall(data))

output:

# ./test1.py
['12345', 'xxxxxx', 'werttyewweerwerwerwerwersfdsfs dfs']
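
For the split approach, joining the pieces back with "aaa" could look roughly like this. It is a sketch that assumes the sample line is held in a string rather than read from a file, and that the pieces before the first marker and after the last marker should be discarded.

```python
line = "aaa12345aaaxxxxxxaaawerttyewweerwerwerwerwersfdsfsdfsaaadfdfsdfdsfsdf"

parts = line.split("aaa")
# parts[0] is the text before the first marker and parts[-1] the text after
# the last one; only the pieces in between are enclosed on both sides.
enclosed = ["aaa" + p + "aaa" for p in parts[1:-1]]
print(enclosed)
```

This prints ['aaa12345aaa', 'aaaxxxxxxaaa', 'aaawerttyewweerwerwerwerwersfdsfsdfsaaa'].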

ghostdog74
April 20th, 2008, 06:51 AM
make adjustments to the brackets to include the "aaa"s. i leave it to you.
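
A sketch of that adjustment, assuming the data is held in a string: moving the capturing brackets so they wrap the whole aaa...aaa span makes findall return the markers too. The plain pattern without a lookahead is shown for contrast, since it misses the middle segment when the closing "aaa" of one match is also the opening "aaa" of the next.

```python
import re

data = "aaa12345aaaxxxxxxaaawerttyewweerwerwerwerwersfdsfsdfsaaadfdfsdfdsfsdf"

# Without a lookahead each match consumes its closing "aaa",
# so that "aaa" cannot open the next match.
non_overlapping = re.findall(r"aaa.*?aaa", data)

# The lookahead consumes nothing; the capture group inside it
# now includes both markers.
overlapping = re.findall(r"(?=(aaa.*?aaa))", data)

print(non_overlapping)
print(overlapping)
```

The first list has two entries (aaa12345aaa and the long aaawertt...aaa segment); the second also contains aaaxxxxxxaaa.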

xelapond
April 21st, 2008, 09:50 AM
Thanks everyone, I got it working.