Remove Even Characters from text file



mbstrlbstr
June 3rd, 2010, 08:21 PM
I have a list with random characters inserted in between each relevant character. I need to remove the random characters. I have limited experience with scripting and such, so go easy on me!
Thanks,
Shaun

stylishpants
June 3rd, 2010, 10:47 PM
Here's a little ruby one-liner that throws out every second character on each line.


bob@cob:~$ echo -e "a1b2c3\nd4e5f6\n\ng7h8i9" > file
bob@cob:~$ cat file
a1b2c3
d4e5f6

g7h8i9
bob@cob:~$ ruby -e 'ARGF.readlines.each do |line| line.chars.each_slice(2){|arr| print arr[0]}; end' file
abc
def

ghi


Whether this is a usable solution for you depends on your input data. It would be useful if you posted a representative sample.

mbstrlbstr
June 3rd, 2010, 11:35 PM
Thanks for that. I don't know whether I have ruby installed on here. I have run some python and perl scripts recently, but not ruby. Below is an example of what I'm talking about:


sahdatuhnemycwac
adnhdornexwz
jpulidlweqthtbex

If the script was run, I would want it to look like


shaunmca
andrew
juliette


Thanks!

trent.josephsen
June 4th, 2010, 12:20 AM
sed -e 's/\(.\)./\1/g'

I didn't post it immediately because it treats newlines differently from other characters, but that seems to be o.k.
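
For readers more at home in Python, the same substitution can be sketched with the re module. This is only an illustration of what the sed expression does, not part of the original answer; the function name drop_every_second is made up for the example. Like sed, Python's "." does not match a newline by default, which is the special treatment of newlines mentioned above.

```python
import re


def drop_every_second(line):
    # Match two characters at a time and keep only the first of each pair,
    # mirroring sed's s/\(.\)./\1/g. Because "." does not match "\n",
    # a trailing newline is never consumed as the second half of a pair.
    return re.sub(r"(.).", r"\1", line)


print(drop_every_second("sahdatuhnemycwac"))
```

On a line with an odd number of characters, the last character has no partner to pair with and is left in place.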

mbstrlbstr
June 4th, 2010, 12:25 AM
Is that python?

kaibob
June 4th, 2010, 01:02 AM
I put the words from your post in a file named wordlist.txt and used bash parameter expansion:


#!/bin/bash

# Print every other character (1st, 3rd, 5th, ...) of each line in wordlist.txt
while IFS= read -r word ; do
    for (( i=0 ; i<${#word} ; i=i+2 )) ; do
        printf '%s' "${word:${i}:1}"
    done
    echo
done < wordlist.txt

mbstrlbstr
June 4th, 2010, 01:38 AM
@trent.josephsen
I tried that, and it removed the wrong characters?
Maybe if I inserted a character in front of every line, it would work then?

@kaibob
I entered it like this in my console

cat file.txt | wordscript

and it isn't sure what to do with the wordscript file?

mbstrlbstr
June 4th, 2010, 01:44 AM
Never mind, I changed the bash script to start at 1 instead of 0. It worked great!
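
The effect of starting at 1 instead of 0 can be sketched in Python with a slice offset. This is a rough illustration only: the function name keep_alternate and the second sample string are invented for the example, on the assumption that the real data has the random character first in each pair.

```python
def keep_alternate(word, start=0):
    # start=0 keeps the 1st, 3rd, 5th, ... characters;
    # start=1 keeps the 2nd, 4th, 6th, ... characters.
    return word[start::2]


# Sample from the thread: relevant characters come first in each pair.
print(keep_alternate("sahdatuhnemycwac"))

# Hypothetical data where the random character comes first instead.
print(keep_alternate("xsxhxaxuxn", start=1))
```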

kaibob
June 4th, 2010, 02:19 AM
I have modified my earlier post to make things clearer. The words being manipulated are in the file named wordlist.txt. This text file has to be in the current directory, or you have to provide a path to this file. Then, in a terminal, just enter the name of the script and nothing else.

mo.reina
June 4th, 2010, 05:25 AM
python 3:


f = open('wordlist.txt').readlines()
print([x[::2].replace('\n', '') for x in f])

for Python 2.x:

f = open('wordlist.txt').readlines()
print [x[::2].replace('\n', '') for x in f]

wmcbrine
June 5th, 2010, 01:33 AM
print [x[::2].replace('\n', '') for x in f]

I don't think that does quite what you want.

How about this?


for line in file('wordlist.txt'):
    print line.rstrip()[::2]

mo.reina
June 5th, 2010, 01:55 AM
>>> f = open('wordlist.txt').readlines()
>>> print [x[::2].replace('\n', '') for x in f]
['shaunmca', 'andrew', 'juilette']
>>>


isn't that what the OP wanted?


mo@mo-laptop:~/python/gui$ cat wordlist.txt
sahdatuhnemycwac
adnhdornexwz
jpulidlweqthtbex

wmcbrine
June 5th, 2010, 06:21 AM
isn't that what the OP wanted?

I think the OP wanted


shaunmca
andrew
juilette


not


['shaunmca', 'andrew', 'juilette']

schauerlich
June 5th, 2010, 07:23 AM
>>> print "\n".join(x.strip()[::2] for x in open("wordfile.txt", "r").readlines())
shaunmca
andrew
juilette


Also: obvious homework is obvious

wmcbrine
June 5th, 2010, 05:43 PM
Yeah, I kinda assumed homework, which is why I didn't answer initially. Then again, the OP didn't specify a language.

I think .strip() might be overzealous here (what if there are lines with leading spaces to preserve?), readlines() is not needed, and mode "r" is not needed.


print '\n'.join(line.rstrip()[::2] for line in file('wordfile.txt'))
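
The strip() versus rstrip() point can be checked with a small Python 3 sketch. It works on an in-memory list instead of a file so it runs on its own; the function name unscramble and the indented sample line are assumptions made for the example. It strips only the newline, which is slightly narrower than the bare rstrip() used above.

```python
def unscramble(lines):
    # rstrip("\n") removes only the line ending, so leading (and other
    # trailing) whitespace stays part of the data being sliced.
    return "\n".join(line.rstrip("\n")[::2] for line in lines)


sample = ["sahdatuhnemycwac\n", "adnhdornexwz\n", "jpulidlweqthtbex\n"]
print(unscramble(sample))

# Hypothetical line with leading spaces, to show where the two differ.
indented = "  a1b2\n"
print(repr(indented.strip()[::2]))   # leading spaces removed before slicing
print(repr(indented.rstrip()[::2]))  # leading spaces kept, then sliced
```

With strip(), the leading spaces are gone before the slice is taken, so the pairing of relevant and random characters shifts whenever a line starts with whitespace.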