simple character count script



jakupl
March 30th, 2009, 11:22 PM
I want to make a Faroese Dvorak keyboard layout, but I can't find any list that tells me which characters are used most. Is there a way to write a bash script that takes large amounts of text, counts the instances of each letter (not case sensitive), and maybe sorts them by frequency? Bear in mind that Faroese contains special non-ASCII letters.

jakupl
April 1st, 2009, 12:09 AM
bump

Arndt
April 1st, 2009, 09:08 AM
I would do it in C or Perl. Does it have to be a shell script?

Maybe this old thread will help: http://ubuntuforums.org/showthread.php?t=957610

ghostdog74
April 1st, 2009, 09:32 AM
There are many such scripts lying around on the internet; you just have to search for them.


awk 'BEGIN{FS=""}{for(i=1;i<=NF;i++)a[$i]++}END{for(o in a) {print a[o],o}}' file

jakupl
April 1st, 2009, 08:54 PM
I tried to search, but I didn't find anything. Thanks, I will try this when I get home.

jakupl
April 2nd, 2009, 06:09 PM
This works great. Now I am going to figure out how to get percentages and make it count the special Faroese letters. I would also like the output arranged in order of frequency, most used first, and to exclude spaces, newlines, and tabs, as well as all the strange letters that are shown as a question mark.
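
For anyone landing here later, the wishes in the last post (percentages, case folding, skipping whitespace, most-used first) could be sketched roughly as below. This is only a sketch under some assumptions: `char_freq` is a made-up function name, and the non-ASCII Faroese letters will probably only be counted as whole characters with a multibyte-aware awk such as gawk running in a UTF-8 locale. A byte-oriented awk is likely to split them into separate bytes, which may be where the question marks come from.

```shell
#!/bin/sh
# char_freq: hypothetical helper name, not from the thread.
# Reads text on stdin and prints "count percent character",
# most frequent character first.
char_freq() {
    awk '
    {
        line = tolower($0)                       # fold case so "A" and "a" count together
        for (i = 1; i <= length(line); i++) {
            c = substr(line, i, 1)
            if (c == " " || c == "\t") continue  # skip spaces and tabs; newlines never appear in $0
            count[c]++
            total++
        }
    }
    END {
        # total is only used here, and only when at least one character was counted
        for (c in count)
            printf "%d %.2f%% %s\n", count[c], 100 * count[c] / total, c
    }' | sort -rn
}
```

It would be used as `char_freq < faroese.txt`, where `faroese.txt` stands for whatever large text sample is available. Characters with equal counts come out in whatever order `sort` falls back to, so the order of ties should not be relied on.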