Thread: awk question: grouping 3 columns

  1. #1
    Join Date
    Feb 2013
    Beans
    2

    awk question: grouping 3 columns

    Hi to all,
    I am new to this forum. I have an awk problem.
    The input data looks something like this:

    a1 b1 c1
    a1 b2 c2
    a2 b2 c3
    a3 b4 c1
    a1 b1 c10
    a2 b5 c5
    a3 b1 c4
    a4 b2 c3
    a1 b2 c2
    a2 b2 c3
    a1 b1 c1
    a1 b2 c2
    a1 b2 c2
    a2 b2 c3
    a2 b2 c3
    b1 c1: 2
    The output should look like this:
    a1 b1 c1: 2 times
    ______c10:1 time
    ___b2 c2: 3 times
    a2 b2 c3: 3 times
    and so on.

    It should count the groups for all $i.
    I have already done some simple awk programming, but I can't get this one right.

    Maybe you can help me
    yours
    Petra

  2. #2
    Join Date
    May 2007
    Location
    Leeds, UK
    Beans
    1,664
    Distro
    Ubuntu 13.10 Saucy Salamander

    Re: awk question: grouping 3 columns

    Is this a homework problem? What do you have so far?

    If you are completely stuck, you might want to read up about associative arrays:

    https://www.gnu.org/software/gawk/ma...l#Array-Basics
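As a rough illustration of the idea, here is a minimal sketch that counts how often each value of the first column occurs. It uses only the first five lines of the sample input from post #1; the variable names `input`, `result`, `seen`, and `key` are made up for this example.

```shell
# First five lines of the sample input from post #1.
input=$(mktemp)
printf '%s\n' 'a1 b1 c1' 'a1 b2 c2' 'a2 b2 c3' 'a3 b4 c1' 'a1 b1 c10' > "$input"

# seen[$1]++ creates the element on first use and increments it on later hits.
# "for (key in seen)" visits keys in no particular order, so sort the output.
result=$(awk '{ seen[$1]++ } END { for (key in seen) print key, seen[key] }' "$input" | LC_ALL=C sort)
echo "$result"
rm -f "$input"
```

The same pattern extends to longer keys: anything that can be built into a string can serve as the array subscript.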
    Please create new threads for new questions.
    Please wrap code in code tags using the '#' button or enter it in your post like this: [code]...[/code].

  3. #3
    Join Date
    Feb 2013
    Beans
    2

    Re: awk question: grouping 3 columns

    Hi,
    No, no homework. Those years are long gone, alas...
    I have: awk -F" " '{if(k[$1])k[$1]=k[$1]":"$2 ":" $3; else k[$1]=$2;}END{for (j in k)print j, k[j];}' test

    this gives:
    a1 b1:b2:c2:b3:c10:b2:c2:b1:c1:b2:c2:b2:c2
    a2 b2:b5:c5:b2:c3:b2:c3:b2:c3
    a3 b4:b1:c4
    a4 b2

    But there are strange characters in the output that I can't copy and paste. They are not in the input file, only in the output. I tried changing the character encoding of the terminal, but they remain...

    The first set is always wrong, and I can't get the counting right.


    I now tried something slightly different:
    awk -F":" '{coli[$1,$2,$3]++} END {for (i in coli) print i,":" coli[i]}' test.txt |sort

    This gives
    a1 b1 c1 :2
    a1 b2 c2 :4
    a1 b3 c10 :1
    a2 b2 c3 :4
    a2 b5 c5 :1
    a3 b1 c4 :1
    a3 b4 c1 :1
    a4 b2 c3 :1


    Everything seems fine, but this seems too easy to me. I will have to test it on the real data tomorrow...
    Yours Petra
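Two details of the second command may be worth checking. Because the sample input contains no colons, -F":" leaves each whole line in $1, with $2 and $3 empty. Also, awk joins comma-separated subscripts with the SUBSEP string (an unprintable "\034" by default in gawk), which is one possible source of odd-looking characters when such a key is printed directly. The following is only a sketch, assuming whitespace-separated input and a shortened sample; `count`, `key`, and `part` are illustrative names. It splits each key back into its three fields before printing.

```shell
# Shortened sample in the style of the input from post #1.
input=$(mktemp)
printf '%s\n' 'a1 b1 c1' 'a1 b2 c2' 'a2 b2 c3' 'a1 b1 c1' > "$input"

result=$(awk '
    { count[$1, $2, $3]++ }           # key is $1 SUBSEP $2 SUBSEP $3
    END {
        for (key in count) {
            split(key, part, SUBSEP)  # recover the three fields from the key
            print part[1], part[2], part[3] ": " count[key]
        }
    }' "$input" | LC_ALL=C sort)
echo "$result"
rm -f "$input"
```

With the default field separator the three columns are counted as a group, and no SUBSEP characters reach the output.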

  4. #4
    Join Date
    Feb 2008
    Beans
    251
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: awk question: grouping 3 columns

    Hi!

    Maybe I'm oversimplifying it, but it almost seems like you might be able to do it by using the commands:

    Code:
    sort file.txt | uniq -c
    Which will tell you the count of each unique line in file.txt

    Would that work?

    Cheers,
    Gp
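For reference, a small sketch of that pipeline on a shortened sample. `uniq -c` only merges adjacent duplicate lines, which is why `sort` has to come first. The trailing awk step is an optional addition, not part of the suggestion above: it moves the count from the front of each line to the end. The names `input`, `result`, and `n` are made up for this example.

```shell
# Shortened sample in the style of the input from post #1.
input=$(mktemp)
printf '%s\n' 'a1 b1 c1' 'a1 b2 c2' 'a2 b2 c3' 'a1 b1 c1' > "$input"

# sort groups identical lines together; uniq -c prefixes each with its count.
# The awk step saves the count, strips it from the line, and appends it.
result=$(LC_ALL=C sort "$input" | uniq -c |
    awk '{ n = $1; sub(/^[ \t]*[0-9]+[ \t]+/, ""); print $0 ": " n }')
echo "$result"
rm -f "$input"
```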

  5. #5
    Join Date
    May 2007
    Location
    Leeds, UK
    Beans
    1,664
    Distro
    Ubuntu 13.10 Saucy Salamander

    Re: awk question: grouping 3 columns

    I think this does it in awk, if you want control over the output:

    Code:
    awk '{++count[$0]} END {for (pattern in count) printf("%-12s: %d times\n", pattern, count[pattern])}'
    For example:

    Code:
    $ cat input
    a1 b1 c1 
    a1 b2 c2 
    a2 b2 c3 
    a3 b4 c1 
    a1 b1 c10 
    a2 b5 c5 
    a3 b1 c4 
    a4 b2 c3 
    a1 b2 c2 
    a2 b2 c3 
    a1 b1 c1 
    a1 b2 c2 
    a1 b2 c2 
    a2 b2 c3 
    a2 b2 c3 
    $ awk '{++count[$0]} END {for (pattern in count) printf("%-12s: %d times\n", pattern, count[pattern])}' input
    a2 b5 c5    : 1 times
    a3 b1 c4    : 1 times
    a2 b2 c3    : 4 times
    a1 b1 c10   : 1 times
    a1 b2 c2    : 4 times
    a1 b1 c1    : 2 times
    a3 b4 c1    : 1 times
    a4 b2 c3    : 1 times
    So it makes an associative array 'count' using the whole input line as the key, incrementing the value in the associative array on each occurrence. Then the END block iterates over the array and prints each key and its value.

    EDIT: That's basically what you have in your second attempt. I just didn't read it properly.
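The grouped layout asked for in post #1, where a repeated first or second column is left blank, could be approached along these lines. This is only a sketch under a few assumptions: exactly three whitespace-separated columns, the sample input from post #1, and a plain byte-order sort. The names `count`, `prev1`, `prev2`, `a`, and `b` are illustrative.

```shell
# Sample input from post #1.
input=$(mktemp)
cat > "$input" <<'EOF'
a1 b1 c1
a1 b2 c2
a2 b2 c3
a3 b4 c1
a1 b1 c10
a2 b5 c5
a3 b1 c4
a4 b2 c3
a1 b2 c2
a2 b2 c3
a1 b1 c1
a1 b2 c2
a1 b2 c2
a2 b2 c3
a2 b2 c3
EOF

# Step 1: count each (col1, col2, col3) group. Rebuilding the key from the
#         fields ignores stray leading or trailing whitespace on a line.
# Step 2: sort so that rows sharing leading columns become adjacent.
# Step 3: blank out a column when it repeats the value of the previous row.
result=$(awk '{ count[$1 " " $2 " " $3]++ }
              END { for (k in count) print k, count[k] }' "$input" |
    LC_ALL=C sort |
    awk '{
        a = ($1 == prev1) ? "" : $1
        b = ($1 == prev1 && $2 == prev2) ? "" : $2
        printf "%-3s %-3s %-4s: %d\n", a, b, $3, $4
        prev1 = $1; prev2 = $2
    }')
echo "$result"
rm -f "$input"
```

Sorting between the two awk steps keeps the second one simple, since it only has to compare each row with the one before it.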
    Last edited by r-senior; February 27th, 2013 at 07:34 PM.

  6. #6
    Join Date
    Apr 2011
    Location
    Maryland
    Beans
    1,461
    Distro
    Kubuntu 12.04 Precise Pangolin

    Re: awk question: grouping 3 columns

    If you're not fixed on using Awk for this, you could do it fairly easily in Perl (with the same exact strategy, I should point out):

    Code:
    #!/usr/bin/perl
    
    use warnings;
    use strict;
    
    my %lineCount;
    
    while(<>) {
            chomp;
            $lineCount{$_}++;
    }
    
    foreach my $key ( keys %lineCount ) {
            my $count = $lineCount{$key};
            print "$key: $count times\n";
    }
    Assuming your input file is called 'file.txt' just run it like so:

    Code:
     perl count.pl file.txt

  7. #7
    Join Date
    Aug 2011
    Location
    47°9′S 126°43′W
    Beans
    2,165
    Distro
    Kubuntu 14.04 Trusty Tahr

    Re: awk question: grouping 3 columns

    Originally posted by greenpeace:
    Hi!

    Maybe I'm oversimplifying it, but it almost seems like you might be able to do it by using the commands:

    Code:
    sort file.txt | uniq -c
    Which will tell you the count of each unique line in file.txt

    Would that work?

    Cheers,
    Gp
    I was going to suggest the same...
