Page 1 of 2 12 LastLast
Results 1 to 10 of 12

Thread: Import log flie data - shell script

  1. #1
    Join Date
    Jun 2010
    Location
    Manchester
    Beans
    88
    Distro
    Ubuntu 10.04 Lucid Lynx

    Import log flie data - shell script

    hi

    Can anyone start me off or point me in the right direction as to:

    Create a command-line script that will import log file data into the application. The application will be a PHP/MySQL one.

    An example log line:

    Code:
    VOTE 1168241980 Campaign:ssss_uk_01B Validity:during Choice:Tupele CONN:MIG00VU MSISDN:00088866655598 GUID:A34-FDS87-FHDHDH-DHDHDHD0 Shortcode:63334
    I've been very brief here for obvious reasons and there's a lot more to it than I've let on.

    It's an assignment and I don't want to ask "Will you do it for me?" rather, start me off or direct me to a good tutorial for #!/bin/bash beginners. That's assuming #!/bin/bash is the best solution; as of yet I don't even know this much.

  2. #2
    Join Date
    Sep 2006
    Beans
    7,698
    Distro
    Lubuntu Development Release

    perl

    That's the kind of thing I would use perl for. The package libdbd-mysql-perl gives you the module DBD::mysql which allows you to work with MySQL via an API. Nothing is better than perl at parsing log files.

    There are other modules for other languages which will provide you an API for MySQL, but this is the kind of task I would use perl for.

  3. #3
    Join Date
    Jun 2010
    Location
    Manchester
    Beans
    88
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: perl

    Quote Originally Posted by Lars Noodén View Post
    That's the kind of thing I would use perl for. The package libdbd-mysql-perl gives you the module DBD::mysql which allows you to work with MySQL via an API. Nothing is better than perl at parsing log files.

    There are other modules for other languages which will provide you an API for MySQL, but this is the kind of task I would use perl for.
    Interesting Lars....

  4. #4
    Join Date
    Jun 2010
    Location
    Manchester
    Beans
    88
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Import log flie data - shell script

    hi

    I need to read data in from a file, parse it, then insert each record into a mysql database in Perl.

    So far I have:

    Code:
     #!/usr/bin/perl 
    
    # perl_test.pl
    
      use warnings;
      use strict;
      use DBI();
    
     # Connect to the database.
     my $dbh = DBI->connect("DBI:mysql:database=demo;host=localhost",
                            "cookie", "*********",
                            {'RaiseError' => 1});
    
    open (LOGFILE, '/home/cookie/Desktop/data.txt') or die("Could not open log file.");
    
     foreach my $line (<LOGFILE>) {
    
        my $name = $1;
        my $age = $2;
        
        ($name, $age) = split(' ', $line);
        
        my $query = 'INSERT INTO perl_test (name, age) VALUES ("$name","$age")';
        my $sth = $dbh->prepare( $query );
        $sth->execute();
    
        #print "$name $age\n";  debugging - works!
    
     }
    
    close (LOGFILE);
    // data.txt
    Code:
    Peter 34
    Andy 23
    Steven 29
    Jane 67
    Laura 17
    When I check the database the script doesn't insert the dynamic ($name, $age) values - rather the actual variables themselves:

    Code:
    mysql> select * from perl_test;
    +----+---------+------+
    | id | name    | age  |
    +----+---------+------+
    |  1 | Peter   |   76 |
    |  2 | $name   |    0 |
    |  3 | $name   |    0 |
    |  4 | $name   |    0 |
    |  5 | $name   |    0 |
    |  6 | $name   |    0 |
    |  7 | $name   |    0 |
    +----+---------+------+
    13 rows in set (0.01 sec)
    Ignore the first value in the table below; I did that manually.

    I'm pretty sure to interpolate values in strings we use double quotes within MySQL or am I missing a Perl syntax quirk?




    A clue would be good?

  5. #5
    Join Date
    Sep 2006
    Beans
    7,698
    Distro
    Lubuntu Development Release

    Re: Import log flie data - perl script

    It is the single quotes that are throwing off your statement. For that reason, qq,qw and q operators are preferred around the base statement. That allows you to use single and double quotes inside the statement without trouble.

    Try this line instead:
    Code:
    my $query = qq(INSERT INTO perl_test (name, age) VALUES ("$name","$age"));
    Nice that you picked up prepare/execute already. That helps in several areas. If you take it further, you can also use place holders for the variables in the prepared statement.

  6. #6
    Join Date
    Sep 2006
    Beans
    7,698
    Distro
    Lubuntu Development Release

    Re: Import log flie data - perl script

    Here are two examples of placeholders.

    http://stackoverflow.com/questions/1...one-mysql-call

    http://stackoverflow.com/questions/1...se-in-perl-dbi

    The query only needs to be created / prepared once then, but it has other advantages a far as string management goes.
    Last edited by Lars Noodén; October 30th, 2012 at 10:18 AM.

  7. #7
    Join Date
    Jun 2010
    Location
    Manchester
    Beans
    88
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Import log flie data - perl script

    Quote Originally Posted by Lars Noodén View Post
    Try this line instead:
    Thanks Lars

  8. #8
    Join Date
    Jun 2010
    Location
    Manchester
    Beans
    88
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Import log flie data - shell script

    OK, now I need to scale it up a little. This part is a small part of a bigger problem - full details here: http://acookson.org/sms-voting-campaign/

    My elementary script now loops through a file and picks out selected field values with the use of the substr construct:

    Code:
     #!/usr/bin/perl 
    
      use warnings;
      use strict;
      use DBI();
    
     # Connect to the database.
     my $dbh = DBI->connect("DBI:mysql:database=voting_campaigns;host=localhost",
                            "cookie", "*********",
                            {'RaiseError' => 1});
    
    open (LOGFILE, '/var/www/voting/votes.txt') or die("Could not open log file.");
    
     foreach my $line (<LOGFILE>) {
    
        my $vote; 
        my $epoch; 
        my $campaign; 
        my $validity; 
        my $choice; 
        my $CONN; 
        my $MSISDN; 
        my $GUID; 
        my $Shortcode;    
        
        ($vote, $epoch, $campaign, $validity, $choice, $CONN, $MSISDN, $GUID, $Shortcode) = split(' ', $line);
    
        # parse the field:value entries...
        $campaign = substr $campaign, 9, 11, ''; # ssss_uk_01C
        $validity = substr $validity, 9, 6, ''; # during
        $choice = substr $choice, 7, 10, ''; # Brown
    
        my $sth = $dbh->prepare("INSERT INTO voting (vote, epoch, campaign, validity, choice, 
        CONN, MSISDN, GUID, Shortcode) VALUES (?,?,?,?,?,?,?,?,?)");
    
        $sth->execute($vote, $epoch, $campaign, $validity, $choice, $CONN, $MSISDN, $GUID, $Shortcode);
        
        # debug
        print "$campaign $validity $choice \n"; # ssss_uk_01B during Green 
    
     }
    
    close (LOGFILE);
    All good so far, the script executes successfully. Now I need to focus on:

    Any lines that are not well-formed should be discarded. Your script should assume the data is in uncompressed plain text.

    A brief explanation of what we would need to consider if we wanted to import a large number of similar files (millions of votes)

    - ‘Choice:’ indicates the candidate the user is voting for. In every campaign there will be a limited set of candidates that can be voted for. If Choice is blank, it means the system could not identify the chosen candidate from the text of the SMS message.

    All such messages should be counted together as ‘errors’, irrespective of their Validity values. There is a limited set of values for ‘Choice’, each of which represents a candidate who can be voted for.

    I'm thinking a regular expression is the way forward:

    Code:
        my $MSISDN; 
        my $GUID; 
        my $Shortcode;
    
      # Skip lines that have empty field values... 
    
        next if !~ /^VOTE\sCampaign:\w{11,}\sValidity:\w{6,}\sChoice:\w{2,30}\s $/;
        
        # Keep count of errors or malformed lines
        my $errors++;
    
    ($vote, $epoch, $campaign, $validity, $choice, $CONN, $MSISDN, $GUID, $Shortcode) = split(' ', $line);
    
        # parse the field:value entries...
        $campaign = substr $campaign, 9, 11, ''; # ssss_uk_01C
    I've not ran it yet. How bad is it? I have my doubts over the

    Code:
    next if
    my $errors++;
    construct, It's not going to fly is it? I'd expect $errors to increment regardless of the next statement. A more logical way for me would be:

    Code:
    my $regex = /^VOTE\sCampaign:\w{11,}\sValidity:\w{6,}\sChoice:\w{2,30}\s $/;
    if(!$regex){
      # bad line, increment $errors..
     $errors++;
    # skip to next line
     next
    }
     #continue loop
    but I'm not using Perl now, I'm using psuedo code.

  9. #9
    Join Date
    Sep 2006
    Beans
    7,698
    Distro
    Lubuntu Development Release

    Re: Import log flie data - shell script

    I think the prepare statement can probably go outside the loop in principle.

    Also, the multiple my declarations can be applied when the variables are first used. But that's more a style thing. Both are "right"

    Code:
    my ($vote, $epoch, $campaign, $validity, $choice, $CONN, $MSISDN, $GUID, $Shortcode) = split(' ', $line);
    If none of the values will be blank or zero (0), then you could check like this:

    Code:
    if ($vote && $epoch && $campaign && $validity && $choice && $CONN && $MSISDN && $GUID && $Shortcode) {
      $errors++;
      next;
    }
    Remember that 'my' can make a variable local to a loop or statement. So errors should probably be declared at the beginning of the program so it can be used at the end of the program.

    Code:
    my $a = 1;
    print qq(a1=$a\n);
    if ( $a ) {
     my $a = 2;
     print qq(a2=$a\n);
    }
    print qq(a3=$a\n);

  10. #10
    Join Date
    Jun 2010
    Location
    Manchester
    Beans
    88
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: Import log flie data - shell script

    Quote Originally Posted by Lars Noodén View Post
    I think the prepare statement can probably go outside the loop in principle.
    Indeed it's better there me thinks.

    Also, the multiple my declarations can be applied when the variables are first used. But that's more a style thing. Both are "right"
    It certainly makes the code less bloated.

    Remember that 'my' can make a variable local to a loop or statement. So errors should probably be declared at the beginning of the program so it can be used at the end of the program.
    Thanks, I'll have to keep my eyes on scope. I'm finding Perl has syntax very similar to PHP.

    My revised code now looks like:

    Code:
     #!/usr/bin/perl 
    
      use warnings;
      use strict;
      use DBI();
    
     # Connect to the database.
     my $dbh = DBI->connect("DBI:mysql:database=voting_campaigns;host=localhost",
                            "cookie", "*********",
                            {'RaiseError' => 1});
    
        my $sth = $dbh->prepare("INSERT INTO voting (vote, epoch, campaign, validity, choice, 
        CONN, MSISDN, GUID, Shortcode) VALUES (?,?,?,?,?,?,?,?,?)");
    
        open (LOGFILE, '/var/www/voting/votes.txt') or die("Could not open log file.");
    
        my $errors = 0;
        
     foreach my $line (<LOGFILE>) {    
        
            my ($vote, $epoch, $campaign, $validity, $choice, $CONN, $MSISDN, $GUID, $Shortcode) = split(' ', $line);
    
        # parse the field:value entries...
        $campaign = substr $campaign, 9, 11, ''; # ssss_uk_01C
        $validity = substr $validity, 9, 6, ''; # during
        $choice = substr $choice, 7, 10, ''; # Brown
    
        if ($vote && $epoch && $campaign && $validity && $choice && $CONN && $MSISDN && $GUID && $Shortcode) {
    
              $sth->execute($vote, $epoch, $campaign, $validity, $choice, $CONN, $MSISDN, $GUID, $Shortcode);
            # debug
            print "$campaign $validity $choice \n"; # ssss_uk_01B during Green
            next;
        } 
    
        $errors++;
    
     }
    
    close (LOGFILE);
    
    
    # debug
    print qq(errors=$errors\n);
    And if I edit votes.txt with null values - the script reports them:

    Code:
    ssss_uk_01B during Tupele 
    ssss_uk_01B pre Joseph 
    ssss_uk_01B during Balls 
    ssss_uk_06B during Balls 
    ssss_uk_01B during Brown 
    ssss_uk_01D pre Boring 
    ssss_uk_01B during Green 
    errors=3
    I think I'm pretty close to the deliverables required (for this part of assignment). Thanks again Lars.

Page 1 of 2 12 LastLast

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •