Thread: How Can I make a POSIX/C regex non-greedy?

  1. #1
    Join Date
    Sep 2008
    Location
    RPI
    Beans
    52
    Distro
    Ubuntu 10.04 Lucid Lynx

    How Can I make a POSIX/C regex non-greedy?

    I have some code that tries to extract part of a URL: the part between the http:// and the first /. Like this:

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <string.h>
    #include <regex.h>
    
    int main (int argc, char **argv)
    {
            regex_t regex;
            int reti;
            char msgbuf[100];
            size_t nmatch = 2;
            regmatch_t pmatch[2];
    
            /* Match everything between http:// and
             * the first / */
            reti = regcomp (&regex, "http://\\([^/]*\\)/", 0);
            if (reti) {
                    fprintf (stderr, "Failed to compile regular expression");
                    exit (EXIT_FAILURE);
            }
            reti = regexec (&regex, argv[1], nmatch, pmatch, 0);
    
            char *match = strndup(argv[1]+pmatch[1].rm_so, pmatch[1].rm_eo);
    
            printf ("%s\n", match);
    
            return 0;
    }
    Assume argv[1] is something like http://www.google.com/stuff. I am trying to match only the www.google.com part. Even with the [^/] in there, it still matches everything (i.e., www.google.com/stuff). I appreciate any help anyone can provide.

  2. #2
    Join Date
    May 2006
    Beans
    1,790

    Re: How Can I make a POSIX/C regex non-greedy?

    "so" and "eo" sound like they could mean "start offset" and "end offset". You use "so" that way, but you use "eo" as if it was a count.

  3. #3
    Join Date
    Jun 2007
    Location
    Canada
    Beans
    370

    Re: How Can I make a POSIX/C regex non-greedy?

    Have you tried using "?" to make the selection lazy?

    Code:
    reti = regcomp (&regex, "http://\\([^/]*?\\)/", 0);
    I've never worked with regexes in C, but in other languages that works.

  4. #4
    Join Date
    Sep 2008
    Location
    RPI
    Beans
    52
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: How Can I make a POSIX/C regex non-greedy?

    Yeah, the ? only works for Perl-style regexes, not for POSIX ones.
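
    Since POSIX regexes have no lazy quantifiers (in a basic regex a bare ? is just an ordinary character), the usual workaround is a negated bracket expression like the [^/]* the first post already uses. A small sketch for comparing a greedy .* against [^/]*, assuming a hypothetical helper named first_group:

```c
#define _POSIX_C_SOURCE 200809L  /* for strndup */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compiles pattern as a POSIX basic regex and returns
 * a malloc'd copy of the first \( \) group matched in text, or NULL if the
 * pattern does not compile, does not match, or the group did not take part
 * in the match. The caller frees the result. */
char *first_group(const char *pattern, const char *text)
{
    regex_t regex;
    regmatch_t pmatch[2];
    char *group = NULL;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&regex, pattern, 0) != 0)
        return NULL;

    /* rm_so is -1 when the group did not take part in the match. */
    if (regexec(&regex, text, 2, pmatch, 0) == 0 && pmatch[1].rm_so != -1) {
        size_t len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
        group = strndup(text + pmatch[1].rm_so, len);
    }

    regfree(&regex);
    return group;
}
```

    For the input http://www.google.com/a/b, the pattern "http://\\(.*\\)/" captures www.google.com/a, because the match extends to the last /, while "http://\\([^/]*\\)/" captures only www.google.com, because the group cannot cross a /.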
