
Thread: Conditional Statements in "R"

  1. #1
    Join Date
    Aug 2009
    Beans
    5

    Conditional Statements in "R"

    Can someone tell me how to execute this SAS statement in "R"?
    if team="Texas" then team="TEX";
    I tried:

    if (team=="Texas") {team <- "TEX"}
    but apparently that only works for numeric values.

  2. #2
    Join Date
    Dec 2008
    Beans
    67
    Distro
    Ubuntu 8.10 Intrepid Ibex

    Re: Conditional Statements in "R"

    That should work fine. What error do you get?

  3. #3
    Join Date
    Jul 2006
    Location
    Germany
    Beans
    1,805

    Re: Conditional Statements in "R"

    It works fine for me.
    Code:
    > team <- "Texas"
    > team
    [1] "Texas"
    > if (team=="Texas") {team <- "TEX"} 
    > team
    [1] "TEX"
    ...

  4. #4
    Join Date
    Oct 2005
    Location
    Albany, NY
    Beans
    828
    Distro
    Ubuntu

    Re: Conditional Statements in "R"

    WARNING: The following comments are written by someone who does NOT know SAS.

    It depends on what you are trying to do. I know SPSS tends to handle variables in a manner very different from R. You can have a variable such as:

    Code:
    team <- "Texas"
    Here, team is a simple variable. With a basic variable, if() will work.

    Code:
    if(team=="Texas") team <- "TEX"
    But, if team looks like this:

    Code:
    team <- c("Texas", "Georgia", "Alabama")
    Now team is a vector. If you try to run if against it, it will fail.
    Code:
    Warning message:
    In if (team == "Texas") team <- "TEX" :
      the condition has length > 1 and only the first element will be used
    Now my vector variable, team, which used to have three values, has only one. In this case it is "TEX", because my if() statement evaluated to TRUE: "Texas" was the first item in the vector, and if() looks only at the first element. But that's probably NOT what you are trying to do. I suspect you want to replace ALL values of "Texas" with "TEX" in a vector, not in a single variable. To do this, we need to vectorize.

    Code:
    team[team=="Texas"]<-"TEX"
    This will replace the value of "Texas" with "TEX" in the vector team. This would be the "best" way to do this in R. You could also do this:

    Code:
    team<-ifelse(team=="Texas","TEX", team)
    Essentially, these two say the same thing. If you spend some time lurking on the R-users mailing list, you will see that the first solution is the one experienced R programmers prefer. It is slightly more compact and easy to understand once you understand how R uses vector logic. If you are coming to R from a "traditional" programming language (I don't know if SAS works this way or not), this whole vector thing will seem a little weird to you for a while. It is WELL worth your time to read something like "An Introduction to R" or simpleR to get a basic understanding of how R operates.
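
    To make the comparison concrete, here is a minimal sketch of both approaches on a plain character vector. The sample values are hypothetical, not taken from anyone's real data. Note that in R 4.2.0 and later, passing a condition of length greater than one to if() is an error rather than a warning, so the vectorized forms are the only ones that run at all.

    ```r
    # Hypothetical sample data (a character vector, not a factor)
    team <- c("Texas", "Georgia", "Alabama", "Texas")

    # Approach 1: logical indexing replaces only the matching elements
    by_index <- team
    by_index[by_index == "Texas"] <- "TEX"

    # Approach 2: ifelse() builds a new vector element by element
    by_ifelse <- ifelse(team == "Texas", "TEX", team)

    print(by_index)
    print(identical(by_index, by_ifelse))
    ```

    Both forms leave the other elements alone. The indexing form modifies a copy in place, while ifelse() returns a fresh vector that you have to assign back yourself.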
    Please Insert Funny Statement Here.

  5. #5
    Join Date
    Aug 2009
    Beans
    5

    Re: Conditional Statements in "R"

    Quote Originally Posted by gunksta View Post
    But, if team looks like this:

    Code:
    team <- c("Texas", "Georgia", "Alabama")
    Now team is a vector. If you try to run if against it, it will fail.
    Code:
    Warning message:
    In if (team == "Texas") team <- "TEX" :
      the condition has length > 1 and only the first element will be used
    This is the message that I got. Team is a vector.

    Now my vector variable, team, which used to have three values, has only one. In this case it is "TEX", because my if() statement evaluated to TRUE: "Texas" was the first item in the vector, and if() looks only at the first element. But that's probably NOT what you are trying to do. I suspect you want to replace ALL values of "Texas" with "TEX" in a vector, not in a single variable. To do this, we need to vectorize.

    Code:
    team[team=="Texas"]<-"TEX"
    This will replace the value of "Texas" with "TEX" in the vector team. This would be the "best" way to do this in R.
    I tried this, and I got:
    Code:
    Warning message:
    In `[<-.factor`(`*tmp*`, Team == "Texas", value = "TEX") :
      invalid factor level, NAs generated
    Code:
    team<-ifelse(team=="Texas","TEX", team)
    Essentially, these two say the same thing. If you spend some time lurking on the R-users mailing list, you will see that the first solution is the one experienced R programmers prefer. It is slightly more compact and easy to understand once you understand how R uses vector logic. If you are coming to R from a "traditional" programming language (I don't know if SAS works this way or not), this whole vector thing will seem a little weird to you for a while. It is WELL worth your time to read something like "An Introduction to R" or simpleR to get a basic understanding of how R operates.
    Maybe I should do some studying on vectors, because I'm a little unclear. Thanks very much for all your help!

  6. #6
    Join Date
    Oct 2005
    Location
    Wyoming, USA
    Beans
    484
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: Conditional Statements in "R"

    I will make one more assumption: that you are not simply dealing with a vector, but with a data frame.
    Code:
    team<-rep(c("Texas","Nebraska","Oklahoma"),4)
    awesomeness<-rep(c("sucks","greatest","losers"),4)
    data.frame(team,awesomeness)->team.dat
    In this scenario, gunksta's preferred method will give you a warning and NAs if data.frame() has converted team to a factor (although I agree it is the best method for dealing with plain vectors). However, the ifelse statement will still work if slightly modified:
    Code:
    team.dat # before
    team.dat$team<-as.character(team.dat$team) # factor -> character, if needed
    team.dat$team<-ifelse(team.dat$team=="Texas","TEX",team.dat$team)
    team.dat$team<-ifelse(team.dat$team=="Nebraska","NEB",team.dat$team)
    team.dat$team<-ifelse(team.dat$team=="Oklahoma","OKL",team.dat$team)
    team.dat # after
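
    When several names need recoding at once, a named lookup vector can stand in for the chain of ifelse() calls. This is only a sketch of one alternative: the name codes and the helper variable hit are hypothetical, and the data frame is rebuilt here with stringsAsFactors = FALSE so that team is a character column.

    ```r
    # Hypothetical data frame mirroring the example above
    team.dat <- data.frame(
      team = rep(c("Texas", "Nebraska", "Oklahoma"), 4),
      awesomeness = rep(c("sucks", "greatest", "losers"), 4),
      stringsAsFactors = FALSE  # keep team as character, not factor
    )

    # Named lookup vector: names are the old values, elements are the new codes
    codes <- c(Texas = "TEX", Nebraska = "NEB", Oklahoma = "OKL")

    # Recode only the rows whose team appears in the lookup
    hit <- team.dat$team %in% names(codes)
    team.dat$team[hit] <- unname(codes[team.dat$team[hit]])

    print(head(team.dat, 3))
    ```

    Rows whose team is not listed in codes are left untouched, which matters for a table that already mixes abbreviations and full names.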
    This looks nothing like my signature...

  7. #7
    Join Date
    Oct 2005
    Location
    Albany, NY
    Beans
    828
    Distro
    Ubuntu

    Re: Conditional Statements in "R"

    akniss - good point.

    chino cochino - Could you post the syntax you are using to create team? I played around with a couple of ideas, but couldn't find a way to get the same error you are. If we knew for sure how you were creating the variable/vector/data.frame/whatever called team, it would help.
    Please Insert Funny Statement Here.

  8. #8
    Join Date
    Mar 2007
    Location
    Finland
    Beans
    256
    Distro
    Ubuntu 9.10 Karmic Koala

    Re: Conditional Statements in "R"

    chino cochino,
    You get the error because your data type is factor, not string. The easiest way to recode a factor is to use the factor command and specify new labels. See ?factor. I'd look up the syntax for you, but I don't have access to R right now.
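
    For readers who want to see what recoding a factor can look like, here is a small sketch with hypothetical values. It renames a level through levels() rather than through the labels argument of factor(), so it is one possible way to do it and not necessarily the syntax meant above. Converting to character first is the other common route.

    ```r
    # Hypothetical factor, like a text column that was read in as a factor
    team <- factor(c("Texas", "BOS", "Colorado", "Texas"))

    # Option 1: rename the level itself; the underlying integer codes stay untouched
    recoded <- team
    levels(recoded)[levels(recoded) == "Texas"] <- "TEX"

    # Option 2: convert to character, then use ordinary vector replacement
    as_text <- as.character(team)
    as_text[as_text == "Texas"] <- "TEX"

    print(levels(recoded))
    print(as_text)
    ```

    Assigning "TEX" directly into the factor fails because "TEX" is not yet one of its levels, which is what the "invalid factor level" warning is reporting.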

  9. #9
    Join Date
    Aug 2009
    Beans
    5

    Re: Conditional Statements in "R"

    Quote Originally Posted by gunksta View Post
    akniss - good point.

    chino cochino - Could you post the syntax you are using to create team? I played around with a couple of ideas, but couldn't find a way to get the same error you are. If we knew for sure how you were creating the variable/vector/data.frame/whatever called team, it would help.
    OK, this is what I did. I pretty much imported it from a CSV file:

    Code:
    runs <-read.csv("c:/baseball09/runs.csv")
    > runs
                Team  W  L  G AEQR AEQRA    EQRPG   EQRAPG
    1         Tampa Bay 52 44 96  542   429 5.645833 4.468750
    2  New York Yankees 58 37 95  540   453 5.684211 4.768421
    3               LAA 56 38 94  531   483 5.648936 5.138298
    4               BOS 55 39 94  500   426 5.319149 4.531915
    5               CLE 38 58 96  482   535 5.020833 5.572917
    6               TOR 47 49 96  473   442 4.927083 4.604167
    7               MIN 48 48 96  473   449 4.927083 4.677083
    8               PHI 54 39 93  472   449 5.075269 4.827957
    9             Texas 52 41 93  469   442 5.043011 4.752688
    10         Colorado 52 43 95  467   424 4.915789 4.463158
    11              LAD 61 34 95  464   368 4.884211 3.873684
    12              BAL 41 53 94  445   490 4.734043 5.212766
    13              MIL 48 47 95  443   480 4.663158 5.052632
    14              CHW 50 45 95  442   438 4.652632 4.610526
    15              ARZ 41 55 96  439   460 4.572917 4.791667
    16       Washington 28 67 95  430   493 4.526316 5.189474
    17              DET 49 44 93  412   424 4.430108 4.559140
    18              ATL 49 47 96  408   382 4.250000 3.979167
    19              OAK 40 54 94  407   429 4.329787 4.563830
    20              HOU 49 46 95  404   442 4.252632 4.652632
    21              STL 52 46 98  402   389 4.102041 3.969388
    22              SEA 51 44 95  401   377 4.221053 3.968421
    23              FLA 49 47 96  394   421 4.104167 4.385417
    24              NYM 44 50 94  393   412 4.180851 4.382979
    25              PIT 42 53 95  389   442 4.094737 4.652632
    26              CHC 48 45 93  378   388 4.064516 4.172043
    27               KC 37 57 94  376   429 4.000000 4.563830
    28               SD 37 59 96  369   461 3.843750 4.802083
    29       Cincinnati 44 50 94  362   428 3.851064 4.553191
    30               SF 51 44 95  351   374 3.694737 3.936842
    
    > attach(runs)
    > if (Team=="Texas") {Team<-"TEX"}
    Warning message:
    In if (Team == "Texas") { :
      the condition has length > 1 and only the first element will be used
    I'm still stuck in my SAS way of thinking.
    Last edited by chino cochino; August 15th, 2009 at 11:03 PM.

  10. #10
    Join Date
    Oct 2005
    Location
    Albany, NY
    Beans
    828
    Distro
    Ubuntu

    Re: Conditional Statements in "R"

    I should have asked for your syntax up front. It would have saved a lot of time. The attached .r file shows two different ways to change "Texas" to "TEX". For consistency, I created a data file from chino cochino's posted syntax. On my computer, the syntax in this .r file works fine. YMMV.

    The attached syntax is documented, but there are a couple of things I should point out. attach() is a tricky command to use. I tend to avoid it. It seems nice at first, but many consider it to be a pain in the tail.
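
    As a rough illustration of working without attach(), the sketch below refers to the column through the data frame. It is not the content of the attached .r file. Three rows from the posted table are inlined so that it runs on its own, and stringsAsFactors = FALSE is an assumption made here to keep Team as a character column.

    ```r
    # Three rows from the posted table, inlined so the sketch is self-contained
    csv_text <- paste("Team,W,L", "Tampa Bay,52,44", "Texas,52,41",
                      "Colorado,52,43", sep = "\n")
    runs <- read.csv(text = csv_text, stringsAsFactors = FALSE)

    # Refer to the column through the data frame instead of attach().
    # After attach(runs), an assignment like Team <- "TEX" creates a separate
    # variable in the workspace and leaves runs$Team unchanged.
    runs$Team[runs$Team == "Texas"] <- "TEX"

    print(runs)
    ```

    With a real file, the same read.csv() call would take the file path in place of the text argument.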

    Finally, I should mention that R is a funky little language. It is _very_ good at some things, and less so at others. R is perfectly good at doing this, but it's not necessarily the first tool I would reach for. I avoid doing complex data munging and data manipulation in R itself. There are times when some of R's structures and conveniences get in the way of data munging (in my opinion). For example, this task could easily be done with sed rather than with R, since the data already exists as a text file.

    Here's my "typical" workflow: I usually start with some kind of .csv or .txt file (tab-delimited). When life sucks, I'll start with an Access database. From there I make any basic alterations to the data, such as changing "Texas" to "TEX", while the data is still in plain text format. Once this is done, I import the data into PostgreSQL. Postgres does a terrific job of managing the data. This is an unnecessary step with small data sets, but most of my data sets are quite a bit larger than this one. R is easy to connect to Postgres, and I import into R only the data that I want to analyze. The rest of it can stay in Postgres. This works especially well since I have a separate Postgres server at work and I just run R on my aging laptop. It all works faster when I can throw multiple processors at the problem.

    Let me know if this syntax works for you.

    Note: The syntax and the .csv are in the tar.gz attachment.
    Last edited by gunksta; August 17th, 2009 at 02:02 AM. Reason: Thanks to unutbu, I fixed a silly typo in the runs.r file.
    Please Insert Funny Statement Here.
