Can someone tell me how to execute this SAS statement in "R"?
Code:
if team="Texas" then team="TEX";
I tried:
Code:
if (team=="Texas") {team <- "TEX"}
but apparently that only works for numeric values.
That should work fine. What error do you get?
It works fine for me.
Code:
> team <- "Texas"
> team
[1] "Texas"
> if (team=="Texas") {team <- "TEX"}
> team
[1] "TEX"
...
WARNING: The following comments are written by someone who does NOT know SAS.
It depends on what you are trying to do. I know SPSS tends to handle variables in a manner very different from R. You can have a variable such as:
Code:
team <- "Texas"
Here, team is a simple variable. With a basic variable, if() will work.
Code:
if(team=="Texas") team <- "TEX"
But, if team looks like this:
Code:
team <- c("Texas", "Georgia", "Alabama")
Now team is a vector. If you try to run if() against it, it will fail.
Code:
Warning message:
In if (team == "Texas") team <- "TEX" :
  the condition has length > 1 and only the first element will be used
Now my vector variable, team, which used to have three values, only has one. In this case it is "TEX" since my if() statement evaluated as TRUE, because "Texas" was the first item in the vector. But that's probably NOT what you are trying to do. I suspect you may be trying to replace ALL values of "Texas" with "TEX" in a vector, not a variable. To do this, we need to vectorize.
Code:
team[team=="Texas"]<-"TEX"
This will replace the value of "Texas" with "TEX" in the vector team. This would be the "best" way to do this in R. You could also do this:
Code:
team<-ifelse(team=="Texas","TEX", team)
Essentially, these are saying the exact same thing. If you spend some time lurking on the R-users mailing list you will see that the first solution is the preferred methodology by experienced R programmers. It is slightly more compact and is easy to understand, if you understand how R uses vector logic. If you are coming to R from a "traditional" programming language (don't know if SAS works this way or not) then this whole vector thing will seem a little weird to you for a while. It is WELL worth your time to read something like "An Introduction to R" or simpleR to get a basic understanding of how R operates.
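To make the comparison between the two vectorized forms concrete, here is a minimal sketch using the three-team vector from the post. The names `is_texas`, `by_index`, and `by_ifelse` are made up for illustration.

```r
# Example vector taken from the post above
team <- c("Texas", "Georgia", "Alabama")

# The comparison is vectorized: it yields one TRUE/FALSE per element
is_texas <- team == "Texas"
print(is_texas)

# Form 1: logical indexing replaces only the matching elements
by_index <- team
by_index[is_texas] <- "TEX"

# Form 2: ifelse() builds a new vector, choosing a value per element
by_ifelse <- ifelse(is_texas, "TEX", team)

print(by_index)
print(identical(by_index, by_ifelse))
```

Both forms leave the vector at its original length, which is the point of vectorizing. Note that from R 4.2.0 onward, calling if() with a condition longer than one element is an error rather than a warning.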
Please Insert Funny Statement Here.
This is the message that I got (the "condition has length > 1" warning). Team is a vector.
I tried this:
Code:
team[team=="Texas"]<-"TEX"
and I got:
Code:
Warning message:
In `[<-.factor`(`*tmp*`, Team == "Texas", value = "TEX") :
  invalid factor level, NAs generated
Maybe I should do some studying on vectors, because I'm a little unclear. Thanks much for all you guys' help!
I will make one more assumption: that you are not simply dealing with a vector, but with a data frame.
Code:
team <- rep(c("Texas","Nebraska","Oklahoma"), 4)
awesomeness <- rep(c("sucks","greatest","losers"), 4)
team.dat <- data.frame(team, awesomeness)
In this scenario, gunksta's preferred method will give you an error (although I agree it is the best method for dealing with vectors). However, the ifelse statement will still work if slightly modified:
Code:
team.dat # before
# as.character() guards against the column being stored as a factor
team.dat$team <- as.character(team.dat$team)
team.dat$team <- ifelse(team.dat$team=="Texas", "TEX", team.dat$team)
team.dat$team <- ifelse(team.dat$team=="Nebraska", "NEB", team.dat$team)
team.dat$team <- ifelse(team.dat$team=="Oklahoma", "OKL", team.dat$team)
team.dat # after
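When several values need recoding at once, a named lookup vector can stand in for a chain of ifelse() calls. This is a sketch of a different technique from the one posted, not a correction of it. The name `abbrev` is hypothetical, and the column is assumed to be character (forced here with `stringsAsFactors = FALSE`).

```r
# Same toy data as above; stringsAsFactors = FALSE keeps team as character
team.dat <- data.frame(
  team = rep(c("Texas", "Nebraska", "Oklahoma"), 4),
  awesomeness = rep(c("sucks", "greatest", "losers"), 4),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: names are the old values, elements the new ones
abbrev <- c(Texas = "TEX", Nebraska = "NEB", Oklahoma = "OKL")

# Only touch rows whose team has an entry in the lookup table
hit <- team.dat$team %in% names(abbrev)
team.dat$team[hit] <- unname(abbrev[team.dat$team[hit]])

print(table(team.dat$team))
```

Teams without an entry in `abbrev` would be left unchanged, since only the rows flagged in `hit` are overwritten.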
This looks nothing like my signature...
akniss - good point.
chino cochino - Could you post the syntax you are using to create team? I played around with a couple of ideas, but couldn't find a way to get the same error you are. If we knew for sure how you were creating the variable/vector/data.frame/whatever called team, it would help.
chino cochino,
You get the error because your datatype is factor and not string. The easiest way to recode a vector of factors is to use the factor command and specify new labels. See ?factor, I'd look for the syntax for you, but don't have access to R right now.
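As a rough illustration of the factor issue described above, the sketch below first reproduces the NA problem on a small made-up factor, then relabels the level in two ways: with factor() and new labels, as the post suggests, and with `levels<-`, which is swapped in here as a shorter alternative. The four-element vector and the names `broken` and `relabelled` are assumptions for the example.

```r
# A small factor standing in for a text column read as a factor
Team <- factor(c("Tampa Bay", "Texas", "BOS", "Texas"))

# Assigning a value that is not an existing level gives NA plus a warning
broken <- Team
suppressWarnings(broken[broken == "Texas"] <- "TEX")
print(broken)

# Option 1: rebuild the factor with new labels for the same levels
old <- levels(Team)
relabelled <- factor(Team, levels = old, labels = sub("^Texas$", "TEX", old))
print(relabelled)

# Option 2: rename the level in place; every "Texas" entry follows
levels(Team)[levels(Team) == "Texas"] <- "TEX"
print(Team)
```

Converting the column with as.character() before editing is another way around the warning, at the cost of no longer having a factor.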
OK, this is what I did. I pretty much imported it from a CSV file:
Code:
> runs <- read.csv("c:/baseball09/runs.csv")
> runs
               Team  W  L  G AEQR AEQRA    EQRPG   EQRAPG
 1        Tampa Bay 52 44 96  542   429 5.645833 4.468750
 2 New York Yankees 58 37 95  540   453 5.684211 4.768421
 3              LAA 56 38 94  531   483 5.648936 5.138298
 4              BOS 55 39 94  500   426 5.319149 4.531915
 5              CLE 38 58 96  482   535 5.020833 5.572917
 6              TOR 47 49 96  473   442 4.927083 4.604167
 7              MIN 48 48 96  473   449 4.927083 4.677083
 8              PHI 54 39 93  472   449 5.075269 4.827957
 9            Texas 52 41 93  469   442 5.043011 4.752688
10         Colorado 52 43 95  467   424 4.915789 4.463158
11              LAD 61 34 95  464   368 4.884211 3.873684
12              BAL 41 53 94  445   490 4.734043 5.212766
13              MIL 48 47 95  443   480 4.663158 5.052632
14              CHW 50 45 95  442   438 4.652632 4.610526
15              ARZ 41 55 96  439   460 4.572917 4.791667
16       Washington 28 67 95  430   493 4.526316 5.189474
17              DET 49 44 93  412   424 4.430108 4.559140
18              ATL 49 47 96  408   382 4.250000 3.979167
19              OAK 40 54 94  407   429 4.329787 4.563830
20              HOU 49 46 95  404   442 4.252632 4.652632
21              STL 52 46 98  402   389 4.102041 3.969388
22              SEA 51 44 95  401   377 4.221053 3.968421
23              FLA 49 47 96  394   421 4.104167 4.385417
24              NYM 44 50 94  393   412 4.180851 4.382979
25              PIT 42 53 95  389   442 4.094737 4.652632
26              CHC 48 45 93  378   388 4.064516 4.172043
27               KC 37 57 94  376   429 4.000000 4.563830
28               SD 37 59 96  369   461 3.843750 4.802083
29       Cincinnati 44 50 94  362   428 3.851064 4.553191
30               SF 51 44 95  351   374 3.694737 3.936842
> attach(runs)
> if (Team=="Texas") {Team<-"TEX"}
Warning message:
In if (Team == "Texas") { :
  the condition has length > 1 and only the first element will be used
I'm still stuck in my SAS way of thinking.
Last edited by chino cochino; August 15th, 2009 at 11:03 PM.
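For a CSV import like the one posted, one possible approach is to keep the text column as character at read time, so that plain indexed replacement works. The sketch below is not the contents of any attached file. It inlines three rows and three columns from the posted table through the `text` argument of read.csv() so that it runs without `runs.csv`, and the variable `csv_text` is made up.

```r
# A few rows copied from the posted table, inlined to keep the sketch self-contained
csv_text <- "Team,W,L
Tampa Bay,52,44
BOS,55,39
Texas,52,41"

# stringsAsFactors = FALSE keeps Team as character instead of factor
runs <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Index the column through the data frame, so the change lands in runs
runs$Team[runs$Team == "Texas"] <- "TEX"
print(runs)
```

With a real file, the same call would take the file path as its first argument in place of `text = csv_text`.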
I should have asked for your syntax upfront. Would have saved a lot of time. The attached .r file shows two different ways to change "Texas" to "TEX". For consistency, I created a data file from chino cochino's posted syntax. On my computer, the syntax in this .r file works fine. YMMV.
The attached syntax is documented, but there are a couple of things I should point out. attach() is a tricky command to use. I tend to avoid it. It seems nice at first, but many consider it to be a pain in the tail.
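One reason attach() can bite, sketched on a made-up two-row data frame (the values come from the posted table, the name `after_attach_edit` is hypothetical): assigning to an attached column name creates a separate copy in the workspace and leaves the data frame itself untouched.

```r
runs <- data.frame(Team = c("Texas", "BOS"), W = c(52, 55),
                   stringsAsFactors = FALSE)

attach(runs)
# This assignment creates a new object named Team in the workspace;
# it does not write back into runs
Team[Team == "Texas"] <- "TEX"
after_attach_edit <- runs$Team  # the column in runs still holds "Texas"
print(Team)
print(after_attach_edit)
detach(runs)
rm(Team)  # drop the stray copy

# Modifying the column through the data frame changes runs itself
runs$Team[runs$Team == "Texas"] <- "TEX"
print(runs$Team)
```

Writing `runs$Team` is a little more typing than `Team`, but it keeps the edit attached to the data it belongs to.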
Finally, I should mention that R is a funky little language. It is _very_ good at some things, and less so at others. R is perfectly good at doing this, but it's not necessarily the first tool I would reach for. I avoid doing complex data munging and data manipulation in R itself. There are times where some of R's structures and conveniences get in the way when doing data munging (in my opinion). For example, this task could be easily done with sed rather than with R since the data already exists as a text file.
Here's my "typical" work-flow: I usually start with some kind of .csv or .txt file (tab-delimited). When life sucks, I'll start with an Access Database. From there I will do any basic alterations to the data, such as changing "Texas" to "TEX", while the data is still in a plain text format. Once this is done, I import the data into PostgreSQL. Postgres does a terrific job managing the data. This is an unnecessary step with small data sets, but most of my data sets are quite a bit larger than this. R is easy to connect to Postgres and I will only import the data into R that I want to analyze. The rest of it can stay in Postgres. This works especially well since I have a separate Postgres server at work and I just run R on my aging laptop. It all works faster when I can throw multiple processors at the problem.
Let me know if this syntax works for you.
Note: The syntax and the .csv are in the tar.gz attachment.
Last edited by gunksta; August 17th, 2009 at 02:02 AM. Reason: Thanks to unutbu, I fixed a silly typo in the runs.r file.