gunksta
August 14th, 2009, 02:06 PM
WARNING: The following comments are written by someone who does NOT know SAS.
It depends on what you are trying to do. I know SPSS tends to handle variables in a manner very different from R. You can have a variable such as:
team <- "Texas"
Here, team is a simple variable. With a basic variable, if() will work.
if(team=="Texas") team <- "TEX"
But, if team looks like this:
team <- c("Texas", "Georgia", "Alabama")
Now team is a vector. If you try to run if() against it, it will fail.
Warning message:
In if (team == "Texas") team <- "TEX" :
the condition has length > 1 and only the first element will be used
Now my vector variable, team, which used to have three values, has only one. In this case it is "TEX", since my if() statement evaluated as TRUE because "Texas" was the first item in the vector. But that's probably NOT what you are trying to do. I suspect you are trying to replace ALL values of "Texas" with "TEX" in a vector, not in a single-valued variable. To do this, we need to vectorize.
team[team=="Texas"]<-"TEX"
This will replace every value of "Texas" with "TEX" in the vector team. This would be the "best" way to do this in R. You could also do this:
team<-ifelse(team=="Texas","TEX", team)
Essentially, these two statements do the same thing. If you spend some time lurking on the R-users mailing list you will see that the first solution is the one preferred by experienced R programmers. It is slightly more compact and is easy to understand, once you understand how R uses vector logic. If you are coming to R from a "traditional" programming language (I don't know if SAS works this way or not) then this whole vector thing will seem a little weird to you for a while. It is WELL worth your time to read something like "An Introduction to R" or simpleR to get a basic understanding of how R operates.
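For anyone who wants to try the two vectorized forms side by side, here is a minimal sketch. The four-element vector and the names by_index and by_ifelse are made up for illustration. One caveat worth knowing: in R 4.2.0 and later, if() with a condition of length greater than one stops with an error instead of the warning shown above.

```r
# Two vectorized ways to recode "Texas" as "TEX" in a character vector.
team <- c("Texas", "Georgia", "Alabama", "Texas")

# 1. Logical indexing: assign only where the comparison is TRUE.
by_index <- team
by_index[by_index == "Texas"] <- "TEX"

# 2. ifelse(): pick, element by element, "TEX" or the existing value.
by_ifelse <- ifelse(team == "Texas", "TEX", team)

print(by_index)
print(identical(by_index, by_ifelse))
```

Both forms touch every matching element, which is the difference from a plain if().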
chino cochino
August 14th, 2009, 10:35 PM
But, if team looks like this:
team <- c("Texas", "Georgia", "Alabama")
Now team is a vector. If you try to run if() against it, it will fail.
Warning message:
In if (team == "Texas") team <- "TEX" :
the condition has length > 1 and only the first element will be used
This is the message that I got. Team is a vector.
Now my vector variable, team, which used to have three values, only has one. In this case it is "TEX" since my if() statement evaluated as True, because "Texas" was the first item in the vector. But, that's probably NOT what you are trying to do. I suspect you may be trying to replace ALL values of "Texas" with "TEX" in a vector, not a variable. To do this, we need to vectorize.
team[team=="Texas"]<-"TEX"
This will replace the value of "Texas" with "TEX" in the vector team. This would be the "best" way to do this in R.
I tried this, and I got:
Warning message:
In `[<-.factor`(`*tmp*`, Team == "Texas", value = "TEX") :
invalid factor level, NAs generated
team<-ifelse(team=="Texas","TEX", team)
Essentially, these two statements do the same thing. If you spend some time lurking on the R-users mailing list you will see that the first solution is the one preferred by experienced R programmers. It is slightly more compact and is easy to understand, once you understand how R uses vector logic. If you are coming to R from a "traditional" programming language (I don't know if SAS works this way or not) then this whole vector thing will seem a little weird to you for a while. It is WELL worth your time to read something like "An Introduction to R" or simpleR to get a basic understanding of how R operates.
Maybe I should do some studying on vectors, because I'm a little unclear. Thanks much for all you guys's help!
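The "invalid factor level" warning above appears because Team is a factor, and a factor only accepts values that are already among its levels; anything else becomes NA. A rough sketch of two ways around it follows. The small factor here is a hypothetical stand-in for the real Team column, and the names recoded and as_text are invented for the example.

```r
# Hypothetical stand-in for a Team column that was read in as a factor.
team <- factor(c("Texas", "Georgia", "Alabama"))

# Option A: rename the level itself; every "Texas" entry follows along.
recoded <- team
levels(recoded)[levels(recoded) == "Texas"] <- "TEX"

# Option B: convert to character first, then use logical indexing.
as_text <- as.character(team)
as_text[as_text == "Texas"] <- "TEX"

print(levels(recoded))
print(as_text)
```

Option A keeps the column as a factor; Option B leaves it as plain text, which can be turned back into a factor later with factor() if needed.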
gunksta
August 26th, 2009, 06:36 PM
I was asked to post the syntax from my previous entry in this thread. When I looked at it, it was clear that something had gone terribly awry when I made the tar.gz file. (Ark in KDE 4.2 has been a little flaky.) Here's the syntax:
# Clean everything up.
rm(list=ls())
#############################################################
# This is one way to do it.
# Import the data.
# R converted the column, "Team" to a factor which made
# things more difficult when changing the values.
# as.is="Team" tells R to not convert these values to
# factors, which makes our lives easier.
runs.df<-read.csv("runs.csv", as.is="Team")
# Which rows are equal to "Texas"
change<-which(runs.df[,"Team"]=="Texas")
#Assign "TEX" to those rows equal to "Texas"
runs.df[change,"Team"]<-"TEX"
# It's worth noting that factors are a useful feature in R.
# If I were going to do much with my data, I would definitely
# convert Team to a factor variable once I was done
# changing things up . . . . OR . . . . .
############################################################
# This is another way to do the same thing.
# It's shorter and easier to read.
# Import the data.
runs2.df <- read.csv("runs.csv")
# Replace "Texas" with "TEX"
# Note that I did not bother with as.is="Team" here.
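# sub() coerces its input to character, so this works whether
# Team was read as a factor or as a string. The result is a
# character vector either way.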
runs2.df[,"Team"] <- sub("Texas","TEX",runs2.df[,"Team"])
# The second syntax is really easier to use and is what
# I should have recommended the first time.
If you'd like to run this syntax, you'll need a runs.csv file, which you can create with this:
"Team","W","L","G","AEQR","AEQRA","EQRPG","EQRAPG"
"Tampa Bay",52,44,96,542,429,5.65,4.47
"New York Yankees",58,37,95,540,453,5.68,4.77
"LAA",56,38,94,531,483,5.65,5.14
"BOS",55,39,94,500,426,5.32,4.53
"CLE",38,58,96,482,535,5.02,5.57
"TOR",47,49,96,473,442,4.93,4.6
"MIN",48,48,96,473,449,4.93,4.68
"PHI",54,39,93,472,449,5.08,4.83
"Texas",52,41,93,469,442,5.04,4.75
"Colorado",52,43,95,467,424,4.92,4.46
"LAD",61,34,95,464,368,4.88,3.87
"BAL",41,53,94,445,490,4.73,5.21
"MIL",48,47,95,443,480,4.66,5.05
"CHW",50,45,95,442,438,4.65,4.61
"ARZ",41,55,96,439,460,4.57,4.79
"Washington",28,67,95,430,493,4.53,5.19
"DET",49,44,93,412,424,4.43,4.56
"ATL",49,47,96,408,382,4.25,3.98
"OAK",40,54,94,407,429,4.33,4.56
"HOU",49,46,95,404,442,4.25,4.65
"STL",52,46,98,402,389,4.1,3.97
"SEA",51,44,95,401,377,4.22,3.97
"FLA",49,47,96,394,421,4.1,4.39
"NYM",44,50,94,393,412,4.18,4.38
"PIT",42,53,95,389,442,4.09,4.65
"CHC",48,45,93,378,388,4.06,4.17
"KC",37,57,94,376,429,4,4.56
"SD",37,59,96,369,461,3.84,4.8
"Cincinnati",44,50,94,362,428,3.85,4.55
"SF",51,44,95,351,374,3.69,3.94
I hope this helps.
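If you don't want to create the file, here is a minimal sketch that reads a three-row excerpt of the data above from a string instead. The name csv_text and the "West Texas" value are made up for illustration. It also shows one thing to watch with sub(): it matches the pattern anywhere in the string, so anchoring the pattern is safer when only exact matches should change.

```r
# A three-row excerpt of runs.csv, read from a string so no file is needed.
csv_text <- '"Team","W","L"
"Tampa Bay",52,44
"Texas",52,41
"Colorado",52,43'

# stringsAsFactors = FALSE keeps Team as character in any R version.
runs.df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Exact-match replacement: only rows whose Team is exactly "Texas" change.
runs.df$Team[runs.df$Team == "Texas"] <- "TEX"
print(runs.df$Team)

# sub() matches anywhere in the string; "^" and "$" anchor it to the
# whole value. "West Texas" is an invented value, not part of the data.
teams <- c("Texas", "West Texas")
print(sub("Texas", "TEX", teams))
print(sub("^Texas$", "TEX", teams))
```

With the real file there is no other team name containing "Texas", so the unanchored sub() call gives the same answer there.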