Page 2 of 2
Results 11 to 19 of 19

Thread: Conditional Statements in "R"

  1. #11
    Join Date
    Mar 2007
    Location
    Finland
    Beans
    256
    Distro
    Ubuntu 9.10 Karmic Koala

    Re: Conditional Statements in "R"

    Nice work Gunksta,
    I think though you should use "gsub" instead of "sub", because "gsub" replaces all occurrences and "sub" only the first one. Using substitution is a better way than setting new factor labels, as I suggested before, when you have this many levels and most of them are already in correct format.

    Also, I would use runs2.df$Team rather than runs2.df[,"Team"], but that's just a matter of taste of course.

    I do most of my data handling in R, because I tend to work with a lot of files with a similar format, and that way I can also easily redistribute the whole analysis with the original data file, even for Windows users. Most of the time I just need some regular expressions to reformat the data, and R is perfectly capable of doing that.

    @chino cochino
    In case you didn't realize why Gunksta's first suggestion fails, I'm going to repeat what I said earlier: you were trying to modify a vector of factors, which is (sometimes) different from working with strings. You can find out the type of your variable with the class command. So, using Gunksta's data and second example:

    Code:
    runs <- read.csv('runs.csv')
    # See the class
    class(runs$Team)
    # [1] "factor"
    # Change the type to character (= string)
    runs$Team <- as.character(runs$Team)
    class(runs$Team)
    # [1] "character"
    After the conversion the following should work:

    Code:
     runs$Team[runs$Team=="Texas"] <- "TEX"
    Actually, a lot of the problems I had when I started using R were related to having the "wrong" variable type. Often a simple check using "class" gives you a hint about where the problem is. It is also not uncommon for R to read in numeric variables as factors when using read.table, which can cause some hassle if you don't notice it.
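The factor pitfalls described above can be reproduced on a toy data frame. This is only a sketch: the data frame below is made up and stands in for runs.csv, and `stringsAsFactors = TRUE` is set explicitly because R 4.0.0 and later no longer convert strings to factors by default.

```r
# Toy data standing in for runs.csv (made up for illustration).
# stringsAsFactors = TRUE reproduces the older default behaviour.
runs <- data.frame(Team = c("Texas", "BOS", "Texas"),
                   W    = c("52", "55", "n/a"),
                   stringsAsFactors = TRUE)

class(runs$Team)  # "factor"
class(runs$W)     # "factor": one stray "n/a" keeps the column from being numeric

# Assigning a value that is not an existing level produces NA plus a warning.
broken <- runs$Team
suppressWarnings(broken[broken == "Texas"] <- "TEX")
is.na(broken)     # TRUE FALSE TRUE

# as.numeric() on a factor returns the internal level codes, not the values.
codes  <- as.numeric(runs$W)
# Going through character first recovers the numbers ("n/a" becomes NA).
values <- suppressWarnings(as.numeric(as.character(runs$W)))

# One way to rename a value while keeping the factor: rename the level itself.
levels(runs$Team)[levels(runs$Team) == "Texas"] <- "TEX"
as.character(runs$Team)  # "TEX" "BOS" "TEX"
```

Checking `class()` before and after each step is usually enough to spot which of these cases you are in.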

  2. #12
    Join Date
    Jan 2007
    Location
    California
    Beans
    350
    Distro
    Ubuntu 12.10 Quantal Quetzal

    Re: Conditional Statements in "R"

    Suggest you look at the ifelse construction. It was designed to perform logical operations on elements of a vector. Although there are a number of excellent methods appropriate to your case, it is a nice tool to understand.
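A minimal sketch of what ifelse could look like for this thread's problem. The vectors below are made-up stand-ins for columns of the data, and the column is assumed to be character; with a factor, convert it with as.character() first.

```r
# Made-up stand-ins for the Team, W and L columns.
team <- c("Tampa Bay", "Texas", "BOS", "Texas")
w    <- c(52, 52, 55, 52)
l    <- c(44, 41, 39, 41)

# ifelse(test, yes, no) is evaluated element by element:
# where the test is TRUE take "TEX", otherwise keep the original value.
team_fixed <- ifelse(team == "Texas", "TEX", team)
team_fixed   # "Tampa Bay" "TEX" "BOS" "TEX"

# The same construction works for any vectorised condition.
record <- ifelse(w > l, "winning", "losing")
```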
    Euler_fan
    Ubuntu User # 15369 at the Ubuntu Counter Project
    Woot for GPG

  3. #13
    Join Date
    Oct 2005
    Location
    Albany, NY
    Beans
    842
    Distro
    Ubuntu

    Re: Conditional Statements in "R"

    Quote Originally Posted by ahmatti View Post
    I think though you should use "gsub" instead of "sub", because "gsub" replaces all occurrences and "sub" only the first one.
    I tend to be very cautious with commands/tools like gsub. Perhaps overly so. I've been bitten in the tail more than once working with large data sets after applying a global substitution.

    Quote Originally Posted by ahmatti View Post
    I do most of my data handling in R, because I tend to work with a lot of files with a similar format, and that way I can also easily redistribute the whole analysis with the original data file, even for Windows users. Most of the time I just need some regular expressions to reformat the data, and R is perfectly capable of doing that.
    That is an excellent point. Using an external database isn't as portable and in many environments it is important to support legacy systems.
    Please Insert Funny Statement Here.

  4. #14
    Join Date
    Mar 2007
    Location
    Finland
    Beans
    256
    Distro
    Ubuntu 9.10 Karmic Koala

    Re: Conditional Statements in "R"

    Quote Originally Posted by gunksta View Post
    I tend to be very cautious with commands/tools like gsub. Perhaps overly so. I've been bitten in the tail more than once working with large data sets after applying a global substitution.

    That is an excellent point. Using an external database isn't as portable and in many environments it is important to support legacy systems.
    I made a mistake, sorry! I thought that sub only replaces the first occurrence in the whole vector, but it actually replaces the first occurrence in each element of the vector, so it's perfectly fine here and safer than gsub. I agree that it's very easy to get unwanted behavior with substitutions...
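The difference is easy to check on a small vector. The strings below are made up for illustration; the anchored pattern at the end is one possible way to make the substitution stricter, not something used elsewhere in this thread.

```r
x <- c("Texas", "Texas at Texas", "BOS")

# sub() replaces the first match in EACH element.
first_only <- sub("Texas", "TEX", x)
first_only   # "TEX" "TEX at Texas" "BOS"

# gsub() replaces every match in each element.
all_matches <- gsub("Texas", "TEX", x)
all_matches  # "TEX" "TEX at TEX" "BOS"

# Anchoring the pattern restricts the change to values that are exactly "Texas".
exact_only <- sub("^Texas$", "TEX", x)
exact_only   # "TEX" "Texas at Texas" "BOS"
```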

  5. #15
    Join Date
    Aug 2009
    Beans
    5

    Re: Conditional Statements in "R"

    Quote Originally Posted by gunksta View Post
    I should have asked for your syntax upfront. Would have saved a lot of time. The attached .r file shows two different ways to change "Texas" to "TEX". For consistency, I created a data file from chino cochino's posted syntax. On my computer, the syntax in this .r file works fine. YMMV.

    The attached syntax is documented, but there are a couple of things I should point out. attach() is a tricky command to use. I tend to avoid it. It seems nice at first, but many consider it to be a pain in the tail.

    Finally, I should mention that R is a funky little language. It is _very_ good at some things, and less so at others. R is perfectly good at doing this, but it's not necessarily the first tool I would reach for. I avoid doing complex data munging and data manipulation in R itself. There are times when some of R's structures and conveniences get in the way when doing data munging (in my opinion). For example, this task could be easily done with sed rather than with R since the data already exists as a text file.

    Here's my "typical" work-flow: I usually start with some kind of .csv or .txt file (tab-delimited). When life sucks, I'll start with an Access Database. From there I will do any basic alterations to the data, such as change "Texas" to "TEX" while the data is still in a plain text format. Once this is done, I import the data into PostgreSQL. PostgreSQL does a terrific job managing the data. This is an unnecessary step with small data sets, but most of my data sets are quite a bit larger than this. R is easy to connect to Postgres and I will only import the data into R that I want to analyze. The rest of it can stay in Postgres. This works especially well since I have a separate Postgres server at work and I just run R on my aging laptop. It all works faster when I can throw multiple processors at the problem.

    Let me know if this syntax works for you.

    Note: The syntax and the .csv are in the tar.gz attachment.
    Gunksta,
    I tried to open up the winzip file but after R loaded nothing came up. Sorry for sounding so elementary, but is there syntax required to open the file?

  6. #16
    Join Date
    Mar 2008
    Beans
    4,714
    Distro
    Ubuntu 9.10 Karmic Koala

    Re: Conditional Statements in "R"

    chino cochino, the runs.tar.gz file can be opened by double-clicking on it in your file browser. This will create a directory called runs. Inside the runs directory you will find the .r file.

    If you'd like to open the tar.gz file from the terminal, the command is
    Code:
    tar xvzf runs.tar.gz
    Or, perhaps better, you can save this in your ~/.bashrc file:

    Code:
    # Extract files from any archive
    # Usage: op <archive_name>
    # Thanks to rezza at Arch Linux 
    # You may need to install some of these archiving formats. 
    op () {
         # Quote "$1" so file names containing spaces are handled correctly.
         if [ -f "$1" ] ; then
            case "$1" in
                    *.tar.bz2)   tar xvjf "$1" ;;
                    *.tar.gz)    tar xvzf "$1" ;;
                    *.bz2)       bunzip2 "$1" ;;
                    *.rar)       unrar x "$1" ;;
                    *.gz)        gunzip "$1" ;;
                    *.tar)       tar xvf "$1" ;;
                    *.tbz2)      tar xvjf "$1" ;;
                    *.tgz)       tar xvzf "$1" ;;
                    *.zip)       unzip "$1" ;;
                    *.Z)         uncompress "$1" ;;
                    *.7z)        7z x "$1" ;;
                    *)           xdg-open "$1" ;;
            esac
         else
            echo "'$1' is not a valid file"
         fi
    }
    Then open a new terminal (so the change to .bashrc becomes effective) and then you can open just about anything with the short and sweet command
    Code:
    op runs.tar.gz
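On the question of whether any syntax is needed from within R itself: R does not open archives automatically, but its utils package provides untar(). The sketch below is self-contained, so it first builds a small stand-in archive in a temporary directory; the file contents and directory names are made up and are not the real attachment.

```r
# Build a small stand-in archive so the example runs anywhere.
work <- tempfile("runs_demo")
dir.create(work)
old <- setwd(work)
dir.create("runs")
writeLines(c('"Team","W"', '"Texas",52'), file.path("runs", "runs.csv"))
tar("runs.tar.gz", files = "runs", compression = "gzip")

# List the archive contents without extracting anything.
listing <- untar("runs.tar.gz", list = TRUE)

# Extract into a directory named "extracted" (hypothetical name), then read the data.
untar("runs.tar.gz", exdir = "extracted")
runs <- read.csv(file.path("extracted", "runs", "runs.csv"))

setwd(old)  # restore the original working directory
```

With the real attachment, only the untar() and read.csv() calls would be needed, with the paths adjusted to wherever the file was saved.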

  7. #17
    Join Date
    Mar 2007
    Beans
    763

    Re: Conditional Statements in "R"

    cool function unutbu, thanks.

  8. #18
    Join Date
    Aug 2009
    Beans
    5

    Re: Conditional Statements in "R"

    Quote Originally Posted by ahmatti View Post
    Nice work Gunksta,


    @chino cochino
    In case you didn't realize why Gunksta's first suggestion fails, I'm going to repeat what I said earlier: you were trying to modify a vector of factors, which is (sometimes) different from working with strings. You can find out the type of your variable with the class command. So, using Gunksta's data and second example:

    Code:
    runs <- read.csv('runs.csv')
    # See the class
    class(runs$Team)
    # [1] "factor"
    # Change the type to character (= string)
    runs$Team <- as.character(runs$Team)
    class(runs$Team)
    # [1] "character"
    After the conversion the following should work:

    Code:
     runs$Team[runs$Team=="Texas"] <- "TEX"
    Actually, a lot of the problems I had when I started using R were related to having the "wrong" variable type. Often a simple check using "class" gives you a hint about where the problem is. It is also not uncommon for R to read in numeric variables as factors when using read.table, which can cause some hassle if you don't notice it.
    thanks, I'll try that.

  9. #19
    Join Date
    Oct 2005
    Location
    Albany, NY
    Beans
    842
    Distro
    Ubuntu

    Re: Conditional Statements in "R"

    I was asked to post the syntax from my previous entry in this thread. When I looked at it, it was clear that something went terribly awry when I made the tar.gz file (Ark in KDE 4.2 has been a little flaky). Here's the syntax:

    Code:
    # Clean everything up.
    rm(list=ls())
    
    #############################################################
    # This is one way to do it.
    
    # Import the data.
    # R converted the column, "Team" to a factor which made 
    # things more difficult when changing the values.
    # as.is="Team" tells R to not convert these values to 
    # factors, which makes our lives easier.
    runs.df<-read.csv("runs.csv", as.is="Team")
    
    # Which rows are equal to "Texas"
    change<-which(runs.df[,"Team"]=="Texas")
    
    #Assign "TEX" to those rows equal to "Texas"
    runs.df[change,"Team"]<-"TEX"
    
    # It's worth noting that factors are a useful feature in R.
    # If I was going to do much to my data, I would definitely
    # convert Team to a factor variable once I was done 
    # Changing things up . . . . OR . . . . . 
    
    
    ##########################################################
    # This is another way to do the same thing.
    # It's shorter and easier to read.
    
    # Import the data.
    runs2.df <- read.csv("runs.csv")
    
    # Replace "Texas" with "TEX"
    # Note that I did not bother with as.is="Team" here.
    # This works on a factor, string, whatever. Note that sub()
    # coerces its input, so the result is a character vector.
    runs2.df[,"Team"] <- sub("Texas","TEX",runs2.df[,"Team"])
    
    # The second syntax is really easier to use and is what
    # I should have recommended the first time.
    If you'd like to run this syntax, you'll need a runs.csv file, which you can create with this:

    "Team","W","L","G","AEQR","AEQRA","EQRPG","EQR APG"
    "Tampa Bay",52,44,96,542,429,5.65,4.47
    "New York Yankees",58,37,95,540,453,5.68,4.77
    "LAA",56,38,94,531,483,5.65,5.14
    "BOS",55,39,94,500,426,5.32,4.53
    "CLE",38,58,96,482,535,5.02,5.57
    "TOR",47,49,96,473,442,4.93,4.6
    "MIN",48,48,96,473,449,4.93,4.68
    "PHI",54,39,93,472,449,5.08,4.83
    "Texas",52,41,93,469,442,5.04,4.75
    "Colorado",52,43,95,467,424,4.92,4.46
    "LAD",61,34,95,464,368,4.88,3.87
    "BAL",41,53,94,445,490,4.73,5.21
    "MIL",48,47,95,443,480,4.66,5.05
    "CHW",50,45,95,442,438,4.65,4.61
    "ARZ",41,55,96,439,460,4.57,4.79
    "Washington",28,67,95,430,493,4.53,5.19
    "DET",49,44,93,412,424,4.43,4.56
    "ATL",49,47,96,408,382,4.25,3.98
    "OAK",40,54,94,407,429,4.33,4.56
    "HOU",49,46,95,404,442,4.25,4.65
    "STL",52,46,98,402,389,4.1,3.97
    "SEA",51,44,95,401,377,4.22,3.97
    "FLA",49,47,96,394,421,4.1,4.39
    "NYM",44,50,94,393,412,4.18,4.38
    "PIT",42,53,95,389,442,4.09,4.65
    "CHC",48,45,93,378,388,4.06,4.17
    "KC",37,57,94,376,429,4,4.56
    "SD",37,59,96,369,461,3.84,4.8
    "Cincinnati",44,50,94,362,428,3.85,4.55
    "SF",51,44,95,351,374,3.69,3.94

    I hope this helps.
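The comments in the first approach mention converting Team back to a factor once the values are fixed. A minimal sketch of that last step follows, using a three-row excerpt of the data passed inline through read.csv's `text` argument so that no file is needed; the anchored pattern `^Texas$` is an added precaution and not part of the original syntax.

```r
# Three rows excerpted from runs.csv, supplied inline.
csv_text <- '"Team","W","L"
"Tampa Bay",52,44
"Texas",52,41
"BOS",55,39'

# as.is = "Team" keeps the column as character while it is being edited.
runs.df <- read.csv(text = csv_text, as.is = "Team")
runs.df$Team <- sub("^Texas$", "TEX", runs.df$Team)

# Once the values are final, convert to a factor for modelling or tabulation.
runs.df$Team <- factor(runs.df$Team)
is.factor(runs.df$Team)           # TRUE
"TEX" %in% levels(runs.df$Team)   # TRUE
```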
    Please Insert Funny Statement Here.
