Hi
I'm working on genetic sequencing data, and want to calculate mean depth of coverage (value column) by chromosome location (chr and start columns) from multiple samples in R.
Each sample data set is in a text file. e.g.
Code:
#chr start end value
chr11 2466324 2466325 4
chr11 2466325 2466326 4
chr11 2466326 2466327 5
chr11 2466327 2466328 5
chr11 2466328 2466329 7
chr11 2466329 2466330 7
chr11 2466330 2466331 242
chr11 2466331 2466332 245
chr11 2466332 2466333 245
chr11 2466333 2466334 245
chr11 2466334 2466335 245
chr11 2466335 2466336 245
chr11 2466336 2466337 245
chr11 2466337 2466338 245
...
Another example:
Code:
#chr start end value
chr11 2466606 2466607 60
chr11 2466607 2466608 310
chr11 2466608 2466609 337
chr11 2466609 2466610 337
chr11 2466610 2466611 337
chr11 2466611 2466612 454
chr11 2466612 2466613 465
chr11 2466613 2466614 468
chr11 2466614 2466615 470
...
You'll see that the tables do not start at the same location (the start and end positions in the first data row differ between the two files). Nevertheless, most of the data refers to the same locations. If necessary, I could manually create one file with all the locations (chr / start) that are in all files.
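A rough sketch of how the mismatched starting positions might be handled without building that file by hand: stack the tables, then count how many samples contribute to each location and keep only the locations seen in every sample. The two data frames below are made-up toy data (not taken from my files), and the column names `chr`, `start`, `end`, `value` and the count column `n` are just names assumed for illustration.

```r
# Hypothetical toy data: two samples whose tables start at different positions
s1 <- data.frame(chr = "chr11", start = c(100, 101, 102),
                 end = c(101, 102, 103), value = c(4, 6, 8))
s2 <- data.frame(chr = "chr11", start = c(101, 102, 103),
                 end = c(102, 103, 104), value = c(10, 12, 14))
n_samples <- 2
stacked <- rbind(s1, s2)

# Mean depth per location, and number of samples contributing to each location
means  <- aggregate(value ~ chr + start + end, data = stacked, FUN = mean)
counts <- aggregate(value ~ chr + start + end, data = stacked, FUN = length)
names(counts)[4] <- "n"
merged <- merge(means, counts, by = c("chr", "start", "end"))

# Keep only the locations that are present in every sample
shared <- merged[merged$n == n_samples, c("chr", "start", "end", "value")]
```

Dropping the final filter would instead keep every location, with the mean taken over whichever samples cover it.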
I have ~240 files like those above, each with about 12,000-13,000 lines.
I have no experience writing loops to import multiple files into R (so far I have used bash to call an Rscript that calculates individual logs for this data), and I am unsure how to then get the values into the correct rows.
These are the commands I am aware of:
Code:
# import 1 table from DepthFile
# (the "#chr ..." header line is skipped because "#" is read.table's default comment character)
depthtable <- read.table(DepthFile, quote="", col.names=c("chr","start","end","value"))
# how can I import multiple files into this table?
# calculate mean value per location (chr, start, end) with base R's aggregate()
meantable <- aggregate(value ~ chr + start + end, data=depthtable, FUN=mean)
# export final table (meantable) to MeanFile
write.table(meantable, MeanFile, sep=" ", quote = FALSE, col.names = FALSE, row.names = FALSE)
I need the output file to be similar to above files. i.e. 4 columns of data (headers are irrelevant).
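For the multiple-file import, something along these lines might work: read each file with `lapply`, stack the results with `rbind`, and aggregate once. This is only a sketch under assumptions: `mean_depth` is a hypothetical helper name, and it assumes every file is whitespace-separated with the four columns shown above.

```r
# Hypothetical helper: read several depth files and write per-location means
mean_depth <- function(files, out_file) {
  # Read every file into a list of data frames; the "#chr ..." header line
  # is skipped because "#" is read.table's default comment character
  tables <- lapply(files, read.table, quote = "",
                   col.names = c("chr", "start", "end", "value"))

  # Stack all samples into one long table
  depthtable <- do.call(rbind, tables)

  # Mean depth for each (chr, start, end) location across all samples
  meantable <- aggregate(value ~ chr + start + end, data = depthtable, FUN = mean)
  meantable <- meantable[order(meantable$chr, meantable$start), ]

  # Write 4 columns, no headers, matching the input layout
  write.table(meantable, out_file, sep = " ", quote = FALSE,
              col.names = FALSE, row.names = FALSE)
  invisible(meantable)
}
```

It could then be called with a file list built by `list.files()`, e.g. `mean_depth(list.files("depth_files", pattern = "\\.txt$", full.names = TRUE), "mean_depth.txt")`, where the directory name, file pattern, and output name are placeholders. Locations missing from some files are averaged over the files that do contain them.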
EDIT: I just thought - perhaps I can just concatenate the data files in bash and try the above... Will give that a go and feed back.