While working with a large number of files for data processing, I used the following R commands. Since almost everyone needs to split, merge, and append data, here is some code for splitting data based on a parameter, for appending data, and for merging data.
Splitting Data Based on a Parameter.
The following divides the data into two subsets, written to separate files: rows whose column contains "Male", and all other rows.
Input and Subset
Note that the read.table command reads the file (path denoted by ....) into a data frame, which is assigned the name x in the R environment.
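x <- read.table(....)
# Logical index: TRUE for rows whose col contains "Male"
isMale <- grepl("Male", x$col)
write.table(x[isMale, ], file="match")
# Negating a logical index stays correct even when nothing matches
write.table(x[!isMale, ], file="nomatch")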
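A logical index from grepl() avoids a pitfall of negative integer indexing: when grep() matches nothing, it returns an empty vector, and x[-integer(0), ] selects zero rows rather than all of them. A minimal sketch on a small made-up data frame (the id column and the values are assumptions for illustration only):

```r
# Hypothetical sample data; 'col' mirrors the column name used above
x <- data.frame(id = 1:4,
                col = c("Male", "Female", "Male", "Unknown"),
                stringsAsFactors = FALSE)

# grepl() returns one TRUE/FALSE per row, so negation is always safe
isMale <- grepl("Male", x$col, fixed = TRUE)

matched   <- x[isMale, ]    # rows whose 'col' contains "Male"
unmatched <- x[!isMale, ]   # every other row

# With no matches, negative integer indexing silently drops everything
noHit <- grep("Other", x$col)          # integer(0)
nrow(x[-noHit, ])                      # 0 rows, not 4
nrow(x[!grepl("Other", x$col), ])      # 4 rows, as intended
```

Matching is case-sensitive, so "Female" is not picked up by the pattern "Male".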
Suppose we need to divide a dataset X into multiple datasets, for example one for a given value of REGION.
X17 <- subset(X, REGION == 17)
This is preferred to the following technique, which relies on attach():
attach(X)
X17 = X[REGION == 17,]
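When a subset is needed for every value of the parameter rather than a single one, base R's split() can do it in one call. A minimal sketch, assuming a small made-up data frame X with a REGION column (the SALES column and all values are illustrative, not from the original data):

```r
# Hypothetical data standing in for X
X <- data.frame(REGION = c(17, 17, 3, 5, 3),
                SALES  = c(10, 20, 30, 40, 50))

# subset() evaluates REGION inside X, so no attach() is needed
X17 <- subset(X, REGION == 17)

# split() returns a named list with one data frame per REGION value
byRegion <- split(X, X$REGION)
names(byRegion)        # "3" "5" "17"
byRegion[["17"]]       # same rows as X17
```

Each element of byRegion can then be processed or written out separately, for instance in a loop over names(byRegion).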
Output
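To write the data back out to a file (for example on Windows), you can use the command below. Put your output path in the file argument; with file="" the table is printed to the console instead.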
write.table(x,file="",row.names=TRUE,col.names=TRUE,sep=" ")
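As a rough sketch of a full round trip, the example below writes a small made-up data frame to a temporary file and reads it back. The tempfile() path and the sample data are placeholders, not part of the original workflow.

```r
# Hypothetical sample data
x <- data.frame(id = 1:3, col = c("Male", "Female", "Male"))

# tempfile() is a stand-in for a real output path
outFile <- tempfile(fileext = ".txt")
write.table(x, file = outFile, row.names = TRUE, col.names = TRUE, sep = " ")

# The header has one field fewer than the data rows, so read.table
# treats the first column as row names
y <- read.table(outFile, header = TRUE)
dim(y)   # 3 rows, 2 columns
```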
Append
Let's say you have a large number of data files (say, CSV files) that you need to append after performing basic operations on them, assuming the files all have the same structure.
>setwd("C:\\Documents and Settings\\admin\\My Documents\\Data")
This changes the working directory to the folder you want. Note the double backslashes in the path: a backslash has to be escaped inside an R string (forward slashes also work).
>list.files(path = ".", pattern = NULL, all.files = FALSE, full.names = FALSE,recursive = FALSE, ignore.case = FALSE)
The R output would look something like this:
[1] "calk.csv" "call.csv"
[3] "calm.csv" "caln.csv"
[5] "calo.csv" "calp.csv"
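Once the file names are listed, the files themselves can be read and stacked. A minimal sketch, assuming every CSV has the same columns; the temporary folder and the two small files are created here only so the example runs on its own, and their contents are made up:

```r
# Build two small CSV files in a temporary folder (illustrative data)
dataDir <- tempfile("csvdata")
dir.create(dataDir)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(dataDir, "calk.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(dataDir, "call.csv"), row.names = FALSE)

# Keep only .csv files and return full paths so read.csv can find them
csvFiles <- list.files(path = dataDir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into a data frame, then stack them row-wise
pieces   <- lapply(csvFiles, read.csv)
combined <- do.call(rbind, pieces)
nrow(combined)   # 4
```

rbind() stops with an error if the column names differ between files, which is a useful check that the files really share one structure.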
To append one file to another repeatedly (say, ten times), you can use the following command, which appends the contents of file B to file A ten times:
file.append("A", rep("B", 10))
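To see the effect, here is a small sketch using two throwaway text files (the paths and contents are placeholders). Note that file.append() copies the file contents as they are, so with CSV files any header line in the appended file is repeated as well.

```r
# Two temporary files standing in for A and B
fileA <- tempfile(fileext = ".txt")
fileB <- tempfile(fileext = ".txt")
writeLines("line from A", fileA)
writeLines("line from B", fileB)

# Append B to A three times; returns one logical per append
ok <- file.append(fileA, rep(fileB, 3))

readLines(fileA)   # one line from A followed by three lines from B
```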
For refcards on learning R, the best ones are:
http://cran.r-project.org/doc/contrib/Short-refcard.pdf
and