This term project is an attempt to learn statistical modeling methods using various tools in R.
The code may not be exhaustive, but is sufficient for the reader to understand our ideas.
We predict different aspects of the "day1" dataset using the regtools package in R.
For information on how labels are represented in the dataset, see the dataset documentation.
day1 : the entire dataset for this task
dataset : day1 dataset, but only with relevant columns kept
predictors : the names of the columns to be used as predictors
toPredict : the name of the column to predict
intClasses : the columns that contain integer data; we use this to round predictions for those columns
trainData : the dataset to train the models on
testData : the dataset to test the models' performance on
splitData(dataSet, splitRatio)
randomly splits the data in 'dataSet' into 'trainData' and 'testData' with ratio 'splitRatio'
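A random split of this kind can be sketched in base R as follows. This is only an illustration under stated assumptions: the name splitDataSketch, the toy data frame, and the choice to return a list (rather than assign 'trainData' and 'testData' directly) are hypothetical and not taken from the project code. It also assumes 'splitRatio' is the fraction of rows used for training.

```r
# Hypothetical sketch of a random train/test split; not the project's actual code.
splitDataSketch <- function(dataSet, splitRatio) {
  n <- nrow(dataSet)
  # Number of rows that go to the training set (assumes splitRatio is in [0, 1])
  nTrain <- floor(splitRatio * n)
  # Sample training row indices without replacement
  trainIdx <- sample(seq_len(n), size = nTrain)
  # Every remaining row goes to the test set
  testIdx <- setdiff(seq_len(n), trainIdx)
  list(trainData = dataSet[trainIdx, , drop = FALSE],
       testData  = dataSet[testIdx, , drop = FALSE])
}

set.seed(1)  # fix the seed so the split is reproducible
toy <- data.frame(temp = runif(10), tot = rpois(10, 100))  # toy stand-in data
parts <- splitDataSketch(toy, 0.8)
```

Sampling indices rather than rows keeps the two parts disjoint, so no observation is used for both training and testing.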
predictFromTo(featureCols, predictCol)
predicts the column predictCol using columns featureCols
featureCols : the columns to be used as predictors
predictCol : the column to be predicted
predictUsingLm(SHOW = FALSE)
gets predictions using a linear model (lm()) and handles all associated tasks
SHOW : controls whether a plot should be printed
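The kind of work such a function might do can be sketched as below: fit lm() on the training rows, predict on the test rows, and round the predictions when the target is listed in intClasses. The toy data, its column names (temp, hum, tot), and the error measure are assumptions made for illustration; the project's own function may differ in every detail.

```r
# Hypothetical sketch: lm() fit, prediction, and rounding of integer targets.
set.seed(2)
toy <- data.frame(temp = runif(50), hum = runif(50))  # toy stand-in data
toy$tot <- round(100 + 200 * toy$temp - 50 * toy$hum + rnorm(50, sd = 10))

trainData <- toy[1:40, ]
testData  <- toy[41:50, ]

predictors <- c("temp", "hum")
toPredict  <- "tot"
intClasses <- c("tot")

# Build the formula tot ~ temp + hum from the column names
fml <- reformulate(predictors, response = toPredict)
fit <- lm(fml, data = trainData)
preds <- predict(fit, newdata = testData)

# Round only when the predicted column holds integer data
if (toPredict %in% intClasses) preds <- round(preds)

# Mean absolute error on the held-out rows
mae <- mean(abs(preds - testData[[toPredict]]))
```

Building the formula from predictors and toPredict lets the same code serve any choice of columns without editing the model call.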
predictUsingKNN(SHOW = FALSE)
gets predictions using a k-nearest-neighbors model (basicKNN()) and handles all associated tasks
SHOW : controls whether a plot should be printed
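To show the idea behind nearest-neighbor prediction, here is a small base-R sketch. It is a stand-in written for illustration, not the regtools basicKNN() implementation and not its API: the function name knnPredictSketch and the toy data are hypothetical, and the sketch does no feature scaling.

```r
# Hypothetical base-R k-nearest-neighbors regression; a stand-in for
# illustration, not the regtools basicKNN() implementation.
knnPredictSketch <- function(trainX, trainY, newX, k) {
  trainX <- as.matrix(trainX)
  newX   <- as.matrix(newX)
  apply(newX, 1, function(row) {
    # Euclidean distance from this query point to every training point
    d <- sqrt(rowSums(sweep(trainX, 2, row)^2))
    # Average the response over the k closest training points
    mean(trainY[order(d)[seq_len(k)]])
  })
}

# Toy data: two well-separated groups of training points
trainX <- data.frame(x = c(1, 2, 3, 10, 11, 12))
trainY <- c(10, 20, 30, 100, 110, 120)
newX   <- data.frame(x = c(2, 11))
preds  <- knnPredictSketch(trainX, trainY, newX, k = 3)
```

With k = 3, each query point is predicted from the three training points in its own group, giving 20 and 110 here. When predictors are on different scales, they would normally be standardized before computing distances.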