As I recently tried several modeling techniques in R, I would like to share a few of them, with a focus on regression models (generalized linear and additive models).
Disclaimer: the code below works, but I would not claim it is the best way to handle this kind of data (in fact, all of these models score slightly below 80% accuracy on the Kaggle datasets). Nor is it always the most efficient way to implement a given model.
I see it as a quick way to try out several frameworks without going into detail.
The column names used in the examples are from the Titanic track on Kaggle.
Generalized linear models
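# Logistic regression: binomial family with a logit link
titanic.glm <- glm(survived ~ pclass + sex + age + sibsp, data = titanic, family = binomial(link = logit))
# Predicted survival probabilities for the new passengers
glm.pred <- predict.glm(titanic.glm, newdata = titanicnew, na.action = na.pass, type = "response")
cat(glm.pred)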
- ‘cat’ prints the predictions; with type = "response" they are survival probabilities rather than values on the logit (link) scale
- The na.action argument controls how incomplete data (such as the missing ages in the Titanic dataset) are handled: with na.action = na.pass, rows with missing values are kept and get an NA prediction instead of being dropped
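To see the whole GLM workflow run end to end without downloading anything, here is a minimal, self-contained sketch. It assumes simulated data: the `titanic` and `titanicnew` data frames reuse the column names above, but every value and the "survival mechanism" are invented for illustration and are not the Kaggle data. It fits the same logistic model, predicts probabilities for new passengers (one of them with a missing age, to show `na.action = na.pass`), converts them to 0/1 with a 0.5 cut-off (a common choice, not something the original code specifies), and computes training accuracy.

```r
# Minimal sketch on SYNTHETIC data: titanic / titanicnew are simulated
# stand-ins with the same column names, not the Kaggle Titanic files.
set.seed(42)
n <- 500
titanic <- data.frame(
  pclass = factor(sample(1:3, n, replace = TRUE)),
  sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  age    = round(runif(n, 1, 80)),
  sibsp  = sample(0:4, n, replace = TRUE)
)
# Hypothetical survival mechanism, invented only so the model has a signal
lin <- 1.5 - 2.5 * (titanic$sex == "male") -
  0.8 * (as.integer(titanic$pclass) - 1) - 0.02 * titanic$age
titanic$survived <- rbinom(n, 1, plogis(lin))

# Same model as in the post: logistic regression (binomial, logit link)
titanic.glm <- glm(survived ~ pclass + sex + age + sibsp,
                   data = titanic, family = binomial(link = logit))

# New passengers; the second one has a missing age
titanicnew <- data.frame(
  pclass = factor(c("1", "3", "2"), levels = levels(titanic$pclass)),
  sex    = factor(c("female", "male", "male"), levels = levels(titanic$sex)),
  age    = c(30, NA, 45),
  sibsp  = c(0, 1, 2)
)

# na.pass keeps the incomplete row, which gets an NA prediction
glm.pred <- predict.glm(titanic.glm, newdata = titanicnew,
                        na.action = na.pass, type = "response")
print(glm.pred)

# Convert probabilities to 0/1 predictions with a 0.5 cut-off (assumption)
glm.class <- ifelse(glm.pred > 0.5, 1, 0)
print(glm.class)

# Training-set accuracy: fitted() returns probabilities for a binomial glm
train.class <- ifelse(fitted(titanic.glm) > 0.5, 1, 0)
accuracy <- mean(train.class == titanic$survived)
cat("training accuracy:", round(accuracy, 3), "\n")
```

Note that accuracy measured on the training data is optimistic; a Kaggle score is computed on passengers the model has not seen.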
Generalized additive models with mgcv (Mixed GAM Computation Vehicle)
The commands are a little less obvious:
library(mgcv)
titanic.gam <- gam (survived ~ pclass ...