## Recipes for several model fitting techniques in R

Having recently tried several modeling techniques in R, I would like to share a few of them, with a focus on linear regression.

Disclaimer: the code lines below work, but I would not suggest that they are the most efficient way to deal with this kind of data (as a matter of fact, all of them score slightly below 80% accuracy on the Kaggle datasets). Moreover, they are not always the most efficient way to implement a given model.

I see it as a way to quickly test several frameworks without going into details.

The column names used in the examples are from the Titanic track on Kaggle.

### Generalized linear models

```
titanic.glm <- glm(survived ~ pclass + sex + age + sibsp, data = titanic, family = binomial(link = logit))
glm.pred <- predict.glm(titanic.glm, newdata = titanicnew, na.action = na.pass, type = "response")
cat(glm.pred)
```
- `cat` actually prints the output.
- One might want to use the `na.action` switch to be able to deal with incomplete data (as in the Titanic dataset): `na.action = na.pass`.
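The predictions returned with `type = "response"` are probabilities, not class labels. Below is a minimal sketch of thresholding them at 0.5; since the Kaggle files are not included here, it uses a small made-up data frame whose values are entirely hypothetical:

```r
# Hypothetical mini data set standing in for the Kaggle training file
titanic <- data.frame(
  survived = c(0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0),
  pclass   = c(3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2),
  sex      = factor(c("male", "female", "female", "male", "female", "male",
                      "male", "female", "female", "male", "male", "female")),
  age      = c(22, 38, 26, 35, NA, 54, 2, 27, 14, 20, 49, 31),
  sibsp    = c(1, 1, 0, 0, 1, 0, 3, 0, 1, 0, 1, 2)
)

titanic.glm <- glm(survived ~ pclass + sex + age + sibsp,
                   data = titanic, family = binomial(link = logit))

# na.pass keeps rows with missing values; their predictions come back as NA
glm.pred <- predict.glm(titanic.glm, newdata = titanic,
                        na.action = na.pass, type = "response")

# Threshold the probabilities to get 0/1 class labels
glm.class <- ifelse(glm.pred > 0.5, 1, 0)
```

Rows with a missing predictor (here, the NA age) stay in the output as NA, so the prediction vector keeps the same length as the input.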

### Mixed GAM (Generalized Additive Models) Computation Vehicle

The commands are a little less obvious:

```
library(mgcv)
titanic.gam <- gam(survived ~ pclass ...
```
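The call above is cut off. A plausible completion, assuming the same predictors as the glm example plus a smooth term on age (this is a sketch under those assumptions, not the original code), again using a made-up data frame in place of the Kaggle files:

```r
library(mgcv)

# Hypothetical data standing in for the Kaggle training file
set.seed(1)
n <- 40
titanic <- data.frame(
  pclass = sample(1:3, n, replace = TRUE),
  sex    = factor(sample(c("male", "female"), n, replace = TRUE)),
  age    = round(runif(n, 1, 70)),
  sibsp  = sample(0:3, n, replace = TRUE)
)
titanic$survived <- rbinom(n, 1, ifelse(titanic$sex == "female", 0.7, 0.3))

# s(age) fits a spline-based smooth instead of a linear term for age
titanic.gam <- gam(survived ~ pclass + sex + s(age) + sibsp,
                   data = titanic, family = binomial(link = logit))

gam.pred <- predict(titanic.gam, newdata = titanic,
                    na.action = na.pass, type = "response")
```

The interface mirrors `glm`: the same `family` argument and the same `type = "response"` switch to get probabilities rather than values on the link scale.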

## Data analysis and modeling in R: a crash course

Let’s pretend you recently installed R (software for statistical computing), you have a text collection you would like to analyze or classify, and some time to spare. Here are a few quick commands that could get you a little further. I also write this kind of cheat sheet in order to remember a set of useful tricks and packages I recently gathered, and which I thought could help others too.

### Letter frequencies

In this example I will use a series of characteristics (or features) extracted from a text collection, more precisely the frequency of each letter from a to z (all lowercase). By the way, it is as simple as this using Perl and regular expressions (provided you have a `$text` variable):

```
my @letters = ("a" .. "z");
foreach my $letter (@letters) {
    my $letter_count = () = $text =~ /$letter/gi;
    printf "%.3f\n", (($letter_count / length($text)) * 100);
}
```

### First tests in R

After starting R (the ‘R’ command), one usually wants to import data. In this case, my file type is TSV (Tab-Separated Values) and the first row contains only column labels (from ‘a’ to ‘z’), which comes in handy later. This is done using the `read.table` command.

`alpha <- read.table("letters_frequency ...`
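The call above is cut off. A full invocation for a TSV file with a header row might look like this; the file name and contents below are placeholders, written out first so the example is self-contained:

```r
# Write a tiny placeholder file: one header row ('a' to 'z'), one data row
writeLines(c(paste(letters, collapse = "\t"),
             paste(rep("3.5", 26), collapse = "\t")),
           "letters_frequency.tsv")

# header = TRUE turns the first row into column names; sep = "\t" for TSV
alpha <- read.table("letters_frequency.tsv", header = TRUE, sep = "\t")
```

With `header = TRUE`, the columns can then be addressed by name, e.g. `alpha$a`.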