# Chapter 12 Generalized linear models

Generalized linear models have several similarities with the linear model introduced in the previous chapter. Specifically, generalized linear modeling is a framework for statistical analysis that includes the linear model as a special case.

To show this, we will first compare the function for generalized linear models, `glm()`

, to the function for linear models, `lm()`

. We will use the data from Careja, Elmelund‐Præstekær, Klitgaard, & Larsen (2016) on whether a policy is a retrenchment or not (our outcome) and whether we are looking at a right-wing government or not (as our independent variable). First, we save the data in the object `prw`

(and as always, remember to load the `tidyverse`

).

```
library("tidyverse")
prw <- read_csv("https://raw.githubusercontent.com/erikgahner/sps-prw/master/analysis/prw.csv")
```

We will run the linear model using both `lm()`

and `glm()`

and use `tidy()`

in the `broom`

package to save the parameters in `prw_lm`

and `prw_glm`

, respectively.

```
library("broom")
prw_lm <- prw %>% lm(retrenchment ~ rwgov, data = .) %>% tidy()
prw_glm <- prw %>% glm(retrenchment ~ rwgov, data = .) %>% tidy()
```

Next, we get the output from `prw_lm`

. Here, we see that the intercept is 0.229 and the coefficient for `rwgov`

is 0.0971.

`prw_lm`

```
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.229 0.0325 7.04 9.73e-12
## 2 rwgov 0.0971 0.0466 2.08 3.78e- 2
```

In `prw_glm`

below we can see that we get the exact same output as for the linear model. (You can also use `prw_lm == prw_glm`

to see that all information is the exact same in the two objects).

`prw_glm`

```
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.229 0.0325 7.04 9.73e-12
## 2 rwgov 0.0971 0.0466 2.08 3.78e- 2
```

We are using the `glm()`

function because it allows many of the most communly used generalized linear models.

## 12.1 Binary outcomes

If you have a binary outcome, you will most likely run a logit model with `glm()`

instead of a linear model with `lm()`

. However, when we run a logistic regression model with `glm()`

, we do not get easily interpretable unstandardized regression coefficients as with the `lm()`

model. Instead, `glm()`

return the coefficients in the form of logits that we would like to see as probabilities.

To estimate a simple logistic regression model we are going to use the `prw`

data imported into `R`

above and reproduce Model 1 in Table 2 in Careja et al. (2016). We are saving this model in the object `model_retrenchment`

. Here, to tell `R`

that we are interested in a binomial logistic regression, we simply add ‘`family = "binomial"`

’ to the function.

`model_retrenchment <- glm(retrenchment ~ rwgov, data = prw, family = "binomial") `

As for all models, we can use `summary()`

to get the output.

`summary(model_retrenchment)`

```
##
## Call:
## glm(formula = retrenchment ~ rwgov, family = "binomial", data = prw)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.8880 -0.8880 -0.7207 1.4976 1.7177
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.2155 0.1736 -7.000 2.56e-12 ***
## rwgov 0.4885 0.2361 2.069 0.0385 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 431.22 on 365 degrees of freedom
## Residual deviance: 426.89 on 364 degrees of freedom
## (52 observations deleted due to missingness)
## AIC: 430.89
##
## Number of Fisher Scoring iterations: 4
```

In the output, we see that there is a positive effect of `rwgov`

(a positive coefficient under `Estimate`

) and that the p-value is statistically significant at a .05-level (0.0385). Accordingly, we can make some of the same interpretations as when looking at OLS regression models.

However, we need to convert the logit output into probabilities. To do this, we need to convert the logit coefficients to odds and then to probabilities. The easiest way to do this is to use the `plogis()`

function. If we take the intercept in the model, -1.2155, and put it into plogis(), we get:

`plogis(-1.2155)`

`## [1] 0.2287293`

Accordingly, the predicted probability of a left-wing government pursuing a retrenchment policy is 0.2287, or, 22.87%. If we want to calculate the predicted probability for a right-wing government, we simply add the coefficient for `rwgov`

.

`plogis(-1.2155+0.4885)`

`## [1] 0.3258534`

Here we see that the predicted probability for right-wing governments is 0.3259, or 32.59%.

Luckily, there are easier ways to get these numbers. Specifically, we are going to use the `augment()`

function (from the `broom`

package) introduced in the previous chapter and specify that we would like to predict the response for the model. We then want to group the predicted probabilities by the coefficient of interest and get the average predicted probability.

```
augment(model_retrenchment, type.predict = "response") %>%
group_by(rwgov) %>%
summarize(fitted = mean(.fitted))
```

```
## # A tibble: 2 x 2
## rwgov fitted
## <dbl> <dbl>
## 1 0 0.229
## 2 1 0.326
```

As we can see, these predicted probabilities are the same as those obtained with `plogis()`

.

Last, this works similarly with continuous variables. To show this, we are going to use `year`

as the predictor variable and, in addition to the code above, plot a line with `ggplot2`

.

```
# Estimate model
glm(retrenchment ~ year, data = prw, family = "binomial") %>%
# Get predicted probabilities (response)
augment(type.predict = "response") %>%
group_by(year) %>%
summarize(fitted = mean(.fitted)) %>%
# Plot the predicted probabilities
ggplot(aes(year, fitted)) +
geom_line()
```

## 12.2 Other models

### 12.2.1 Event count

The model introduced above also works with count data such as Poisson regression. To do this, you simply specify the family as `poisson(link=log)`

.

`glm(OUTCOME ~ PREDICTOR, data = DATAFRAME, family = poisson(link=log)) `

### 12.2.2 Negative binomial regression

To conduct negative binomial regression models requires the `glm.nb()`

function in the library `MASS`

.

### 12.2.3 Ordinal outcomes

For ordinal outcomes, you should use the `polr()`

function (short for proportional odds logistic regression), which is also part of the `MASS`

package.