More posterior predictive checks: Let θA and θB be the average number of children of men in their 30
menchild30bach = scan('./menchild30bach.dat.txt')
menchild30nobach = scan('./menchild30nobach.dat.txt')
(a).
a = 2
b = 1
N = 5000
thetaA = rgamma(N, a + sum(menchild30bach), b + length(menchild30bach))
thetaB = rgamma(N, a + sum(menchild30nobach), b + length(menchild30nobach))
ynewA = rpois(N, thetaA)
ynewB = rpois(N, thetaB)
pos.predict = data.frame(
ynew = c(ynewA, ynewB),
dist = factor(rep(c('bach', 'nobach'), each = N), levels = c('bach', 'nobach')),
type = 'posterior'
ggplot(pos.predict, aes(x = ynew, group = dist)) +
geom_histogram() +
facet_grid(. ~ dist)
(b).
theta.diff = thetaB - thetaA
print("Theta difference:")
## [1] "Theta difference:"
print(quantile(theta.diff, c(0.025, 0.975)))
##
2.5% 97.5%
## 0.1459141 0.7353060
ynew.diff = ynewB - ynewA
print("Ynew difference:")
## [1] "Ynew difference:"
print(quantile(ynew.diff, c(0.025, 0.975)))
## 2.5% 97.5%
## -2 4
Since the confidence interval for θB - θA does not contain zero, we believe with at least approximately 95%
probability that the true posterior mean θB is greater than θA. However, the same does not hold true for
the means of the difference between two individuals sampled from this distribution, due to the increased
uncertainty in the posterior predictive distribution.
(c) .
emp.df = data.frame(
ynew = c(menchild30bach, menchild30nobach),
dist = factor(c(rep('bach', length(menchild30bach)), rep('nobach', length(menchild30nobach))), levels = c('bach', 'nobach')),
type = 'empirical'
total.df = rbind(pos.predict, emp.df)
ggplot(total.df, aes(x = ynew, y = ..density.., group = type, fill = type)) +
geom_histogram(position = 'dodge') +
facet_grid(. ~ dist)