More posterior predictive checks: Let θA and θB be the average number of children of men in their 30


menchild30bach = scan('./menchild30bach.dat.txt')

menchild30nobach = scan('./menchild30nobach.dat.txt')

(a).

a = 2

b = 1

N = 5000

thetaA = rgamma(N, a + sum(menchild30bach), b + length(menchild30bach))

thetaB = rgamma(N, a + sum(menchild30nobach), b + length(menchild30nobach))

ynewA = rpois(N, thetaA)

ynewB = rpois(N, thetaB)

pos.predict = data.frame(

ynew = c(ynewA, ynewB),

dist = factor(rep(c('bach', 'nobach'), each = N), levels = c('bach', 'nobach')),

type = 'posterior'

ggplot(pos.predict, aes(x = ynew, group = dist)) +

geom_histogram() +

facet_grid(. ~ dist)

 

 

 

(b).

theta.diff = thetaB - thetaA

print("Theta difference:")

## [1] "Theta difference:"

print(quantile(theta.diff, c(0.025, 0.975)))

##

2.5% 97.5%

## 0.1459141 0.7353060

ynew.diff = ynewB - ynewA

print("Ynew difference:")

## [1] "Ynew difference:"

print(quantile(ynew.diff, c(0.025, 0.975)))

## 2.5% 97.5%

## -2 4

Since the confidence interval for θB - θA does not contain zero, we believe with at least approximately 95%

probability that the true posterior mean θB is greater than θA. However, the same does not hold true for

the means of the difference between two individuals sampled from this distribution, due to the increased

uncertainty in the posterior predictive distribution.

 

(c) .

emp.df = data.frame(

ynew = c(menchild30bach, menchild30nobach),

dist = factor(c(rep('bach', length(menchild30bach)), rep('nobach', length(menchild30nobach))), levels = c('bach', 'nobach')),

type = 'empirical'

total.df = rbind(pos.predict, emp.df)

ggplot(total.df, aes(x = ynew, y = ..density.., group = type, fill = type)) +

geom_histogram(position = 'dodge') +

facet_grid(. ~ dist)

相关