%%
clear all
disp('Exposure to lead ')
% It is hypothesized that blood levels of lead tend to be higher for children whose parents work in a
% factory that uses lead in the manufacturing process. Researchers
% examined lead levels in the blood of 12 children whose parents
% worked in a battery manufacturing factory.
% The results for the "case children" X_{11}, X_{12}, ..., X_{1,12} are compared to
% the "control" sample X_{21}, X_{22}, ..., X_{2,15} consisting of 15 children selected randomly
% from the families where the parents did not work in a factory that
% uses lead. It is assumed that the measurements are independent and come from normal populations.
% The resulting sample means and sample standard deviations were
% bar X_1 = .015, s_1 = .004,
% bar X_2 = .006, and s_2 = .006.
%
% Obviously, the sample mean for the exposed children is higher than the sample mean
% in the control sample. But is this difference significant?
%
%
% For the lead exposure example, the null hypothesis to be tested is that exposure through the
% parents' workplace has no influence on the lead concentration, that is, the two population means
% are the same,
%
% H_0: mu_1 = mu_2.
%
% Here the populations are defined as all children that are exposed or non-exposed.
% The alternative hypothesis H_1 may be either one- or two-sided. The two-sided alternative is
% simply H_1: mu_1 ~= mu_2, the population means are not equal and the difference can go either
% way.
% The choice of a one-sided hypothesis should be guided by the problem setup, and sometimes by the
% observations. In the context of this example, it would not make sense to take the one-sided
% alternative to be H_1: mu_1 < mu_2, which states that the concentration in the exposed group is
% smaller than that in the control group. In addition, bar X_1 = .015 and
% bar X_2 = .006 are observed.
% Thus, the sensible one-sided hypothesis in this context is H_1: mu_1 > mu_2.
n1 = 12; X1bar = 0.015; s1 = 0.004;
n2 = 15; X2bar = 0.006; s2 = 0.006;
% Test H_0: sigma_1^2 = sigma_2^2 versus the two-sided alternative.
Fstat = s1^2/s2^2
% Fstat = 0.4444
% For the two-sided alternative the p-value is
pval = 2 * min(fcdf(Fstat, n1-1, n2-1), 1-fcdf(Fstat, n1-1, n2-1))
% pval = 0.1825
% H_0 is not rejected at the 5% level.
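
% A hedged cross-check of the variance test above, using the rejection
% region instead of the p-value. The names alphaF, FstatChk, Flo, Fhi and
% rejectF are introduced here for illustration only, and the summary
% statistics are restated so that the sketch runs on its own.
alphaF = 0.05;                              % assumed significance level
s1 = 0.004; n1 = 12; s2 = 0.006; n2 = 15;   % same values as above
FstatChk = s1^2/s2^2;
Flo = finv(alphaF/2, n1-1, n2-1);           % lower critical value
Fhi = finv(1 - alphaF/2, n1-1, n2-1);       % upper critical value
% H_0 is rejected only if the statistic falls outside [Flo, Fhi]
rejectF = (FstatChk < Flo) || (FstatChk > Fhi)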
%%
% Assume the population variances are equal and use the pooled standard deviation.
sp = sqrt( ((n1-1)*s1^2 + (n2-1)*s2^2 )/(n1 + n2 - 2) )
% sp = 0.0052
df = n1 + n2 - 2   % df = 25
tstat = (X1bar - X2bar)/(sp * sqrt(1/n1 + 1/n2))   % tstat = 4.4557
pvalue = 1 - tcdf(tstat, df)   % one-sided p-value, below 0.0001
% H_0 is rejected in favor of H_1: mu_1 > mu_2 at the 5% level.
% 95% confidence interval for mu_1 - mu_2
LB = X1bar - X2bar - tinv(0.975, df)*sp * sqrt(1/n1 + 1/n2)   % LB = 0.0048
UB = X1bar - X2bar + tinv(0.975, df)*sp * sqrt(1/n1 + 1/n2)   % UB = 0.0132
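
% If the equal-variance assumption is in doubt, Welch's approximation is a
% common alternative to pooling. This is a minimal sketch, not part of the
% original analysis; seW, tW, dfW and pW are illustrative names, and the
% summary statistics are restated so that the sketch runs on its own.
n1 = 12; X1bar = 0.015; s1 = 0.004;
n2 = 15; X2bar = 0.006; s2 = 0.006;
seW = sqrt(s1^2/n1 + s2^2/n2);     % unpooled standard error
tW = (X1bar - X2bar)/seW;          % Welch t statistic
% Satterthwaite approximation to the degrees of freedom
dfW = (s1^2/n1 + s2^2/n2)^2 / ((s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1));
pW = 1 - tcdf(tW, dfW)             % one-sided p-value for H_1: mu_1 > mu_2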
% Find the power against the alternative H_1: mu_1 - mu_2 = 0.005
% for the one-sided test at alpha = 0.05. Normal approximation:
power = 1-normcdf(norminv(1-0.05)-0.005/sqrt(s1^2/n1+s2^2/n2) )
% power = 0.8271
% Equivalent form, by the symmetry of the normal distribution:
power = normcdf(norminv(0.05)+0.005/sqrt(s1^2/n1+s2^2/n2) )
% power = 0.8271
% Thus the approximate power is about 83%.
% The exact power, from the noncentral t distribution, is about 81%:
power=1-nctcdf(tinv(1-0.05,n1+n2-2),n1+n2-2,0.005/sqrt(s1^2/n1+s2^2/n2))
% power = 0.8084
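
% To see how the approximate power depends on the true difference, the same
% normal-approximation formula can be evaluated over a grid of alternatives.
% A sketch only: the grid deltaGrid and the names seApprox and powerGrid are
% assumptions made for illustration, and the inputs are restated so that
% the sketch runs on its own.
s1 = 0.004; n1 = 12; s2 = 0.006; n2 = 15;
deltaGrid = 0:0.001:0.008;               % hypothetical values of mu_1 - mu_2
seApprox = sqrt(s1^2/n1 + s2^2/n2);
powerGrid = normcdf(norminv(0.05) + deltaGrid/seApprox);
disp([deltaGrid' powerGrid'])            % power equals alpha = 0.05 at delta = 0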
% We plan to design a future experiment to test the same phenomenon. When the
% data are collected and analyzed, we would like the alpha = 5% test to achieve a
% power of 1 - beta = 90% against the specific alternative H_1: mu_1 - mu_2 = 0.005.
% What sample size will be necessary?
ssize = (s1^2 + s2^2)*(norminv(0.95)+norminv(0.9))^2/(0.005^2)
% ssize = 17.8128, so about 18 children in each group
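
% A short check of the planned design, under the same normal approximation:
% round the sample size up to an integer and confirm that the approximate
% power then reaches the 90% target. The names nRaw, nEach and powerCheck
% are illustrative, and the inputs are restated so that the sketch runs on
% its own.
s1 = 0.004; s2 = 0.006;
nRaw = (s1^2 + s2^2)*(norminv(0.95) + norminv(0.9))^2/(0.005^2);
nEach = ceil(nRaw)                       % always round up, never down
powerCheck = normcdf(norminv(0.05) + 0.005/sqrt((s1^2 + s2^2)/nEach))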