TITLE: Regression Graphics: Identifying Outliers and Mixtures
SPEAKER: R. Dennis Cook
School of Statistics
University of Minnesota
DATE: February 5, 1999
LOCATION: Funger 310
TIME: 3:00 p.m.
------------------------------------------------------------
Regressions in practice can include outliers and mixtures.
Regression mixtures can occur if there is an omitted categorical
predictor, such as gender, species or location, and different
regressions occur within each category. It will be shown that the
theory of regression graphics based on central dimension-reduction
subspaces can be used to construct graphical solutions to
long-standing regression problems of this type. Under weak
conditions, the central subspace automatically expands to
incorporate outliers and regression mixtures. Thus, methods of
estimating the central subspace can be expected to identify such
structures without specifying a model. Examples illustrating this
new theory will be presented.
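As a rough illustration of estimating a central subspace without fitting a model, the sketch below uses sliced inverse regression (SIR), a standard estimator introduced by K.-C. Li in 1991; it is not necessarily the method used in the talk. The data are a hypothetical two-group mixture in which an omitted binary label decides whether the response depends on x1 or on x2. The function name `sir` and all data settings are illustrative assumptions, and SIR itself assumes roughly elliptical (here Gaussian) predictors.

```python
import numpy as np


def sir(X, y, n_slices=10):
    """Sliced inverse regression (SIR).

    Returns the eigenvalues of the slice-mean covariance (largest first)
    and the matching directions expressed in the original predictor scale.
    """
    n, p = X.shape
    # Standardize the predictors: Z = (X - mean) @ Sigma^{-1/2}
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ inv_sqrt
    # Slice on the ordered response and average Z within each slice
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Eigen-decompose the weighted covariance of the slice means
    w, v = np.linalg.eigh(M)
    w, v = w[::-1], v[:, ::-1]
    return w, inv_sqrt @ v


# Hypothetical mixture: an omitted binary label g decides whether y
# depends on x1 or on x2; x3 is irrelevant in both groups.
rng = np.random.default_rng(1)
n = 4000
X = rng.standard_normal((n, 3))
g = rng.random(n) < 0.5
y = np.where(g, X[:, 1] - 2.0, X[:, 0] + 2.0) + 0.1 * rng.standard_normal(n)

w, B = sir(X, y)
print(np.round(w, 3))  # two sizeable eigenvalues, the third near zero
```

Two sizeable eigenvalues point to a two-dimensional estimated subspace, spanned approximately by the x1 and x2 axes, even though each group alone depends on a single predictor. Because SIR uses only inverse means, it can miss dependence that is symmetric about the center of the predictors; other estimators (for example SAVE) are used in that case.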
------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Statistical Methods for Constructing Ozone Exposure Metrics
SPEAKER: Erin Blankenship
Department of Statistics
North Carolina State University
DATE: February 19, 1999
LOCATION: 321 Funger Hall
TIME: 10:30 a.m.
--------------------------------------------------------------
Studying the effect of ozone on plant growth is crucial to
understanding the impact of elevated ozone levels on crop
production. Plant exposure-response studies are complicated by the
fact that ozone (as well as covariates such as temperature and
humidity) is often measured frequently in time, e.g., hourly,
whereas plant response is measured much less frequently, e.g., at
the end of a growing season or over somewhat shorter growth periods
of at least several days. A generally accepted strategy has been to
reduce the ozone measurements during a growing season to a single
metric and to use this metric as a predictor variable for plant
response.
The commonly used exposure metrics depend on parameters that
generally have been determined without the use of statistical
methods. Thus there is no statistically meaningful measure of
variability that can be assigned to these parameter values, and
statistical comparisons of different parameter values are not
possible. We estimate the exposure metric parameters statistically
by casting the problem as a nonlinear regression model in which the
mean depends on regression parameters as well as exposure metric
parameters. For fixed values of the exposure metric parameters, the
regression parameters are obtained using standard estimation
methods, e.g., maximum likelihood or least squares. In this way a
profile likelihood (or sum of squares) for the exposure metric
parameters alone is obtained, which reduces the dimension of the
optimization problem.
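A minimal sketch of the profiling idea, under assumptions that are not taken from the talk: the exposure metric is a hypothetical threshold metric M_i(c) = sum over hours h of max(o_ih - c, 0) with a single parameter c, the plant response is linear in the metric with Gaussian errors, and the hourly ozone series are simulated. For each c on a grid, the regression parameters (b0, b1) are fitted by least squares, which leaves a one-dimensional profile sum of squares in c. The names `exposure_metric` and `profile_rss` are illustrative.

```python
import numpy as np


def exposure_metric(ozone, c):
    """Hypothetical threshold metric: hourly excess over c, summed per plot."""
    return np.clip(ozone - c, 0.0, None).sum(axis=1)


def profile_rss(ozone, y, grid):
    """Residual sum of squares after least-squares fitting of (b0, b1)
    for each fixed value of the metric parameter c."""
    rss = np.empty(len(grid))
    for k, c in enumerate(grid):
        A = np.column_stack([np.ones(len(y)), exposure_metric(ozone, c)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ coef
        rss[k] = r @ r
    return rss


# Simulated season of hourly ozone (ppb) for 40 plots with differing levels
rng = np.random.default_rng(0)
n_plots, n_hours = 40, 24 * 60
scales = rng.uniform(8.0, 14.0, size=(n_plots, 1))
ozone = rng.gamma(4.0, scales, size=(n_plots, n_hours))

c_true = 60.0
y = 10.0 - 0.002 * exposure_metric(ozone, c_true) + 0.5 * rng.standard_normal(n_plots)

grid = np.arange(30.0, 95.0, 5.0)
rss = profile_rss(ozone, y, grid)
c_hat = grid[np.argmin(rss)]
# Gaussian profile log-likelihood in c, up to an additive constant
profile_loglik = -0.5 * n_plots * np.log(rss / n_plots)
print(c_hat)
```

The grid search replaces a joint three-parameter optimization with a one-dimensional one; with several metric parameters the same loop runs over a multidimensional grid or a numerical optimizer.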
The nonlinear modeling approach provides a convenient framework for
assessing variability and testing hypotheses. However, the model is
highly nonlinear in the exposure metric parameters, and sample sizes
in applications are typically small, so the usual large-sample
methods of inference are suspect. The performance of several methods
of hypothesis testing is evaluated in a Monte Carlo study, and the
use of a parametric bootstrap to improve the performance of the
tests is investigated.
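The parametric bootstrap can be sketched in the same hypothetical setting (a threshold metric with one parameter c, a linear mean, Gaussian errors, and a grid search; all of these are assumptions rather than the speaker's procedure). To test H0: c = c0, the statistic T = n log(RSS(c0) / min_c RSS(c)) is referred not to its large-sample chi-square distribution but to its distribution under data simulated from the model fitted with c fixed at c0. Function names are illustrative.

```python
import numpy as np


def metric(ozone, c):
    # Hypothetical threshold exposure metric: summed hourly excess over c
    return np.clip(ozone - c, 0.0, None).sum(axis=1)


def fit(ozone, y, c):
    """Least-squares fit of y = b0 + b1 * metric(c); returns fitted values and RSS."""
    A = np.column_stack([np.ones(len(y)), metric(ozone, c)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    resid = y - fitted
    return fitted, resid @ resid


def lr_stat(ozone, y, c0, grid):
    """T = n log(RSS(c0) / min over grid of RSS(c)); c0 must be a grid value."""
    rss = np.array([fit(ozone, y, c)[1] for c in grid])
    rss0 = rss[np.flatnonzero(grid == c0)[0]]
    return len(y) * np.log(rss0 / rss.min())


def bootstrap_pvalue(ozone, y, c0, grid, n_boot=199, seed=0):
    """Parametric-bootstrap p-value for H0: c = c0 under Gaussian errors."""
    rng = np.random.default_rng(seed)
    t_obs = lr_stat(ozone, y, c0, grid)
    fitted0, rss0 = fit(ozone, y, c0)        # fit under H0 (c fixed at c0)
    sigma0 = np.sqrt(rss0 / len(y))          # ML estimate of the error SD under H0
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        y_star = fitted0 + sigma0 * rng.standard_normal(len(y))
        t_boot[b] = lr_stat(ozone, y_star, c0, grid)
    # Share of simulated statistics at least as large as the observed one
    return np.mean(t_boot >= t_obs)


# Small simulated study: 20 plots, 30 days of hourly ozone (ppb)
rng = np.random.default_rng(1)
ozone = rng.gamma(4.0, rng.uniform(8.0, 14.0, size=(20, 1)), size=(20, 24 * 30))
y = 5.0 - 0.002 * metric(ozone, 60.0) + 0.5 * rng.standard_normal(20)
grid = np.arange(30.0, 95.0, 5.0)
print(bootstrap_pvalue(ozone, y, 60.0, grid))
```

Simulating from the fitted null model keeps the small sample size and the nonlinearity in c, which are exactly the features that make the chi-square approximation doubtful.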
---------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Structural Equation Modeling (SEM) for Non-Normal and
Correlated Samples
SPEAKER: Savas Papadopoulos
Rice University
DATE: February 26, 1999
LOCATION: 307 Funger Hall
TIME: 11:00 a.m.
---------------------------------------------------------------------
A general latent-variable model is fitted to several samples. It is
nonlinear in the parameters and contains variables that account for
non-normal variation, fixed terms, and correlation between samples.
Special cases, widely used in the social and behavioral sciences,
are confirmatory factor analysis, LISREL (LInear Structural
RELationships), measurement error models, and path analysis. It is
shown theoretically that the standard methods developed for normal
and independent samples can be applied to non-normal and correlated
samples when a particular parameterization is used. The results can
be used, for example in medical studies and economics, when
correlated populations are compared or when one population is
observed over time. The theoretical results are supported by
simulation studies and are also applied to real data from a study of
the effectiveness of the Head Start Summer Program. The advantages
and efficiency of the proposed method compared with the standard
methods are discussed.
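For reference, one of the standard methods for normal, independent samples is normal-theory maximum likelihood, which minimizes the discrepancy F_ML = log|Sigma| + tr(Sigma^{-1} S) - log|S| - p between the sample covariance S and the model-implied covariance Sigma. The sketch shows this for confirmatory factor analysis, one of the special cases named above, where Sigma = Lambda Phi Lambda^T + Theta. The one-factor parameter values are hypothetical, and the particular parameterization the talk uses to extend these methods to non-normal, correlated samples is not reproduced here.

```python
import numpy as np


def implied_cov_cfa(Lambda, Phi, Theta):
    """Covariance implied by a confirmatory factor model:
    Sigma = Lambda Phi Lambda^T + Theta."""
    return Lambda @ Phi @ Lambda.T + Theta


def f_ml(S, Sigma):
    """Normal-theory ML discrepancy between sample covariance S and Sigma:
    log|Sigma| + tr(Sigma^{-1} S) - log|S| - p  (>= 0, and 0 when S equals Sigma)."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(np.linalg.solve(Sigma, S)) - logdet_s - p


# Hypothetical one-factor model with four indicators
Lambda = np.array([[1.0], [0.8], [0.6], [0.7]])   # factor loadings
Phi = np.array([[1.0]])                            # factor variance
Theta = np.diag([0.5, 0.4, 0.6, 0.5])              # unique (error) variances
Sigma = implied_cov_cfa(Lambda, Phi, Theta)

# Compare with the covariance of a simulated normal sample
rng = np.random.default_rng(0)
data = rng.multivariate_normal(np.zeros(4), Sigma, size=300)
S = np.cov(data, rowvar=False)
print(f_ml(S, Sigma))
```

In a fit, the free entries of Lambda, Phi and Theta would be chosen to minimize `f_ml`, and a sample-size multiple of the minimum serves as the usual chi-square goodness-of-fit statistic under normality.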
---------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Sequential Classification on Partially Ordered Sets
SPEAKER: Curtis Tatsuoka
University of California at Santa Barbara
DATE: March 5, 1999
LOCATION: 307 Funger Hall
TIME: 11:00 a.m.
----------------------------------------------------------------------
The problem of classifying observations into a state of a finite
partially ordered set (poset) using a sequence of experiments is
investigated. The distributions for the experiments depend on which
state is the true one. Each experiment partitions a poset
classification model according to which elements of the poset share
a distribution for that experiment. Since these partitions vary from
experiment to experiment, selecting the experiments sequentially can
improve efficiency. Classification performance is measured in a
decision-theoretic framework that includes a cost of observation.
The main results establish conditions under which the posterior
probability of the true state converges to one almost surely and
determine optimal rates of convergence. Properties of various
classes of experiment selection rules are explored. One application
of this methodology is the implementation of intelligent tutoring
systems; some empirical results will be discussed. Other potential
applications include intelligent systems for medical diagnosis.
--------------------------------------------------------------------
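A minimal sketch of the ingredients, under assumptions that are not taken from the talk: the states are the subsets of two skills {A, B} ordered by inclusion; each experiment (a test item) requires a set of skills and so partitions the states into those that contain it and those that do not; responses are Bernoulli with hypothetical success probabilities 0.9 and 0.2 for the two blocks; and the next item is chosen greedily to minimize the expected posterior entropy. This greedy rule is one illustrative selection rule, not necessarily one of those studied in the talk, and all names are hypothetical.

```python
import math
import random
from itertools import combinations

SKILLS = ("A", "B")
# States: all subsets of SKILLS, partially ordered by inclusion
STATES = [frozenset(c) for r in range(len(SKILLS) + 1) for c in combinations(SKILLS, r)]
# Hypothetical experiments (test items), each identified by the skills it requires
ITEMS = [frozenset("A"), frozenset("B"), frozenset("AB")]
P_HIGH, P_LOW = 0.9, 0.2  # assumed success probabilities for the two blocks


def p_success(state, item):
    # Each item splits the poset into states that contain its required
    # skills (success probability P_HIGH) and those that do not (P_LOW)
    return P_HIGH if item <= state else P_LOW


def update(posterior, item, correct):
    """Bayes update of the posterior over states after one response."""
    new = {s: w * (p_success(s, item) if correct else 1.0 - p_success(s, item))
           for s, w in posterior.items()}
    total = sum(new.values())
    return {s: w / total for s, w in new.items()}


def entropy(posterior):
    return -sum(w * math.log(w) for w in posterior.values() if w > 0)


def choose_item(posterior):
    """Greedy selection: the item with the smallest expected posterior entropy."""
    def expected_entropy(item):
        p_yes = sum(w * p_success(s, item) for s, w in posterior.items())
        return (p_yes * entropy(update(posterior, item, True))
                + (1.0 - p_yes) * entropy(update(posterior, item, False)))
    return min(ITEMS, key=expected_entropy)


# Simulated session for a learner whose (unknown) true state is {A}
rng = random.Random(0)
true_state = frozenset("A")
posterior = {s: 1.0 / len(STATES) for s in STATES}
for _ in range(20):
    item = choose_item(posterior)
    correct = rng.random() < p_success(true_state, item)
    posterior = update(posterior, item, correct)
print({"".join(sorted(s)) or "{}": round(w, 3) for s, w in posterior.items()})
```

As a check on the update, one correct answer to the item requiring {A, B}, starting from a uniform prior, gives the top state posterior probability 0.9 / (0.9 + 3 * 0.2) = 0.6.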
--------------------------------------------------------------------------------
TITLE: Estimating a Density from Contaminated Observations
SPEAKER: Christian H. Hesse
Department of Mathematics
University of Stuttgart
DATE: March 22, 1999
LOCATION: 321 Funger Hall
TIME: 3:30 p.m.
-----------------------------------------------------------------
We study a data-driven nonparametric procedure for density
estimation based on observations that are contaminated by additive
measurement errors. The assumptions placed on the density to be
estimated are mild: apart from continuity, they include no
smoothness conditions. The procedure is shown to be asymptotically
optimal in the sense of both integrated squared error and mean
integrated squared error. A simulation study examines its practical
merit.
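As background only, the sketch below implements the classical deconvolving kernel density estimator, not the data-driven procedure of the talk. It assumes the measurement error is N(0, sigma^2) with sigma known, uses a fixed bandwidth h instead of a data-driven choice, and uses a kernel whose Fourier transform is (1 - s^2)^3 on [-1, 1]. The estimate is f_hat(x) = (1/2pi) * integral of exp(-i t x) * phi_K(t h) * phi_Y_hat(t) / phi_eps(t) dt, where phi_Y_hat is the empirical characteristic function of the contaminated observations Y. The function name and all settings are illustrative.

```python
import numpy as np


def deconvolution_kde(y, x, h, sigma, n_t=401):
    """Deconvolving kernel density estimate at points x from contaminated data y,
    assuming N(0, sigma^2) measurement error and kernel FT (1 - s^2)^3 on [-1, 1]."""
    t = np.linspace(-1.0 / h, 1.0 / h, n_t)
    dt = t[1] - t[0]
    ecf = np.exp(1j * np.outer(t, y)).mean(axis=1)   # empirical char. function of Y
    phi_kernel = (1.0 - (t * h) ** 2) ** 3           # kernel FT, zero at the endpoints
    phi_error = np.exp(-0.5 * (sigma * t) ** 2)      # known error char. function
    integrand = phi_kernel * ecf / phi_error
    # Fourier inversion; endpoint terms vanish, so a plain sum equals the trapezoid rule
    return (np.exp(-1j * np.outer(x, t)) @ integrand).real * dt / (2.0 * np.pi)


rng = np.random.default_rng(0)
n, sigma = 500, 0.3
x_true = rng.standard_normal(n)                  # unobserved values
y = x_true + sigma * rng.standard_normal(n)      # observed with additive error
grid = np.linspace(-10.0, 10.0, 2001)
f_hat = deconvolution_kde(y, grid, h=0.4, sigma=sigma)
```

Dividing by the error characteristic function is what removes the contamination; it also amplifies noise at large |t|, which is why the bandwidth (here fixed by assumption) matters so much. The estimate can dip slightly below zero, and clipping and renormalizing is a common practical fix.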
------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Variations on a theme of Markov: A tutorial
SPEAKER: Nozer D. Singpurwalla
Department of Operations Research
The George Washington University
DATE: May 7, 1999
LOCATION: 307 Funger Hall
TIME: 4:00 p.m.
-----------------------------------------------------------------
In 1913, the great Russian probabilist A. A. Markov analyzed
Pushkin's Eugene Onegin for the alternation of vowels and consonants
in the Russian language, using an idea that now bears his name: the
"Markov chain". Variations on Markov's theme include Markov
processes, semi-Markov processes, Markov renewal processes,
Markov-modulated Poisson processes, hidden Markov models, and Markov
random fields. What do all these labels mean, and how are they
connected?
The aim of this talk is to show how stochastic processes with the
qualifier "Markov" can be systematically constructed. Our interest
in this topic stems from the problem of reliability assessment under
random environments, in particular software testing. Those familiar
with this subject will find nothing new, just a natural way of
looking at many things; those who are not should no longer be
intimidated by the sugarcoating of old ideas with new names.
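Markov's calculation amounts to estimating a two-state transition matrix from counts of successive letter pairs. A minimal sketch, assuming English text and treating only a, e, i, o, u as vowels (Markov worked with the Russian original, which this does not reproduce); the function names are illustrative.

```python
from collections import Counter

VOWELS = set("aeiou")  # assumption: English text, with 'y' counted as a consonant


def vowel_consonant_chain(text):
    """Estimate transition probabilities between vowels ('V') and consonants
    ('C') from consecutive letters, ignoring spaces, digits and punctuation."""
    classes = ["V" if ch in VOWELS else "C" for ch in text.lower() if ch.isalpha()]
    pairs = Counter(zip(classes, classes[1:]))
    matrix = {}
    for a in ("V", "C"):
        row_total = pairs[(a, "V")] + pairs[(a, "C")]
        matrix[a] = {b: pairs[(a, b)] / row_total if row_total else 0.0 for b in ("V", "C")}
    return matrix


def stationary_vowel_share(matrix):
    """Long-run share of vowels for a two-state chain: P(C->V) / (P(V->C) + P(C->V))."""
    v_to_c, c_to_v = matrix["V"]["C"], matrix["C"]["V"]
    return c_to_v / (v_to_c + c_to_v)


m = vowel_consonant_chain("banana")
print(m)  # {'V': {'V': 0.0, 'C': 1.0}, 'C': {'V': 1.0, 'C': 0.0}}
print(stationary_vowel_share(m))  # 0.5
```

The chain records only the class of the previous letter, which is the Markov property; the variations listed above relax or extend exactly this assumption.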
----------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Optimal Sequential Design for Reliability/Clinical Trials
SPEAKER: Eric Slud
Statistics Program, Math Dept
University of Maryland, College Park
DATE: April 30, 1999
LOCATION: 307 Funger Hall
TIME: 3:00 pm
-----------------------------------------------------------------
In a variety of disciplines, (large-sample) statistical hypothesis
tests are performed in environments where costs of experimentation,
costs of wrong decisions, and opportunity costs of delayed decisions
all play a role in determining statistical design. However, the
designs often reflect these costs only informally and intuitively.
The purpose of this talk is to show how the problem of optimal
sequential design in several contexts of clinical research,
reliability studies, and quality control can be posed, and in many
cases solved computationally, as a Bayesian decision problem in the
asymptotic setting where statistics computed over time have Gaussian
or Brownian-motion distributional behavior.
The main examples discussed in some detail are:
(a) two-batch designs (in clinical research or reliability) where
the size or duration of the second batch depends on the value of the
statistic observed for the first batch; and
(b) clinical trial designs in which accrual of new patients can stop
earlier than overall trial follow-up.
Work on (a) is joint with my Ph.D. student Eric Leifer, and work on
(b) is joint with Tony Koutsoukos of Quintiles and Larry Rubinstein
of NCI.
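To make the idea of a Bayesian decision problem under Brownian-motion asymptotics concrete, here is a deliberately simplified two-batch sketch; every modeling choice is an assumption for illustration, not the design developed in this work. The score process is B(t) = theta*t + W(t) with W a standard Brownian motion, theta has a N(mu0, 1/lam0) prior, the terminal decision is the sign of theta with loss L for a wrong decision, and sampling costs c per unit of information. After first-batch information t1, B(t1) = b1 gives a normal posterior with precision lam1 = lam0 + t1 and mean m1 = (lam0*mu0 + b1)/lam1. The second-batch size t2 is chosen from a grid to minimize c*t2 plus the expected terminal loss, using the fact that, seen from stage 1, the posterior mean after the second batch is N(m1, 1/lam1 - 1/lam2) with lam2 = lam1 + t2. Default values and names are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def terminal_risk(m, lam, loss):
    """Bayes risk of deciding the sign of theta from a N(m, 1/lam) posterior."""
    return loss * std_normal_cdf(-abs(m) * math.sqrt(lam))


def second_batch_size(b1, t1, t2_grid, mu0=0.0, lam0=1.0, loss=100.0, cost=1.0, n_nodes=40):
    """Choose second-batch information t2 minimizing cost*t2 + expected terminal loss."""
    lam1 = lam0 + t1
    m1 = (lam0 * mu0 + b1) / lam1
    nodes, weights = hermegauss(n_nodes)          # quadrature for E[g(Z)], Z ~ N(0, 1)
    weights = weights / math.sqrt(2.0 * math.pi)
    best_t2, best_value = None, math.inf
    for t2 in t2_grid:
        lam2 = lam1 + t2
        sd = math.sqrt(max(1.0 / lam1 - 1.0 / lam2, 0.0))  # preposterior SD of the new mean
        expected = sum(w * terminal_risk(m1 + sd * z, lam2, loss) for z, w in zip(nodes, weights))
        value = cost * t2 + expected
        if value < best_value:
            best_t2, best_value = t2, value
    return best_t2, best_value


t2_grid = np.linspace(0.0, 20.0, 41)
for b1 in (-3.0, 0.0, 3.0):
    t2, value = second_batch_size(b1, t1=1.0, t2_grid=t2_grid)
    print(b1, t2, round(value, 2))
```

The rule behaves as intuition suggests: a decisive first-batch statistic makes further sampling not worth its cost, while an ambiguous one buys a second batch whose size balances sampling cost against the chance of a wrong decision.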
References:
GENERAL:
Bayesian decision theory in the books of T. Ferguson (1967) and,
especially, J. Berger (1985), Statistical Decision Theory and
Bayesian Analysis, 2nd ed.
The main idea of the talk is taken from E. Slud (1994), Adaptive
Repeated Significance Tests, University of Maryland Math. Dept.
Tech. Rep.
TWO-BATCH DESIGNS:
Hald (JASA, 1975) and Eric Leifer's thesis work.
EARLY ACCRUAL STOPPING DESIGNS:
Group-sequential tests [Slud, BioStat Encycl. article]; papers of
Armitage, Pocock, O'Brien-Fleming, Wieand, and especially Slud & Wei
(JASA, 1982) and Jennison (Biometrika, 1987).
A. Koutsoukos, L. Rubinstein, & E. Slud (1998, preprint), Early
Accrual Stopping Designs.