------------------------------------------------------------
TITLE: Robust Mixture Modeling and Applications
SPEAKER: David Scott
Department of Statistics
Rice University
DATE: January 21, 2000
LOCATION: Funger Hall 308
TIME: 11:00 a.m.
------------------------------------------------------------
We investigate the use of the popular nonparametric
integrated squared
error criterion in parametric estimation. Of particular
interest are the
problems of fitting normal mixture densities and linear
regression. The
algorithm is in the class of minimum distance estimators.
We discuss some
of its theoretical properties and compare it to maximum
likelihood.
The robustness of the procedure is demonstrated by example.
The criterion
may be applied in a wide range of models. Two case studies
are given: an
application to a series of yearly household income samples
as well as a
more complex application that involves estimating an economic
frontier function
of U.S. banks where the data are assumed to be noisy.
Extensions to
clustering and discrimination problems follow.
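
As a rough illustration of the integrated squared error criterion applied to a parametric fit, the sketch below fits a single normal density by minimizing the integral of the squared model density minus twice the average model density at the data points. It is a minimal sketch and not the speaker's implementation: the grid search, the function names, and the small data set are assumptions made for the example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def l2e_criterion(data, mu, sigma):
    """Estimated integrated squared error, up to a constant free of (mu, sigma)."""
    # Integral of the squared normal density is 1 / (2 * sigma * sqrt(pi)).
    integral_f_squared = 1.0 / (2.0 * sigma * math.sqrt(math.pi))
    # The cross term (integral of model density times true density) is
    # estimated by the average model density at the observations.
    mean_density = sum(normal_pdf(x, mu, sigma) for x in data) / len(data)
    return integral_f_squared - 2.0 * mean_density

def fit_l2e(data, mu_grid, sigma_grid):
    """Grid search for the minimizer (an assumption made to keep this short)."""
    best = min((l2e_criterion(data, m, s), m, s)
               for m in mu_grid for s in sigma_grid)
    return best[1], best[2]

# Hypothetical data: 17 points centered at 0 plus 3 gross outliers near 10.
clean = [-2.0, -1.5, -1.2, -0.9, -0.7, -0.5, -0.3, -0.1, 0.0,
         0.1, 0.3, 0.5, 0.7, 0.9, 1.2, 1.5, 2.0]
outliers = [9.5, 10.0, 10.5]
data = clean + outliers

mu_grid = [0.1 * i for i in range(-20, 121)]      # -2.0 to 12.0
sigma_grid = [0.3 + 0.1 * i for i in range(28)]   # 0.3 to 3.0

mu_hat, sigma_hat = fit_l2e(data, mu_grid, sigma_grid)
sample_mean = sum(data) / len(data)  # the maximum likelihood estimate of mu
print(mu_hat, sigma_hat, sample_mean)
```

In this toy data set the sample mean is pulled toward the outliers, while the minimizer of the criterion stays near the bulk of the data.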
--------------------------------------------------------------------------------
------------------------------------------------------------
TITLE: Inference for environmental effects based on
family data taking into account ascertainment and random
genetic effects
SPEAKER: Ruth Pfeiffer
National Cancer Institute Division of Cancer Epidemiology
and Genetics
DATE: January 28, 2000
LOCATION: Funger Hall 323
TIME: 3:00 pm
-----------------------------------------------------------
Genes that underlie complex diseases can sometimes
be detected by studying families with multiple cases.
If common environmental factors are also present, they
should be incorporated into these genetic analyses to
ensure the studies have nominal statistical power. Our
interest lies in quantifying the effects of environmental
exposure on individual risk probabilities, given familial
and unmeasured genetic effects. We propose a two-level
mixed-effects model that allows us to incorporate a
genetic component accounting for the different genetic
correlations among family members and to adjust for
ascertainment by conditioning on the number of cases
in the family. Conditional maximum likelihood analysis
based on this model is performed. For rare diseases,
we develop a simple approximation to the model. We show
that standard conditional logistic regression of case-control
data with matching on family, which conditions on the
number of cases in the family, can yield biased estimates
of exposure effects if genetic correlations are ignored.
Conditions under which the conditional logistic approach
remains applicable are given.
--------------------------------------------------------------------------------
----------------------------------------------------
TITLE: Analyzing recurrent event data with informative
censoring
SPEAKER: Mei-Cheng Wang
Department of Biostatistics
Johns Hopkins University
DATE: February 4, 2000
LOCATION: Funger 308
TIME: 11:00 am
----------------------------------------------------------------------
Recurrent event data are frequently encountered in
longitudinal follow-up studies. The non-informative
censoring assumption is usually required for the validity
of statistical methods for analyzing recurrent event
data. In many applications, however, censoring could
be caused by informative drop-out or death, and it is
unrealistic to assume independence between the recurrent
event process and the censoring time. In this talk,
we consider recurrent events of the same type and allow
the censoring mechanism to be either informative or
non-informative. A multiplicative intensity model which
possesses desirable interpretations is used as the underlying
model. Statistical methods are developed for (i) nonparametric
estimation of the cumulative occurrence rate function,
(ii) kernel estimation of the occurrence rate function, and
(iii) semiparametric estimation of regression parameters.
An analysis of the inpatient care data from the AIDS
Link to Intravenous Experiences cohort (ALIVE) is presented.
--------------------------------------------------------------------------------
------------------------------------------------------------
TITLE: Two Approaches to Testing for Disease Clustering
SPEAKER: Marco Bonetti
Harvard School of Public Health and
Dana-Farber Cancer Institute, Boston, MA
DATE: February 17, 2000
LOCATION: Funger 321
TIME: 3:30pm
-------------------------------------------------------------
We address the issue of testing for general clustering
when the
underlying population is not homogeneous. In particular,
we discuss
two new approaches: (a) one based on a geometric transformation
called the "Minkowski polytope"; and (b) one
based on the distribution
of the interpoint distance between pairs
of observations.
We present some theoretical results, and some power
simulations based
on a well-known leukemia data set. For the second method
we show
how the location of the possible cluster(s) can be identified
via
a decomposition of the test statistic.
------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Analysis of accelerated failure time Model, a
useful
alternative to the Cox model.
SPEAKER: Zhezhen Jin
Dept. of Biostatistics
Harvard School of Public Health
DATE: February 25, 2000
LOCATION: Funger 307
TIME: 11:00 a.m.
----------------------------------------------------------------------
The accelerated failure time model is an appealing semi-parametric
model
which is a useful alternative to the Cox proportional
hazards model in
survival analysis. This model is a simple log-linear
model. In this talk,
first we review some existing methods for inferences
about the regression
coefficients. Almost all the methods in the literature
are rather complex
numerically. I will propose a simple and reliable method
to analyze
censored data under this model. The proposed method
can be easily
implemented using linear programming techniques with
existing software for
estimating the regression coefficients of the standard
linear model based
on the $L_1$ norm. The new procedure is illustrated
by analyzing an HIV-1
RNA data set from a study conducted by the AIDS Clinical
Trials Group.
------------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: A General Estimation Method using Spacings
SPEAKER: Kaushik Ghosh
Division of Statistics
George Washington University
DATE: February 29, 2000
LOCATION: Funger 321
TIME: 3:30
---------------------------------------------------------------------------
For most parametric estimation problems, the Maximum
Likelihood method is
justifiably popular because it possesses some very nice
properties,
especially in large samples. It is, however, known to
fail in some cases
since the likelihood function can become unbounded.
A very general method of estimation based on spacings,
i.e., gaps between
successive ordered observations, is proposed as an alternative.
The method
produces a class of estimators, which are shown to be
consistent and
asymptotically normal. A special case of this is the
Maximum Product of
Spacings method that has already been discussed in the
literature. Results
of simulation studies investigating small-sample properties
of the
proposed estimators are also presented.
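
To make the idea of estimation from spacings concrete, the sketch below applies the Maximum Product of Spacings special case to an exponential model: it maximizes the sum of the log gaps between successive fitted CDF values at the ordered observations. This is a minimal sketch only. The exponential model, the grid search, and the data values are assumptions for illustration, not taken from the talk.

```python
import math

def log_spacing_product(data, rate):
    """Sum of log spacings of the fitted exponential CDF at the ordered data."""
    xs = sorted(data)
    # Fitted CDF values, padded with 0 and 1 at the two ends.
    cdf = [0.0] + [1.0 - math.exp(-rate * x) for x in xs] + [1.0]
    total = 0.0
    for lower, upper in zip(cdf, cdf[1:]):
        gap = upper - lower
        if gap <= 0.0:
            # Tied observations give a zero spacing; treat as infeasible here.
            return float("-inf")
        total += math.log(gap)
    return total

def fit_mps(data, rates):
    """Pick the candidate rate with the largest product of spacings."""
    return max(rates, key=lambda r: log_spacing_product(data, r))

rates = [0.001 * i for i in range(1, 5001)]        # candidate rates, 0.001 to 5
data = [0.1, 0.4, 0.7, 1.1, 1.9, 3.0]              # hypothetical sample
rate_hat = fit_mps(data, rates)
print(rate_hat)
```

Because every spacing lies between 0 and 1, the objective stays bounded above, in contrast to a likelihood that can become unbounded.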
---------------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: The Estimating Function Bootstrap
SPEAKER: Feifang Hu
Department of Statistics and Applied Probability
National University of Singapore
DATE: March 9, 2000
LOCATION: Staughton 301
TIME: 3:30
---------------------------------------------------------------------------
In this talk, we propose a general and simple bootstrap
method based on
estimating functions. The proposed method, called the
Estimating Function
(EF) Bootstrap, does not involve resampling the data.
Instead, by
resampling estimated terms in the estimating function,
it provides a way of
directly approximating the distribution of the estimating
function. The EF
Bootstrap has four important advantages:
(i) It often has substantial computational advantage
over more traditional
bootstrap methods.
(ii) It leads to a simple and natural studentized version
that can be
applied with little additional computation.
(iii) The studentized version is functionally invariant
under
reparametrization.
And (iv) the methods can be extended to apply to the
estimation of vector
and nuisance parameters without further computational
difficulty.
The EF bootstrap is compared by simulation with normal
approximations and
other bootstrap methods in a number of examples including
the common means
problem, linear and nonlinear regression, and nonparametric
applications.
The approach performs at least as well as, and often
much better than, the
other methods. Asymptotic results are also obtained
to show that the
studentized versions of the EF Bootstrap yield higher
order approximations
for the whole vector parameter in a wide class of problems.
---------------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Inference on Tree-Valued Random Variables
SPEAKER: David L. Banks
Bureau of Transportation Statistics, DOT
DATE: March 31, 2000
LOCATION: Funger 323
TIME: 3:00
--------------------------------------------------------------------------
This talk describes a model for inference on classification
and clustering
trees, such as those that arise from CART analysis or
from cluster
analysis. It synthesizes research reported in Shannon
and Banks (1999)
and Banks and Constantine (1998). The approach applies
to cases in which
one observes a random sample of independent trees, all
generated from a
similar mechanism, and wants to infer the central tree.
As an example,
suppose that emergency rooms in ten different hospitals
use their
admissions data to classify patients into high-risk
and low-risk
categories with respect to heart disease. Each hospital
produces an
independent tree, and the medical statistician wishes
to estimate the true
central tree that underlies these ten observations.
This application
raises related issues of stability of the inferred tree
and the impact of
multicollinearity.
--------------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: An Optimizing Up-and-Down Design
SPEAKER: Nancy Flournoy
Department of Mathematics and Statistics
American University
DATE: April 7, 2000
LOCATION: Funger 322
TIME: 11:00am
--------------------------------------------------------------------------
I will present a treatment allocation procedure that
is motivated by
Kiefer-Wolfowitz's stochastic approximation procedure.
However, we take
responses to be binary, the possible treatment space
to be a lattice, and
the increment between consecutive "doses"
to be +/- 1 or 0. Let
P{success} be a unimodal function of dose. The Optimizing
Up-and-Down
procedure allocates treatments to pairs of subjects
in a way that causes
the treatment distribution to cluster around the treatment
with maximum
success probability. The procedure defines a random
walk, so well-known
theory is used to explicitly characterize the treatment
distribution.
As an estimator of the best dose, the mode of the empirical
treatment
distribution converges much faster than does the common
estimator using
stochastic approximation.
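
A random walk of this general kind can be simulated in a few lines. The abstract does not state the transition rule, so the sketch below assumes one plausible version: each pair is treated at adjacent doses x and x + 1, the walk moves up when only the higher dose succeeds, moves down when only the lower dose succeeds, and stays otherwise. The rule, the function name, and the success probabilities are all assumptions made for illustration.

```python
import random

def simulate_up_and_down(success_prob, start, n_pairs, seed=0):
    """Simulate the lower dose index of each treated pair (assumed rule)."""
    rng = random.Random(seed)
    top = len(success_prob) - 2   # highest index x such that x + 1 is a valid dose
    x = start
    visits = []
    for _ in range(n_pairs):
        visits.append(x)
        low_ok = rng.random() < success_prob[x]        # subject at dose x
        high_ok = rng.random() < success_prob[x + 1]   # subject at dose x + 1
        if high_ok and not low_ok:
            step = 1       # evidence that the success probability rises with dose
        elif low_ok and not high_ok:
            step = -1      # evidence that the success probability falls with dose
        else:
            step = 0
        x = min(max(x + step, 0), top)   # keep the pair on the lattice
    return visits

# Hypothetical unimodal success curve over seven doses, peaking at index 3.
curve = [0.10, 0.30, 0.60, 0.80, 0.60, 0.30, 0.10]
visits = simulate_up_and_down(curve, start=0, n_pairs=500)
mode = max(set(visits), key=visits.count)   # empirical mode of the treatment distribution
print(mode)
```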
--------------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Mixed-Effects Multivariate Adaptive Splines Models:
An
Automated Procedure for Fitting Messy Longitudinal Data
and Growth Curves
SPEAKER: Heping Zhang
Division of Biostatistics
Yale University
DATE: April 19, 2000
LOCATION: Staughton Hall 301
TIME: 3:30pm
--------------------------------------------------------------------------
A mixed-effects multivariate adaptive splines model
is proposed to analyze
longitudinal or growth curves data that may or may not
have been collected
through a regular measurement schedule. The MASAL (an
acronym for
multivariate adaptive splines for the analysis of longitudinal
data)
algorithm by Zhang (1994, 1997, 1999) is used to determine
the nonparametric
fixed-effects in the mixed-effects multivariate adaptive
splines model. The
original MASAL algorithm requires the characterization
and specification of
the within subject autocorrelation structure, which
is usually a tedious
and not always rewarding process. In contrast, the
idea of mixed effects
is introduced to the MASAL algorithm in this work, leading
to an automated
procedure for analysis of longitudinal and growth curves
data. To demonstrate
the great potential
of this new procedure, I re-analyzed a data set on the
effect of cocaine use
by pregnant women on the growth of their infants after
birth. The numerical
results are remarkable compared with a previously published
analysis by
Zhang (1999) in terms of the dissection of random effects.
--------------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: MULTIVARIATE NONPARAMETRIC CONTROL CHARTS USING
SMALL
SAMPLES
SPEAKER: Aleka Kapatou
Department of Statistics
The George Washington University
DATE: April 21, 2000
LOCATION: Funger 322
TIME: 11:00am
--------------------------------------------------------------------------
A multivariate control chart can be used to simultaneously
monitor the means
of two or more correlated variables of a process. A
typical parametric chart
to monitor a process would involve the assumption that
the data follow a
multivariate normal distribution. If this assumption
cannot be made, a
multivariate control chart based on classical nonparametric
statistics could
be used. The nonparametric control charts that we propose
are based on sign
or signed rank statistics. Past sample information for
each variable is
retained through an exponentially weighted moving average
statistic (EWMA)
in order to increase the sensitivity of the charts to
detect small shifts
from the target. It is assumed that the target values
for the means and
certain correlations for the variables are either known
or can be estimated
well. The properties of the nonparametric charts are
evaluated using
simulation. The proposed charts are compared with the
multivariate EWMA
(MEWMA) chart which is based on the sample means.
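
The way an EWMA retains past sample information can be shown with a univariate sketch: each observation is reduced to the sign of its deviation from the target, and the signs are smoothed with the usual recursion z_t = w * s_t + (1 - w) * z_(t-1). This is a simplified single-variable illustration, not the multivariate chart proposed in the talk. The weight, the function names, and the data are assumptions, and control limits are left out because the abstract does not give them.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def ewma_sign_path(observations, target, weight=0.2):
    """EWMA of sign statistics: z_t = weight * s_t + (1 - weight) * z_(t-1)."""
    z = 0.0   # start at zero, the in-control expectation of the sign
    path = []
    for obs in observations:
        z = weight * sign(obs - target) + (1.0 - weight) * z
        path.append(z)
    return path

# Hypothetical readings: on target at first, then shifted slightly upward.
readings = [9.8, 10.3, 9.9, 10.1, 9.7, 10.4, 10.2, 10.5, 10.3, 10.6]
path = ewma_sign_path(readings, target=10.0)
print(path)
```

A run of small deviations on the same side of the target pushes the statistic steadily toward 1 or -1, which is why the smoothing helps in detecting small shifts.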
--------------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Weighing the Risks and Benefits of Tamoxifen
to Prevent
Breast Cancer
SPEAKER: Mitchell Gail
Chief, Biostatistics Branch
Division of Cancer Epidemiology and Genetics
National Cancer Institute
DATE: April 28, 2000
LOCATION: Funger 322
TIME: 11:00am
--------------------------------------------------------------------------
The Breast Cancer Prevention Trial (BCPT) recently demonstrated
a
reduction of nearly 50% in the risk of breast cancer
in women who received
20 mg tamoxifen daily for about 4 years, compared to
women who received a
placebo. The risks of hip, Colles' and spine fractures
were also
reduced. Unfortunately, tamoxifen was associated with
increased risks of
endometrial cancer, stroke, pulmonary embolus, deep
vein thrombosis and
cataract. The National Cancer Institute sponsored a
workshop in July 1998
to review the information on risks in the presence and
absence of
tamoxifen and to develop methods to weigh the risks
and benefits of
tamoxifen and to convey this information to women who
were seeking advice
as to its use. The risks and benefits of tamoxifen vary
by age and race
and by the initial risk of breast cancer. We shall review
the
information on risks and benefits, methods used to summarize
this
information and identify classes of women likely to
have a net benefit,
and methods for conveying this information to women
seeking advice and
information.
This is joint work with Joseph Costantino, John Bryant,
Robert Croyle,
Lawrence Freedman, Kathy Helzlsouer, Victor Vogel and
other Workshop
Participants.
--------------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Analysis of an algorithm for finding order statistics
via
analytic probability
SPEAKER: Hosam Mahmoud
Department of Statistics
The George Washington University
DATE: May 5, 2000
LOCATION: Funger 321
TIME: 11:00
--------------------------------------------------------------------------
Certain analytic techniques have proved effective in
the study of averages
for random discrete structures and algorithms. Typically
one considers a
generating function for the sequence of averages. One
then sets up a
functional equation (usually arising from a recurrence)
and solves it
asymptotically by a tool-kit that involves a variety
of integral
transforms. The behavior of the generating function
near its dominant
singularities captures the asymptotic nature of the
averages. (Typically,
non-dominant singularities contribute periodic oscillations.)
To address distributions instead of only average-case
analysis,
these techniques have been extended to bivariate generating functions.
One still
considers dominant singularities, which are now functions
of a second
variable. The analysis is most informative in the neighborhood
of certain
values for this second variable. The analysis therefore
is viewed as a
perturbation.
We illustrate this route by a problem that arises in
sorting algorithms.
To find the limit distribution of a sum of dependent
random variables, the
analysis involves perturbation of Rice's integration
method, which will be
explained in a tutorial introduction.
--------------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: The Role of Statistics in the Mathematical Sciences
at the
NSF
SPEAKER: James Rosenberger
Program Director, Statistics and Probability
NSF
DATE: May 12, 2000
LOCATION: Funger 322
TIME: 11:00 am
--------------------------------------------------------------------------
This talk will provide an inside view of how the National
Science
Foundation operates, from the perspective of a "rotating"
program director.
In particular, I will describe how Statistics is funded
through the
Foundation, directly from the Statistics and Probability
program in the
Division of Mathematical Sciences and from the Methodology,
Measurement and
Statistics (MMS) program through the Social, Behavioral
and Economic
Directorate. Joint funding between programs also provides
a mechanism for
funding collaborative research. I will also mention
and describe special
programs such as REU, IGMS, MSPRF, CAREER, VIGRE, and
FRG. Finally,
the cross-disciplinary programs that attracted
intense interest in
the research community this year were Information Technology Research
(ITR) and
Biocomplexity. Our role in these special activities
may offer the biggest
opportunities and challenges in the near future for
the statistics
community.
--------------------------------------------------------------------------------
The contact person is Reza Modarres at Reza@gwu.edu
or 202-994-6359.