Seminar Announcements for Spring 2000
------------------------------------------------------------
TITLE: Robust Mixture Modeling and Applications
SPEAKER: David Scott
Department of Statistics
Rice University
DATE: January 21, 2000
LOCATION: Funger Hall 308
TIME: 11:00 a.m.

------------------------------------------------------------
We investigate the use of the popular nonparametric integrated squared
error criterion in parametric estimation. Of particular interest are the
problems of fitting normal mixture densities and linear regression. The
algorithm is in the class of minimum distance estimators. We discuss some
of its theoretical properties and compare it to maximum likelihood.
The robustness of the procedure is demonstrated by example. The criterion
may be applied in a wide range of models. Two case studies are given: an
application to a series of yearly household income samples and a
more complex application that involves estimating an economic frontier function
of U.S. banks where the data are assumed to be noisy. Extensions to
clustering and discrimination problems follow.
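
As a rough illustration of the integrated squared error criterion used parametrically, the sketch below fits a single normal density by minimizing the integral of the squared model density minus twice the average model density at the data. It is not the speaker's implementation: the toy data, the grid search, and the function names `l2e_criterion` and `l2e_fit` are assumptions made for illustration, and the talk's mixture and regression settings are not covered.

```python
import math

def l2e_criterion(mu, sigma, data):
    """Integrated squared error criterion for N(mu, sigma^2), omitting the
    term that does not depend on the parameters."""
    # Integral of the squared normal density: 1 / (2 * sigma * sqrt(pi)).
    integral_f2 = 1.0 / (2.0 * sigma * math.sqrt(math.pi))
    # Average of the model density evaluated at the observations.
    mean_density = sum(
        math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        for x in data
    ) / len(data)
    return integral_f2 - 2.0 * mean_density

def l2e_fit(data, mus, sigmas):
    """Grid-search minimizer (a stand-in for a proper numerical optimizer)."""
    return min(
        ((mu, sigma) for mu in mus for sigma in sigmas),
        key=lambda p: l2e_criterion(p[0], p[1], data),
    )

# Hypothetical data: eight points near 0 plus two gross outliers.
data = [-1.0, -0.5, -0.2, 0.0, 0.1, 0.3, 0.6, 1.1, 10.0, 11.0]
mus = [i / 10 for i in range(-20, 121)]
sigmas = [i / 10 for i in range(2, 51)]

mu_hat, sigma_hat = l2e_fit(data, mus, sigmas)
sample_mean = sum(data) / len(data)
print("L2E fit:", mu_hat, sigma_hat, "sample mean:", sample_mean)
```

On data like these, the sample mean (the maximum likelihood estimate of the normal mean) is pulled toward the outliers, while the minimum-distance fit stays with the main cluster, which is the kind of robustness the abstract describes.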

--------------------------------------------------------------------------------

------------------------------------------------------------
TITLE: Inference for environmental effects based on family data taking into account ascertainment and random genetic effects
SPEAKER: Ruth Pfeiffer

National Cancer Institute Division of Cancer Epidemiology and Genetics

DATE: January 28, 2000
LOCATION: Funger Hall 323
TIME: 3:00 pm
-----------------------------------------------------------

Genes that underlie complex diseases can sometimes be detected by studying families with multiple cases. If common environmental factors are also present, they should be incorporated into these genetic analyses to ensure the studies have nominal statistical power. Our interest lies in quantifying the effects of environmental exposure on individual risk probabilities, given familial and unmeasured genetic effects. We propose a two-level mixed-effects model that allows us to incorporate a genetic component accounting for the different genetic correlations among family members and to adjust for ascertainment by conditioning on the number of cases in the family. Conditional maximum likelihood analysis based on this model is performed. For rare diseases, we develop a simple approximation to the model. We show that standard conditional logistic regression of case-control data with matching on family, which conditions on the number of cases in the family, can yield biased estimates of exposure effects if genetic correlations are ignored. Conditions under which the conditional logistic approach remains applicable are given.

--------------------------------------------------------------------------------

----------------------------------------------------
TITLE: Analyzing recurrent event data with informative censoring

SPEAKER: Mei-Cheng Wang
Department of Biostatistics
Johns Hopkins University

DATE: February 4, 2000

LOCATION: Funger 308
TIME: 11:00 am

----------------------------------------------------------------------

Recurrent event data are frequently encountered in longitudinal follow-up studies. The non-informative censoring assumption is usually required for the validity of statistical methods for analyzing recurrent event data. In many applications, however, censoring could be caused by informative drop-out or death, and it is unrealistic to assume independence between the recurrent event process and the censoring time. In this talk, we consider recurrent events of the same type and allow the censoring mechanism to be either informative or non-informative. A multiplicative intensity model which possesses desirable interpretations is used as the underlying model. Statistical methods are developed for (i) nonparametric estimation of the cumulative occurrence rate function, (ii) kernel estimation of the occurrence rate function, and (iii) semiparametric estimation of regression parameters. An analysis of the inpatient care data from the AIDS Link to Intravenous Experiences cohort (ALIVE) is presented.


--------------------------------------------------------------------------------

------------------------------------------------------------

TITLE: Two Approaches to Testing for Disease Clustering
SPEAKER: Marco Bonetti
Harvard School of Public Health and
Dana-Farber Cancer Institute, Boston, MA
DATE: February 17, 2000
LOCATION: Funger 321
TIME: 3:30pm

-------------------------------------------------------------
We address the issue of testing for general clustering when the
underlying population is not homogeneous. In particular, we discuss
two new approaches: (a) one based on a geometric transformation
called the "Minkowski polytope"; and (b) one based on the distribution
of the interpoint distances between pairs of observations.
We present some theoretical results, and some power simulations based
on a well-known leukemia data set. For the second method we show
how the location of the possible cluster(s) can be identified via
a decomposition of the test statistic.
------------------------------------------------------------------


--------------------------------------------------------------------------------


TITLE: Analysis of accelerated failure time Model, a useful
alternative to the Cox model.
SPEAKER: Zhezhen Jin
Dept. of Biostatistics
Harvard School of Public Health
DATE: February 25, 2000
LOCATION: Funger 307
TIME: 11:00 a.m.

----------------------------------------------------------------------
The accelerated failure time model is an appealing semi-parametric model
which is a useful alternative to the Cox proportional hazards model in
survival analysis. This model is a simple log-linear model. In this talk,
first we review some existing methods for inferences about the regression
coefficients. Almost all the methods in the literature are rather complex
numerically. I will propose a simple and reliable method to analyze
censored data under this model. The proposed method can be easily
implemented using linear programming techniques with existing software for
estimating the regression coefficients of the standard linear model based
on the $L_1$ norm. The new procedure is illustrated by analyzing an HIV-1
RNA data set from a study conducted by the AIDS Clinical Trials Group.
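
For orientation, the two ingredients named above can be written out as follows. The first display is the log-linear form of the accelerated failure time model; the second is the $L_1$ (least absolute deviations) criterion for the standard, uncensored linear model. The symbols $T_i$ (failure time), $x_i$ (covariates), $\beta$, $\varepsilon_i$ and $y_i$ are assumed notation, and the way the proposed method handles censoring is not shown here.

```latex
\log T_i = \beta^{\top} x_i + \varepsilon_i, \qquad i = 1, \dots, n,
\qquad\qquad
\hat{\beta}_{L_1} = \arg\min_{\beta} \sum_{i=1}^{n} \bigl| y_i - \beta^{\top} x_i \bigr| .
```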
------------------------------------------------------------------------


--------------------------------------------------------------------------------


TITLE: A General Estimation Method using Spacings
SPEAKER: Kaushik Ghosh
Division of Statistics
George Washington University
DATE: February 29, 2000
LOCATION: Funger 321
TIME: 3:30

---------------------------------------------------------------------------
For most parametric estimation problems, the Maximum Likelihood method is
justifiably popular because it possesses some very nice properties,
especially in large samples. It is, however, known to fail in some cases
since the likelihood function can become unbounded.

A very general method of estimation based on spacings, i.e., gaps between
successive ordered observations, is proposed as an alternative. The method
produces a class of estimators, which are shown to be consistent and
asymptotically normal. A special case of this is the Maximum Product of
Spacings method that has already been discussed in the literature. Results
of simulation studies investigating small-sample properties of the
proposed estimators are also presented.
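
A minimal sketch of the Maximum Product of Spacings special case mentioned above, for an exponential model with unknown rate. The spacings are the differences of the fitted CDF at successive ordered observations (with 0 and 1 added at the ends), and the estimate maximizes the sum of their logarithms. The data, the grid search, and the function names are hypothetical, and ties in the data are assumed absent; the general class of estimators from the talk is not reproduced.

```python
import math

def exp_cdf(x, rate):
    """CDF of the exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x)

def log_product_of_spacings(rate, data):
    """Sum of log spacings of the fitted CDF at the ordered observations."""
    ordered = sorted(data)
    # CDF values at the order statistics, padded with 0 and 1 at the ends.
    cdf_values = [0.0] + [exp_cdf(x, rate) for x in ordered] + [1.0]
    spacings = [b - a for a, b in zip(cdf_values, cdf_values[1:])]
    return sum(math.log(d) for d in spacings)

def mps_rate(data, rates):
    """Grid-search maximizer (a stand-in for a numerical optimizer)."""
    return max(rates, key=lambda r: log_product_of_spacings(r, data))

# Hypothetical sample with no ties.
data = [0.1, 0.4, 0.7, 1.2, 2.5]
rates = [i / 100 for i in range(5, 501)]

rate_hat = mps_rate(data, rates)
print("MPS rate estimate:", rate_hat)
```

Because every spacing lies strictly between 0 and 1, this objective stays bounded, unlike the likelihood in the problematic cases the abstract refers to.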
---------------------------------------------------------------------------


--------------------------------------------------------------------------------


TITLE: The Estimating Function Bootstrap
SPEAKER: Feifang Hu
Department of Statistics and Applied Probability
National University of Singapore
DATE: March 9, 2000
LOCATION: Staughton 301
TIME: 3:30

---------------------------------------------------------------------------
In this talk, we propose a general and simple bootstrap method based on
estimating functions. The proposed method, called the Estimating Function
(EF) Bootstrap, does not involve resampling the data. Instead, by
resampling estimated terms in the estimating function, it provides a way of
directly approximating the distribution of the estimating function. The EF
Bootstrap has four important advantages:
(i) It often has substantial computational advantage over more traditional
bootstrap methods.
(ii) It leads to a simple and natural studentized version that can be
applied with little additional computation.
(iii) The studentized version is functionally invariant under
reparametrization.
And (iv) the methods can be extended to apply to the estimation of vector
and nuisance parameters without further computational difficulty.

The EF bootstrap is compared by simulation with normal approximations and
other bootstrap methods in a number of examples including the common means
problem, linear and nonlinear regression, and nonparametric applications.
The approach performs at least as well as, and often much better than, the
other methods. Asymptotic results are also obtained to show that the
studentized versions of the EF Bootstrap yield higher order approximations
for the whole vector parameter in a wide class of problems.
---------------------------------------------------------------------------


--------------------------------------------------------------------------------


TITLE: Inference on Tree-Valued Random Variables
SPEAKER: David L. Banks
Bureau of Transportation Statistics, DOT
DATE: March 31, 2000
LOCATION: Funger 323
TIME: 3:00

--------------------------------------------------------------------------
This talk describes a model for inference on classification and clustering
trees, such as those that arise from CART analysis or from cluster
analysis. It synthesizes research reported in Shannon and Banks (1999)
and Banks and Constantine (1998). The approach applies to cases in which
one observes a random sample of independent trees, all generated from a
similar mechanism, and wants to infer the central tree. As an example,
suppose that emergency rooms in ten different hospitals use their
admissions data to classify patients into high-risk and low-risk
categories with respect to heart disease. Each hospital produces an
independent tree, and the medical statistician wishes to estimate the true
central tree that underlies these ten observations. This application
raises related issues of stability of the inferred tree and the impact of
multicollinearity.
--------------------------------------------------------------------------


--------------------------------------------------------------------------------


TITLE: An Optimizing Up-and-Down Design
SPEAKER: Nancy Flournoy
Department of Mathematics and Statistics
American University
DATE: April 7, 2000
LOCATION: Funger 322
TIME: 11:00am

--------------------------------------------------------------------------
I will present a treatment allocation procedure that is motivated by
Kiefer-Wolfowitz's stochastic approximation procedure. However, we take
responses to be binary, the possible treatment space to be a lattice, and
the increment between consecutive "doses" to be +/- 1 or 0. Let
P{success} be a unimodal function of dose. The Optimizing Up-and-Down
procedure allocates treatments to pairs of subjects in a way that causes
the treatment distribution to cluster around the treatment with maximum
success probability. The procedure defines a random walk, so well-known
theory is used to explicitly characterize the treatment distribution.
As an estimator of the best dose, the mode of the empirical treatment
distribution converges much faster than does the common estimator using
stochastic approximation.
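
The simulation below is a sketch of a procedure of this kind, not the speaker's exact rule. It assumes that each pair is treated at the doses just below and just above the current lattice position, and that the position moves up when only the higher dose succeeds, down when only the lower dose succeeds, and stays put otherwise. The success curve `success_prob`, the lattice bounds, and the seed are hypothetical.

```python
import random
from collections import Counter

def success_prob(dose):
    """Hypothetical unimodal success probability, peaking at dose 6."""
    return 0.9 - 0.02 * (dose - 6) ** 2

def optimizing_up_and_down(n_pairs, start=2, low=1, high=9, seed=0):
    """Simulate the assumed pairwise rule on the integer lattice low..high."""
    rng = random.Random(seed)
    x = start
    visits = Counter()
    for _ in range(n_pairs):
        visits[x] += 1
        # One subject at the dose below x, one at the dose above x.
        lower_success = rng.random() < success_prob(x - 1)
        upper_success = rng.random() < success_prob(x + 1)
        if upper_success and not lower_success:
            x = min(x + 1, high)   # evidence favors the higher dose
        elif lower_success and not upper_success:
            x = max(x - 1, low)    # evidence favors the lower dose
        # Concordant outcomes: stay at the current dose (increment 0).
    return visits

visits = optimizing_up_and_down(20000)
mode_dose = visits.most_common(1)[0][0]
print("Empirical mode of the treatment distribution:", mode_dose)
```

Under this rule the drift at position x has the sign of the difference between the success probabilities at x + 1 and x - 1, so the walk concentrates near the dose with maximum success probability, and the empirical mode serves as the estimator of the best dose.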
--------------------------------------------------------------------------


--------------------------------------------------------------------------------


TITLE: Mixed-Effects Multivariate Adaptive Splines Models: An
Automated Procedure for Fitting Messy Longitudinal Data
and Growth Curves
SPEAKER: Heping Zhang
Division of Biostatistics
Yale University
DATE: April 19, 2000
LOCATION: Staughton Hall 301
TIME: 3:30pm

--------------------------------------------------------------------------
A mixed-effects multivariate adaptive splines model is proposed to analyze
longitudinal or growth curves data that may or may not have been collected
through a regular measurement schedule. The MASAL (an acronym for
multivariate adaptive splines for the analysis of longitudinal data)
algorithm by Zhang (1994, 1997, 1999) is used to determine the nonparametric
fixed-effects in the mixed-effects multivariate adaptive splines model. The
original MASAL algorithm requires the characterization and specification of
the within subject autocorrelation structure, which is usually a tedious
and not always rewarding process. In contrast, the idea of mixed effects
is introduced to the MASAL algorithm in this work, leading to an automated
procedure for analysis of longitudinal and growth curves data. To
demonstrate the great potential
of this new procedure, I re-analyzed a data set on the effect of cocaine use
by pregnant women on the growth of their infants after birth. The numerical
results are remarkable compared with a previously published analysis by
Zhang (1999) in terms of the dissection of random effects.
--------------------------------------------------------------------------


--------------------------------------------------------------------------------


TITLE: MULTIVARIATE NONPARAMETRIC CONTROL CHARTS USING SMALL
SAMPLES
SPEAKER: Aleka Kapatou
Department of Statistics
The George Washington University
DATE: April 21, 2000
LOCATION: Funger 322
TIME: 11:00am

--------------------------------------------------------------------------
A multivariate control chart can be used to simultaneously monitor the means
of two or more correlated variables of a process. A typical parametric chart
to monitor a process would involve the assumption that the data follow a
multivariate normal distribution. If this assumption cannot be made, a
multivariate control chart based on classical nonparametric statistics could
be used. The nonparametric control charts that we propose are based on sign
or signed rank statistics. Past sample information for each variable is
retained through an exponentially weighted moving average statistic (EWMA)
in order to increase the sensitivity of the charts to detect small shifts
from the target. It is assumed that the target values for the means and
certain correlations for the variables are either known or can be estimated
well. The properties of the nonparametric charts are evaluated using
simulation. The proposed charts are compared with the multivariate EWMA
(MEWMA) chart which is based on the sample means.
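
To make the building blocks concrete, here is a univariate sketch: a sign statistic computed for each small sample relative to the target, then smoothed with an exponentially weighted moving average. The smoothing weight `lam`, the data, and the function names are hypothetical. The multivariate combination across correlated variables and the control limits used in the proposed charts are not reproduced.

```python
def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

def sign_statistic(sample, target):
    """Number of observations above the target minus the number below it."""
    return sum(sign(x - target) for x in sample)

def ewma(values, lam=0.5, start=0.0):
    """Exponentially weighted moving average: z_t = lam * v_t + (1 - lam) * z_{t-1}."""
    z = start
    smoothed = []
    for v in values:
        z = lam * v + (1.0 - lam) * z
        smoothed.append(z)
    return smoothed

# Hypothetical samples of size 4 for one variable; the last two have shifted upward.
target = 10.0
samples = [
    [9.8, 10.1, 10.3, 9.9],
    [10.2, 9.7, 10.1, 9.9],
    [10.4, 10.6, 10.2, 10.5],
    [10.3, 10.7, 10.4, 10.6],
]

signs = [sign_statistic(s, target) for s in samples]
chart = ewma(signs, lam=0.5)
print(signs, chart)
```

The moving average carries information from earlier samples forward, which is how the charts gain sensitivity to small, sustained shifts from the target.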
--------------------------------------------------------------------------


--------------------------------------------------------------------------------


TITLE: Weighing the Risks and Benefits of Tamoxifen to Prevent
Breast Cancer
SPEAKER: Mitchell Gail
Chief, Biostatistics Branch
Division of Cancer Epidemiology and Genetics
National Cancer Institute
DATE: April 28, 2000
LOCATION: Funger 322
TIME: 11:00am

--------------------------------------------------------------------------
The Breast Cancer Prevention Trial (BCPT) recently demonstrated a
reduction of nearly 50% in the risk of breast cancer in women who received
20 mg tamoxifen daily for about 4 years, compared to women who received a
placebo. The risks of hip, Colles' and spine fractures were also
reduced. Unfortunately, tamoxifen was associated with increased risks of
endometrial cancer, stroke, pulmonary embolus, deep vein thrombosis and
cataract. The National Cancer Institute sponsored a workshop in July 1998
to review the information on risks in the presence and absence of
tamoxifen and to develop methods to weigh the risks and benefits of
tamoxifen and to convey this information to women who were seeking advice
as to its use. The risks and benefits of tamoxifen vary by age and race
and by the initial risk of breast cancer. We shall review the
information on risks and benefits, methods used to summarize this
information and identify classes of women likely to have a net benefit,
and methods for conveying this information to women seeking advice and
information.

This is joint work with Joseph Costantino, John Bryant, Robert Croyle,
Lawrence Freedman, Kathy Helzlsouer, Victor Vogel and other Workshop
Participants.
--------------------------------------------------------------------------


--------------------------------------------------------------------------------


TITLE: Analysis of an algorithm for finding order statistics via
analytic probability
SPEAKER: Hosam Mahmoud
Department of Statistics
The George Washington University
DATE: May 5, 2000
LOCATION: Funger 321
TIME: 11:00
--------------------------------------------------------------------------
Certain analytic techniques have proved effective in the study of averages
for random discrete structures and algorithms. Typically one considers a
generating function for the sequence of averages. One then sets up a
functional equation (usually arising from a recurrence) and solves it
asymptotically by a tool-kit that involves a variety of integral
transforms. The behavior of the generating function near its dominant
singularities captures the asymptotic nature of the averages. (Typically,
non-dominant singularities contribute periodic oscillations.)
To address distributions rather than only averages,
these techniques have been extended to bivariate generating functions. One still
considers dominant singularities, which are now functions of a second
variable. The analysis is most informative in the neighborhood of certain
values for this second variable. The analysis therefore is viewed as a
perturbation.
We illustrate this route by a problem that arises in sorting algorithms.
To find the limit distribution of a sum of dependent random variables, the
analysis involves perturbation of Rice's integration method, which will be
explained in a tutorial introduction.
--------------------------------------------------------------------------


--------------------------------------------------------------------------------


TITLE: The Role of Statistics in the Mathematical Sciences at the
NSF
SPEAKER: James Rosenberger
Program Director, Statistics and Probability
NSF
DATE: May 12, 2000
LOCATION: Funger 322
TIME: 11:00 am
--------------------------------------------------------------------------
This talk will provide an inside view of how the National Science
Foundation operates, from the perspective of a "rotating" program director.
In particular, I will describe how Statistics is funded through the
Foundation, directly from the Statistics and Probability program in the
Division of Mathematical Sciences and from the Methodology, Measurement and
Statistics (MMS) program through the Social, Behavioral and Economic
Directorate. Joint funding between programs also provides a mechanism for
funding collaborative research. I will also mention and describe special
programs such as REU, IGMS, MSPRF, CAREER, VIGRE, and FRG. Finally,
cross-disciplinary programs this year which attracted intense interest in
the research community were Information Technology Research (ITR) and
Biocomplexity. Our role in these special activities may offer the biggest
opportunities and challenges in the near future for the statistics
community.

 




--------------------------------------------------------------------------------
The contact person is Reza Modarres at Reza@gwu.edu or 202-994-6359.

 

 
 
 
   