------------------------------------------------------------
Title: Model Selection via Out-of-Sample Forecast Error
Comparisons: Practice and Theory
SPEAKER: Dr. David F. Findley
Bureau of the Census
Statistical Research Division
DATE: January 26, 2001
LOCATION: Funger Hall 308
TIME: 11:00 a.m.
------------------------------------------------------------
Many monthly economic time series can be effectively
modeled with autoregressive moving average models that
include regressors for the effects of moving holidays
and/or other calendar effects or outliers. Such models
are fit to some Box-Cox transformation of the observed
data. The comparison of two competing models with different
transformations, different holiday interval lengths,
different outliers, or with autoregressive versus moving
average components, is a non-nested comparison: neither
model is a special case of the other. There are no practical
hypothesis tests for such comparisons. In this situation,
a natural approach is to compare the models' abilities
to forecast the most recent data after excluding these
data from the span used to estimate model parameters.
We will show empirical results demonstrating the versatility
of a forecast comparison diagnostic that implements
this idea and is available in the Census Bureau's X-12-ARIMA
and X-12-Graph software. Then we will present a new
theoretical result that helps account for some of the diagnostic's
observed behavior and does not require the assumption
that any model considered is correct.
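A minimal sketch of the out-of-sample comparison idea follows. It is in the spirit of the diagnostic described above but is not the X-12-ARIMA implementation. It assumes two deliberately simple stand-in forecasters (a random-walk forecast and the mean of the past data) in place of regARIMA models, and all function names are hypothetical. For each forecast origin t, each model sees only y[0..t-1] and forecasts y[t]. The running sum of differences of squared errors shows which model forecasts the held-out data better and when that advantage appears.

```python
from typing import Callable, List, Sequence

Forecaster = Callable[[Sequence[float]], float]

def naive_forecast(history: Sequence[float]) -> float:
    # Hypothetical model A: random-walk forecast (last observed value)
    return history[-1]

def mean_forecast(history: Sequence[float]) -> float:
    # Hypothetical model B: mean of all data seen so far
    return sum(history) / len(history)

def accumulated_sq_error_diff(y: Sequence[float], model_a: Forecaster,
                              model_b: Forecaster, start: int) -> List[float]:
    """Running sum of e_A(t)^2 - e_B(t)^2 over forecast origins t >= start.

    Each forecast of y[t] uses only y[:t], so the value being forecast is
    excluded from the span used to 'estimate' the model. Negative values
    favor model A; positive values favor model B.
    """
    total = 0.0
    path = []
    for t in range(start, len(y)):
        history = y[:t]
        err_a = y[t] - model_a(history)
        err_b = y[t] - model_b(history)
        total += err_a ** 2 - err_b ** 2
        path.append(total)
    return path

if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(accumulated_sq_error_diff(series, naive_forecast, mean_forecast, start=2))
    # -> [-1.25, -4.25, -9.5, -17.5]
```

On this steadily rising toy series the random-walk forecast wins at every origin, so the accumulated path drifts steadily downward. For real models, the shape of the path, and not just its end point, indicates whether one model's advantage is persistent or driven by a few observations.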
------------------------------------------------------------
Title: Recursive Estimation for Misspecified MA(1) Models
SPEAKER: Mr. Jim Cantor
Department of Statistics
DATE: February 9, 2001
LOCATION: Funger Hall 308
TIME: 11:00 a.m.
------------------------------------------------------------
In this talk, new results for recursive parameter estimation
for misspecified models are presented. "Recursive"
means that the parameter estimate for the series at
time (and length) t is obtained as a function of the
parameter estimate at t-1 and the data value at time
t. "Misspecified" means that the model type
fitted does not match the system generating the data.
Specifically, we investigate the situation in which
a first order moving average model, MA(1), is fit to
data from a stationary first order autoregression, AR(1),
using two standard recursive estimation methods: pseudolinear
regression (PLR) and (monitored) recursive maximum likelihood
(RML). Using a minimum mean squared one-step-ahead forecast
error criterion, we show that PLR converges almost surely,
but to a non-optimal parameter value. For monitored
RML, we show that the optimal parameter value is a cluster
point for the recursion almost surely. These results
represent the first rigorous analysis of PLR and monitored
RML in the misspecified model situation.
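The sketch below makes the recursion concrete. It implements pseudolinear regression (also called extended least squares) for the MA(1) model y_t = e_t + c e_{t-1}, under conventions of our own: the unobserved e_{t-1} is replaced by the previous residual, and the gain is a running sum of squared regressors. The function names are hypothetical. The clipping of the estimate is a practical safeguard added here, and the RML monitoring step is not shown. No claim is made about the limit value for AR(1) data. The code only lets one run the estimator on correctly specified MA(1) data or on AR(1) data.

```python
import random
from typing import List

def simulate_ma1(c: float, n: int, seed: int = 0) -> List[float]:
    """y_t = e_t + c * e_{t-1} with i.i.d. N(0, 1) innovations."""
    rng = random.Random(seed)
    e_prev = rng.gauss(0.0, 1.0)
    y = []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        y.append(e + c * e_prev)
        e_prev = e
    return y

def simulate_ar1(phi: float, n: int, seed: int = 0) -> List[float]:
    """Stationary y_t = phi * y_{t-1} + e_t (|phi| < 1), N(0, 1) innovations."""
    rng = random.Random(seed)
    y_prev = rng.gauss(0.0, 1.0) / (1.0 - phi ** 2) ** 0.5
    y = []
    for _ in range(n):
        y_prev = phi * y_prev + rng.gauss(0.0, 1.0)
        y.append(y_prev)
    return y

def plr_ma1(y: List[float], c0: float = 0.0, r0: float = 1.0,
            bound: float = 0.99) -> List[float]:
    """Pseudolinear regression estimates of c for an MA(1) fit, one per time t.

    The state (estimate, gain, last residual) at time t is computed from the
    state at t-1 and the new observation y_t only. Clipping to
    [-bound, bound] is a simple stability safeguard added for this sketch.
    """
    c, r = c0, r0
    resid_prev = 0.0                      # residual standing in for e_{t-1}
    path = []
    for yt in y:
        phi = resid_prev                  # pseudo-regressor
        pred_err = yt - c * phi           # one-step-ahead prediction error
        r += phi * phi                    # accumulated squared regressors
        c += phi * pred_err / r           # least-squares-type update
        c = max(-bound, min(bound, c))    # keep the estimate invertible
        resid_prev = yt - c * phi         # a posteriori residual
        path.append(c)
    return path

if __name__ == "__main__":
    print(plr_ma1(simulate_ma1(0.5, 20000))[-1])  # correctly specified case
    print(plr_ma1(simulate_ar1(0.5, 20000))[-1])  # MA(1) fitted to AR(1) data
```

In the correctly specified case the estimate should settle near the true c. Comparing the second run's limit with the value that minimizes the one-step-ahead mean squared forecast error is exactly the question the talk addresses.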
------------------------------------------------------------
Title: Random Walks on Wreath Products of Groups
Speaker: Professor Clyde Schoolfield
Harvard University
DATE: February 16, 2001
LOCATION: Funger Hall 321
TIME: 11:00 a.m.
------------------------------------------------------------
For a certain random walk on the symmetric group S_n
that is generated by random transpositions, Diaconis
and Shahshahani (1981) obtained bounds on the rate of
convergence to uniformity using group representation
theory. Similarly, we bound the rate of convergence
to uniformity for a random walk on the hyperoctahedral
group Z_2 ≀ S_n that is generated by random signed transpositions.
Specifically, we determine that, to first order in n,
(1/2) n log n steps are both necessary and sufficient
for the total variation distance to become small. Moreover,
we show that our walk exhibits the so-called "cutoff
phenomenon." We extend our results on this random walk
to the generalized symmetric groups Z_m ≀ S_n and further
to the complete monomial groups G ≀ S_n for any finite
group G. As an example, we will describe an application
of our results to mathematical biology.
------------------------------------------------------------
Title: Options and Discontinuity: An Asymptotic Decomposition
for Trading Algorithm
Speaker: Seongjoo Song
Department of Statistics, University of Chicago
Date: February 23, 2001
Location: Funger Hall 321
Time: 11:00 a.m.
------------------------------------------------------------
The problem of hedging contingent claims is well understood
in a complete financial market. In such a market, any
contingent claim can be replicated exactly by trading
available securities with large enough initial capital.
On the other hand, when the market is incomplete, the risk
of an option generally cannot be hedged away completely.
There are many different causes of incompleteness. Among
them, discontinuity of the underlying asset price process
is a very important cause.
This is because discontinuous models fit the data
better than continuous ones, and in particular
because they incorporate such very real phenomena as
crashes and devaluations, which can upset any trading
strategy. This paper studies the problem of option pricing
and hedging in the presence of such discontinuities
by adopting an asymptotic approach, letting security
prices converge to continuous processes. We then study
the first order error in this convergence. The first
order error term after we hedge an option with the classical
Black-Scholes strategy is decomposed into a part which
can be traded away and a part which is purely unreplicable.
First, I modify the Black-Scholes hedging strategy by
adding the replicable part of the first order error;
second, I adopt the mean-variance hedging method
of Duffie and Richardson (1991) and Schweizer (1992) for
the nonreplicable part. Under some regularity conditions,
a closed-form solution is obtained for the hedging
strategy that minimizes the mean square of the hedging
error. I also propose several approaches to pricing
a contingent claim and compare their performance.
In addition to continuous-time hedging, I also study,
in this setting, the properties of hedging at discrete
intervals as the length of those intervals goes to
zero. Results from simulations and from an application
to real market data are also provided. In simulations,
the new hedging strategy improves on the classical
Black-Scholes hedging strategy by up to 30% in terms of
the mean square of the hedging error when the distribution
of the log stock price is skewed.
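For reference, here is a minimal sketch of the classical Black-Scholes delta hedge that serves as the baseline above. The hedge is rebalanced at equal discrete intervals along one simulated geometric Brownian motion path. Unlike the talk's setting, this price process has no jumps, and the path is simulated with drift r for simplicity. The function names and parameter values are illustrative assumptions. The decomposition of the first order error and the mean-variance correction are not implemented.

```python
import math
import random

def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s: float, k: float, r: float, sigma: float, tau: float):
    """Black-Scholes European call price and delta, time to expiry tau > 0."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    price = s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)
    return price, norm_cdf(d1)

def delta_hedge_error(s0: float, k: float, r: float, sigma: float, t: float,
                      n_steps: int, seed: int = 0) -> float:
    """Terminal error of a call replicated with BS deltas at n_steps dates.

    Starts from the BS premium, holds delta shares, keeps the remainder in
    a money-market account at rate r, and rebalances at equal intervals.
    Returns hedge portfolio value minus option payoff at expiry.
    """
    rng = random.Random(seed)
    dt = t / n_steps
    s = s0
    price, delta = bs_call(s, k, r, sigma, t)
    cash = price - delta * s
    for i in range(1, n_steps + 1):
        z = rng.gauss(0.0, 1.0)
        # GBM step with the same volatility the hedger assumes
        s *= math.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        cash *= math.exp(r * dt)
        if i < n_steps:
            _, new_delta = bs_call(s, k, r, sigma, t - i * dt)
            cash -= (new_delta - delta) * s   # self-financing rebalance
            delta = new_delta
    payoff = max(s - k, 0.0)
    return delta * s + cash - payoff

if __name__ == "__main__":
    for steps in (10, 100, 1000):
        print(steps, delta_hedge_error(100.0, 100.0, 0.0, 0.2, 1.0, steps))
```

Repeating the run over many seeds and fewer rebalancing dates typically widens the spread of the hedging error. That discrete-interval effect is what the talk analyzes asymptotically, there in the presence of jumps.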
------------------------------------------------------------
Title: A method of moments for random recursive structures
Speaker: Professor Hsien-Kuei Hwang
Academia Sinica
DATE: March 2, 2001
LOCATION: Funger Hall 308
TIME: 11:00 a.m.
------------------------------------------------------------
I will present a method of moments that is very useful
for random variables defined recursively.
Many examples, including m-ary search trees and quickselect,
will be used to describe the method. (The method of
moments is a "traditional" way of deriving
limit laws; its application, although primitive by modern
probability standards, has several advantages, especially
when applied to recursively defined random variables.)
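As a small instance of such recursive structures, the sketch below computes the first two moments of X_n, the number of key comparisons made by quicksort. Equivalently, X_n is the internal path length of a random binary search tree, the m = 2 case of m-ary search trees. The moments come directly from the distributional recursion X_n = X_U + X*_{n-1-U} + (n - 1), where U is uniform on {0, ..., n-1} and X, X* are independent copies. Exact rational arithmetic lets the results be checked against the known closed forms. This illustrates only the moment-recursion step. The talk's passage from moment asymptotics to limit laws is not reproduced, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

def quicksort_moments(n_max: int) -> Tuple[List[Fraction], List[Fraction]]:
    """Exact E[X_n] and E[X_n^2] for n = 0..n_max, with X_0 = X_1 = 0 and
    X_n = X_U + X*_{n-1-U} + (n - 1), U ~ Uniform{0, ..., n-1}."""
    m1 = [Fraction(0)] * (n_max + 1)
    m2 = [Fraction(0)] * (n_max + 1)
    for n in range(2, n_max + 1):
        toll = n - 1                       # comparisons against the pivot
        s1 = Fraction(0)
        s2 = Fraction(0)
        for k in range(n):                 # pivot rank gives subproblem sizes k, j
            j = n - 1 - k
            # first moment: linearity of expectation
            s1 += toll + m1[k] + m1[j]
            # second moment: expand the square, use independence of subproblems
            s2 += (toll ** 2 + m2[k] + m2[j]
                   + 2 * toll * (m1[k] + m1[j]) + 2 * m1[k] * m1[j])
        m1[n] = s1 / n
        m2[n] = s2 / n
    return m1, m2

def harmonic(n: int, power: int = 1) -> Fraction:
    """Generalized harmonic number H_n^(power)."""
    return sum((Fraction(1, i ** power) for i in range(1, n + 1)), Fraction(0))

if __name__ == "__main__":
    m1, m2 = quicksort_moments(10)
    print(m1[10], m2[10] - m1[10] ** 2)   # exact mean and variance for n = 10
```

The mean agrees with 2(n + 1)H_n - 4n. The variance agrees with 7n^2 - 4(n + 1)^2 H_n^(2) - 2(n + 1)H_n + 13n, which grows like (7 - 2π^2/3) n^2. Moment asymptotics of this kind are the raw material from which a method of moments derives limit laws.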
------------------------------------------------------------
Title: Recent Advances in Ranked Set Sampling
Speaker: Professor Ram Tiwari
Department of Mathematics
DATE: March 9, 2001
LOCATION: Funger Hall 308
TIME: 11:00 a.m.
------------------------------------------------------------
The ranked set sampling procedure is a two-step sampling
scheme in which a subgroup of independently sampled
items is collected and ranked, but only one item from
the subgroup is chosen for complete measurement. The
item’s rank within the subgroup is noted, so the
final sample consists of independent order statistics.
If the subgroup sizes are identical (say n) and each
of the n different order statistics is sampled in equal
proportion, the ranked set sample is said to be balanced.
The talk consists of two parts. In the first part, we
assume the underlying population belongs to a
location-scale family of symmetric distributions
and derive unbiased estimators of the population mean
and variance. In the second part, we assume that the
underlying distribution is unknown and modeled nonparametrically,
and derive its Bayes estimator with respect to an ordered
Dirichlet prior.
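To make the sampling scheme concrete, the sketch below draws a balanced ranked set sample under perfect ranking. Perfect ranking is an assumption; in practice ranking is often done by judgment and can be imperfect. With set size n and m cycles, each cycle draws n independent subgroups of n items and fully measures only the i-th smallest item of the i-th subgroup. The final sample therefore holds m independent copies of each order statistic. Its plain mean is a standard unbiased estimator of the population mean. The location-scale and Bayes estimators from the talk are not implemented, and the names are hypothetical.

```python
import random
import statistics
from typing import Callable, List

def balanced_rss(draw: Callable[[], float], set_size: int, cycles: int) -> List[float]:
    """Balanced ranked set sample of size set_size * cycles (perfect ranking).

    draw() returns one independent observation from the population.
    """
    sample = []
    for _ in range(cycles):
        for i in range(set_size):
            subgroup = sorted(draw() for _ in range(set_size))  # rank the subgroup
            sample.append(subgroup[i])  # measure only the i-th order statistic
    return sample

if __name__ == "__main__":
    rng = random.Random(1)
    rss = balanced_rss(lambda: rng.gauss(10.0, 2.0), set_size=3, cycles=2000)
    print(len(rss), statistics.mean(rss))
```

Only set_size * cycles items are fully measured, although set_size times as many are ranked. This is the source of the design's efficiency when ranking is cheap and measurement is expensive.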
------------------------------------------------------------
Title: Classifying Tumors and Assessing the Survival
of Tumor Patients using Microarray Gene Expression Data
Speaker: Danh Nguyen
University of California at Davis
DATE: March 14, 2001
LOCATION: Funger Hall 321
TIME: 11:00 a.m.
------------------------------------------------------------
The introduction of DNA microarray technology is a
technical advance in biomedical research. Specifically,
the use of microarray technology, such as complementary
DNA (cDNA) and oligonucleotide arrays, allows simultaneous
monitoring of the expression of thousands of genes per sample.
Data from microarray experiments present a data-analytic
or methodological challenge, since the number of variables
(genes) far exceeds the number of samples. In this talk,
we explore the use of dimension reduction methods in
conjunction with classification methods for classifying
tumor types based on array gene expression data. The
primary dimension reduction methods considered are partial
least squares (PLS) and principal components analysis
(PCA). When survival times of patients are tracked, it
is also of interest to estimate the survival probabilities
of patients with particular gene expression patterns
(profiles). We illustrate the methods on several microarray
gene expression data sets: (1) ovarian, (2) acute leukemia,
(3) B-cell lymphoma, and (4) colon data sets.
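Dimension reduction is convenient here partly because, with p genes and n samples where p is much larger than n, principal components can be computed from the small n-by-n Gram matrix of centered samples. The p-by-p covariance matrix is never needed. The pure-Python sketch below assumes that only the first principal component is kept and uses a nearest-centroid rule as a stand-in classifier. The PLS variant and the talk's specific classifiers are not reproduced, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence

def first_pc_scores(x: List[List[float]], iters: int = 100, seed: int = 0) -> List[float]:
    """Scores of the n samples (rows) on the first principal component.

    Uses the n x n Gram matrix K = Xc Xc^T, which stays small when genes
    (columns) far outnumber samples: if K u = lam u with |u| = 1, the
    first-PC scores are sqrt(lam) * u, up to sign. Assumes the samples
    are not all identical (otherwise K = 0).
    """
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    xc = [[row[j] - means[j] for j in range(p)] for row in x]   # center each gene
    k = [[sum(a * b for a, b in zip(xc[i], xc[m])) for m in range(n)] for i in range(n)]
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    for _ in range(iters):              # power iteration for K's leading eigenvector
        v = [sum(k[i][m] * u[m] for m in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in v))
        u = [t / norm for t in v]
    lam = sum(u[i] * sum(k[i][m] * u[m] for m in range(n)) for i in range(n))
    return [math.sqrt(max(lam, 0.0)) * t for t in u]

def nearest_centroid(train_scores: Sequence[float], labels: List[str], score: float) -> str:
    """Assign a 1-D score to the class whose mean training score is closest."""
    classes = sorted(set(labels))
    centroids = {c: sum(s for s, lab in zip(train_scores, labels) if lab == c)
                    / labels.count(c) for c in classes}
    return min(classes, key=lambda c: abs(score - centroids[c]))

if __name__ == "__main__":
    # Toy "arrays": 4 samples x 100 genes, two tumor types
    x = [[5.0] * 50 + [0.0] * 50, [5.0] * 50 + [0.0] * 50,
         [0.0] * 50 + [5.0] * 50, [0.0] * 50 + [5.0] * 50]
    scores = first_pc_scores(x)
    print(scores, nearest_centroid(scores, ["A", "A", "B", "B"], scores[0]))
```

Computing eigenvectors of an n-by-n matrix rather than a p-by-p one is what keeps PCA practical when p runs into the thousands. In a real analysis the classifier would be evaluated on held-out samples rather than on the training scores.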
------------------------------------------------------------
Title: Computational Sequence Analysis: Genome and Statistical
Controversies
Speaker: Professor Pranab K. Sen
Department of Biostatistics and Statistics
DATE: March 16, 2001
LOCATION: Funger Hall 308
TIME: 11:00 a.m.
------------------------------------------------------------
Computational biology is an interdisciplinary field;
principles of molecular genetics govern computational
sequence analysis. For human genome sequences, we encounter
some nonstandard statistical models in which high-dimensional
categorical data models crop up, often without perceptible
quantitative undercurrents. As such, conventional (continuous
or discrete) multivariate analysis may encounter computational
as well as conceptual difficulties. Limitations of (conditional-,
partial-, profile-, pseudo-, and quasi-) likelihoods
are appraised in this context; without an acceptable
topology that defines neighborhoods, for statistical
modeling and analysis, there might not be enough incentive
to pursue a parametric (likelihood) approach. Bayesian
perspectives fare better, though there may be some concern
from validity and robustness considerations. Alternatives
that take into account underlying biological implications
to a greater (and parametrics to a lesser) extent are
appraised and advocated on a case by case basis.
------------------------------------------------------------
Title: The Statistics Department and The Biostatistics
Center: Past Collaborations and Discussion of Future
Possibilities
Speaker: Dr. Sarah Fowler
Department of Statistics and Biostatistics Center of
the George Washington University
DATE: March 30, 2001
LOCATION: Funger Hall 308
TIME: 11:00 a.m.
-----------------------------------------------------------
The purpose of the presentation is to give a "social
history" of collaborations between faculty of the Biostatistics
Center (BSC) and regular faculty of the Department
of Statistics (DOS) and to stimulate discussion about
how to foster future collaborations. Topics to be covered
include: a description of the current BSC, its research
projects and activities; the grant development and review
process; and a chronology of BSC administration, teaching
and collaboration with DOS faculty from 1972 to the
present. In particular, the presentation will describe
the nature and productivity of seven statistical methods
grants, and how the involvement of DOS regular faculty
and doctoral students in BSC research projects has
led to collaborations on statistical theory and methods
applicable to clinical trials.
------------------------------------------------------------
Title: Survey Sampling Methodology and Analysis of Complex
Survey Data
Speaker: Dr. Leyla Mohadjer, Westat Corporation
DATE: April 20, 2001
LOCATION: Funger Hall 308
TIME: 11:00 a.m.
------------------------------------------------------------
This talk will provide a general overview of recent
developments in survey research and methodology with
more emphasis on the specific areas I have worked on
recently. Most of the research and methodology in survey
sampling was developed in the twentieth century. By the
1970s, major surveys were being undertaken to meet
the needs of statistical agencies and researchers. The
field has expanded at a very rapid pace in the past
thirty years, especially as researchers and government
agencies have learned the value of surveys in
achieving their goals. The main methods of sample design
(stratification, clustering, multistage sampling, etc.)
were described in textbooks published in the 1950s.
Recent developments include refinements and extensions
of these methods. The focus of this research is to derive
efficient sample designs and data collection procedures.
The talk will include a number of examples of recent
developments in survey sampling methodology.
One of the challenges facing survey practitioners is
survey nonresponse. There has been increasing concern
that nonresponse rates have been rising. That is, over
time, it has become more difficult to obtain cooperation.
Thus greater efforts are made in the field to increase
response rates. A sizable number of experiments have been
conducted to test various approaches to improving response rates.
The talk will include descriptions of the results of
a couple of experiments conducted to improve survey
response rates. For data analysis, most standard techniques
used in statistical packages assume that observations
are independent and drawn using simple random sampling,
and that all sampled cases have participated in the
survey. From these assumptions, classical statistical
theory has developed a wide variety of estimators that
are valid under these conditions. These requirements
are often not met in sample surveys, since it is usually
more cost-effective to select samples through a complex multi-stage
design (e.g., involving stratification, clustering of
units, and the use of several stages of selection) than
through simple random sampling. Once a sample departs
from simple random sampling, and in the presence of
nonresponse, however, new computational procedures are
required in order to take into account the impact of
survey design and nonresponse on statistical estimation.
The talk will include descriptions of a number of statistical
software packages currently available for analysis of
data from complex surveys.
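The following small illustration shows why the unweighted formulas mentioned above break down. Under unequal selection probabilities and nonresponse, one common approach (among several) starts from base weights equal to inverse selection probabilities. It then inflates respondents' weights within weighting classes so that they also represent the nonrespondents in their class, and finally computes a weighted mean. The talk's own procedures and the software packages it surveys are not reproduced. The record layout and function names are hypothetical, and design-based variance estimation is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Case:
    prob: float            # probability of selection under the design
    wclass: str            # nonresponse weighting class (e.g., urban/rural)
    y: Optional[float]     # survey response; None for a nonrespondent

def adjusted_weights(cases: List[Case]) -> List[float]:
    """Base weight 1/prob, then a weighting-class nonresponse adjustment.

    Within each class, respondents' weights are multiplied by
    (sum of base weights of all sampled cases) / (sum over respondents).
    Nonrespondents get weight 0. Classes with no respondents would need
    to be collapsed with other classes; that is not handled here.
    """
    total = defaultdict(float)
    resp = defaultdict(float)
    for c in cases:
        w = 1.0 / c.prob
        total[c.wclass] += w
        if c.y is not None:
            resp[c.wclass] += w
    weights = []
    for c in cases:
        if c.y is None:
            weights.append(0.0)
        else:
            weights.append((1.0 / c.prob) * total[c.wclass] / resp[c.wclass])
    return weights

def weighted_mean(cases: List[Case]) -> float:
    """Weighted (ratio) estimate of the population mean of y."""
    w = adjusted_weights(cases)
    num = sum(wi * c.y for wi, c in zip(w, cases) if c.y is not None)
    return num / sum(w)

if __name__ == "__main__":
    sample = [
        Case(0.1, "urban", 10.0), Case(0.1, "urban", None),
        Case(0.5, "rural", 2.0), Case(0.5, "rural", 4.0),
    ]
    print(weighted_mean(sample))
```

In this toy sample the weighted estimate is 212/24 (about 8.83), while the unweighted respondent mean is 16/3 (about 5.33). The gap comes entirely from the heavily weighted urban respondent, who stands in both for unsampled urban units and for the urban nonrespondent.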
-------------------------------------------------------------
Title: Middle-censoring and Applications
Speaker: Professor S. Rao Jammalamadaka
Department of Statistics, University of California at
Santa Barbara
DATE: April 18, 2001
LOCATION: Funger Hall 308
TIME: 11:00 a.m.
------------------------------------------------------------
In connection with survival analysis, there is considerable
literature which treats data that is censored from the
left, right or both. In this talk, we consider situations
where the data becomes unobservable if it falls inside
a random interval in the middle. This happens in clinical
trials and lifetime studies where a subject is temporarily
absent or withdrawn from the study and the event of
interest occurs during this period, so that the exact
time of occurrence cannot be observed. Both left and
right censoring are special cases of such "middle-censoring."
The nonparametric maximum likelihood estimator of the
survival function is derived in this context as a solution
to the self-consistency equation, and its large-sample
properties are discussed.
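The sketch below shows a self-consistency iteration for data that mix exact event times with middle-censored intervals. It rests on a simplifying assumption of our own: the estimated distribution puts mass only at the distinct uncensored times, and the talk establishes the actual support and conditions. Each step keeps one unit of mass at every exact observation. It spreads each censored observation's unit of mass over the support points inside its interval, in proportion to the current masses. A fixed point of this map satisfies the self-consistency equation. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def self_consistent_npmle(exact: Sequence[float],
                          intervals: Sequence[Tuple[float, float]],
                          tol: float = 1e-10, max_iter: int = 10_000) -> Dict[float, float]:
    """Self-consistency iteration for point masses at the distinct exact times.

    exact: fully observed event times.
    intervals: (left, right) pairs; the event is only known to lie in
    [left, right]. Assumes each interval contains at least one exact time.
    """
    support = sorted(set(exact))
    n_total = len(exact) + len(intervals)
    counts = {t: 0 for t in support}
    for t in exact:
        counts[t] += 1
    inside: List[List[float]] = [[t for t in support if lo <= t <= hi]
                                 for lo, hi in intervals]
    p = {t: 1.0 / len(support) for t in support}
    for _ in range(max_iter):
        new = {t: float(counts[t]) for t in support}
        for pts in inside:
            mass = sum(p[t] for t in pts)
            for t in pts:
                new[t] += p[t] / mass   # expected share of this censored case
        new = {t: v / n_total for t, v in new.items()}
        if max(abs(new[t] - p[t]) for t in support) < tol:
            return new
        p = new
    return p

def survival(p: Dict[float, float], t: float) -> float:
    """Estimated S(t) = P(T > t)."""
    return sum(m for s, m in p.items() if s > t)

if __name__ == "__main__":
    masses = self_consistent_npmle([1.0, 2.0, 3.0], [(1.5, 3.5)])
    print(masses, survival(masses, 2.5))
```

Left and right censoring fit the same interface as special cases. A left-censored time c becomes the interval (float("-inf"), c), and a right-censored one becomes (c, float("inf")).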
-------------------------------------------------------------
Title: Symmetry and the Covariance Structure of Ordered
Dependent Observations
Speaker: Dr. Marlos Viana
Eye Research Institute, The University of Illinois at
Chicago
DATE: April 26, 2001
LOCATION: Funger Hall 307
TIME: 3:00 p.m.
------------------------------------------------------------
In the analysis of data from bilateral biological processes
(e.g., vision, hearing) it is often necessary to model
the vector of ordered joint observations and its relation
to one or more covariates. In this talk we will discuss
the covariance structure of ordered dependent observations
under a class of permutation and block-permutation symmetric
covariance structures. Applications include the analysis
of joint extreme (best, worst) observations from dependent
bilateral measurements. The covariance structure of
ordered, cyclically-symmetric dependent observations
will also be discussed. Applications include the analysis
of extreme observations from corneal curvature topographic
maps. Related reading can be found at http://www.uic.edu/~viana/
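The simplest permutation-symmetric case is a pair (X1, X2) of exchangeable bivariate normal measurements (say, left and right eye) with common mean mu, variance sigma^2 and correlation rho. There the best/worst (max/min) pair has a closed-form covariance. Write S = (X1 + X2)/2 and D = X1 - X2; then max = S + |D|/2 and min = S - |D|/2, with S and D independent. This gives E[max] = mu + sigma*sqrt((1 - rho)/pi), Var(max) = Var(min) = sigma^2 (1 - (1 - rho)/pi), and Cov(max, min) = sigma^2 (rho + (1 - rho)/pi). The sketch below is our own illustration, not the talk's general result. It computes these moments and checks them by simulation, and the function names are hypothetical.

```python
import math
import random
from typing import Dict

def ordered_pair_moments(mu: float, sigma: float, rho: float) -> Dict[str, float]:
    """Closed-form moments of (max, min) for an exchangeable bivariate normal pair."""
    c = (1.0 - rho) / math.pi
    return {
        "mean_max": mu + sigma * math.sqrt(c),
        "mean_min": mu - sigma * math.sqrt(c),
        "var_max": sigma ** 2 * (1.0 - c),
        "var_min": sigma ** 2 * (1.0 - c),
        "cov": sigma ** 2 * (rho + c),
    }

def simulate_ordered_pair(mu: float, sigma: float, rho: float, n: int,
                          seed: int = 0) -> Dict[str, float]:
    """Monte Carlo estimates of the same moments."""
    rng = random.Random(seed)
    hi, lo = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = mu + sigma * z1
        x2 = mu + sigma * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)  # corr(x1, x2) = rho
        hi.append(max(x1, x2))          # "best" of the two measurements
        lo.append(min(x1, x2))          # "worst" of the two measurements
    m_hi, m_lo = sum(hi) / n, sum(lo) / n
    return {
        "mean_max": m_hi,
        "mean_min": m_lo,
        "var_max": sum((h - m_hi) ** 2 for h in hi) / n,
        "var_min": sum((l - m_lo) ** 2 for l in lo) / n,
        "cov": sum((h - m_hi) * (l - m_lo) for h, l in zip(hi, lo)) / n,
    }

if __name__ == "__main__":
    print(ordered_pair_moments(0.0, 1.0, 0.5))
    print(simulate_ordered_pair(0.0, 1.0, 0.5, 200_000))
```

Ordering raises the correlation between the pair from rho to (rho + c)/(1 - c), with c = (1 - rho)/pi. Var(max) + Var(min) + 2 Cov(max, min) still equals Var(X1 + X2) = 2 sigma^2 (1 + rho), because max + min = X1 + X2.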
-------------------------------------------------------------
Title: Analyzing gene expression data from microarrays:
a mixture-based approach
Speaker: Professor Francesca Chiaromonte
Department of Statistics, Pennsylvania State University
DATE: May 4, 2001
LOCATION: Funger Hall 321
TIME: 11:00 a.m.
------------------------------------------------------------
The analysis of global gene expression data from microarrays
is breaking new ground in genetics research, while confronting
modelers and statisticians with critical issues related
to size, exploration, modeling and error management.
Clustering of expression profiles, as a means of identifying
functionally related and possibly co-regulated genes,
has been the focus of much literature to date. We use
a clustering scheme based on multivariate normal mixtures
that allows us to (i) robustify the analysis through
the introduction of a contamination term, (ii) blend
exploration and modeling through the use of free and
constrained means, and (iii) provide cluster membership
probabilities, as opposed to simple memberships, for
the genes. Maximum likelihood estimation of the parameters
is performed via the EM algorithm. We present some preliminary
results on published data comparing k-means clustering
to mixture-based clustering whose likelihood maximization
was initialized with k-means memberships.
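The following minimal sketch of EM for a normal mixture is kept one-dimensional with two components and written in plain Python. The talk's multivariate mixtures, contamination term and constrained means are not shown. The E-step produces the per-gene membership probabilities mentioned above. The EM run is started from hard 0/1 labels, in the way k-means memberships can seed the likelihood maximization. The function names and this initialization interface are hypothetical.

```python
import math
import random
from typing import Sequence

def log_normal_pdf(x: float, mean: float, var: float) -> float:
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def em_two_normals(x: Sequence[float], init_labels: Sequence[int], iters: int = 200):
    """EM for a two-component univariate normal mixture.

    init_labels are hard 0/1 memberships (for example from k-means); both
    groups must be non-empty. Returns (weights, means, variances, probs),
    where probs[i] is the estimated probability that x[i] is in component 1.
    """
    probs = [float(lab) for lab in init_labels]
    n = len(x)
    for _ in range(iters):
        # M-step: weighted mixing proportions, means and variances
        weights, means, variances = [], [], []
        for k in (0, 1):
            w = [p if k == 1 else 1.0 - p for p in probs]
            tot = sum(w)
            mean = sum(wi * xi for wi, xi in zip(w, x)) / tot
            var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / tot
            weights.append(tot / n)
            means.append(mean)
            variances.append(max(var, 1e-8))  # guard against a degenerate component
        # E-step: posterior membership probabilities in a numerically stable form
        probs = []
        for xi in x:
            d = (math.log(weights[0]) + log_normal_pdf(xi, means[0], variances[0])
                 - math.log(weights[1]) - log_normal_pdf(xi, means[1], variances[1]))
            if d > 0:
                e = math.exp(-d)
                probs.append(e / (1.0 + e))
            else:
                probs.append(1.0 / (1.0 + math.exp(d)))
    return weights, means, variances, probs

if __name__ == "__main__":
    rng = random.Random(3)
    data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(6.0, 1.0) for _ in range(200)]
    labels = [1 if v > 3.0 else 0 for v in data]   # stand-in for k-means labels
    w, m, v, pr = em_two_normals(data, labels)
    print(w, m, v, pr[:3])
```

Unlike hard k-means labels, the returned probabilities let genes lying between clusters carry intermediate memberships rather than being forced into one group.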
--------------------------------------------------------------------------------
The contact person is Reza Modarres at Reza@gwu.edu
or 202-994-6359.