TITLE: Cook's Distance and Masking
SPEAKER: A. J. Lawrance
School of Mathematics and Statistics
University of Birmingham
DATE: January 9, 1998
LOCATION: 310 Funger Hall
TIME: 11:00 a.m.
------------------------------------------------------------
Cook's distance is a well-known statistic in regression
diagnostics for
assessing parameter estimate influence by case-deletion.
Initial remarks
on some lesser-known aspects in the wider context of
influence will form
the introduction. Since Cook's distance is concerned with cases
individually, it can
miss the influential effects of pairs and more generally
groups of cases;
such difficulties have been referred to as masking although
the definition
has been left rather open. Two approaches will be mentioned,
one in terms
of the more established joint influence and the arguably
preferable one in
terms of the notion of conditional influence, conditional
on the previous
deletion of cases. In respect of the former, a new version
of Cook's
distance appropriate for replicated data will be shown,
and also one for
'oppositely' replicated data. These yield some intuition
on the
distorting effects of joint influence relative to individual
influence.
Masking will be defined in terms of conditional influence
and Cook's
distance, and the circumstances in which it can arise
will be indicated.
Examples based on a constructed and a reported data
set will be
given. Further work for goodness of fit and testing
influence may be
mentioned.
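
For readers unfamiliar with the statistic, the following is a minimal sketch of
the standard single-case Cook's distance, together with a brute-force
multiple-case (joint deletion) version. It is not taken from the talk: the
function names, the toy data and the choice of ordinary least squares are all
assumptions made for illustration, and the new versions for replicated data
described above are not reproduced here.

```python
import numpy as np


def cooks_distance(X, y):
    """Single-case Cook's distance for an ordinary least-squares fit.

    X is an (n, p) design matrix that already contains the intercept column.
    """
    n, p = X.shape
    # Leverages are the diagonal of the hat matrix, obtained from a thin QR.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q**2, axis=1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    return resid**2 * leverage / (p * s2 * (1.0 - leverage) ** 2)


def joint_cooks_distance(X, y, cases):
    """Cook's distance for deleting a group of cases at once (refit by brute force)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    keep = np.ones(n, dtype=bool)
    keep[list(cases)] = False
    beta_del, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    # Change in fitted values at all n cases caused by deleting the group.
    shift = X @ (beta_del - beta)
    return shift @ shift / (p * s2)


# Hypothetical toy data: ten cases near a line plus two replicated cases
# at a remote x value (indices 10 and 11).
x = np.concatenate([np.arange(10.0), [15.0, 15.0]])
y = np.concatenate([2.0 + 0.5 * np.arange(10.0) + 0.1 * (-1.0) ** np.arange(10), [2.0, 2.0]])
X = np.column_stack([np.ones_like(x), x])

individual = cooks_distance(X, y)
joint = joint_cooks_distance(X, y, [10, 11])
print("individual distances for the replicated pair:", individual[10], individual[11])
print("joint distance for deleting both:", joint)
```

Comparing the printed individual and joint values on data of this kind gives a
feel for how replicated cases can hide one another's influence.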
------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Maximum Entropy, Likelihood and Uncertainty: A Comparison
SPEAKER: Amos Golan
Visiting Professor at the Economics Department
of the American University
DATE: January 30, 1998
LOCATION: 310 Funger Hall
TIME: 11:00 a.m.
--------------------------------------------------------------
A framework for comparing the maximum likelihood (ML)
and maximum
entropy (ME) approaches is developed. Two types of linear
models are
considered. In the first type, the objective is to estimate
probability
distributions given some moment conditions. In this
case the ME and ML
are equivalent. A generalization of this type of estimation
model to
incorporate noisy data is discussed as well. The second
type of models
encompasses the traditional linear regression type models
where the
number of observations is larger than the number of
unknowns and the
objects to be inferred are not natural probabilities.
After reviewing
the generalized ME estimator and the empirical likelihood
(or weighted
least squares) estimator, the two are contrasted and
compared with ML.
It is shown that, in general, ML type estimators use
less input
information and may be viewed, within the second type
of models, as
expected log-likelihood estimators. In terms of informational
ranking,
if the objective is to estimate with minimum a-priori
assumptions, then
the generalized ME estimator is superior to other estimators.
Two
detailed examples, reflecting the two types of models,
are discussed.
The first example deals with estimating a first-order
Markov process
from noisy data. In the second example the empirical
(natural) weights
of each observation, together with the other unknowns,
are the subject
of interest.
---------------------------------------------------------------
--------------------------------------------------------------------------------
CANCELLED
TITLE: A Reduction Paradigm for Multivariate Laws
SPEAKER: Francesca Chiaromonte
International Institute For Applied Systems Analysis
Laxenburg, Austria
DATE: February 12, 1998
LOCATION: 220 Funger Hall
TIME: 4:00 p.m.
--------------------------------------------------------------
A reduction paradigm is a theoretical framework which
provides a definition of structure for multivariate
laws, and allows one to simplify their representation and
statistical analysis. The main idea is to decompose
a law as the superposition of a structural term and
a noise, so that the latter can be neglected without
loss of information on the structure. When the structural
term is supported by a lower-dimensional affine subspace,
an exhaustive dimension reduction is achieved. We describe
the reduction paradigm that results from selecting white
noises, and convolution as the superposition mechanism.
--------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Statistical Problems Common to Legal and Medical
Applications
SPEAKER: Boris Freidlin
Emmes Corporation
DATE: February 20, 1998
LOCATION: 320 Funger Hall
TIME: 11:00 a.m.
-----------------------------------------------------------------
When a complaint of discrimination is made, an employer
may respond by
hiring or promoting more minorities. From a legal viewpoint,
the practices
in effect during the time period prior to the complaint
are more relevant
for determining liability than those of the post-charge
period. Thus, the
pattern of interest in a fair hiring case is underrepresentation
before
the charge with a change to fair hiring or possible over-hiring
of minorities at
a time point after the charge but prior to the trial.
We present two
adaptations of procedures based on the cusums to obtain
an appropriate
test for this problem. Several data sets that were submitted
to courts in
the US are analyzed by the proposed methods. We obtain
the p-values of the
proposed statistics by simulation. Recent improvements
in Bonferroni's
inequality are utilized to derive a tight upper bound
for these p-values
when data follow the binomial model.
In some statistical applications the precise model
or distribution
underlying the data may not be known; however, a family
of scientifically
plausible alternative models can be specified. Gastwirth
(1966, 1985)
proposed a Maximin Efficiency Robust Test (MERT) approach
to constructing
a procedure appropriate for a range of the possible
alternative models.
Podgor et al. (1996) obtained efficiency robust scores
for analysis of
contingency tables. In survival analysis, Tarone (1981)
constructed a
robust procedure by taking the maximum of the two tests
optimal for two
alternative models. Lee (1996) used the maximum of several
tests to
analyze survival data. We determine the power of the
MERT and Max
procedures in both survival and dose response settings.
From the null
correlation matrix of the optimal tests for the alternative
models we
derive guidelines for selecting a robust procedure.
Several biomedical
studies involving survival or categorical data are reanalyzed
to
demonstrate the applicability of the robust procedures.
------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Some Cracks in the Empire of Chance
SPEAKER: Nozer D. Singpurwalla
Operations Research Department
George Washington University
DATE: March 13, 1998
LOCATION: 220 Funger Hall
TIME: 5:00 p.m.
-----------------------------------------------------------------
To address one of the most basic problems of prediction
the speaker
visits "The Palace of Relative Frequencies"
and discovers closets full of
skeletons. He exits fast and returns to "The Temple
of Bayesian Brahmins"
only to be nagged by the thought of how to bet on a
Greek alphabet!
Should he now take a random walk between his ancestral
home and his new
found Shangrila?
INCENTIVE: The Dean of the School of Engineering promises
to offer any
member of the audience who can nail the speaker a bottle
of wine (colour
and quality unspecified). The Chairman of the Department
of Statistics
promises to double the stakes if the author can be royally
nailed!
----------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: A Formal Approach to Word Statistics
SPEAKER: Mireille Regnier
INRIA, France
DATE: March 27, 1998
LOCATION: 208 Funger Hall
TIME: 10:30 a.m.
-----------------------------------------------------------------
Evaluation of the frequency of occurrences of a given
set of patterns in
a given text has numerous applications and has been
extensively studied
recently.
We provide a unified framework based on formal languages
and generating functions for this evaluation. It adapts
to
various constraints and allows previous results to be extended.
We assume successively that the patterns may, then may
not, overlap.
We derive asymptotic and exact formulae for the moments
in a Markovian
model. We show that our formulae, which occasionally
simplify previous
results, are computable at low cost on a symbolic computation
system.
This makes them useful for practical applications, such
as the search for the
so-called contrast words in DNA sequences.
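
To make the distinction between overlapping and non-overlapping occurrences
concrete, here is a small sketch that simply counts both in a given text. It
only illustrates the counting convention; it does not implement the
generating-function framework or the Markovian moment formulae of the talk, and
the function names and the sample sequence are hypothetical.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing occurrences to overlap."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Restart one position further on, so overlapping matches are found.
        start = text.find(pattern, start + 1)
    return count


def count_non_overlapping(text, pattern):
    """Count occurrences scanning left to right, skipping past each match."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    return text.count(pattern)


sequence = "ATATATA"  # hypothetical DNA-like text
print(count_overlapping(sequence, "ATA"))      # 3 (positions 0, 2, 4)
print(count_non_overlapping(sequence, "ATA"))  # 2 (positions 0, 4)
```

Self-overlapping words such as "ATA" are exactly the cases in which the two
conventions give different counts.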
------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Correlation, Dependence and other permissible
relations.
SPEAKER: Samuel Kotz
George Washington University
Editor in Chief, Encyclopedia of Statistical Sciences
DATE: April 10, 1998
LOCATION: 320 Funger Hall
TIME: 11:00 a.m.
-------------------------------------------------------------------
New examples of bivariate distributions which are uncorrelated
but highly
dependent are presented. Various measures of dependence
between two (or
more) random variables are discussed. A constructive
approach to
generating distributions with pre-assigned dependence
is proposed.
The lecture is elementary and students with limited
background are
welcome.
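
As background, the classical textbook case of an uncorrelated but dependent
pair can be simulated in a few lines. This is not one of the new examples
announced for the lecture; it is the standard pair X and Y = X^2 with X
standard normal, and the variable names, seed and sample size are arbitrary
choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = x**2  # y is a deterministic function of x, so the pair is fully dependent

# Cov(X, X^2) = E[X^3] = 0 for a symmetric distribution, so the sample
# correlation between x and y should be close to zero.
corr_xy = np.corrcoef(x, y)[0, 1]

# The dependence shows up as soon as a suitable transform is used.
corr_absx_y = np.corrcoef(np.abs(x), y)[0, 1]

print(f"corr(x, y)   = {corr_xy:.3f}")
print(f"corr(|x|, y) = {corr_absx_y:.3f}")
```

The first correlation is near zero even though y is determined exactly by x,
which is why correlation alone is a poor measure of dependence.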
-------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Can the global financial markets be outperformed
using
fundamental multiple-factor forecasting models?
SPEAKER: Jose Mario Quintana
Managing Director
CDC Investments, New York
DATE: April 17, 1998
LOCATION: 320 Funger Hall
TIME: 11:00 a.m.
------------------------------------------------------------------------
According to the collective wisdom, comprised of practitioners
and
academicians, the answer is No. More specifically, they
argue that
Multiple-Factor Models cannot deal with the complexities
inherent in
global financial markets, and the use of these models
for constructing
unconstrained Mean-Variance (MV) efficient portfolios,
as prescribed by
Modern Portfolio Theory, is impractical. The reaction
has ranged from a
disregard of econometric models, an imposition of "clever"
constraints
on the MV portfolio optimization, to, finally, the development
of the
Post-Modern Portfolio Theory. However, the question
above does not have
concrete meaning until a forecasting model is specified.
This
presentation will argue that the answer is No for old-fashioned
textbook
multiple-factor models, but it is Yes for modern sophisticated
dynamic
(stochastic) multiple factor models. The answer is not
merely academic;
real money, as opposed to paper money, has been on the
line for several
years.
-------------------------------------------------------------------------
--------------------------------------------------------------------------------
TITLE: Sequential Density Estimation to Bound L_1 Error.
SPEAKER: Subrata Kundu
Department of Statistics
The George Washington University
DATE: April 24, 1998
LOCATION: 320 Funger Hall
TIME: 11:00 a.m.
-----------------------------------------------------------------------
The problem of estimating an unknown density f with
bounded Mean
Integrated Absolute Error (MIAE) is considered. Purely
sequential and
two-stage procedures for bounding the MIAE are proposed.
It is
shown that these procedures are asymptotically optimal.
An application in a classification problem is also considered.
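
For reference, the usual definition of the MIAE can be written as below. The
notation is assumed rather than taken from the talk: \hat f_n denotes a density
estimator built from n observations, w > 0 a preassigned bound, and N the
(possibly random) sample size at which sampling stops.

```latex
\[
  \mathrm{MIAE}(\hat f_n)
    = \mathbb{E} \int_{-\infty}^{\infty}
        \bigl| \hat f_n(x) - f(x) \bigr| \, dx ,
  \qquad
  \text{goal: choose } N \text{ such that }
  \mathrm{MIAE}(\hat f_N) \le w .
\]
```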
--------------------------------------------------------------------------------
The contact person is Reza Modarres at Reza@gwu.edu
or 202-994-6359.