January 16, 2009
Title: Robust Peters-Belson Type Estimators of Measures of Disparity and their Applications in Employment Discrimination Cases
Speaker: Hiro Hikawa, Department of Statistics, George Washington University
Abstract:
In discrimination cases concerning equal pay, the Peters-Belson (PB) regression method is used to estimate the pay disparities
between minority and majority employees after accounting for major covariates (e.g., seniority, education). Unlike the standard
approach, which uses a dummy variable to indicate protected group status, the PB method first fits a linear regression model for
the majority group. The resulting regression equation is then used to predict the salary of each minority employee by using their
individual covariates in the equation. The difference between the actual and the predicted salaries of each minority employee
estimates the pay differential for that minority employee, which takes into account legitimate job-related factors. The average
difference estimates a measure of pay disparity. In practice, however, a linear regression model may not be sufficient to capture
the actual pay-setting practices of the employer. Therefore, we use a locally weighted regression model in the PB approach, so that
a specific functional form for the relationship between pay and the relevant covariates no longer needs to be assumed. The statistical properties
of the new procedure are developed and compared to those of the standard methods. The method also extends to the case
with a binary (1-0) response, e.g., hiring or promotion. Both simulation studies and re-analysis of actual data show that, in
general, the locally weighted PB regression method reflects the true mean function more accurately than the linear model,
especially when the true function is not a linear or logit (for a 1-0 response) model. Moreover, only a small loss of efficiency
is incurred when the true relation follows a linear or logit model.
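The PB steps above (fit on the majority group, predict each minority employee, average actual minus predicted) can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the speaker's implementation: the local version assumes a single covariate (e.g., seniority) and uses a Gaussian-kernel local-linear fit as a stand-in for the locally weighted regression. The function names and the bandwidth are hypothetical choices.

```python
import numpy as np

def pb_linear(X_maj, y_maj, X_min, y_min):
    """Peters-Belson disparity with an OLS model fitted to the majority group only."""
    A = np.column_stack([np.ones(len(X_maj)), X_maj])
    coef, *_ = np.linalg.lstsq(A, y_maj, rcond=None)
    pred = np.column_stack([np.ones(len(X_min)), X_min]) @ coef  # predicted minority pay
    return np.mean(y_min - pred)  # average (actual - predicted); negative => minority paid less

def pb_local(x_maj, y_maj, x_min, y_min, bandwidth):
    """PB disparity with a Gaussian-kernel local-linear fit (one covariate) in place of OLS."""
    preds = []
    for x0 in x_min:
        w = np.exp(-0.5 * ((x_maj - x0) / bandwidth) ** 2)  # kernel weights centred at x0
        A = np.column_stack([np.ones_like(x_maj), x_maj - x0])
        W = A * w[:, None]
        beta = np.linalg.solve(A.T @ W, W.T @ y_maj)  # weighted least squares
        preds.append(beta[0])  # local intercept = fitted majority pay at x0
    return np.mean(y_min - np.array(preds))
```

When pay is a concave function of seniority and minority employees are concentrated at low seniority, the linear fit can push its curvature misfit into the disparity estimate. The local fit avoids committing to a functional form, at the cost of choosing a bandwidth (for example, by cross-validation).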
Date: Friday, January 16, 2009
Time: 11:00-12:00 noon
Location: Monroe Hall, Room 113 (2115 G Street, NW, Washington, DC 20052)
January 30, 2009
Title:
Avoiding Lawsuits with a Bayesian Approach to Product Engineering
Speaker: Robert F. Bordley, General Motors
Abstract:
Mathematical programming --- which involves optimizing an objective
function subject to various constraints --- has long recognized that the
coefficients in both the constraints and the objective function are often
uncertain. Standard expected utility analysis can easily resolve problems
when the uncertainties only appear in the objective function. But utility
approaches have, in the past, not been considered useful when uncertainties
appear in both the constraints and the objective function. Instead, two alternative
approaches are used for these "stochastic programming problems." The first
approach treats any violation of the constraints as rectifiable in the future at
some cost. Given this assumption, the stochastic optimization problem can
be formulated as an unconstrained multi-stage optimization problem which
can be solved with expected utility theory (even though it is not common to
do so). The second approach does not assume that violation of the
constraints can be rectified at some cost. This approach is widely used in,
for example, reliability-based design optimization where engineers must
determine design specifications for physical structures (planes, vehicles,
buildings, etc.) which, if they fail to withstand certain stresses, could lead to
the loss of human life. This approach maximizes an objective function
subject to an upper bound (generally one in a thousand) on the probability
of the constraints being violated. Unfortunately, it has been shown that this
approach (called chance-constrained programming) is inconsistent with
utility theory and can lead to a negative value of information.
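A chance constraint of the kind described above can be made concrete with a one-constraint toy problem. The sketch below assumes (hypothetically) that a random load is Normal(mu, sigma), that a design of size x withstands a load up to k*x, and that cost increases with x, so the optimum lies on the boundary P(load > k*x) = alpha. The deterministic equivalent then uses the standard normal quantile. The Monte Carlo check only confirms the violation rate; none of the names come from the talk.

```python
import random
from statistics import NormalDist

def chance_constrained_size(mu, sigma, k, alpha):
    """Smallest x with P(load > k*x) <= alpha when load ~ Normal(mu, sigma).

    Because cost is assumed increasing in x, the cheapest feasible design sits
    exactly on the chance-constraint boundary.
    """
    z = NormalDist().inv_cdf(1 - alpha)  # (1 - alpha) quantile of the standard normal
    return (mu + z * sigma) / k

def violation_rate(x, mu, sigma, k, n=200_000, seed=0):
    """Monte Carlo estimate of P(load > k*x) for a given design x."""
    rng = random.Random(seed)
    return sum(rng.gauss(mu, sigma) > k * x for _ in range(n)) / n
```

With alpha = 0.001 (the "one in a thousand" bound), mu = 100, sigma = 10 and k = 2, the design size is about 65.45. The formulation bounds only the probability of a violation; the cost of its consequences never enters the optimization.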
Date: Friday, January 30, 2009
Time: 11:30-12:30pm
Location: Duques Hall, Room 453 (2201 G Street, NW, Washington, DC 20052)
February 13, 2009
Title: Random Partition Models Indexed with Covariates
Speaker:
Peter Müller,
Department of Biostatistics, MD Anderson Cancer Center, University of Texas
Abstract:
We propose a model for covariate-dependent clustering, i.e., we develop a
probability model for random partitions that is indexed by covariates. The
motivating application is inference for a clinical trial. As part of the desired
inference we wish to define clusters of patients. Defining a prior probability
model for cluster memberships should include a regression on patient
baseline covariates. We build on product partition models (PPM). We define
an extension of the PPM to include the desired regression. This is achieved
by including in the cohesion function a new factor that increases the
probability that experimental units with similar covariates are included in
the same cluster. We discuss an application to clinical trial design. The
proposed model is used to implement borrowing of strength across nonexchangeable
sub-populations.
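The idea of a product partition prior with an extra covariate factor can be sketched as an unnormalized log prior that sums, over clusters, a log cohesion and a log similarity term. Both choices below are assumptions for illustration: the cohesion c(S) = M (|S| - 1)! is the Dirichlet-process-style form, and the similarity g is a simple hypothetical Gaussian-spread penalty. The talk's own similarity function may differ.

```python
import math

def log_ppmx_prior(partition, x, M=1.0, tau=1.0):
    """Unnormalized log prior of a partition under a covariate-dependent PPM sketch.

    partition: list of clusters, each a list of unit indices
    x: list of scalar covariates, one per unit
    Cohesion (assumed): c(S) = M * (|S| - 1)!
    Similarity (hypothetical): g(x_S) = exp(-sum_i (x_i - mean_S)^2 / (2 * tau^2))
    """
    total = 0.0
    for S in partition:
        xs = [x[i] for i in S]
        xbar = sum(xs) / len(xs)
        spread = sum((v - xbar) ** 2 for v in xs)
        total += math.log(M) + math.lgamma(len(S))  # log c(S); lgamma(n) = log((n-1)!)
        total += -spread / (2 * tau ** 2)  # log g(x_S): favours clusters with similar covariates
    return total
```

Grouping units with covariates (1.0, 1.1) and (5.0, 5.2) receives a much higher prior weight than mixing them, even though the cohesion terms of the two partitions are identical.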
Date: Friday, February 13, 2009
Time: 3:30-4:30 pm
Location: Duques Hall, Room 552 (2201 G Street, NW, Washington, DC 20052)
February 20, 2009
Title: Record Linkage Modeling
Speaker:
Michael Larsen,
Iowa State University, Department of Statistics and Center for Survey Statistics & Methodology
Abstract:
Record linkage, or exact file matching, consists of bringing together records in two or more files on the same population.
Files are linked for the purposes of creating a larger database, enabling analyses that would otherwise not be possible,
and counting the population. When unique, error-free identification codes are not available on both files, record
linkage can be accomplished through probabilistic methods. The U.S. Census Bureau uses record linkage in population
undercount estimation. The National Center for Health Statistics uses record linkage to match surveys to the National
Death Index (NDI) for studies of mortality and morbidity. This talk discusses advances in record linkage theory related
to these efforts. The models allow estimation of error rates and decision making about match/nonmatch status of pairs
of records. Methods of record linkage that enforce one-to-one matching between individuals have been implemented.
Generally, the files linked at the Census Bureau and NCHS have been unduplicated, so one-to-one matching is required.
Bayesian methods that allow variability across blocks and incorporate one-to-one matching into statistical models have
been studied. Advances have been made in analysis of files created through record linkage, including some accounting
of potential matching errors. The work on record linkage has direct relevance for methods of preserving confidentiality
in publicly released databases.
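The classic probabilistic approach behind this work, the Fellegi-Sunter model, scores each candidate record pair with a log likelihood ratio built from per-field probabilities. Here m is the probability that a field agrees for a true match and u is the probability that it agrees for a non-match. The sketch below assumes m and u are already known (in practice they are typically estimated, e.g. with the EM algorithm). The field values and thresholds are hypothetical, and the sketch does not enforce one-to-one matching.

```python
import math

def fs_weight(agree, m, u):
    """Fellegi-Sunter log2 likelihood-ratio weight for one record pair.

    agree: per-field agreement indicators (1 = fields agree, 0 = disagree)
    m, u: per-field agreement probabilities among matches and non-matches
    """
    w = 0.0
    for a, mi, ui in zip(agree, m, u):
        if a:
            w += math.log2(mi / ui)  # agreement weight
        else:
            w += math.log2((1 - mi) / (1 - ui))  # disagreement weight
    return w

def classify(weight, upper, lower):
    """Three-way decision rule; the thresholds control the two error rates."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "nonmatch"
    return "clerical review"
```

With hypothetical fields (surname, birth year, ZIP code), m = [0.95, 0.9, 0.8] and u = [0.01, 0.1, 0.05], a pair agreeing on all three fields scores about 13.7. A pair agreeing only on surname scores about 1.2 and falls between thresholds of +5 and -5.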
Date: Friday, February 20, 2009
Time: 2:15-3:15 pm
Location: Department of Statistics (2140 Pennsylvania Ave, NW, Washington, DC 20052)
February 27, 2009
Title: High Dimensional Statistics in Genomics: Some New Problems and Solutions
Speaker:
Hongzhe Li,
Department of Biostatistics and Epidemiology, University of Pennsylvania
Abstract:
Large-scale systematic genomic datasets have been generated to inform our biological
understanding of both the normal workings of organisms and the disrupted
processes that cause human disease. The integrative analysis of these datasets,
which has become an increasingly important part of genomics and systems biology
research, poses many interesting statistical problems, largely driven by the complex
inter-relationships between high-dimensional genomic measurements. In this talk, I
will present three problems in genomics research that require the development of new
statistical methods: (1) identification of active transcription factors in microarray
time-course experiments; (2) identification of subnetworks that are associated with
clinical outcomes; and (3) identification of the genetic variants that explain
higher-order gene expression modules. I will present several regularized estimation
methods to address these questions and demonstrate their applications using real
data examples. I will also discuss some theoretical properties of these procedures.
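As a generic example of the regularized estimation mentioned above (not necessarily any of the speaker's methods), the lasso minimizes (1/(2n))||y - Xb||^2 + lam * ||b||_1. It can be fitted by cyclic coordinate descent with soft-thresholding. The sketch assumes a dense NumPy design matrix and a fixed penalty lam and iteration count, both chosen for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent on (1/(2n))||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b  # current full residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]  # partial residual with feature j removed
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]  # add feature j back with its new coefficient
    return b
```

The soft-thresholding step is what sets most coefficients exactly to zero, which is why such estimators suit problems with many more genomic features than informative signals.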
Date: Friday, February 27, 2009
Time: 11:00-12:00 pm
Location: Monroe Hall, Room 113 (2115 G Street, NW, Washington, DC 20052)
March 6, 2009
Title:
Combinatorial Patterns for Probabilistically Constrained Optimization Problems
Speaker:
Miguel Lejeune,
Department of Decision Sciences, George Washington University
Abstract:
We propose a new framework for the solution of probabilistically constrained
optimization problems by extending some recent developments in combinatorial
pattern theory. The method involves the binarization of the probability
distribution and the generation of a consistent partially defined Boolean
function (pdBf) representing the combination (F,p) of the binarized
probability distribution F and the enforced probability level p. We
represent this pdBf as a disjunctive normal form that consists of
a collection of combinatorial patterns. We propose a new integer
programming-based method for the derivation of combinatorial patterns and
present several methods allowing for the construction of a disjunctive
normal form that defines necessary and sufficient conditions for the
probabilistic constraint to hold. The obtained disjunctive forms are then
used to generate deterministic reformulations of the original stochastic
problem. The method is implemented for the solution of a numerical problem.
Extensions to the present study are discussed.
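The binarization and pattern ideas can be illustrated on a toy discrete distribution. Grid points z with F(z) >= p are the positive (p-sufficient) points and the rest are negative. Each cut point c on component i gives a Boolean literal [z_i >= c], and a pattern is a conjunction of such literals that covers some positive point and no negative point. This sketch enumerates the grid directly instead of using the integer-programming derivation from the talk. All names and the example distribution are hypothetical.

```python
from itertools import product

def cdf(z, scenarios):
    """F(z) = P(xi <= z componentwise) for a finite distribution of (point, prob) pairs."""
    return sum(p for xi, p in scenarios if all(a <= b for a, b in zip(xi, z)))

def cut_points(scenarios):
    """Candidate cut points per component: the distinct scenario values."""
    dim = len(scenarios[0][0])
    return [sorted({xi[i] for xi, _ in scenarios}) for i in range(dim)]

def binarize(z, cuts):
    """Boolean vector of literals [z_i >= c] over all cut points."""
    return tuple(int(z[i] >= c) for i in range(len(z)) for c in cuts[i])

def label_points(scenarios, p):
    """Split grid points into positive (F(z) >= p) and negative sets."""
    pos, neg = [], []
    for z in product(*cut_points(scenarios)):
        if cdf(z, scenarios) >= p:
            pos.append(z)
        else:
            neg.append(z)
    return pos, neg

def covers(pattern, z):
    """A pattern {i: c} is the conjunction of literals z_i >= c."""
    return all(z[i] >= c for i, c in pattern.items())

def is_pattern(pattern, pos, neg):
    """Pure pattern: covers at least one positive point and no negative point."""
    return any(covers(pattern, z) for z in pos) and not any(covers(pattern, z) for z in neg)
```

For four equally structured scenarios on {1, 2}^2 with p = 0.5, the patterns (z_1 >= 2) and (z_2 >= 2) are both valid. Their disjunction covers every positive point, so it gives a deterministic description of the feasible region on this grid.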
Date: Friday, March 6, 2009
Time: 3:30-4:30 pm
Location: Duques Hall, Room 552 (2201 G Street, NW, Washington, DC 20052)
March 13, 2009
Title: Sequential Predictive Regressions and Optimal Portfolio Returns
Speaker:
Nicholas Polson,
Professor of Econometrics and Statistics, Graduate School of Business, University of Chicago
Abstract:
This paper analyzes sequential learning in the context of predictive regression models. To do this,
we develop new particle based methods for sequential learning about parameters, state variables,
hypotheses, and models. This sequential perspective allows us to quantify how investors' views
about predictability and models vary over time, and it naturally mimics the learning problem
encountered in practice. We consider learning about predictability using dividend/payout data
and models that incorporate drifting coefficients and stochastic volatility. We analyze the
time-variation of parameter estimates and model probabilities, using both the traditional cash
dividends measure and a measure taking into account share repurchases and issuances. We also
analyze the economic benefits of using these models by considering optimal portfolio allocation
problems.
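Sequential, particle-based learning about a drifting coefficient can be sketched with a basic bootstrap particle filter. This is a simpler method than the particle learning approach in the talk: it tracks only the state (the time-varying predictive coefficient) and treats the variances as known. The model r[t] = beta[t] * x[t] + noise with a random-walk beta, and all parameter values, are hypothetical.

```python
import numpy as np

def simulate(T, rng, beta0=0.5, state_sd=0.05, obs_sd=0.2):
    """Toy predictive regression with a random-walk coefficient."""
    x = rng.normal(size=T)  # lagged predictor, e.g. a payout yield (hypothetical)
    beta = beta0 + np.cumsum(state_sd * rng.normal(size=T))
    r = beta * x + obs_sd * rng.normal(size=T)  # next-period return
    return x, r, beta

def bootstrap_filter(x, r, rng, n_particles=2000, state_sd=0.05, obs_sd=0.2):
    """Bootstrap particle filter; returns filtered means E[beta_t | r_1..r_t]."""
    particles = rng.normal(0.0, 1.0, n_particles)  # prior draws for the coefficient
    means = np.empty(len(r))
    for t in range(len(r)):
        particles = particles + state_sd * rng.normal(size=n_particles)  # propagate
        logw = -0.5 * ((r[t] - particles * x[t]) / obs_sd) ** 2  # Gaussian log-likelihood
        w = np.exp(logw - logw.max())  # stabilise before normalising
        w /= w.sum()
        means[t] = w @ particles
        idx = rng.choice(n_particles, size=n_particles, p=w)  # multinomial resampling
        particles = particles[idx]
    return means
```

The sequence of filtered means plays the role of the investor's updated view of predictability at each date. Extending the filter to static parameters and stochastic volatility is where more specialised methods become necessary.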
Date: Friday, March 13, 2009
Time: 11:00-12:00 pm
Location: Duques Hall, Room 553 (2201 G Street, NW, Washington, DC 20052)
April 3, 2009
Title: Inferring likelihoods and climate system characteristics from climate
models and multiple tracers
Speaker:
Murali Haran,
Department of Statistics, Penn State University
Abstract:
To understand the current state of the climate system and to predict
its future behavior, it is critical to have good estimates of key
climate system parameters. Since these climate parameters are very
difficult to measure directly, we have to infer their values based on
two sources of information --- spatial data on `tracers' that
indirectly provide information about these parameters, and output from
complex climate computer models run at several climate parameter
settings. These climate models are computationally expensive and can
take weeks or months to run at each setting. I will discuss an
inferential approach that uses Gaussian processes to emulate the
climate models, thereby establishing a connection between the climate
parameters and the multiple tracers. Using a spatial model, it is then
possible to carry out statistical inference for the climate
parameters, while accounting for various sources of variability and
dependence. I will describe how our methods address a few
of the many challenges involved in this research including
computational obstacles posed by the size of the data and the need to
simultaneously model potentially non-linear relationships between
tracers while accounting for spatial dependence in the observations.
This is joint work with K. S. Bhat (Statistics, Penn State) and
R. Tonkonojenkov and K. Keller (Geosciences, Penn State).
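A Gaussian-process emulator, as described above, replaces the expensive model with a statistical surrogate fitted to a handful of runs. The sketch assumes a single climate parameter, a zero-mean GP with a squared-exponential kernel, and fixed (not estimated) hyperparameters. The "simulator" sin(3*theta) is a hypothetical stand-in for a climate model run, and the spatial tracer model from the talk is not included.

```python
import numpy as np

def sq_exp_kernel(a, b, length=0.3, var=1.0):
    """Squared-exponential covariance between 1-D parameter settings a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_emulator(theta_train, y_train, theta_new, length=0.3, var=1.0, nugget=1e-6):
    """Zero-mean GP posterior mean and variance at theta_new, given simulator runs."""
    K = sq_exp_kernel(theta_train, theta_train, length, var)
    K += nugget * np.eye(len(theta_train))  # small jitter for numerical stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    Ks = sq_exp_kernel(theta_new, theta_train, length, var)
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var_post = var - np.sum(v ** 2, axis=0)  # predictive variance without the nugget
    return mean, var_post
```

Eight runs on a grid over [0, 1] are enough for the emulator to predict the toy simulator accurately between design points. The predictive variance is near zero at the runs themselves and grows away from them, which is the uncertainty a calibration step would then propagate.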
Date: Friday, April 3, 2009
Time: 11:00-12:00 pm
Location: Monroe Hall, Room 113 (2115 G Street, NW, Washington, DC 20052)
April 10, 2009
Title: General Classes of Skewed Link Function for Binary Response Data
Speaker:
Dipak Dey,
Department of Statistics, University of Connecticut
Abstract:
(Joint with Xia Wang)
The choice of link is one of the most critical issues in modeling binary data,
as a misspecified link can yield substantial bias in the estimated mean response.
The objective of this study is to introduce a flexible skewed link function for modeling categorical
data. The commonly used complementary log-log (Cloglog) link is prone to link misspecification
because of its positive and fixed skewness. We propose a new link function based on the generalized
extreme value (GEV) distribution. The GEV link has a very wide range of skewness, which is purely
determined by its shape parameter. Using Bayesian methodology, we can automatically detect the skewness
in the data while fitting the model with the GEV link. Various theoretical properties are examined
and explored in detail. We compare the logit, the probit, the Cloglog and the GEV links under different
scenarios. The possibility of applying this link to the large p, small n cases is also discussed.
The deviance information criterion measure is used for guiding model selection when comparing different
links. The results are further extended to incorporate spatial structure. The methodologies are
illustrated with bank transaction data and with species abundance data exhibiting spatial variation.
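One way to write a GEV-based inverse link is P(y = 1) = 1 - G(-eta; xi). Here G(z; xi) = exp{-(1 + xi z)_+^(-1/xi)} is the standard GEV cdf and eta = x'beta is the linear predictor. The sketch assumes this parameterization. As xi approaches 0 it reduces to the complementary log-log link, and other values of the shape parameter xi change the skewness of the response curve.

```python
import numpy as np

def gev_inverse_link(eta, xi):
    """P(y = 1) = 1 - G(-eta; xi) with G the standard GEV cdf (assumed parameterization)."""
    eta = np.asarray(eta, dtype=float)
    if abs(xi) < 1e-12:
        return -np.expm1(-np.exp(eta))  # xi -> 0 limit: complementary log-log
    base = 1.0 - xi * eta
    with np.errstate(divide="ignore", over="ignore"):
        # (1 - xi*eta)_+^(-1/xi): infinite (P = 1) or zero (P = 0) outside the support
        t = np.maximum(base, 0.0) ** (-1.0 / xi)
    return -np.expm1(-t)  # 1 - exp(-t), computed accurately for small t
```

Unlike the logit and probit links, whose curves are symmetric, the shape parameter lets the data choose how quickly P(y = 1) approaches 0 and 1. For nonzero xi, the probability also reaches exactly 0 or 1 at a finite eta.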
Date: Friday, April 10, 2009
Time: 11:00-12:00 pm
Location: Monroe Hall, Room 113 (2115 G Street, NW, Washington, DC 20052)
April 17, 2009
Title: Analysis of Cohort Studies with Multivariate, Partially Observed
Disease Classification Data
Speaker:
Nilanjan Chatterjee,
Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute,
National Institutes of Health
Abstract:
Complex diseases, like cancer, can often be classified into subtypes using various pathological
and molecular traits of the disease. In this article, we develop methods for analysis of disease
incidence in cohort studies incorporating data on multiple disease traits using a two-stage
semi-parametric Cox proportional hazards regression model that allows one to examine the
heterogeneity in the effect of the covariates by the levels of the different disease traits.
For inference in the presence of missing disease traits, we propose a generalization of an
estimating-equation (EE) approach for handling missing cause of failure in competing-risk data.
We prove the asymptotic unbiasedness of this EE method under a general missing-at-random (MAR)
assumption and propose a novel influence-function-based sandwich variance estimator. The methods
are illustrated using simulation studies and a real data application involving the Cancer Prevention
Study II (CPS-II) Nutrition Cohort.
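The two-stage structure can be illustrated with arithmetic alone. The first stage has a separate log hazard ratio for each disease subtype, where a subtype is a combination of trait levels. The second stage expresses those subtype-specific parameters through a smaller set of trait contrasts, and a nonzero contrast indicates heterogeneity of the covariate effect by that trait. The sketch assumes binary traits and an additive second stage with no interactions. The parameter values (e.g., one trait for receptor status and one for grade) are hypothetical.

```python
from itertools import product

def subtype_log_hr(theta, traits):
    """Second stage (assumed additive): beta(subtype) = theta_0 + sum_k theta_k * trait_k."""
    return theta[0] + sum(th * z for th, z in zip(theta[1:], traits))

def first_stage_table(theta, n_traits):
    """Subtype-specific log hazard ratios implied for every combination of binary traits."""
    return {traits: subtype_log_hr(theta, traits)
            for traits in product((0, 1), repeat=n_traits)}
```

With theta = (0.2, 0.3, -0.1), the four subtypes defined by two binary traits have log hazard ratios 0.2, 0.1, 0.5 and 0.4. The model describes four parameters with three, and this reduction grows quickly as more traits are added.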
Date: Friday, April 17, 2009
Time: 11:00-12:00 pm
Location: Monroe Hall, Room 113 (2115 G Street, NW, Washington, DC 20052)
The series hosts a seminar about twice a month on current research
topics. The seminar often features an invited guest speaker and
occasionally local faculty members, students or others affiliated with
the department. The usual time of the seminar is 11:00am on Fridays.
Professors Hosam Mahmoud (hosam@gwu.edu) and
Jonathan Stroud (stroud@gwu.edu)
are the Seminar Series Coordinators.