Seminar Announcements for Spring 2009


January 16, 2009

Title: Robust Peters-Belson Type Estimators of Measures of Disparity and their Applications in Employment Discrimination Cases

Speaker: Hiro Hikawa, Department of Statistics, George Washington University

Abstract:

In discrimination cases concerning equal pay, the Peters-Belson (PB) regression method is used to estimate the pay disparities between minority and majority employees after accounting for major covariates (e.g., seniority, education). Unlike the standard approach, which uses a dummy variable to indicate protected group status, the PB method first fits a linear regression model for the majority group. The resulting regression equation is then used to predict the salary of each minority employee from that employee's individual covariates. The difference between the actual and the predicted salary of each minority employee estimates the pay differential for that employee, taking into account legitimate job-related factors. The average difference estimates a measure of pay disparity. In practice, however, a linear regression model may not be sufficient to capture the actual pay-setting practices of the employer. Therefore, we use a locally weighted regression model in the PB approach, so that a specific functional form for the relationship between pay and relevant covariates is no longer needed. The statistical properties of the new procedure are developed and compared to those of the standard methods. The method also extends to the case with a binary (1-0) response, e.g., hiring or promotion. Both simulation studies and re-analysis of actual data show that, in general, the locally weighted PB regression method reflects the true mean function more accurately than the linear model, especially when the true function is not a linear or logit (for a 1-0 response) model. Moreover, only a small loss of efficiency is incurred when the true relation follows a linear or logit model.
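The basic (linear) PB steps described in the abstract can be sketched as follows. This is a minimal illustration, not the speaker's implementation: it assumes a single covariate, uses closed-form ordinary least squares, and the function names and toy data are hypothetical. The locally weighted variant discussed in the talk would replace `fit_line` with a local smoother and is not shown.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x for a single covariate."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def pb_disparity(maj_x, maj_y, min_x, min_y):
    """Peters-Belson style estimate: fit on the majority group only, predict
    each minority employee's pay, and average (actual - predicted)."""
    a, b = fit_line(maj_x, maj_y)
    diffs = [yi - (a + b * xi) for xi, yi in zip(min_x, min_y)]
    return mean(diffs), diffs

# Hypothetical toy data: seniority in years, salary in thousands.
maj_x = [1, 2, 3, 4, 5]
maj_y = [42, 44, 46, 48, 50]   # majority pay follows 40 + 2 * seniority
min_x = [2, 4]
min_y = [43, 46]

disparity, diffs = pb_disparity(maj_x, maj_y, min_x, min_y)
print(disparity, diffs)
```

A negative average difference indicates that minority employees are paid less than the majority-group equation predicts for their covariates.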

Date: Friday, January 16, 2009

Time: 11:00-12:00 noon

Location: Monroe Hall, Room 113 (2115 G Street, NW, Washington, DC 20052)


January 30, 2009

Title: Avoiding Lawsuits with a Bayesian Approach to Product Engineering

Speaker: Robert F. Bordley, General Motors

Abstract:

Mathematical programming, which involves optimizing an objective function subject to various constraints, has long recognized that the coefficients in both the constraints and the objective function are often uncertain. Standard expected utility analysis can easily resolve problems when the uncertainties appear only in the objective function. But utility approaches have, in the past, not been considered useful when uncertainties appear in both the constraints and the objective function. Instead, two alternative approaches are applied to these 'stochastic programming problems'. The first approach treats any violation of the constraints as rectifiable in the future at some cost. Given this assumption, the stochastic optimization problem can be formulated as an unconstrained multi-stage optimization problem, which can be solved with expected utility theory (even though it is not common to do so). The second approach does not assume that violation of the constraints can be rectified at some cost. This approach is widely used in, for example, reliability-based design optimization, where engineers must determine design specifications for physical structures (planes, vehicles, buildings, etc.) which, if they fail to withstand certain stresses, could lead to the loss of human life. This approach maximizes an objective function subject to an upper bound (generally one in a thousand) on the probability of the constraints being violated. Unfortunately, it has been shown that this approach (called chance-constrained programming) is inconsistent with utility theory and can lead to a negative value of information.
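For readers unfamiliar with the second approach, a generic chance-constrained program can be written as below. The notation is an assumption made for illustration and does not come from the talk: $x$ is the design vector, $\xi$ the uncertain coefficients, $f$ the objective, $g_j$ the constraint functions, and $\alpha$ the allowed violation probability (the abstract's "one in a thousand" corresponds to $\alpha = 0.001$).

```latex
\begin{aligned}
\max_{x} \quad & \mathbb{E}_{\xi}\bigl[f(x,\xi)\bigr] \\
\text{subject to} \quad & \Pr_{\xi}\bigl(g_j(x,\xi) \le 0,\; j = 1,\dots,m\bigr) \;\ge\; 1 - \alpha .
\end{aligned}
```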

Date: Friday, January 30, 2009

Time: 11:30-12:30 pm

Location: Duques Hall, Room 453 (2201 G Street, NW, Washington, DC 20052)


February 13, 2009

Title: Random Partition Models Indexed with Covariates

Speaker: Peter Müller, Department of Biostatistics, MD Anderson Cancer Center, University of Texas

Abstract:

We propose a model for covariate-dependent clustering, i.e., we develop a probability model for random partitions that is indexed by covariates. The motivating application is inference for a clinical trial. As part of the desired inference we wish to define clusters of patients. Defining a prior probability model for cluster memberships should include a regression on patient baseline covariates. We build on product partition models (PPM). We define an extension of the PPM to include the desired regression. This is achieved by including in the cohesion function a new factor that increases the probability that experimental units with similar covariates are included in the same cluster. We discuss an application to clinical trial design. The proposed model is used to implement borrowing of strength across nonexchangeable sub-populations.
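As a rough sketch of the construction described above, a product partition model assigns a partition $\rho_n = \{S_1,\dots,S_k\}$ of the $n$ units a prior proportional to a product of cohesions, and the covariate-dependent extension multiplies in an extra factor per cluster. The symbols here are assumptions for illustration: $c(\cdot)$ is the cohesion function, $x_j^\star$ denotes the covariates of the units in cluster $S_j$, and $g(\cdot)$ stands in for the "new factor", taking larger values when the covariates within a cluster are similar.

```latex
p(\rho_n) \;\propto\; \prod_{j=1}^{k} c(S_j)
\qquad\longrightarrow\qquad
p(\rho_n \mid x_1,\dots,x_n) \;\propto\; \prod_{j=1}^{k} c(S_j)\, g\bigl(x_j^\star\bigr)
```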

Date: Friday, February 13, 2009

Time: 3:30-4:30 pm

Location: Duques Hall, Room 552 (2201 G Street, NW, Washington, DC 20052)


February 20, 2009

Title: Record Linkage Modeling

Speaker: Michael Larsen, Iowa State University, Department of Statistics and Center for Survey Statistics & Methodology

Abstract:

Record linkage, or exact file matching, consists of bringing together records in two or more files on the same population. Files are linked for the purposes of creating a larger database, enabling analyses that would otherwise not be possible, and counting the population. When unique, error-free identification codes are not available on both files, record linkage can be accomplished through probabilistic methods. The U.S. Census Bureau uses record linkage in population undercount estimation. The National Center for Health Statistics (NCHS) uses record linkage to match surveys to the National Death Index (NDI) for studies of mortality and morbidity. This talk discusses advances in record linkage theory related to these efforts. The models allow estimation of error rates and decision making about match/nonmatch status of pairs of records. Methods of record linkage that enforce one-to-one matching between individuals have been implemented. Generally the files being linked at the Census Bureau and NCHS have been unduplicated, so that one-to-one matching is required. Bayesian methods that allow variability across blocks and incorporate one-to-one matching into statistical models have been studied. Advances have been made in analysis of files created through record linkage, including some accounting of potential matching errors. The work on record linkage has direct relevance for methods of preserving confidentiality in publicly released databases.

Date: Friday, February 20, 2009

Time: 2:15-3:15 pm

Location: Department of Statistics (2140 Pennsylvania Ave, NW, Washington, DC 20052)


February 27, 2009

Title: High Dimensional Statistics in Genomics: Some New Problems and Solutions

Speaker: Hongzhe Li, Department of Biostatistics and Epidemiology, University of Pennsylvania

Abstract:

Large-scale systematic genomic datasets have been generated to inform our biological understanding of both the normal workings of organisms in biology and disrupted processes which cause human disease. The integrative analysis of these datasets, which has become an increasingly important part of genomics and systems biology research, poses many interesting statistical problems, largely driven by the complex inter-relationships between high-dimensional genomic measurements. In this talk, I will present three problems in genomics research that require the development of new statistical methods: (1) identification of active transcription factors in microarray time-course experiments; (2) identification of subnetworks that are associated with some clinical outcomes; and (3) identification of the genetic variants that explain higher-order gene expression modules. I will present several regularized estimation methods to address these questions and demonstrate their applications using real data examples. I will also discuss some theoretical properties of these procedures.

Date: Friday, February 27, 2009

Time: 11:00-12:00 pm

Location: Monroe Hall, Room 113 (2115 G Street, NW, Washington, DC 20052)


March 6, 2009

Title: Combinatorial Patterns for Probabilistically Constrained Optimization Problems

Speaker: Miguel Lejeune, Department of Decision Sciences, George Washington University

Abstract:

We propose a new framework for the solution of probabilistically constrained optimization problems by extending some recent developments in combinatorial pattern theory. The method involves the binarization of the probability distribution and the generation of a consistent partially defined Boolean function (pdBf) representing the combination (F,p) of the binarized probability distribution F and the enforced probability level p. We represent the pdBf representing (F,p) as a disjunctive normal form taking the form of a collection of combinatorial patterns. We propose a new integer programming-based method for the derivation of combinatorial patterns and present several methods allowing for the construction of a disjunctive normal form that defines necessary and sufficient conditions for the probabilistic constraint to hold. The obtained disjunctive forms are then used to generate deterministic reformulations of the original stochastic problem. The method is implemented for the solution of a numerical problem. Extensions to the present study are discussed.

Date: Friday, March 6, 2009

Time: 3:30-4:30 pm

Location: Duques Hall, Room 552 (2201 G Street, NW, Washington, DC 20052)


March 13, 2009

Title: Sequential Predictive Regressions and Optimal Portfolio Returns

Speaker: Nicholas Polson, Professor of Econometrics and Statistics, Graduate School of Business, University of Chicago

Abstract:

This paper analyzes sequential learning in the context of predictive regression models. To do this, we develop new particle-based methods for sequential learning about parameters, state variables, hypotheses, and models. This sequential perspective allows us to quantify how investors' views about predictability and models vary over time, and naturally mimics the learning problem encountered in practice. We consider learning about predictability using dividend/payout data and models that incorporate drifting coefficients and stochastic volatility. We analyze the time-variation of parameter estimates and model probabilities, using both the traditional cash dividends measure and a measure taking into account share repurchases and issuances. We also analyze the economic benefits of using these models by considering optimal portfolio allocation problems.

Date: Friday, March 13, 2009

Time: 11:00-12:00 pm

Location: Duques Hall, Room 553 (2201 G Street, NW, Washington, DC 20052)


April 3, 2009

Title: Inferring likelihoods and climate system characteristics from climate models and multiple tracers

Speaker: Murali Haran, Department of Statistics, Penn State University

Abstract: To understand the current state of the climate system and to predict its future behavior, it is critical to have good estimates of key climate system parameters. Since these climate parameters are very difficult to measure directly, we have to infer their values based on two sources of information: spatial data on 'tracers' that indirectly provide information about these parameters, and output from complex climate computer models run at several climate parameter settings. These climate models are computationally expensive and can take weeks or months to run at each setting. I will discuss an inferential approach that uses Gaussian processes to emulate the climate models, thereby establishing a connection between the climate parameters and the multiple tracers. Using a spatial model, it is then possible to carry out statistical inference for the climate parameters, while accounting for various sources of variability and dependence. I will describe how our methods propose to address a few of the many challenges involved in this research, including computational obstacles posed by the size of the data and the need to simultaneously model potentially non-linear relationships between tracers while accounting for spatial dependence in the observations.
This is joint work with K. S. Bhat (Statistics, Penn State), and R. Tonkonojenkov and K. Keller (Geosciences, Penn State).

Date: Friday, April 3, 2009

Time: 11:00-12:00 pm

Location: Monroe Hall, Room 113 (2115 G Street, NW, Washington, DC 20052)


April 10, 2009

Title: General Classes of Skewed Link Function for Binary Response Data

Speaker: Dipak Dey, Department of Statistics, University of Connecticut

Abstract: (Joint with Xia Wang) The choice of link is one of the most critical issues in modeling binary data, as substantial bias in the mean response estimates can result if the link is misspecified. The objective of this study is to introduce a flexible skewed link function for modeling categorical data. The commonly used complementary log-log (Cloglog) link is prone to link misspecification because of its positive and fixed skewness. We propose a new link function based on the generalized extreme value (GEV) distribution. The GEV link has a very wide range of skewness, which is determined purely by its shape parameter. Using Bayesian methodology, we can automatically detect the skewness in the data along with the model fitting by the GEV link. Various theoretical properties are examined and explored in detail. We compare the logit, the probit, the Cloglog and the GEV links under different scenarios. The possibility of applying this link to the large p, small n cases is also discussed. The deviance information criterion measure is used for guiding model selection when comparing different links. The results are further extended to incorporate spatial structure. The methodologies are exemplified through a bank transaction data set and a species abundance data set with spatial variation.
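To make the role of the shape parameter concrete, the sketch below evaluates a response probability built from the standard GEV distribution function and compares it with the Cloglog link. The parameterization p = 1 - G(-eta; xi) is an assumption chosen for illustration (it reduces to Cloglog as xi approaches 0); the abstract does not state the exact form used in the talk, and the function names are hypothetical.

```python
import math

def gev_cdf(x, xi):
    """Standard GEV distribution function (location 0, scale 1, shape xi)."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-x))      # Gumbel limit as xi -> 0
    t = 1.0 + xi * x
    if t <= 0.0:
        # Outside the support: 0 below the lower endpoint (xi > 0),
        # 1 above the upper endpoint (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_response_prob(eta, xi):
    """Assumed GEV-based inverse link: P(y = 1) for linear predictor eta."""
    return 1.0 - gev_cdf(-eta, xi)

def cloglog_prob(eta):
    """Inverse complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))

# Compare the two response curves for a few shape values.
for xi in (-0.5, 0.0, 0.5):
    probs = [round(gev_response_prob(eta, xi), 4) for eta in (-2, -1, 0, 1, 2)]
    print(f"xi = {xi:+.1f}: {probs}")
print("cloglog :", [round(cloglog_prob(eta), 4) for eta in (-2, -1, 0, 1, 2)])
```

Varying `xi` changes how asymmetrically the curve approaches 0 and 1, which is the flexibility that a fixed-skewness link such as Cloglog lacks.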

Date: Friday, April 10, 2009

Time: 11:00-12:00 pm

Location: Monroe Hall, Room 113 (2115 G Street, NW, Washington, DC 20052)


April 17, 2009

Title: Analysis of Cohort Studies with Multivariate, Partially Observed Disease Classification Data

Speaker: Nilanjan Chatterjee, Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health

Abstract: Complex diseases, like cancer, can often be classified into subtypes using various pathological and molecular traits of the disease. In this article, we develop methods for analysis of disease incidence in cohort studies incorporating data on multiple disease traits using a two-stage semi-parametric Cox proportional hazards regression model that allows one to examine the heterogeneity in the effect of the covariates by the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating-equation (EE) approach for handling missing cause of failure in competing-risk data. We prove asymptotic unbiasedness of such an EE method under a general missing-at-random (MAR) assumption and propose a novel influence-function based sandwich variance estimator. The methods are illustrated using a simulation study and a real data application involving the Cancer Prevention Study (CPS-II) nutrition cohort.

Date: Friday, April 17, 2009

Time: 11:00-12:00 pm

Location: Monroe Hall, Room 113 (2115 G Street, NW, Washington, DC 20052)


The series hosts a seminar about twice a month on current research topics. The seminar often features an invited guest speaker and occasionally local faculty members, students or others affiliated with the department. The usual time of the seminar is 11:00am on Fridays.

Professors Hosam Mahmoud (hosam@gwu.edu) and Jonathan Stroud (stroud@gwu.edu) are the Seminar Series Coordinators.

 
 
 
   