Title: Polya Process and Random Sprouts
Speaker: Professor Hosam Mahmoud
Department of Statistics, George Washington University
Date: 01/28/2005
Location: Monroe Hall (2115 G Street NW), Room B02
Time: 11:00-12:00 noon.
Abstract : We investigate the Polya process, which underlies an urn of white and blue balls growing in real time. A partial differential equation governs the evolution of the process. For urns with a (forward or backward) diagonal ball addition matrix, the partial differential equation is amenable to an asymptotic solution. In the forward diagonal case we find a solution via the method of characteristics; the numbers of white and blue balls, when scaled appropriately, converge in distribution to independent Gamma random variables. The method of characteristics becomes a bit too involved for the backward diagonal process, except in degenerate cases, where we have Poisson behavior. In nondegenerate cases, constant limits are found via the method of moments and a matrix formulation involving a Leonard pair.
Moreover, an average-case analysis for all tenable Polya processes, without any constraints, is given. We present applications of the Polya process to the sprout, a random tree growing in real time. The sprout is proposed as a model for the growth of the Internet. The tree size is analyzed via a special associated two-color Polya process. In addition to using this average-case analysis to evaluate sprouts, we also give a heuristic interpretation of the result for Polya urns, which might be a first step toward understanding several nonclassic urn models, such as those with nonconstant row sum and those with multiple eigenvalues.
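Illustration (not part of the speaker's abstract): a minimal simulation sketch of a two-color urn growing in real time. It assumes the usual continuous-time embedding, in which every ball carries an independent rate-1 exponential clock and, when a clock rings, balls are added according to the row of the ball addition matrix for that ball's color. The function name, its parameters, and the restriction to nonnegative matrix entries are assumptions made for this sketch.

```python
import random


def simulate_polya_process(white, blue, matrix, t_end, rng=None):
    """Simulate a two-color urn in continuous time up to time t_end.

    Assumption: each ball rings at rate 1, independently of the others.
    matrix = ((a, b), (c, d)): a white ring adds a white and b blue balls,
    a blue ring adds c white and d blue balls. Entries are assumed to be
    nonnegative so that the urn can never run out of balls.
    """
    if rng is None:
        rng = random.Random()
    (a, b), (c, d) = matrix
    t = 0.0
    while True:
        total = white + blue
        if total <= 0:
            break
        # The first of `total` independent rate-1 clocks rings after an
        # Exponential(total) waiting time.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # The ringing ball is white with probability white / total.
        if rng.random() < white / total:
            white += a
            blue += b
        else:
            white += c
            blue += d
    return white, blue


if __name__ == "__main__":
    # Forward diagonal example: each color reinforces only itself.
    print(simulate_polya_process(1, 1, ((1, 0), (0, 1)), t_end=3.0))
```

With a forward diagonal matrix the two colors evolve independently in this sketch, so repeated runs can be used to inspect the scaled counts empirically.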
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Title: Estimating the Cumulative Risk of a False-Positive Test in a Repeated Screening Program
Speaker: Jian-Lun Xu, Ph.D.
Biometry Research Group, National Cancer Institute.
Date: 02/11/2005
Location: Monroe Hall (2115 G Street NW), Room B02
Time: 11:00-12:00 noon.
Abstract : The goal of screening tests for chronic diseases such as cancer is early detection and treatment with a consequent reduction in mortality from the disease. Screening tests, however, might produce false-positive and false-negative diagnoses. With an increasing number of screening tests, it is clear that the risk of a false-positive screen, a finding with potentially significant emotional, financial and health costs, also increases. Elmore et al. (1998), Christiansen et al. (2000) and Gelfand and Wang (2000) investigated this problem under the somewhat unrealistic assumption that the decision to drop out at the k-th screen does not depend upon the results of the earlier k-1 screens. In this paper we obtain necessary and sufficient conditions for their assumption to hold and use one of them to provide a method for testing the validity of the assumption. A new model which does not depend on their assumption is introduced. The maximum likelihood estimator of the cumulative risk of receiving a false-positive screen under the new model is derived and its asymptotic normality is proved. We apply our testing method and the new model to data from the breast cancer screening trial of the Health Insurance Plan of Greater New York.
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Title: Database Integration in Biological Studies
Speaker: Professor Fengzhu Sun
Molecular and Computational Biology Program, Department of Biological Sciences
University of Southern California Department of Statistics, Los Angeles, CA 90089-1183
Date: 02/25/2005
Location: Monroe Hall (2115 G Street NW), Room 206
Time: 4:00-5:00 pm.
Abstract : The work in our laboratory involves integration of different databases to solve biological problems of interest. Our philosophy is that different databases will give us some but not all of the information about the biological problems. By combining different databases intelligently, we are able to obtain a more complete picture of the problems of interest. We will use the following two examples to illustrate these points: a) Protein function prediction combining different data sources, and b) Understanding lethality.
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Title: Combining Information from Independent Sources through Confidence Distributions
Speaker: Professor Minge Xie
Department of Statistics, Rutgers University
Date: 03/11/2005
Location: Monroe Hall (2115 G Street NW), Room 206
Time: 4:00-5:00 pm.
Abstract : This paper develops new methodology, together with related theories, for combining information from independent studies through confidence distributions. A formal definition of a confidence distribution and its asymptotic counterpart (i.e., asymptotic confidence distribution) are given and illustrated in the context of combining information. Two general combination methods are developed: the first along the lines of combining p-values; the second by multiplying and normalizing confidence densities. The paper also develops adaptive combining methods which should be of practical interest. The key point of the adaptive development is that the methods attempt to combine only the correct information, downweighting or excluding studies containing little or wrong information about the true parameter of interest. The combination methodologies are illustrated through several examples in a variety of applications. One of the examples re-analyzes a data set studied by Efron (1993), and it shows that the adaptive CD combination method provides a quite accessible and simple frequentist alternative to the empirical Bayes methods proposed in Efron (1993, 1996). (This is joint work with Kesar Singh and Bill Strawderman.)
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Titles: 1) Estimating Treatment Effects in an Unbalanced Two Period Cross-over Trial with Poisson Outcomes
2) Microbial Diversity in Local Communities
Speakers: Professors Daniel Lunn, Department of Statistics, Worcester College, Oxford
and Mary Lunn, Department of Statistics, St Hugh's College, Oxford
Date: THURSDAY, March 24th, 2005
Location: Hall of Government (710 21st Street, NW), Room 310
Time: 11:00-12:00 noon.
Abstracts :
1) In testing a new drug for epilepsy, a group of doctors unfortunately chose to do a cross-over trial with Poisson outcomes. There are obvious problems with analysing the data, which were compounded by the fact that the data set is small (because the trial was expensive), two different types of epileptic seizure were being studied with cases who suffered from one or both types, and there are some missing values. It was hardly surprising that attempts to use generalised estimating equations produced unsatisfactory models, as did mixed effects modelling. However, fitting a Bayesian random effects model not only proved to be successful, but also provided a mechanism, unavailable to any other approach, for handling the two types of seizure by means of suitably chosen prior distributions.
2) An understanding of species distribution and abundance in microbial communities is vital in many areas of engineering and biological sciences. There are enormous problems of scale, both in population size and in diversity. We consider two aspects of the problem. First, we consider what, if anything, can be said of species diversity in those studies where a small observed sample of the community consists entirely of singletons, that is to say no species is repeated, and what this implies in general, since there are many clone libraries which contain very few repetitions. We are able to draw some conclusions about the minimum levels of species diversity. Second, we use a neutral community model and its implementation by stochastic differential equations to develop steady state predictions of diversity for a local microbial community. Examples are given, fitting the model to data.
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Title: Maximum Likelihood Estimation of Haplotype Effects and
Haplotype-Environment Interactions in Genetic Association Studies
Speaker: Danyu Lin, Ph.D., Dennis Gillings Distinguished Professor
Department of Biostatistics, University of North Carolina
Date: April 1, 2005
Location: Monroe Hall (2115 G Street NW), Room B02
Time: 11:00-12:00 noon.
Abstract : A haplotype is a specific sequence of nucleotides on a single chromosome.
The population associations between haplotypes and disease phenotypes
provide critical information about the genetic basis of complex human
diseases. Standard genotyping techniques cannot distinguish the two
homologous chromosomes of an individual so that only the unphased genotype
(i.e., the combination of the two homologous haplotypes) is directly
observable. Statistical inference about haplotype-phenotype associations
based on unphased genotype data presents a very interesting and difficult
missing-data problem, especially when the sampling depends on the disease
status. We provide a comprehensive and rigorous treatment of this problem.
All commonly used study designs, including cross-sectional, case-control
and cohort studies, are considered. The phenotype can be a disease
indicator, a quantitative trait or a potentially censored time to disease
variable. The effects of haplotypes on the phenotype are formulated
through flexible regression models, which can accommodate a variety of
genetic mechanisms and gene-environment interactions. We construct
appropriate likelihoods, which usually involve high-dimensional nuisance
parameters. The identifiability of the parameters, and the consistency,
asymptotic normality and efficiency of the maximum likelihood estimators
are established. Efficient and reliable numerical algorithms are
developed. Simulation studies show that the likelihood-based procedures
perform well in practical settings. An application to the Finland-United
States Investigation of NIDDM Genetics Study is provided. Areas in need of
further development are discussed.
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Title: MtDNA Reference Database for the Domestic Dog
Speaker: Dr. Marc Allard
Louis Weintraub Associate Professor, Biological Sciences, The George Washington University
Date: April 15, 2005
Location: Monroe Hall (2115 G Street NW), Room B02
Time: 11:00-12:00 noon.
Abstract : Dog hair and thus dog mitochondrial DNA (mtDNA) is an additional source of mtDNA evidence that is present at many crime scenes. Taking advantage of this evidence is relatively straightforward, as human hair has commonly been used in many investigations. The ability to identify canines from biological samples found at crime scenes could prove invaluable in terms of convicting or eliminating potential suspects. While canine mtDNA from dog hair has been used successfully in some criminal investigations, comparing the hair found at the crime scene to potential suspects in the case is the extent of its current capabilities. We are in the process of creating a reference database that is widely available to the forensic community.
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Title: Modeling and Spatial Prediction of Pre-Settlement Patterns of Forest Distribution using Witness Tree Data
Speaker: Stephen L. Rathbun
Department of Statistics, Penn State University
Date: April 22, 2005
Location: Monroe Hall (2115 G Street NW), Room 206
Time: 4:00-5:00 pm.
Abstract : Prior to European settlement, land surveys were conducted throughout the United States. These surveys include records of witness trees at grid intersections, providing quantitative information on pre-settlement forest composition and species-site relationships. Such information can provide insight into environmental factors influencing the distributions of each tree species, free from European influences. Assuming that the locations of trees of each species are realized from independent inhomogeneous Poisson processes whose respective log intensities are linear functions of environmental covariates (i.e., elevation, land form, and province), the species observed at the survey-grid intersections are independently sampled from a generalized logistic regression model. A model for all 68 species found in the survey would be highly over-parameterized, so only the distribution of the most common species, longleaf pine, will be considered at this time. To assess the impact of environmental factors not included in the model, a hidden Gaussian Markov random field shall be added as a random effect. A Markov chain Monte Carlo algorithm is developed for Bayesian inference on model parameters and for Bayesian posterior prediction of the distribution of longleaf pine in southeastern Alabama.
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Title: Probability of Negation of Cruise Missiles
Speaker: Dr. Nigel Siva
SPARTA, Inc. (siva@sparta.com)
Date: April 29, 2005
Location: Monroe Hall (2115 G Street NW), Room B02
Time: 11:00-12:00 noon.
Abstract : The Probability of Negation (PN) of an enemy missile depends upon its path from its launch point to its intended target. Because Ballistic Missile (BM) trajectories can be predicted uniquely, once a BM’s trajectory is known its PN can be calculated in terms of the probabilities of success in the three major functions: Sensor, Battle Management (BM/C4I) and Weapon (interceptors). In contrast, the Cruise Missile (CM) route between its launch point and its intended target is preplanned by the enemy, based upon his perception of the defense’s performance. Here a method is presented to predict the enemy’s most probable CM route to attack the defense.
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Title: Runs in Bernoulli trials: how much success should we expect?
Speaker: Professor Robert Smythe
Department of Statistics, Oregon State University
Date: 05/05/2005
Location: Monroe Hall (2115 G Street NW), Room 105
Time: 4:00-5:00 pm.
Abstract : Consider a (long) sequence of independent Bernoulli trials. We consider first the length of the longest success run. The modern history of this problem goes back to the celebrated Erdős-Rényi Theorem in 1970; we will review some applications of this result, and some generalizations, to DNA sequences. We also look at a related problem which seems to have attracted much less attention: the length of the longest run of successes OR failures in a sequence of Bernoulli trials, or more generally, the longest run of any type in a sequence of multinomial trials.
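Illustration (not part of the speaker's abstract): a small sketch of the two quantities the abstract describes, the longest success run and the longest run of any outcome, computed on a simulated sequence. The function names, the sequence length, and the seed are assumptions chosen for this example; for fair trials the longest success run is known to grow roughly like the base-2 logarithm of the number of trials, which the script prints for comparison.

```python
import math
import random
from itertools import groupby


def longest_success_run(trials, success=1):
    """Length of the longest block of consecutive successes (0 if none)."""
    return max(
        (sum(1 for _ in block) for value, block in groupby(trials) if value == success),
        default=0,
    )


def longest_run_any(trials):
    """Length of the longest block of identical consecutive outcomes.

    Works for Bernoulli trials (successes OR failures) and, unchanged,
    for multinomial trials with any number of outcome types.
    """
    return max((sum(1 for _ in block) for _, block in groupby(trials)), default=0)


if __name__ == "__main__":
    rng = random.Random(2005)   # arbitrary seed, for reproducibility only
    n = 2 ** 16                 # arbitrary sequence length for the demo
    trials = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    print("longest success run:", longest_success_run(trials))
    print("longest run of either outcome:", longest_run_any(trials))
    print("log2(n) for comparison:", math.log2(n))
```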
---------------------------------------------------------------------------------------------------------------------------------------------------------------
The series hosts a seminar about twice a month on current research topics. The seminar often features an invited guest speaker and occasionally local faculty members, students or others affiliated with the department. The usual time of the seminar is 11:00 a.m. on Fridays. Professor Kaushik Ghosh (E-mail: ghosh@gwu.edu) is the Seminar Series Coordinator.
--------------------------------------------------------------------------------
The contact person is Dr. Kaushik Ghosh at ghosh@gwu.edu or 202-994-6889.