# Joint UBC/SFU Graduate Student Seminar in Statistics

## Details

**Joslin Goh**

Title: Prediction and Calibration Using Outputs From Multiple Simulators

Abstract: Deterministic simulators are widely used to describe physical processes in lieu of physical observations. In some cases, more than one computer simulator can be used to explore the physical system. Through the combination of field observations and simulated outputs, predictive models are developed for the real physical system. The resulting model can be used to perform sensitivity analysis for the system, solve inverse problems and make predictions. The proposed approach is Bayesian and will be illustrated through applications in predictive science at the Centre for Radiative Shock Hydrodynamics at the University of Michigan.

**Seong-Hwan Jun**

Title: Entangled Monte Carlo

Abstract: We propose a novel method for scalable parallelization of sequential Monte Carlo (SMC) algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetics that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient than transmission of particles.

**Zheng Sun**

Title: EDF Tests for Ordered Categorical Data

Abstract: In this talk, we consider a general class of EDF (Empirical Distribution Function) tests for ordered categorical data (ordered contingency tables), that is, data whose cells have a natural ordering, such as letter grades on exams. Asymptotic distributions are found under the null hypothesis

H_0: each row follows the same distribution.

Asymptotic distributions under some contiguous alternatives are also found and asymptotic power of these tests can be calculated. A theorem is proved connecting the cases when parameters are known with those when parameters must be estimated.

Components of these test statistics are examined and the first 4 components can be interpreted as tests that are aimed at specific alternatives: location, scale, skewness and kurtosis.

We compare the power of the EDF tests with that of many competing tests, including tests derived from the Neyman-Pearson lemma. The EDF tests compare favourably.

An example data set is analyzed.

**Dr. Ruben Zamar**

Title: Robustness and Other Things

Abstract: Data quality is typically affected by the presence of outliers and other forms of data contamination. It may also be affected by missing data, data duplication, etc. From a broad perspective I am interested in the study of the detrimental effect of poor data quality on statistical inference, and in developing appropriate alternative methods to address these problems. The purpose of this talk is to give students a broad picture of my research interests and some current research projects. "Other things" in the title refers to other related topics I am interested in, such as cluster analysis, model selection, bootstrap and data mining.

**Dr. Joan Hu**

Title: Statistical Analysis for Forest Fire Control

Abstract: This talk discusses statistical issues arising from forest fire control. We start with brief background information to motivate the statistical problems. Models and inference procedures are then proposed. A set of Canadian forest fire data is used throughout the talk for illustration.

This is an ongoing joint project with W. John Braun.

**Jabed Hossain Tomal**

Title: Ensembling Descriptor Sets using Phalanxes of Variables to Rank Activity of Compounds in QSAR Studies

Abstract: In QSAR studies, molecular descriptors are used to model the biological activity of compounds. The statistical model aims to rank rare actives early in a list of compounds. The classifier "random forest" has been found to be highly accurate in QSAR studies. To enhance its performance in terms of predictive ranking, we propose an ensemble method that groups variables together. The variables within a group (which we call a phalanx) are good to put together in one model, whereas the variables in different groups (phalanxes) are good to ensemble. Finally, our method aggregates the phalanxes. Several molecular descriptor sets exist in QSAR studies, and a particular set might do well in ranking the activity of compounds for some assays and fail to do well for others. We have considered four assays and five descriptor sets for each. We apply the ensemble of phalanxes to each descriptor set and further ensemble across the five descriptor sets we generated. The performance of our ensemble is compared with random forest. Specifically, random forest was applied to each of the five descriptor sets and to the pool of descriptor sets. We found our method superior to any of the random forests using two rigorous evaluation procedures.

**Shirin Golchi**

Title: Monotone Interpolation: Sampling from a Constrained Gaussian Posterior

Abstract: Gaussian process (GP) models are popular tools for non-parametric modelling and function estimation. They are commonly used in the area of computer experiments, where a finite number of function evaluations are available from a simulator and the underlying function is to be estimated using a statistical model while interpolating the given points. However, when extra information such as monotonicity of the underlying function is available, it is not straightforward to incorporate the constraints in a GP model. I will talk about the constrained posterior distribution together with a recipe to sample from it.

**Vincenzo Coia**

Title: A New Sieve Model for Extreme Values

Abstract: Although rare, extreme events leave a lasting impact on our lives and the world in general. It is therefore important to determine the potential magnitude and frequency of such events, especially when these extremes are dangerous. We focus on the case when these extreme values are heavy tailed. Extreme Value Theory provides a theoretical basis for extrapolating and making inference into these heavy tails; however, there is room for improvement in the extrapolation methods. One modification to the heavy tail is to add an upper truncation; we propose a modification which "progressively truncates" the tail with permeable filters like a sieve. The techniques are then applied to the largest Atlantic hurricanes and the largest black sea bass in Buzzard's Bay. We find that, in most cases, the sieve model provides the best fit, followed by the truncated model.

The UBC Statistics Department and the SFU Statistics and Actuarial Science Department jointly host one seminar per term at a central location in Vancouver. These seminars are intended to be informal, and at a level accessible to graduate students. The goal is to create a cohesive community of statisticians in the GVRD, and, in particular, to increase the interaction among faculty and students at UBC and SFU.

## Additional Information

Saturday, September 29th

- 9:00-9:30 Coffee and pastries at Blenz (508 West Hastings Street) across the street from the seminar location
- 9:30-9:45 Head to room 7000 (Earl & Jennie Lohn Policy Room) at SFU Harbour Centre (555 West Hastings Street)
- 9:45-10:10 Joslin Goh, SFU, Title: Prediction and Calibration Using Outputs From Multiple Simulators
- 10:10-10:35 Seong-Hwan Jun, UBC, Title: Entangled Monte Carlo
- 10:35-11:00 Zheng Sun, SFU, Title: EDF Tests for Ordered Categorical Data
- 11:00-12:00 Dr. Ruben Zamar, UBC, Title: Robustness and Other Things
- 12:00-14:00 Lunch at Rogue Kitchen and Wetbar (601 West Cordova Street)
- 14:00-15:00 Dr. Joan Hu, SFU, Title: Statistical Analysis for Forest Fire Control
- 15:00-15:25 Jabed Hossain Tomal, UBC, Title: Ensembling Descriptor Sets using Phalanxes of Variables to Rank Activity of Compounds in QSAR Studies
- 15:25-15:50 Shirin Golchi, SFU, Title: Monotone Interpolation: Sampling from a Constrained Gaussian Posterior
- 15:50-16:15 Vincenzo Coia, UBC, Title: A New Sieve Model for Extreme Values

**September 29, 2012**
