Modern statistics problems, from areas such as evolutionary biology, medical imaging, and shape analysis, increasingly deal with data sampled from spaces that are singular but naturally stratified; that is, the spaces behave nicely at most points, but at certain points the smooth structure becomes degenerate, such as when the space is composed of two or more intersecting smooth pieces. Key examples of stratified spaces are shape spaces (representing equivalence classes of point configurations under operations such as rotation, translation, scaling, projective transformations, or other non-linear transformations) and tree spaces (representing metric phylogenetic trees on fixed sets of taxa). Generalizing these two examples leads to algebraic varieties and polyhedral complexes, respectively. Applications require knowledge of the asymptotics of distributions on such spaces.
Developments in this "stratified statistics" take their cue from more classical geometric statistics, where data points are sampled from smooth manifolds, or from neighborhoods of embedded manifolds. Now, however, interesting algebraic geometry and combinatorics join the mix as methods for controlling behavior near strata of lower dimension, where the sample space can be singular. Asymptotics on such spaces are governed not only by their local structure, but also by global topology (of the space and of the data). This has led to increasing interest in the recently emerged method of statistical persistent homology.
First results from the systematic study of nonparametric statistics on data sampled from stratified spaces include central limit theorems (CLTs) that illustrate nonclassical behavior, particularly when the mean lies on a lower stratum. The related asymptotics in this surprisingly common circumstance can depend in a crucial way on global geometry. Other first results include concrete combinatorial constructions of sample spaces. Interpretations of these results, particularly the nonclassical CLTs, are immediately useful in specific applied problems from phylogenetics, brain imaging, and human binocular vision, but they also raise fundamental pure mathematical questions relating curvature to asymptotics of probability distributions in non-smooth settings. Many of these investigations were initiated by a Working Group at the Statistical and Applied Mathematical Sciences Institute (SAMSI) program on analysis of object data.
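As a rough illustration of what it can mean for a mean to lie on a lower stratum, the sketch below computes the Fréchet mean (the minimizer of the average squared distance) on a toy stratified space: a "spider" made of several rays glued at a single origin. This toy space, the function name `frechet_mean_spider`, and the `(leg, distance)` encoding of points are assumptions made for illustration only and do not come from the text above. On such a space the mean can remain at the origin under small perturbations of the data, which is one simple form of nonclassical behavior.

```python
from collections import defaultdict


def frechet_mean_spider(points):
    """Fréchet mean on a spider: rays ("legs") glued at one origin.

    Each point is a pair (leg, r) with r >= 0 the distance from the origin.
    The distance between (i, r) and (j, s) is |r - s| on the same leg
    and r + s on different legs.

    Returns (leg, distance) if the mean lies in the interior of a leg,
    or (None, 0.0) if the mean is the origin (the lower stratum).
    """
    n = len(points)
    if n == 0:
        raise ValueError("need at least one point")

    # Sum of distances from the origin, leg by leg.
    leg_sums = defaultdict(float)
    for leg, r in points:
        if r < 0:
            raise ValueError("distances from the origin must be nonnegative")
        leg_sums[leg] += r
    total = sum(leg_sums.values())

    # Restricted to leg i, the average squared distance is a parabola in the
    # position x >= 0 with vertex at
    #   m_i = (sum on leg i - sum on all other legs) / n.
    # At most one leg can have m_i > 0; if none does, the origin is the mean.
    for leg, s in leg_sums.items():
        m = (2.0 * s - total) / n
        if m > 0:
            return leg, m
    return None, 0.0


# Three points at equal distance on three different legs: the mean is the origin.
print(frechet_mean_spider([(0, 1.0), (1, 1.0), (2, 1.0)]))

# Moving one point slightly outward does not move the mean off the origin.
print(frechet_mean_spider([(0, 1.2), (1, 1.0), (2, 1.0)]))

# Only a large enough pull along one leg moves the mean onto that leg.
print(frechet_mean_spider([(0, 5.0), (1, 1.0), (2, 1.0)]))
```

In a smooth Euclidean setting, any small change in the data shifts the mean; here the mean stays fixed at the singular point for a whole range of samples, which hints at why limiting distributions near lower strata need not be Gaussian in the usual sense.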
This MBI workshop aims to stimulate progress and cross-fertilization in the rapidly moving areas of theoretical and applied stratified statistics by gathering a mix of researchers with interests in biology, geometry, combinatorics, topology, probability, and statistics. The hope is to develop stratified methods to solve problems arising from investigations of existing biological and medical data sets, particularly those involving trees and more general shapes.