REUF on the Mathematics of Data 2016
July 18-22, 2016 at the Information Initiative at Duke, Durham, North Carolina
This workshop, sponsored by the NSF and the Departments of Mathematics and Statistical Science at Duke, will focus on emerging fields at the interface of pure mathematics and big data analytics, such as manifold learning, topological data analysis, and wavelet theory. No previous experience with big data analytics is necessary, and technical assistance with scientific programming will be readily available throughout the week. In addition, 60 undergraduates will be concurrently working on similar problems in iiD’s successful Data+ program, and the REUF faculty participants will have ample opportunity to interact with these students and imagine possibilities for future undergraduate projects.
During the workshop, faculty will work in one of three research groups:
- Geometric Insights in Machine Learning and Statistics: The idea of using geometry in statistical inference and data analysis has a long and storied history, and there has been a recent resurgence in applying ideas from geometry to data analysis. The motivation is that while data are often high-dimensional, the underlying signal or structure is low-dimensional, and approaches borrowing ideas from geometry can extract this signal. In machine learning this perspective has been termed “Manifold Learning.” The utility of geometry in dimension reduction, in improving optimization algorithms, and in modeling complex data will be developed through real applications and theoretical analysis.
- Information Theory, Combinatorics, and Abstract Algebra: Combinatorics is the art of counting, and we count events in probability spaces when we measure information and make inferences. When we transmit information, we separate messages to avoid errors, and the art of communication leads to geometry, and to packing problems in particular.
- Topological Data Analysis: The goal of topological data analysis (TDA) is to describe the shape of data (often rendered as a high-dimensional point cloud, or as a collection of functions) in a multi-scale, coordinate-free manner. This description often comes in the form of extracted features which, when combined with statistical and/or machine-learning methods, can provide novel insight in a wide variety of application areas. The basic tools are borrowed from algebraic topology and then adapted to become more robust and faster.
Preference will be given to faculty who teach and advise substantial numbers of underrepresented minority students, students with disabilities, and first-generation college students. Participants will receive a travel allowance, lodging, and a per diem.