All graduate Fellows in the program will be involved with one of three major NSF projects, each of which is at the forefront of scientific promise as well as of technical challenge for data analytics: the Legacy Survey of Space and Time at the Vera C. Rubin Observatory (LSST), the Advanced Laser Interferometer Gravitational-Wave Observatory (LIGO), and Earthscope.
Earthscope is a vast multi-institutional research project for deep geoscientific exploration of the North American continent, and of the Earth as a whole. Its aim is to better understand what the planet is made of, how it was assembled, and how it works, including its recurring earthquakes and active volcanoes. Earthscope scientists use state-of-the-art instruments and methods to collect data on seismic waves, crustal movements, and Earth’s magnetic field, along with rock and soil samples and images obtained from aircraft and satellites. Earthscope researchers at Northwestern help acquire and analyze Earthscope data at continental and regional scales. A primary research goal is to image the three-dimensional structure of the Earth’s crust and mantle beneath North America, to depths of many hundreds of kilometers, and to infer how that structure bears on the processes and dynamics that shape continents today, shaped them in the past, and will shape them in the future. Imaging the Earth’s interior from the full waveforms of hundreds of crustal profiles, thousands of seismic wave trains, hundreds of thousands of surface-wave dispersion curves, and millions of body-wave delay times is a challenging task. The data contain unknown noise and unmodeled physical phenomena, and the interior is unevenly “illuminated”: seismic waves are generated preferentially at tectonic plate boundaries and recorded on continents, so both sources and receivers lie at or near the Earth’s surface.
LIGO, the Advanced Laser Interferometer Gravitational-Wave Observatory, is a set of two instruments designed to detect gravitational waves (GWs) passing through the Earth; a similar GW detector, Advanced Virgo, operates near Pisa, Italy. GWs are produced by a variety of high-energy astrophysical events, such as merging black holes, and they will provide a new window into the Universe. The Advanced LIGO detectors are a factor of 10 more sensitive than the initial LIGO detectors, which extends the distance out to which sources can be seen tenfold. Because the surveyed volume grows as the cube of that distance, the number of candidate astrophysical sources within reach increases by a factor of about 10 cubed, or 1000. Even with these improvements, extracting GW signatures from the very noisy data remains a challenge. To separate signal from noise, data from multiple detectors around the world, whose noise is uncorrelated, are analyzed simultaneously. The data from each detector are first checked for consistency against theoretical models of the expected signal. Detections from the individual instruments are then checked for consistency in time: a real gravitational wave, unlike most noise, produces coincident detections across the network, separated by no more than the time light takes to travel between the detectors. Even after all of these checks, occasional noise fluctuations can still mimic gravitational-wave signals and be mistaken for an astrophysical source.
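The time-coincidence check can be illustrated with a short sketch. It assumes each detector has already produced a list of candidate trigger times (in seconds), and it keeps only pairs of triggers whose arrival times differ by at most the light-travel time between the two sites plus a small timing tolerance. The function names `max_delay_s` and `coincident_pairs`, the 3,000 km separation (LIGO's two sites are roughly that far apart, about 10 ms of light travel), and the 5 ms tolerance are illustrative assumptions, not the collaboration's actual pipeline, which also relies on matched filtering and signal-consistency tests.

```python
from bisect import bisect_left, bisect_right

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact, by SI definition


def max_delay_s(separation_m: float, tolerance_s: float = 0.0) -> float:
    """Largest physically allowed arrival-time difference between two detectors.

    A real GW travels at the speed of light, so it cannot reach one site
    later than the light-travel time to the other. tolerance_s is a
    hypothetical allowance for timing uncertainty.
    """
    return separation_m / SPEED_OF_LIGHT_M_PER_S + tolerance_s


def coincident_pairs(times_a, times_b, window_s):
    """Return (t_a, t_b) pairs whose arrival times differ by at most window_s.

    times_b must be sorted. Each trigger in times_a is matched against every
    trigger in times_b that falls inside [t_a - window_s, t_a + window_s].
    """
    pairs = []
    for t_a in times_a:
        lo = bisect_left(times_b, t_a - window_s)
        hi = bisect_right(times_b, t_a + window_s)
        for t_b in times_b[lo:hi]:
            pairs.append((t_a, t_b))
    return pairs


if __name__ == "__main__":
    # Hypothetical trigger times from two detectors about 3,000 km apart.
    window = max_delay_s(3.0e6, tolerance_s=0.005)
    print(coincident_pairs([10.000, 25.300, 40.000], [10.008, 25.500, 39.990], window))
```

With these made-up inputs, the triggers near 10 s and 40 s pass the coincidence test, while the pair near 25 s (0.2 s apart) is rejected as inconsistent with a single passing wave. Sorting one list and using binary search keeps the check fast even for long trigger lists.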
LSST, the Legacy Survey of Space and Time at the Vera C. Rubin Observatory, will be carried out with an 8.4-meter optical telescope currently under construction in Chile. LSST is designed to conduct a ten-year survey of the dynamic universe, mapping the entire visible sky every few nights. Its design is driven by several science themes, ranging from the study of dark matter and dark energy to the exploration of transient objects in the Universe. By imaging each part of the visible sky every few nights over ten years, LSST will provide a new way to gather information about the optical universe, enabling the study of even very distant objects as they change in both space and time. With 15 TB of raw data collected in a single night of operation, the LSST collaboration faces numerous data-mining challenges that will need to be addressed collaboratively by a community of scientists, including astronomers, statisticians, and computer scientists. Rapid and accurate object classification will be critical as the starting point for analysis, but identifying the key observed features on which to base that classification is a challenge in itself, and a wide range of available classification algorithms will need to be explored and assessed to find the best-performing ones. Over the lifetime of the project, LSST is expected to produce 500 PB of cumulative processed data, which will also pose new challenges for data storage and management.