Big Data Research

Nov 29 2014 | By Jesse Adams

Columbia’s Institute for Data Sciences and Engineering is launching three groundbreaking research clusters at the interface of data science and the natural sciences. Funded in part by the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation, the intensive collaborations chart new frontiers in biological oceanography, high-dimensional data analysis, and the study of the origins and destiny of our world.

Established in 2012, the Institute, a leading hub of cutting-edge interdisciplinary research, brings together over 150 affiliated faculty from nine University schools. The new research clusters expand the Institute’s range of expertise, driving new advances and enhancing opportunities for doctoral students across disciplines to include a broader spectrum of questions and insights in their work.

“This support from Moore-Sloan is helping us achieve our goal for greater diverse interdisciplinary collaborations with the natural sciences at Columbia,” says Institute Director Kathleen McKeown, the Henry and Gertrude Rothschild Professor of Computer Science.

One research cluster, “High Dimensional Data Analysis of Microscopy Images,” is being led by Abhay Narayan, assistant professor of physics, and John Wright, assistant professor of electrical engineering. They will use modern techniques of data and image analysis to search for and identify the basic patterns that make up a complex image of a material at the nanoscale. Identifying a material’s basic building blocks will enable them to relate its structure to its functional properties.

“We are grateful to Moore-Sloan for supporting data-driven science,” Wright says, “and for this program, already inspiring us to think about new computational models and tools, which should be useful both in microscopy and in other domains of science and engineering.”

“The Birth of the Universe and the Fate of the Earth: One Trillion UV Photons Meet Stan,” led by David Schiminovich, associate professor of astronomy, and Andrew Gelman, professor of statistics and political science, will examine the vast data set comprising every photon observed and recorded by the Galaxy Evolution Explorer (GALEX) ultraviolet space telescope over 10 years. The project uses a novel probabilistic modeling tool called Stan to unlock what the GALEX findings can reveal about dying white dwarf stars. “This project,” Schiminovich says, “has the potential to transform the process by which astronomers confront models with their data.”

The third project, “Mining an Ocean of Data: Application of Modern Statistical Methods for Addressing Biological Oceanography Questions,” directed by Joaquim Goes, research professor at Lamont-Doherty Earth Observatory, and Rahul Mazumder, assistant professor of statistics, will analyze data from satellites and autonomous floats to deepen understanding of oceanic properties and identify phytoplankton from space.

“The oceans are vast, and extracting meaningful information for climate research continues to represent a huge challenge,” Goes says. “This collaboration with statistical scientists essentially allows us to sail in uncharted waters.”

Also this fall, the Institute celebrated another milestone. In addition to research proceeding on numerous fronts, it has received official approval for its new MS program in data science. The new degree is structured to provide the foundational education and hands-on experience needed to address the shortage of data scientists across multiple sectors.