27–28 May 2021
online
Europe/Copenhagen timezone
Transferring innovative methods across scientific boundaries...

Training an interpretable ML algorithm with only a dab of real data: An extragalactic perspective

27 May 2021, 14:40
20m
"Classic" talk Images Afternoon 1

Speaker

Aritra GHOSH

Description

In the last decade, convolutional neural networks (CNNs) have revolutionized the field of image processing and have become increasingly popular among astronomers for morphological analysis of galaxies. This push has been driven by the fact that they offer an attractive alternative to the traditional techniques of obtaining morphological classifications (expert visual classification, citizen science projects, and fitting light profiles), none of which scales easily to large data volumes.

However, most previous applications of CNNs to morphological analysis have required a large training set of real galaxies with pre-determined classifications. If CNNs are to become the method of choice for analyzing unclassified data from future surveys, we need an algorithm that does not require a large pre-classified training set of real galaxies from the same survey. The challenge of training a machine learning algorithm to classify brand-new data that has not previously been inspected by hand is not unique to astronomy; it applies to many other data-intensive scientific fields, such as the biomedical sciences.

In this talk, I will outline how we have successfully trained a Bayesian CNN called Galaxy Morphology Network (GaMorNet) with a very small amount of real data and used it to extract morphological parameters of galaxies at a variety of redshifts from different surveys. We first trained GaMorNet on a large simulation suite of galaxies and then used a small amount of real data to perform transfer learning/domain adaptation. We have already demonstrated that a preliminary classification version of GaMorNet (Ghosh et al. 2020) can be successfully applied to data from different surveys with misclassification rates of $\leq 5\%$. We have also used GaMorNet to study the morphology and quenching of $\sim 100{,}000$ ($z\sim0$) SDSS and $\sim 20{,}000$ ($z\sim1$) CANDELS galaxies using morphology-separated color-mass diagrams. Using the GaMorNet classifications, we find that bulge- and disk-dominated galaxies have distinct color-mass diagrams with separate evolutionary pathways. For both datasets, disk-dominated galaxies peak in the blue cloud across a broad range of masses, consistent with the slow exhaustion of star-forming gas. In contrast, bulge-dominated galaxies are mostly red, with far fewer extending toward the blue cloud, suggesting rapid quenching and fast evolution across the green valley. GaMorNet is one of the very few publicly available CNNs in astronomy, complete with trained models.
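
The two-stage training described above (a large simulated set first, then a small real sample) can be illustrated with a deliberately tiny sketch. This is not GaMorNet's architecture or code: a two-feature logistic regression stands in for the CNN, the "surveys" are Gaussian toy data, and every name here (`make_galaxies`, `train`, `freeze_weights`, the offset of 1.5) is a hypothetical choice made for illustration. Freezing the learned weights and refitting only the bias is one simple form of fine-tuning, assumed here for brevity.

```python
import math
import random


def make_galaxies(n, offset, rng):
    """Toy two-feature 'galaxies': label 1 = bulge-dominated, 0 = disk-dominated.

    `offset` shifts every feature and mimics a survey-dependent domain shift.
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label else -1.0
        features = [rng.gauss(centre + offset, 0.5) for _ in range(2)]
        data.append((features, label))
    return data


def predict(weights, bias, features):
    """Logistic model: probability that the galaxy is bulge-dominated."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(data, weights, bias, epochs, lr, freeze_weights=False):
    """Plain SGD on the logistic loss; optionally update the bias only."""
    for _ in range(epochs):
        for features, label in data:
            error = predict(weights, bias, features) - label
            if not freeze_weights:
                weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def accuracy(data, weights, bias):
    hits = sum((predict(weights, bias, f) > 0.5) == bool(y) for f, y in data)
    return hits / len(data)


rng = random.Random(0)
simulated = make_galaxies(2000, 0.0, rng)   # large simulated training set
real_small = make_galaxies(20, 1.5, rng)    # small labelled sample of "real" data
real_test = make_galaxies(500, 1.5, rng)    # held-out "real" data

# Stage 1: train from scratch on simulations only.
weights, bias = train(simulated, [0.0, 0.0], 0.0, epochs=5, lr=0.1)
acc_before = accuracy(real_test, weights, bias)

# Stage 2: keep the learned weights, fine-tune the bias on the small real sample.
weights, bias = train(real_small, weights, bias, epochs=200, lr=0.1,
                      freeze_weights=True)
acc_after = accuracy(real_test, weights, bias)

print(f"accuracy on real data before fine-tuning: {acc_before:.2f}")
print(f"accuracy on real data after fine-tuning:  {acc_after:.2f}")
```

In this toy setup the simulation-only model places its decision boundary in the wrong spot for the shifted "real" data, and 20 labelled examples are enough to move it; the real problem involves images and a deep network, so the sketch conveys only the workflow, not the expected performance.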

I will also outline in this talk why GaMorNet is not a black box and how the representations learned by the network are highly amenable to visual interpretation. We have used a combination of different CNN visualization techniques to investigate and shed light on GaMorNet’s decision-making process, making our results interpretable, reproducible, and robust.

Primary author

Aritra GHOSH

Co-author

Prof. C. M. URRY (Yale University)
