what-i-am-doing

A timeline of my current and past activities, projects, and endeavors.

July 2025 - Present

Student Researcher at Google DeepMind

Joined the AnthroKrishi team at Google DeepMind as a student researcher, focusing on field and agricultural-body segmentation from raw satellite data (RSD) using vision-language models (VLMs). I was responsible for integrating Gemini to directly segment RSD and for developing evaluation pipelines to assess model performance against established benchmarks.


I joined Google DeepMind as a student researcher on the AnthroKrishi team. The main goal of my project was to see whether we can use VLMs (e.g., Gemini, Gemma 3) directly for segmentation, without any additional image/pixel decoder. This is an important distinction, as several methods attach a decoder head on top of LLM tokens for segmentation. Those methods offer the luxury of fine-tuning the model directly with a pixel-level segmentation loss. Without a decoder, however, the problem becomes much harder and raises some tough questions: how do you represent a segmentation mask using tokens? Do you use polygons to represent each mask? What about the limit on token length?
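
As a toy illustration of the first question, one naive option is to serialize a binary mask as run-length-encoded text. This is only a sketch of the design space, not the representation used in the project:

```python
import numpy as np

def mask_to_tokens(mask: np.ndarray) -> str:
    # Flatten the binary mask and run-length encode it as text,
    # e.g. "0:5 1:2 0:2 ..." -- one way to squeeze a mask into a token budget.
    flat = mask.flatten()
    runs, start = [], 0
    for i in range(1, len(flat) + 1):
        if i == len(flat) or flat[i] != flat[start]:
            runs.append(f"{int(flat[start])}:{i - start}")
            start = i
    return " ".join(runs)

mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1
print(mask_to_tokens(mask))  # "0:5 1:2 0:2 1:2 0:5"
```

Even this simple scheme makes the token-length tension obvious: the encoding grows with mask complexity and image resolution, which is exactly where high-resolution RSD hurts.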

There are a few methods along this line, a prominent one being Text4Seg; however, these methods fail to scale to high-resolution RSD images. I was responsible for solving this problem through several innovations, such as leveraging multi-scale segmentation masks and GRPO-based post-training. These changes led to an improvement of ~16% on RSD data over existing baselines.
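
For intuition on the post-training side, here is a minimal sketch of the group-relative advantage at the core of GRPO, assuming a hypothetical IoU-style reward on decoded masks (the actual rewards and training setup are not described here):

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: (num_groups, group_size) -- one group of sampled outputs per image.
    # GRPO replaces a learned value baseline with group statistics: the advantage
    # of each sample is its reward relative to its own group's mean and std.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def iou_reward(pred_mask: torch.Tensor, gt_mask: torch.Tensor) -> torch.Tensor:
    # Hypothetical reward: IoU between a decoded mask and the reference mask.
    inter = (pred_mask & gt_mask).sum()
    union = (pred_mask | gt_mask).sum()
    return inter / union.clamp(min=1)
```

The appeal for mask-as-tokens segmentation is that the reward is computed on the decoded mask, so no differentiable pixel decoder is needed.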

2024 - 2025

Safe and Unbiased Generation in Generative Models

During this period, I worked on problems like 'unlearning' and 'debiasing' in generative models such as GANs and diffusion models.

I came across the idea of 'unlearning' in 2023 through discussions with my friend Subhodip Panda (go check out his work!), and I immediately got excited about its potential applications in generative models. At the time, this was still a very nascent area of research: people were just starting to explore it in discriminative models, and there were only a few primitive attempts in generative models. I saw an opportunity to contribute by exploring how unlearning techniques could be integrated into SoTA GAN architectures like StyleGAN. One of the ideas I had was: to unlearn a particular concept, do we first need to learn it? This led us to a toy experiment where we adapted a GAN to the concept we wanted to unlearn and then performed linear extrapolation in parameter space in the opposite direction. This gave us strong evidence that unlearning is feasible in GANs by exploiting parameter-space semantics, an observation that finally culminated in our paper on unlearning in GANs.
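
The toy experiment boils down to one line of parameter arithmetic. A hedged sketch, where `alpha` is an assumed extrapolation strength:

```python
import torch

def unlearn_by_extrapolation(theta_orig: dict, theta_adapted: dict, alpha: float = 1.0) -> dict:
    # Toy version of the experiment: first adapt the GAN to the concept to be
    # unlearned (theta_adapted), then move the original weights *away* from it:
    #   theta_unlearned = theta_orig - alpha * (theta_adapted - theta_orig)
    return {
        name: w - alpha * (theta_adapted[name] - w)
        for name, w in theta_orig.items()
    }
```
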

We tried to extend this idea to diffusion models. However, it failed miserably. My hypothesis is that parameter-space semantics are much more reasonable in single-step samplers like GANs than in multi-step samplers like diffusion models. While reading up on this, I found another interesting problem: 'fair' or 'unbiased' generation in diffusion models, which usually exhibit strong biases toward certain concepts. After some simple mathematical analysis, we were able to pinpoint exactly why this happens, and this insight gave us a simple solution to the problem.

2022 - 2024

Using EBMs to solve test-time problems

During this period I worked on understanding how energy-based models (EBMs) can 'bridge' two distributions, and how this bridge can help solve test-time problems like domain adaptation and generalization.

At the start of my Ph.D., I was fascinated by energy-based models (EBMs). In particular, it was exciting to learn that we can estimate the explicit (unnormalized) density of the data using MLE / contrastive divergence. What fascinated me, though, was not the training procedure but the sampling process via MCMC with Langevin dynamics, especially as used in the Latent Energy Transport paper. It showed that one can traverse between two domains using an EBM and its iterative sampling. This motivated me to first understand why exactly this should lead to 'translation': MCMC/Langevin dynamics lets you sample from a distribution, but it was not clear why that should translate one domain into another. To this end, we proposed a Cycle-Consistent EBM that is explicitly trained to satisfy boundary conditions matching the (vague) definition of 'translation'.
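
For concreteness, here is a minimal sketch of the Langevin update that drives this kind of traversal, assuming `energy` is an EBM scoring the target domain; the intermediate states of the chain are exactly the 'vicinal' samples mentioned below:

```python
import torch

def langevin_translate(x: torch.Tensor, energy, steps: int = 100, step_size: float = 0.01):
    # Iteratively move a source-domain sample toward low-energy regions of the
    # target domain's EBM:
    #   x_{t+1} = x_t - (step/2) * dE/dx + sqrt(step) * noise
    xs = [x.detach().clone()]
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        x = x - 0.5 * step_size * grad + (step_size ** 0.5) * torch.randn_like(x)
        xs.append(x.detach().clone())
    return xs  # the intermediate samples trace the 'bridge' between domains
```
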

Further, this 'translation' or 'bridge' between two distributions gives us samples from vicinal distributions for free! This leads to a natural question: can we use these vicinal distributions for data augmentation, and hence for domain generalization? The answer is yes! In hindsight, this idea need not be restricted to EBMs; any bridge process (which can be built using diffusion or flow models) should be capable of the same.

Aug 2021

Start of Ph.D. at IISc

Joined ECE department @ IISc as a Ph.D. student.

Mar 2021 - Aug 2021

Research @ IISc/IIT Delhi

Worked on two problems: (i) few-shot adaptation of GANs to OOD data and (ii) a minority-oversampling technique using VAEs for imbalanced-data classification.

This was the transition period between my undergraduate studies and the start of my Ph.D. program. Here, I was exposed to modern deep learning research and publications aimed at the so-called A* conferences. My work primarily focused on regularizing the latent space of GANs and VAEs for improved generalization and robustness. My main focus was exploring whether pre-trained GANs can be adapted to small OOD (but related) datasets (think FFHQ and an emoji dataset). The solution we came up with was a simple inference-time optimization: a small MLP was prepended to the GAN and trained over multiple sampling steps. A follow-up work sought to reduce the inference time of this method by proposing a hypernetwork trained to predict the optimal MLP parameters for a given latent vector.
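
A hedged sketch of the idea, with the generator `G`, the latent dimension (512), and the loss all assumed for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical setup: a small MLP is prepended to a frozen pre-trained
# generator G, and only the MLP is optimized at inference time so that
# G(mlp(z)) matches the few OOD examples.
mlp = nn.Sequential(nn.Linear(512, 512), nn.LeakyReLU(0.2), nn.Linear(512, 512))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)

def adapt_step(G, ood_loss, z: torch.Tensor) -> float:
    # G's weights are never updated; gradients flow through G into the MLP only.
    fake = G(mlp(z))
    loss = ood_loss(fake)  # e.g., a reconstruction/perceptual loss on OOD samples
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```
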

2021

End of B.Tech

Graduated from EE department @ IITP.

2017 - 2021

Research @ IIT Patna

Worked on indoor localization from wireless RSSI data, using both classical MLE methods and deep learning methods.

My undergraduate years at IIT Patna were transformative, sparking my deep interest in math and research. In particular, I worked on location estimation in indoor environments using RSSI signals. RSSI data reaches a device through the wireless medium (think of how Wi-Fi signals reach your phone), and the signals undergo various transformations and noise additions as they propagate through the environment. This is referred to as fading, and it makes the localization task challenging yet fascinating. I worked on localization under generic fading models using MLE-based approaches. Here is a list of the fading models we explored and the related subtleties (a sketch of the basic recipe follows the list):

  1. Rayleigh fading: proposed an MLE for the Rayleigh fading model with simultaneous parameter estimation, and an adaptive mini-batch gradient-ascent method to quickly maximize the log-likelihood for the location estimate.
  2. κ-μ fading: proposed an approximate MLE for the κ-μ fading model and an adaptive-order likelihood maximization using a look-up table to localize a smart device.
  3. η-μ fading: proposed a weighted approximation for the MLE of the η-μ fading model, which can use multiple Bessel-function approximations to localize a smart device.
  4. α-μ fading: proposed a lightweight RSS localization method utilizing the MLE of the α-μ small-scale fading model.
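
To make the recipe concrete, here is a hedged sketch using a simple log-distance path-loss model with Gaussian shadowing instead of the fading likelihoods above (those are more involved, but follow the same pattern: write the RSSI log-likelihood as a function of the device location and ascend it). All positions and model parameters below are made up for illustration:

```python
import numpy as np

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])  # known AP positions (m)
rssi = np.array([-48.0, -61.0, -57.0])                      # measured RSSI (dBm)
P0, n, sigma = -40.0, 2.0, 4.0                              # assumed path-loss params

def log_likelihood(loc: np.ndarray) -> float:
    # Log-distance path-loss model: mean RSSI = P0 - 10*n*log10(d),
    # with Gaussian shadowing of std sigma around the mean.
    d = np.linalg.norm(anchors - loc, axis=1)
    mu = P0 - 10.0 * n * np.log10(d)
    return -np.sum((rssi - mu) ** 2) / (2.0 * sigma ** 2)

# Gradient ascent (numerical gradient) on the location estimate.
loc, eps = np.array([5.0, 5.0]), 1e-4
for _ in range(500):
    grad = np.array([
        (log_likelihood(loc + [eps, 0]) - log_likelihood(loc - [eps, 0])) / (2 * eps),
        (log_likelihood(loc + [0, eps]) - log_likelihood(loc - [0, eps])) / (2 * eps),
    ])
    loc += 0.05 * grad
print(loc)  # MLE location estimate
```
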
Aug 2017

Start of B.Tech at IIT Patna

Joined the EE department @ IITP as an undergraduate student.