I am passionate about self-supervised learning, multi-modality, and generative models in computer vision and beyond.
Currently, I am pursuing my Ph.D. in the Computer Vision Group at TUM, supervised by Prof. Daniel Cremers under the leadership of Dr. Xi Wang.
Before that, I completed my M.Sc. in Mathematics in Data Science and my B.Sc. in Mathematics with a minor in Computer Science at TUM.
Vision-language models need a lot of paired training data. Can we match vision and language without any supervision? Our work shows that this may indeed be feasible.
We show that existing graph neural networks struggle with graphs at different resolutions. We propose a modification of the message passing paradigm to overcome this issue.
We use the Laplace approximation to learn expressive priors for neural networks. This improves uncertainty estimation and PAC-Bayes generalization bounds.