I am a final-year Ph.D. candidate working on Vision–Language and Multimodal AI, with a focus on Multimodal Large Language Models.
My research centers on the design, training, and evaluation of large-scale multimodal systems for image–text understanding and multimodal reasoning, including retrieval-augmented generation. I have published multiple peer-reviewed papers at top-tier venues, including CVPR and ICLR.
I am part of the AImageLab research group and work under the supervision of Professor Rita Cucchiara. I was recently an Applied Scientist Intern at Amazon Science (Cambridge, UK), working on large-scale multimodal models.

My peer-reviewed work has appeared at the following venues:

Conference on Computer Vision and Pattern Recognition (CVPR), 2025
International Conference on Learning Representations (ICLR), 2025
British Machine Vision Conference (BMVC), 2024
Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024
Findings of the Association for Computational Linguistics, 2024
International Conference on Pattern Recognition (ICPR), 2024
International Journal of Computer Vision (IJCV), 2025
British Machine Vision Conference (BMVC), 2025
International Conference on Computer Vision (ICCV), 2025
European Conference on Computer Vision (ECCV), 2024
Sensors (MDPI), 2023
IEEE Intelligent Systems, 2024

A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning.
I'm always open to discussing research ideas, potential collaborations, or opportunities to apply AI in innovative ways.
Email Me