Hi, I'm a Lecturer and final-year Master's student at IUT-CSE. I'm also working as a part-time research intern at Yaana Technologies. Prior to that, I obtained my Bachelor's in Computer Science and Engineering (CSE) from IUT and was an AI researcher at Penta Global.

Currently, I am exploring modality alignment methods for Vision-Language models, the enhancement of low-resource languages using the linguistic foundations they share with high-resource languages (i.e., cross-lingual data), and cost-minimization strategies for Large Language Models (LLMs) via frugal representations. Broadly speaking, my research interests span:

  • Visual Question Answering (VQA) & Multimodal Learning: the interplay between modalities and how information can be conveyed across them, e.g., through modality adapters in layered architectures and via alignment strategies.

  • Multicultural & Low-Resource Natural Language Processing (NLP): addressing Western bias and under-represented cultures in NLP; language-specific challenges, especially in script-mixing and language-mixing settings; and enhancing low-resource languages using cross-lingual data.

  • Resource Efficiency & Frugal Representation: methods to reduce visual/textual token count while maintaining performance, thereby minimizing the inference cost of LLMs.

News and Updates

Research Highlights

Visual Robustness Benchmark for Visual Question Answering (VQA)

Md Farhan Ishmam*, Ishmam Tashdeed*, Talukder Asir Saadat*, Md Hamjajul Ashmafee, Abu Raihan Mostofa Kamal, Md Azam Hossain

  • Can VQA models maintain their performance in real-world scenarios, especially when subjected to realistic visual corruptions, e.g., noise and blur?
  • How can we evaluate the robustness of VQA models under such common corruptions?
  • We propose a large-scale benchmark and robustness evaluation metrics to assess VQA models before deployment.
  • We quantify aspects of the performance drop, e.g., its rate, range, and mean, and find that models with higher accuracy are not necessarily more robust; a minimal sketch of this style of evaluation is shown below.
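
The sketch below illustrates corruption-based robustness evaluation in rough form. The model.predict(image, question) interface, the corruption severities, and the summary statistics are hypothetical placeholders, not the benchmark's actual implementation.

```python
# A minimal sketch, assuming a hypothetical VQA model exposing
# model.predict(image, question) -> answer string; the corruption
# functions and metrics here are illustrative, not the paper's suite.
import numpy as np
from PIL import Image, ImageFilter

def gaussian_noise(img: Image.Image, severity: int) -> Image.Image:
    """Add zero-mean Gaussian noise; higher severity means a larger std."""
    arr = np.asarray(img).astype(np.float32) / 255.0
    std = [0.04, 0.08, 0.12, 0.16, 0.20][severity - 1]
    noisy = np.clip(arr + np.random.normal(0.0, std, arr.shape), 0.0, 1.0)
    return Image.fromarray((noisy * 255).astype(np.uint8))

def gaussian_blur(img: Image.Image, severity: int) -> Image.Image:
    """Blur with a severity-dependent radius."""
    return img.filter(ImageFilter.GaussianBlur(radius=severity))

def accuracy(model, dataset, corrupt=None, severity=1) -> float:
    """Exact-match accuracy over (image, question, answer) triples."""
    correct = 0
    for image, question, answer in dataset:
        if corrupt is not None:
            image = corrupt(image, severity)
        correct += int(model.predict(image, question) == answer)
    return correct / len(dataset)

def robustness_summary(model, dataset, corrupt, severities=(1, 2, 3, 4, 5)):
    """Quantify the performance drop across severities: mean, range, rate."""
    clean = accuracy(model, dataset)
    drops = [clean - accuracy(model, dataset, corrupt, s) for s in severities]
    rate = np.polyfit(severities, drops, deg=1)[0]  # slope of drop vs. severity
    return {
        "mean_drop": float(np.mean(drops)),
        "range": max(drops) - min(drops),
        "rate": float(rate),
    }
```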

BanglaTLit: A Benchmark Dataset for Back-Transliteration of Romanized Bangla

Md Fahim*, Fariha Tanjim Shifat*, Md Farhan Ishmam*, Deeparghya Dutta Barua, Fabiha Haider, Md Sakib Ul Rahman Sourove, Farhad Alam Bhuiyan

  • How can we enhance the representation of Romanized Bangla for automatic back-transliteration using seq2seq models?
  • We propose a large-scale pre-training corpus and Bangla back-transliteration datasets for fully fine-tuning language encoders and seq2seq models.
  • We aggregate representations from transliterated-Bangla encoders with seq2seq models [Dual Encoders -> Aggregation -> Decoder architecture] to achieve SOTA on Bangla back-transliteration; a minimal sketch of this architecture follows.
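
The sketch below is one way to read the Dual Encoders -> Aggregation -> Decoder pipeline. The dimensions, the shared embedding, and the gated-sum aggregation are illustrative assumptions rather than the paper's exact design.

```python
# A minimal PyTorch sketch of a dual-encoder seq2seq model; the gated-sum
# aggregation and all hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn

class DualEncoderSeq2Seq(nn.Module):
    def __init__(self, vocab_size=32_000, d_model=512):
        super().__init__()
        # Shared embedding for simplicity; separate vocabularies are likely
        # needed for Romanized input and Bangla-script output in practice.
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        # Encoder A stands in for a transliterated-Bangla language encoder;
        # Encoder B stands in for the seq2seq model's own encoder.
        # (nn.TransformerEncoder deep-copies the layer, so parameters differ.)
        self.encoder_a = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.encoder_b = nn.TransformerEncoder(enc_layer, num_layers=4)
        # Aggregation: a learned gate blends the two encoded sequences.
        self.gate = nn.Linear(2 * d_model, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=4)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.embed(src_ids)
        h_a, h_b = self.encoder_a(src), self.encoder_b(src)
        g = torch.sigmoid(self.gate(torch.cat([h_a, h_b], dim=-1)))
        memory = g * h_a + (1 - g) * h_b  # aggregated representation
        t = tgt_ids.size(1)
        causal = torch.triu(  # standard causal mask for the decoder
            torch.full((t, t), float("-inf"), device=tgt_ids.device), diagonal=1)
        out = self.decoder(self.embed(tgt_ids), memory, tgt_mask=causal)
        return self.lm_head(out)  # logits over Bangla-script tokens

# Usage with random ids: (batch=2, tgt_len=12) -> (2, 12, vocab_size) logits.
model = DualEncoderSeq2Seq()
logits = model(torch.randint(0, 32_000, (2, 16)), torch.randint(0, 32_000, (2, 12)))
```

The gate lets the decoder draw on whichever encoder represents a given position better; other aggregation choices, such as concatenation or cross-attention, would fit the same slot.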

From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities

Md Farhan Ishmam, Md Sakib Hossain Shovon, Muhammad Firoz Mridha, Nilanjan Dey

  • Comprehensive survey on VQA datasets, methods, metrics, challenges, and research opportunities.
  • New taxonomy that systematically categorizes VQA literature and multimodal learning tasks.
  • Novel real-world applications of VQA in domains such as assistive technology, education, and healthcare.