Hi, I'm a Lecturer and final-year Master's student at IUT-CSE. I'm also working as a part-time research intern at Yaana Technologies. Prior to that, I obtained my Bachelor's in Computer Science and Engineering (CSE) from IUT and was an AI researcher at Penta Global.

Currently, I am exploring modality alignment methods for Vision-Language models, the enhancement of low-resource languages using the linguistic foundations they share with high-resource languages (i.e., cross-lingual data), and cost-minimization strategies for Large Language Models (LLMs) via frugal representations. Broadly speaking, my research interests span:

  • Visual Question Answering (VQA) & Multimodal Learning: the interplay between modalities and how information can be conveyed across them, e.g., through modality adapters in layered architectures and via alignment strategies.

  • Multicultural & Low-Resource Natural Language Processing (NLP): addressing Western bias and under-represented cultures in NLP; language-specific challenges, especially in script-mixing and language-mixing settings; and enhancing low-resource languages using cross-lingual data.

  • Resource Efficiency & Frugal Representation: methods to reduce visual/textual token count while maintaining performance, thereby minimizing the inference cost of LLMs.

News and Updates

Research Highlights

Visual Robustness Benchmark for Visual Question Answering (VQA)

Md Farhan Ishmam*, Ishmam Tashdeed*, Talukder Asir Saadat*, Md Hamjajul Ashmafee, Abu Raihan Mostofa Kamal, Md Azam Hossain

  • Can VQA models maintain their performance in real-world scenarios, especially when subjected to realistic visual corruptions, e.g., noise and blur?
  • How can we evaluate the robustness of VQA models under such common corruptions?
  • We propose a large-scale benchmark and robustness evaluation metrics to assess VQA models before deployment.
  • We quantify aspects of the performance drop, e.g., its rate, range, and mean, and find that models with higher accuracy are not necessarily more robust; a minimal sketch of this style of evaluation is shown below.
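
The sketch below illustrates corruption-based robustness evaluation in rough form. The model.predict(image, question) interface, the corruption severities, and the summary statistics are hypothetical placeholders, not the benchmark's actual implementation.

```python
# A minimal sketch, assuming a hypothetical VQA model exposing
# model.predict(image, question) -> answer string; the corruption
# functions and metrics here are illustrative, not the paper's suite.
import numpy as np
from PIL import Image, ImageFilter

def gaussian_noise(img: Image.Image, severity: int) -> Image.Image:
    """Add zero-mean Gaussian noise; higher severity means a larger std."""
    arr = np.asarray(img).astype(np.float32) / 255.0
    std = [0.04, 0.08, 0.12, 0.16, 0.20][severity - 1]
    noisy = np.clip(arr + np.random.normal(0.0, std, arr.shape), 0.0, 1.0)
    return Image.fromarray((noisy * 255).astype(np.uint8))

def gaussian_blur(img: Image.Image, severity: int) -> Image.Image:
    """Blur with a severity-dependent radius."""
    return img.filter(ImageFilter.GaussianBlur(radius=severity))

def accuracy(model, dataset, corrupt=None, severity=1) -> float:
    """Exact-match accuracy over (image, question, answer) triples."""
    correct = 0
    for image, question, answer in dataset:
        if corrupt is not None:
            image = corrupt(image, severity)
        correct += int(model.predict(image, question) == answer)
    return correct / len(dataset)

def robustness_summary(model, dataset, corrupt, severities=(1, 2, 3, 4, 5)):
    """Quantify the performance drop across severities: mean, range, rate."""
    clean = accuracy(model, dataset)
    drops = [clean - accuracy(model, dataset, corrupt, s) for s in severities]
    rate = np.polyfit(severities, drops, deg=1)[0]  # slope of drop vs. severity
    return {
        "mean_drop": float(np.mean(drops)),
        "range": max(drops) - min(drops),
        "rate": float(rate),
    }
```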

BanglaTLit: A Benchmark Dataset for Back-Transliteration of Romanized Bangla

Md Fahim*, Fariha Tanjim Shifat*, Md Farhan Ishmam*, Deeparghya Dutta Barua, Fabiha Haider, Md Sakib Ul Rahman Sourove, Farhad Alam Bhuiyan

  • How can we enhance the representation of Romanized Bangla for automatic back-transliteration using seq2seq models?
  • We propose a large-scale pre-training corpus and Bangla back-transliteration datasets for fully fine-tuning language encoders and seq2seq models.
  • We aggregate representations from transliterated-Bangla encoders with seq2seq models [Dual Encoders -> Aggregation -> Decoder architecture] to achieve SOTA on Bangla back-transliteration; a minimal sketch of this architecture follows.
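
The sketch below is one way to read the Dual Encoders -> Aggregation -> Decoder pipeline. The dimensions, the shared embedding, and the gated-sum aggregation are illustrative assumptions rather than the paper's exact design.

```python
# A minimal PyTorch sketch of a dual-encoder seq2seq model; the gated-sum
# aggregation and all hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn

class DualEncoderSeq2Seq(nn.Module):
    def __init__(self, vocab_size=32_000, d_model=512):
        super().__init__()
        # Shared embedding for simplicity; separate vocabularies are likely
        # needed for Romanized input and Bangla-script output in practice.
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        # Encoder A stands in for a transliterated-Bangla language encoder;
        # Encoder B stands in for the seq2seq model's own encoder.
        # (nn.TransformerEncoder deep-copies the layer, so parameters differ.)
        self.encoder_a = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.encoder_b = nn.TransformerEncoder(enc_layer, num_layers=4)
        # Aggregation: a learned gate blends the two encoded sequences.
        self.gate = nn.Linear(2 * d_model, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=4)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.embed(src_ids)
        h_a, h_b = self.encoder_a(src), self.encoder_b(src)
        g = torch.sigmoid(self.gate(torch.cat([h_a, h_b], dim=-1)))
        memory = g * h_a + (1 - g) * h_b  # aggregated representation
        t = tgt_ids.size(1)
        causal = torch.triu(  # standard causal mask for the decoder
            torch.full((t, t), float("-inf"), device=tgt_ids.device), diagonal=1)
        out = self.decoder(self.embed(tgt_ids), memory, tgt_mask=causal)
        return self.lm_head(out)  # logits over Bangla-script tokens

# Usage with random ids: (batch=2, tgt_len=12) -> (2, 12, vocab_size) logits.
model = DualEncoderSeq2Seq()
logits = model(torch.randint(0, 32_000, (2, 16)), torch.randint(0, 32_000, (2, 12)))
```

The gate lets the decoder draw on whichever encoder represents a given position better; other aggregation choices, such as concatenation or cross-attention, would fit the same slot.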

From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities

Md Farhan Ishmam, Md Sakib Hossain Shovon, Muhammad Firoz Mridha, Nilanjan Dey

  • Comprehensive survey on VQA datasets, methods, metrics, challenges, and research opportunities.
  • New taxonomy that systematically categorizes VQA literature and multimodal learning tasks.
  • Novel real-world applications of VQA in domains such as assistive technology, education, and healthcare.