Hi, I'm a Lecturer at IUT-CSE and a researcher at Penta Global Limited. Before joining the IUT faculty, I completed my Bachelor's in Computer Science and Engineering (CSE) at IUT and was a researcher at AMIRL.

My research explores the interaction between different modalities, such as vision and language, as well as linguistic variations such as transliteration, code-mixing, and dialects. Currently, I am working with Dr. Mohammad Ali Moni (UQ) on LLM applications in the biomedical domain and with Mir Rayat Imtiaz Hossain (UBC) on assistive technologies for color blindness. My research interests span:

  • Visual Question Answering (VQA) and Multimodal Learning: the interplay between visual and textual modalities, how information is conveyed between modalities, and the practical usability and potential applications of multimodal systems.
  • Low Resource Natural Language Processing (NLP): the intricacies of less-explored languages, the challenges of word and script mixing between languages, and how high-resource languages can enhance the NLP capabilities of low-resource languages.
  • Evaluation and Frugal Learning of Large Language Models (LLMs): knowledge profiling of LLMs, and how LLMs can maintain comparable performance on reduced input through techniques like token attribution.
  • Multicultural and Inclusive NLP: addressing the Western bias of NLP models, the challenges of multicultural settings, and the complexities of detecting and mitigating diverse forms of hateful content, such as sexism.

Research Highlights

Visual Robustness Benchmark for Visual Question Answering (VQA)

Md Farhan Ishmam*, Ishmam Tashdeed*, Talukder Asir Saadat*, Md Hamjajul Ashmafee, Abu Raihan Mostofa Kamal, Md Azam Hossain

  • Large-scale benchmark comprising 213,000 augmented images to challenge the robustness of VQA models against realistic visual corruptions.
  • Novel robustness evaluation metrics that can be aggregated into a unified metric adaptable for multiple use cases.
  • Experiments reveal the interplay between factors such as model size, performance, and robustness when models are subjected to real-world corruption effects.

From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities

Md Farhan Ishmam, Md Sakib Hossain Shovon, Muhammad Firoz Mridha, Nilanjan Dey

  • Comprehensive survey on VQA datasets, methods, metrics, challenges, and research opportunities.
  • New taxonomy that systematically categorizes VQA literature and multimodal learning tasks.
  • Novel real-world applications of VQA in domains such as assistive technology, education, and healthcare.

BanglaTLit: A Benchmark Dataset for Back-Transliteration of Romanized Bangla

Md Fahim*, Fariha Tanjim Shifat*, Md Farhan Ishmam*, Deeparghya Dutta Barua, Fabiha Haider, Md Sakib Ul Rahman Sourove, Farhad Alam Bhuiyan

  • First large-scale dataset for automated Bangla transliteration, BanglaTLit, with over 42.7k samples.
  • A romanized Bangla pre-training corpus, BanglaTLit-PT, with over 245.7k samples.
  • Novel T5-based dual encoder architecture achieving SOTA on BanglaTLit.