Hi, I'm a Lecturer at IUT-CSE and a researcher at Penta Global Limited. Before joining IUT, I completed my Bachelor's in Computer Science and Engineering (CSE) at IUT and was a researcher at AMIRL.
My research explores the interaction between different modalities, such as vision and language, and different linguistic variations, such as transliteration, code-mixing, and dialects. Currently, I am working with Dr. Mohammad Ali Moni (UQ) on LLM applications in the biomedical domain and with Mir Rayat Imtiaz Hossain (UBC) on assistive technologies for color blindness. My research interests span:
- Visual Question Answering (VQA) and Multimodal Learning
- Low Resource Natural Language Processing (NLP)
- Evaluation and Frugal Learning of Large Language Models (LLMs)
- Multicultural and Inclusive NLP
The interplay between visual and textual modalities, how information is conveyed across modalities, and the practical usability and potential applications of multimodal systems.
The intricacies of less-explored languages, the challenges of word- and script-mixing between languages, and how high-resource languages can enhance the NLP capabilities of low-resource languages.
Knowledge profiling of LLMs, and how LLMs can maintain comparable performance on reduced input through techniques like token attribution.
Addressing the Western bias of NLP models, challenges in multicultural settings, and the complexities of detecting and mitigating diverse forms of hateful content, such as sexism.
News and Updates
- Sep 24: BanglaTLit accepted to Findings of EMNLP 2024.
- Aug 24: New preprints on Fourier-KAN Text Classification and Code-Mixed Bengali Sentiment Analysis are available on arXiv.
- Jul 24: New preprint on Visual Robustness Benchmark for VQA is available on arXiv.
- May 24: Finalists at Robi Datathon 3.0, Bangladesh’s largest data analysis event with 3,500+ participants.
- May 24: Multiple papers accepted at EXIST 2024.
- Jan 24: VQA survey accepted in Information Fusion.
Research Highlights
Visual Robustness Benchmark for Visual Question Answering (VQA)
Md Farhan Ishmam*, Ishmam Tashdeed*, Talukder Asir Saadat*, Md Hamjajul Ashmafee, Abu Raihan Mostofa Kamal, Md Azam Hossain
- Large-scale benchmark comprising 213,000 augmented images to challenge the robustness of VQA models against realistic visual corruptions.
- Novel robustness evaluation metrics that can be aggregated into a unified metric adaptable for multiple use cases.
- Experiments reveal the interplay between factors such as model size, performance, and robustness when models are subjected to real-world corruption effects.
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam, Md Sakib Hossain Shovon, Muhammad Firoz Mridha, Nilanjan Dey
- Comprehensive survey on VQA datasets, methods, metrics, challenges, and research opportunities.
- New taxonomy that systematically categorizes VQA literature and multimodal learning tasks.
- Novel real-world applications of VQA in domains such as assistive technology, education, and healthcare.
BanglaTLit: A Benchmark Dataset for Back-Transliteration of Romanized Bangla
Md Fahim*, Fariha Tanjim Shifat*, Md Farhan Ishmam*, Deeparghya Dutta Barua, Fabiha Haider, Md Sakib Ul Rahman Sourove, Farhad Alam Bhuiyan
- First large-scale Bangla back-transliteration dataset, BanglaTLit, with over 42.7k samples.
- A romanized Bangla pre-training corpus, BanglaTLit-PT, with over 245.7k samples.
- Novel T5-based dual encoder architecture achieving SOTA on BanglaTLit.