Hi, I'm a Lecturer and final-year Master's student at IUT-CSE. I'm also working as a part-time research intern at Yaana Technologies. Prior to that, I obtained my Bachelor's in Computer Science and Engineering (CSE) from IUT and was an AI researcher at Penta Global.
Currently, I am exploring modality alignment methods for Vision-Language models, low-resource language enhancement using the linguistic foundations shared with high-resource languages (i.e., cross-lingual data), and cost-minimization strategies for Large Language Models (LLMs) through frugal representations. Broadly speaking, my research interests span:
- Visual Question Answering (VQA) & Multimodal Learning
- Multicultural & Low-Resource Natural Language Processing (NLP)
- Resource Efficiency & Frugal Representation
The interplay between modalities and how information can be conveyed across them, e.g., through modality adapters in layered architectures and via alignment strategies.
Addressing Western bias and under-represented cultures in NLP; language-specific challenges, especially in script-mixing and language-mixing settings; enhancing low-resource languages using cross-lingual data.
Methods to reduce the visual/textual token count while maintaining performance, thereby minimizing the inference cost of LLMs.
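As a toy illustration of this idea, one common frugal-representation strategy is score-based token pruning: keep only the most salient tokens before feeding them to the model. The sketch below is illustrative only — `prune_tokens` and the saliency scores are hypothetical stand-ins, not a method from any specific paper.

```python
import numpy as np

def prune_tokens(tokens: np.ndarray, scores: np.ndarray, keep: int) -> np.ndarray:
    """Keep the `keep` highest-scoring tokens, preserving their original order.

    tokens: (seq_len, dim) token embeddings
    scores: (seq_len,) per-token saliency, e.g., mean attention received
    """
    top = np.sort(np.argsort(scores)[-keep:])  # top-k indices, restored to sequence order
    return tokens[top]

# Six tokens of dimension 4, pruned to the three most salient
rng = np.random.default_rng(0)
tokens = rng.standard_normal((6, 4))
scores = np.array([0.9, 0.1, 0.7, 0.2, 0.8, 0.3])
pruned = prune_tokens(tokens, scores, keep=3)
print(pruned.shape)  # (3, 4)
```

Halving the sequence length this way roughly halves the per-layer attention and FFN cost at inference time, which is the motivation behind frugal representations.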
News and Updates
- Oct 24: Check out our preprints on our Bangla VQA system, ChitroJera, and on multi-label hate speech detection in transliterated Bangla, BanTH.
- Oct 24: Our paper, FourierKAN outperforms MLP on Text Classification Head Fine-tuning, was accepted at FITML, NeurIPS 2024. Shoutout to my teammate Abdullah Al Imran.
- Sep 24: Our work on Bangla back-transliteration, BanglaTLit, was accepted to Findings of EMNLP 2024. Great work from team Penta!
- Aug 24: Our preprints on Fourier-KAN Text Classification and Code-Mixed Bengali Sentiment Analysis are now available on arXiv.
- Jul 24: New preprint on my undergrad thesis, Visual Robustness Benchmark for VQA, is available on arXiv. I’m grateful to my wonderful teammates and advisors at IUT-CSE.
- May 24: Finalists at Robi Datathon 3.0, Bangladesh’s largest data analysis event with 3,500+ participants. Another competition with team Penta!
- May 24: Participated in the EXIST-2024 shared task with my amazing team from Penta Global.
- Jan 24: Our VQA survey was accepted in Information Fusion.
Research Highlights
Visual Robustness Benchmark for Visual Question Answering (VQA)
Md Farhan Ishmam*, Ishmam Tashdeed*, Talukder Asir Saadat*, Md Hamjajul Ashmafee, Abu Raihan Mostofa Kamal, Md Azam Hossain
- Can VQA models maintain their performance in real-world scenarios, especially when exposed to realistic visual corruptions, e.g., noise and blur?
- How can we evaluate the robustness of VQA models when subjected to such common corruptions?
- We propose a large-scale benchmark and robustness metrics to evaluate VQA models before deployment.
- We quantify aspects of the performance drop, e.g., its rate, range, and mean, and find that models with higher accuracy are not necessarily more robust.
BanglaTLit: A Benchmark Dataset for Back-Transliteration of Romanized Bangla
Md Fahim*, Fariha Tanjim Shifat*, Md Farhan Ishmam*, Deeparghya Dutta Barua, Fabiha Haider, Md Sakib Ul Rahman Sourove, Farhad Alam Bhuiyan
- How can we enhance the representation of Romanized Bangla for automatic back-transliteration using seq2seq models?
- We propose a large-scale pre-training corpus and Bangla back-transliteration datasets for fully fine-tuning language encoders and seq2seq models.
- We aggregate representations from transliterated Bangla encoders with seq2seq models (Dual Encoder -> Aggregation -> Decoder architecture) to achieve SOTA on Bangla back-transliteration.
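The dual-encoder aggregation above can be sketched numerically. This is a minimal, assumed version in which aggregation is per-position concatenation followed by a linear projection; the names `h_seq2seq`, `h_tlit`, and `W` are illustrative, and the paper's actual aggregation may differ.

```python
import numpy as np

rng = np.random.default_rng(42)
seq_len, d = 8, 16

# Stand-ins for the two encoders' outputs over the same romanized Bangla input
h_seq2seq = rng.standard_normal((seq_len, d))  # seq2seq encoder states
h_tlit = rng.standard_normal((seq_len, d))     # transliterated-Bangla encoder states

# Aggregation: concatenate per position, then project back to the decoder width
W = rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)
h_agg = np.concatenate([h_seq2seq, h_tlit], axis=-1) @ W  # (seq_len, d)

# The decoder would then cross-attend to h_agg rather than to h_seq2seq alone
print(h_agg.shape)  # (8, 16)
```

The point of the design is that the decoder sees a fused representation carrying both general seq2seq features and transliteration-specific features at every position.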
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam, Md Sakib Hossain Shovon, Muhammad Firoz Mridha, Nilanjan Dey
- Comprehensive survey on VQA datasets, methods, metrics, challenges, and research opportunities.
- New taxonomy that systematically categorizes VQA literature and multimodal learning tasks.
- Novel real-world applications of VQA in domains such as assistive technology, education, and healthcare.