Towards transforming the Indian language technology ecosystem

Title of the talk: Towards transforming the Indian language technology ecosystem

Speaker: Mitesh M Khapra, Associate Professor in the Department of Computer Science and Engineering and Head of the AI4Bharat Research Lab at IIT Madras

Date: 20 September 2024

Talk summary: In this talk, Mitesh M Khapra reflected on his team’s journey towards transforming the landscape of Indian language technology. He described their engineering-heavy approach to addressing the initial scarcity of data for Indian languages, and how they gradually built up the human resources needed to gather high-quality data on a larger scale through Bhashini. The objective was to share insights into developing high-quality open-source technology for Indian languages, which involves curating extensive data from the internet, constructing multilingual models for transfer learning, and crafting high-quality datasets for fine-tuning and evaluation. He then turned to how these experiences can benefit the broader AI community, particularly as India aspires to create large language models (LLMs) for Indic languages.


Speaker bio: Mitesh M Khapra is an Associate Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology Madras (IITM). He heads the AI4Bharat Research Lab at IIT Madras, which focuses on building datasets, tools, models, and applications for Indian languages. His research has been published in several top conferences and journals, including TACL, ACL, NeurIPS, TALLIP, EMNLP, EACL, and AAAI. He has also served as Area Chair or Senior PC member at top conferences such as ICLR, ACL, and AAAI. Prior to IIT Madras, he was a Researcher at IBM Research India for four and a half years, where he worked on problems in statistical machine translation, cross-language learning, multimodal learning, argument mining, and deep learning. Before joining IBM, he completed his PhD and MTech at IIT Bombay in January 2012 and July 2008, respectively. His PhD thesis dealt with the problem of reusing resources for multilingual computation. During his PhD, he received the IBM PhD Fellowship (2011) and the Microsoft Rising Star Award (2011). He is also a recipient of the Google Faculty Research Award (2018), the IITM Young Faculty Recognition Award (2019), the Prof B Yegnanarayana Award for Excellence in Research and Teaching (2020), and the Srimathi Marti Annapurna Gurunath Award for Excellence in Teaching (2022).