Workshop on AI applications for Indian Music
The Department of Heritage Science and Technology, IIT Hyderabad is organizing a two-day workshop on AI Applications for Indian Music on 5 and 6 April 2025 at IIT Hyderabad. Organized on the heels of the ICASSP 2025 conference in Hyderabad, the workshop will include talks by invited experts, hands-on tutorials, and a hackathon. After a set of talks by invited speakers introducing different topics on AI applications for Indian music, participants will have a chance to interact with the speakers and generate ideas. This will be followed by hands-on tutorials and a hackathon, where participants can form teams to brainstorm and build demos and AI applications for Indian music.
Venue
Lecture Hall Complex (LHC),
Indian Institute of Technology Hyderabad,
Kandi, Sangareddy, Telangana - 502284
Map Location: https://maps.app.goo.gl/nvQUmnaGdaiKomMB7
Directions to reach IIT Hyderabad: https://www.iith.ac.in/about/aboutiith/#reach
Please note the room numbers within the Lecture Hall Complex:
05 April: Room LHC-07
06 April: Room LHC-09
Schedule
05 April 2025 (Sat)
08:55 - 09:00 : Welcome
09:00 - 09:40 : Studying a Musical Repertoire with Computational Approaches: The Case of Carnatic Music (Prof. Xavier Serra)
09:40 - 10:20 : Carnatic Music Processing: A Culture-specific Approach (Prof. Hema Murthy)
10:20 - 11:00 : Uncertainty Estimation for Music Analysis (applied to Hindustani Music) (Prof. Vipul Arora)
11:00 - 11:20 : Coffee Break
11:20 - 12:00 : Harmonic Convergence: Orchestrating the Synergy of Human Intuition and Machine Intelligence (Prof. Kaustuv Kanti Ganguli)
12:00 - 12:40 : Vibration Characteristics of Strings in Indian Stringed Musical Instruments (Prof. Ashok Kumar Mandal)
12:40 - 13:00 : AI in Music Generation: Trends and Perspectives from the Industry (Siddharth Bhardwaj)
13:00 - 14:00 : Lunch Break
14:00 - 14:30 : AI Lutherie: generative AI for live performance and improvisation (Manaswi Mishra)
14:30 - 15:00 : Svara Performance in Carnatic Music and its Implications on Symbolic Transcription (Thomas Nuttall)
15:00 - 15:30 : Music Source Separation for Carnatic music: Challenges with existing datasets and some possible solutions (Adithi Shankar)
15:30 - 16:00 : Rhythm and Percussion Pattern Analysis in Indian Art Music (Gowriprasad R)
16:00 - 16:20 : Coffee Break
16:20 - 17:00 : Music Hackathon logistics and brainstorming (Siddharth Bhardwaj, Manaswi Mishra)
17:00 - 23:59 : Music hackathon continues offline . . .
06 April 2025 (Sun)
00:00 - 15:00 : Music hackathon continues offline . . .
15:00 - 17:00 : Music Hackathon presentations and demos (Hackathon teams)
Registration and attendance
The workshop is free to attend and open to everyone, but registration is mandatory.
If you have already registered for the workshop, your place is secured. Participants are requested to make their own arrangements for travel and accommodation. All participants who have signed up for the hackathon will receive an invite to join a Discord server where we plan to discuss and brainstorm hack ideas and form potential teams in the days leading up to the workshop. We invite you to join the server and participate actively in these brainstorming channels.
For In-Person Registrations: click here or scan the QR code. Due to an overwhelmingly positive response to the workshop, we are no longer able to accept new confirmed registration requests. If you haven't already registered for the workshop and are still interested, please sign up on the waitlist (click here or scan the adjacent QR code) to express your interest, and we will contact you to let you know if we are able to accommodate your attendance.
Speaker Information
Prof. Xavier Serra
Professor, Universitat Pompeu Fabra (UPF), Barcelona, Spain; Director, Music Technology Group, UPF
Abstract of the talk: In this talk, I will try to summarize the research efforts carried out at the Music Technology Group of the Universitat Pompeu Fabra in applying computational methodologies to analyze and understand diverse music traditions, with a focus on Carnatic music. I will start by discussing the process of identifying relevant research questions from which we can obtain impactful results, highlighting the collaborative process with musicians, scholars, and technologists within the Carnatic music community. This presentation will address various facets of our work, including the curation and creation of culturally rich datasets, the development of tailored signal processing and machine learning techniques, and the unique challenges we encountered in capturing the nuanced characteristics of Carnatic music. I will discuss specific tasks such as audio source separation and melodic analysis, showcasing the methodologies and tools designed to address the complex, ornamented structures of Carnatic music. In the talk, I will also cover evaluation frameworks that measure the relevance and accuracy of our findings, as well as the development of practical applications that support both academic research and broader community engagement. Finally, I will reflect on the broader implications of this research, considering its impact on the preservation and understanding of non-Western musical repertoires and the field of computational musicology.
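For attendees new to the area, the melodic analysis mentioned above typically starts from a pitch contour extracted from the audio. Below is a minimal, illustrative sketch using the open-source librosa library; the file name and parameter choices are assumptions for illustration, not the MTG's actual pipeline.

```python
# Minimal sketch: extract a pitch contour from a Carnatic recording,
# a typical first step in melodic analysis. The file name and
# parameter values are illustrative assumptions.
import librosa
import numpy as np

y, sr = librosa.load("carnatic_excerpt.wav", sr=44100, mono=True)

# pYIN fundamental-frequency estimation; unvoiced frames come back as NaN.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, sr=sr,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C6"),
)

times = librosa.times_like(f0, sr=sr)
voiced = ~np.isnan(f0)
print(f"Voiced frames: {voiced.sum()} / {len(f0)}")
print(f"Median pitch of voiced frames: {np.nanmedian(f0):.1f} Hz")
```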
Biography of the Speaker: Xavier Serra is a Professor at the Universitat Pompeu Fabra in Barcelona, where he leads the Music Technology Group within the Department of Engineering. He earned his PhD in Computer Music from Stanford University in 1989 with foundational work on the spectral processing of musical sounds. His research spans the computational analysis, description, and synthesis of sound and music signals, blending scientific and artistic disciplines. Dr. Serra is very active in the fields of Audio Signal Processing, Sound and Music Computing, Music Information Retrieval, and Computational Musicology at the local and international levels, serving on the editorial boards of several journals and conferences and lecturing on the current and future challenges of these fields. He received an Advanced Grant from the European Research Council for the CompMusic project, which promoted multicultural approaches in music information research. He currently directs the UPF-BMAT Chair on AI and Music, dedicated to fostering ethical AI initiatives that can empower the music sector.
Prof. Hema Murthy
Professor Emeritus, IIT Madras, Chennai, India
Abstract of the talk: Indian art music is based on the “gayaki” (vocal) style, which means that even instruments try to reproduce the vocal style. The singing voice affords tremendous flexibility and thus leads to significant ornamentation. Carnatic music is centered around the composition, or “kriti,” in which melody, lyrics, and rhythm all play an important role. In this talk, I will present our efforts on the computational analysis of Carnatic music, ranging from “sruti” detection and melodic analysis, through concert and kriti segmentation, to “typical phrase determination” and, ultimately, “raga recognition.” The talk will also cover a critical analysis of the relationship between “janaka” and “janya” ragas, as well as an attempt to augment the prescriptive notation with an “automatic descriptive notation” that enables reproduction of the melody. The percussion accompaniment in Indian music is also replete with improvisation; I will briefly discuss some of our efforts on percussion analysis, as a prelude to the talk by Gowriprasad R. Many of these aspects are also relevant to Hindustani music, and wherever possible we draw examples from it. Some recent efforts on the effect of “Dhrupad music on the brain” will be discussed if time permits.
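As a rough illustration of the “sruti” (tonic) detection mentioned above, a common baseline folds the pitch contour into a histogram on a cents axis and picks its most prominent bin. The sketch below is a simplified, assumption-laden baseline, not Prof. Murthy's culture-specific method; real systems use multi-pitch salience and raga-aware refinements, and the file name is hypothetical.

```python
# Illustrative baseline for tonic ("sruti") detection: histogram the
# pitch contour and take its most frequent bin. A simplified sketch,
# not the culture-specific method described in the talk; the most
# frequent pitch is often, but not always, the tonic or its fifth.
import librosa
import numpy as np

y, sr = librosa.load("kriti_excerpt.wav")  # hypothetical file name
f0, _, _ = librosa.pyin(y, sr=sr, fmin=75.0, fmax=600.0)
f0 = f0[~np.isnan(f0)]

# Histogram pitch values on a log (cents) axis, 10-cent resolution.
cents = 1200.0 * np.log2(f0 / 55.0)
hist, edges = np.histogram(cents, bins=np.arange(0, 4800, 10))
peak = hist.argmax()
tonic_hz = 55.0 * 2 ** ((edges[peak] + 5.0) / 1200.0)  # bin center
print(f"Estimated tonic: {tonic_hz:.1f} Hz")
```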
Biography of the Speaker: Hema A. Murthy has about 38 years of experience working with speech technologies. Her research extends to large language models and genome information signal processing. She has participated in speaker identification, speaker verification, spoken language identification, and anti-spoofing challenges (both physical and logical attacks), as well as text-to-speech synthesis benchmarks. She also worked on the front end for ASR while she was at SRI, which resulted in a significant improvement in WER on the ASR benchmark. She led the consortium on text-to-speech synthesis, a project funded by MeitY (2009-2017). She received the IBM Faculty Award in 2006 for her efforts on “Speech Synthesis in Indian Languages.” In 2012, “Screen Readers for the Visually Challenged in six Indian languages” was released by the Minister of State (Ministry of Human Resource Development). She was also involved in the CompMusic project from 2012 to 2017, whose objective was to build MIR systems for Indian art music, in particular Carnatic music. She also led the speech effort in the pilot projects on speech-to-speech translation, funded by the PSA, GoI, and MeitY, GoI, from 2021 to 2022. She currently leads a consortium of 23 institutions (a project started in 2022) under the National Language Translation Mission on “Speech Technologies in Indian Languages.” Many of the technologies developed have been commercialized. She also works on brain signals, in particular EEG signals, in collaboration with Mriganka Sur of MIT, Shrikanth Narayanan of USC, and Rajeswari Aghoram of JIPMER. A current SPARC project titled “EEG and Cognition” is underway. She was also part of the Centre for Computational Brain Research at IIT Madras until her retirement, where she continues as an Emeritus (Honorary) Professor. A key contribution is her focus on hybrid approaches in which signal processing and machine learning are used in tandem to reduce the footprint of machine learning algorithms. She is a Fellow of the INAE (Indian National Academy of Engineering), a Fellow of ISCA (International Speech Communication Association), a Fellow of the Asia-Pacific AI Association, and a Senior Area Editor of IEEE Transactions on Audio, Speech, and Language Processing.
Prof. Vipul Arora
Associate Professor, IIT Kanpur, India
Abstract of the talk: While the majority of research today aims at developing highly accurate systems, uncertainty estimation takes the modest approach of quantifying a system's inaccuracies. Uncertainty estimation entails that, in addition to the regular output (classification, regression, etc.), the system also provides its confidence in that output. This helps in downstream decision-making tasks and in active learning for efficient data labeling. I will present our recent work on uncertainty estimation for music transcription, including problems such as melody estimation, raga identification, and ornamentation detection.
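To make the idea concrete, one widely used recipe for attaching a confidence value to a classifier's output is Monte Carlo dropout: keep dropout active at inference time and measure the spread across repeated stochastic forward passes. The PyTorch sketch below is a generic illustration of that recipe; the toy model and class count are placeholders, not the speaker's systems.

```python
# Monte Carlo dropout: a common, simple way to attach an uncertainty
# estimate to a classifier's prediction, as motivated in the abstract.
# The toy model and class count below are placeholders.
import torch
import torch.nn as nn

N_RAGAS = 10  # assumed number of classes, for illustration only

model = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(p=0.3),
    nn.Linear(64, N_RAGAS),
)

def predict_with_uncertainty(model, x, n_samples=50):
    model.train()  # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    mean = probs.mean(dim=0)  # averaged class probabilities
    std = probs.std(dim=0)    # spread across passes = per-class uncertainty
    return mean, std

x = torch.randn(1, 128)       # stand-in for precomputed audio features
mean, std = predict_with_uncertainty(model, x)
pred = mean.argmax(dim=-1).item()
print(f"Predicted class {pred}, confidence {mean[0, pred]:.2f} "
      f"± {std[0, pred]:.2f}")
```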
Biography of the Speaker: Vipul Arora is an Associate Professor in the Department of Electrical Engineering at IIT Kanpur, India. He received his B.Tech. (2009) and PhD (2015) degrees in Electrical Engineering from IIT Kanpur. He has worked as a postdoc at the University of Oxford and as a research scientist at Amazon Alexa in Boston, Massachusetts. His research interests include human-machine learning, audio processing (speech recognition, music information retrieval, and audio search), and AI for computational physics. He was awarded the P. K. Kelkar Fellowship in 2024.
Prof. Kaustuv Kanti Ganguli
Associate Professor, Zayed University, UAE
Abstract of the talk: In the rapidly evolving landscape of computational musicology, we stand at a fascinating crossroads where human perception intertwines with machine-driven analysis. This convergence offers unprecedented opportunities to unravel the complexities of musical structures, particularly in rich non-Eurogenetic traditions such as Indian art music. By harmonizing human cognition with artificial intelligence, we can decode the intricate artifacts of audio signal processing, revealing new dimensions in our understanding of music. This approach not only enhances our appreciation of musical nuances but also challenges us to rethink the boundaries between human creativity and computational analysis. As we navigate this confluence, we must consider the profound implications for music education, composition, and appreciation. How can we leverage machine learning to augment human musical intuition? What new insights into musical cognition can emerge from this synthesis? By exploring these questions, we open doors to innovative pedagogical tools, more nuanced music recommendation systems, and perhaps even new forms of musical expression. The future of music analysis lies not in choosing between human expertise and artificial intelligence but in orchestrating a symphony where both play in perfect harmony, each enhancing the other's strengths and compensating for limitations.
Biography of the Speaker: Dr. Kaustuv Kanti Ganguli is an Associate Professor of Artificial Intelligence at Zayed University and a Scholar at New York University Abu Dhabi, spearheading research in computational musicology and machine learning. His innovative work bridges AI and music, focusing on Arabian Gulf and South Indian repertoires. Dr. Ganguli develops AI models that enhance music understanding, preservation, and education by combining engineering approaches with human cognition. Kaustuv's interests include raga/makam characterization, multi-sensory perception, and crossmodal correspondence, which collectively foster a deeper appreciation for diverse musical traditions through the lens of artificial intelligence.
Prof. Ashok Kumar Mandal
Assistant Professor, NIT Jamshedpur, India
Abstract of the talk: In this talk, I will discuss some unique constructional features of Indian plucked stringed musical instruments. A large number of overtones with low inharmonicity, together with long-sustained sounds exhibiting frequency and amplitude modulations, are the distinctive characteristics of these instruments. These sound characteristics are often attributed to the finite bridge of the instrument. I will discuss three vibration models that we have developed to understand the effect of finite bridges in plucked stringed instruments. In these models, we consider the smooth wrapping and unwrapping of the string around the finite bridge. This phenomenon introduces nonlinearity into the system, resulting in intermodal energy distribution. We will show that this energy gets distributed equally among the modes of vibration of the ideal as well as the real string. We will discuss the effect of finite bridges on improved energy transfer to the top plate of the instrument. Finally, we will discuss a vibration model for plucked string instruments with a bridge supported by a membrane.
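For background, the linear theory these models depart from is that of the ideal string with rigid point supports, whose overtones are exactly harmonic; the finite, curved bridge is what perturbs this picture. The summary below is standard textbook material, not the speaker's specific models.

```latex
% Ideal string of length L, tension T, linear density \mu,
% fixed at both ends: the transverse displacement y(x,t) obeys
\[
  \frac{\partial^2 y}{\partial t^2} = c^2 \frac{\partial^2 y}{\partial x^2},
  \qquad c = \sqrt{T/\mu},
\]
% with modal frequencies that are exact integer multiples of the
% fundamental (perfectly harmonic overtones):
\[
  f_n = \frac{n}{2L}\sqrt{\frac{T}{\mu}}, \qquad n = 1, 2, 3, \dots
\]
% A finite, curved bridge makes the effective vibrating length depend
% on amplitude as the string wraps and unwraps around it; this is the
% nonlinearity that couples the modes and redistributes energy.
```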
Biography of the Speaker: Ashok Kumar Mandal is an Assistant Professor in Mechanical Engineering at NIT Jamshedpur. He earned his Ph.D. from IIT Kanpur, specializing in vibration modeling of sitar strings, an M.Tech. in Machine Design from IIEST Shibpur, and a B.Tech. from Jalpaiguri Government Engineering College.
His research focuses on the mechanics of Indian stringed musical instruments, including vibration analysis, coupled dynamics, and acoustic characteristics. He is currently guiding a Ph.D. thesis and has supervised two M.Tech. theses in this field. He has published extensively on nonlinear mechanics, vibrations, and control, with funded projects from AICTE, DST, and SERB. Dr. Mandal has also contributed to industrial and agrotech research.
Siddharth Bhardwaj
Co-founder and CTO, beatoven.ai
Abstract of the talk: Coming soon
Biography of the Speaker: Coming soon
Manaswi Mishra
MIT Media Lab, USA
Abstract of the talk: In our world of rapidly accelerating synthetic media, the outputs of generative AI often leave us feeling frustrated, amused, and even manipulated. Early examples of creative AI tools struggle to go beyond imitating styles and patterns, producing a context-less blend of borrowed aesthetics from the datasets they're trained on. This race toward a statistically averaged, flattened aesthetic misunderstands the core goals of creative expression. In contrast, audio developers and musical instrument builders (digital and physical) understand the importance of providing a toolkit of exploration, intentional serendipity, and discovery to a new age of artists performing with AI. In this session, I will share lessons from AI Lutherie for live performance in opera, symphony, Indian art-music performances, improvisations, and installations, in order to workshop guidelines for the design of new 'AI' musical interfaces.
Biography of the Speaker: Manaswi Mishra is a LEGO Papert Fellow and a PhD candidate in the Opera of the Future research group at the MIT Media Lab, Massachusetts Institute of Technology. His research explores strategies and frameworks for a new creative age of composing, performing, and learning music using AI centered on bespoke human intent. His research on creating novel AI musical instruments and using AI to extend live musical culture can be seen in the development and performance of works such as the opera VALIS (2023, premiered in Boston), the FLOW Symphony (2024, premiered at the Seoul Arts Center, South Korea), the Wellbeing of the World Symphony (2025, premiering in Slovenia), work with the Dallas Symphony Orchestra (creation of the Johnson Education Center), and exhibitions across the world (IFA Stuttgart, Germany; Burning Man '23; Kirkland Gallery, Harvard; Live Code Boston; Algorave India). In addition, his work on ethical AI music performance and copyright law has been published and exhibited via MIT Press, Harvard Tech Review, The Washington Post, The Boston Globe, the Conferences on Computational Creativity, ISEA Brisbane, the Copyright Society (2023), expert panels of the US Copyright Office, Bloomberg Law, and more. Manaswi is also active in organizing conferences, workshops, and hackathons across India through Music Tech Community India, ADCx India, Algorave India, and others.
Thomas Nuttall
Music Technology Group, UPF, Barcelona
Abstract of the talk: Svaras in Carnatic music are often performed with gamaka, which can include oscillatory, sliding, and other characteristic melodic movements. As a result, a svara is just as likely to involve pitch movement as it is to consist of a single static pitch. Furthermore, gamaka is likely to change depending on the context in which the svara is performed. In this talk, we discuss svara performance in Carnatic music, some of the factors that might influence it, and the implications this has for melodic analysis tasks such as symbolic transcription.
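As a toy illustration of the transcription challenge, one can test whether the pitch contour within a putative svara segment stays in a narrow band (a single static pitch) or moves substantially (gamaka-like). The sketch below uses made-up thresholds and synthetic contours purely for illustration; it is not the method from the talk.

```python
# Toy illustration: classify a svara segment as "static" or "moving"
# from its pitch contour, measured in cents relative to the tonic.
# Threshold and test values are assumptions for illustration.
import numpy as np

def svara_is_static(f0_hz, tonic_hz, max_dev_cents=35.0):
    """Return True if the contour stays within a narrow band in cents."""
    f0_hz = np.asarray(f0_hz, dtype=float)
    f0_hz = f0_hz[~np.isnan(f0_hz)]
    cents = 1200.0 * np.log2(f0_hz / tonic_hz)
    return (cents.max() - cents.min()) <= max_dev_cents

tonic = 146.8  # assumed tonic (roughly D3)
# A held "pa" (700 cents above the tonic) vs. an oscillating contour.
steady = tonic * 2 ** (np.full(40, 700.0) / 1200.0)
wobble = 700.0 + 80.0 * np.sin(np.linspace(0, 6 * np.pi, 40))
oscillating = tonic * 2 ** (wobble / 1200.0)

print(svara_is_static(steady, tonic))       # True: one static pitch
print(svara_is_static(oscillating, tonic))  # False: gamaka-like movement
```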
Biography of the Speaker: Thomas Nuttall is a PhD candidate in the Music Technology Group (MTG) at the Universitat Pompeu Fabra (UPF), Barcelona, developing computational tools to assist with musicological research. He dedicates most of his time to melodic analysis in Indian art music but has previously worked with Arab-Andalusian music and on AI for automated rhythm generation. He is also a practicing musician involved in two Barcelona-based groups: (1) Mukti, alongside Jyoti Narang, combining North Indian folk singing with European dance/guitar traditions, and (2) Awa pa los Calvos, an 8-piece band combining flamenco with funk/electronica. Prior to academia, he worked on machine learning teams at Channel 4 Television, Glovo, Typeform, and Crowdlinker, focusing on user-facing personalization and recommendation products.
Personal website - https://thomasgnuttall.github.io/about/
Github - https://github.com/thomasgnuttall/
Adithi Shankar
Music Technology Group, UPF, Barcelona
Abstract of the talk: Music Source Separation (MSS) is the task of separating a piece of music into its constituent sources. With the advent of deep learning, several models have been developed to address the challenge of MSS. These models are generally trained on Western multi-track datasets such as MUSDB18 or MedleyDB, and they do not generalise well to specific art forms like Carnatic music. These datasets also lack sources such as the violin and mridangam that are traditionally present in Carnatic music, so separating these art-form-specific sources is not possible with existing state-of-the-art models. Multi-track datasets for Carnatic music, such as Saraga Carnatic and Saraga Audiovisual, are recorded live; hence, each individual source suffers bleed from the other stems present in the performance. When trained on these datasets, models learn not only to separate the sources but also the inherent bleed present in each source. In this talk, we will introduce a bleed-aware MSS system that can separate the singing voice while considerably mitigating the bleed. We will also briefly cover the challenges of separating the violin, the melodic accompaniment source, and a training methodology that achieves better separation of the violin with less distortion.
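For context, most deep MSS systems operate by predicting a time-frequency mask that is applied to the mixture's spectrogram. The sketch below illustrates that masking step with an "oracle" mask computed from a known stem; a trained network would predict the mask instead. File names are assumptions, and the stems are assumed to be time-aligned with the mixture.

```python
# Illustration of the masking step at the heart of most deep MSS
# systems: compute a time-frequency mask, apply it to the mixture
# STFT, and invert. Here the mask is an "oracle" from a known vocal
# stem; a trained network would predict it. File names are assumptions.
import librosa
import numpy as np
import soundfile as sf

mix, sr = librosa.load("concert_mix.wav", sr=None, mono=True)
vocal, _ = librosa.load("vocal_stem.wav", sr=sr, mono=True)

# Assume time-aligned stems; trim to a common length to be safe.
n = min(len(mix), len(vocal))
mix, vocal = mix[:n], vocal[:n]

M = librosa.stft(mix)
V = librosa.stft(vocal)

# Soft ratio mask: fraction of mixture magnitude explained by the vocal.
mask = np.clip(np.abs(V) / (np.abs(M) + 1e-8), 0.0, 1.0)

est_vocal = librosa.istft(mask * M, length=n)
sf.write("est_vocal.wav", est_vocal, sr)
```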
Biography of the Speaker: Adithi Shankar is a doctoral student at the Music Technology Group, UPF Barcelona. Her work reflects her interest in both music and science. She works mainly on building deep learning models for music source separation, with a special focus on underrepresented music traditions. She is also investigating how video information, such as facial and body movements, can contribute to improving source separation techniques.
Gowriprasad R
IIT Madras, Chennai, India
Abstract of the talk: India's rich musical heritage is shaped by its two primary art music traditions—Hindustani Music (HM) and Carnatic Music (CM). This talk presents a comparative rhythm analysis of HM and CM, focusing on their primary percussion instruments, the tabla and mridangam. The talk will highlight the historical and cultural contexts of these instruments and their roles in accompaniment and solo performances. We discuss key Music Information Retrieval (MIR) research conducted on rhythm analysis in Indian Art Music (IAM), outlining existing methodologies, challenges, and gaps. The aim is to aid in understanding the tasks involved in rhythm analysis from a computational perspective and contribute to advancing rhythm analysis in IAM.
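A common computational entry point to the rhythm analysis surveyed here is onset detection and beat tracking on a percussion recording. The librosa sketch below is a generic starting point, not the speaker's methodology; the file name is an assumption, and real tabla/mridangam analysis uses stroke-level and tala-aware models.

```python
# Minimal entry point to rhythm analysis: detect stroke onsets and
# estimate tempo on a percussion recording. The file name is an
# assumption; real tabla/mridangam analysis goes far beyond this.
import librosa

y, sr = librosa.load("mridangam_solo.wav")

# Onset strength envelope, then pick stroke onsets from its peaks.
onset_env = librosa.onset.onset_strength(y=y, sr=sr)
onsets = librosa.onset.onset_detect(
    onset_envelope=onset_env, sr=sr, units="time"
)

# Global tempo estimate and beat positions from the same envelope.
tempo, beats = librosa.beat.beat_track(
    onset_envelope=onset_env, sr=sr, units="time"
)

print(f"{len(onsets)} onsets detected, tempo ≈ {float(tempo):.1f} BPM")
print("First beats (s):", [round(float(b), 2) for b in beats[:5]])
```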
Biography of the Speaker: Gowriprasad R is a PhD student under Prof. Hema A. Murthy at IIT Madras. His research focuses on audio signal processing applied to Indian percussion rhythm analysis, with additional work in EEG signal analysis. He has been part of organizing and teaching teams for workshops on computational musicology and EEG processing. He has expertise in developing domain-appropriate signal processing and machine learning methods.
Contact
If you have any queries about the workshop, please contact:
Prof. Suhail Rizvi
suhailr@bme.iith.ac.in
Ajay Srinivasamurthy
ajays.murthy@gmail.com