In communities across New York State, some of the world’s brightest minds are pursuing groundbreaking and life-saving discoveries in partnership with research and innovation centers backed by NYSTAR, Empire State Development’s Division of Science, Technology, and Innovation. NYSTAR’s Profiles in Research series will share the stories of incredible researchers who are creating positive change and fueling technology-led economic growth statewide.
As a child growing up in China, Zhiyao Duan loved music. He studied the violin and later picked up the euphonium, a brass instrument, as a college student majoring in automation.
That’s when he started thinking about artificial intelligence and its potential applications in music. He became interested in polyphonic music transcription — which, in simple terms, involves a computer program turning a composition with multiple melodies into written sheet music.
Armed with a master’s degree in automation and a deep curiosity to merge his love of music with his academic pursuits, Duan came to the United States and earned his doctorate at Northwestern University in electrical engineering and computer science. Soon after arriving at the University of Rochester as an assistant professor, however, he discovered one of the realities of American academic research and found it necessary to make a career pivot.
“In the U.S. it’s very difficult to get funding to support music research because the [federal] government is very pragmatic, I would say. Music is not considered economically impactful,” Duan said. “I had to expand my research area into other types of audio, which is very doable because music and speech, they are just different forms of sound.”
Duan pivoted into an area he calls “computer audition,” which involves speech and general sounds.
“In music we work on music-source separation, which is trying to separate the music sources from the polyphonic music mixture,” Duan explained. “In speech, there’s a problem called speech enhancement, which is separating the speech signal from the background noise. It’s very similar to music-source separation.”
Then the work took another interesting turn. While Duan was building his research in computer audition at the University of Rochester, a businessman named Peter Soufleris was building a company in voice biometrics in Philadelphia.
A graduate of the University of Rochester, Soufleris is CEO and founder of Voice Biometrics Group, a company that specializes in speaker verification systems. The technology creates a voice print that gives the company’s clients new ways to verify people’s identities. Imagine being identified as you speak with your bank’s call-center agent without being asked any security questions, or unlocking your apartment door by speaking into your mobile phone.
As his company looked to become more competitive in a growing industry, Soufleris first explored a collaboration with Temple University. Then he approached his alma mater for help, starting with summer internships and a few advanced research projects with Duan’s doctoral students.
Then tragedy struck Voice Biometrics Group with the sudden death of its senior engineer from brain cancer. Soufleris turned again to Duan and the University of Rochester. He needed creative solutions to move the company forward and build out its next generation of technology.
“We created somewhat of a sponsorship for a couple of students,” Soufleris said. “We had all the industry experience of running a company for almost a dozen years. We would be able to give them very specific guidance on what was needed in the market and design products for them to build. We could put them into our environment, test them with real-world data and give them real-world feedback.”
As the relationship between the company and the university deepened, along came a grant opportunity through the Center of Excellence in Data Science, a NYSTAR-backed center at the University of Rochester.
To apply for the grant, Voice Biometrics Group opened a branch in Rochester in 2021. As the project continued, it got Soufleris, who grew up in Syracuse, thinking even bigger. Maybe it was time to start an entirely new business based in New York State.
Through the process of working with the Center of Excellence and NYSTAR, Soufleris created a new biometrics company called IngenID (a combination of the words “ingenious” and “identification”). The plan is to establish headquarters — and make an impact — in the greater Rochester area. IngenID will focus on voice biometrics initially, but is also preparing to offer other biometric and identification technologies within its platform.
“The whole Center of Excellence and NYSTAR process was the inflection point for this new company,” Soufleris said. “As part of that grant I committed to [physically] being in Rochester two weeks each month. I thought if this works out, I’ll plant a stake here. And quickly I saw that with the right time and energy, I could make this its own thing.”
Although in operation less than a year, IngenID has several clients and is currently onboarding several more. The company recently hired a chief technology officer in Rochester to lead development of its next-generation platform and to hire additional developers for the team.
“I’m an upstate New York guy. I want to make an impact on the local community and be part of the University of Rochester community,” Soufleris said. “What we want to do is build a new delivery platform that specifically houses the University of Rochester technology, hire people in the greater Rochester area and contribute to the New York economy.”
While Voice Biometrics Group will continue to operate separately, IngenID will be the focal point for all new development and for commercializing the innovations born of the University of Rochester collaboration. All the new technology starts with fundamental research in Duan’s lab, where, for example, his team is working on anti-spoofing techniques and advanced voice biometrics, such as detecting the emotion of a person as they’re speaking, a game-changing technology for call centers.
For Duan, computer audition and speech processing systems provide a rich, relevant, and fascinating area of research. And he remains active in his first love, music, through his research in music information retrieval.
In 2024 he will become president of the International Society for Music Information Retrieval. He is also part of the company Mango Future Education Technology, which developed the app Violy. The app can listen to a music learner’s instrumental performance and provide feedback on intonation, rhythm, and tempo accuracy.
“The actual content of music information retrieval is broader than what the name suggests,” Duan said. “When people hear the word retrieval maybe they think about searching a song from a database, but the research topics in this area are much wider. Just to mention a few: It includes the analysis of music signals along different musical aspects, based on which various kinds of matching or retrieval of music can be performed. It also includes generating new music from that analysis, such as creating new pieces from scratch or filling in gaps in existing compositions.
“Another area is music performance generation. When you have the score how do you generate audio that can perform the score in a realistic way, not in a robotic way? We try to design computer algorithms to mimic how humans interpret the score and play it expressively. There are really many fascinating areas of research.”
To learn more about Duan and his research, visit his University of Rochester faculty page.