Building a voice assistant interface for audio-based LLMs

Jim Schwoebel, Jin Xu, Nathan Schley


Team members:
Jin Xu, Nathan Schley, Jim Schwoebel

The emergence of large language models such as OpenAI's GPT-3 has transformed natural language processing, enabling a wide range of applications in text generation and understanding. One area that has drawn significant attention is text-to-audio conversion, where these models serve as interfaces that convert written text into high-quality synthesized speech. However, this new technology also brings a distinct set of challenges, including:

Text-to-audio interfaces often struggle to convey the subtle vocal cues, intonation, and emotion implied by the original text, resulting in monotonous or robotic-sounding output that lacks authenticity.
Large language models can occasionally introduce errors or inaccuracies when transforming text into speech, leading to misp

github URL
