Celeste's dialogue is the most-asked-about aspect of the game's sound design. I figured I'd share some of how we went about creating it.
First, we explored some simple synth sounds to figure out a general tone for a given character's voice. Once we had a foundation timbre established, we moved on to setting up how that tone might change over time.
Essentially, using a Parametric EQ in FL Studio, we modelled what are called "formants" - that is, naturally occurring spectral peaks in human vowel sounds. These peaks sit at specific frequency positions and in specific relationships to one another.
We then automated the frequency positions of those peaks over time, to resemble the way the human voice might transition between vowel sounds.
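The idea above can be sketched in plain Python. The vowel formant values below are approximate textbook F1/F2 figures, and the linear sweep is just an illustration - the actual curves in Celeste were drawn by hand as EQ automation in FL Studio:

```python
# Approximate first/second formant frequencies (Hz) for three vowels.
# These are rough reference values, not the ones used in the game.
VOWEL_FORMANTS = {
    "a": (730, 1090),
    "i": (270, 2290),
    "u": (300, 870),
}

def formant_sweep(start_vowel, end_vowel, t):
    """Interpolate formant peak positions from one vowel toward another.

    t runs from 0.0 (start vowel) to 1.0 (end vowel), mimicking the
    automated EQ peaks gliding between vowel shapes over time.
    """
    f1a, f2a = VOWEL_FORMANTS[start_vowel]
    f1b, f2b = VOWEL_FORMANTS[end_vowel]
    return (f1a + (f1b - f1a) * t, f2a + (f2b - f2a) * t)
```

Sweeping those two EQ peaks from "a" toward "i" is what makes a static synth tone start to read as a voice moving between vowels.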
Then, we broke down the emotional range of a given character, and figured out what the sonic characteristics of those emotions might be.
Is the emotion generally high or low in pitch? Does the pitch stay in one place, or move around a lot? Is the speech slow and careful, or is it rapid and pointed? What kind of pitch movement should we hear, and what kinds of sentences are these sounds representing?
This was basically a lot of me reading the script and "over-acting" the dialogue.
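The characteristics we asked about above can be thought of as a small per-emotion parameter table. The emotion names and numbers here are hypothetical placeholders, just to show the shape of the breakdown:

```python
from dataclasses import dataclass

@dataclass
class EmotionVoice:
    base_pitch: float     # semitones relative to the character's root note
    pitch_range: float    # how far the pitch wanders (semitones)
    syllable_rate: float  # rough syllables per second

# Hypothetical values - each character/emotion pair got its own feel.
EMOTIONS = {
    "calm":    EmotionVoice(base_pitch=0.0,  pitch_range=2.0, syllable_rate=6.0),
    "excited": EmotionVoice(base_pitch=5.0,  pitch_range=7.0, syllable_rate=10.0),
    "sad":     EmotionVoice(base_pitch=-3.0, pitch_range=1.0, syllable_rate=4.0),
}
```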
We then pressed PLAY in FL Studio (to set the formant automation in motion), and then "performed" the emotions on a midi piano keyboard while messing with the pitch wheel to try to hit all those characteristics we'd laid out above.
We recorded the audio output of that performance on a separate computer, and then went through and picked out the "good takes". These good takes were sorted into three basic categories per emotion.
Using @fmodstudio, we set up an emotion-driven (actually character portrait-driven) event system... We had one audio event per character, and each dialogue event plays non-stop for the duration of a given character's conversation in-game.
We loop on silence until it's that character's turn to speak. When that character speaks, the code-side sends the current character portrait info to FMOD, and FMOD responds by sending the playhead to the appropriate emotion on the event timeline.
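Here's a plain-Python model of that behavior - this is *not* the FMOD API, just a sketch of what the event does: idle on a silent loop, then jump the playhead to an emotion's region when the game pushes a new portrait ID (the IDs and emotion names are hypothetical):

```python
# Hypothetical portrait-ID-to-emotion mapping; 0 = no one speaking.
PORTRAIT_TO_EMOTION = {0: "silence", 1: "calm", 2: "excited", 3: "sad"}

class DialogueEvent:
    """Models one always-running FMOD dialogue event for one character."""

    def __init__(self):
        self.region = "silence"  # playhead starts on the silent loop

    def set_portrait(self, portrait_id):
        # In the real project this is an FMOD parameter set from game code;
        # transition logic on the timeline reads it and moves the playhead.
        self.region = PORTRAIT_TO_EMOTION.get(portrait_id, "silence")

    def tick(self):
        # Loop on silence until a speaking emotion is active.
        return None if self.region == "silence" else self.region
```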
When we arrive at that emotion, we progress through a random sequence of syllables via a transition system we built. FMOD doesn't have a great way of handling this degree of specificity when moving through a random sequence of audio files, so making this was pretty tedious.
It necessitated manually placing every syllable on the timeline, and setting up a "transition hub" that would rapidly send the playhead out to these various syllables. No syllable was allowed to play twice in a row, and emphasized syllables had a lower probability of playing.
The hub also made it trivial to fine-tune the spacing between syllables, since we could move the entire hub rather than forty separate transition markers.
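The hub's selection rules can be sketched like this - syllable names and weights are made up, but the two constraints are the ones described above: never the same syllable twice in a row, and a lower probability for emphasized syllables:

```python
import random

SYLLABLES = ["ba", "da", "ka", "BA!"]  # "BA!" = emphasized (hypothetical names)
WEIGHTS   = [1.0, 1.0, 1.0, 0.3]       # emphasized syllables play less often

def next_syllable(last, rng=random):
    """Weighted random pick, excluding whatever played last."""
    candidates = [(s, w) for s, w in zip(SYLLABLES, WEIGHTS) if s != last]
    names, weights = zip(*candidates)
    return rng.choices(names, weights=weights, k=1)[0]

def babble(n, rng=random):
    """Generate a run of n syllables, as the transition hub would."""
    out, last = [], None
    for _ in range(n):
        last = next_syllable(last, rng)
        out.append(last)
    return out
```

In FMOD this logic lives in transition probabilities on the timeline rather than in code, which is why building it was so tedious.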
Anyway, once the text-draw ends in the game's UI, FMOD is notified that speech should conclude. The playhead then returns to silently waiting for the next portrait/emotion. (It either plays an end syllable before returning, or it returns immediately if on an emphasized syllable).
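The stop rule reduces to a tiny branch; this sketch assumes hypothetical syllable labels where an "emph" prefix marks an emphasized syllable:

```python
def stop_speech(current_syllable, end_syllable="end_syl"):
    """What still plays after the text-draw ends, per the rule above."""
    if current_syllable.startswith("emph"):
        return []              # on an emphasized syllable: stop immediately
    return [end_syllable]      # otherwise: play one end syllable, then silence
```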
That's pretty much it! Hope you learned something about #gameaudio!
If you'd like to see a detailed breakdown of how the dialogue event works, you can watch it here: twitch.tv/powerupaudio/v…
You can also download the complete @celeste_game FMOD project (along with all source sound & music - thanks @kuraine!) from the FMOD website.