Anastasia 🦄 · Sep 12, 2020
#Apple filed a new patent application called “FILE FORMAT FOR SPATIAL AUDIO” (US Patent Application 20200288258). Here is a bit of my breakdown (while holding my thoughts on its innovative merits 🙃) #SpatialAudio (thread)
As noted earlier, Apple doesn’t want any part of the current nomenclature, and is referring to the entire gamut of various “Realities” as SR (Simulated Reality)
In general terms, the application describes what is essentially a new flavor of an object-based audio format, and how this format can be stored, edited, and reproduced.

There will be a sound asset library, where each sound is comprised of the actual SOUND DATA plus METADATA.
SOUND DATA can be mono, multi-channel, another spatial audio format like ambisonics, or “synthesized audio data for producing one or more sounds”.
I found this last point especially interesting, since it leaves the door open for real-time synthesis, which, in my opinion, is the biggest breakthrough required in the audio industry.
METADATA can contain some of the following:
- attenuation (distance-based and listener-angle-based)

- directionality (from more common directionality patterns to totally arbitrary)

- transformation controls for modifying sound properties (RTPCs for my game audio peeps)
Metadata can describe the entire sound, but also each discrete sound channel - this is useful for virtually reproducing something like a 5.1 setup, where the audio file is a single multi-channel file, but each channel is positioned on its own virtual speaker in space.
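To make the 5.1 case concrete, here is a minimal sketch of pinning each channel of one multi-channel file to its own virtual speaker position. The azimuth angles are my assumption (the standard ITU-R BS.775 layout); the patent application doesn't specify a layout or any of these names.

```python
import math

# Assumed ITU-R BS.775 azimuths (degrees) for a virtual 5.1 layout.
# Per-channel metadata would pin each channel of one file to one of these.
SPEAKER_AZIMUTHS = {"L": -30, "R": 30, "C": 0, "LFE": 0, "Ls": -110, "Rs": 110}

def speaker_position(channel: str, radius_m: float = 2.0):
    """Place a channel on a circle around the listener (x = right, z = forward)."""
    az = math.radians(SPEAKER_AZIMUTHS[channel])
    return (radius_m * math.sin(az), 0.0, radius_m * math.cos(az))
```

A renderer would then treat each channel as its own point source at the returned position.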
Metadata can also describe the sound at the time of recording: position, rotation, SPL and distance at which this SPL was measured, size and/or shape of the sound using a polygonal mesh or volumetric size, and the microphone configuration at the time of sound capture.
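Putting the pieces above together, a library entry might look something like this. All field and type names are hypothetical — the application describes the categories of data, not a concrete schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CaptureInfo:
    """Hypothetical recording-time metadata, per the application."""
    position: tuple[float, float, float]   # where the source was
    rotation: tuple[float, float, float]   # its orientation
    spl_db: float                          # measured SPL...
    spl_distance_m: float                  # ...and the distance it was measured at
    mic_config: str                        # microphone setup at capture time

@dataclass
class AudioAsset:
    """One library entry: raw sound data plus its metadata."""
    sound_data: bytes                      # mono, multi-channel, ambisonics, or synth params
    attenuation: Optional[dict] = None     # distance / listener-angle curves
    directionality: Optional[dict] = None  # common or arbitrary radiation patterns
    transforms: Optional[dict] = None      # RTPC-style property controls
    capture: Optional[CaptureInfo] = None
```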
OK, this is where we’re getting into interesting stuff…
Why would Apple want the SPL data?
The holy grail of putting virtual objects into the real world is to make them indistinguishable from real objects in the way they look, sound, and behave (see the recent addition of reflection to ARKit).
This also means a virtual person needs to be no louder or quieter than a real person standing right next to them. This is quite a hard problem to solve, because audio has never before needed to precisely match real-world SPL levels.
But storing this information with the metadata is a core building block of the solution. So if I record a sound that was 10 ft away, and it measured 60 dB SPL, an algorithm could estimate how the same sound would appear at 1 ft or 100 ft.
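For a point source in a free field, that estimate is just the inverse-square law: -6 dB per doubling of distance. A minimal sketch (ignoring air absorption, reflections, and near-field effects, which a real renderer would also model):

```python
import math

def spl_at_distance(spl_ref: float, d_ref: float, d: float) -> float:
    """Estimate SPL at distance d from a point source, given a reference
    SPL measured at distance d_ref (free-field inverse-square law)."""
    return spl_ref + 20 * math.log10(d_ref / d)

print(spl_at_distance(60.0, 10.0, 1.0))    # 10x closer  -> 80.0 dB SPL
print(spl_at_distance(60.0, 10.0, 100.0))  # 10x farther -> 40.0 dB SPL
```

This is exactly why the metadata needs both the SPL *and* the distance it was measured at — one number without the other is useless.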
The other interesting thing is the volumetric information. Since the dawn of time… (ok, maybe not that long) modern spatial audio engines have treated every mono sound as a point source when it comes to reproducing the actual direct sound.
I’m not talking about attenuation, but about defining the actual position of the sound in space. This might be wishful thinking on my part, but if Apple cracked the nut of accurately reproducing a volumetric sound source… well - I’d love to hear it.
Then the application proceeds to describe an authoring environment, where the sound data and metadata can be further manipulated and saved back into the library, and finally how this information is used during application playback. What stood out for me here is this bit:
An application can write usage information back into the sound library, so that “a historical record of use of the audio asset in any one or more SR applications is maintained [...] for a developer to know where the sound of the audio asset was previously used”.
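Mechanically, that write-back could be as simple as appending a record per use. A sketch under my own assumptions — the application only says a historical record is maintained, and none of these field names or the log format come from it:

```python
import json
import time

def record_usage(log_path: str, asset_id: str, app_name: str) -> dict:
    """Append a usage record for an asset to a newline-delimited JSON log.
    Hypothetical schema: which asset, which SR app, and when."""
    entry = {"asset": asset_id, "app": app_name, "time": time.time()}
    with open(log_path, "a") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```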
Hard to tell what the exact play is here, but it almost looks like it’s opening the door for some type of “asset store” or “stock library” setup, where audio asset creators can track and monetize usage.
To conclude: I’m not surprised that Apple is rolling their own object-based audio format. It certainly saves them the licensing costs of something like Dolby Atmos or MPEG-H, and gives them control to tailor the format to their ecosystem and use cases.
This is all good and fine, but what I would REALLY like to see is a move to properly integrate audio into visual 3D formats (and no, a small handful of audio properties in USD ain’t it). (end thread)
