Lu Ling
@NVIDIA research intern丨PhD @PurdueCS丨#AI 丨#ComputerVision丨Agentic AI丨4D/3D GenAI丨 Multimodals
Dec 18 6 tweets 4 min read
Do we really need massive curated 3D scene data for interactive world generation?

#SAM3D, #WorldGen say yes.
We say no.

I-Scene learns better spatial knowledge using only 25K randomly composed instances.

🔑 Key insight:
We reprogram the instance generator to infer support, proximity, and symmetry from purely geometric cues for generating interactive scenes.

🧠 Scene-context attention
👁️ View-centric space
🧱 Random composition beats expensive curation

🌐 luling06.github.io/I-Scene-projec…
💻 github.com/LuLing06/I-Sce…

🧵 Details below [1/6]

3D instance models such as #TRELLIS already generate coherent scenes as a single geometry.

We leverage this strong geometric prior to guide instance generation toward a full scene layout.

🔧 Scene Context Attention:
Just concatenate scene-branch KV with instance KV.
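
A rough sketch of what "concatenate scene-branch KV with instance KV" could look like, in single-head NumPy. This is an illustration only, not the I-Scene code: the function name `scene_context_attention`, the token counts, and the feature width are all assumptions, and the real model presumably uses multi-head attention inside the base generator's transformer blocks.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scene_context_attention(q_inst, k_inst, v_inst, k_scene, v_scene):
    """Hypothetical helper: instance queries attend over instance tokens
    plus scene-branch tokens. Shapes are (num_tokens, d)."""
    # The only change versus plain self-attention: append the scene KV.
    k = np.concatenate([k_inst, k_scene], axis=0)  # (N_inst + N_scene, d)
    v = np.concatenate([v_inst, v_scene], axis=0)  # (N_inst + N_scene, d)
    scores = q_inst @ k.T / np.sqrt(q_inst.shape[-1])
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ v, weights

# Toy sizes (assumed): 4 instance tokens, 6 scene tokens, width 8.
rng = np.random.default_rng(0)
d = 8
q_inst, k_inst, v_inst = (rng.normal(size=(4, d)) for _ in range(3))
k_scene, v_scene = (rng.normal(size=(6, d)) for _ in range(2))

out, weights = scene_context_attention(q_inst, k_inst, v_inst, k_scene, v_scene)
print(out.shape, weights.shape)
```

With zero scene tokens this reduces to ordinary self-attention over the instance tokens, which is one way to read the "minimal change" point.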

✨ Why this works:
• Minimal change to the base model
• Preserves geometry priors + learns the spatial layout

An elegant attention operation that unlocks spatial positioning.
🧵 [2/6]