I needed an excuse to stay up late tonight, so I decided to finally try my hand at integrating Video Depth Anything with Sports2D and Unity. It just so happens that Claude 4 is now available; seeing it listed in Cursor was the extra confidence boost I needed to just vibe again.

Isn’t it beautiful? :’)

Since I’m in a dark space and can’t turn on the lights (sharing a hotel room with my mom), I’ll have to delay testing this out until tomorrow. Firstly, I’ll need to ensure this new VDAL submodule correctly utilizes the conda environment I created last week. Then, if this actually works, I’ll implement frame rate synchronization between the packages. Claude already planned that out for me, but I want to make sure the “basics” work before enhancing things further. Gn.


Some thoughts as I stare out the window on this airplane to Palm Springs.

NOTE

  1. Using depth estimation to move the body skeletons in Z-space will likely cause some noticeable issues when the webcam overlay is enabled. Since Sports2D does a great job at dynamically adjusting bone length without depth, I should actually leave all the bodies on a fixed plane to keep consistency with the webcam.
  2. Where does depth get used then? –> Controlling the radius of the energy ball. It will probably be far less jarring visually to use depth estimation to manipulate the energy ball size than it would be to manipulate its position.
  3. A pitfall with this implementation would be the lack of Z movement of the particles. Part of what makes this simulation so convincing is its realistic physics in 3D space. To account for this, I might need to abandon this whole idea. Welp.
  4. In a work project last year, I used a custom full-screen render pass to draw a bounding box around a character. Calculating the outer edges of the character required reading pixel color values on the CPU, which led to a small yet noticeable single frame delay in my calculations. If I were to draw a bounding box around my skeletons, it would require far less processing, as I’d only need to iterate through the joint positions.
  5. The reason for using a bounding box is this: if I were to have two copies of each tracked skeleton, I could have one with a fixed Z depth that perfectly matches the camera overlay, and another with depth applied to it via VDAL input. I could then draw a bounding box around each skeleton to compare their sizes, and use the ratio of their sizes to calculate depth changes. This could then get passed on to the energy ball to manipulate its position in Z space in what is hopefully a cleaner manner.
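
To make notes 4 and 5 a bit more concrete, here is a rough Python sketch of the bounding-box comparison. None of this is from Sports2D, VDAL, or my Unity project: the function names are made up for illustration, and I'm assuming each skeleton arrives as a list of 2D screen-space joint positions. I'm also assuming box height is a good enough stand-in for "size"; area or width might turn out to work better.

```python
from typing import Sequence, Tuple

# Assumption: a joint is a 2D screen-space (x, y) position.
Point = Tuple[float, float]


def bounding_box(joints: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) by iterating over the joints only."""
    xs = [x for x, _ in joints]
    ys = [y for _, y in joints]
    return min(xs), min(ys), max(xs), max(ys)


def box_height(joints: Sequence[Point]) -> float:
    """Height of the skeleton's bounding box."""
    _, min_y, _, max_y = bounding_box(joints)
    return max_y - min_y


def depth_ratio(fixed_joints: Sequence[Point],
                depth_joints: Sequence[Point],
                eps: float = 1e-6) -> float:
    """Size of the depth-driven skeleton relative to the fixed-plane one.

    A ratio above 1.0 means the depth-driven copy appears larger (closer);
    below 1.0 means smaller (farther). Falls back to 1.0 when the fixed
    skeleton's box is degenerate, to avoid dividing by zero.
    """
    fixed_h = box_height(fixed_joints)
    if fixed_h < eps:
        return 1.0
    return box_height(depth_joints) / fixed_h


# Hypothetical example: the depth-driven copy appears twice as tall.
fixed = [(0.0, 0.0), (1.0, 2.0)]
with_depth = [(0.0, 0.0), (2.0, 4.0)]
ratio = depth_ratio(fixed, with_depth)
```

Since this only loops over joint positions, it avoids the CPU pixel readback from note 4 entirely. How the ratio should map to the energy ball's Z position is still an open question for me.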

Tags: unity gamedev vfx ai 3d depth physics