The next logical step in my creative pursuit is to pair Sports2D’s realtime webcam pose tracking with a realtime webcam depth estimation model, in order to add Z-data to my Sports2D body tracking data. There seems to be only one existing option for this: MiDaS. I gave it a try, and once I got my GPU working, I was getting subpar depth data at 40 FPS and decent depth data at 10 FPS.
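If you want to reproduce this, the quickest route is the torch.hub example from the MiDaS README pointed at a webcam. A minimal sketch along those lines (not my exact script; the model and transform names are the ones MiDaS publishes):

```python
import cv2
import torch

# "MiDaS_small" is the fastest/lowest-quality tier; "DPT_Hybrid" is the
# medium one. Both load straight from torch.hub.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").to(device).eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.small_transform  # dpt_transform for the DPT models

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        pred = midas(transform(img).to(device))
        # Upsample the prediction back to the frame's resolution.
        pred = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=img.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze()
    depth = pred.cpu().numpy()
    # Rescale to 0-255 for display (this per-frame scaling matters; see below).
    depth = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
    cv2.imshow("depth", cv2.applyColorMap(depth, cv2.COLORMAP_INFERNO))
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```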
Here’s a recording of the medium quality model:
Compare this to the lower quality model:
It’s a trade of precision for speed. Something I’ve noticed is that the depth map’s colors are rescaled every frame relative to the closest estimated point, not to any fixed origin. Movement therefore poses an issue: if the object closest to the camera keeps changing, every color in the map shifts with it.
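That’s a consequence of MiDaS producing *relative* inverse depth: to display it at all, each frame has to be min-max normalized on its own, something like this:

```python
import numpy as np

def colorize_depth(depth: np.ndarray) -> np.ndarray:
    """Per-frame min-max rescaling of a relative depth map for display.

    Because the scale resets every frame, whichever pixel is currently
    nearest the camera pins the top of the color range, so when the
    nearest object changes, the whole colormap "recalibrates."
    """
    lo, hi = float(depth.min()), float(depth.max())
    return ((depth - lo) / (hi - lo + 1e-8) * 255).astype(np.uint8)
```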
A hacky fix would be placing a fixed object at the edge of the camera view that’s always closer to the camera than any person or object in the scene, pinning the near end of the scale. It’s not ideal, since I’d rather not obscure part of my camera feed, but it’s easy to test.
…and this is why we test things. The model clearly struggles to judge what’s closer to the camera once I move to the side of the near object, and the issue persists even when I’m further away from the camera. My hypothesis is that this model has been overtrained to recognize humans. Perhaps placing a more common object (than the top of an eyeglasses case) in the camera view will yield better results. I’ll have to test that soon, but for now, I want to see if Gemini can help me adapt this repo, which is already built for processing arbitrarily-long videos, to work with my webcam. The good news is that a fork already exists that does this, and more. The bad news is that the fork is several commits behind the current repo. This will be a good opportunity to cherry-pick and improve my overall git skills.
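The rough plan, assuming the upstream is the main Video-Depth-Anything repo (the SHA below is a placeholder, not a real commit):

```sh
# From a clone of the fork, add the original repo as a remote:
git remote add upstream https://github.com/DepthAnything/Video-Depth-Anything.git
git fetch upstream

# List the upstream commits the fork doesn't have yet:
git log --oneline HEAD..upstream/main

# Bring over the ones I want, one at a time:
git cherry-pick abc1234

# If a commit conflicts, resolve it, then:
git cherry-pick --continue
```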
I’m glad I didn’t sink any more time into testing MiDaS. Video-Depth-Anything(-Live) is SO much better.
It clearly has some sort of built-in interpolation, because the shifts between frames are much more gradual. Even better, it runs at a high enough FPS, with seemingly better quality too. This is without even integrating any of the newer commits from the original repo. Fortunately, most of this fork’s commits are modifications to the README. I feel that.
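I don’t know exactly what that interpolation is under the hood, but the visible effect resembles an exponential moving average over successive depth maps. Here’s a hypothetical smoothing pass in that spirit, which could also be bolted onto a per-frame model like MiDaS:

```python
import numpy as np

class DepthEMA:
    """Exponential moving average over successive depth maps.

    Not what Video-Depth-Anything actually does internally; just a cheap
    way to get similarly gradual frame-to-frame shifts out of a per-frame
    model. Lower alpha means smoother but laggier depth.
    """

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha
        self.state = None

    def update(self, depth: np.ndarray) -> np.ndarray:
        if self.state is None:
            self.state = depth.astype(np.float32)
        else:
            self.state = self.alpha * depth + (1.0 - self.alpha) * self.state
        return self.state
```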
The larger model is still quicker than the smallest model in MiDaS. Impressively, it even picks up the feather dangling from my hat. Better details aside, the depth interpolation between frames performs about the same as the small model, so I’ll probably stick with that one.
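To make speed comparisons like these less hand-wavy, a rolling FPS counter in the capture loop does the job. A small helper, nothing model-specific:

```python
import time
from collections import deque

class FPSCounter:
    """Rolling-average FPS over the last `window` frames."""

    def __init__(self, window: int = 30):
        self.stamps = deque(maxlen=window)

    def tick(self) -> float:
        """Call once per frame; returns the current rolling FPS."""
        self.stamps.append(time.perf_counter())
        if len(self.stamps) < 2:
            return 0.0
        return (len(self.stamps) - 1) / (self.stamps[-1] - self.stamps[0])
```

Call `tick()` once per loop iteration and overlay the result on the preview window, and the models can be compared on equal footing.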
Tags: gamedev ai depth realtime performance webcam vdal midas