ClearRound: Learning Computer Vision Through Showjumping Videos
I film a lot of my daughter’s showjumping rounds. Training sessions. Competitions. Hours of footage. She occasionally shares clips on social media, which means manual editing—finding the jumps, trimming out everything between fences.
But I’d been looking at Sleip, a Swedish app that does computer vision analysis of horse biomechanics. From video. From afar. No onboard sensors needed. Just video analysis that extracts meaningful insights about how a horse is moving.
If Sleip could do that, why couldn’t I build something similar for jump detection? Detect the jumps, trim the footage, understand the biomechanics from the video itself.
That question pulled me into building ClearRound on 16 March 2026. The surface problem was editing friction. The real problem was: can I make computer vision work the way Sleip does?
This wasn’t a product pitch. It was a learning project inspired by what Sleip had proven was possible.
The Obvious Approach
The straightforward assumption: detect a jump by looking for vertical movement. Horse and rider at the bottom of the frame, then at the top, then back to the bottom. That’s a jump, right? Detect that pattern and you’re done.
Except it fails immediately.
A horse trotting away from the camera—bottom to top of frame. A horse cantering toward the camera—top to bottom. A horse walking across the arena—moving vertically through frame. None of those are jumps. But if your only signal is “vertical movement,” they all look the same.
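For what it’s worth, the naive version is only a few lines. This is a sketch, not ClearRound’s actual code: the window size and rise threshold are invented, and centreYs is assumed to come from whatever is tracking the horse and rider frame to frame.

```swift
import CoreGraphics

// The naive detector, more or less: track the subject's bounding-box centre
// per frame (0 = bottom of frame, 1 = top) and flag any frame where the
// subject rose by at least `minimumRise` over the previous `window` frames
// and drops back by at least as much over the next `window` frames.
func naiveJumpPeaks(centreYs: [CGFloat],
                    window: Int = 15,
                    minimumRise: CGFloat = 0.1) -> [Int] {
    guard centreYs.count > 2 * window else { return [] }
    var peaks: [Int] = []

    for i in window..<(centreYs.count - window) {
        let before = centreYs[(i - window)..<i].min() ?? centreYs[i]
        let after  = centreYs[(i + 1)...(i + window)].min() ?? centreYs[i]
        if centreYs[i] - before >= minimumRise,
           centreYs[i] - after  >= minimumRise {
            peaks.append(i)
        }
    }
    return peaks
}
```

Run it on footage of a horse trotting away from the camera and then turning back, and it fires just as happily.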
So you need to look at shape. Body position. The specific configuration of the horse and rider during flight versus during approach or landing. That’s where it gets hard.
The Reality of Body Shape Detection
The Vision framework can detect objects. It can track motion. But detecting the subtle body shape differences that distinguish a jump from a trot? That’s genuinely difficult.
A horse’s body in different gaits—walk, trot, canter, jump—involves different postures, different muscle engagement, different positioning of the rider. These differences matter enormously to a human eye. They’re the difference between “that was a good jump” and “that was sloppy.”
To a computer vision model, they’re subtle variations in silhouette and position across video frames. The model needs thousands of examples to learn the difference. And even then, it’s not robust. Change the lighting, the angle, the horse’s breed, the rider’s position—suddenly the model is unreliable.
I built it anyway, detecting jumps from a combination of body shape, trajectory, and motion patterns. It works most of the time. Quick and easy. Not perfect.
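The motion-pattern part leans on Vision’s object tracker for per-frame positions. Here’s a sketch of how that piece could look, assuming some earlier step has already found the horse and rider in the first frame (that detection step isn’t shown):

```swift
import Vision
import CoreGraphics
import CoreVideo

// Seed Vision's object tracker with the horse-and-rider bounding box from the
// first frame, then let it follow them through the clip.
final class SubjectTracker {
    private let sequenceHandler = VNSequenceRequestHandler()
    private var lastObservation: VNDetectedObjectObservation

    init(initialBoundingBox: CGRect) {
        lastObservation = VNDetectedObjectObservation(boundingBox: initialBoundingBox)
    }

    // Returns the subject's bounding box for one frame, in Vision's
    // normalised coordinates (origin bottom-left, values 0...1).
    func track(_ pixelBuffer: CVPixelBuffer) throws -> CGRect? {
        let request = VNTrackObjectRequest(detectedObjectObservation: lastObservation)
        request.trackingLevel = .accurate
        try sequenceHandler.perform([request], on: pixelBuffer)
        guard let observation = request.results?.first as? VNDetectedObjectObservation else {
            return nil
        }
        lastObservation = observation
        return observation.boundingBox
    }
}
```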
The Architecture
ClearRound started as clipMyRound, then clipMyHorseVideo. The names tracked the narrowing focus: generic clip tool → horses → the specific problem (jumps).
The Vision framework for object and motion tracking. AVFoundation for video processing and trimming. SwiftUI for the interface. The hardest part wasn’t the computer vision. It was the video processing—precisely trimming each detected jump, maintaining audio sync, rendering the result without losing quality.
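The trimming step is where AVFoundation earns its keep. A sketch of one approach, assuming the detector has already produced a CMTimeRange per jump: a passthrough export copies the original video and audio samples instead of re-encoding them, which is one way to preserve quality and keep audio in sync.

```swift
import AVFoundation

// Export one detected jump as its own clip. The output file type and error
// handling here are illustrative; the time range would come from the detector,
// padded with a little approach and landing.
func exportJumpClip(from asset: AVAsset,
                    timeRange: CMTimeRange,
                    to outputURL: URL,
                    completion: @escaping (Error?) -> Void) {
    guard let export = AVAssetExportSession(asset: asset,
                                            presetName: AVAssetExportPresetPassthrough) else {
        completion(NSError(domain: "ClearRound", code: 1, userInfo: nil))
        return
    }
    export.outputURL = outputURL
    export.outputFileType = .mov
    export.timeRange = timeRange           // trim to just this jump
    export.exportAsynchronously {
        completion(export.error)
    }
}
```

The catch with passthrough is that cuts can only land near keyframes, so the trims aren’t frame-accurate; frame-accurate cuts mean re-encoding and a quality trade-off.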
As a learning project, it forced me to understand two things: how fragile the detection is, and where the real signal should come from.
What Sleip Does
Sleip works. It watches horses from afar and extracts biomechanical insights from video. No sensors. No special equipment. Just video analysis.
That’s the goal. That’s what proved it was possible. But getting there is harder than it looks.
Sleip has years of training data, sophisticated models, teams of horse biomechanics experts. I have a GitHub repo and a learning project. The theory is the same—video contains enough information to extract meaningful insights about movement. The execution is… still in progress.
I have elevation data from the Apple Watch, which creates an unambiguous signal for jumps. I have video that contains the visual signature of a jump. In theory, I could combine them, use the Watch data to train the model, and eventually have a vision-only approach that works the way Sleip does.
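A sketch of how the two signals might be combined into training labels. The AltitudeSample shape, the spike threshold, and the shared-clock assumption are all mine; the post doesn’t describe how the Watch data is exported or synchronised with the video.

```swift
import Foundation

// One relative-altitude sample from the Watch (hypothetical shape).
struct AltitudeSample {
    let time: TimeInterval   // seconds since the recording started
    let altitude: Double     // metres, relative to the start
}

// Turn Watch altitude spikes into frame-level labels for the video, so each
// frame can be marked "jump" or "not jump" for training. Assumes the Watch
// recording and the video share a start time; synchronising them is its own
// problem.
func jumpLabels(for frameTimes: [TimeInterval],
                samples: [AltitudeSample],
                spikeThreshold: Double = 0.4) -> [Bool] {
    let sorted = samples.map(\.altitude).sorted()
    guard !sorted.isEmpty else { return frameTimes.map { _ in false } }
    let baseline = sorted[sorted.count / 2]   // median as the resting altitude

    return frameTimes.map { t in
        guard let nearest = samples.min(by: { abs($0.time - t) < abs($1.time - t) }) else {
            return false
        }
        return nearest.altitude - baseline > spikeThreshold
    }
}
```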
I haven’t gotten there yet. The computer vision approach I built works sometimes. Quick and easy. But not robust. Not the way Sleip makes it look simple.
How She Actually Uses It
I film her rounds and import them into ClearRound. It detects jumps, often correctly, sometimes not. She gets edits quickly—good enough for social media, good enough to save iCloud storage. It’s not a product I’d ship to others. It’s a tool that works well enough for this specific use case because I built it knowing the limitations.
That matters. I know when it’s going to fail. I know when to trust it and when to manually review. A polished product would need to hide those limitations. This doesn’t need to.
The Real Output
ClearRound isn’t a polished product. It’s an ongoing learning project inspired by Sleip.
I learned what computer vision struggles with (robust body shape detection, handling variation across conditions, generalizing across different horses and camera angles).
I learned what it’s good at (detecting objects, tracking motion, working with consistent data).
I learned that Sleip makes it look deceptively simple. That “just analyze video” is hiding enormous complexity in training data, model sophistication, and real-world testing.
I still believe the approach should work. Video contains the information you need. The Watch provides ground truth to train models. But executing it is proving harder than watching Sleip do it makes it seem.
The Timeline
ClearRound started on 16 March 2026. Two months after FuelFinder. Three weeks after SeeFood. In that span, I’d built an app that solved a real driver problem (fuel prices), a parody that AI made feasible (hotdog or not hotdog), and an experiment that’s still teaching me what computer vision can and can’t do.
The common thread isn’t the technology. It’s curiosity meeting friction. FuelFinder came from overpaying for petrol. SeeFood came from rewatching a TV show. ClearRound came from asking: if Sleip can do it, why can’t I?
The answer turned out to be: it’s possible, but harder than it looks. Sleip proved the approach works. I’m still figuring out how to make it work for jump detection. That’s the real learning.
Quick and easy editing works for now. Someday the computer vision will work the way Sleip does. That’s the project.