ai · machine-learning · agritech · ux

Human-in-the-Loop AI: 80% Cost Reduction with DBSCAN and Turker Review

Digvijay Solanki · February 18, 2026

There's a pattern that comes up repeatedly in applied ML projects: the algorithm is good enough to do 80% of the work automatically, but the remaining 20% requires human judgment. The mistake most teams make is treating this as a problem to solve with more training data and a better model. Sometimes the right answer is to design for human-in-the-loop from the start.

HT Mechanical Turking was built on exactly this insight.

The Problem: Land Polygon Generation at Scale

For Hello Tractor's pricing engine, we needed to know which land areas had been worked by which tractors. This fed into calculations for booking costs, tractor utilisation rates, and farmer cost estimates.

The original approach was to manually draw land polygons on a map for thousands of farm locations across Africa. It required skilled operators, was expensive, and didn't scale.

We had rich data to work with: GPS timeseries from tractors — high-frequency location pings captured during operation. The question was whether we could use that data to automatically construct polygons representing the land areas worked.

The Pipeline: GPS → DBSCAN → Polygons

We built a Python backend that processed tractor GPS data through the following pipeline:

1. Data cleaning: Raw GPS data is noisy — stationary pings while the driver takes a break, anomalous readings from poor satellite lock, duplicate points. We filtered by velocity thresholds and deduplicated within a time window before passing data to the clustering stage.
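A minimal sketch of that cleaning step, in pure Python. The `Ping` structure, field names, and the speed/deduplication thresholds are illustrative assumptions, not the production values:

```python
import math
from dataclasses import dataclass

@dataclass
class Ping:
    lat: float  # degrees
    lon: float  # degrees
    ts: float   # unix seconds

def haversine_m(a: Ping, b: Ping) -> float:
    """Great-circle distance between two pings, in metres."""
    R = 6_371_000.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = math.radians(b.lat - a.lat)
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * R * math.asin(math.sqrt(h))

def clean(pings, min_speed=0.3, max_speed=8.0, dedup_window=5.0):
    """Drop stationary or implausible pings and near-duplicate timestamps.

    min_speed / max_speed bound plausible working speeds in m/s;
    dedup_window is the minimum spacing in seconds between kept pings.
    (Illustrative thresholds only.)
    """
    pings = sorted(pings, key=lambda p: p.ts)
    kept = []
    for prev, cur in zip(pings, pings[1:]):
        dt = cur.ts - prev.ts
        if dt <= 0:
            continue
        speed = haversine_m(prev, cur) / dt
        if not (min_speed <= speed <= max_speed):
            continue  # stationary break or GPS glitch
        if kept and cur.ts - kept[-1].ts < dedup_window:
            continue  # duplicate within the dedup window
        kept.append(cur)
    return kept
```

Filtering on inter-ping speed catches both failure modes at once: near-zero speed means the tractor was parked, and implausibly high speed means a bad satellite fix.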

2. DBSCAN clustering: DBSCAN (Density-Based Spatial Clustering of Applications with Noise) was a natural fit for this problem. Unlike k-means, it doesn't require specifying the number of clusters upfront, handles irregular shapes well, and explicitly labels outlier points as noise — which maps perfectly to GPS noise.

Each cluster of GPS points represented a contiguous area of tractor activity. The parameters (epsilon — neighbourhood radius, min_samples — minimum points to form a cluster) required tuning per region, since farm sizes and tractor speed patterns varied by geography.
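The clustering step can be sketched with scikit-learn's `DBSCAN` and its haversine metric, which lets you express epsilon as a radius in metres. The default parameter values below are placeholders for the per-region tuning described above:

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000.0

def cluster_pings(latlon_deg: np.ndarray, eps_m: float = 25.0, min_samples: int = 10):
    """Cluster GPS points with DBSCAN over the haversine metric.

    latlon_deg: (n, 2) array of [lat, lon] in degrees.
    eps_m: neighbourhood radius in metres (tuned per region in practice).
    Returns one label per point; -1 marks noise.
    """
    X = np.radians(latlon_deg)        # haversine metric expects radians
    eps_rad = eps_m / EARTH_RADIUS_M  # convert the radius to radians
    return DBSCAN(eps=eps_rad, min_samples=min_samples, metric="haversine").fit_predict(X)
```

Each non-negative label corresponds to one contiguous area of tractor activity; the `-1` points are exactly the GPS noise that survives the cleaning stage.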

3. Polygon construction: From each cluster, we constructed a polygon using a concave hull algorithm. The resulting polygon represented the land area worked in that session, with a configurable level of detail (more vertices = tighter boundary = more compute).
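As one way to sketch this step: Shapely (2.0+) ships a `concave_hull` function whose `ratio` parameter is a rough analogue of the detail knob described above. This is not necessarily the algorithm the production pipeline used, just a readily available stand-in:

```python
import numpy as np
import shapely
from shapely.geometry import MultiPoint

def cluster_to_polygon(points_xy: np.ndarray, ratio: float = 0.3):
    """Build a boundary polygon for one cluster of GPS points.

    points_xy: (n, 2) array of projected x/y coordinates in metres.
    ratio: 0.0 gives the tightest concave hull, 1.0 the convex hull;
           lower values mean more vertices and more compute.
    """
    return shapely.concave_hull(MultiPoint([tuple(p) for p in points_xy]), ratio=ratio)
```

A convex hull would be cheaper but overestimates area for the narrow, irregular fields mentioned later, which is exactly why a concave hull matters here.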

4. Pricing calculation: Polygon area (in hectares) fed directly into the pricing engine. A tractor that worked a 2.5 hectare plot has a different cost basis than one that worked 0.8 hectares.
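The area calculation itself is the shoelace formula over projected coordinates. A minimal stdlib sketch, with a deliberately simplified flat per-hectare rate standing in for the real pricing engine:

```python
def polygon_area_ha(vertices):
    """Shoelace area of a polygon whose vertices are (x, y) in metres.

    Returns hectares (1 ha = 10,000 m^2). Either winding order works.
    """
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0 / 10_000.0

def booking_cost(vertices, rate_per_ha: float) -> float:
    """Hypothetical flat pricing; the real engine combined more inputs
    (utilisation, booking terms, region)."""
    return polygon_area_ha(vertices) * rate_per_ha
```

For example, a 100 m × 100 m plot is exactly 1 hectare, so its cost basis is one unit of the per-hectare rate.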

The Problem with Pure Automation

The pipeline worked well — most of the time. But there were failure modes:

  • Fragmented clusters: a tractor that paused at a field boundary for 20 minutes would sometimes produce two clusters instead of one, underestimating the worked area
  • Noise contamination: GPS drift on slow-moving tractors occasionally pulled polygon boundaries into adjacent areas
  • Edge fields: narrow or irregularly shaped fields sometimes produced polygons that extended beyond the actual farmland

Feeding unchecked polygons into the pricing engine would create billing errors — either overcharging or undercharging farmers. We needed a review step.

The Turker Interface

Rather than building a generic review tool, we designed the turker interface around the specific failure modes of the pipeline.

Each review task showed:

  • The generated polygon overlaid on a satellite map of the farm area
  • The source GPS track underneath the polygon
  • The key metrics: area, cluster count, confidence score from the pipeline

Turkers had three actions:

  1. Accept — polygon is correct, send to pricing
  2. Adjust — drag polygon vertices to correct the boundary
  3. Flag — something is wrong that requires expert review (rare)

After accepting or adjusting, turkers rated the algorithm's accuracy on a 1–5 scale. This rating data fed back into the pipeline's confidence scoring model, allowing us to progressively route only low-confidence polygons to human review as the model improved.
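The routing decision reduces to a pair of thresholds on the confidence score. The threshold values and labels below are illustrative; in the real system the thresholds shifted as turker ratings improved the confidence model:

```python
def route(confidence: float,
          auto_threshold: float = 0.9,
          flag_threshold: float = 0.4) -> str:
    """Decide where a generated polygon goes, based on pipeline confidence.

    Thresholds are illustrative placeholders, not production values.
    """
    if confidence >= auto_threshold:
        return "auto_accept"    # straight to pricing, zero human time
    if confidence >= flag_threshold:
        return "turker_review"  # quick accept/adjust/flag task
    return "expert_review"      # rare: too uncertain even for turkers
```

Raising `auto_threshold` trades human review time for billing safety; the feedback loop is what lets you lower it over time without increasing error rates.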

The Architecture

The system had two cleanly separated parts:

Data pipeline (Python, AWS Lambda):

  • GPS ingestion and cleaning
  • DBSCAN clustering
  • Polygon construction
  • Confidence score generation
  • Storage in PostgreSQL with PostGIS for geospatial queries

Review interface (React Native + Node.js + PostgreSQL):

  • Task queue: polygons sorted by confidence score (lowest first)
  • Map-based polygon editor with vertex dragging
  • Rating and feedback capture
  • Routing logic: high-confidence polygons bypass review entirely

Keeping these concerns separate was important. The pipeline team could iterate on clustering parameters and confidence scoring without touching the UI, and the product team could improve the review UX without understanding the ML pipeline.

The Result

Before HT Mechanical Turking, manual polygon drawing cost approximately X hours of skilled operator time per polygon. After:

  • ~70% of polygons passed automatically (confidence score above threshold) — zero human time required
  • ~25% of polygons went to turker review — typically 30–90 seconds per task
  • ~5% of polygons were flagged for expert review

Overall operational cost reduction: 80%. And because turker feedback continuously improved the model's confidence scoring, the automatic pass rate improved over time.

Closing Thoughts

The lesson here isn't "use DBSCAN" or "build a turker interface." The lesson is that the right architecture for AI systems often isn't end-to-end automation — it's a smart division of labour between the algorithm and the human, with a feedback loop connecting the two.

Design for automation where the confidence is high. Design for human review where it isn't. And make the feedback from human review improve the algorithm over time.

If you're building human-in-the-loop AI workflows or data pipelines with a review layer, reach out — it's a design problem as much as an engineering one.