# From pixels to action: how we developed the brain of an autonomous rover with embedded computer vision

Our team faced the challenge of developing the complete brain for an autonomous rover, a system that required visual perception and real-time decision-making on low-power hardware. This case study is a technical deep dive into our process: from implementing the Mask R-CNN segmentation model for accurate object identification, to accelerating inference on a Raspberry Pi with a Coral TPU accelerator. Discover how we overcame the challenges of model optimization and quantization in TensorFlow to deliver a robust and efficient Edge AI solution for intelligent automation.
Digital transformation | Edge IoT | Machine Learning

In the world of industrial automation, the true challenge is not always scale, but intelligent precision. Many industries face repetitive tasks that require not only movement, but also perception and real-time decision-making. Recently, our team was hired to address a challenge of this nature: developing the control software and vision system for an autonomous rover designed to operate in a highly specialized agricultural environment.

The objective was to create a system that could navigate through rows of cultivation trays, visually identify hundreds of individual compartments, and perform a specific action on each one based on its state. This post is a deep dive into our technical process, from the selection of the AI models to the optimization of the hardware for efficient operation at the edge (Edge AI).

The problem: robotic precision at the micro level

The client presented us with a clear scenario: a rover had to move autonomously along an installation with thousands of cultivation units arranged in a grid. Our responsibility was to develop the rover’s brain, which had to be capable of:

  1. Accurately identifying each individual compartment in the camera’s field of view.
  2. Visually analyzing the content of each compartment to make a decision.
  3. Sending commands to the robot’s actuators to perform a specific task, in this case, dispensing a product.
  4. Doing all of this in real-time, while the rover was moving, and with low-power hardware to maximize autonomy.

This was a classic computer vision and robotics problem, where speed and efficiency on an embedded device were just as important as the accuracy of the AI model.

Our Solution Architecture: A Modular Approach on Raspberry Pi

We decided to base our solution on a Raspberry Pi, a versatile and cost-effective platform for prototyping and deployment. However, to equip it with the necessary intelligence, we designed a modular software and hardware pipeline focused on vision.

The workflow we proposed was as follows: the rover’s camera captures an image of a section of the tray; our software identifies each compartment and extracts it as a sub-image; a second model classifies the state of each compartment; and finally, commands are sent to the rover’s motors.
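
To make this workflow concrete, the sketch below outlines the perception-to-action loop in Python. The object and method names (capture_frame, segment_compartments, classify_state, dispense_at, and so on) are illustrative placeholders rather than our actual internal API.

```python
def process_tray_section(camera, segmenter, classifier, actuators):
    """One pass of the rover's perception-to-action loop (illustrative sketch)."""
    frame = camera.capture_frame()                      # 1. image of the current tray section
    detections = segmenter.segment_compartments(frame)  # 2. Mask R-CNN: one box + mask per compartment
    for det in detections:
        crop = det.extract_crop(frame)                  # 3. isolate the compartment as a sub-image
        state = classifier.classify_state(crop)         # 4. second model decides its state
        if state == "dispense":                         # 5. act only where the product is needed
            actuators.dispense_at(det.center)
```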

Phase 1: Visual perception — precise segmentation with Mask R-CNN

The first and most critical step was teaching the rover to “see” and delimit the objects of interest. It wasn’t enough to know that a compartment was there; we needed its exact contours to isolate its visual content from the rest of the image.

For this task, we chose Mask R-CNN (Region-based Convolutional Neural Network). Unlike other object detectors that only draw a rectangle (bounding box), Mask R-CNN offered us two key outputs:

  • Bounding box: A box that framed each detected compartment.
  • Segmentation mask: A pixel-level mask that delineated the exact shape of the compartment. This was the key to our strategy, as it allowed us to completely ignore the background and the tray structure, focusing the analysis only on the area of interest.

We trained this model using a dataset of images that our team carefully annotated. The result was a robust vision system capable of identifying and isolating each compartment with high fidelity, even under variations in lighting and perspective.
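
To show how the two outputs are consumed downstream, the sketch below filters an inference result and pairs each confident detection’s bounding box with its mask. The result layout (keys 'rois', 'masks' and 'scores') follows the common Matterport-style Mask R-CNN convention and is an assumption for the example.

```python
import numpy as np

def keep_confident_detections(result, min_score=0.8):
    """Filter a Mask R-CNN result down to confident compartment detections.

    result: dict with 'rois'   -> (N, 4) boxes as (y1, x1, y2, x2),
                      'masks'  -> (H, W, N) boolean masks,
                      'scores' -> (N,) confidence scores.
    Returns a list of (box, mask) pairs above the confidence threshold.
    """
    keep = result["scores"] >= min_score
    boxes = result["rois"][keep]
    masks = result["masks"][..., keep]
    return [(tuple(boxes[i]), masks[..., i]) for i in range(len(boxes))]
```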

Phase 2: Preparing the ground for classification — anchors and data extraction

With the compartments already segmented, the next logical step was to prepare this data for the classification model. To optimize this process, we worked on the anchor calculation.

Anchors are reference boxes that detection models use to predict the location and size of objects. By adjusting and optimizing these anchors to match the specific dimensions of the compartments, we improved both the efficiency and the accuracy of detection. This refinement allowed us to create a highly efficient data extraction pipeline:

  1. The rover captures an image.
  2. Mask R-CNN, with its optimized anchors, generates the masks and bounding boxes in milliseconds.
  3. Our software uses these coordinates to crop and normalize an individual image for each compartment.
  4. These cropped images become the standardized input for the next module of the AI system.
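
As a concrete illustration of steps 3 and 4, the sketch below crops one compartment with its bounding box, blanks out everything outside its mask, and resizes the result to a fixed classifier input. OpenCV, the 128×128 target size, and the function name are assumptions for the example, not the exact production code.

```python
import cv2          # OpenCV for resizing; an assumed part of the tooling
import numpy as np

def extract_compartment(frame, box, mask, size=(128, 128)):
    """Crop one detected compartment and normalize it for the classifier.

    frame: HxWx3 image from the rover camera.
    box:   (y1, x1, y2, x2) bounding box in pixel coordinates.
    mask:  HxW boolean segmentation mask for this detection.
    size:  target input size of the downstream classifier (illustrative).
    """
    y1, x1, y2, x2 = [int(v) for v in box]
    crop = frame[y1:y2, x1:x2].copy()
    crop_mask = mask[y1:y2, x1:x2]

    # Zero out everything outside the mask so the classifier only sees
    # the compartment's contents, not the tray structure or background.
    crop[~crop_mask] = 0

    # Resize to a fixed input size and scale pixel values to [0, 1].
    crop = cv2.resize(crop, size, interpolation=cv2.INTER_AREA)
    return crop.astype(np.float32) / 255.0
```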

This modular design ensured that, even though the classification model was still under development, the foundation of visual perception was already solid and ready for integration.
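
The anchor tuning mentioned above can be approximated by clustering the dimensions of the annotated boxes; the sketch below uses k-means over (width, height) pairs to propose anchor shapes that match the real compartments. This is a simplified, assumed approach for illustration, not a transcript of our exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed tooling for this sketch

def suggest_anchors(box_dims, n_anchors=5):
    """Cluster annotated (width, height) pairs to propose anchor shapes.

    box_dims: array of shape (N, 2) with the width and height, in pixels,
              of every annotated compartment in the training set.
    Returns the cluster centers as candidate anchor (width, height) pairs.
    """
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=0)
    km.fit(box_dims)
    anchors = km.cluster_centers_
    # Sort anchors by area so smaller shapes come first.
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]

# Example with illustrative measurements taken from the annotation files.
dims = np.array([[48, 52], [50, 49], [47, 51], [96, 100], [98, 97]])
print(suggest_anchors(dims, n_anchors=2))
```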

Phase 3: The hardware challenge — implementing Edge AI with the Coral TPU accelerator

Running a model like Mask R-CNN in real time on a Raspberry Pi is, to put it plainly, unfeasible on the CPU alone. To overcome this computational barrier, we integrated a Coral TPU accelerator.

This small but powerful chip, designed by Google, is optimized for running machine learning model inferences. By connecting it to the Raspberry Pi, we transformed our platform:

  • High speed inference: The processing time per image was drastically reduced, going from seconds to mere milliseconds. This allowed the rover to operate smoothly and without pauses, meeting the real-time requirement.
  • Low power consumption: The TPU performs these calculations with much higher energy efficiency than a CPU, a decisive factor in maximizing the rover’s battery life during long work shifts.

The integration of the Coral TPU was a fundamental pillar of our design, demonstrating our ability to deploy advanced AI solutions in resource-constrained environments (Edge AI).
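
For reference, running a model compiled for the Edge TPU from Python generally looks like the sketch below, using the tflite_runtime interpreter with the Edge TPU delegate. The model filename is a placeholder.

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# Load a model compiled for the Edge TPU and attach the Edge TPU delegate.
interpreter = tflite.Interpreter(
    model_path="segmenter_edgetpu.tflite",  # placeholder filename
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def infer(image_uint8):
    """Run one inference on the accelerator; the image must match the model's input shape."""
    interpreter.set_tensor(input_details[0]["index"], image_uint8[np.newaxis, ...])
    interpreter.invoke()
    return [interpreter.get_tensor(d["index"]) for d in output_details]
```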

Phase 4: The final optimization — training and quantization with TensorFlow

Hardware alone is not the complete solution. To squeeze the maximum performance out of the Coral TPU, model optimization is essential. Our workflow for this was based on TensorFlow:

  1. Cloud training: First, we trained our Mask R-CNN model on high-performance computing platforms to achieve the maximum possible accuracy.
  2. Quantization: Once validated, we applied a technique called post-training quantization. This process converts the model weights from 32-bit floating-point numbers to 8-bit integers. The benefits are enormous for an embedded system:
    • Up to 4 times smaller model: Facilitates storage and loading on the Raspberry Pi.
    • Much faster inference: The Coral TPU is specifically designed to execute 8-bit integer operations at breakneck speed.
    • Lower memory and power consumption: A lighter and more efficient model reduces the overall system load.

This quantization process was the final touch that allowed us to deploy a state-of-the-art vision model on modest hardware, without sacrificing the operational speed required for the project.
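
As an illustration, post-training integer quantization of a TensorFlow SavedModel follows the pattern sketched below; the representative dataset feeds a few hundred real images so the converter can calibrate the 8-bit ranges. The paths, input size, and image loader are placeholder assumptions.

```python
import tensorflow as tf

def representative_images():
    """Yield calibration images shaped like the model's input (placeholder loader)."""
    dataset = tf.data.Dataset.list_files("calibration/*.png").take(200)
    for path in dataset:
        image = tf.io.decode_png(tf.io.read_file(path), channels=3)
        image = tf.image.resize(image, (512, 512)) / 255.0
        yield [tf.expand_dims(tf.cast(image, tf.float32), 0)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_images
# Force full integer quantization so the Edge TPU compiler can map the ops.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("segmenter_int8.tflite", "wb") as f:
    f.write(converter.convert())
```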

Conclusion: A model for intelligent automation

This project is a testament to how the integration of cutting-edge software and hardware can solve complex industrial automation problems. By combining an advanced segmentation model like Mask R-CNN with the power of hardware acceleration from the Coral TPU and optimization techniques like quantization, we developed a robust, fast, and efficient robotic brain.

The approach we followed —precise perception, data extraction, and optimization for edge hardware— is a versatile blueprint that our team can apply to a wide range of challenges, from quality control on production lines to automated logistics. We demonstrated that artificial intelligence does not have to live in the cloud; we can bring it to the field, where the action happens, creating autonomous and truly intelligent solutions.
