Vision Processing

Let's get our robot to recognize specific objects and their relative positions.

What is OpenCV?

OpenCV began as an Intel research project in 1999, aiming to accelerate real-time computer vision tasks. Its open-source release in 2000 fostered a large developer community, leading to its growth into a powerful and versatile library for computer vision applications across fields like robotics, self-driving cars, and medical imaging.

OpenCV's success as a powerful yet general-purpose library sparked the development of more specialized solutions. EasyOpenCV (EOCV) emerged within the FTC (FIRST Tech Challenge) robotics community. It leverages OpenCV's core functionalities but simplifies them and adds robot-specific features for easier integration with FTC robots. This allows FTC teams to focus on their robot's vision-based tasks without needing in-depth computer vision expertise.
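
To make this concrete, here is a minimal sketch of what an EOCV pipeline looks like. A pipeline receives each camera frame as an OpenCV Mat, processes it, and returns a Mat to draw on the camera preview. The grayscale conversion is only a placeholder for your own processing:

import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;
import org.openftc.easyopencv.OpenCvPipeline;

// The simplest useful pipeline: convert every frame to grayscale
public class GrayscalePipeline extends OpenCvPipeline {

    // Reuse one Mat across frames to avoid allocating per frame
    private final Mat gray = new Mat();

    @Override
    public Mat processFrame(Mat input) {
        // EasyOpenCV delivers frames in RGBA order
        Imgproc.cvtColor(input, gray, Imgproc.COLOR_RGBA2GRAY);

        // Whatever Mat is returned here is drawn on the viewport preview
        return gray;
    }
}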

Basics of Our Structure

Vocabulary

  • Tensor: A multi-dimensional array used to represent data in machine learning models. Tensors are the fundamental data structures in deep learning frameworks like TensorFlow and PyTorch.

  • Inference: The process of running a trained machine learning model on new input data to obtain predictions or outputs.

  • Allocate Tensors: The process of allocating memory for tensors in the model's input and output layers. This is typically done before running the model's inference on input data.

  • Output Tensor: The tensor(s) produced by the machine learning model after running inference on the input data. The output tensor(s) contain the model's predictions or outputs, such as object bounding boxes, class probabilities, or segmentation masks.

  • Non-Maximum Suppression (NMS): A post-processing technique used in object detection to eliminate redundant overlapping bounding boxes for the same object. NMS helps to ensure that only the most confident and accurate bounding box is retained for each object. A minimal sketch of this idea follows this list.
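
As a concrete illustration of NMS, here is a minimal, self-contained sketch of the greedy version over OpenCV Rect boxes. The class name Nms, the method names, and the thresholds are our own illustrative choices, not part of OpenCV or TensorFlow:

import org.opencv.core.Rect;
import java.util.ArrayList;
import java.util.List;

public class Nms {

    // Intersection-over-union of two axis-aligned boxes
    private static double iou(Rect a, Rect b) {
        int x1 = Math.max(a.x, b.x);
        int y1 = Math.max(a.y, b.y);
        int x2 = Math.min(a.x + a.width, b.x + b.width);
        int y2 = Math.min(a.y + a.height, b.y + b.height);

        int interArea = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
        double unionArea = a.area() + b.area() - interArea;
        return unionArea <= 0 ? 0 : interArea / unionArea;
    }

    // Greedy NMS: keep the highest-confidence box, suppress any box that
    // overlaps it by more than iouThreshold, then repeat with the next
    // unsuppressed box. Returns the indices of the detections to keep.
    public static List<Integer> nonMaxSuppression(List<Rect> boxes, List<Float> scores, double iouThreshold) {
        // Sort detection indices by descending confidence
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < boxes.size(); i++) order.add(i);
        order.sort((i, j) -> Float.compare(scores.get(j), scores.get(i)));

        List<Integer> keep = new ArrayList<>();
        boolean[] suppressed = new boolean[boxes.size()];

        for (int idx : order) {
            if (suppressed[idx]) continue;
            keep.add(idx);
            for (int other : order) {
                if (!suppressed[other] && other != idx
                        && iou(boxes.get(idx), boxes.get(other)) > iouThreshold) {
                    suppressed[other] = true;
                }
            }
        }
        return keep;
    }
}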

Pseudocode

This is rough AI-generated code that has served our class well as a springboard for discussion.

// Import necessary libraries
import android.content.Context;

import org.opencv.core.Mat;
import org.opencv.core.Rect;
import org.opencv.imgproc.Imgproc;
import org.opencv.core.Scalar;
import org.openftc.easyopencv.OpenCvPipeline;
import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.support.common.FileUtil;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Define a class for the ObjectDetectionPipeline
public class ObjectDetectionPipeline extends OpenCvPipeline {

    // Maximum number of detections the model returns per frame
    private static final int MAX_DETECTIONS = 10;

    // Declare variables for the model, labels, and the latest inference output.
    // The output tensor shape [1][MAX_DETECTIONS][6] is an example; match it
    // to your own model's output.
    private Interpreter tfliteInterpreter;
    private List<String> labels;
    private float[][][] outputTensor = new float[1][MAX_DETECTIONS][6];

    // Constructor to load the TensorFlow Lite model and labels
    // (a Context is needed to read files from the app's assets on Android)
    public ObjectDetectionPipeline(Context context, String modelPath, String labelsPath) throws IOException {
        // Load the TensorFlow Lite model from the assets folder
        tfliteInterpreter = new Interpreter(FileUtil.loadMappedFile(context, modelPath));

        // Load labels from file (one label per line)
        labels = FileUtil.loadLabels(context, labelsPath);
    }

    @Override
    public Mat processFrame(Mat input) {

        // Pre-process the image for the TensorFlow Lite model (e.g., resize, normalize)
        Mat processedImage = preProcessImage(input);

        // Run inference on the pre-processed image
        runInference(processedImage);

        // Get detection results (bounding boxes and class IDs)
        List<Rect> boundingBoxes = getBoundingBoxes(input.width(), input.height());
        List<Integer> classIds = getClassIds();

        // Draw bounding boxes and labels on the original image
        for (int i = 0; i < boundingBoxes.size(); i++) {
            Rect box = boundingBoxes.get(i);
            int classId = classIds.get(i);
            String label = labels.get(classId);

            // Draw bounding box and label on the image using OpenCV functions
            Imgproc.rectangle(input, box.tl(), box.br(), new Scalar(255, 0, 0), 2);
            Imgproc.putText(input, label, box.tl(), Imgproc.FONT_HERSHEY_SIMPLEX, 1.0, new Scalar(255, 0, 0), 2);
        }
        return input;
    }

    // Helper methods to pre-process the image, run inference, and get results
    private Mat preProcessImage(Mat image) {
        // Implement your specific pre-processing steps here: resize to the
        // model's input dimensions, convert to the expected color order, and
        // normalize to a CV_32F Mat
        return image;
    }

    private void runInference(Mat image) {
        // Pack the pre-processed Mat into a direct ByteBuffer in the layout
        // the model expects (this assumes preProcessImage() produced a CV_32F
        // Mat matching the model's input shape)
        float[] pixels = new float[(int) (image.total() * image.channels())];
        image.get(0, 0, pixels);

        ByteBuffer inputBuffer = ByteBuffer.allocateDirect(pixels.length * 4)
                .order(ByteOrder.nativeOrder());
        inputBuffer.asFloatBuffer().put(pixels);

        // Run inference; the interpreter allocates tensors as needed and
        // fills outputTensor with the detections
        tfliteInterpreter.run(inputBuffer, outputTensor);
    }

    /**
     * This method is responsible for extracting the bounding box coordinates from the model's output tensor.
     * The implementation details will depend on the specific format of the output tensor produced by your
     * TensorFlow Lite model.
     *
     * Below is an example implementation that assumes the output tensor is a 3D float tensor with the
     * shape [batch_size, num_detections, 6].
     *
     * Each detection is represented as a vector of length 6, where the first 4 values represent the normalized
     * bounding box coordinates [y_min, x_min, y_max, x_max], and the last two values represent the class ID and
     * confidence score, respectively.
     *
     * Note: This is just an example implementation, and you will need to modify it according to your model's
     * output format.
     */
    private List<Rect> getBoundingBoxes(int imageWidth, int imageHeight) {
        List<Rect> boundingBoxes = new ArrayList<>();

        // Iterate over the detections written by runInference()
        for (int i = 0; i < outputTensor[0].length; i++) {
            // Extract the bounding box coordinates and confidence score
            float[] detection = outputTensor[0][i];
            float yMin = detection[0];
            float xMin = detection[1];
            float yMax = detection[2];
            float xMax = detection[3];
            float confidence = detection[5];

            // Apply a confidence threshold (e.g., 0.5)
            if (confidence > 0.5) {
                // Denormalize the bounding box coordinates to pixel units
                int left = (int) (xMin * imageWidth);
                int top = (int) (yMin * imageHeight);
                int right = (int) (xMax * imageWidth);
                int bottom = (int) (yMax * imageHeight);

                // Create a Rect object and add it to the list
                Rect rect = new Rect(left, top, right - left, bottom - top);
                boundingBoxes.add(rect);
            }
        }

        return boundingBoxes;
    }

    private List<Integer> getClassIds() {
        List<Integer> classIds = new ArrayList<>();

        // Walk the same detections with the same confidence threshold as
        // getBoundingBoxes(), so the two lists stay index-aligned
        for (int i = 0; i < outputTensor[0].length; i++) {
            float[] detection = outputTensor[0][i];
            if (detection[5] > 0.5) {
                classIds.add((int) detection[4]);
            }
        }

        return classIds;
    }
}
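
For reference, attaching this pipeline to a webcam in a LinearOpMode might look roughly like the sketch below. It assumes the constructor above; the device name "Webcam 1", the streaming resolution, and the asset file names are placeholders for your own configuration:

// Inside runOpMode() of a LinearOpMode
WebcamName webcamName = hardwareMap.get(WebcamName.class, "Webcam 1");
OpenCvWebcam camera = OpenCvCameraFactory.getInstance().createWebcam(webcamName);

// hardwareMap.appContext supplies the Android Context the constructor needs
ObjectDetectionPipeline pipeline;
try {
    pipeline = new ObjectDetectionPipeline(hardwareMap.appContext, "model.tflite", "labels.txt");
} catch (IOException e) {
    telemetry.addData("Error", "Could not load model: " + e.getMessage());
    telemetry.update();
    return;
}
camera.setPipeline(pipeline);

// Open the camera asynchronously and start streaming once it is ready
camera.openCameraDeviceAsync(new OpenCvCamera.AsyncCameraOpenListener() {
    @Override
    public void onOpened() {
        camera.startStreaming(640, 480, OpenCvCameraRotation.UPRIGHT);
    }

    @Override
    public void onError(int errorCode) {
        // Report the error (e.g., via telemetry) so it is visible at the field
    }
});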
