OpenCV began as an Intel research project in 1999, aiming to accelerate real-time computer vision tasks. Its open-source release in 2000 fostered a large developer community, and the library has since grown into a powerful, versatile toolkit used across fields like robotics, self-driving cars, and medical imaging.
OpenCV's success as a powerful yet general-purpose library sparked the development of more specialized solutions. EasyOpenCV (EOCV) emerged within the FIRST Tech Challenge (FTC) robotics community. It leverages OpenCV's core functionality but simplifies it and adds robot-specific features for easier integration with FTC robots, letting teams focus on their robot's vision-based tasks without needing in-depth computer vision expertise.
Basics of Our Structure
Vocabulary
Tensor: A multi-dimensional array used to represent data in machine learning models. Tensors are the fundamental data structures in deep learning frameworks like TensorFlow and PyTorch.
Inference: The process of running a trained machine learning model on new input data to obtain predictions or outputs.
Allocate Tensors: The process of reserving memory for the tensors in the model's input and output layers. This is typically done before running the model's inference on input data (see the interpreter sketch after this list).
Output Tensor: The tensor(s) produced by the machine learning model after running inference on the input data. The output tensor(s) contain the model's predictions or outputs, such as object bounding boxes, class probabilities, or segmentation masks.
Non-Maximum Suppression (NMS): A post-processing technique used in object detection to eliminate redundant overlapping bounding boxes for the same object. NMS helps ensure that only the most confident and accurate bounding box is retained for each object (see the NMS sketch after this list).
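To make the first four terms concrete, here is a minimal sketch of loading a model, allocating tensors, and running inference with the TensorFlow Lite Java Interpreter. The file name model.tflite and the input/output shapes are assumptions; every model defines its own.

import org.tensorflow.lite.Interpreter;
import java.io.File;

public class InferenceSketch {
    public static void main(String[] args) {
        // Load a trained model (the file name is a placeholder)
        try (Interpreter interpreter = new Interpreter(new File("model.tflite"))) {
            // Allocate tensors: reserve memory for the model's input and output layers
            interpreter.allocateTensors();

            // Input tensor: here a 1x300x300x3 image batch (shape depends on the model)
            float[][][][] input = new float[1][300][300][3];

            // Output tensor: here 10 detections of 6 values each (shape depends on the model)
            float[][][] output = new float[1][10][6];

            // Inference: run the trained model on the input to fill the output tensor
            interpreter.run(input, output);
        }
    }
}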
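And here is a minimal, self-contained sketch of Non-Maximum Suppression. The [xMin, yMin, xMax, yMax] box layout and the IoU threshold are illustrative assumptions, not tied to any particular model or library.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NmsSketch {
    // Returns the indices of the boxes to keep, highest confidence first.
    // boxes[i] = {xMin, yMin, xMax, yMax}; scores[i] = confidence of box i.
    static List<Integer> nonMaxSuppression(float[][] boxes, float[] scores, float iouThreshold) {
        // Visit boxes in order of descending confidence
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Float.compare(scores[b], scores[a]));

        List<Integer> keep = new ArrayList<>();
        boolean[] suppressed = new boolean[scores.length];
        for (int idx : order) {
            if (suppressed[idx]) continue;
            keep.add(idx);
            // Suppress any remaining box that overlaps this one too much
            for (int other : order) {
                if (!suppressed[other] && other != idx
                        && iou(boxes[idx], boxes[other]) > iouThreshold) {
                    suppressed[other] = true;
                }
            }
        }
        return keep;
    }

    // Intersection-over-Union of two axis-aligned boxes
    static float iou(float[] a, float[] b) {
        float w = Math.max(0f, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
        float h = Math.max(0f, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
        float inter = w * h;
        float union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter;
        return union <= 0f ? 0f : inter / union;
    }
}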
Pseudocode
This is rough AI-generated code that has served our class well as a springboard for discussion.
// Import necessary libraries
import org.opencv.core.Mat;
import org.opencv.core.Rect;
import org.opencv.core.Scalar;
import org.opencv.imgproc.Imgproc;
import org.openftc.easyopencv.OpenCvPipeline;
import org.tensorflow.lite.Interpreter;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Define a class for the object-detection pipeline
// (OpenCvPipeline is EasyOpenCV's base class for frame-processing pipelines)
public class ObjectDetectionPipeline extends OpenCvPipeline {

    // Maximum number of detections the model emits per frame (model-dependent)
    private static final int NUM_DETECTIONS = 10;

    // Declare variables for the model, labels, and per-frame state
    private final Interpreter tfliteInterpreter;
    private final String[] labels;
    private float[][][] outputTensor;    // filled by runInference()
    private int imageWidth, imageHeight; // dimensions of the current frame

    // Constructor to load the TensorFlow Lite model and labels
    public ObjectDetectionPipeline(String modelPath, String labelsPath) throws IOException {
        // Load the TensorFlow Lite model from file
        tfliteInterpreter = new Interpreter(new File(modelPath));

        // Load labels from file (each line represents a label)
        labels = Files.readAllLines(Paths.get(labelsPath)).toArray(new String[0]);
    }

    @Override
    public Mat processFrame(Mat input) {
        // Remember the frame size so detections can be denormalized later
        imageWidth = input.cols();
        imageHeight = input.rows();

        // Pre-process the image for the TensorFlow Lite model (e.g., resize, normalize)
        Mat processedImage = preProcessImage(input);

        // Run inference on the pre-processed image
        runInference(processedImage);

        // Get detection results (bounding boxes and class IDs)
        List<Rect> boundingBoxes = getBoundingBoxes();
        List<Integer> classIds = getClassIds();

        // Draw bounding boxes and labels on the original image
        for (int i = 0; i < boundingBoxes.size(); i++) {
            Rect box = boundingBoxes.get(i);
            String label = labels[classIds.get(i)];

            // Draw the bounding box and label using OpenCV functions
            Imgproc.rectangle(input, box.tl(), box.br(), new Scalar(255, 0, 0), 2);
            Imgproc.putText(input, label, box.tl(), Imgproc.FONT_HERSHEY_SIMPLEX, 1.0,
                    new Scalar(255, 0, 0), 2);
        }
        return input;
    }

    // Helper method to pre-process the image before inference
    private Mat preProcessImage(Mat image) {
        // Implement your specific pre-processing steps here
        // (e.g., Imgproc.resize to the model's input size, then normalize)
        return image;
    }

    private void runInference(Mat image) {
        // Allocate tensors based on the model's input and output requirements,
        // then run inference; the output shape here follows the format
        // documented above getBoundingBoxes()
        float[][][][] input = convertMatToInput(image);
        outputTensor = new float[1][NUM_DETECTIONS][6];
        tfliteInterpreter.run(input, outputTensor);
    }

    private float[][][][] convertMatToInput(Mat image) {
        // Copy pixel data from the Mat into a float tensor shaped to match
        // the model's input (implementation depends on the model)
        return new float[1][image.rows()][image.cols()][3];
    }

    /**
     * Extracts the bounding box coordinates from the model's output tensor.
     * The implementation details depend on the specific format of the output
     * tensor produced by your TensorFlow Lite model.
     *
     * This example assumes the output tensor is a float tensor of shape
     * [batch_size, num_detections, 6], where each detection is a vector of
     * length 6: the first four values are the normalized bounding box
     * coordinates [y_min, x_min, y_max, x_max], and the last two values are
     * the class ID and confidence score, respectively.
     *
     * Note: this is just an example implementation; modify it according to
     * your model's output format.
     */
    private List<Rect> getBoundingBoxes() {
        List<Rect> boundingBoxes = new ArrayList<>();

        // Iterate over the detections in the output tensor
        for (int i = 0; i < outputTensor[0].length; i++) {
            // Extract the bounding box coordinates and confidence score
            float[] detection = outputTensor[0][i];
            float yMin = detection[0];
            float xMin = detection[1];
            float yMax = detection[2];
            float xMax = detection[3];
            float confidence = detection[5];

            // Apply a confidence threshold (e.g., 0.5)
            if (confidence > 0.5f) {
                // Denormalize the bounding box coordinates to pixel units
                int left = (int) (xMin * imageWidth);
                int top = (int) (yMin * imageHeight);
                int right = (int) (xMax * imageWidth);
                int bottom = (int) (yMax * imageHeight);

                // Create a Rect object and add it to the list
                boundingBoxes.add(new Rect(left, top, right - left, bottom - top));
            }
        }
        return boundingBoxes;
    }

    private List<Integer> getClassIds() {
        // Extract class IDs for the detections that pass the same confidence
        // threshold, so this list lines up with getBoundingBoxes()
        List<Integer> classIds = new ArrayList<>();
        for (int i = 0; i < outputTensor[0].length; i++) {
            float[] detection = outputTensor[0][i];
            if (detection[5] > 0.5f) {
                classIds.add((int) detection[4]);
            }
        }
        return classIds;
    }
}
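For context, this is roughly how such a pipeline would be attached to a camera in an FTC OpMode using EasyOpenCV's standard camera API. The device name "Webcam 1", the model and label paths, and the 320x240 streaming resolution are assumptions to replace with your own configuration.

import com.qualcomm.robotcore.eventloop.opmode.LinearOpMode;
import com.qualcomm.robotcore.eventloop.opmode.TeleOp;
import org.firstinspires.ftc.robotcore.external.hardware.camera.WebcamName;
import org.openftc.easyopencv.OpenCvCamera;
import org.openftc.easyopencv.OpenCvCameraFactory;
import org.openftc.easyopencv.OpenCvCameraRotation;
import java.io.IOException;

@TeleOp
public class DetectionOpMode extends LinearOpMode {
    @Override
    public void runOpMode() throws InterruptedException {
        // Build the pipeline; the paths here are placeholders for your own files
        ObjectDetectionPipeline pipeline;
        try {
            pipeline = new ObjectDetectionPipeline("/sdcard/FIRST/model.tflite",
                    "/sdcard/FIRST/labels.txt");
        } catch (IOException e) {
            telemetry.addData("Error", "failed to load model: " + e.getMessage());
            telemetry.update();
            return;
        }

        // Look up the webcam and the live-preview view ID, then create the camera
        int monitorId = hardwareMap.appContext.getResources().getIdentifier(
                "cameraMonitorViewId", "id", hardwareMap.appContext.getPackageName());
        OpenCvCamera camera = OpenCvCameraFactory.getInstance().createWebcam(
                hardwareMap.get(WebcamName.class, "Webcam 1"), monitorId);
        camera.setPipeline(pipeline);

        // Open the camera asynchronously; frames then flow through processFrame()
        camera.openCameraDeviceAsync(new OpenCvCamera.AsyncCameraOpenListener() {
            @Override
            public void onOpened() {
                camera.startStreaming(320, 240, OpenCvCameraRotation.UPRIGHT);
            }

            @Override
            public void onError(int errorCode) {
                // Camera failed to open; report via telemetry if desired
            }
        });

        waitForStart();
        while (opModeIsActive()) {
            sleep(50); // the pipeline annotates frames in the background
        }
    }
}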