YOLOv10 Model Docker Training Environment Setup

Environment#

  1. docker compose

Cloning the Code#

Clone the repository with git:

git clone https://github.com/7emotions/robot-vision.git

Dataset Annotation#

Under the yolo/ folder (all later steps use this working directory), create dataset/images/ and dataset/labels/. Annotate the dataset with labelImg. The label format YOLO expects is

<class_id> <x_center> <y_center> <width> <height>
NOTE

Each label file must have the same filename (stem) as its corresponding image file.

Move the image files and the label files into dataset/images/ and dataset/labels/ respectively.
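As a sketch (filenames and values below are hypothetical), a small validator makes the format's constraints concrete: five whitespace-separated fields per line, with the four coordinates normalized to [0, 1].

```python
# Hypothetical sketch: validate one line of a YOLO-format label file.
# A line is "<class_id> <x_center> <y_center> <width> <height>",
# with the four coordinates normalized to [0, 1].

def parse_yolo_label(line: str):
    """Parse and validate a single YOLO label line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    class_id = int(parts[0])
    x, y, w, h = (float(p) for p in parts[1:])
    for name, v in zip(("x_center", "y_center", "width", "height"), (x, y, w, h)):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"{name}={v} is not normalized to [0, 1]")
    return class_id, x, y, w, h

print(parse_yolo_label("0 0.512 0.430 0.120 0.260"))
```

Running such a check over every .txt file before training catches malformed annotations early, before the trainer fails mid-epoch.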

Dataset Split#

Run split.py to split the dataset:

docker-compose run trainer python3 split.py

split.py assigns 80% of the dataset to the training set and 20% to the validation set. For negative samples with no label file, it creates an empty label file.

After the split, the directory structure under dataset/ is:

dataset/
├── images/
│   ├── train/
│   │   ├── image1.jpg
│   │   ├── image2.png
│   │   └── ...
│   └── val/
│       ├── image3.jpg
│       ├── image4.jpeg
│       └── ...
└── labels/
    ├── train/
    │   ├── image1.txt
    │   ├── image2.txt
    │   └── ...
    └── val/
        ├── image3.txt
        ├── image4.txt
        └── ...
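The 80/20 split with blank labels for negatives can be sketched as follows — this is not the repository's actual split.py, and the random seed and path layout are assumptions:

```python
# Hypothetical sketch of split.py's behavior: shuffle the images, move 80%
# into train/ and 20% into val/, moving each matching label alongside and
# creating an empty label file for any image without one (a negative sample).
import random
import shutil
from pathlib import Path

def split_dataset(root: Path, train_ratio: float = 0.8, seed: int = 0) -> None:
    images = sorted((root / "images").glob("*.*"))
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * train_ratio)
    for subset, subset_images in (("train", images[:n_train]),
                                  ("val", images[n_train:])):
        (root / "images" / subset).mkdir(parents=True, exist_ok=True)
        (root / "labels" / subset).mkdir(parents=True, exist_ok=True)
        for img in subset_images:
            src_label = root / "labels" / (img.stem + ".txt")
            dst_label = root / "labels" / subset / (img.stem + ".txt")
            if src_label.exists():
                shutil.move(str(src_label), str(dst_label))
            else:
                dst_label.touch()  # blank label for a negative sample
            shutil.move(str(img), str(root / "images" / subset / img.name))
```

After running it, every image in train/ and val/ has a label file of the same stem, empty or not, which is exactly the invariant the directory tree above relies on.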

Data Configuration File#

Before training, you need a data configuration file. Edit conf.yml:

train: dataset/images/train
val: dataset/images/val
nc: 7
names: ["red target", "blue area", "blue target", "starting point", "black target","yellow target","red area"]

Here, nc is the number of classes and names is the list of class names.
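A quick sanity check worth doing before launching a long training run is confirming that nc equals the length of names — a mismatch is a common source of training-time errors or silently misnamed classes. A minimal sketch, mirroring the conf.yml values above as a plain dict:

```python
# Mirror of the conf.yml fields above; a trainer will error out (or mislabel
# classes) if nc does not match the length of the names list.
conf = {
    "train": "dataset/images/train",
    "val": "dataset/images/val",
    "nc": 7,
    "names": ["red target", "blue area", "blue target", "starting point",
              "black target", "yellow target", "red area"],
}
assert conf["nc"] == len(conf["names"]), "nc must equal len(names)"
print("conf.yml is consistent:", conf["nc"], "classes")
```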

Pretrained Weights#

Pretrained weight files are provided in the Releases of the yolov10 repository. This guide uses yolov10s.pt as an example.

https://github.com/THU-MIG/yolov10
NOTE

The YOLOv10 family includes several variants of different sizes and complexity to match different compute budgets and accuracy targets, distinguished by suffixes such as -n (nano), -s (small), -m (medium), -l (large), and -x (extra large).

NOTE

If you switch to a different pretrained weights file, replace yolov10s.pt in compose.yml accordingly.
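For reference, a minimal compose.yml for this setup might look like the following sketch. This is not the repository's actual file — the base image, volume mapping, and training arguments are all assumptions:

```yaml
services:
  trainer:
    image: ultralytics/ultralytics:latest   # assumed base image providing the yolo CLI
    volumes:
      - ./yolo:/workspace                   # assumed mount of the working directory
    working_dir: /workspace
    command: yolo detect train model=yolov10s.pt data=conf.yml
```

Whatever the real file contains, the weights filename appears in the `command:` line, which is the spot to edit when swapping variants.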

Model Training#

Start the container to begin training:

docker-compose up -d

Follow the logs:

docker-compose logs trainer -f

The model outputs are written under the runs/ directory, where you can inspect results such as:

labels.jpg

labels_correlogram.jpg

train_batch0.jpg

train_batch1.jpg

train_batch2.jpg

Model Export#

The .pt files saved under yolo/runs/detect/train/weights/ can be exported to .onnx model files with the following command:

docker-compose run trainer yolo export model=runs/detect/train/weights/best.pt format=onnx

The .onnx model is saved in the yolo/runs/detect/train/weights/ directory.

Using the Model#

Taking C++ as an example, load the .onnx model with OpenCV's DNN module.

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
#include <format>

int main() {
    std::string onnx_model_path = "best.onnx";
    std::string classes_path = "classes.txt";
    std::string image_path = "test.jpg";

    float confidenceThreshold = 0.5;
    float nmsThreshold = 0.4;

    cv::dnn::Net net = cv::dnn::readNetFromONNX(onnx_model_path);
    if (net.empty()) {
        std::cerr << "Error: Could not load ONNX model: " << onnx_model_path << std::endl;
        return -1;
    }
    std::cout << "ONNX model loaded successfully." << std::endl;

    std::vector<std::string> classes;
    std::ifstream ifs(classes_path);
    std::string line;
    if (ifs.is_open()) {
        while (getline(ifs, line)) {
            classes.push_back(line);
        }
    } else {
        std::cerr << "Error: Could not open classes file: " << classes_path << std::endl;
        return -1;
    }
    std::cout << "Loaded " << classes.size() << " classes." << std::endl;

    cv::Mat frame = cv::imread(image_path);
    if (frame.empty()) {
        std::cerr << "Error: Could not read image: " << image_path << std::endl;
        return -1;
    }
    int frameWidth = frame.cols;
    int frameHeight = frame.rows;

    cv::Mat blob;
    cv::dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(640, 640), cv::Scalar(0, 0, 0), true, false);
    net.setInput(blob);

    cv::Mat output = net.forward();

    std::vector<int> classIds;
    std::vector<float> confidences;
    std::vector<cv::Rect> boxes;

    int rows = output.size[2];
    int cols = output.size[3];
    // View the 4D [1, 1, rows, cols] blob as a 2D rows x cols matrix so that
    // row()/at() address individual detections correctly.
    cv::Mat detections = output.reshape(1, rows);

    // Note: this parsing assumes a YOLOv5-style [1, 1, N, 5 + nc] layout with
    // normalized coordinates; inspect your exported model's actual output
    // shape (e.g. with Netron), since YOLOv10 end-to-end exports may differ.
    for (int i = 0; i < rows; ++i) {
        float confidence = detections.at<float>(i, 4);

        if (confidence > confidenceThreshold) {
            cv::Mat scores = detections.row(i).colRange(5, cols);
            cv::Point classIdPoint;
            double maxScore;
            cv::minMaxLoc(scores, 0, &maxScore, 0, &classIdPoint);
            int classId = classIdPoint.x;

            if (maxScore > confidenceThreshold) {
                float centerX = detections.at<float>(i, 0) * frameWidth;
                float centerY = detections.at<float>(i, 1) * frameHeight;
                float width = detections.at<float>(i, 2) * frameWidth;
                float height = detections.at<float>(i, 3) * frameHeight;
                cv::Rect box(cv::Point(cvRound(centerX - width / 2), cvRound(centerY - height / 2)),
                             cv::Size(cvRound(width), cvRound(height)));

                classIds.push_back(classId);
                confidences.push_back(confidence * maxScore);
                boxes.push_back(box);
            }
        }
    }

    std::vector<int> indices;
    cv::dnn::NMSBoxes(boxes, confidences, confidenceThreshold, nmsThreshold, indices);

    for (int idx : indices) {
        cv::Rect box = boxes[idx];
        int classId = classIds[idx];
        float confidence = confidences[idx];

        cv::rectangle(frame, box, cv::Scalar(0, 255, 0), 2);

        std::string label = classes[classId] + ": " + std::format("{:.2f}", confidence);
        cv::putText(frame, label, cv::Point(box.x, box.y - 10), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 0), 2);
    }

    cv::imshow("Detected Objects", frame);
    cv::waitKey(0);
    cv::destroyAllWindows();

    return 0;
}
YOLOv10 Model Docker Training Environment Setup
https://lorenzofeng.top/posts/yolov10s/yolov10s/
Author: Lorenzo Feng
Published: 2025-04-17
License: CC BY-NC-SA 4.0