Mirror of https://github.com/ultralytics/ultralytics.git (synced 2025-09-15 15:48:41 +08:00)
YOLOE: Optimize redundant predictor initialization (#21198)
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Laughing-q <1185102784@qq.com>
This commit is contained in: parent b9cf82ebbf, commit 09f9299d8d
@ -151,17 +151,14 @@ If you prefer not to open-source your project, consider obtaining an [Enterprise
Complying means making the **complete corresponding source code** of your project publicly available under the AGPL-3.0 license.

1. **Choose Your Starting Point:**

    - **Fork Ultralytics YOLO:** Directly fork the [Ultralytics YOLO repository](https://github.com/ultralytics/ultralytics) if building closely upon it.
    - **Use Ultralytics Template:** Start with the [Ultralytics template repository](https://github.com/ultralytics/template) for a clean, modular setup integrating YOLO.

2. **License Your Project:**

    - Add a `LICENSE` file containing the full text of the [AGPL-3.0 license](https://opensource.org/license/agpl-v3).
    - Add a notice at the top of each source file indicating the license.

3. **Publish Your Source Code:**

    - Make your **entire project's source code** publicly accessible (e.g., on GitHub). This includes:
        - The complete larger application or system that incorporates the YOLO model or code.
        - Any modifications made to the original Ultralytics YOLO code.
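As a minimal sketch of step 2, a per-file license notice might look like the following; the file name and project name are hypothetical placeholders, not part of the original docs:

```python
# my_project/app.py  (hypothetical file in an AGPL-3.0-licensed project)
# This file is part of MyProject.
# MyProject is free software, released under the GNU Affero General Public License v3.0 (AGPL-3.0).
# See the LICENSE file in the repository root for the full license text.

from ultralytics import YOLO  # AGPL-3.0-licensed Ultralytics code used by this project

model = YOLO("yolo11n.pt")
results = model("https://ultralytics.com/images/bus.jpg")
```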
@ -74,11 +74,9 @@ Other tasks, like [object detection](../tasks/detect.md), are not suitable as th
The order of model selection, dataset preparation, and training approach depends on the specifics of your project. Here are a few tips to help you decide:

- **Clear Understanding of the Problem**: If your problem and objectives are well-defined, start with model selection. Then, prepare your dataset and decide on the training approach based on the model's requirements.

    - **Example**: Start by selecting a model for a traffic monitoring system that estimates vehicle speeds. Choose an object tracking model, gather and annotate highway videos, and then train the model with techniques for real-time video processing.

- **Unique or Limited Data**: If your project is constrained by unique or limited data, begin with dataset preparation. For instance, if you have a rare dataset of medical images, annotate and prepare the data first. Then, select a model that performs well on such data, followed by choosing a suitable training approach.

    - **Example**: Prepare the data first for a facial recognition system with a small dataset. Annotate it, then select a model that works well with limited data, such as a pre-trained model for [transfer learning](https://www.ultralytics.com/glossary/transfer-learning). Finally, decide on a training approach, including [data augmentation](https://www.ultralytics.com/glossary/data-augmentation), to expand the dataset.

- **Need for Experimentation**: In projects where experimentation is crucial, start with the training approach. This is common in research projects where you might initially test different training techniques. Refine your model selection after identifying a promising method and prepare the dataset based on your findings.
@ -62,7 +62,6 @@ After performing the [Segment Task](../tasks/segment.md), it's sometimes desirab
    # (1) Get detection class name
    label = c.names[c.boxes.cls.tolist().pop()]
    ```

1. To learn more about working with detection results, see [Boxes Section for Predict Mode](../modes/predict.md#boxes).
2. To learn more about `predict()` results, see [Working with Results for Predict Mode](../modes/predict.md#working-with-results).
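For context, a self-contained sketch of how this lookup fits into a full prediction loop; the checkpoint and image URL are illustrative:

```python
from ultralytics import YOLO

model = YOLO("yolo11n-seg.pt")  # illustrative segmentation checkpoint
results = model("https://ultralytics.com/images/bus.jpg")

for c in results[0]:  # iterating a Results object yields one single-instance Results per detection
    label = c.names[c.boxes.cls.tolist().pop()]  # class name for this detection
    print(label)
```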
@ -93,7 +92,6 @@ After performing the [Segment Task](../tasks/segment.md), it's sometimes desirab
    # Draw contour onto mask
    _ = cv2.drawContours(b_mask, [contour], -1, (255, 255, 255), cv2.FILLED)
    ```

1. For more info on `c.masks.xy`, see [Masks Section from Predict Mode](../modes/predict.md#masks).

2. Here the values are cast into `np.int32` for compatibility with the `drawContours()` function from [OpenCV](https://www.ultralytics.com/glossary/opencv).
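A minimal sketch of the surrounding steps, assuming `img` is the source image and `c` is a single-instance result from the loop above (both come from earlier steps of this guide, not from this snippet):

```python
import cv2
import numpy as np

# Assumes `img` (BGR image array) and `c` (single-instance Results) were produced by the preceding steps
b_mask = np.zeros(img.shape[:2], np.uint8)  # blank single-channel mask
contour = c.masks.xy.pop().astype(np.int32).reshape(-1, 1, 2)  # polygon points in drawContours layout
_ = cv2.drawContours(b_mask, [contour], -1, (255, 255, 255), cv2.FILLED)
```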
@ -234,7 +232,6 @@ After performing the [Segment Task](../tasks/segment.md), it's sometimes desirab
    ***

6. <u>What to do next is entirely left to you as the developer.</u> A basic example of one possible next step (saving the image to file for future use) is shown.

    - **NOTE:** this step is optional and can be skipped if not required for your specific use case.

    ??? example "Example Final Step"
@ -38,7 +38,6 @@ Without further ado, let's dive in!
    | Watermelon   | 1976  |

- Necessary Python packages include:

    - `ultralytics`
    - `sklearn`
    - `pandas`
@ -47,7 +46,6 @@ Without further ado, let's dive in!
- This tutorial operates with `k=5` folds. However, you should determine the best number of folds for your specific dataset.

1. Initiate a new Python virtual environment (`venv`) for your project and activate it. Use `pip` (or your preferred package manager) to install:

    - The Ultralytics library: `pip install -U ultralytics`. Alternatively, you can clone the official [repo](https://github.com/ultralytics/ultralytics).
    - Scikit-learn, pandas, and PyYAML: `pip install -U scikit-learn pandas pyyaml`.
@ -129,7 +127,6 @@ The rows index the label files, each corresponding to an image in your dataset,
## K-Fold Dataset Split

1. Now we will use the `KFold` class from `sklearn.model_selection` to generate `k` splits of the dataset.

    - Important:
        - Setting `shuffle=True` ensures a randomized distribution of classes in your splits.
        - By setting `random_state=M` where `M` is a chosen integer, you can obtain repeatable results.
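A short sketch of what such a split can look like; the label-directory path and the seed value are assumptions for illustration, not part of the original guide:

```python
from pathlib import Path

from sklearn.model_selection import KFold

labels = sorted(Path("labels").glob("*.txt"))  # assumed directory of YOLO-format label files
ksplit = 5
kf = KFold(n_splits=ksplit, shuffle=True, random_state=20)  # shuffle mixes classes; fixed seed gives repeatable folds
kfolds = list(kf.split(labels))  # list of (train_indices, val_indices) tuples, one per fold
```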
@ -218,7 +215,6 @@ The rows index the label files, each corresponding to an image in your dataset,
    ```

5. Lastly, copy images and labels into the respective directory ('train' or 'val') for each split.

    - **NOTE:** The time required for this portion of the code will vary based on the size of your dataset and your system hardware.

    ```python
@ -145,12 +145,10 @@ You can also fine-tune optimizer parameters to improve model performance. Adjust
Different optimizers have various strengths and weaknesses. Let's take a glimpse at a few common optimizers.

- **SGD (Stochastic Gradient Descent)**:

    - Updates model parameters using the gradient of the loss function with respect to the parameters.
    - Simple and efficient but can be slow to converge and might get stuck in local minima.

- **[Adam](https://www.ultralytics.com/glossary/adam-optimizer) (Adaptive Moment Estimation)**:

    - Combines the benefits of both SGD with momentum and RMSProp.
    - Adjusts the learning rate for each parameter based on estimates of the first and second moments of the gradients.
    - Well-suited for noisy data and sparse gradients.
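For reference, the optimizer can be selected directly in the Ultralytics training call; a brief sketch in which the dataset, epoch count, and learning rate are illustrative values:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
# `optimizer` accepts values such as "SGD", "Adam", "AdamW", or "auto" (the default)
model.train(data="coco8.yaml", epochs=10, optimizer="AdamW", lr0=0.001)
```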
@ -48,11 +48,9 @@ The first step in any computer vision project is clearly defining the problem yo
Here are some examples of project objectives and the computer vision tasks that can be used to reach these objectives:

- **Objective:** To develop a system that can monitor and manage the flow of different vehicle types on highways, improving traffic management and safety.

    - **Computer Vision Task:** Object detection is ideal for traffic monitoring because it efficiently locates and identifies multiple vehicles. It is less computationally demanding than image segmentation, which provides unnecessary detail for this task, ensuring faster, real-time analysis.

- **Objective:** To develop a tool that assists radiologists by providing precise, pixel-level outlines of tumors in medical imaging scans.

    - **Computer Vision Task:** Image segmentation is suitable for medical imaging because it provides accurate and detailed boundaries of tumors that are crucial for assessing size, shape, and treatment planning.

- **Objective:** To create a digital system that categorizes various documents (e.g., invoices, receipts, legal paperwork) to improve organizational efficiency and document retrieval.
@ -55,7 +55,6 @@ The VSCode compatible protocols for viewing images using the integrated terminal
    # Plot inference results
    plot = results[0].plot()  # (1)!
    ```

    1. See [plot method parameters](../modes/predict.md#plot-method-parameters) for the available arguments.

4. Now, use [OpenCV](https://www.ultralytics.com/glossary/opencv) to convert the `numpy.ndarray` to `bytes` data. Then use `io.BytesIO` to make a "file-like" object.
@ -74,7 +73,6 @@ The VSCode compatible protocols for viewing images using the integrated terminal
    # Image bytes as a file-like object
    mem_file = io.BytesIO(im_bytes)
    ```

    1. It's possible to use other image extensions as well.
    2. Only the object at index `1` of the returned tuple is needed.
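Putting steps 3 and 4 together, a hedged sketch that assumes `plot` is the array returned by `results[0].plot()` above:

```python
import io

import cv2

# `plot` is assumed to be the numpy.ndarray returned by results[0].plot()
im_bytes = cv2.imencode(".png", plot)[1].tobytes()  # index 1 of the returned tuple holds the encoded image
mem_file = io.BytesIO(im_bytes)  # "file-like" object usable by terminal image protocols
```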
@ -39,7 +39,6 @@ Installation errors can arise due to various reasons, such as incompatible versi
Additionally, here are some common installation issues users have encountered, along with their respective solutions:

- Import Errors or Dependency Issues - If you're getting errors during the import of YOLO11, or you're having issues related to dependencies, consider the following troubleshooting steps:

    - **Fresh Installation**: Sometimes, starting with a fresh installation can resolve unexpected issues, especially with libraries like Ultralytics, where updates might introduce changes to the file tree structure or functionalities.

    - **Update Regularly**: Ensure you're using the latest version of the library. Older versions might not be compatible with recent updates, leading to potential conflicts or issues.
@ -51,7 +50,6 @@ Additionally, here are some common installation issues users have encountered, a
    - Remember, keeping your libraries and dependencies up-to-date is crucial for a smooth and error-free experience.

- Running YOLO11 on GPU - If you're having trouble running YOLO11 on GPU, consider the following troubleshooting steps:

    - **Verify CUDA Compatibility and Installation**: Ensure your GPU is CUDA compatible and that CUDA is correctly installed. Use the `nvidia-smi` command to check the status of your NVIDIA GPU and CUDA version.

    - **Check PyTorch and CUDA Integration**: Ensure PyTorch can utilize CUDA by running `import torch; print(torch.cuda.is_available())` in a Python terminal. If it returns `True`, PyTorch is set up to use CUDA.
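A short sketch combining the CUDA check with a GPU inference call; the model weights and image URL are illustrative:

```python
import torch

from ultralytics import YOLO

print(torch.cuda.is_available())  # True means PyTorch can see a CUDA-capable GPU

model = YOLO("yolo11n.pt")
model.predict("https://ultralytics.com/images/bus.jpg", device=0)  # device=0 selects the first GPU
```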
@ -56,7 +56,6 @@ One of the sections of the output is the class-wise breakdown of performance met
- **Instances**: This provides the count of how many times the class appears across all images in the validation set.

- **Box(P, R, mAP50, mAP50-95)**: This metric provides insights into the model's performance in detecting objects:

    - **P (Precision)**: The accuracy of the detected objects, indicating how many detections were correct.

    - **R (Recall)**: The ability of the model to identify all instances of objects in the images.
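A brief sketch of how this per-class table is produced and how aggregate values can be read back; the dataset YAML is illustrative:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
metrics = model.val(data="coco8.yaml")  # prints the class-wise Box(P, R, mAP50, mAP50-95) breakdown
print(metrics.box.map50, metrics.box.map)  # aggregate mAP50 and mAP50-95
```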
@ -230,17 +230,14 @@ If you prefer not to open-source your project, consider obtaining an [Enterprise
Complying means making the **complete corresponding source code** of your project publicly available under the AGPL-3.0 license.

1. **Choose Your Starting Point:**

    - **Fork Ultralytics YOLO:** Directly fork the [Ultralytics YOLO repository](https://github.com/ultralytics/ultralytics) if building closely upon it.
    - **Use Ultralytics Template:** Start with the [Ultralytics template repository](https://github.com/ultralytics/template) for a clean, modular setup integrating YOLO.

2. **License Your Project:**

    - Add a `LICENSE` file containing the full text of the [AGPL-3.0 license](https://opensource.org/license/agpl-v3).
    - Add a notice at the top of each source file indicating the license.

3. **Publish Your Source Code:**

    - Make your **entire project's source code** publicly accessible (e.g., on GitHub). This includes:
        - The complete larger application or system that incorporates the YOLO model or code.
        - Any modifications made to the original Ultralytics YOLO code.
@ -117,22 +117,18 @@ By clicking on the URL link to the ClearML results page in the output of the usa
#### Key Features of the ClearML Results Page

- **Real-Time Metrics Tracking**

    - Track critical metrics like loss, [accuracy](https://www.ultralytics.com/glossary/accuracy), and validation scores as they occur.
    - Provides immediate feedback for timely model performance adjustments.

- **Experiment Comparison**

    - Compare different training runs side-by-side.
    - Essential for [hyperparameter tuning](https://www.ultralytics.com/glossary/hyperparameter-tuning) and identifying the most effective models.

- **Detailed Logs and Outputs**

    - Access comprehensive logs, graphical representations of metrics, and console outputs.
    - Gain a deeper understanding of model behavior and issue resolution.

- **Resource Utilization Monitoring**

    - Monitor the utilization of computational resources, including CPU, GPU, and memory.
    - Key to optimizing training efficiency and costs.
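For orientation, a minimal sketch of the usage that produces such a results page, assuming the `clearml` package is installed and configured; the project and task names are placeholders:

```python
from clearml import Task

from ultralytics import YOLO

task = Task.init(project_name="my_project", task_name="yolo11n_coco8_run")  # placeholder names
model = YOLO("yolo11n.pt")
model.train(data="coco8.yaml", epochs=3)  # metrics, logs, and resource usage are captured in ClearML
```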
@ -45,7 +45,6 @@ Before we look at the code for exporting YOLO11 models to the CoreML format, let
CoreML offers various deployment options for machine learning models, including:

- **On-Device Deployment**: This method directly integrates CoreML models into your iOS app. It's particularly advantageous for ensuring low latency, enhanced privacy (since data remains on the device), and offline functionality. This approach, however, may be limited by the device's hardware capabilities, especially for larger and more complex models. On-device deployment can be executed in the following two ways.

    - **Embedded Models**: These models are included in the app bundle and are immediately accessible. They are ideal for small models that do not require frequent updates.

    - **Downloaded Models**: These models are fetched from a server as needed. This approach is suitable for larger models or those needing regular updates. It helps keep the app bundle size smaller.
@ -192,7 +191,6 @@ For more details on integrating your CoreML model into an iOS app, check out the
Once you export your YOLO11 model to CoreML format, you have multiple deployment options:

1. **On-Device Deployment**: Directly integrate CoreML models into your app for enhanced privacy and offline functionality. This can be done as:

    - **Embedded Models**: Included in the app bundle, accessible immediately.
    - **Downloaded Models**: Fetched from a server as needed, keeping the app bundle size smaller.
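For reference, the export itself is a single call; a minimal sketch using the standard Ultralytics export API:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.export(format="coreml")  # produces a CoreML package (e.g., yolo11n.mlpackage) for Apple tooling
```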
@ -203,12 +203,10 @@ JupyterLab's interactive environment allows for quick iterations and real-time f
When working with JupyterLab and YOLO11, you might encounter some common issues. Here's how to handle them:

1. GPU memory issues:

    - Use `torch.cuda.empty_cache()` to clear GPU memory between runs.
    - Adjust [batch size](https://www.ultralytics.com/glossary/batch-size) or image size to fit your GPU memory.

2. Package conflicts:

    - Create a separate conda environment for your YOLO11 projects to avoid conflicts.
    - Use `!pip install package_name` in a notebook cell to install missing packages.
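A hedged sketch of the memory-related workarounds in a notebook cell; the batch size and image size shown are arbitrary starting points, not recommendations:

```python
import torch

from ultralytics import YOLO

torch.cuda.empty_cache()  # release cached GPU memory between runs

model = YOLO("yolo11n.pt")
model.train(data="coco8.yaml", epochs=3, batch=8, imgsz=320)  # smaller batch/imgsz to fit limited VRAM
```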
@ -129,7 +129,6 @@ Scalars in the TensorBoard are crucial for plotting and analyzing simple metrics
- **Learning Rate (lr) Tags**: These tags show the variations in the learning rate across different segments (e.g., `pg0`, `pg1`, `pg2`). This helps us understand the impact of learning rate adjustments on the training process.

- **Metrics Tags**: Scalars include performance indicators such as:

    - `mAP50 (B)`: Mean Average [Precision](https://www.ultralytics.com/glossary/precision) at 50% [Intersection over Union](https://www.ultralytics.com/glossary/intersection-over-union-iou) (IoU), crucial for assessing object detection accuracy.

    - `mAP50-95 (B)`: [Mean Average Precision](https://www.ultralytics.com/glossary/mean-average-precision-map) calculated over a range of IoU thresholds, offering a more comprehensive evaluation of accuracy.
@ -145,7 +145,6 @@ When processing implicitly quantized networks TensorRT uses INT8 opportunistical
The arguments provided when using [export](../modes/export.md) for an Ultralytics YOLO model will **greatly** influence the performance of the exported model. They also need to be selected based on the device resources available; however, the default arguments _should_ work for most [Ampere (or newer) NVIDIA discrete GPUs](https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/). The calibration algorithm used is `"MINMAX_CALIBRATION"`, and you can read more details about the options available [in the TensorRT Developer Guide](https://docs.nvidia.com/deeplearning/tensorrt/latest/_static/python-api/infer/Int8/MinMaxCalibrator.html). Ultralytics tests found that `"MINMAX_CALIBRATION"` was the best choice, and exports are fixed to using this algorithm.

- `workspace`: Controls the size (in GiB) of the device memory allocation while converting the model weights.

    - Adjust the `workspace` value according to your calibration needs and resource availability. While a larger `workspace` may increase calibration time, it allows TensorRT to explore a wider range of optimization tactics, potentially enhancing model performance and [accuracy](https://www.ultralytics.com/glossary/accuracy). Conversely, a smaller `workspace` can reduce calibration time but may limit the optimization strategies, affecting the quality of the quantized model.

    - The default is `workspace=None`, which allows TensorRT to allocate memory automatically. When configuring manually, this value may need to be increased if calibration crashes (exits without warning).
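A hedged sketch of an INT8 TensorRT export; the dataset YAML and workspace size are illustrative and should match your calibration data and available GPU memory:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
# INT8 export calibrates on the dataset referenced by `data`; `workspace` is the allocation size in GiB
model.export(format="engine", int8=True, data="coco8.yaml", workspace=4)
```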
@ -551,7 +550,6 @@ These guides will help you integrate YOLOv8 models efficiently in various deploy
Performance improvements with TensorRT can vary based on the hardware used. Here are some typical benchmarks:

- **NVIDIA A100**:

    - **FP32** Inference: ~0.52 ms / image
    - **FP16** Inference: ~0.34 ms / image
    - **INT8** Inference: ~0.28 ms / image
@ -254,7 +254,6 @@ FastSAM is also available directly from the [https://github.com/CASIA-IVA-Lab/Fa
1. Download a [model checkpoint](https://drive.google.com/file/d/1m1sjY4ihXBU1fZXdQ-Xdj-mDltW-2Rqv/view?usp=sharing).

2. Use FastSAM for inference. Example commands:

    - Segment everything in an image:

        ```bash
@ -113,13 +113,11 @@ The examples below focus on YOLO12 [Detect](../tasks/detect.md) models (for obje
## Key Improvements

1. **Enhanced [Feature Extraction](https://www.ultralytics.com/glossary/feature-extraction)**:

    - **Area Attention**: Efficiently handles large [receptive fields](https://www.ultralytics.com/glossary/receptive-field), reducing computational cost.
    - **Optimized Balance**: Improved balance between attention and feed-forward network computations.
    - **R-ELAN**: Enhances feature aggregation using the R-ELAN architecture.

2. **Optimization Innovations**:

    - **Residual Connections**: Introduces residual connections with scaling to stabilize training, especially in larger models.
    - **Refined Feature Integration**: Implements an improved method for feature integration within R-ELAN.
    - **FlashAttention**: Incorporates FlashAttention to reduce memory access overhead.
@ -757,7 +757,6 @@ Quickly set up YOLOE with Ultralytics by following these steps:
Pre-trained YOLOE models (e.g., YOLOE-v8-S/L, YOLOE-11 variants) are available from the YOLOE GitHub releases. Simply download your desired `.pt` file to load into the Ultralytics YOLO class.

3. **Hardware Requirements**:

    - **Inference**: Recommended GPU (NVIDIA with ≥4-8GB VRAM). Small models run efficiently on edge GPUs (e.g., [Jetson](../guides/nvidia-jetson.md)) or CPUs at lower resolutions.
    - **Training**: Fine-tuning YOLOE on custom data typically requires just one GPU. Extensive open-vocabulary pre-training (LVIS/Objects365) used by the authors required substantial compute (8× RTX 4090 GPUs).
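Loading a downloaded checkpoint and running a text-prompted prediction is short; a minimal sketch in which the checkpoint name, class list, and image path are illustrative:

```python
from ultralytics import YOLOE

model = YOLOE("yoloe-11s-seg.pt")  # downloaded pre-trained YOLOE checkpoint
names = ["person", "bus"]  # example text prompts
model.set_classes(names, model.get_text_pe(names))  # set the vocabulary before predicting
results = model.predict("https://ultralytics.com/images/bus.jpg")
```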
@ -765,7 +764,6 @@ Quickly set up YOLOE with Ultralytics by following these steps:
YOLOE configurations use standard Ultralytics YAML files. Default configs (e.g., `yoloe-11s-seg.yaml`) typically suffice, but you can modify backbone, classes, or image size as needed.

5. **Running YOLOE**:

    - **Quick inference** (prompt-free):

        ```bash
        yolo predict model=yoloe-11s-seg-pf.pt source="image.jpg"
@ -217,6 +217,9 @@ waxmann.sergiu@me.com:
web@ultralytics.com:
  avatar: https://avatars.githubusercontent.com/u/135830346?v=4
  username: UltralyticsAssistant
+ willie.maddox@gmail.com:
+   avatar: null
+   username: null
work.ankanghosh@gmail.com:
  avatar: https://avatars.githubusercontent.com/u/79740115?v=4
  username: ankanpy
|||||||
@ -26,7 +26,6 @@ This repository provides a [Rust](https://www.rust-lang.org/) demo showcasing ke
|
|||||||
<summary>You have two options to link the ONNXRuntime library:</summary>
|
<summary>You have two options to link the ONNXRuntime library:</summary>
|
||||||
|
|
||||||
- **Option 1: Manual Linking**
|
- **Option 1: Manual Linking**
|
||||||
|
|
||||||
- For detailed setup instructions, consult the [ONNX Runtime linking documentation](https://ort.pyke.io/setup/linking).
|
- For detailed setup instructions, consult the [ONNX Runtime linking documentation](https://ort.pyke.io/setup/linking).
|
||||||
- **Linux or macOS**:
|
- **Linux or macOS**:
|
||||||
1. Download the appropriate ONNX Runtime package from the official [Releases page](https://github.com/microsoft/onnxruntime/releases).
|
1. Download the appropriate ONNX Runtime package from the official [Releases page](https://github.com/microsoft/onnxruntime/releases).
|
||||||
|
|||||||
@ -73,7 +73,6 @@ Follow these steps to build the project:
    ```

    **Note:**

    - The library file extensions (`.a` for static) and paths might vary based on your operating system (e.g., use `.lib` on Windows) and build configuration. Adjust the commands accordingly.
    - This example uses static linking (`.a` files). If you built shared libraries (`.so`, `.dylib`, `.dll`), ensure they are correctly placed or accessible in your system's library path.
@ -120,7 +120,6 @@ Ensure you have the following dependencies installed:
    ```

    **CMake Options:**

    - `-DONNXRUNTIME_ROOT=<path>`: **(Required)** Path to the extracted ONNX Runtime library.
    - `-DCMAKE_BUILD_TYPE=Release`: (Optional) Build in Release mode for optimizations.
    - If CMake struggles to find OpenCV, you might need to set `-DOpenCV_DIR=/path/to/opencv/build`.
@ -59,7 +59,6 @@ Follow these steps to run inference with your exported YOLOv8 TFLite model.
      --iou 0.45 \
      --metadata yolov8n_saved_model/metadata.yaml
    ```

    - `--model`: Path to the exported `.tflite` model file.
    - `--img`: Path to the input image for detection.
    - `--conf`: Minimum [confidence threshold](https://www.ultralytics.com/glossary/confidence) for detections (e.g., 0.25).
@ -406,18 +406,18 @@ class YOLOE(Model):
                     f"Expected equal number of bounding boxes and classes, but got {len(visual_prompts['bboxes'])} and "
                     f"{len(visual_prompts['cls'])} respectively"
                 )
-            self.predictor = (predictor or self._smart_load("predictor"))(
-                overrides={
-                    "task": self.model.task,
-                    "mode": "predict",
-                    "save": False,
-                    "verbose": refer_image is None,
-                    "batch": 1,
-                },
-                _callbacks=self.callbacks,
-            )
-
-        if len(visual_prompts):
+            if not isinstance(self.predictor, yolo.yoloe.YOLOEVPDetectPredictor):
+                self.predictor = (predictor or yolo.yoloe.YOLOEVPDetectPredictor)(
+                    overrides={
+                        "task": self.model.task,
+                        "mode": "predict",
+                        "save": False,
+                        "verbose": refer_image is None,
+                        "batch": 1,
+                    },
+                    _callbacks=self.callbacks,
+                )
+
             num_cls = (
                 max(len(set(c)) for c in visual_prompts["cls"])
                 if isinstance(source, list) and refer_image is None  # means multiple images
@ -426,18 +426,19 @@ class YOLOE(Model):
             self.model.model[-1].nc = num_cls
             self.model.names = [f"object{i}" for i in range(num_cls)]
             self.predictor.set_prompts(visual_prompts.copy())
-
-        self.predictor.setup_model(model=self.model)
-
-        if refer_image is None and source is not None:
-            dataset = load_inference_source(source)
-            if dataset.mode in {"video", "stream"}:
-                # NOTE: set the first frame as refer image for videos/streams inference
-                refer_image = next(iter(dataset))[1][0]
-        if refer_image is not None and len(visual_prompts):
-            vpe = self.predictor.get_vpe(refer_image)
-            self.model.set_classes(self.model.names, vpe)
-            self.task = "segment" if isinstance(self.predictor, yolo.segment.SegmentationPredictor) else "detect"
-            self.predictor = None  # reset predictor
+            self.predictor.setup_model(model=self.model)
+
+            if refer_image is None and source is not None:
+                dataset = load_inference_source(source)
+                if dataset.mode in {"video", "stream"}:
+                    # NOTE: set the first frame as refer image for videos/streams inference
+                    refer_image = next(iter(dataset))[1][0]
+            if refer_image is not None:
+                vpe = self.predictor.get_vpe(refer_image)
+                self.model.set_classes(self.model.names, vpe)
+                self.task = "segment" if isinstance(self.predictor, yolo.segment.SegmentationPredictor) else "detect"
+                self.predictor = None  # reset predictor
+        elif isinstance(self.predictor, yolo.yoloe.YOLOEVPDetectPredictor):
+            self.predictor = None  # reset predictor if no visual prompts

         return super().predict(source, stream, **kwargs)
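In effect, repeated visual-prompt calls now reuse the existing `YOLOEVPDetectPredictor` instead of rebuilding it on every call, and the predictor is reset once prompts are dropped. A hedged usage sketch, with the checkpoint name, box coordinates, and image paths chosen purely for illustration:

```python
import numpy as np

from ultralytics import YOLOE

model = YOLOE("yoloe-11s-seg.pt")  # illustrative pre-trained YOLOE checkpoint
prompts = dict(bboxes=np.array([[100.0, 100.0, 300.0, 300.0]]), cls=np.array([0]))  # example box prompt

# The second call reuses the predictor created by the first call rather than re-initializing it
model.predict("image1.jpg", visual_prompts=prompts)
model.predict("image2.jpg", visual_prompts=prompts)
```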