Developers’ Hands-on | Segment Anything Quantization Acceleration


September 06, 2023 / 15:06 IST

Author: Ethan Yang

1. Background

Segment Anything Model (SAM) is a powerful image segmentation model developed by Meta AI. It can automatically identify which pixels in an image belong to an object, which makes it possible to apply automatic stylized processing to the different objects in the image.

A complete SAM application consists of an image encoder model and a combined mask decoder + prompt encoder model. The image encoder accounts for most of the computing workload during inference, so improving its execution efficiency is one of the main optimization directions for SAM applications.

This article focuses on how to quantize the SAM encoder with the OpenVINO™ NNCF model compression tool to improve inference performance on CPU.

2. Quantization Introduction

Before we dive into the practical implementation, we need to introduce the concept of quantization. Quantization maps the numerical representation of model parameters from FP32 to a lower-precision type such as INT8 or INT4 without changing the model structure. This shrinks the model and lets the hardware use faster integer arithmetic.
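To make the mapping concrete, here is a minimal sketch of per-tensor asymmetric (affine) INT8 quantization in plain Python. The function names and sample values are illustrative assumptions, and this is not NNCF's actual algorithm, which also calibrates activation ranges on real data.

```python
def quantize_int8(values):
    """Map floats to INT8 codes using one scale and zero point per tensor."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # keep 0.0 exactly representable
    scale = (hi - lo) / 255.0 or 1.0       # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)  # INT8 code that represents 0.0
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_int8(codes, scale, zero_point):
    """Recover approximate float values from INT8 codes."""
    return [(c - zero_point) * scale for c in codes]


# Illustrative FP32 weights, not taken from SAM.
weights = [-1.2, -0.3, 0.0, 0.45, 2.1]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize_int8(codes, scale, zero_point)
```

Each restored value differs from the original by at most half a quantization step (scale / 2), which is the precision given up in exchange for a four times smaller representation.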

Intel AVX512 VNNI extension instructions fuse the INT8 multiply-and-accumulate sequence, which originally required three separate instructions, into a single instruction. The newer AMX instruction set extends the same idea from vectors to two-dimensional tiles, achieving a further multiple-fold performance improvement.
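The arithmetic behind such a fused multiply-accumulate step can be emulated in plain Python. The sketch below models one 32-bit lane only: four unsigned 8-bit values are multiplied by four signed 8-bit values and the products are added to a running accumulator. The function names are illustrative assumptions, and the code does not model vector width, timing, or 32-bit overflow.

```python
def dot4_u8s8_accumulate(acc, a_bytes, b_bytes):
    """Emulate one 32-bit lane of a VNNI-style fused multiply-accumulate.

    a_bytes: four unsigned 8-bit values (0..255), e.g. activations.
    b_bytes: four signed 8-bit values (-128..127), e.g. weights.
    acc:     the running integer accumulator.
    """
    assert len(a_bytes) == len(b_bytes) == 4
    assert all(0 <= a <= 255 for a in a_bytes)
    assert all(-128 <= b <= 127 for b in b_bytes)
    return acc + sum(a * b for a, b in zip(a_bytes, b_bytes))


def int8_dot(activations, weights):
    """Dot product of two vectors whose length is a multiple of four."""
    acc = 0
    for i in range(0, len(activations), 4):
        acc = dot4_u8s8_accumulate(acc, activations[i:i + 4], weights[i:i + 4])
    return acc


result = int8_dot([1, 2, 3, 4, 250, 0, 7, 9], [10, -20, 30, -40, 1, 5, -3, 2])
```

Without the fused form, the multiply, the widening add, and the accumulation are issued as separate instructions, which is where the saving comes from.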

3. NNCF Post-Training Quantization Mode

Neural Network Compression Framework (NNCF) is the component of the OpenVINO™ toolkit designed for model compression and acceleration. NNCF can be used in two modes: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). While QAT requires the original training script and dataset, PTQ compresses the trained model file directly, without additional training scripts or labeled datasets. PTQ can be achieved through the following steps:

1. Prepare a calibration dataset. During quantization, the calibration data is used solely for calculating data ranges and distributions, so it does not need to be labeled. In addition, a DataLoader object and a transform_fn data conversion function need to be defined. The DataLoader reads each element of the calibration dataset, while transform_fn converts each element into the input data expected by OpenVINO™ model inference.

2. Run model quantization. First, load the model object, then pass it together with the calibration dataset to the nncf.quantize() interface to start the quantization task. NNCF supports several model object types, including openvino.runtime.Model, torch.nn.Module, onnx.ModelProto, and tensorflow.Module.

3. (Optional) Accuracy control mode. If the model quantized by NNCF in the default mode shows a decrease in accuracy, accuracy control mode can be used for post-training quantization. In this case, a labeled test dataset is required to evaluate how sensitive model accuracy is to the quantization of each layer. For specific methods, please refer to the NNCF documentation on quantization with accuracy control.
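Steps 1 and 2 can be sketched as follows. This is a minimal outline, assuming the calibration source yields (image, label) pairs; the function name quantize_post_training is hypothetical, while nncf.Dataset and nncf.quantize are the NNCF entry points named above.

```python
def transform_fn(item):
    """Drop the label so only the model input reaches the calibration step."""
    image, _label = item  # assumption: the data source yields (image, label)
    return image


def quantize_post_training(model, data_source):
    """Wrap the calibration data (step 1), then quantize the model (step 2)."""
    import nncf  # imported lazily so the sketch can be read without NNCF installed

    calibration_dataset = nncf.Dataset(data_source, transform_fn)
    return nncf.quantize(model, calibration_dataset)
```

For step 3, NNCF exposes a separate nncf.quantize_with_accuracy_control entry point, which additionally takes a validation dataset and a validation function.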

4. Segment Anything + NNCF Practical Application

Next, let’s take a step-by-step look at how to use NNCF’s PTQ mode to complete the quantization of the SAM encoder.

Project can be found here.

1. Define the data loader

Here, the coco128 dataset is used as the calibration dataset. It contains 128 images in .jpg format. Since the data loader must be a torch DataLoader when quantizing ONNX or IR static models, we inherit from torch.utils.data.Dataset and build a dataset class that implements the __getitem__ method for accessing each object in the dataset and the __len__ method for returning the number of objects. Finally, a DataLoader is created from it using torch.utils.data.DataLoader.
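A dataset class of this shape might look like the sketch below. The names CalibrationImages, build_calibration_loader, and load_image are illustrative assumptions rather than the project's own code, and the image reader is passed in as a parameter because the article does not show how images are decoded.

```python
from pathlib import Path

try:
    from torch.utils.data import Dataset
except ImportError:  # lets the sketch run without PyTorch installed
    Dataset = object


class CalibrationImages(Dataset):
    """Map-style dataset over a list of image paths."""

    def __init__(self, image_paths, load_image):
        self.image_paths = sorted(image_paths)
        self.load_image = load_image  # e.g. a cv2- or PIL-based reader

    def __getitem__(self, index):
        # Called by the DataLoader to fetch one calibration sample.
        return self.load_image(self.image_paths[index])

    def __len__(self):
        # Number of calibration samples (128 for coco128).
        return len(self.image_paths)


def build_calibration_loader(image_dir, load_image):
    """Collect the .jpg files in image_dir and wrap them in a DataLoader."""
    from torch.utils.data import DataLoader

    paths = [str(p) for p in Path(image_dir).glob("*.jpg")]
    return DataLoader(CalibrationImages(paths, load_image))
```

With the default batch size of 1, the DataLoader yields one sample per iteration and adds a leading batch dimension.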

2. Define the data format conversion module

The next step is to define the data conversion module. We can reuse the previously defined preprocess_image function to preprocess the data. Note that calibration_loader returns each data object as a torch tensor, which the OpenVINO™ Python interface does not accept, so we need to convert it to a numpy array first.
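A conversion function along these lines is sketched below. Because the definition of preprocess_image is not shown in this article, it is passed in as a parameter and assumed to accept a single image array; make_transform_fn is a hypothetical helper name.

```python
def make_transform_fn(preprocess_image):
    """Build a transform_fn around a given preprocessing function."""

    def transform_fn(image_data):
        # The DataLoader yields a torch tensor with a leading batch dimension of 1.
        image = image_data.numpy()         # torch.Tensor -> numpy.ndarray
        return preprocess_image(image[0])  # drop the batch dimension first

    return transform_fn
```

The returned transform_fn is what gets passed to nncf.Dataset together with the data loader.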

3. Run NNCF quantization

To ensure the accuracy of the quantized model, we use the original FP32 ONNX format model as the input object instead of the FP16 IR format model. Then, the model is passed into the nncf.quantize interface for quantization. This interface has several important additional parameters:

● model_type: Model type is used to enable special quantization strategies. For example, for transformer models such as the SAM image encoder, a strategy that prioritizes model accuracy is applied.

● preset: Quantization mode. The default mode is PERFORMANCE but in this case, we use the MIXED mode to achieve a balance between model accuracy and performance.

Since the SAM encoder model has a complex network structure and quantization requires traversing the parameters of each layer multiple times, the process may take a long time. A machine with more than 32GB of memory is recommended.
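Putting the parameters together, a call might look like the following sketch. It assumes the FP32 ONNX file is read through the OpenVINO™ runtime so that NNCF receives an OpenVINO™ model object, and that the result is saved in IR format; the function names, file naming scheme, and paths are hypothetical.

```python
from pathlib import Path


def int8_path_for(onnx_path):
    """Derive an output name, e.g. sam_encoder.onnx -> sam_encoder_int8.xml."""
    path = Path(onnx_path)
    return path.with_name(f"{path.stem}_int8.xml")


def quantize_encoder(onnx_path, calibration_loader, transform_fn):
    """Quantize an FP32 ONNX encoder with the transformer and MIXED options."""
    import nncf            # lazy imports: only needed when actually quantizing
    import openvino as ov

    model = ov.Core().read_model(str(onnx_path))  # FP32 model as the input
    calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
    quantized_model = nncf.quantize(
        model,
        calibration_dataset,
        model_type=nncf.ModelType.TRANSFORMER,  # transformer-specific strategy
        preset=nncf.QuantizationPreset.MIXED,   # balance accuracy and speed
    )
    output_path = int8_path_for(onnx_path)
    ov.serialize(quantized_model, str(output_path))
    return output_path
```

Starting from the FP32 file rather than an FP16 IR keeps the calibration statistics free of any earlier precision loss, which matches the reasoning given above.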

4. Model accuracy comparison

Next, we compare the inference results of the INT8 and FP16 models. In both prompt and auto modes, the INT8 model shows almost no change in accuracy compared to the FP16 model.
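The comparison described here is visual. One possible way to put a number on it, which is an assumption of this sketch and not something the article measures, is the intersection over union (IoU) between the masks produced by the two models. The masks below are made-up toy values for illustration only.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two same-shaped binary masks (nested lists)."""
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += bool(a) and bool(b)
            union += bool(a) or bool(b)
    return intersection / union if union else 1.0  # two empty masks agree


# Toy 3x3 masks standing in for FP16 and INT8 outputs (not real results).
fp16_mask = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
int8_mask = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
score = mask_iou(fp16_mask, int8_mask)
```

An IoU close to 1.0 across a set of test images would support the observation that quantization leaves the segmentation nearly unchanged.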

5. Performance comparison

Finally, we compare performance using the benchmark_app tool provided by OpenVINO™. On the CPU, the INT8 model achieves approximately a 30% performance improvement over the FP16 model, and the model size is reduced from around 350MB to less than 100MB.
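A benchmark run can be scripted as in the sketch below, which builds a benchmark_app command line with the model (-m), device (-d), and duration in seconds (-t) options. The helper names, the model file name, and the 15-second duration are assumptions; the article does not state which options were used.

```python
import subprocess


def benchmark_command(model_path, device="CPU", seconds=15):
    """Build the benchmark_app command line for one model."""
    return ["benchmark_app", "-m", str(model_path), "-d", device, "-t", str(seconds)]


def run_benchmark(model_path, device="CPU", seconds=15):
    """Run benchmark_app and return its text report (requires OpenVINO installed)."""
    completed = subprocess.run(
        benchmark_command(model_path, device, seconds),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

Running the same command for the FP16 and INT8 model files on the same machine gives directly comparable throughput and latency figures.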

5. Conclusion

Given the outstanding automatic segmentation capability of SAM, this technology is expected to be deployed in a growing number of application scenarios. When moving to production, developers often need to strike a balance between performance and accuracy to obtain a more cost-effective solution. The OpenVINO™ NNCF tool significantly improves model runtime efficiency and reduces model size without significantly affecting model accuracy.

Notices & Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

Moneycontrol Journalists were not involved in the creation of the article.

