Running models using NPU with Copilot+ PC
Leverage the power of the NPU with this getting-started Python guide to running Qualcomm AI models on Snapdragon X Elite devices using the QNNExecutionProvider.
Introduction
Recently I have been exploring models from the Qualcomm® library on the AI Hub that I can run locally, starting with something small such as Super Resolution, which is designed to upscale images to a higher resolution. The challenge came when trying to explicitly run the model on the NPU (Neural Processing Unit) instead of the CPU, which I have managed to get working recently. This post shares how to do that, if you are struggling and want to learn about local models too.
So let's get started - but to warn you, I'm not a Python guy; I mainly use .NET C# and TypeScript, but I'm learning something new since a lot of AI samples start with Python. There was a new Local On-Device AI in Copilot+ PCs Explained | Microsoft Mechanics video launched recently that highlights the capabilities of Snapdragon based devices, to give you a bit of background. Bear in mind, I don't have a Surface device; I have a Lenovo Slim 7x Copilot+ PC.
Why not use a studio
During my research into how to utilise the core feature of a Copilot+ PC, the 40 TOPS NPU, I found that many SDKs, e.g. the Windows App SDK including the Windows Copilot Runtime, the onnxruntime-genai library, samples, and so on, are not yet ready to use the local device's NPU. Typical me to get a device so bleeding edge that the software has yet to catch up. Even products for LLM/model studios for running models have yet to support NPUs. I suspect that if you read this towards the end of 2024 this might be a different story.
I've tried the following studios for running models, typically LLM models, and they work, BUT none of them yet utilise the NPU.
- LM Studio
- AI Toolkit
- Anaconda AI Studio
These are great tools, most in beta or preview, and I haven't found one yet that runs using the Qualcomm Hexagon NPU.
Setup with Python
Let's get started. I'm not going to do much of a step-by-step guide here, but this is my setup to set that foundation. I have installed:
- Miniconda, specifically the Windows x64 version. At the time of writing I have seen many warnings that ARM64 dependencies still need to catch up, so you may want to explore the ARM version. If you're wondering what this is, it's like NVM but for Python, allowing you to create contained environments for versions, dependencies, etc. so you can switch versions easily.
- Setup an environment running Python 3.11
- Activate the environment accordingly.
I thank Fabian Williams for the recommendation; it's much easier to jump versions this way.
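For reference, the environment setup commands look roughly like this (the environment name npu-311 is just my own choice):
conda create -n npu-311 python=3.11
conda activate npu-311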
At this point don’t yet start installing packages with pip.
Additional Software
Before I get into this, I need to thank Qualcomm AI Support through their Slack channels - super quick and responsive in helping me unlock why I couldn't get a key component running.
Next you're going to need:
- Visual Studio 2022 with the C++ development tools installed. I am not fully sure this is needed, but it was recommended as part of getting this working.
- Qualcomm® AI Engine Direct SDK - you will need some key files from this SDK. Sign up and download it here: Qualcomm® AI Engine Direct SDK | Software Centre
Getting the model working, starting with the CPU
So for this example, you can get Snapdragon X Elite optimised models from Qualcomm®'s AI Hub - download the QuickSRNetLarge-Quantized AI model. This model is designed to upscale images to a higher resolution, up to 4x the original size. I felt this was a good starting place. Download the model and keep note of the path.
Next we need a sample. I wrote this with the help of GitHub Copilot, to take an image and format, convert and size it according to the specifications of the model. I'm using a 128px x 128px image of some rocks on a beach.
First install the Python modules:
pip install onnx onnxruntime numpy pillow
Then either save the following script, or use a Jupyter Notebook (which is what I use) to run the Python script:
import onnx
import onnxruntime as ort
import numpy as np
from PIL import Image
import sys
# Install - pip install onnx onnxruntime onnxruntime-qnn pillow
def preprocess_image(image_path):
    # Load the image and convert to RGB
    image = Image.open(image_path).convert('RGB')
    image = image.resize((128, 128))  # Resize to the required input size
    image_data = np.array(image).astype(np.uint8)
    image_data = image_data.transpose(2, 0, 1)  # Change data layout from HWC to CHW
    image_data = np.expand_dims(image_data, axis=0)  # Add batch dimension
    return image_data

def postprocess_output(output_data):
    # Remove batch dimension and change data layout from CHW to HWC
    output_image = output_data.squeeze().transpose(1, 2, 0)
    # Convert to uint8
    output_image = output_image.astype(np.uint8)
    return output_image

def main(image_path):
    # Check available providers
    available_providers = ort.get_available_providers()
    print("Available providers:", available_providers)

    # Load the ONNX model
    model_path = r"C:\ai\models\quicksrnetlarge_quantized.onnx"  # Ensure this file exists in the correct path
    providers = ['CPUExecutionProvider']
    ort_session = ort.InferenceSession(model_path, providers=providers)

    # Prepare input data
    input_name = ort_session.get_inputs()[0].name
    input_data = preprocess_image(image_path)

    # Run the model
    result = ort_session.run(None, {input_name: input_data})

    # Post-process the output
    output_image = postprocess_output(result[0])

    # Save the output image
    output_image = Image.fromarray(output_image)
    output_image.save("output_image_notebook.png")
    print("Output image saved as output_image_notebook.png")

if __name__ == "__main__":
    main('example_image_128.jpg')
If you run this, because it's a very small model it will complete in milliseconds, so it will be fast.
Note: If you run
pip install onnxruntime-qnn
before installing the dependencies above (which I did), you will need to uninstall and re-install the package. For a while, "QNNExecutionProvider" was not showing up in the available providers in the sample code provided later.
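In my case, roughly this sequence sorted it out, though your mileage may vary:
pip uninstall -y onnxruntime-qnn
pip install onnxruntime-qnn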
Your output should look like this:
Available providers: ['AzureExecutionProvider', 'CPUExecutionProvider']
Output image saved as output_image_notebook.png
Hopefully this will run successfully for you, and you should get an upscaled output image of 512px x 512px.
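If you want to double-check the upscale factor, a quick snippet (just my own sanity check) will confirm the output size:
from PIL import Image
# The model upscales 4x, so a 128px x 128px input should give (512, 512)
print(Image.open("output_image_notebook.png").size)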
Now OK, it may not feel super impressive, but you could slice and dice a larger image into tiles matching the dimensions of the input, run the model multiple times and then stitch the results back together. That's a bit beyond this example, but a rough sketch of the idea is below if you're curious.
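This is just my own illustration of the tiling approach, not something I have tuned; it reuses ort_session and input_name from the script above and assumes the model always produces 4x tiles:
import numpy as np
from PIL import Image

def upscale_large_image(ort_session, input_name, image_path, tile=128, scale=4):
    image = Image.open(image_path).convert('RGB')
    width, height = image.size
    # Pad so the dimensions divide evenly into 128px tiles (simplest possible approach)
    pad_w = (tile - width % tile) % tile
    pad_h = (tile - height % tile) % tile
    padded = Image.new('RGB', (width + pad_w, height + pad_h))
    padded.paste(image, (0, 0))
    # Upscale each tile and paste it into a large output canvas
    out = Image.new('RGB', (padded.width * scale, padded.height * scale))
    for top in range(0, padded.height, tile):
        for left in range(0, padded.width, tile):
            crop = padded.crop((left, top, left + tile, top + tile))
            data = np.expand_dims(np.array(crop).astype(np.uint8).transpose(2, 0, 1), axis=0)
            result = ort_session.run(None, {input_name: data})
            tile_img = Image.fromarray(result[0].squeeze().transpose(1, 2, 0).astype(np.uint8))
            out.paste(tile_img, (left * scale, top * scale))
    # Crop away the padding to get back to 4x the original dimensions
    return out.crop((0, 0, width * scale, height * scale))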
The key aspect of choosing where the model runs is the CPUExecutionProvider, which runs on the CPU.
Enhancing the model to utilise the NPU
So you can tweak this script to run on the NPU. How can you check? You should see a small spike on the NPU graph in Task Manager.
Install the Python module:
pip install onnxruntime-qnn
Next, from the SDK, we need to find the QnnHtp.dll file in the C:\Qualcomm\AIStack\QAIRT\2.26.0.240828\lib\arm64x-windows-msvc directory.
I copied this into a common location, then modified the Python script accordingly:
change:
providers = ['CPUExecutionProvider']
ort_session = ort.InferenceSession(model_path, providers=providers)
to:
execution_provider_option = {
    "backend_path": "C:\\ai\\qcdll\\QnnHtp.dll",
    "session.enable_htp_fp16_precision": "1",
    "htp_performance_mode": "high_performance",
}
# Use QNNExecutionProvider regardless of the available providers
providers = ['QNNExecutionProvider']
ort_session = ort.InferenceSession(model_path, providers=providers, provider_options=[execution_provider_option])
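A small aside on that last line: onnxruntime tries providers in the order you list them, so if you would rather fall back to the CPU when the QNN backend is not available, one option (my own variation, not part of the original walkthrough) is to pass both providers with a matching list of option dictionaries:
providers = ['QNNExecutionProvider', 'CPUExecutionProvider']
provider_options = [execution_provider_option, {}]
ort_session = ort.InferenceSession(model_path, providers=providers, provider_options=provider_options)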
Re-run the model and your output should look like this:
Available providers: ['QNNExecutionProvider', 'AzureExecutionProvider', 'CPUExecutionProvider']
Output image saved as output_image_notebook.png
Wahoo!!
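As an extra sanity check beyond watching Task Manager, you can ask the session which providers it was actually created with and roughly time a run. This is just a small snippet I'd bolt into main() after the existing run call, reusing ort_session, input_name and input_data:
import time

# Confirm which execution providers the session was actually created with
print("Session providers:", ort_session.get_providers())

# Rough timing of a single inference call
start = time.perf_counter()
ort_session.run(None, {input_name: input_data})
print(f"Inference took {(time.perf_counter() - start) * 1000:.1f} ms")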
Conclusion
I personally did A LOT of playing and research to get this up and running, just to have SOMETHING using the NPU, but now that I have the QNNExecutionProvider working I can explore more dense models to run. I am keen to explore this further with LLMs when the Windows Copilot Runtime, the Windows App SDK and/or the onnxruntime-genai library get QNN support - I literally have watches on the relevant repos to monitor when this happens.
It won't be long until these libraries are released, and when that happens I feel we will have an explosion of samples, demos and new apps using these local models.
Resources
The following resources might be useful for you to learn further:
- Qualcomm® AI Hub
- Qualcomm® Slack Community
- Qualcomm® GitHub Repo - note these examples/demos will utilise their cloud devices, not the local ones.
- QNN EP | Windows on Snapdragon Docs
- deeplearning.ai if you want significantly more technical detail on how AI models work.
- Copilot+ PCs Developer Guide | Microsoft Learn
Enjoy!