
Install CUDA to Enable GPU Acceleration for LLM Apps

After downloading LLM Apps with LM Downloader, many users run into the same issue: despite high-end hardware, the software runs surprisingly slowly, with the CPU maxed out while the powerful GPU sits mostly idle. If you do have a high-performance dedicated GPU, the problem is usually an improperly installed or configured acceleration stack.

For example, when running ComfyUI for image/video generation with complex prompts, you might endure long wait times. The task manager shows the CPU working overtime, while the GPU sits there 'twiddling its thumbs.'

This happens because such software typically requires massive computational workloads. While CPUs are versatile, they're far less efficient than GPUs at handling these large-scale parallel computing tasks. Without proper configuration of acceleration tools, the GPU's potential goes untapped—resulting in sluggish performance.

Currently, NVIDIA GPUs remain the mainstream choice for large model training and inference acceleration. Since its launch in 2006, CUDA has matured through years of iteration into a tightly integrated ecosystem spanning hardware, software, and developer tooling.

This article primarily focuses on installing CUDA for NVIDIA graphics cards in Windows systems to accelerate large model software. It's worth noting that running AMD's ROCm on Windows requires WSL (Windows Subsystem for Linux), resulting in a significantly different installation process compared to CUDA. We will provide detailed documentation on ROCm setup in a separate guide.

Many AI applications (such as ComfyUI, LLaMA, Stable Diffusion, and Spark-TTS) are built on PyTorch. The ecosystem works much like Android: because PyTorch is so widely adopted, software supports it by default. When you use AI for image generation or chat, the software relies on PyTorch to drive your GPU (which, for NVIDIA cards, requires CUDA), making everything significantly faster and smoother.

However, if your computer lacks an NVIDIA GPU (or hasn't installed CUDA), the software may fall back to CPU-only operation, which is substantially slower.
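
To see which device a PyTorch-based app will actually use, here is a minimal sketch (assuming PyTorch is installed in your Python environment):

import torch

# True only when an NVIDIA GPU and a CUDA-enabled PyTorch build are both present
print("CUDA available:", torch.cuda.is_available())

# Pick the device the way most PyTorch apps do internally
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    print("GPU:", torch.cuda.get_device_name(0))

x = torch.randn(4, 4).to(device)  # tensors must be moved to the chosen device
print("Tensor lives on:", x.device)

If this prints "CUDA available: False" on a machine with an NVIDIA GPU, the driver or CUDA setup described below is the likely culprit.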

Introduction to Chip Manufacturers' Tools

NVIDIA CUDA

Introduced in 2006, CUDA (Compute Unified Device Architecture) is NVIDIA's proprietary parallel computing platform designed to harness GPU acceleration for complex computations. It provides a comprehensive suite of development tools and libraries, enabling developers to efficiently leverage NVIDIA GPUs for high-performance computing. With a mature ecosystem, CUDA is widely adopted in academia and industry, supporting major deep learning frameworks and scientific computing libraries. However, it is exclusive to NVIDIA GPUs.

AMD ROCm

Launched in 2015, ROCm (Radeon Open Compute) is AMD’s open-source alternative to CUDA, targeting high-performance computing (HPC) and large-scale GPU acceleration. It includes developer tools, software frameworks, libraries, compilers, and programming models. While primarily optimized for AMD GPUs, ROCm is gradually expanding support for other hardware vendors. Though relatively newer, its ecosystem is growing rapidly and already supports multiple deep learning frameworks.

Intel’s AI Tools (OpenVINO™, oneAPI, IPEX)

Intel’s Core Ultra processors (e.g., Core Ultra 200 series) deliver robust AI capabilities, offering up to 120 TOPS of compute power—sufficient for locally deployed large-scale AI models. As the leader in x86 processors, Intel prioritizes compatibility across Windows/Linux and frameworks like PyTorch/TensorFlow, adopting an open-standard, modular toolset approach:

  • OpenVINO™: Intel's open-source toolkit for optimizing and deploying deep learning inference; its plugin architecture targets Intel CPUs, GPUs, and NPUs, with ARM CPU support as well (a quick device check follows this list).
  • oneAPI: An open, unified programming model (e.g., SYCL) for heterogeneous computing (CPU/GPU/FPGA).
  • IPEX & Neural Compressor: Framework extensions optimized for PyTorch/TensorFlow’s native interfaces.
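
If you're curious which devices Intel's OpenVINO runtime can see on your machine, a minimal Python check (assuming the openvino package is installed via pip) looks like this:

from openvino import Core  # pip install openvino

core = Core()
# List the inference devices OpenVINO detects on this machine, e.g. ['CPU', 'GPU', 'NPU']
print(core.available_devices)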

Principles of Inference Acceleration

CUDA

NVIDIA GPUs feature a massive number of cores capable of processing multiple computational tasks simultaneously. CUDA accelerates performance by breaking down tasks into smaller subtasks and distributing them across GPU cores for parallel execution. For instance, during the training and inference of large AI models—which involve extensive matrix operations—CUDA leverages the GPU’s parallel architecture to complete these computations far faster than a CPU’s sequential processing.
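
To see this gap for yourself, here is a rough, illustrative timing sketch in PyTorch (assuming a CUDA-enabled PyTorch build; the exact numbers depend entirely on your hardware):

import time
import torch

n = 4096
a = torch.randn(n, n)
b = torch.randn(n, n)

# CPU baseline: one large matrix multiplication
t0 = time.perf_counter()
a @ b
cpu_s = time.perf_counter() - t0

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()  # wait for the host-to-GPU copies to finish
    t0 = time.perf_counter()
    a_gpu @ b_gpu
    torch.cuda.synchronize()  # GPU kernels launch asynchronously
    gpu_s = time.perf_counter() - t0
    print(f"CPU: {cpu_s:.3f}s  GPU: {gpu_s:.3f}s")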

ROCm

Similarly, ROCm harnesses the parallel computing power of AMD GPUs. It employs the HIP (Heterogeneous-compute Interface for Portability) programming model to distribute tasks across AMD GPU cores for concurrent processing. Additionally, ROCm includes highly optimized libraries like rocBLAS and rocFFT, tailored for machine learning and high-performance computing (HPC) workloads. These libraries maximize AMD GPU efficiency, significantly speeding up large AI model operations.
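
A practical note for PyTorch users: ROCm builds of PyTorch reuse the familiar torch.cuda API, so the same availability check works on AMD GPUs. A minimal sketch (assuming a ROCm build of PyTorch):

import torch

# torch.version.hip is set on ROCm builds and None on CUDA-only builds
print("HIP runtime:", torch.version.hip)
# On ROCm builds this returns True for a supported AMD GPU
print("GPU visible:", torch.cuda.is_available())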

Checking NVIDIA Graphics Card Information

Via Display Settings

Right-click on an empty area of your desktop and select "Display settings", then scroll down and click "Advanced display settings". Here, you can view the monitors connected to your system and the corresponding graphics adapter information.

Via Device Manager

Press Win + X to open the system menu, then select "Device Manager". Locate and expand the "Display adapters" section; the listed graphics card name helps you determine whether it's a dedicated GPU. Dedicated graphics cards typically have specific model names, often including the brand and series. (A scriptable way to list the same adapters appears after the examples below.)

Example Scenarios:

  • A system whose display adapter list shows only an Intel integrated graphics card.
  • A system listing both an AMD integrated graphics card and an NVIDIA GeForce RTX 5060 Ti dedicated GPU.
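
If you prefer a scriptable alternative to the GUI, the sketch below queries the same adapter list from Python by calling PowerShell (assuming you run it on Windows):

import subprocess

# Ask Windows for the installed display adapters (the same data Device Manager shows)
cmd = ["powershell", "-NoProfile", "-Command",
       "Get-CimInstance Win32_VideoController | Select-Object -ExpandProperty Name"]
print(subprocess.run(cmd, capture_output=True, text=True).stdout)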

Downloading and Installing Drivers

Go to the NVIDIA driver download page (https://www.nvidia.com/Download/index.aspx) to download the drivers. It's recommended to use the latest version whenever possible. If immediate support for new games isn't a priority, choose the "Studio Drivers" for greater stability and reliability.

Select your region and language on the download page as appropriate.

Installation Tip:
If you have no specific requirements, choose the "Express" installation option for a hassle-free setup.

Checking GPU CUDA Compatibility

Verifying CUDA Support and Version

For NVIDIA graphics cards, there are two methods to check the compatible CUDA version:

Method 1: NVIDIA Control Panel

  1. Open NVIDIA Control Panel
  2. Click "System Information"
  3. Navigate to the "Components" tab
  4. Check the maximum CUDA version supported by your current driver

Note: If you see "NVIDIA CUDA 12.9.76 driver", this indicates support for CUDA version 12.9.

Method 2: Command Line

  1. Open Command Prompt or PowerShell
  2. Enter the command: nvidia-smi
  3. Locate the "CUDA Version" field in the output

Key information: "CUDA Version: 12.9" means your GPU supports CUDA 12.9.

Important:
The supported CUDA version may vary between systems. Please verify based on your specific hardware configuration.
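
If you'd rather script this check, here is a small sketch that runs nvidia-smi and extracts the reported CUDA version (assuming the NVIDIA driver is installed and nvidia-smi is on your PATH):

import re
import subprocess

# nvidia-smi prints a header such as "... CUDA Version: 12.9 ..."
out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
match = re.search(r"CUDA Version:\s*([\d.]+)", out)
print("Driver supports CUDA", match.group(1) if match else "unknown")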

Downloading and Installing CUDA

1. Downloading CUDA

  1. Visit the official CUDA download page:
    https://developer.nvidia.com/cuda-downloads
  2. Select the appropriate CUDA version based on:

    • Your GPU's supported CUDA version
    • Your operating system
  3. For example, Windows 11 users should:

    • Select: Windows → x86_64 → 11 → exe (local)
    • Note:
      • exe (local): complete package (~3.31 GB), recommended for offline installation
      • exe (network): small installer (~13.9 MB), requires an internet connection during installation

2. Installing CUDA

  1. Run the downloaded EXE installer
  2. Wait for:

    • File extraction to complete
    • System compatibility check to finish
  3. Accept the license agreement ("Agree and Continue")
  4. Choose installation type:

    • Express Installation: recommended for most users (default settings)
    • Custom Installation: for advanced users (can deselect unnecessary components)
  5. Complete the installation and restart your computer if prompted

What to Do When This Option Appears

If you see the "CUDA Visual Studio Integration" option during installation:

  1. For most users (no Visual Studio installed):

    • Simply check the box and click "Next"
    • Note: this prompt appears when Visual Studio isn't detected; most AI/ML applications work fine without the integration
  2. For developers (optional): if you plan to do CUDA programming in Visual Studio:

    1. Ensure Visual Studio 2019/2022 is installed first
    2. Then re-run the CUDA installer
    3. Select this component

Verifying the Installation

Open Command Prompt and enter "nvcc --version". If CUDA version information is displayed, the installation was successful.

C:\Users\LMD>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Apr__9_19:29:17_Pacific_Daylight_Time_2025
Cuda compilation tools, release 12.9, V12.9.41
Build cuda_12.9.r12.9/compiler.35813241_0
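
To confirm in one pass that both the toolkit and your PyTorch-based apps can use the GPU, here is a combined check (a sketch assuming Python and PyTorch are installed):

import subprocess
import torch

# 1) Toolkit: is nvcc installed and reporting a version?
try:
    nvcc = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    print(nvcc.stdout.splitlines()[-1])  # e.g. "Build cuda_12.9.r12.9/..."
except FileNotFoundError:
    print("nvcc not found; is CUDA's bin directory on your PATH?")

# 2) Framework: was this PyTorch built with CUDA, and can it see the GPU?
print("torch built for CUDA:", torch.version.cuda)
print("GPU usable:", torch.cuda.is_available())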

If previously installed apps still fail to use GPU acceleration, reinstall them with LM Downloader. Reinstalling will not delete your data or model files; deleting an app, however, usually removes its models and related data files.

If you still encounter issues, please contact our technical support team at tech@daiyl.com.