Free Quantum Simulators Compared: Qiskit Aer vs Cirq vs PennyLane vs CUDA-Q

A practical comparison of the major free quantum simulators — performance, qubit limits, GPU support, and when to use each one.

FreeQuantumComputing · 7 min read

Quantum simulators let you test circuits without waiting in hardware queues or paying per shot. But with six major simulators available for free, which should you use? Here's a practical breakdown.

The Quick Answer

| Simulator | Best for | Max qubits (CPU) | GPU? |
|---|---|---|---|
| Qiskit Aer | General purpose | ~30 (statevector) | ✅ via qiskit-aer-gpu |
| Cirq Simulator | Cirq circuits, density matrix | ~25 | ❌ |
| PennyLane default.qubit | QML, gradients | ~20 | ✅ via lightning.gpu (separate device) |
| PennyLane lightning.qubit | Fast CPU simulation | ~30 | ✅ via lightning.gpu (separate device) |
| NVIDIA CUDA-Q | Large circuits, speed | ~34 (single GPU) | ✅ native |
| Braket LocalSimulator | Braket circuits, free | ~25 | ❌ |

Qiskit Aer: The Workhorse

Qiskit Aer is the most feature-complete free simulator. It supports multiple simulation methods:

  • statevector (legacy backend name: statevector_simulator): Exact simulation up to ~30 qubits. Memory scales as 2ⁿ complex amplitudes at 16 bytes each, so 30 qubits need about 16 GB of RAM.
  • qasm_simulator (legacy backend name, superseded by AerSimulator): Shot-based sampling with noise model support
  • density_matrix: Simulates mixed states and open quantum systems
  • matrix_product_state (MPS): Simulates circuits with limited entanglement up to hundreds of qubits
  • stabilizer: Clifford circuits only, but scales to thousands of qubits in polynomial time

from qiskit_aer import AerSimulator

# Default: automatic method selection
sim = AerSimulator()

# Force a specific method
sim_sv = AerSimulator(method='statevector')
sim_dm = AerSimulator(method='density_matrix')
sim_mps = AerSimulator(method='matrix_product_state')
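
The memory figure quoted for the statevector method follows from simple arithmetic: a statevector holds 2ⁿ complex amplitudes, and at double precision each amplitude takes 16 bytes. The sketch below is a back-of-the-envelope estimate only. The helper names are illustrative and not part of Qiskit Aer, and real simulators need some extra working memory on top of this.

```python
BYTES_PER_AMPLITUDE = 16  # assumes complex128: two 8-byte floats per amplitude

def statevector_bytes(n_qubits: int) -> int:
    """Memory needed to hold a full statevector of n_qubits."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE

def density_matrix_bytes(n_qubits: int) -> int:
    """A density matrix has 2^n x 2^n entries, so memory grows as 4^n."""
    return (4 ** n_qubits) * BYTES_PER_AMPLITUDE

def max_statevector_qubits(ram_bytes: int) -> int:
    """Largest n whose statevector fits in ram_bytes (ignores overhead)."""
    amplitudes = ram_bytes // BYTES_PER_AMPLITUDE
    return amplitudes.bit_length() - 1  # floor(log2(amplitudes))

GIB = 2 ** 30
print(statevector_bytes(30) / GIB)       # 16.0
print(density_matrix_bytes(15) / GIB)    # 16.0
print(max_statevector_qubits(64 * GIB))  # 32
```

Each extra qubit doubles the statevector, and a density matrix hits the same 16 GB wall at only 15 qubits, which is why density-matrix simulation is limited to much smaller circuits.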

When to use Aer: For any Qiskit workflow, noise modeling, or when you want to match IBM hardware behavior closely.

PennyLane: Best for Quantum ML

PennyLane shines when you need differentiable circuits. Its default.qubit simulator computes exact gradients (by backpropagation by default; the parameter-shift rule is also available and carries over to real hardware), enabling gradient-based optimization of quantum circuits:

import pennylane as qml
from pennylane import numpy as np  # autograd-aware NumPy; qml.grad needs trainable arrays

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def circuit(theta):
    qml.RX(theta[0], wires=0)
    qml.RY(theta[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

# Automatic gradient computation
grad_fn = qml.grad(circuit)
theta = np.array([0.5, 1.2])
print(grad_fn(theta))  # exact gradient: [-sin(theta[0]), 0]
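
The parameter-shift rule mentioned above can be illustrated without any quantum library. For this particular circuit, the expectation value of PauliZ on wire 0 is cos(theta[0]), because the CNOT leaves its control qubit unchanged in the Z basis. The sketch below uses that closed form as a stand-in for actually running the circuit. The function names are illustrative and are not PennyLane API.

```python
import math

def expval_z0(theta):
    """Closed-form <Z> on wire 0 for the circuit above: cos(theta[0])."""
    return math.cos(theta[0])

def parameter_shift_grad(f, theta, shift=math.pi / 2):
    """Gradient of f via the two-term parameter-shift rule.

    Valid for gates generated by a Pauli operator (RX, RY, RZ):
    df/dtheta_i = [f(theta_i + s) - f(theta_i - s)] / (2 * sin(s))
    """
    grads = []
    for i in range(len(theta)):
        plus = list(theta)
        plus[i] += shift
        minus = list(theta)
        minus[i] -= shift
        grads.append((f(plus) - f(minus)) / (2 * math.sin(shift)))
    return grads

theta = [0.5, 1.2]
grad = parameter_shift_grad(expval_z0, theta)
print(grad)  # first entry matches -sin(0.5) up to rounding; second is 0.0
```

Unlike a finite-difference estimate, the shifts here are large (π/2) and the result is exact up to floating-point rounding, which is why the same rule can be used on hardware where only expectation values are accessible.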

For faster simulation, use lightning.qubit (a C++ backend, often around 10× faster than default.qubit; it still supports gradients through adjoint differentiation):

# Install first, from a shell: pip install pennylane-lightning
import pennylane as qml

dev = qml.device("lightning.qubit", wires=20)

When to use PennyLane: Variational algorithms (VQE, QAOA), quantum ML, any workflow requiring circuit gradients.

NVIDIA CUDA-Q: When Speed Matters

CUDA-Q provides GPU-accelerated simulation that can be 100–10,000× faster than CPU simulators for large circuits, depending on circuit size and hardware. If you have a CUDA-capable NVIDIA GPU with enough memory for the statevector, it is a strong choice for 25+ qubit circuits:

import cudaq

@cudaq.kernel
def large_circuit(n: int):
    qvec = cudaq.qvector(n)
    h(qvec[0])
    for i in range(n - 1):
        cx(qvec[i], qvec[i + 1])
    mz(qvec)

# Run on GPU (specify 'nvidia' target)
cudaq.set_target('nvidia')
counts = cudaq.sample(large_circuit, 30, shots_count=10000)
print(counts)
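
The kernel above prepares an n-qubit GHZ state, so an ideal sampler should return only the all-zeros and all-ones bitstrings, each about half the time. A minimal pure-Python statevector sketch makes that expectation concrete. It is not CUDA-Q code, the helper names are illustrative, and it is only practical for a handful of qubits.

```python
import math

def apply_h(state, target, n):
    """Apply a Hadamard to qubit `target` (qubit 0 is the leftmost bit)."""
    bit = 1 << (n - 1 - target)
    s = 1 / math.sqrt(2)
    new = list(state)
    for i in range(len(state)):
        if not i & bit:
            a, b = state[i], state[i | bit]
            new[i] = s * (a + b)
            new[i | bit] = s * (a - b)
    return new

def apply_cx(state, control, target, n):
    """Apply a CNOT: swap amplitudes so `target` flips where `control` is 1."""
    cbit = 1 << (n - 1 - control)
    tbit = 1 << (n - 1 - target)
    new = list(state)
    for i in range(len(state)):
        if i & cbit and not i & tbit:
            new[i], new[i | tbit] = state[i | tbit], state[i]
    return new

def ghz_probabilities(n):
    """Outcome probabilities after H on qubit 0 and a chain of CNOTs."""
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j
    state = apply_h(state, 0, n)
    for q in range(n - 1):
        state = apply_cx(state, q, q + 1, n)
    return {format(i, f"0{n}b"): abs(a) ** 2
            for i, a in enumerate(state) if abs(a) > 1e-12}

print(ghz_probabilities(3))  # two entries, '000' and '111', each close to 0.5
```

If a real run of the kernel shows any other bitstring, the backend is either modelling noise or the circuit is not the one you intended.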

Performance benchmark (GHZ circuit, 28 qubits, 1000 shots):

  • Qiskit Aer CPU: ~45 seconds
  • PennyLane lightning.qubit: ~30 seconds
  • CUDA-Q (A100 GPU): ~0.8 seconds
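
As a quick sanity check, the speedups implied by these three figures can be computed directly. The timings are the ones quoted in the list above; nothing here is a new measurement.

```python
# Timings in seconds, copied from the benchmark list above
timings = {
    "Qiskit Aer CPU": 45.0,
    "PennyLane lightning.qubit": 30.0,
    "CUDA-Q (A100 GPU)": 0.8,
}

gpu_time = timings["CUDA-Q (A100 GPU)"]

# Ratio of each CPU timing to the GPU timing
speedups = {
    name: seconds / gpu_time
    for name, seconds in timings.items()
    if name != "CUDA-Q (A100 GPU)"
}

for name, factor in speedups.items():
    print(f"CUDA-Q vs {name}: {factor:.1f}x faster")
```

For this 28-qubit GHZ case the ratios come out at a few tens, so larger gaps should be expected only for bigger or deeper circuits.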

When to use CUDA-Q: Any circuit with 25+ qubits, performance-critical simulations, multi-GPU workloads.

Cirq: Noise Modeling and NISQ Research

Google Cirq includes three simulators:

import cirq

# Exact statevector simulation
sim = cirq.Simulator()

# Density matrix with noise
noise_model = cirq.ConstantQubitNoiseModel(
    cirq.depolarize(p=0.01)
)
noisy_sim = cirq.DensityMatrixSimulator(noise=noise_model)

# Clifford circuits only (but exponentially faster for stabilizer states)
clifford_sim = cirq.CliffordSimulator()
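
To make the depolarizing noise concrete: a single-qubit depolarizing channel with probability p leaves the state alone with probability 1 − p and otherwise applies X, Y or Z, each with probability p/3. The sketch below assumes that convention (which is how we understand cirq.depolarize to be defined) and applies it to a 2×2 density matrix in plain Python. It is not Cirq code, and the helper names are illustrative.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def depolarize(rho, p):
    """rho -> (1 - p) * rho + (p / 3) * (X rho X + Y rho Y + Z rho Z)."""
    out = [[(1 - p) * rho[i][j] for j in range(2)] for i in range(2)]
    for pauli in (X, Y, Z):
        term = matmul(matmul(pauli, rho), dagger(pauli))
        for i in range(2):
            for j in range(2):
                out[i][j] += (p / 3) * term[i][j]
    return out

rho0 = [[1, 0], [0, 0]]            # the pure state |0><0|
noisy = depolarize(rho0, p=0.01)
print(noisy[0][0].real, noisy[1][1].real)  # populations 1 - 2p/3 and 2p/3
```

Starting from |0⟩, only X and Y flip the qubit, so the population of |1⟩ after one application is 2p/3 rather than p.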

When to use Cirq: NISQ noise modeling, stabilizer circuit research, Google Quantum AI workflows.

Amazon Braket LocalSimulator: Isolation and Portability

The Braket SDK includes a free local simulator that mirrors the cloud API exactly:

from braket.devices import LocalSimulator
from braket.circuits import Circuit

device = LocalSimulator()

circuit = Circuit()
circuit.h(0)
circuit.cnot(0, 1)
circuit.probability()

task = device.run(circuit, shots=1000)
result = task.result()
print(result.measurement_counts)

When to use Braket local: You're building for AWS deployment and want local dev/test that matches the cloud API.

The HLQuantum Shortcut

If you need to run the same circuit on multiple simulators for benchmarking or comparison, HLQuantum makes this trivial:

import hlquantum as hlq
import time

qc = hlq.Circuit(28)
qc.h(0)
for i in range(27):
    qc.cx(i, i + 1)
qc.measure_all()

for backend in ["qiskit", "pennylane", "cudaq"]:
    t0 = time.time()
    result = hlq.run(qc, shots=1000, backend=backend)
    print(f"{backend}: {time.time() - t0:.2f}s")

One circuit, three backends, directly comparable results.
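
Single wall-clock timings like those in the loop above can be noisy, and the first call to a backend may include one-off setup cost. A slightly more careful harness, sketched below with only the standard library, discards a warm-up run and reports the median of several repeats. The `run_fn` callable is a placeholder for whatever executes your circuit (for example, a lambda wrapping the `hlq.run` call above); it is not an HLQuantum API.

```python
import statistics
import time

def benchmark(run_fn, repeats=5, warmup=1):
    """Return the median wall-clock time of run_fn() in seconds."""
    for _ in range(warmup):
        run_fn()  # discard warm-up runs (imports, compilation, caches)
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        run_fn()
        timings.append(time.perf_counter() - t0)
    return statistics.median(timings)

# Stand-in workload so the sketch runs without any quantum SDK installed
elapsed = benchmark(lambda: sum(range(100_000)))
print(f"median: {elapsed:.4f}s")
```

Using time.perf_counter instead of time.time gives a monotonic, higher-resolution clock, and the median is less sensitive than the mean to a single slow run.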