Every time you run a quantum circuit on real hardware, it costs QPU time and money. Variational algorithms like VQE and QAOA can require hundreds of thousands of circuit executions to converge — at $0.075–$0.90 per task on cloud QPUs, that adds up fast.
The good news: most default implementations are wildly shot-inefficient. With the right techniques, you can cut shot requirements by 50–90% without sacrificing result quality.
Why Shot Counts Are So High by Default
VQE computes expectation values of Pauli operators. A Hamiltonian is decomposed into a sum of Pauli terms, and each term requires a separate circuit execution. For a molecule like H₂ (4 qubits), there are ~15 Pauli terms. For larger molecules the count explodes:
| Molecule | Qubits | Pauli terms | Naive shots/iteration |
|---|---|---|---|
| H₂ | 4 | 15 | 15,000 |
| LiH | 12 | 631 | 631,000 |
| BeH₂ | 14 | 666 | 666,000 |
| H₂O | 14 | 1,086 | 1,086,000 |
With 200 optimizer iterations, H₂O naively requires 217 million shots. The techniques below reduce this by 80–95%.
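The table's naive column assumes 1,000 shots per Pauli term; a few lines of arithmetic show where the 217 million figure comes from (a sketch: the 1,000 shots/term figure is the assumption):

```python
# Naive budget: every Pauli term measured separately,
# 1,000 shots per term, per optimizer iteration
pauli_terms = {"H2": 15, "LiH": 631, "BeH2": 666, "H2O": 1086}
shots_per_term = 1_000
iterations = 200

for mol, n_terms in pauli_terms.items():
    total = n_terms * shots_per_term * iterations
    print(f"{mol}: {total:,} total shots")
# H2O: 1086 * 1,000 * 200 = 217,200,000 shots (~217 million)
```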
Technique 1: Measurement Grouping (Biggest Win)
Many Pauli terms commute — they can be measured simultaneously in a single circuit execution rather than separately. Grouping commuting observables is the single biggest reduction available.
# Qiskit Aer's Estimator primitive groups commuting Paulis automatically
# (abelian_grouping=True is the default), reducing circuit executions
# from O(n_terms) to O(n_groups), often a 5-10x reduction
from qiskit_aer.primitives import Estimator

estimator = Estimator()  # abelian_grouping=True by default
# With PennyLane, group commuting terms explicitly:
import pennylane as qml

H = qml.Hamiltonian(coeffs, observables)
# Partition into qubit-wise commuting (QWC) groups
groups = qml.pauli.group_observables(observables, grouping_type='qwc')
print(f"Original terms: {len(observables)}, Groups: {len(groups)}")
# Original terms: 631, Groups: 68 (for LiH)
Expected savings: 5–15× on typical chemistry Hamiltonians.
Technique 2: Shot-Frugal Optimizers
Classical optimizers like L-BFGS-B or Adam assume noiseless function evaluations, so on noisy results they spend more evaluations than necessary chasing statistical fluctuations. Shot-frugal optimizers instead allocate shots adaptively based on measurement variance.
import numpy as np
import pennylane as qml
from pennylane.optimize import ShotAdaptiveOptimizer

# ShotAdaptiveOptimizer requires a finite-shot device
dev = qml.device("default.qubit", wires=4, shots=100)

@qml.qnode(dev)
def circuit(params):
    # ansatz
    ...
    return qml.expval(H)

# ShotAdaptiveOptimizer: allocates more shots to high-variance gradient components
opt = ShotAdaptiveOptimizer(min_shots=10)
params = np.random.uniform(-np.pi, np.pi, n_params)
for i in range(100):
    params, energy = opt.step_and_cost(circuit, params)
    print(f"Step {i}: total shots used = {opt.total_shots_used}")
The optimizer starts with few shots per evaluation and increases them only when the gradient estimate is uncertain. A typical VQE run uses 5–10× fewer shots than fixed-shot COBYLA.
Technique 3: Parameter-Shift Gradients (Use Fewer Evaluations)
The naive finite-difference gradient estimate (f(x+ε) - f(x))/ε has high variance at small ε and large bias at large ε. The parameter-shift rule instead gives an analytically exact (unbiased) gradient estimate with just 2 circuit evaluations per parameter:
import pennylane as qml

# PennyLane uses parameter-shift by default on hardware devices
@qml.qnode(dev, diff_method="parameter-shift")  # 2 evals per param
def circuit(params):
    ...

# Forward finite difference: ~1 eval per param (plus one baseline), but biased
@qml.qnode(dev, diff_method="finite-diff")
def circuit_fd(params):
    ...

# "best" lets PennyLane pick the cheapest valid method:
# an analytic method (e.g. backprop) on simulators, parameter-shift on hardware
@qml.qnode(dev, diff_method="best")
def circuit_best(params):
    ...
With n parameters, parameter-shift costs 2n evaluations per gradient step. Use gradient-free optimizers (COBYLA, SPSA, Nelder-Mead) when n is large.
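The shift rule is easy to verify numerically with plain NumPy, no quantum stack required. For RY(θ) acting on |0⟩, ⟨Z⟩ = cos θ, and two shifted evaluations reproduce the exact derivative −sin θ (a self-contained sketch; the state-vector arithmetic stands in for a circuit execution):

```python
import numpy as np

def expval_z(theta):
    # State RY(theta)|0> = [cos(t/2), sin(t/2)]; <Z> = cos(theta)
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return c ** 2 - s ** 2

theta = 0.7
# Parameter-shift rule for a standard single-qubit rotation:
shift_grad = (expval_z(theta + np.pi / 2) - expval_z(theta - np.pi / 2)) / 2
exact_grad = -np.sin(theta)  # d/dtheta cos(theta)
print(shift_grad - exact_grad)  # agrees to machine precision
```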
Technique 4: SPSA — Stochastic Gradient Estimation
Simultaneous Perturbation Stochastic Approximation (SPSA) estimates the full gradient with just 2 circuit evaluations regardless of parameter count, by perturbing all parameters simultaneously:
from qiskit_algorithms.optimizers import SPSA
# SPSA: 2 evaluations per step regardless of parameter count
# vs parameter-shift: 2n evaluations per step
optimizer = SPSA(maxiter=300, learning_rate=0.1, perturbation=0.05)
# For 10 parameters:
# - Parameter shift: 2×10 = 20 evals/step × 300 steps = 6,000 total
# - SPSA: 2 evals/step × 300 steps = 600 total ← 10x reduction
Best when: circuit has many parameters (> 10). The tradeoff is slower convergence per step, but fewer total shots.
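A toy pure-NumPy sketch makes the 2-evaluations-per-step property concrete. The gain constants a and c below are illustrative (production SPSA, including Qiskit's, uses decaying gain sequences a_k, c_k), and the quadratic cost stands in for a noisy expectation value:

```python
import numpy as np

def spsa_step(f, x, a=0.02, c=0.1, rng=None):
    """One SPSA step: exactly 2 evaluations of f, regardless of len(x)."""
    rng = rng if rng is not None else np.random.default_rng()
    delta = rng.choice([-1.0, 1.0], size=x.shape)  # random +/-1 perturbation
    # Simultaneous-perturbation gradient estimate (1/delta_i == delta_i here)
    g_hat = (f(x + c * delta) - f(x - c * delta)) / (2 * c) * delta
    return x - a * g_hat

# Toy cost standing in for a noisy expectation value
f = lambda x: float(np.sum(x ** 2))
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=10)   # 10 parameters, still only 2 evals/step
for _ in range(300):
    x = spsa_step(f, x, rng=rng)
print(f"final cost after 600 total evals: {f(x):.2e}")  # driven close to zero
```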
Technique 5: Early Termination + Variance Thresholding
Don't run circuits to completion when the result is already good enough:
from pennylane.optimize import AdamOptimizer
import numpy as np
opt = AdamOptimizer(stepsize=0.02)
params = init_params.copy()
prev_energy = float('inf')
for step in range(max_steps):
    params, energy = opt.step_and_cost(circuit, params)
    # Stop when the energy change falls below the shot-noise floor
    noise_floor = 1.0 / np.sqrt(shots_per_eval)  # std dev scales as 1/sqrt(shots)
    if abs(energy - prev_energy) < noise_floor:
        print(f"Converged at step {step}")
        break
    prev_energy = energy
Many VQE runs plateau early — continuing just wastes shots on noise fluctuations.
Technique 6: Warm Starting
Initialize QAOA or VQE parameters from a related classical solution rather than random:
# For QAOA on Max-Cut: warm start from a greedy classical solution
import numpy as np
import networkx as nx
from networkx.algorithms.approximation import one_exchange

G = nx.from_edgelist(edges)
cut_value, partition = one_exchange(G)  # (cut size, node partition)
# Map the classical solution to initial QAOA angles
# (common heuristic: γ₀ ≈ π/4 for a good cut, β₀ ≈ π/8)
init_gamma = [np.pi / 4]
init_beta = [np.pi / 8]
# Warm-started QAOA typically converges in 30-50% fewer iterations
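For reference, the greedy classical step can be sketched in plain Python. This one-exchange local search (flip a node whenever it improves the cut) is an illustrative stand-in for networkx's `one_exchange`, not the library implementation:

```python
def cut_size(side, edges):
    return sum(1 for u, v in edges if side[u] != side[v])

def greedy_max_cut(n_nodes, edges):
    """One-exchange local search: flip a node whenever it improves the cut."""
    side = [i % 2 for i in range(n_nodes)]   # arbitrary starting partition
    best = cut_size(side, edges)
    improved = True
    while improved:
        improved = False
        for v in range(n_nodes):
            side[v] ^= 1                     # try moving v across the cut
            new = cut_size(side, edges)
            if new > best:
                best, improved = new, True
            else:
                side[v] ^= 1                 # no improvement: revert
    return best, side

# Triangle graph: the best possible cut has size 2
print(greedy_max_cut(3, [(0, 1), (1, 2), (0, 2)]))  # (2, [0, 1, 0])
```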
Combining Everything: A Practical VQE Template
import pennylane as qml
import numpy as np
from pennylane.optimize import ShotAdaptiveOptimizer
dev = qml.device("default.qubit", wires=n_qubits, shots=512)
# 1. Group commuting terms (5-10x reduction in circuit count)
H = qml.Hamiltonian(coeffs, observables, grouping_type="qwc")

@qml.qnode(dev, diff_method="parameter-shift")
def ansatz(params):
    # Hardware-efficient ansatz
    for i in range(n_qubits):
        qml.RY(params[i], wires=i)
    for i in range(n_qubits - 1):
        qml.CNOT(wires=[i, i + 1])
    return qml.expval(H)
# 2. Use a shot-adaptive optimizer
opt = ShotAdaptiveOptimizer(min_shots=50)

# 3. Warm start (warm_start_params is user-defined)
params = warm_start_params(H)

# 4. Run with early stopping (check_convergence is user-defined)
for step in range(300):
    params, energy = opt.step_and_cost(ansatz, params)
    if check_convergence(energy, opt.total_shots_used):
        break
print(f"Ground state energy: {energy:.4f} Ha")
Quick Reference: Shot Budget by Method
| Method | Shots/step | Best for |
|---|---|---|
| COBYLA + fixed shots | n_terms × shots | Small parameter count |
| Parameter-shift + Adam | 2n_params × shots | Differentiable circuits |
| SPSA | 2 × shots | Large parameter count |
| ShotAdaptiveOptimizer | Adaptive | General VQE |
| Grouped Paulis | reduces shots 5–15× | Always apply first |
Apply measurement grouping first — it's the biggest single win and requires no changes to your optimizer or circuit.
Related: VQE with PennyLane · QPU access guide · HLQuantum