Software Diagnostics Library

Crash Dump Analysis Patterns (Part 278, Linux)

This is a Linux pattern variant of the Windows Spiking Interrupts memory analysis pattern. The Windows pattern describes high interrupt and DPC activity that causes perceived freezes, response lag, or high kernel CPU time; the original pattern uses per-processor interrupt counts, DPC/interrupt time, and DPC queue data, and stresses comparisons with a normal system because raw counters depend on uptime.

On Linux, the closest mapping of the Windows concepts (including the associated Paratext) is:

Hardware interrupts - hard IRQs, vectors, MSI/MSI-X, per-device IRQ lines

DPC - softirq, tasklet, NAPI poll, threaded IRQ, sometimes workqueue

DPC delegate thread / idle-context DPC execution - ksoftirqd/N, irq/<irq>-<name>, kworker/*

!prcb, DPC counts, interrupt time - /proc/interrupts, /proc/softirqs, /proc/stat, crash> irq -s, stacks, logs

Note: The mapping of DPC delegate thread / idle-context DPC execution to kworker/* is an approximation. Workqueues run in kernel thread context and can execute deferred work similarly to how idle-context DPCs offload work from the interrupt path, but workqueues are a general-purpose deferral mechanism, not specifically an interrupt bottom-half mechanism. Unlike softirqs and tasklets, which exist primarily to defer interrupt handler work, a kworker stall may have nothing to do with interrupt pressure. When investigating spiking interrupt symptoms, kworker threads appearing in stacks should be treated as a secondary signal only, and their work items examined individually (via crash> bt on the kworker thread) to determine whether they originate from IRQ or softirq context before drawing conclusions about interrupt-driven CPU saturation.

As we see, Linux has no exact DPC object model. The strongest Linux analogy is when a CPU is consumed by hard IRQs and/or interrupt-related bottom-half work, such as softirqs, tasklets, NAPI polling, threaded IRQs, or interrupt-originated workqueue processing.

The crash tool's irq command and its various options can show whether a single CPU is absorbing a device interrupt stream:

crash> irq -s
           CPU0       CPU1
[...]
 77:     513461          0  MSI io-request
[...]
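When the table is long, the skew can also be flagged mechanically. The following minimal Python sketch assumes the irq -s output has been saved to a text file (for example, via crash output redirection) and keeps the layout shown above: a header row of CPU names followed by "IRQ: per-CPU counts... description" rows. The function names, the 90% share threshold, and the minimum of 1000 interrupts are illustrative assumptions, not crash features.

```python
from typing import Dict, List, Tuple


def parse_irq_table(text: str) -> Tuple[List[str], Dict[str, List[int]]]:
    """Parse an 'irq -s' (or /proc/interrupts) style table into CPU names and per-IRQ counts."""
    cpus: List[str] = []
    rows: Dict[str, List[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if not cpus and all(f.startswith("CPU") for f in fields):
            cpus = fields  # header row, e.g. CPU0 CPU1
            continue
        if not cpus or not fields[0].endswith(":"):
            continue  # skips elision markers such as [...]
        counts = fields[1:1 + len(cpus)]
        if len(counts) == len(cpus) and all(c.isdigit() for c in counts):
            rows[fields[0][:-1]] = [int(c) for c in counts]
    return cpus, rows


def skewed_irqs(text: str, threshold: float = 0.9,
                min_total: int = 1000) -> List[Tuple[str, str, float]]:
    """Return (irq, cpu, share) where one CPU received at least `threshold` of the IRQ's interrupts."""
    cpus, rows = parse_irq_table(text)
    result = []
    for irq, counts in rows.items():
        total = sum(counts)
        if total < min_total:
            continue  # too few interrupts to call it a stream
        top = max(range(len(counts)), key=lambda i: counts[i])
        share = counts[top] / total
        if share >= threshold:
            result.append((irq, cpus[top], share))
    return result
```

Applied to the output above, it reports IRQ 77 on CPU0 with a 100% share; the same parser also accepts /proc/interrupts text.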

Then we check whether that CPU was also the crash CPU, a soft-lockup CPU, an RCU-stall CPU, or the CPU running ksoftirqd/N, using follow-up commands such as:

crash> ps | grep -E "ksoftirqd|irq/|kworker|rcu"

crash> runq

crash> bt -a

crash> bt -E

The last command searches the IRQ stacks (on supported architectures) and, on x64, also the exception stacks for possible exception frames.
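To narrow the full ps listing down to interrupt-related threads that are on a CPU or waiting for one, the following minimal Python sketch parses saved crash ps output. It assumes the usual column order (PID, PPID, CPU, TASK, ST, %MEM, VSZ, RSS, COMM), with ">" marking tasks active on their CPU and RU denoting running or runnable tasks; the layout may differ between crash versions, and the function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Same thread-name filter as the ps | grep command above.
IRQ_THREADS = re.compile(r"ksoftirqd|irq/|kworker|rcu")


@dataclass
class Task:
    pid: int
    cpu: int
    state: str
    comm: str
    active: bool  # True when ps marks the task with ">" (currently on its CPU)


def parse_crash_ps(text: str) -> List[Task]:
    """Parse crash ps output lines into Task records, skipping the header."""
    tasks = []
    for line in text.splitlines():
        body = line.lstrip()
        active = body.startswith(">")
        if active:
            body = body[1:]
        fields = body.split()
        if len(fields) < 9 or not fields[0].isdigit():
            continue  # header or unrelated line
        tasks.append(Task(int(fields[0]), int(fields[2]), fields[4],
                          " ".join(fields[8:]), active))
    return tasks


def busy_irq_threads(tasks: List[Task]) -> List[Task]:
    """IRQ/softirq-related threads that are active on a CPU or in the RU state."""
    return [t for t in tasks
            if IRQ_THREADS.search(t.comm) and (t.active or t.state == "RU")]
```

The CPU field of each returned task can then be compared with the CPU that dominates the irq -s table.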

If ksoftirqd/N is running or runnable and the same CPU N has also accumulated a disproportionate amount of NET_RX, NET_TX, BLOCK, TIMER, or RCU softirq work, that is the Linux equivalent of DPC pressure. Softirqs are deferred interrupt work that can run on return from an interrupt handler or in ksoftirqd; when the inline processing limits are reached, the remaining pending softirqs are handed off to ksoftirqd. Also, ksoftirqd/N executes softirq handlers when softirqs are threaded or under heavy load, and irq/<irq>-<name> threads handle threaded interrupts. See: https://docs.kernel.org/admin-guide/kernel-per-CPU-kthreads.html
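The combined condition above can be expressed as a small check. This Python sketch works under stated assumptions: the set of CPUs whose ksoftirqd/N is running or runnable comes from the thread listing, the per-CPU softirq rates (events per second) come from two samples of the live /proc/softirqs or a comparable source, and the cut-off of 10000 events per second is an arbitrary placeholder that should be replaced by a baseline taken from a normal system. The function name is hypothetical.

```python
from typing import Dict, List, Set

# Softirq types named in the condition above; other types (e.g. SCHED) are ignored.
PRESSURE_TYPES = ("NET_RX", "NET_TX", "BLOCK", "TIMER", "RCU")


def dpc_like_pressure(ksoftirqd_busy_cpus: Set[int],
                      softirq_rates: Dict[str, List[float]],
                      min_rate: float = 10000.0) -> Dict[int, str]:
    """Return {cpu: dominant softirq type} for CPUs that satisfy both conditions."""
    result: Dict[int, str] = {}
    for cpu in sorted(ksoftirqd_busy_cpus):
        per_type = {
            name: softirq_rates[name][cpu]
            for name in PRESSURE_TYPES
            if name in softirq_rates and cpu < len(softirq_rates[name])
        }
        if not per_type:
            continue
        dominant = max(per_type, key=lambda name: per_type[name])
        if per_type[dominant] >= min_rate:
            result[cpu] = dominant
    return result
```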

You can also see interrupt-pressure symptoms in the kernel log:

crash> log

Typical diagnostic messages include these fragments:

watchdog: BUG: soft lockup - CPU#N stuck
NMI watchdog: Watchdog detected hard LOCKUP on cpu N
rcu: INFO: rcu_sched detected stalls on CPUs/tasks
irq XX: nobody cared
Disabling IRQ #XX
NETDEV WATCHDOG: ... transmit queue timed out
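When the log is long, these fragments can be extracted from saved log output programmatically. The following Python sketch uses regular expressions written against the message fragments listed above. The exact wording varies between kernel versions (for example, rcu_sched versus rcu_preempt), so the patterns and the signature names are assumptions to adapt.

```python
import re
from typing import List, Tuple

# (signature name, pattern); group 1 captures the CPU, IRQ, RCU flavor, or interface.
SIGNATURES = [
    ("soft_lockup", re.compile(r"soft lockup - CPU#(\d+) stuck")),
    ("hard_lockup", re.compile(r"hard LOCKUP on cpu (\d+)")),
    ("rcu_stall", re.compile(r"rcu: INFO: (rcu_\w+) detected stalls")),
    ("irq_nobody_cared", re.compile(r"irq (\d+): nobody cared")),
    ("irq_disabled", re.compile(r"Disabling IRQ #(\d+)")),
    ("tx_timeout", re.compile(r"NETDEV WATCHDOG: (\S+).*timed out")),
]


def scan_log(text: str) -> List[Tuple[str, str]]:
    """Return (signature, captured detail) for every matching log line, in log order."""
    hits = []
    for line in text.splitlines():
        for kind, pattern in SIGNATURES:
            match = pattern.search(line)
            if match:
                hits.append((kind, match.group(1)))
    return hits
```

The captured CPU and IRQ numbers can then be matched against the irq -s table and the ps listing.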

RCU stall logs are particularly useful because the kernel documentation explicitly lists CPUs looping with interrupts disabled, preemption disabled, or bottom halves disabled, and periodic interrupt handlers taking too long, as possible causes. It also suggests that reproducible cases involving massive hard or soft interrupt activity can be narrowed down using /proc/interrupts. See: https://docs.kernel.org/RCU/stallwarn.html

For a definition of RCU, see https://en.wikipedia.org/wiki/Read-copy-update

The following guide shows how to collect supplemental Paratext information from the live system:

// Hard IRQ distribution

cat /proc/interrupts

watch -n 1 cat /proc/interrupts

/proc/interrupts records the number of interrupts per CPU per I/O device and, on x64, also includes internal interrupts such as NMI, LOC, TLB, RES, and CAL. See: https://man7.org/linux/man-pages/man5/proc_interrupts.5.html
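Because /proc/interrupts counters are cumulative since boot, the interesting quantity is their rate of change. The following minimal Python sketch parses two snapshots and ranks (IRQ, CPU) pairs by interrupts per second. It assumes the header row lists the online CPUs and skips rows such as ERR and MIS that carry a single system-wide total instead of per-CPU counts. The function names and the 1-second default interval are illustrative choices.

```python
import time
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (IRQ label, CPU name)


def parse_interrupts(text: str) -> Dict[Key, int]:
    """Map (irq, cpu) to the cumulative count in one /proc/interrupts snapshot."""
    lines = text.splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    snapshot: Dict[Key, int] = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        counts = fields[1:1 + len(cpus)]
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue  # e.g. ERR/MIS rows with a single total
        for cpu, count in zip(cpus, counts):
            snapshot[(fields[0][:-1], cpu)] = int(count)
    return snapshot


def top_rates(before: Dict[Key, int], after: Dict[Key, int],
              seconds: float, n: int = 5) -> List[Tuple[Key, float]]:
    """Return the n (irq, cpu) pairs with the highest interrupts per second."""
    rates = {key: (after[key] - before.get(key, 0)) / seconds for key in after}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:n]


def sample(path: str = "/proc/interrupts", interval: float = 1.0) -> List[Tuple[Key, float]]:
    """Take two live snapshots `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_interrupts(f.read())
    return top_rates(first, second, interval)
```

On a live system, sample() returns the five busiest (IRQ, CPU) pairs over one second; a single pair dominating the list is the live counterpart of the irq -s skew seen in the dump.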

// SoftIRQ distribution

cat /proc/softirqs

watch -n 1 cat /proc/softirqs

Common rows include:

NET_RX      receive-side network pressure
NET_TX      transmit-side network pressure
BLOCK       block I/O completion pressure
IRQ_POLL    block polling pressure
TIMER       timer callback pressure
HRTIMER     high-resolution timer pressure
SCHED       scheduler/IPI/load-balancing pressure
RCU         RCU callback pressure
TASKLET     legacy driver deferred work
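The same delta approach applies to /proc/softirqs. The following Python sketch reports the fastest-growing softirq type per CPU, which can then be read against the rows above (for example, NET_RX for receive-side network pressure). It assumes both snapshots come from the same boot and therefore list the same rows and CPUs. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_softirqs(text: str) -> Dict[str, List[int]]:
    """Map softirq name (HI, TIMER, NET_RX, ...) to per-CPU cumulative counts."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table: Dict[str, List[int]] = {}
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0].endswith(":"):
            table[fields[0][:-1]] = [int(v) for v in fields[1:1 + ncpu]]
    return table


def dominant_per_cpu(before: Dict[str, List[int]], after: Dict[str, List[int]],
                     seconds: float) -> List[Tuple[str, float]]:
    """For each CPU index, return (softirq name, events per second) with the largest increase."""
    ncpu = len(next(iter(after.values())))
    result = []
    for cpu in range(ncpu):
        deltas = {name: after[name][cpu] - before[name][cpu] for name in after}
        name = max(deltas, key=lambda n: deltas[n])
        result.append((name, deltas[name] / seconds))
    return result
```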

The /proc/stat softirq line reports the count of softirqs serviced since boot, and /proc/stat also reports CPU time spent servicing irqs and softirqs. See: https://www.kernel.org/doc/html/v6.9/filesystems/proc.html

// IRQ and softirq CPU time

awk ' /^cpu[0-9]/ { printf "%s irq_jiffies=%s softirq_jiffies=%s\n", $1, $7, $8 }' /proc/stat

while true; do
  date
  awk '/^cpu[0-9]/ {printf "%s irq=%s softirq=%s\n",$1,$7,$8}' /proc/stat
  sleep 1
done

High irq time points more toward hard interrupt handling. High softirq time points more toward deferred interrupt work such as NAPI, timers, block completions, scheduler softirqs, or RCU.
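Cumulative tick counts become easier to compare once converted into a share of elapsed CPU time. The following Python sketch computes per-CPU irq and softirq percentages between two /proc/stat snapshots. It assumes the documented field order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice) and leaves the guest fields out of the total because the kernel already accounts guest time in user and nice. The function names are hypothetical.

```python
from typing import Dict, Tuple

Ticks = Tuple[int, int, int]  # (irq, softirq, total) in USER_HZ ticks


def parse_stat(text: str) -> Dict[str, Ticks]:
    """Map cpuN to (irq, softirq, total) from one /proc/stat snapshot."""
    result: Dict[str, Ticks] = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            ticks = [int(v) for v in fields[1:9]]  # user .. steal; guest fields skipped
            result[fields[0]] = (ticks[5], ticks[6], sum(ticks))
    return result


def irq_softirq_percent(before: Dict[str, Ticks],
                        after: Dict[str, Ticks]) -> Dict[str, Tuple[float, float]]:
    """Return {cpuN: (irq %, softirq %)} of the CPU time elapsed between snapshots."""
    result: Dict[str, Tuple[float, float]] = {}
    for cpu, (irq1, soft1, total1) in after.items():
        irq0, soft0, total0 = before[cpu]
        elapsed = total1 - total0
        if elapsed > 0:
            result[cpu] = (100.0 * (irq1 - irq0) / elapsed,
                           100.0 * (soft1 - soft0) / elapsed)
    return result
```

A CPU with a high softirq percentage and a busy ksoftirqd/N thread matches the DPC-pressure analogy described earlier.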

// IRQ affinity
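Each IRQ's allowed CPUs can be read from /proc/irq/<N>/smp_affinity_list, which uses the kernel CPU-list format such as 0-3,8 (the same mask is also available in hexadecimal in /proc/irq/<N>/smp_affinity). The following minimal Python sketch expands those lists and finds IRQs pinned to a single CPU. Combined with the per-CPU counts from /proc/interrupts, this helps explain IRQ-affinity skew. The function names are hypothetical. On some interrupt controllers, the effective delivery target can also be narrower than the configured mask.

```python
import os
from typing import Dict, List, Set


def parse_cpu_list(text: str) -> Set[int]:
    """Expand a kernel CPU list such as '0-3,8' into {0, 1, 2, 3, 8}."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_affinities(root: str = "/proc/irq") -> Dict[int, Set[int]]:
    """Read smp_affinity_list for every numbered IRQ directory under root (Linux only)."""
    result: Dict[int, Set[int]] = {}
    for name in os.listdir(root):
        path = os.path.join(root, name, "smp_affinity_list")
        if name.isdigit() and os.path.exists(path):
            with open(path) as f:
                result[int(name)] = parse_cpu_list(f.read())
    return result


def pinned_to(affinities: Dict[int, Set[int]], cpu: int) -> List[int]:
    """IRQs whose configured affinity allows only the given CPU."""
    return sorted(irq for irq, cpus in affinities.items() if cpus == {cpu})
```

For example, pinned_to(read_affinities(), 0) lists the IRQs that can only be delivered to CPU0.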

Summary:

Spiking Interrupt activity is suspected when response latency or apparent freezes coincide with disproportionate hard IRQ or softirq activity on one or more CPUs. In a core dump, the pattern appears as high per-CPU IRQ counters, IRQ-affinity skew, active IRQ/softirq/ksoftirqd/threaded-IRQ stacks, and possible watchdog or RCU-stall messages. In live /proc, the pattern appears as rapidly increasing deltas in /proc/interrupts, /proc/softirqs, and /proc/stat irq/softirq CPU time.

- Dmitry Vostokov @ DumpAnalysis.org + TraceAnalysis.org -

This entry was posted on Saturday, May 30th, 2026 at 5:06 pm and is filed under Core Dump Analysis, Crash Dump Analysis, Crash Dump Patterns, Kernel Memory Dump Analysis.
