Trace Analysis Patterns (Part 113)

Recently we analyzed a few logs which ended with a specialized Activity Region from a subsystem that sets operational parameters. The problem description stated that the system became unresponsive after changing parameters in a certain sequence. Usually, for that system, when we stop logging (even after setting parameters) we end up with messages from some Background Components since some time passes between the end of setting parameters activity and the time the operator sends stop logging request:

However, in the problem case we see message flow stops right in the middle of parameter setting activity:

So we advised to check for any crashes or hangs, and, indeed, it was found that the system was actually experiencing system crashes, and we got memory dumps for analysis where we found Top Module from a 3rd-party vendor related to parameter setting activity.

Please also note an analogy here between normal thread stack traces from threads that are waiting most of the time and Spiking Thread stack trace caught up in the middle of some function.

We call this pattern Ruptured Trace after a ruptured computation.

Note, that if it is possible to restart the system and resume the same tracing we may get an instance of Blackout analysis pattern.

- Dmitry Vostokov @ + -

Leave a Reply

You must be logged in to post a comment.