Performance Profiling in Squeak Smalltalk

Finally, an explanation of how the MessageTally profiler in Squeak works. I've been meaning to ask someone for an explanation of what all this info means, but haven't quite got around to it yet. Thank you Andreas! Quoting here in case that site becomes unavailable...


The primary tool to measure performance, both for Squeak in general as
well as for Croquet, is MessageTally. MessageTally acts on a particular
expression (MessageTally spyOn:["your expression here"]) and provides
the following information:
a) A hierarchy showing how much time was spent where in the computation.
b) A list showing which amount of time was spent in which leaf nodes
c) Memory statistics, incl. the growth rate, garbage collection info etc.

MessageTally uses a technique known as "pc-sampling". What that means is
that a high-priority process is started which (based on a timer) samples
the call stack of the process and allocates a time value (typically
whatever it's using for sampling).

It is important to notice that this is a statistical measure - given a
large enough number of samples, the reported result will be
statistically valid. On the other hand, small numbers of samples are
typically statistically invalid - a single garbage collection can lead
to a major change in an otherwise insignificant part of the computation.

Here is an example:

    MessageTally spyOn:[100 raisedTo: 1000].

resulted in the following output:

100.0% {3ms} SmallInteger(Number)>>raisedTo:
   100.0% {3ms} SmallInteger(Number)>>raisedToInteger:
     50.0% {2ms} LargePositiveInteger>>*
     50.0% {2ms} primitives

These results claim that we're spending 50% of the overall time in
Multiplying large integers. Which is completely bogus since we have only
two samples (the default sampling rate is 1ms). If we run this for a
longer period of time, like here:

   MessageTally spyOn:[1000 timesRepeat:[100 raisedTo: 1000]]

100.0% {1007ms} SmallInteger(Number)>>raisedTo:
   99.9% {1006ms} SmallInteger(Number)>>raisedToInteger:
     96.8% {975ms} primitives
     3.0% {30ms} LargePositiveInteger>>*

We see that indeed, we only spend roughly 3% in multiplying large
integers. The other 97% are spent in "primitives" which, unfortunately,
are not broken out separately in the measures (however, if such a
measure is critical, then the primitives can to be factored into
separate methods which then call the primitives themselves - this allows
message tally to "see" the method frames and report the usage accordingly).

In a more complex situation, the percentage tree is typically useful to
figure out roughly the areas in which time is spent which can then be
measured individually.


The **Leaves* reported by MessageTally is the amount of time spent in a
method WITHOUT the time spent in the methods called from that method. In
our above example the leaves are reported as:

96.8% {975ms} SmallInteger(Number)>>raisedToInteger:

which is the overall time spent in Number>>raisedToInteger: (1006ms)
minus the time spent in LargePositiveInteger>>* (30ms). If a method
shows up in the leaves it typically means that this method is
computationally expensive or just gets called a large number of times.


The **Memory** statistics shown in MessageTally provide information
about how various memory regions have changed:

- old: Describes the "old space" in memory. This is the portion that
will not be included in incremental garbage collection but only during a
full garbage collection. See also the "tenure" information below.
Extensively growing old space typically means that there is a problem
with the allocation patterns or garbage collector settings.

- young: Describes the "young space" in memory, e.g., the region handled
by the incremental garbage collector. Changes in young space are usually
not relevant.

- used: Total amount of used memory.

- free: Total amount of unused memory.


The *GCs* statistics provide information about the garbage collector:

- full: The number of "full" garbage collections and time spent in
those. Automatic full garbage collections should be VERY rare, they are
a sign that you're allocating huge amounts of memory repeatedly. Note
that at times these garbage collections are manually triggered though
(like in the checkpointing process) and a normal effect of the operation.

- incr: The number of "incremental" garbage collections and time spent
in those. Generally speaking, IGCs should be quick (avg. < 2ms) and
frequent (several times a second). However, the total time spent in IGCs
should generally be less than 10%, otherwise this is a sign of a problem
with the allocation patterns. If the time spent in IGCs exceeds 25%
something is *definitely* wrong.

- tenures: Tenuring occurs when the number of surviving objects in young
space exceeds a certain threshold. In this case, the young space
boundary is increased (which adds to the size of "old space" mentioned
above). Tenuring typically means that the working set of the application
hasn't been reached. For example, in a space construction we would
expect frequent tenuring until the space is fully constructed. Once the
working set has been reached, tenuring should be rare to non-existent.
Frequent tenuring in such cases means that the garbage collection
parameters need to be adjusted.

- root table overflows: Root table overflows describe the (rare) case
that the number of "roots" for the incremental garbage collector will
overflow the internal table. This will force an immediate garbage
collection plus tenuring. The measure is provided in order to be able to
find such rare cases (which otherwise leave you wondering why the system
is running full GCs all the time for no apparent reason)

Multiple processes
Historically, MessageTally measured and reported only the call stack of
the current process. This had the disadvantage that if time was spent in
a different process, it would be attributed to a bogus frame in current
thread. For Croquet, I have changes this such that *all* processes are
reported in order to be able to see "what else" is going on.

For example, if we measure an expression like here:

   MessageTally spyOn:[(Delay forSeconds: 5) wait]

we will find that all of the time is reported here:

99.5% {4975ms} ProcessorScheduler class>>startUp
   99.5% {4975ms} ProcessorScheduler class>>idleProcess

The idle process is the process that is being activated when no other
activity occurs (the implementation of the idle process requests the VM
to sleep for a millisecond so that the VM isn't running a busy).
Generally, time reported in idleProcess is time spent "doing nothing"
(e.g., waiting for some activity).

The other relevant system process that may show up is the finalization
process. If the finalization process shows up, it means we're having a
problem with too many weak references being finalized. This has been a
*big* problem in the past, so keep an eye on it.

   - Andreas
about me|good books|popular posts|atom|rss