The Speedup-Test:

A Rigorous Statistical Protocol for Declaring Program Speedups with High Confidence



Saturday, April 24, 2010 - Afternoon

Organizer

Sid Touati - biography

Sid Touati received a DEA in computer science from ENS-Lyon (France) in 1998 and a PhD from the University of Versailles in 2002. He has been an assistant professor at the University of Versailles St Quentin en Yvelines since 2003, in charge of many research projects and PhD students. He is currently on a sabbatical year at the INRIA laboratory. His topics of interest are code optimisation for embedded VLIW processors and high-performance multi-core architectures. Sid Touati relies on both fundamental (formal) and empirical (experimental) research.

Intended Audience

High performance computing, program performance evaluation and analysis, statistics.

Abstract

Code optimisation methods are usually evaluated by making multiple observations of the execution times of the initial and the optimised programs in order to declare a speedup. Even with a fixed input and execution environment, program execution times vary in general, especially for toy/kernel benchmarks. With the introduction of multi-core architectures, execution-time variability has become even more pronounced. Hence different kinds of speedups may be reported: the speedup of the average execution time, the speedup of the minimal execution time, the speedup of the median, etc. Many speedups published in the literature are observations from a set of experiments that do not guarantee reproducibility. In order to improve the reproducibility of experimental results, this tutorial presents a rigorous statistical methodology for program performance analysis. We rely on well-known statistical tests (the Shapiro-Wilk test, Fisher's F-test, Student's t-test, the Kolmogorov-Smirnov test, and the Wilcoxon-Mann-Whitney test) to study whether the observed speedups are statistically significant or not. By fixing a desired risk level 0 < α < 1, we are able to analyse the statistical significance of the average execution time as well as of the median. We can also check whether P(X > Y) > 1/2, i.e., the probability that an individual execution of the optimised code is faster than an individual execution of the initial code. Our methodology is a consistent improvement over the usual performance analysis methods in high performance computing, as in \cite{Jain:1991:ACS,lilja:book}. For each situation, we explain which hypotheses must be checked in order to declare a correct risk level for the statistics. The Speedup-Test protocol, which certifies the observed speedups with rigorous statistics, is implemented and will be distributed for the tutorial as an open-source tool based on the R software.
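The decision procedure sketched in the abstract — check normality first, then apply either a parametric test on the means or the non-parametric Wilcoxon-Mann-Whitney test — can be illustrated in a few lines of Python with SciPy. This is a minimal sketch on synthetic data, not the Speedup-Test tool itself (which is based on R); the sample sizes, means, and risk level below are made up for illustration.

```python
# Minimal sketch of the test chain described above, using SciPy.
# Illustration only -- not the Speedup-Test tool, which is based on R.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(10.0, 0.5, size=30)  # execution times of the initial code (synthetic)
y = rng.normal(9.5, 0.5, size=30)   # execution times of the optimised code (synthetic)
alpha = 0.05                        # desired risk level

# Normality check (Shapiro-Wilk) on both samples, required before
# using a parametric test on the means.
normal = (stats.shapiro(x).pvalue > alpha and
          stats.shapiro(y).pvalue > alpha)

if normal:
    # Student's t-test on the mean execution times
    # (Welch's variant, which does not assume equal variances).
    res = stats.ttest_ind(x, y, equal_var=False)
else:
    # Non-parametric alternative: Wilcoxon-Mann-Whitney, which tests
    # whether an individual run of the initial code tends to be slower
    # than an individual run of the optimised code, i.e. P(X > Y) > 1/2.
    res = stats.mannwhitneyu(x, y, alternative="greater")

speedup_significant = res.pvalue < alpha
print("speedup statistically significant:", speedup_significant)
```

With the synthetic samples above (a genuine 5% mean difference), either branch rejects the null hypothesis at α = 0.05.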

Topics

1. Introduction

2. Common observed non-rigorous experiments and statistics in the community of code optimisation and high performance computing

3. Reporting the different kinds of observed speedups

4. Analysing the statistical significance of the observed speedups

5. Measuring the confidence interval of the proportion of accelerated benchmarks

6. The Speedup-Test software (demo)

7. Discussion
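Topic 5 above can be illustrated with the standard normal-approximation confidence interval for a proportion. The counts below are hypothetical, chosen only to show the arithmetic; the tutorial's own method may differ in detail.

```python
# Sketch of a confidence interval for the proportion of accelerated
# benchmarks (topic 5). The counts are hypothetical, for illustration.
import math

accelerated = 17       # benchmarks with a statistically significant speedup (made up)
total = 25             # benchmarks tested (made up)
z = 1.959963984540054  # normal quantile z_{1-alpha/2} for alpha = 0.05

p_hat = accelerated / total
half_width = z * math.sqrt(p_hat * (1 - p_hat) / total)

lower = max(0.0, p_hat - half_width)
upper = min(1.0, p_hat + half_width)
print(f"proportion = {p_hat:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

For these numbers the observed proportion is 0.68 with a 95% interval of roughly [0.50, 0.86]: even a clear majority of accelerated benchmarks can come with a wide uncertainty band when the benchmark suite is small.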



Tutorial document and tool available at: http://hal.archives-ouvertes.fr/inria-00443839/