The Speedup-Test:

A Rigorous Statistical Protocol for Declaring Program Speedups with High Confidence



Saturday, April 24, 2010 - Afternoon

Organizer

Sid Touati - biography

Sid Touati received a DEA in computer science from ENS-Lyon (France) in 1998 and a PhD from the University of Versailles in 2002. He has been an assistant professor at the University of Versailles St Quentin en Yvelines since 2003, in charge of many research projects and PhD students. He is currently on a sabbatical year at the INRIA laboratory. His topics of interest are code optimisation for embedded VLIW processors and high-performance multi-core architectures. Sid Touati relies on both fundamental (formal) and empirical (experimental) research.

Intended Audience

High performance computing, program performance evaluation and analysis, statistics.

Abstract

Code optimisation methods are usually evaluated by making multiple observations of the execution times of the initial and the optimised programs in order to declare a speedup. Even with a fixed input and execution environment, program execution times vary in general, especially for toy/kernel benchmarks. With the introduction of multi-core architectures, execution-time variability has become even more pronounced. Hence different kinds of speedups may be reported: the speedup of the average execution time, the speedup of the minimal execution time, the speedup of the median, etc. Many speedups published in the literature are observations from a set of experiments that do not guarantee reproducibility. In order to improve the reproducibility of experimental results, this tutorial presents a rigorous statistical methodology for program performance analysis. We rely on well-known statistical tests (the Shapiro-Wilk test, Fisher's F-test, Student's t-test, the Kolmogorov-Smirnov test, and the Wilcoxon-Mann-Whitney test) to study whether the observed speedups are statistically significant or not. By fixing a desired risk level 0 < α < 1, we are able to analyse the statistical significance of the average execution time as well as of the median. We can also check whether P(X > Y) > 1/2, i.e., the probability that an individual execution of the optimised code is faster than an individual execution of the initial code. Our methodology is a consistent improvement over the usual performance analysis methods in high performance computing, as in \cite{Jain:1991:ACS,lilja:book}. For each situation, we explain which hypotheses must be checked in order to declare a correct risk level for the statistics. The Speedup-Test protocol, which certifies the observed speedups with rigorous statistics, is implemented and will be distributed for the tutorial as an open-source tool based on the R software.
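The decision procedure sketched in the abstract — check normality first, then apply either a parametric test on the means or the non-parametric Wilcoxon-Mann-Whitney test — can be illustrated in a few lines of Python with SciPy. This is a minimal sketch on synthetic data, not the Speedup-Test tool itself (which is based on R); the sample sizes, means, and risk level below are made up for illustration.

```python
# Minimal sketch of the test chain described above, using SciPy.
# Illustration only -- not the Speedup-Test tool, which is based on R.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(10.0, 0.5, size=30)  # execution times of the initial code (synthetic)
y = rng.normal(9.5, 0.5, size=30)   # execution times of the optimised code (synthetic)
alpha = 0.05                        # desired risk level

# Normality check (Shapiro-Wilk) on both samples, required before
# using a parametric test on the means.
normal = (stats.shapiro(x).pvalue > alpha and
          stats.shapiro(y).pvalue > alpha)

if normal:
    # Student's t-test on the mean execution times
    # (Welch's variant, which does not assume equal variances).
    res = stats.ttest_ind(x, y, equal_var=False)
else:
    # Non-parametric alternative: Wilcoxon-Mann-Whitney, which tests
    # whether an individual run of the initial code tends to be slower
    # than an individual run of the optimised code, i.e. P(X > Y) > 1/2.
    res = stats.mannwhitneyu(x, y, alternative="greater")

speedup_significant = res.pvalue < alpha
print("speedup statistically significant:", speedup_significant)
```

With the synthetic samples above (a genuine 5% mean difference), either branch rejects the null hypothesis at α = 0.05.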

Topics

1. Introduction

2. Common observed non-rigorous experiments and statistics in the community of code optimisation and high performance computing

3. Reporting the different kinds of observed speedups

4. Analysing the statistical significance of the observed speedups

5. Measuring the confidence interval of the proportion of accelerated benchmarks

6. The Speedup-Test software (demo)

7. Discussion
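Topic 5 above can be illustrated with the standard normal-approximation confidence interval for a proportion. The counts below are hypothetical, chosen only to show the arithmetic; the tutorial's own method may differ in detail.

```python
# Sketch of a confidence interval for the proportion of accelerated
# benchmarks (topic 5). The counts are hypothetical, for illustration.
import math

accelerated = 17       # benchmarks with a statistically significant speedup (made up)
total = 25             # benchmarks tested (made up)
z = 1.959963984540054  # normal quantile z_{1-alpha/2} for alpha = 0.05

p_hat = accelerated / total
half_width = z * math.sqrt(p_hat * (1 - p_hat) / total)

lower = max(0.0, p_hat - half_width)
upper = min(1.0, p_hat + half_width)
print(f"proportion = {p_hat:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

For these numbers the observed proportion is 0.68 with a 95% interval of roughly [0.50, 0.86]: even a clear majority of accelerated benchmarks can come with a wide uncertainty band when the benchmark suite is small.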



Tutorial document and tool available at: http://hal.archives-ouvertes.fr/inria-00443839/