However, an analysis of a standard spelling test used in Britain (Vincent and Crumpler) suggests that the increase in a spelling age from 11 to 12 corresponds to an effect size of about 0.…

Maths and English have standard deviations of between 1.… In the context of secondary schools, therefore, introducing a change in practice whose effect size was known to be 0.…
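Turning an effect size back into something concrete is simple arithmetic. An effect size is a mean difference divided by a standard deviation, so multiplying it by the outcome's SD gives the gain in the outcome's own units, such as grades. Under the further assumption that scores are normally distributed with equal SDs in both groups, the standard normal CDF of the effect size gives the percentage of the comparison group who score below the average treated pupil. The following is a minimal sketch of both conversions. The function names and the grade SD of 2.0 are hypothetical and chosen only for illustration.

```python
import math


def raw_gain(effect_size: float, sd: float) -> float:
    """Convert an effect size back into the outcome's own units (e.g. grades)."""
    # Effect size = mean difference / SD, so mean difference = effect size * SD.
    return effect_size * sd


def percentile_of_average_treated(effect_size: float) -> float:
    """Percentage of the comparison group scoring below the average member of
    the treated group, assuming normal scores with equal SDs in both groups."""
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 100 * 0.5 * (1 + math.erf(effect_size / math.sqrt(2)))


# Hypothetical grade SD, chosen for illustration only.
SD_GRADES = 2.0

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: mean gain {raw_gain(d, SD_GRADES):.1f} grades; "
          f"average treated pupil scores above "
          f"{percentile_of_average_treated(d):.0f}% of the comparison group")
```

Only the normality assumption is needed for the percentile reading. With skewed or bounded scores it is a rough guide at best.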

Even Cohen's 'small' effect of 0.… Olejnik and Algina give a similar example based on the Iowa Test of Basic Skills.

Finally, a few examples from existing research can greatly help in interpreting effect sizes. Table II lists a selection of these, many of which are taken from Lipsey and Wilson. The examples are given to illustrate the use of effect size measures; they are not intended to be the definitive judgement on the relative efficacy of different interventions.

In interpreting them, therefore, one should bear in mind three points. First, most of the meta-analyses from which they are derived can be, and often have been, criticised for a variety of weaknesses. Second, the range of circumstances in which the effects have been found may be limited. Third, the effect size quoted is an average, often based on quite widely differing values.
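A single quoted average can hide a wide spread of study results. The following minimal sketch assumes the common inverse-variance (fixed-effect) weighting, in which larger and more precise studies count for more. The meta-analyses behind Table II may weight their studies differently. The variance formula is the usual large-sample approximation for a standardised mean difference. The study values are entirely hypothetical.

```python
from typing import List, Tuple


def d_variance(d: float, n1: int, n2: int) -> float:
    """Large-sample sampling variance of a standardised mean difference."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def weighted_mean_effect(studies: List[Tuple[float, int, int]]) -> float:
    """Inverse-variance weighted (fixed-effect) mean of (d, n1, n2) tuples."""
    weights = [1 / d_variance(d, n1, n2) for d, n1, n2 in studies]
    return sum(w * d for w, (d, _, _) in zip(weights, studies)) / sum(weights)


# Hypothetical studies: (effect size, group 1 size, group 2 size).
studies = [(0.05, 400, 400), (0.60, 30, 30), (-0.10, 120, 120), (0.90, 20, 25)]
avg = weighted_mean_effect(studies)
ds = [d for d, _, _ in studies]
print(f"weighted mean d = {avg:.2f} "
      f"(individual studies range {min(ds):.2f} to {max(ds):.2f})")
```

Here one large study with a near-zero effect dominates the weighted mean. The smaller studies, with much larger effects, barely move it. This is one way that an "average" effect size can understate the variety behind it.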

It seems to be a feature of educational interventions that very few of them have effects that would be described in Cohen's classification as anything other than 'small'. This appears particularly so for effects on student achievement.

No doubt this is partly a result of the wide variation found in the population as a whole, against which the measure of effect size is calculated. One might also speculate that achievement is harder to influence than other outcomes, perhaps because most schools are already using optimal strategies, or because different strategies are likely to be effective in different situations - a complexity that is not well captured by a single average effect size.

What is the relationship between 'effect size' and 'significance'?

Effect size quantifies the size of the difference between two groups, and may therefore be said to be a true measure of the significance of the difference.
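As a concrete reference point, the sketch below computes a standardised mean difference: the difference between the group means divided by a pooled standard deviation. This is a minimal sketch under an assumption. Pooling the two groups' SDs is one common choice; using the control group's SD alone is another, and the two give different values when the groups' spreads differ. The function names and scores are hypothetical.

```python
import math
import statistics
from typing import Sequence


def pooled_sd(a: Sequence[float], b: Sequence[float]) -> float:
    """Pooled standard deviation of two independent groups."""
    na, nb = len(a), len(b)
    # statistics.variance uses the (n - 1) sample denominator.
    va, vb = statistics.variance(a), statistics.variance(b)
    return math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))


def effect_size(experimental: Sequence[float], control: Sequence[float]) -> float:
    """Standardised mean difference: (mean_E - mean_C) / pooled SD."""
    diff = statistics.mean(experimental) - statistics.mean(control)
    return diff / pooled_sd(experimental, control)


# Hypothetical test scores for two small groups.
experimental_scores = [14, 17, 12, 15, 18, 16, 13]
control_scores = [12, 15, 11, 14, 13, 16, 10]
print(f"effect size = {effect_size(experimental_scores, control_scores):.2f}")
```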

If, for example, the results of Dowson's 'time of day effects' experiment were found to apply generally, we might ask the question: how much difference would it make to pupils' learning if they were taught a particular topic at one time of day rather than another?

However, in statistics the word 'significance' is often used to mean 'statistical significance', which is the likelihood that the difference between the two groups could just be an accident of sampling.

If you take two samples from the same population, there will always be a difference between them. Statistical significance is usually expressed as a 'p-value': the probability that a difference of at least the same size would have arisen by chance, even if there really were no difference between the two populations.
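Both points in this paragraph can be demonstrated directly. The sketch below draws two samples from the same simulated population, whose difference in means is almost never exactly zero. It then estimates a p-value by a permutation test. The test shuffles the group labels many times and counts how often a difference at least as large as the observed one turns up by chance. Note that this is a swapped-in technique chosen because it mirrors the definition closely; it is not the t-test the text goes on to describe. The function names, population parameters and seed are assumptions made for illustration.

```python
import math
import random
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return math.fsum(xs) / len(xs)


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided p-value for a difference in means by random re-labelling:
    the share of shuffles whose |mean difference| is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    k = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(_mean(pooled[:k]) - _mean(pooled[k:])) >= observed:
            hits += 1
    # Counting the observed labelling itself keeps the estimate above zero.
    return (hits + 1) / (n_perm + 1)


# Two samples drawn from the *same* hypothetical normal population.
rng = random.Random(42)
x = [rng.gauss(100, 15) for _ in range(30)]
y = [rng.gauss(100, 15) for _ in range(30)]
print(f"difference in sample means: {_mean(x) - _mean(y):.2f}")
print(f"permutation p-value: {permutation_p_value(x, y):.3f}")
```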

For differences between the means of two groups, this p-value would normally be calculated from a 't-test'.

There are a number of problems with using 'significance tests' in this way (see, for example, Cohen; Harlow et al.). The main one is that the p-value depends essentially on two things: the size of the effect and the size of the sample. One would get a 'significant' result either if the effect were very big despite having only a small sample, or if the sample were very big even if the actual effect size were tiny.
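The dependence on both effect size and sample size can be made explicit. With two equal groups of n, the t statistic equals d·√(n/2). The sketch below converts that into a two-sided p-value using a large-sample normal approximation. This is an assumption that slightly understates p for very small groups, where the t distribution should strictly be used. The function name and the example numbers are hypothetical.

```python
import math


def approx_p_for_effect(d: float, n_per_group: int) -> float:
    """Two-sided p-value for an observed effect size d with n per group,
    using a normal approximation: z = d * sqrt(n / 2)."""
    z = d * math.sqrt(n_per_group / 2)
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# A tiny effect in a huge sample versus a large effect in a small one.
for d, n in [(0.05, 10_000), (0.8, 10)]:
    print(f"d = {d}, n = {n} per group: p = {approx_p_for_effect(d, n):.4f}")
```

The tiny effect comes out 'significant' while the large one does not, which is exactly the confusion described above.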

It is important to know the statistical significance of a result, since without it there is a danger of drawing firm conclusions from studies where the sample is too small to justify such confidence.

However, statistical significance does not tell you the most important thing: the size of the effect. One way to overcome this confusion is to report the effect size together with an estimate of its likely 'margin for error' or 'confidence interval'.

What is the margin for error in estimating effect sizes?

Clearly, if an effect size is calculated from a very large sample, it is likely to be more accurate than one calculated from a small sample.

This 'margin for error' can be quantified using the idea of a 'confidence interval', which provides the same information as is usually contained in a significance test: using a 95% confidence interval is equivalent to testing at the 5% significance level. If this confidence interval includes zero, then that is the same as saying that the result is not statistically significant at that level.
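Here is a minimal sketch of such an interval for an effect size. It assumes the standard large-sample approximation to the standard error of a standardised mean difference, and a normal critical value of 1.96 for 95% coverage. Exact intervals based on the noncentral t distribution differ slightly for small samples. The function name and example values are hypothetical.

```python
import math
from typing import Tuple


def effect_size_ci(d: float, n1: int, n2: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval (95% by default) for effect size d."""
    # Large-sample standard error of a standardised mean difference.
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


# The same observed effect size, estimated from increasingly large samples.
for n in (20, 200, 2000):
    lo, hi = effect_size_ci(0.3, n, n)
    verdict = "includes 0" if lo <= 0 <= hi else "excludes 0"
    print(f"n = {n} per group: 95% CI [{lo:.2f}, {hi:.2f}] ({verdict})")
```

The interval narrows as the sample grows. It keeps the effect size in view while still showing whether zero is a plausible value.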

Using a confidence interval is a better way of conveying this information since it keeps the emphasis on the effect size - which is the important information - rather than the p-value.

Effect size is a simple way of quantifying the difference between two groups that has many advantages over the use of tests of statistical significance alone.

Effect size emphasises the size of the difference rather than confounding this with sample size. However, primary reports rarely mention effect sizes.

