Statistical Significance

Contents of Article

  1. Summary
  2. Why is research important in sports?
  3. What is statistical significance?
  4. Why is statistical significance important?
  5. How is statistical significance calculated?
  6. Is future research needed on statistical significance?
  7. Conclusion
  8. References
  9. About the Author

Summary

As sport continues to grow and the line between winning and losing becomes progressively thinner, the need for evidence-based research is increasing. Evidence-based research adds to the academic credibility of sport development by challenging knowledge and improving our understanding. However, one recurring issue is that some practitioners struggle to interpret research findings effectively.

Statistical significance is a scientific method that helps to determine whether reported research findings are actually true. Statistical evidence, therefore, contributes to our level of confidence in research findings, rather than relying on human judgement or bias.

However, simply reporting a significant value does not provide sufficient evidence to make a scientific claim, as it does not tell us the magnitude of the reported difference. Therefore, statistical significance should be used in conjunction with effect size to provide a greater understanding of research findings.

Why is research important in sports?

With increased media exposure and ever-growing fan bases, businesses in the world of sport (e.g. sports teams) are striving to deliver world-class results and performances (1). The resulting pressure on professional athletes to perform well is high, creating a need for optimal development from all supporting practitioners and a competitive edge over the opposition (2,3).

However, optimal performance can only be achieved when adequate knowledge is provided by supporting disciplines (e.g. sports science). Consequently, the demand for evidence-based research is increasing, with the ultimate aim of evaluating the efficacy of sports programmes (4).

This expanding evidence base is adding to the academic credibility of sports development by challenging knowledge and improving our understanding of the issues that determine the value and impact of interventions for developing sport (5). By focusing on evidence-based research, practitioners can appraise, with reasonable confidence, the success of a programme in relation to its objectives (6). Ultimately, this can strengthen the basis for future developments and aid decision-making regarding the allocation of resources (e.g. time, budget and equipment) (7).

What is statistical significance?

Evidence-based practice is supposed to enhance practical decision-making, yet many practitioners find research difficult to interpret (8). This matters because clinical research is only of value if it is properly interpreted (8).

Underpinning many scientific conclusions is the concept of ‘statistical significance’, which is essentially a measure of whether reported research findings are actually true. In other words, statistical significance addresses the probability that an observed difference between two groups is due to chance rather than some factor of interest (9,10). When a finding is significant, it means you can be reasonably confident the effect is real, and not that you simply got lucky in choosing the sample.

The most common method of statistical testing is to use a specified statistical model to test the null hypothesis (Null Hypothesis Significance Testing; NHST) against a predetermined level of significance (11). This method is built from four components:

Null Hypothesis

The null hypothesis postulates the absence of an effect (e.g. no relationship between variables, no difference between groups, or no effect of treatment) (9,12). For example: there is no association between caffeine consumption and reaction time. This is the formal basis for testing statistical significance. By starting with the proposition that there is no association, statistical tests can estimate the probability that an observed difference is due to chance alone (13).

Alternative Hypothesis

The alternative hypothesis is the proposition that there is an association between the predictor and outcome variable (13). For example, there is an association between caffeine consumption and reaction times.

Statistical Model

The statistical model is the statistical test chosen to analyse the data; it is constructed under a set of assumptions that must be met for valid conclusions about the null hypothesis to be drawn (9). Examples of statistical tests include the independent t-test, ANOVA, and Pearson’s correlation.

Level of Significance

A predetermined level of significance allows the null hypothesis to be either rejected or accepted (11). The level most widely used in academic research is 0.05, commonly reported as ‘p < 0.05’ or ‘α = 0.05’. The null hypothesis is rejected in favour of the alternative hypothesis if the calculated p-value is less than this predetermined level. For instance, if you analysed a set of data looking at reaction times following caffeine consumption and the resulting significance value was p = 0.03, you could reject the null hypothesis and accept the alternative hypothesis, provided all assumptions of the statistical model were met. This is because the smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis (14). In other words, the smaller the p-value, the more unusual the data would be if every single assumption were correct (14).
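
To make the decision rule concrete, here is a minimal sketch in Python that runs an independent t-test on hypothetical reaction-time data for a caffeine and a placebo group; the numbers, group sizes, and the 0.05 threshold are all illustrative choices, not values from the article.

```python
import numpy as np
from scipy import stats

# Hypothetical reaction times (seconds) for two groups of eight athletes.
caffeine = np.array([0.41, 0.38, 0.45, 0.39, 0.42, 0.37, 0.40, 0.36])
placebo = np.array([0.45, 0.47, 0.43, 0.48, 0.44, 0.46, 0.49, 0.45])

alpha = 0.05  # predetermined level of significance
t_stat, p_value = stats.ttest_ind(caffeine, placebo)  # independent t-test

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis (assuming all model assumptions hold).")
else:
    print("Fail to reject the null hypothesis.")
```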

There is a common misconception that lower p-values indicate a stronger treatment effect than higher p-values (15). For example, an outcome of 0.01 is often interpreted as reflecting a stronger treatment effect than an outcome of 0.05. Whilst this holds if we can be certain that every assumption was met, a smaller p-value does not tell us which assumption, if any, is incorrect. The p-value may be very small because the targeted hypothesis is indeed false; however, it may instead be very small because the study protocols were violated (14). As a result, the p-value tells us nothing specifically related to the hypothesis unless we are absolutely positive that every other assumption used for its computation is correct (14). In other words, a lower p-value is not synonymous with importance.

Therefore, we must exercise caution when accepting or rejecting the null hypothesis, and rejection should not be taken as proof that the alternative is indeed valid (12).

Although the use of the p-value as a statistical measure is widespread, sole reliance on, and misinterpretation of, statistical significance has led to widespread misuse of the statistic, and some scientific journals now discourage the use of p-values (11). For instance, NHST and p-values should not lead us to think that conclusions are a simple, dichotomous decision (i.e. reject vs not reject) (12). A conclusion does not simply become “true” on one side of the divide and “false” on the other (9). In fact, many contextual factors (e.g. study design, data collection, the validity of assumptions, and research judgement) all contribute to scientific inference, rather than statistical significance alone (9,12).

Despite these criticisms, the recommendation is not that clinical researchers discard significance testing, but rather that they incorporate additional information to supplement their findings (11). With that being said, it is important that statistical significance is correctly interpreted to avoid further misuse.

Figure 1 – Example of varied levels of statistical significance (from Midgeley et al., 2016)

Although 0.05 is used as the cut-off value for significance in the majority of analyses (Figure 1), this figure is not set in stone; researchers may reduce the significance level to 0.001 or raise it to 0.10 (Figure 2). For instance, if an experiment is difficult to replicate, has access to a large sample size, requires a different standard of rigour, or carries very serious adverse consequences should the wrong decision be made about the hypothesis, researchers may reduce the significance level to 0.01 in order to be more stringent and reduce the role of chance in the findings (16).

Figure 2 – Significance level set at p < 0.001 (from Keller et al., 2015)

Within research, authors may report varied levels of statistical significance (Figure 3), depending on the number of variables they have chosen to analyse. Figure 3 also demonstrates the use of exact p-values (i.e. ‘equals’ [=]) rather than simply reporting p < 0.05. Reporting an exact value gives the reader much stronger evidence of the fit of the model to the data. In other words, the lower the p-value, the less compatible the data are with the null hypothesis: despite both being significant, p = 0.04 is weaker evidence than p = 0.004, so we would be more confident that the results are ‘true’ at p = 0.004, provided we are confident that all assumptions were met. This is because data yielding p = 0.04 are closer to what the model and all its other assumptions predict, allowing for chance variation (14).

Why is statistical significance important?

Scientific knowledge changes rapidly, but the concepts and methods of conducting research change much more slowly (11). Therefore, significance testing currently remains the most widely used, convenient and reproducible method for evaluating statistical hypotheses (17), though it is not without its limitations.

However, criticising ingrained, mainstream practices for their limitations is easy; the difficulty is providing suitable alternatives that can withstand the same criticisms (11). It is therefore likely that significance testing will continue to be used for the foreseeable future.

A refined goal of statistical analysis is to provide an evaluation of certainty or uncertainty regarding the size of an effect (14). Statistical significance is therefore used to answer questions on probability – using a scientific method – in order to determine if a hypothesis can be accepted or rejected (8). Scientifically and objectively demonstrating that variables are related, rather than being based on assumptions, gives readers confidence that what they are reading is true (18).

As such, when a report states that the relationship between two variables was found to be statistically significant, there is a natural tendency to place more trust in the research findings (18). For instance, if a study reports that dynamic stretching prior to practice or competition significantly improves power output, we are more likely to implement this method, as there is solid evidence to support the claim. Therefore, statistical evidence contributes to our level of confidence in research findings, rather than relying on human judgement or bias (19).

How is statistical significance calculated?

Calculating statistical significance by hand can be a complicated task for anyone with little or no grounding in statistics. Fortunately, a number of statistical software packages, such as SPSS, JASP, and Stats Engine, can determine the statistical significance of an experiment without requiring the user to work through any mathematical equations. Even so, using significance testing to analyse the results of an experiment is a three-step process (20), sketched in code after the list below:

  1. Formulating a null hypothesis (i.e. formulating a statement to be answered).
  2. Determining the probability (i.e. setting the significance level to 0.05).
  3. Accepting or rejecting the null hypothesis (i.e. using the calculated p-value to decide whether to reject or retain the formulated null hypothesis).
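
As a hedged sketch of the same three-step process with a different statistical model, the snippet below uses Pearson’s correlation on invented caffeine-dose and reaction-time data; the values are purely illustrative.

```python
from scipy import stats

# Hypothetical data: caffeine dose (mg) and reaction time (s) for eight athletes.
dose = [0, 50, 100, 150, 200, 250, 300, 350]
reaction_time = [0.48, 0.47, 0.45, 0.44, 0.43, 0.41, 0.42, 0.40]

# Step 1: formulate the null hypothesis - no association between dose and reaction time.
# Step 2: determine the probability - set the significance level to 0.05.
alpha = 0.05

# Step 3: accept or reject the null hypothesis based on the calculated p-value.
r, p_value = stats.pearsonr(dose, reaction_time)
print(f"r = {r:.2f}, p = {p_value:.4f}")
print("Reject the null hypothesis" if p_value < alpha else "Retain the null hypothesis")
```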

Statistical significance plays an important role in helping to make sense of statistical data and gives scientific support to claims said to be true. It gives us a level of confidence that an observed change is actually real (18). However, as an isolated value, statistical significance is not sufficient to support a scientific claim (19). This is because statistical significance fails to tell us the magnitude (or importance) of the reported difference (10). A statistic that does tell us the magnitude of the difference is termed the ‘effect size’.

Significance and importance are, in fact, only weakly correlated (19); thus, in reporting and interpreting studies, both the substantive significance (effect size) and the statistical significance are essential results to report (14). For example, in elite sport, a moderate effect size may not be statistically significant, but it may represent something quite meaningful (e.g. a 10% drop in a 100m running time could actually translate into reaching close to one’s personal best more consistently) (21).
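
Since the article recommends pairing the p-value with an effect size, here is a minimal sketch computing Cohen’s d with a pooled standard deviation, reusing the hypothetical reaction-time data from the earlier example; the data and the pairing are illustrative only.

```python
import numpy as np

def cohens_d(group_a: np.ndarray, group_b: np.ndarray) -> float:
    """Cohen's d using a pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_sd = np.sqrt(((na - 1) * group_a.std(ddof=1) ** 2 +
                         (nb - 1) * group_b.std(ddof=1) ** 2) / (na + nb - 2))
    return (group_a.mean() - group_b.mean()) / pooled_sd

caffeine = np.array([0.41, 0.38, 0.45, 0.39, 0.42, 0.37, 0.40, 0.36])
placebo = np.array([0.45, 0.47, 0.43, 0.48, 0.44, 0.46, 0.49, 0.45])

# Report the magnitude of the difference alongside any p-value.
print(f"Cohen's d = {cohens_d(caffeine, placebo):.2f}")
```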

Figure 3 – Graphical depiction of probability and statistical significance (Analytics-Toolkit.com)

 

Is future research needed on statistical significance?

Whilst significance values provide an important statistic to help interpret research findings, they should not be the sole focus of a statistical report. This is because the arbitrary classification of results into ‘significant’ or ‘non-significant’ is often damaging to the valid interpretation of data (12). Although beyond the scope of this article, methods such as effect size, power analysis and confidence intervals can be used in conjunction with statistical testing to supplement findings. For instance, in addition to statistical significance, academics should report the effect size in order to identify the magnitude of the reported difference (10). In fact, academics should use an estimated effect size prior to their investigation to calculate the required sample size, ensuring the study is sufficiently powered and reducing the chance of error (i.e. a false negative) (22).

Conclusion

The significance value is an important statistic underpinning many scientific conclusions. Statistical significance uses a scientific method to help determine whether a hypothesis can be accepted or rejected. As such, it gives readers confidence that the reported difference is actually true as it is based on a scientific concept, rather than human judgement.

However, despite its ability to flag differences in statistical data, it fails to tell us the magnitude of the difference. Therefore, effect size and statistical significance are both essential results to be reported and/or interpreted in research.

References

  1. Dohmen, T. J. (2008). Do professionals choke under pressure? Journal of Economic Behavior & Organization, 65(3-4), 636–653. https://econpapers.repec.org/article/eeejeborg/v_3a65_3ay_3a2008_3ai_3a3-4_3ap_3a636-653.htm
  2. Freitas, S., Dias, C., & Fonseca, A. (2013). Psychological skills training applied to soccer: A systematic review based on research methodologies. Review of European Studies, 5(5), 18-29. http://dx.doi.org/10.5539/res.v5n5p18
  3. Reade, I., Rodgers, W., & Hall, N. (2008). Knowledge transfer: How do high performance coaches access the knowledge of sport scientists? International Journal of Sports Science and Coaching, 3(3), 319–334. http://dx.doi.org/10.1260/174795408786238470
  4. Hills, L. and Maitland, A. (2014). Research-based knowledge utilization in a community sport evaluation: a case study. International Journal of Public Sector Management, 27(2), 165-172. http://dx.doi.org/10.1108/ijpsm-04-2013-0051
  5. Grix, J. and Carmichael, F. (2012). Why do governments invest in elite sport? A polemic. International Journal of Sport Policy and Politics, 4(1), 73-90. http://dx.doi.org/10.1080/19406940.2011.627358
  6. Rossi, P. H., Lipsey, M. W. and Freeman, H. E. (2004). Evaluation: a systematic approach. 7th ed., London: Sage Publications. https://www.amazon.com/Evaluation-Systematic-Approach-Peter-Rossi/dp/0761908943
  7. Daniels, J. (2015). Evidence based practice in sport development: A Realistic Evaluation of a sport and physical activity strategy. Doctoral thesis (PhD), 1-294. https://e-space.mmu.ac.uk/608775/
  8. Page, P. (2014). Beyond Statistical Significance: Clinical Interpretation of Rehabilitation Research Literature. International Journal of Sports Physical Therapy, 9(5), 726-736. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4197528/
  9. Wasserstein, R. and Lazar, N. (2016). The ASA’s Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70(2), 129-133. http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108#.WeC6jGhSyUk
  10. Sullivan, G. and Feinn, R. (2012). Using Effect Size – or Why the P Value Is Not Enough. Journal of Graduate Medical Education, 4(3), 279-282. http://dx.doi.org/10.4300/jgme-d-12-00156.1
  11. Glaser, D. (1999). The controversy of significance testing: misconceptions and alternatives. American Journal of Critical Care, 8(5). https://www.ncbi.nlm.nih.gov/pubmed/10467465
  12. Verdam, MG., Oort, FJ. and Sprangers MA. (2014). Significance, truth and proof of p values: reminders about common misconceptions regarding null hypothesis significance testing. Qual Life Res, 23(1), 5-7. https://www.researchgate.net/publication/236933923_Significance_truth_and_proof_of_p_values_Reminders_about_common_misconceptions_regarding_null_hypothesis_significance_testing
  13. Banerjee, A, et al. (2009) Hypothesis testing, type I and type II errors. Industrial Psychiatry Journal, 18(2), 127-131. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2996198/
  14. Greenland, S. et al. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31, 337-350. https://link.springer.com/article/10.1007/s10654-016-0149-3
  15. Gliner, J., Leech, N. and Morgan, G. (2002). Problems with Null Hypothesis Significance Testing (NHST): What Do the Textbooks Say? The Journal of Experimental Education, 71(1), 83-92. https://www.tandfonline.com/doi/abs/10.1080/00220970209602058
  16. Cramer, D., & Howitt, D. (2006). The Sage dictionary of statistics. London: SAGE. https://www.emeraldinsight.com/doi/abs/10.1108/09504120510580208
  17. Sham, PC. and Purcell SM. (2014). Statistical power and significance in large-scale genetic studies. Nature Reviews Genetics, 15(5), 335-346. https://www.nature.com/articles/nrg3706
  18. Weinbach, R. (1984). When Is Statistical Significance Meaningful? A Practice Perspective. Journal of Sociology & Social Welfare, 16(4), 31-37. http://scholarworks.wmich.edu/cgi/viewcontent.cgi?article=1880&context=jssw
  19. Ziliak, S. (2016). Statistical significance and scientific misconduct: improving the style of the published research paper. Review of Social Economy, 74, 83-97. http://www.tandfonline.com/doi/abs/10.1080/00346764.2016.1150730?journalCode=rrse20
  20. Greco, D. (2011). Significance Testing in Theory and Practice. The British Journal for the Philosophy of Science, 62(3), 607-637. https://academic.oup.com/bjps/article/62/3/607/1509462/Significance-Testing-in-Theory-and-Practice
  21. Andersen, M., McCullagh, P., & Wilson, G. (2007). But What Do the Numbers Really Tell Us? Arbitrary Metrics and Effect Size Reporting in Sport Psychology Research. Journal of Sport and Exercise Psychology, 29(5), 664-672. http://dx.doi.org/10.1123/jsep.29.5.664
  22. Sainani, K. (2009). Putting P values into perspective. PM & R: The Journal of Injury, Function, and Rehabilitation, 1(9), 873-877. http://dx.doi.org/10.1016/j.pmrj.2009.07.003

Smallest Worthwhile Change

Contents of Article

  1. Summary
  2. Why is fitness testing important in sports?
  3. What is the smallest worthwhile change?
  4. Why is the smallest worthwhile change important?
  5. How is the smallest worthwhile change calculated?
  6. Is further research into the smallest worthwhile change needed?
  7. Conclusion
  8. References
  9. About the Author

Summary

Laboratory and field-based performance/fitness tests are a fundamental element of athlete profiling. Conducting fitness testing not only allows the coach to measure performance improvements or decrements, but also to design the training programme accordingly. Therefore, identifying a meaningful change in performance is a critical factor in the athlete profiling process.

A meaningful change in performance can be determined by calculating the smallest worthwhile change. However, we must also be mindful of the variation that occurs between testing trials, as this is often higher than the calculated smallest worthwhile change. As a result, a combination of the smallest worthwhile change, the coefficient of variation, and two times the coefficient of variation can be used to set achievable, but also meaningful, targets for individual athletes.

Why is fitness testing important in sports?

A basic precondition of the long-term athlete development structure is the regular assessment of physical performance, which forms a major component of the sport science service (1). Fitness testing is typically conducted several times per year to evaluate changes in physical qualities (e.g. strength or speed) relevant to performance. For example, a 5m sprint test may be used to measure an athlete’s ability to accelerate. Laboratory and field-based tests are therefore fundamental elements of athlete profiling, enabling practitioners to track performance change and programme efficacy (2).

With the aim of determining performance improvements or decrements in these individual physical performance markers, it is important to identify what change in performance is actually considered meaningful (3).

What is the smallest worthwhile change?

The ability to reliably detect meaningful changes in fitness tests over time is a critical factor when assessing athletes (4). However, due to a plethora of influential factors (such as arousal, caffeine consumption, sleep quality and quantity, fatigue levels, ambient temperature, ground surface, and many others), fitness test results will almost always vary from one testing session to the next.

Due to this variation, simply identifying that an athlete performed better in a fitness test over a period of time does not provide sufficient evidence that the change is actually meaningful. An observed change in performance could be caused by variation in the fitness test itself (via any of the factors mentioned above); therefore, a calculation that determines a genuinely meaningful change is necessary.

A meaningful change in performance, however, can be determined by calculating the smallest worthwhile change (SWC) (5).

Why is the smallest worthwhile change important?

Anyone involved in sport, whether sports science or technical coaching staff, or even the athletes themselves, should appreciate the importance of the SWC with regard to fitness testing (6). Knowledge of this change is important when conducting testing and when providing appropriate feedback to coaches and athletes. When relaying information, the coach must be confident that the documented change is of practical significance (5). Calculating the SWC, therefore, allows the coach to be confident in determining whether a real change in performance has occurred over a given period of time, rather than just typical variation in the test.

A winning performance is the primary goal in many sporting environments, and victory is often decided by marginal differences, which highlights the importance of the SWC calculation. The SWC allows for the identification of the smallest change that will increase an athlete’s chance of success. In an individual sport, this increase in the chance of success is based on the minimal change necessary to improve competition placement. In team sports, where there is no clear relationship between fitness test performance and team performance, the SWC is instead based on a standardised change in team performance (7).

How is the smallest worthwhile change calculated?

Estimation of the SWC comes from an analysis of the reliability of physical performance; thus, assuming the data are reliable and free from systematic bias, we can calculate the SWC (8). For instance, implementing fitness tests that have good test-retest reliability is favourable when calculating the SWC, as they produce the lowest variation between tests. The same applies to testing equipment that provides consistent, reliable data. For example, when measuring speed, electronic timing gates have a higher degree of accuracy and consistency than a handheld stopwatch (9).

Reducing the variation between tests increases the likelihood of identifying a real change in performance. In team sports, it has been suggested that the SWC for elite athletes can be calculated as 0.2 multiplied by the between-subject standard deviation of the particular test (7). The calculation is based on Cohen’s effect size principle, with 0.2 representing a small, but not trivial, effect size (10).

Figure 1 provides an example using 5-10-5 shuttle test scores. If an athlete runs a 5-10-5 shuttle test in 5.07 seconds, and the between-subject standard deviation for that test in that population is 0.15 seconds, the athlete would have to run 0.03 seconds faster to demonstrate a meaningful difference (SWC = 0.2 × 0.15 = 0.03).

Figure 1 – Example testing data

 

This would mean that a time of 5.04 seconds, or less, would be considered a real change. A change of 0.03 seconds is rather trivial, and it is often the case that performance tests have a lot more noise (variation/error) than the observed smallest worthwhile change (7). Therefore, when assessing an individual’s measured performance change, the observed change in performance (signal) needs to be higher than the random variation (noise) associated with the test (5).
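
The calculation above is simple enough to reproduce in a few lines of Python; this minimal sketch uses the numbers from the Figure 1 example (a 0.15 s between-subject standard deviation and a 5.07 s baseline).

```python
# Between-subject SD (0.15 s) and baseline time (5.07 s) are taken from the
# Figure 1 example in the text.
sd_between = 0.15
baseline = 5.07

swc = 0.2 * sd_between   # 0.2 x between-subject SD = 0.03 s
target = baseline - swc  # times below 5.04 s represent a meaningful change

print(f"SWC = {swc:.2f} s; meaningful target: faster than {target:.2f} s")
```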

It is also important to understand that the variation (noise) of a test score can be influenced by an abundance of factors, such as athlete preparation, nutrition, environmental conditions, testing equipment, and the testing environment. Therefore, it is extremely important that the coach controls/standardises as many of these variables as possible if they wish to obtain worthwhile data.

Coefficient of Variation

The coefficient of variation (CV) is a statistic used to measure the degree of variation between testing trials in an individual athlete’s repeated measurements (11). In other words, the CV is a measure of the spread of data relative to the mean. In the example above (Figure 1), we can see that the CV in seconds is larger than the calculated SWC. Therefore, a firm conclusion about this trivial SWC cannot be made, and in this instance it would be more beneficial to use the CV to identify a real change. This is because, despite the SWC calculation, we must be mindful of the variation that occurs between testing trials.
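
As a minimal sketch, the CV can be computed from an athlete’s repeated trials of the same test; the trial times below are invented for illustration.

```python
import numpy as np

# Invented repeat trials of the same 5-10-5 test for one athlete.
trials = np.array([5.07, 5.12, 5.03, 5.09])

sd = trials.std(ddof=1)                # spread of the athlete's repeated scores (s)
cv_percent = sd / trials.mean() * 100  # CV expressed relative to the mean

print(f"CV = {cv_percent:.1f}% ({sd:.3f} s)")
```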

When the typical error is greater than the SWC, observed changes are often due to measurement error rather than real change. The measure is therefore considered too noisy to be useful, and as such we can only be confident of a real change if the data fall outside of the calculated variation.

A further consideration is that if there is variation in the first set of testing, there will also be variation in the second set, and so the CV needs to be doubled (2CV) to account for the chance of error in both testing periods. For instance, if an athlete’s first performance is below their ‘true’ score and their second performance is above their ‘true’ score, one needs to account for this extension of the standard deviation. Thus, 2CV is necessary to ensure the threshold is large enough to account for a real change (12).

Using Figure 2 as an example, targets can be set, accounting for SWC, CV, and 2CV, allowing for the identification of a trivial change, a possibly meaningful change, or a certainly meaningful change, respectively. For instance, data that falls outside of the 2CV range provides a target that would be a certainly meaningful change in performance. However, this target may be unrealistic to reach as it requires the greatest change in performance.

In contrast, the SWC provides an achievable target for athletes as it requires the smallest change in performance, but the change in performance is likely to be trivial. Therefore, there needs to be an appreciation of all three statistics when identifying performance change in athletes. Coaches can use these data to set achievable, but also meaningful targets for their athletes, in which there is consideration for not only the magnitude of change but also the degree of certainty to which this change is meaningful.
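
Putting the three thresholds together, this hedged sketch derives the target bands described above; the SWC comes from the earlier calculation, while the 0.06 s CV is an assumed value used purely for illustration, not taken from the article’s figures.

```python
baseline = 5.07  # current 5-10-5 time (s), from the Figure 1 example
swc = 0.03       # smallest worthwhile change (s), calculated earlier
cv = 0.06        # assumed typical variation of the test (s) - illustrative only

# Target bands: trivial, possibly meaningful, certainly meaningful.
targets = {
    "trivial change (beats SWC)": baseline - swc,
    "possibly meaningful (beats CV)": baseline - cv,
    "certainly meaningful (beats 2CV)": baseline - 2 * cv,
}
for label, target in targets.items():
    print(f"{label}: faster than {target:.2f} s")
```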

Figure 2 – Example testing targets

Is further research into the smallest worthwhile change needed?

A lower CV for a test implies less random noise, and therefore a greater ability or likelihood of detecting a real change in performance (12). In other words, reducing the variation between tests increases the likelihood of identifying a real change in performance, due to the ability to attribute a lower percentage of the change to chance. As a result, future research needs to focus on identifying tests that produce the lowest variation between testing trials (4).

Tests that show good test-retest reliability (i.e. high repeatability and reproducibility) are paramount within testing procedures as they produce the lowest variation between tests. Future research should also produce strict testing protocols with the aim of reducing the random error that occurs between tests (4). For example, if conducting a 20m sprint test, the athlete must follow a standardised warm-up, perform the test in the same starting position, have the same period of rest between trials, be tested in the same location, use the same testing equipment, etc.

Conclusion

The SWC is a favourable procedure for coaches to implement as part of the fitness testing process, as it allows a meaningful change to be determined rather than assumed. However, we must be mindful of the typical error associated with performance tests; as such, the CV is also a necessary statistic to take into consideration when identifying performance change between two testing sessions.

Athlete targets can be set based on a combination of SWC, CV and 2CV to provide the greatest certainty in performance change.

References
  1. Reilly, T., Morris, T., & Whyte, G. (2009). The specificity of training prescription and physiological assessment: A review. Journal Of Sports Sciences, 27(6), 575-589. http://dx.doi.org/10.1080/02640410902729741
  2. Fernandez-Fernandez, J., Ulbricht, A., & Ferrauti, A. (2014). Fitness testing of tennis players: How valuable is it? British Journal Of Sports Medicine, 48, i22-i31. http://dx.doi.org/10.1136/bjsports-2013-093152
  3. Moir, G., Button, C., Glaister, M., & Stone, M. (2004). Influence of Familiarization on the Reliability of Vertical Jump and Acceleration Sprinting Performance in Physically Active Men. The Journal Of Strength And Conditioning Research, 18(2), 276. http://dx.doi.org/10.1519/r-13093.1
  4. Duthie, G., Pyne, D., Ross, A., Livingstone, S., & Hooper, S. (2006). The Reliability of Ten-Meter Sprint Time Using Different Starting Techniques. The Journal Of Strength And Conditioning Research, 20(2), 246. http://dx.doi.org/10.1519/r-17084.1
  5. Spencer, M., Fitzsimons, M., Dawson, B., Bishop, D., & Goodman, C. (2006). Reliability of a repeated-sprint test for field-hockey. Journal Of Science And Medicine In Sport, 9, 181-184. http://dx.doi.org/10.1016/j.jsams.2005.05.001
  6. Paton, C., & Hopkins, W. (2005). Competitive Performance of Elite Olympic-Distance Triathletes: Reliability and Smallest Worthwhile Enhancement. Sportscience, 9, 1-5.
  7. Hopkins, W. G. (2004). How to interpret changes in an athletic performance test. Sportscience, 8, 1-7. http://www.sportsci.org/jour/04/wghtests.htm
  8. Turner, A., Brazier, J., Bishop, C., Chavda, S., Cree, J., & Read, P. (2015). Data Analysis for Strength and Conditioning Coaches. Strength And Conditioning Journal, 37, 76-83. http://dx.doi.org/10.1519/ssc.0000000000000113
  9. Hetzler, R. K., Stickley, C. D., Lundquist, K. M., & Kimura, I. F. (2008). Reliability and accuracy of handheld stopwatches compared with electronic timing in measuring sprint performance. Journal of Strength and Conditioning Research, 22(6), 1969-1976. https://www.ncbi.nlm.nih.gov/pubmed/18978613
  10. Sullivan, G., & Feinn, R. (2012). Using Effect Size—or Why the P Value Is Not Enough. Journal Of Graduate Medical Education, 4(3), 279-282. http://dx.doi.org/10.4300/jgme-d-12-00156.1
  11. Panichkitkosolkul, W. (2013). Confidence Intervals for the Coefficient of Variation in a Normal Distribution with a Known Population Mean. Journal Of Probability And Statistics, 2013, 1-11. http://dx.doi.org/10.1155/2013/324940
  12. Hopkins, W. (2000). Measures of Reliability in Sports Medicine and Science. Sports Medicine, 30, 1-15. http://dx.doi.org/10.2165/00007256-200030010-00001
