This post provides links to a range of resources related to the use and interpretation of correlations. I wanted to provide a page with links to a number of additional resources that would be useful both for those of my students who might be keen to learn more and for anyone else who might be interested.
Specifically, this post provides links to: (a) introductory book-style chapters on correlation, (b) resources related to assorted issues in correlation (i.e., discussion of causal inference, correlation with various variable types, range restriction, statistical power, correlation interpretation, and significance testing), (c) tutorials on computing correlations using SPSS and R, and (d) tips for reporting correlations in APA Style.
Introductions to correlation
The following provide general textbook style overviews of correlation:
- David Kenny's Chapter 16 Testing Measures of Association provides a textbook overview of correlation designed for psychology undergraduate students. It also includes several practice questions. David Kenny has kindly made his entire textbook 'Statistics for the Social and Behavioral Sciences' available online for free as either an overall pdf or individual chapters.
- David Stockburger's Introductory Statistics chapter on Correlation
- My own slides and notes on correlation
Assorted Issues
Correlation and Causation
Knowing how to reason about causality in the behavioural and social sciences is a really important skill.
- Check out this earlier post on correlation and causation, which includes links to PDFs of important journal articles on the topic.
- Joy of Stats on Correlation provides a 4-minute video with a few entertaining examples of correlations and their connection with causal inference.
Types of variables
The prototypical correlation example is based on two continuous, normally distributed variables. However, in practice there are many other types of variables that you might wish to correlate. The following pages provide suggestions for how to analyse some other common scenarios:
- What to do when one of the variables is non-normal?
- What to do when one of the variables is a Likert item?
- What to do if you want to treat a variable as ordinal?
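One common option when a variable is treated as ordinal (or is clearly non-normal) is a rank-based correlation such as Spearman's rho, which is simply Pearson's correlation computed on ranks. The following is a minimal sketch in Python (chosen only for illustration; the linked pages discuss the options in more depth). The data and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: a 5-point Likert item and a positively skewed score.
likert = [1, 2, 2, 3, 4, 4, 5, 5]
score = [2.0, 3.5, 3.1, 4.0, 9.0, 7.5, 30.0, 12.0]
print("Pearson r:", round(pearson_r(likert, score), 3))
print("Spearman rho:", round(spearman_rho(likert, score), 3))
```

Because rho depends only on rank order, it is unaffected by any monotonic transformation of either variable, which is one reason it is often suggested for skewed or ordinal data.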
Range restriction
- HyperStat has a general discussion of range restriction
- See this simulation on Connexions showing the effect of range restriction
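For readers who would like to see the effect without visiting the simulation, here is a small sketch of the same idea. It is not the linked simulation; it assumes a bivariate normal population with a correlation of .50 and restricts the sample to cases above the mean on X. The function names, the seed, and the cutoff are all illustrative choices.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_range_restriction(n=5000, rho=0.5, cutoff=0.0, seed=1):
    """Return (full-sample r, r among cases with x above the cutoff)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # Build y so that the population correlation between x and y is rho.
        y = rho * x + sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    full = pearson_r(xs, ys)
    kept = [(x, y) for x, y in zip(xs, ys) if x > cutoff]
    restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])
    return full, restricted


full_r, restricted_r = simulate_range_restriction()
print("Full sample r:", round(full_r, 2))
print("Restricted sample r:", round(restricted_r, 2))
```

The restricted correlation comes out noticeably smaller than the full-sample value, which is the usual consequence of selecting on one of the variables (e.g., only studying people who passed a selection test).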
Statistical Power
Statistical power within the context of correlation is the probability of obtaining a statistically significant correlation in a study given that a true (i.e., non-zero) correlation exists in the population.
- This earlier post provides (a) some simple rules of thumb for power analysis for correlations, (b) instructions for calculating statistical power using free software called G*Power, and (c) links to additional reading on the important topic of statistical power.
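To make the definition concrete, power for a correlation can be approximated using the Fisher z transformation, under which the test statistic is roughly normal with standard error 1 / sqrt(n - 3). The sketch below uses that normal approximation, so its answers may differ slightly from the exact routines in dedicated software. The function names are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate two-tailed power to detect a population correlation r with n cases."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    shift = atanh(r) * sqrt(n - 3)  # expected value of the Fisher z test statistic
    # Probability of landing in either rejection region.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


def sample_size_for_power(r, target=0.80, alpha=0.05, max_n=100000):
    """Smallest n (up to max_n) whose approximate power reaches the target, else None."""
    for n in range(4, max_n + 1):
        if correlation_power(r, n, alpha) >= target:
            return n
    return None


for r in (0.1, 0.3, 0.5):
    print(r, sample_size_for_power(r))
```

The required sample size rises steeply as the population correlation shrinks, which is why studies of small effects need far more participants than intuition tends to suggest.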
Interpretation
When I first learnt about the correlation coefficient, I found it challenging to truly grok what a particular value meant. Learning the standard interpretation was easy. The challenging part was understanding the practical and theoretical implications for a correlation of a given size.
The following are some of the standard interpretations of a correlation:
- Pearson's correlation is an index of the direction and strength of linear association between two variables.
- The square of the correlation between X and Y is the proportion of variance shared between X and Y (e.g., if r = .50, then the two variables share .50 * .50 = 25% of variance).
- If X and Y were standardised (i.e., made so that the mean of both variables was zero and the standard deviation was one), then the correlation would be the same as the regression coefficient of X predicting Y or Y predicting X. Thus, for example, if r = .25 you could say that "a value one standard deviation greater on X predicts a .25 standard deviation greater value on Y".
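The last two interpretations can be checked numerically. The sketch below uses made-up data and hypothetical helper names. It standardises both variables, fits a least-squares slope in each direction, and compares the slopes with r.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardise(values):
    """Convert values to z-scores (mean 0, sample standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def ols_slope(predictor, outcome):
    """Least-squares slope from regressing outcome on predictor."""
    n = len(predictor)
    mean_p, mean_o = sum(predictor) / n, sum(outcome) / n
    spo = sum((p - mean_p) * (o - mean_o) for p, o in zip(predictor, outcome))
    spp = sum((p - mean_p) ** 2 for p in predictor)
    return spo / spp


# Hypothetical data for illustration only.
x = [2, 4, 5, 7, 8, 10, 11, 13]
y = [1, 3, 2, 6, 5, 9, 8, 9]

r = pearson_r(x, y)
zx, zy = standardise(x), standardise(y)
print("r:", round(r, 4))
print("slope of zy on zx:", round(ols_slope(zx, zy), 4))
print("slope of zx on zy:", round(ols_slope(zy, zx), 4))
print("shared variance (r squared):", round(r ** 2, 4))
```

All three of the first values agree, which is just the statement that for standardised variables the regression coefficient is the correlation regardless of which variable is treated as the predictor.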
Strategies for building an intuition of what a correlation means:
- Play with the Regression by Eye simulation. The simulation generates a scatterplot, and you are asked to indicate which of a set of correlations corresponds to the scatterplot. It helps to build a mapping between the graphical intuitiveness of a scatterplot and the numeric summary of the linear association in the scatterplot (i.e., the correlation coefficient).
- Memorise some of the rules of thumb for describing correlation effect sizes (see this discussion by Andy Field), but don't take the rules of thumb too seriously.
- Try to build up a frame of reference for correlations in different contexts by reading results sections. Meta-analyses can also be particularly useful in this regard.
- Read the article 'Meyer, G. J., et al. (2001). Psychological Testing and Psychological Assessment: A Review of Evidence and Issues. American Psychologist, 56(2), 128-165.' (PDF), which provides large tables of meta-analytic correlations for a wide range of medical and psychological domains sorted by the size of the correlation. Studying these tables can help build an intuition and a context for interpretation of correlations.
Graphical approaches
As with most statistical techniques, there are various ways of representing the data. The correlation coefficient provides a very brief summary of the association between two variables. However, graphical representations of association are much richer.
The following are some general heuristics that I find useful when plotting data that might also be represented as a correlation:
- Use scatterplots to explore features of the association (e.g., presence of outliers, linearity, distributional properties, spread of data around any trend line, etc.);
- If one of the variables is positively skewed, consider plotting the corresponding axis on a log scale;
- If there are a lot of data points (e.g., n > 1000), adopt a different strategy such as using some form of partial transparency (e.g., see use of the alpha property in ggplot2), or sampling the data;
- If one of the variables takes on a limited number of discrete categories, consider using a jitter or a sunflower plot;
- If there are three or more variables, consider using a scatterplot matrix;
- Fitting some form of trend line is often useful;
- Adjust the size of the plotting character to the sample size (for bigger n, use a smaller plotting character).
Significance tests on correlations
There is a wide range of possible significance tests that can be performed on correlations. The following links provide some suggestions for different scenarios.
- General post on comparing significance of two correlations under various conditions.
- Significance of correlation using Pearson's table
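As one example of such a test, two correlations from independent samples are commonly compared by converting each to Fisher's z and dividing the difference by its standard error. The sketch below covers only that independent-samples case; correlations that share a sample or a variable need different tests, as the linked post discusses. The function name and the example values are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher z test for the difference between two independent correlations.

    Returns (z statistic, two-tailed p value) under the normal approximation.
    """
    z1, z2 = atanh(r1), atanh(r2)  # Fisher z transformation of each r
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical example: r = .50 in one sample of 100, r = .20 in another.
z_stat, p_value = compare_independent_correlations(0.50, 100, 0.20, 100)
print("z =", round(z_stat, 2), "p =", round(p_value, 3))
```

Note how large the difference between correlations needs to be before it is statistically significant, even with 100 cases per group.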
Statistical Software
Calculating a correlation coefficient and its associated statistical significance is a standard task that almost any statistical package can perform. Many psychology students are taught to use SPSS. It is a proprietary (i.e., you can't run it at home without a paid licence) data analysis system with a strong emphasis on a GUI and on making it easy to perform various standardised analyses common in the social sciences.
My preferred tool for performing data analysis is R. It is open source (thus, you can run it at home for free) and is often described as the lingua franca of statistics. It generally requires a more sophisticated understanding of statistics and computing to use effectively. Thus, for the interested psychology student or researcher I have this introduction to R for researchers in psychology.
Below I list resources for performing correlation analysis in SPSS and R.
SPSS
- Andy Field has a chapter on correlation which discusses correlation using SPSS.
- This video tutorial on running and interpreting a correlation analysis using SPSS goes for about 7 minutes and is elementary.
R
R makes it easy to perform correlations on datasets. Specifically, the following links provide example syntax:
- Quick-R on correlations
- Quick-R on scatterplots
- More generally, William Revelle has some great resources on R for psychology.
Reporting Correlations in APA Style
- APA Style Manual: When required to report results using APA style, the authoritative source is the Publication Manual of the APA.
- Article Deconstruction: Another general strategy is to find a journal article that (a) reports a statistical test similar to the one you require, and (b) is published in an APA journal or at least in a journal that uses APA style.
- APA journals are listed here
- A quick search on Google Scholar will often be sufficient and quicker, although PsycInfo (a subscription service) is more reliable if you have access to it (many universities do). E.g., a quick search for apa "significant correlation between" psychology revealed several relevant articles and some with immediate PDF access.
- I also have a separate post on this general approach of deconstructing journal articles to discern writing principles.
- Correlation Matrices: Many psychological studies, particularly those based on correlational/observational designs, involve the measurement of a range of numeric variables. It is particularly useful, and common, in such cases to report a correlation matrix between sets of variables. I have a post with instructions on formatting a correlation matrix in APA style using a combination of SPSS, Excel, and Word. The post also includes links to examples of correlation matrices being reported.
- General overview of reporting statistics including correlations using APA style
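As a rough illustration of the conventions involved in a correlation matrix, the sketch below formats correlations to two decimal places without a leading zero and prints only the lower triangle with numbered rows. It is not the SPSS, Excel, and Word workflow from the linked post; the variable names, data, and function names are hypothetical, and the Publication Manual remains the authority on layout details.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def format_r(r):
    """Two decimals with no leading zero, e.g. 0.456 -> '.46', -0.3 -> '-.30'."""
    text = f"{r:.2f}"
    if text.startswith("0."):
        return text[1:]
    if text.startswith("-0."):
        return "-" + text[2:]
    return text


def correlation_rows(data):
    """Lower-triangle rows: numbered label, correlations with earlier variables, '-' on the diagonal."""
    names = list(data)
    rows = []
    for i, row_name in enumerate(names):
        cells = [format_r(pearson_r(data[row_name], data[names[j]])) for j in range(i)]
        cells.append("-")  # placeholder for the diagonal
        rows.append([f"{i + 1}. {row_name}"] + cells)
    return rows


# Hypothetical scores on three measures.
scores = {
    "Anxiety": [12, 15, 9, 20, 14, 17, 11, 18],
    "Stress": [30, 34, 25, 41, 29, 38, 27, 36],
    "Wellbeing": [55, 48, 60, 40, 52, 45, 58, 47],
}
for row in correlation_rows(scores):
    print("  ".join(cell.ljust(12) for cell in row))
```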