Statistics notes from Statistical Methods for Psychology by David Howell

Chapter 6 - Categorical Data and Chi-square

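  • A multinomial distribution is one that has a range of possible mutually exclusive and exhaustive outcomes for each event, for instance rolling a number greater than three, a number starting with the letter T, or a one. The probability of observing a particular set of outcome counts is given by
    p(X_1, \dots, X_k) = \frac{N!}{X_1! \cdots X_k!} \, p_1^{X_1} \cdots p_k^{X_k}
    where N is the number of events, X_i is the number of times outcome i occurs, and p_i is the probability of outcome i on any one event.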
  • A chi-square distribution is a particular type of mathematical distribution used for several statistical tests. Probably the most common of these is Pearson’s chi-square test. The chi-square distribution is defined in terms of the gamma function, which is a continuous extension of the factorial function: \Gamma(n) = (n-1)! for positive integers n. Its density is given by:
    f(x; k) = \frac{1}{2^{k/2} \, \Gamma(k/2)} \, x^{k/2 - 1} e^{-x/2}, \quad x > 0
    Note the chi-square distribution has only one parameter, k, which becomes the degrees of freedom when performing statistical tests. The distribution only takes on values for x > 0 and becomes more spread out as k increases. In fact, for the chi-square distribution the mean is k and the variance is 2k.
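
The two distributions above can be checked numerically. The following is a minimal Python sketch using only the standard library; the function names (multinomial_pmf, chi2_pdf, chi2_moments) and the integration settings are illustrative assumptions, not anything from Howell's text. It evaluates the multinomial probability for the die example and approximates the mean and variance of a chi-square distribution by simple numerical integration.

```python
import math


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` when category i has probability probs[i]."""
    n = sum(counts)
    coef = math.factorial(n)
    for x in counts:
        coef //= math.factorial(x)  # stays an integer at every step
    p = float(coef)
    for x, pr in zip(counts, probs):
        p *= pr ** x
    return p


def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))


def chi2_moments(k, upper=200.0, steps=200_000):
    """Approximate (mean, variance) with the midpoint rule on (0, upper).

    `upper` and `steps` are arbitrary choices that work for small k.
    """
    h = upper / steps
    mean = second = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        w = chi2_pdf(x, k) * h
        mean += x * w
        second += x * x * w
    return mean, second - mean ** 2


# Die example: P(>3) = 1/2, P(two or three) = 1/3, P(one) = 1/6.
# Probability of 3, 2 and 1 occurrences in six rolls: exactly 5/36, about 0.139.
print(multinomial_pmf([3, 2, 1], [1 / 2, 1 / 3, 1 / 6]))

# For k = 4 the mean should come out close to 4 and the variance close to 8.
print(chi2_moments(4))
```

The midpoint rule is used only because it needs no extra libraries; it is less accurate for k = 1, where the density is unbounded near zero.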

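  • A chi-square goodness of fit test asks whether observed frequencies in different categories are significantly different from the frequencies expected if the null hypothesis were true. The chi-square test statistic is given by
    \chi^2 = \sum \frac{(O - E)^2}{E}
    where O is the observed frequency and E the expected frequency in each category (or cell), and the sum is taken over all categories.
    For one-way categorisation of data the degrees of freedom = C-1, where C is the number of categories.
    For two-way categorisation, where we are testing contingency tables (whether the two dimensions are independent of one another), we calculate E_{ij} = R_i C_j / N, where R_i is the total for row i, C_j is the total for column j and N is the total number of observations. The degrees of freedom = (R-1)(C-1), where R and C are the numbers of rows and columns.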
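  • Because the observed frequencies are discrete while the chi-square distribution is continuous, Yates suggested a correction for continuity for 2x2 tables (1 degree of freedom): reduce the absolute value of each numerator, |O - E|, by 0.5 before squaring. Some statisticians argue against this correction.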
  • With small expected frequencies the chi-square distribution no longer approximates the sampling distribution of the test statistic accurately, so a general (conservative) rule of thumb is that all expected frequencies should be at least 5 (p. 149).
  • An alternative to the chi-square test is Fisher’s Exact test, which is useful in situations where there are few observations because it is not based on the chi-square distribution.
  • Another alternative is to use likelihood ratio tests, which calculate the statistic as 2 * sum of obs * ln(obs/exp) over all cells and evaluate it against the chi-square distribution with the same degrees of freedom.
  • Chi-square tests require independence of observations - for instance, that no two observations are responses from the same participant. This is distinct from testing independence of variables, for instance whether a person’s weight and height categorisations are independent. Be sure to always include non-occurrences (for instance people who do not like the idea of daylight saving time as well as those who do).
  • Effect sizes can be measured using differences between groups or levels of the independent variable (d-family measures) or the correlation between the two variables (r-family measures). D-family measures of effect size include the risk ratio and the odds ratio. R-family measures include phi and Cramér’s V.
  • The Kappa statistic does not use the chi-square distribution but is based on contingency tables, and is used to measure agreement between two raters (judges), corrected for the agreement expected by chance.
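
The contingency-table calculations in the notes above can be sketched in a few lines of Python. This is a minimal, standard-library illustration under stated assumptions: the function names are made up for this sketch, the 2x2 table at the end is invented data (not from Howell), and the effect-size formulas used are the common textbook ones (Cramér’s V as the square root of chi-square / (N * (min(R, C) - 1)), which equals phi for a 2x2 table).

```python
import math


def expected_frequencies(table):
    """E_ij = R_i * C_j / N for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """(R - 1)(C - 1) for a two-way table."""
    return (len(table) - 1) * (len(table[0]) - 1)


def pearson_chi2(table, yates=False):
    """Sum of (O - E)^2 / E; with yates=True, |O - E| is reduced by 0.5 first."""
    stat = 0.0
    for obs_row, exp_row in zip(table, expected_frequencies(table)):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                diff = max(diff - 0.5, 0.0)  # never push past zero (a design choice)
            stat += diff ** 2 / e
    return stat


def likelihood_ratio_chi2(table):
    """2 * sum of O * ln(O / E); empty cells contribute nothing."""
    total = 0.0
    for obs_row, exp_row in zip(table, expected_frequencies(table)):
        for o, e in zip(obs_row, exp_row):
            if o > 0:
                total += o * math.log(o / e)
    return 2 * total


def cramers_v(table):
    """Cramér's V; reduces to phi for a 2x2 table."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(pearson_chi2(table) / (n * (k - 1)))


def odds_ratio(table):
    """Odds ratio for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def risk_ratio(table):
    """Risk of the first-column outcome in row 1 relative to row 2 (2x2 table)."""
    (a, b), (c, d) = table
    return (a / (a + b)) / (c / (c + d))


def cohens_kappa(table):
    """Kappa for a square rater-by-rater agreement table."""
    n = sum(sum(row) for row in table)
    exp = expected_frequencies(table)
    p_obs = sum(table[i][i] for i in range(len(table))) / n
    p_exp = sum(exp[i][i] for i in range(len(table))) / n
    return (p_obs - p_exp) / (1 - p_exp)


# Invented 2x2 table: rows are two groups, columns are two outcomes.
example = [[30, 10], [20, 40]]
print(pearson_chi2(example), degrees_of_freedom(example))  # about 16.67 on 1 df
print(likelihood_ratio_chi2(example), cramers_v(example))
print(odds_ratio(example), risk_ratio(example))
```

The statistic still has to be compared with a chi-square critical value for the given degrees of freedom, which this sketch leaves to a table or a statistics library.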

This entry was posted on Friday, January 8th, 2010 at 1:47 am.
