Sunday, 11 November 2012

Sweave Tutorial 3: Console Input and Output - Multiple Choice Test Analysis

This post provides an example of using Sweave to perform an item analysis of a multiple choice test. It is designed as a tutorial for learning more about using Sweave in a mode where console input and output are displayed. Copies of all source code and the final PDF report are provided.

Overview

The repository with all source files is available at:

  • https://github.com/jeromyanglim/Sweave_Item_Analysis/.

A copy of the resulting PDF can be viewed here.

For general information on program requirements and running the code, see this earlier post.

Sweave Documents that Display Console Input and Output

I find it useful to distinguish between different kinds of Sweave documents. One key distinction is between

  • reports that display the console and reports that do not display the console.

Reports that display the console are suited to distinct applications, including:

  • R tutorials
  • Personal analyses
  • Analyses provided to experts who understand R and the Project

It would be further possible to distinguish between reports that do and do not show the console input (echo=true).

Sweave reports that display the console still benefit from thoughtful variable names, selective display of output, and so forth. However, naturally less time is invested in polishing figures, tables, and inline text.

The previous Sweave Tutorials were examples of Sweave documents that do not display the console. Tutorial 1 was a data-driven document. Tutorial 2 was a set of batch polished reports.

The present tutorial is an example of a Sweave document that displays console input and output.

The remainder of the post discusses various aspects of the source code.

Source Code

Folder and File Structure

  • .gitignore records the folder where derived files are stored by make.
  • makefile is similar to that explained and used in Sweave Tutorial 1. This similarity has been obtained through (a) the use of variables in the makefile and (b) the fact that both projects are driven by the Rnw file; thus, make calls Rnw, which in turn imports data, and so forth.
  • README.md is a file in markdown. Markdown is the markup language used on GitHub. The file is automatically displayed on the repository home page.
  • data: This folder contains a file with the responses to the 50 multiple choice questions.
  • meta: This folder contains a file with information about each of the 50 multiple choice questions, including the text, response options, and the supposedly correct response.
  • backup: This folder contains a copy of the resulting PDF. Although this is a derived file and as such should not generally be monitored by Git, it's helpful to include a copy for easy access.
  • Sweave.sty: I find it easier and more portable to just include this LaTeX style file required by Sweave in with the project.

Item_Analysis_Report.Rnw

Library loading and data import

<<initial_settings, echo=false>>=
options(stringsAsFactors=FALSE)
options(width=80)
library(psych) # used scoring and alpha
library(CTT) # used for spearman brown prophecy
@

<<import_data, echo=false>>=
cases <- read.delim("data/cases.tsv")
items <- read.delim("meta/items.tsv")
items$variable <- paste("item", items$item, sep="")
@

  • options(width=80) ensures that the console width is suitable for the printed page.
  • options(stringsAsFactors=FALSE) means that character variables imported using read.delim are left as character variables and not converted into factors. In general I find this a more useful default behaviour. In particular, I often use the actual text, particularly in metadata, to generate variable names, print text, and so forth. Leaving variables as character is better for this.
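
As a minimal sketch of the difference this setting makes, the following base R snippet reads a small made-up tab-separated table twice. The inline tsv string and its values are hypothetical stand-ins for meta/items.tsv, and stringsAsFactors is passed directly to read.delim here rather than set through options(). Note that since R 4.0.0 the default is already stringsAsFactors=FALSE.

```r
# Hypothetical stand-in for meta/items.tsv; the real file is not shown in the post.
tsv <- "item\ttext\tcorrect\n1\tFirst question\t3\n2\tSecond question\t1\n"

# Character columns stay character, so they are easy to paste and print.
as_chr <- read.delim(text = tsv, stringsAsFactors = FALSE)

# The same data with character columns converted to factors.
as_fct <- read.delim(text = tsv, stringsAsFactors = TRUE)

class(as_chr$text)
class(as_fct$text)

# Build variable names from the metadata, as the import_data chunk does.
as_chr$variable <- paste("item", as_chr$item, sep = "")
as_chr$variable
```

With character columns, the metadata text can be pasted into names or printed directly without first converting it back from a factor.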

Using data before it has apparently been generated

...
The example involves performing an item analysis of responses of \Sexpr{nrow(cases)} students to a set of \Sexpr{nrow(items)} multiple choice test items.
...
<<>>=
<<initial_settings>>
<<import_data>>
@

  • In the above code I wanted to be able to write the number of cases before showing the code for importing settings and data. Thus, I first ran the code chunks with echo=false to prevent display. Then, afterwards, these code chunks were rerun inside a code chunk using the syntax <<name_of_code_chunk>> (i.e., without the = sign at the end of the opening). This time they were displayed.

Scoring multiple choice tests

<<score_test>>=
itemstats <- score.multiple.choice(key = items$correct,
             data = cases[,items$variable])
@

  • score.multiple.choice is a function in the psych package for scoring multiple choice tests. key is a vector of integers representing the correct response. data is a matrix or data.frame of responses from a set of respondents.
  • The example shows how metadata can be used to simplify code. items$variable contains the names of the 50 test items; items$correct contains the vector of correct responses.
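
To make the idea of scoring against a key concrete, here is a small base R sketch. It is not the psych implementation, only an illustration of the comparison involved; the key and responses values are made up for three items and three respondents.

```r
# Hypothetical answer key: the correct option for each of three items.
key <- c(2, 1, 4)

# Hypothetical responses: one row per respondent, one column per item.
responses <- data.frame(item1 = c(2, 2, 3),
                        item2 = c(1, 4, 1),
                        item3 = c(4, 4, 4))

# Compare every column with its key entry; 1 = correct, 0 = incorrect.
scored <- sweep(as.matrix(responses), 2, key, FUN = "==") * 1

total     <- rowSums(scored)   # number correct per respondent
item_mean <- colMeans(scored)  # proportion correct per item

scored
total
item_mean
```

Because the key is stored alongside the item names in the metadata, the same two vectors drive both the column selection and the scoring.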

...

Figures in Sweave

<<plot_mean_by_r, fig=true>>=
plot(r ~ mean , itemstats$item.stats, type="n")
text(itemstats$item.stats$mean, itemstats$item.stats$r, 1:50)
abline(h=.2, v=c(.5, .9))
@

  • Code chunks can produce single figures. The fig=true key-value pair is required.
  • type="n" is used to suppress the points, and then text(...) is used to plot the item numbers on the plot.
  • Because the document is an informal document designed to display the console, the figure is not wrapped in a figure float. A float would involve more typing and might even be annoying if it moved around the document.
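
For experimenting outside Sweave, the same plotting calls can be run against a graphics device opened by hand. The sketch below only approximates what a fig=true chunk does. The stats data frame is made-up data, and seq_len(nrow(stats)) is used in place of the hard-coded 1:50 so that the labels follow the number of items.

```r
# Hypothetical item statistics; the real values come from itemstats$item.stats.
stats <- data.frame(mean = c(0.45, 0.72, 0.93, 0.60),
                    r    = c(0.35, 0.10, 0.25, 0.41))

# Write the figure to a file in the session's temporary directory.
out <- tempfile(fileext = ".pdf")
pdf(out)
plot(r ~ mean, stats, type = "n")                # set up axes without drawing points
text(stats$mean, stats$r, seq_len(nrow(stats)))  # label each item by its row number
abline(h = 0.2, v = c(0.5, 0.9))                 # reference lines as in the chunk above
invisible(dev.off())                             # close the device to finish the file
```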

...

Using Sweave to Better follow the DRY (Don't Repeat Yourself) Principle

<<flag_bad_items>>=
rules <- list(
        tooEasy = .95,
        tooHard = .3,
        lowR = .15)

oritemstats$item.stats$tooEasy <-
    oritemstats$item.stats$mean > rules$tooEasy
...
@
\begin{itemize}
\item \emph{Too Easy}: mean correct $>$
\Sexpr{rules$tooEasy}.
\Sexpr{sum(oritemstats$item.stats$tooEasy)}
items were bad by this definition.
...
\end{itemize}

  • The above abbreviated version of the actual code highlights how Sweave can be used to prevent repetition and facilitate modifiability.
  • The code flags items as too easy if more than 95% of participants get the item correct. This value (.95) is stored in a variable. It is then used both in the code that flags items as too easy and in the text where the rule is described in plain language (i.e., \Sexpr{rules$tooEasy}).
  • This is a particularly powerful use of Sweave: any text in a document that might be repeated, or any text that describes details of a data analytic algorithm, is a good candidate for simplification using Sweave.
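
The same single-source idea can be tried in plain R, outside of Sweave. In this sketch the describe_too_easy helper and the item_means values are hypothetical; only the .95 threshold comes from the post. The threshold is passed once and drives both the flagging and the sentence that describes it.

```r
# Threshold taken from the post; stored once and reused below.
rules <- list(tooEasy = 0.95)

# Hypothetical proportions correct for four items.
item_means <- c(0.97, 0.50, 0.25, 0.80)

# Flag items and describe the rule from the same threshold value.
describe_too_easy <- function(means, threshold) {
  flagged <- means > threshold
  sprintf("Too easy: mean correct > %s; %d item(s) flagged.",
          threshold, sum(flagged))
}

describe_too_easy(item_means, rules$tooEasy)
describe_too_easy(item_means, 0.75)  # a different rule changes both the count and the text
```

Changing the threshold in one place updates the flags and the reported rule together, which is the property the \Sexpr{} calls provide in the report.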

...

\Sexpr{} and formatting

The formula suggests that in order to obtain
an alpha of \Sexpr{sbrown$targetAlpha},
\Sexpr{round(sbrown$multiple, 2)} times as many items are required.
Thus, the final scale would need around
\Sexpr{ceiling(sbrown$refinedItemCount)} items.
Assuming a similar number of good and bad items,
this would require an initial pool of around
\Sexpr{ceiling(sbrown$totalItemCount)} items.

  • The above code highlights a couple of examples of how inline formatting of numbers can be done, which is often required when including numbers in inline text. In this case, the ceiling and round functions were used.
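
The post does not show how the sbrown values are computed, so the following is only a sketch of the kind of calculation involved, using the standard Spearman-Brown prophecy formula written by hand in base R rather than the CTT function. The sb_multiple helper and the alpha and item-count values are hypothetical.

```r
# Spearman-Brown prophecy: factor by which test length must grow
# to move reliability from `current` to `target`.
sb_multiple <- function(current, target) {
  target * (1 - current) / (current * (1 - target))
}

current_alpha <- 0.6  # hypothetical observed alpha
target_alpha  <- 0.8  # hypothetical target alpha
n_items       <- 35   # hypothetical number of retained items

multiple <- sb_multiple(current_alpha, target_alpha)

round(multiple, 2)           # two decimals for inline text
ceiling(n_items * multiple)  # item counts are rounded up to whole items

# round() returns a number, so a value such as 2.50 prints as 2.5;
# formatC() returns a string with a fixed number of decimals.
formatC(2.5, format = "f", digits = 2)
```

When a fixed number of decimal places matters in the final PDF, a string-formatting function such as formatC or sprintf inside \Sexpr{} may be a better fit than round.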

Sweave Tutorial Series

This post is the third installment in a Sweave Tutorial Series:

  1. Using Sweave, R, and Make to Generate a PDF of Multiple Choice Questions
  2. Batch Individual Personality Reports using R, Sweave, and LaTeX

Related Posts

  • Getting Started with Sweave
  • makefiles for Sweave, R and LaTeX using Eclipse on Windows
