Overview
The repository with all source files is available at:
- https://github.com/jeromyanglim/Sweave_Item_Analysis/.
A copy of the resulting PDF can be viewed here.
For general information on program requirements and running the code see this earlier post
Sweave Documents that Display Console Input and Output
I find it useful to distinguish between different kinds of Sweave documents. One key distinction is between
- reports that display the console and reports that do not display the console.
Reports that display the console are suited to distinct applications, including:
- R tutorials
- Personal analyses
- Analyses provided to experts who understand R and the project
It would also be possible to distinguish between reports that do and do not show the console input (echo=true).
Sweave reports that display the console still benefit from thoughtful variable names, selective display of output, and so forth. However, naturally less time is invested in putting the polish on figures, tables, and inline text.
The previous Sweave Tutorials were examples of Sweave documents that do not display the console.
- Tutorial 1 was a data driven document.
- Tutorial 2 was a set of batch polished reports.
The present tutorial is an example of a Sweave document that displays console input and output.
The remainder of the post discusses various aspects of the source code.
Source Code
Folder and File Structure
- .gitignore records the folder where derived files are stored by make.
- makefile is similar to that explained and used in Sweave Tutorial 1. This similarity has been obtained through (a) the use of variables in the makefile (b) the fact that both projects are driven by the Rnw file; thus, make calls Rnw, which in turn imports data, and so forth.
- README.md is a file in markdown. Markdown is the markup language used on Github. The file is automatically displayed on the repository home page.
- data: This folder contains a file with the responses to the 50 multiple choice questions.
- meta: This folder contains a file with information about each of the 50 multiple choice questions including the text, response options, and the supposedly correct response.
- backup: This folder contains a copy of the resulting PDF. Although this is a derived file and as such should not generally be monitored by Git, it's helpful to include a copy for easy access.
- Sweave.sty: I find it easier and more portable to just include this LaTeX style file required by Sweave in with the project.
Item_Analysis_Report.Rnw
Library loading and data import
<<initial_settings, echo=false>>=
options(stringsAsFactors=FALSE)
options(width=80)
library(psych) # used for scoring and alpha
library(CTT) # used for spearman brown prophecy
@
<<import_data, echo=false>>=
cases <- read.delim("data/cases.tsv")
items <- read.delim("meta/items.tsv")
items$variable <- paste("item", items$item, sep="")
@

- options(width=80) ensures that the console width is suitable for the printed page.
- options(stringsAsFactors=FALSE) means that character variables imported using read.delim are left as character variables and not converted into factors. In general I find this a more useful default behaviour. In particular I often use the actual text, particularly in metadata, to generate variable names, print text and so forth. Leaving variables as character is better for this.
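As a minimal sketch of the difference this option makes, the following reads a small made-up metadata table twice. The table contents and the names tsv, items_chr, and items_fac are hypothetical and are not the contents of meta/items.tsv. The stringsAsFactors argument is passed explicitly so the result does not depend on global options (in R 4.0.0 and later, FALSE is already the default).

```r
# Hypothetical metadata standing in for meta/items.tsv (assumption, not the real file)
tsv <- "item\ttext\tcorrect\n1\tFirst question\t2\n2\tSecond question\t4\n"

# Character columns are left as character
items_chr <- read.delim(text = tsv, stringsAsFactors = FALSE)

# Character columns are converted to factors
items_fac <- read.delim(text = tsv, stringsAsFactors = TRUE)

class(items_chr$text)
class(items_fac$text)

# Character data can be pasted into variable names and labels directly
items_chr$variable <- paste("item", items_chr$item, sep = "")
items_chr$label <- paste(items_chr$variable, items_chr$text, sep = ": ")
items_chr$label
```

With the factor version, text manipulation of this kind usually needs an extra as.character() step first.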
Using data before it has apparently been generated
...
The example involves performing an item analysis of responses of \Sexpr{nrow(cases)} students to a set of \Sexpr{nrow(items)} multiple choice test items.
...
<<>>=
<<initial_settings>>
<<import_data>>
@

- In the above code I wanted to be able to write the number of cases before showing the code for importing settings and data. Thus, I first ran the code chunks with echo=false to prevent display. Then, afterwards, these code chunks were rerun inside a code chunk using the syntax <<name_of_code_chunk>> (i.e., without the = sign at the end of the opening). This time they were displayed.
Scoring multiple choice tests
<<score_test>>=
itemstats <- score.multiple.choice(key = items$correct,
    data = cases[,items$variable])
@

- score.multiple.choice is a function in the psych package for scoring multiple choice tests. key is a vector of integers representing the correct response. data is a matrix or data.frame of responses from a set of respondents.
- The example shows how metadata can be used to simplify code. items$variable includes the names of the 50 multiple choice test items. items$correct includes the vector of correct responses.
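To make the idea of scoring against a key concrete, here is a rough base R sketch on made-up data. It is not the psych implementation and its output is not meant to match score.multiple.choice; the key, the responses, and all object names are hypothetical.

```r
# Hypothetical key and responses (3 items, 4 respondents); not the tutorial data
key <- c(2, 4, 1)
responses <- data.frame(
  item1 = c(2, 2, 3, 2),
  item2 = c(4, 1, 4, 4),
  item3 = c(1, 1, 2, 3)
)

# Compare each column with its key entry: 1 = correct, 0 = incorrect
scored <- sweep(as.matrix(responses), 2, key, FUN = "==") * 1

# Number of correct answers per respondent
total <- rowSums(scored)

# Proportion of respondents answering each item correctly
item_mean <- colMeans(scored)

# Correlation of each scored item with the total score
item_r <- apply(scored, 2, function(x) cor(x, total))

total
item_mean
item_r
```

Because the key is stored alongside the item names in the metadata, the same code works unchanged if items are added or the key is corrected.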
...
Figures in Sweave
<<plot_mean_by_r, fig=true>>=
plot(r ~ mean , itemstats$item.stats, type="n")
text(itemstats$item.stats$mean, itemstats$item.stats$r, 1:50)
abline(h=.2, v=c(.5, .9))
@

- Code chunks can produce single figures. The fig=true key-value pair is required.
- type="n" is used to not show points and then text(...) is used to plot the item numbers on the plot.
- Because the document is an informal document designed to display the console, the figure is not wrapped in a figure float. A float would involve more typing and might even be annoying if it moved around the document.
...
Using Sweave to Better Follow the DRY (Don't Repeat Yourself) Principle
<<flag_bad_items>>=
rules <- list(
    tooEasy = .95,
    tooHard = .3,
    lowR = .15)
oritemstats$item.stats$tooEasy <- oritemstats$item.stats$mean > rules$tooEasy
...
@
\begin{itemize}
\item \emph{Too Easy}: mean correct $>$ \Sexpr{rules$tooEasy}.
\Sexpr{sum(oritemstats$item.stats$tooEasy)} items were bad by this definition.
...
\end{itemize}

- The above abbreviated version of the actual code highlights how Sweave can be used to prevent repetition and facilitate modifiability.
- The code flags items as too easy if more than 95% of participants get the item correct. This value (.95) is stored in a variable. It's then subsequently used both in the code to flag items as too easy and also used in the text where the rule is described in plain text (i.e., \Sexpr{rules$tooEasy}).
- This is a particularly powerful use of Sweave whereby any text in a document that might be repeated or any text that describes details of a data analytic algorithm is a good candidate for simplification using Sweave.
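The same pattern can be tried outside Sweave. The sketch below uses made-up item statistics (item_stats is a hypothetical stand-in for the item.stats data frame). Only the tooEasy comparison comes from the chunk above; the comparison directions used for tooHard and lowR are assumptions, since the original code for those rules is elided.

```r
# Hypothetical item statistics; not the tutorial's results
item_stats <- data.frame(
  mean = c(0.97, 0.60, 0.25, 0.80),
  r    = c(0.10, 0.35, 0.20, 0.12)
)

# Thresholds are defined once, using the values quoted in the chunk above
rules <- list(tooEasy = .95, tooHard = .3, lowR = .15)

# tooEasy follows the original code
item_stats$tooEasy <- item_stats$mean > rules$tooEasy
# ASSUMPTION: the direction of these two comparisons is a guess
item_stats$tooHard <- item_stats$mean < rules$tooHard
item_stats$lowR    <- item_stats$r < rules$lowR

# The same objects feed the descriptive text, much as \Sexpr{} does in the report
too_easy_text <- sprintf("Too Easy: mean correct > %s. %d items were flagged.",
                         rules$tooEasy, sum(item_stats$tooEasy))
too_easy_text
```

Changing a threshold in rules then updates both the flags and the sentence that describes the rule.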
...
\Sexpr{} and formatting
The formula suggests that in order to obtain
an alpha of \Sexpr{sbrown$targetAlpha},
\Sexpr{round(sbrown$multiple, 2)} times as many items are required.
Thus, the final scale would need around
\Sexpr{ceiling(sbrown$refinedItemCount)} items.
Assuming a similar number of good and bad items,
this would require an initial pool of around
\Sexpr{ceiling(sbrown$totalItemCount)} items.

- The above code highlights a couple of examples of how inline formatting of numbers can be done, and is often required when including inline text. In this case, the ceiling and round functions were used.
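For readers who want to see where such numbers could come from, here is a small base R sketch of the Spearman-Brown prophecy formula solved for the lengthening factor. The input values and object names are hypothetical, and this is not necessarily how the sbrown object in the report was built (the report loads the CTT package for that purpose).

```r
# Hypothetical inputs; not the values from the tutorial's analysis
current_alpha <- 0.70   # reliability of the current scale
current_items <- 20     # number of items in the current scale
target_alpha  <- 0.80   # desired reliability

# Spearman-Brown prophecy formula, solved for the factor by which
# the test must be lengthened to reach the target reliability
multiple <- (target_alpha * (1 - current_alpha)) /
  (current_alpha * (1 - target_alpha))

required_items <- current_items * multiple

# Formatting as it might be done inside \Sexpr{}
round(multiple, 2)       # two decimals for a ratio
ceiling(required_items)  # round up, since items come in whole numbers
```

Using ceiling rather than round for item counts avoids reporting a scale length that falls just short of the target.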
Sweave Tutorial Series
This post is the third installment in a Sweave Tutorial Series:
- Using Sweave, R, and Make to Generate a PDF of Multiple Choice Questions
- Batch Individual Personality Reports using R, Sweave, and LaTeX
Related Posts
- Getting Started with Sweave
- makefiles for Sweave, R and LaTeX using Eclipse on Windows