Sunday, 30 September 2012

Getting Started with JAGS, rjags, and Bayesian Modelling


This post provides links to various resources on getting started with Bayesian modelling using JAGS and R. It discusses: (1) what is JAGS; (2) why you might want to perform Bayesian modelling using JAGS; (3) how to install JAGS; (4) where to find further information on JAGS; (5) where to find examples of JAGS scripts in action; (6) where to ask questions; and (7) some interesting psychological applications of Bayesian modelling.

What is JAGS?

JAGS stands for Just Another Gibbs Sampler. To quote the program author, Martyn Plummer, "It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation..." It uses a dialect of the BUGS language, similar to but a little different from OpenBUGS and WinBUGS.

Why JAGS?

The question of why you might want to use JAGS can be approached in several different ways:

  • Why Bayesian rather than Null Hypothesis Significance Testing (NHST) approaches?

    • To quote John D. Cook quoting Anthony O'Hagan, the benefits of "the Bayesian approach are that it is 1. fundamentally sound, 2. very flexible, 3. produces clear and direct inferences, and 4. makes use of all available information." (see John's blog post for elaboration)
    • John K. Kruschke made a similar argument in an Open Letter extolling the benefits of the Bayesian approach, summarised as: "(1) Scientific disciplines from astronomy to zoology are moving to Bayesian data analysis. We should be leaders of the move, not followers. (2) Modern Bayesian methods provide richer information, with greater flexibility and broader applicability than 20th century methods. Bayesian methods are intellectually coherent and intuitive. Bayesian analyses are readily computed with modern software and hardware. (3) Null-hypothesis significance testing (NHST), with its reliance on p values, has many problems. There is little reason to persist with NHST now that Bayesian methods are accessible to everyone."
  • Why JAGS/BUGS rather than coding in a low-level language?

    • It's simpler; for models that BUGS can handle, BUGS can shield you from some of the thorny details related to numeric integration.
    • There are simple interfaces with R.
  • Why JAGS rather than WinBUGS or OpenBUGS?

    • I'm using JAGS because it works well on Ubuntu. WinBUGS is broadly Windows-specific, although I've read that it may work with the emulation software Wine.
    • JAGS interfaces well with R. I'm comfortable writing scripts, so I don't personally see the benefit of a dedicated GUI like WinBUGS; I can leverage what I know about R.
    • However, ultimately converting code between different flavours of BUGS is not that difficult.
    • For further discussion of the issue, see this r-help discussion and this discussion on CrossValidated.

More than anything I found that JAGS provided a useful entry point into the world of Bayesian modelling. This in turn appealed to me for several reasons:

  1. Even when I perform analyses using an NHST approach, I often intuitively think of empirical research questions in terms of probability densities on a parameter of interest that change as empirical and theoretical evidence accumulates. See, for example, Thompson's (2002) concept of meta-analytic thinking. Bayesian analysis provides tools for formalising this orientation.
  2. More broadly, I appreciate the explicitness that a Bayesian approach requires and encourages. E.g., specifying the distribution of the error term, specifying a prior, specifying the distribution of parameters in a mixed-effects model, and so on.
  3. There are several modelling challenges that I'm currently working through where a Bayesian approach offers substantial flexibility and applicability. In particular, I'm interested in modelling individual differences in the effect of practice on strategy use and task performance, and then relating these individual differences to factors like intelligence, prior experience, and personality.

JAGS Installation

JAGS runs on Linux, Mac, and Windows. I run JAGS on Ubuntu through an interface with R called rjags.

The following sets out a basic installation process:

  1. If necessary, download and install R and potentially a user interface to R like RStudio (see here for tips on getting started with R).
  2. Download and install JAGS as per your operating system's requirements.
  3. Install additional R packages: e.g., in R run install.packages("rjags"). In particular, I use the package rjags to interface with JAGS and coda to process MCMC output (a minimal example follows below).
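
To give a flavour of how JAGS, rjags, and coda fit together, here is a minimal sketch of a complete session (the model, simulated data, and settings are illustrative only; it assumes JAGS and the rjags and coda packages are already installed). Note that the normal distribution is specified with a precision rather than a standard deviation, one of the parameterisation differences discussed below.

```r
library(rjags)
library(coda)

# A toy model: estimate the mean and SD of a normal distribution
model.string <- "
model {
  for (i in 1:N) {
    y[i] ~ dnorm(mu, tau)   # JAGS parameterises the normal with a precision
  }
  mu ~ dnorm(0, 0.0001)     # vague prior on the mean
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 100)     # uniform prior on the SD
}
"

y <- rnorm(50, mean = 10, sd = 2)   # simulated data
jags.fit <- jags.model(textConnection(model.string),
                       data = list(y = y, N = length(y)), n.chains = 3)
update(jags.fit, 1000)              # burn-in
samples <- coda.samples(jags.fit, variable.names = c("mu", "sigma"),
                        n.iter = 5000)
summary(samples)                    # posterior summaries via coda
plot(samples)                       # trace and density plots
```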

Information on JAGS

  • The manual for different versions of JAGS is located here (e.g., the PDF of the manual for 3.1.0). Several particularly relevant sections include:
    • the list of supported distributions and how they are parameterised. This is often important given that JAGS code looks similar to R but often uses a different parameterisation (e.g., precision is used instead of standard deviation for the normal distribution).
    • the summary of differences between WinBUGS and JAGS.
    • the list of available functions and operators.
  • The rjags help PDF provides information about how to interface with JAGS from R.
  • Martyn Plummer has a blog called JAGS News.
  • The Bayesian Task View on CRAN lists and briefly describes the many R packages related to Bayesian statistics.
  • Lunn and colleagues have a 2009 article called The BUGS project: Evolution, critique and future directions. It provides a useful historical perspective on the broader BUGS project, although it does not mention much about JAGS specifically.

Example JAGS Scripts

I find it easier to pick up a new language by playing with examples. The following provides links to example JAGS code, often with accompanying explanations:

  • John Myles White
    • A course on statistical models that is under development with JAGS scripts on github
    • A model of Canabalt scores using a gamma distribution
    • Simple introductory examples of fitting a normal distribution, linear regression, and logistic regression
    • A follow-up post demonstrating the use of the coda package with rjags to perform MCMC diagnostics.
  • John K. Kruschke
    • John Kruschke wrote a book called Doing Bayesian Data Analysis: A Tutorial with R and BUGS. It's an excellent entry point into the world of Bayesian statistics for the social and behavioural scientist who has reasonable quantitative training, but is not necessarily ready to absorb the kinds of books that are used in graduate-level statistics courses.
    • The book has a website that provides all the examples used in the book. See this blog post for a link to the zip file containing the JAGS code.
  • BUGS Project
    • BUGS is well known for the large set of examples that accompany the project.
    • The PDF providing documentation for Volumes 1 and 2 of the examples is available here.
    • You can see the JAGS code used to run these examples here.
  • Patrick J Mineault

    • An example from Gelman et al. examining the effect of training programs on SAT scores
  • Miguel Lobo

    • A short tutorial
  • Simon Jackman
    • Simon Jackman wrote the book Bayesian Analysis for the Social Sciences, which has accompanying JAGS code.
    • The book's website has several useful resources, including example papers using Bayesian methods.
    • An associated course that uses the book as a textbook has slides and many examples of using R and JAGS.
  • Johannes Karreth

    • A course on applied Bayesian modelling with examples of data and code using the R2jags interface.
  • Myself

    • I also plan to post a few examples in upcoming blog posts. I will typically share the code for these on my github account: jeromyanglim. If you are reading this through syndication, you may wish to subscribe to the RSS feed of the source blog jeromyanglim.blogspot.com.

More broadly, examples and tutorials designed for WinBUGS can generally be adapted for JAGS. So, for example, you can explore these WinBUGS examples:

  • Michael Lee and Eric-Jan Wagenmakers have a free online book called A Course in Bayesian Graphical Modeling for Cognitive Science: see the PDF and website.
  • The website for the book Markov Chain Monte Carlo has several WinBUGS examples.
  • There is an extensive list of BUGS resources on the BUGS project website.

Asking questions

There are several places to ask questions about JAGS, R, and Bayesian statistics.

  • JAGS, BUGS, and Bayesian questions on stats.stackexchange.com (aka CrossValidated).
  • JAGS discussion forum
  • There's also a BUGS discussion list

In general, I prefer the Stack Exchange model for asking and answering questions on the internet, although the most important issue is typically where the experts are located.

Interesting Psychological Applications of Bayesian Modelling

If you want to see some examples of Bayesian modelling applied to psychological data, I found the following articles quite interesting. PDFs are available online.

  • Shiffrin, Lee, Kim, and Wagenmakers (2008, PDF) present a tutorial on hierarchical Bayesian methods in the context of cognitive science.
  • Michael Lee (2011, PDF) in Journal of Mathematical Psychology discusses the benefits of hierarchical Bayesian methods for modelling psychological data and provides several example applications.
  • Lee Averell and Andrew Heathcote (2010, PDF) in Journal of Mathematical Psychology analyse individual differences in the forgetting curve using a hierarchical Bayesian approach.

If you know of any other interesting JAGS resources or have any comments about my choice of software for Bayesian data analysis, feel free to post a comment.

How to plot three categorical variables and one continuous variable using ggplot2


This post shows how to produce a plot involving three categorical variables and one continuous variable using ggplot2 in R.

The following code is also available as a gist on github.

1. Create Data

First, let's load ggplot2 and create some data to work with:

library(ggplot2)
set.seed(4444)
Data <- expand.grid(group = c("Apples", "Bananas", "Carrots", "Durians", "Eggplants"),
                    year = c("2000", "2001", "2002"),
                    quality = c("Grade A", "Grade B", "Grade C", "Grade D", "Grade E"))
Group.Weight <- data.frame(
    group = c("Apples", "Bananas", "Carrots", "Durians", "Eggplants"),
    group.weight = c(1, 1, -1, 0.5, 0))
Quality.Weight <- data.frame(
    quality = c("Grade A", "Grade B", "Grade C", "Grade D", "Grade E"),
    quality.weight = c(1, 0.5, 0, -0.5, -1))
Data <- merge(Data, Group.Weight)
Data <- merge(Data, Quality.Weight)
Data$score <- Data$group.weight + Data$quality.weight +
    rnorm(nrow(Data), 0, 0.2)
Data$proportion.tasty <- exp(Data$score)/(1 + exp(Data$score))
2. Produce Plot

And here's the code to produce the plot.

ggplot(data = Data,
       aes(x = factor(year), y = proportion.tasty,
           group = group,
           shape = group,
           color = group)) +
    geom_line() +
    geom_point() +
    opts(title = "Proportion Tasty by Year, Quality, and Group") +
    scale_x_discrete("Year") +
    scale_y_continuous("Proportion Tasty") +
    facet_grid(. ~ quality)

And here's what it looks like:

[Figure: proportion tasty by year, faceted by quality, with one line per group]
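
A version note: the code above uses the ggplot2 API that was current when this post was written. In later releases of ggplot2, opts() was deprecated, so if the opts(title = ...) call fails for you, a sketch of the same plot using ggtitle() is:

```r
# Same plot as above, with ggtitle() replacing the deprecated opts(title = ...)
ggplot(data = Data,
       aes(x = factor(year), y = proportion.tasty,
           group = group, shape = group, color = group)) +
    geom_line() +
    geom_point() +
    ggtitle("Proportion Tasty by Year, Quality, and Group") +
    scale_x_discrete("Year") +
    scale_y_continuous("Proportion Tasty") +
    facet_grid(. ~ quality)
```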

Getting Started with R Markdown, knitr, and Rstudio 0.96


This post examines the features of R Markdown using knitr in RStudio 0.96. This combination of tools provides an exciting improvement in usability for reproducible analysis. Specifically, this post (1) discusses getting started with R Markdown and knitr in RStudio 0.96; (2) provides a basic example of producing console output and plots using R Markdown; (3) highlights several code chunk options such as caching and controlling how input and output is displayed; (4) demonstrates use of standard Markdown notation as well as the extended features of formulas and tables; and (5) discusses the implications of R Markdown. This post was produced with R Markdown. The source code is available here as a gist. The post may be most useful if the source code and displayed post are viewed side by side. In some instances, I include a copy of the R Markdown in the displayed HTML, but most of the time I assume you are reading the source and post side by side.

Getting started

To work with R Markdown, if necessary:

  • Install R
  • Install the latest version of RStudio (at the time of posting, this is 0.96)
  • Install the latest version of the knitr package: install.packages("knitr")

To run the basic working example that produced this blog post:

  • Open R Studio, and go to File - New - R Markdown
  • If necessary install ggplot2 and lattice packages: install.packages("ggplot2"); install.packages("lattice")
  • Paste in the contents of the gist (which contains the R Markdown file used to produce this post) and save the file with an .rmd extension
  • Click Knit HTML
opts_knit$set(upload.fun = imgur_upload)  # upload all images to imgur.com

Prepare for analyses

set.seed(1234)
library(ggplot2)
library(lattice)

Basic console output

To insert an R code chunk, you can type it manually or just press Chunks - Insert chunks or use the shortcut key. This will produce the following code chunk:

```{r}
```

Pressing tab when inside the braces will bring up code chunk options.

The following R code chunk is labelled basicconsole:

```{r basicconsole}
x <- 1:10
y <- round(rnorm(10, x, 1), 2)
df <- data.frame(x, y)
df
```

The code chunk input and output is then displayed as follows:

x <- 1:10
y <- round(rnorm(10, x, 1), 2)
df <- data.frame(x, y)
df
##     x    y
## 1   1 1.31
## 2   2 2.31
## 3   3 3.36
## 4   4 3.27
## 5   5 5.04
## 6   6 6.11
## 7   7 8.43
## 8   8 8.98
## 9   9 8.38
## 10 10 9.27

Plots

Images generated by knitr are saved in a figures folder. However, they also appear to be represented in the HTML output using a data URI scheme. This means that you can paste the HTML into a blog post or discussion forum and you don't have to worry about finding a place to store the images; they're embedded in the HTML.

Simple plot

Here is a basic plot using base graphics:

```{r simpleplot}
plot(x)
```
plot(x)

plot of chunk simpleplot

Note that unlike traditional Sweave, there is no need to write fig=TRUE.

Multiple plots

Also, unlike traditional Sweave, you can include multiple plots in one code chunk:

```{r multipleplots}
boxplot(1:10 ~ rep(1:2, 5))
plot(x, y)
```
boxplot(1:10 ~ rep(1:2, 5))

plot of chunk multipleplots

plot(x, y)

plot of chunk multipleplots

ggplot2 plot

Ggplot2 plots work well:

qplot(x, y, data = df)

plot of chunk ggplot2ex

lattice plot

As do lattice plots:

xyplot(y ~ x)

plot of chunk latticeex

Note that unlike traditional Sweave, there is no need to print lattice plots directly.

R Code chunk features

Create Markdown code from R

The following code hides the command input (i.e., echo=FALSE), and writes the output directly into the document (i.e., results='asis', which is similar to results=tex in Sweave).

```{r dotpointprint, results='asis', echo=FALSE}
cat("Here are some dot points\n\n")
cat(paste("* The value of y[", 1:3, "] is ", y[1:3], sep = "", collapse = "\n"))
```

Here are some dot points

  • The value of y[1] is 1.31
  • The value of y[2] is 2.31
  • The value of y[3] is 3.36

Create Markdown table code from R

```{r createtable, results='asis', echo=FALSE}
cat("x | y", "--- | ---", sep = "\n")
cat(apply(df, 1, function(X) paste(X, collapse = " | ")), sep = "\n")
```
x | y
--- | ---
1 | 1.31
2 | 2.31
3 | 3.36
4 | 3.27
5 | 5.04
6 | 6.11
7 | 8.43
8 | 8.98
9 | 8.38
10 | 9.27

Control output display

The following code suppresses display of R input commands (i.e., echo=FALSE) and removes any preceding text from console output (comment=""; the default is comment="##").

```{r echo=FALSE, comment=""}
head(df)
```
  x    y
1 1 1.31
2 2 2.31
3 3 3.36
4 4 3.27
5 5 5.04
6 6 6.11

Control figure size

The following is an example of a smaller figure using fig.width and fig.height options.

```{r smallplot, fig.width=3, fig.height=3}
plot(x)
```
plot(x)

plot of chunk smallplot

Cache analysis

Caching analyses is straightforward. Here's example code. On the first run on my computer, this took about 10 seconds. On subsequent runs, this code was not run.

If you want to rerun cached code chunks, just delete the contents of the cache folder.

```{r longanalysis, cache=TRUE}
for (i in 1:5000) {
    lm((i + 1) ~ i)
}
```

Basic markdown functionality

For those not familiar with standard Markdown, the following may be useful. See the source code for how to produce such points. However, RStudio does include a Markdown quick reference button that adequately covers this material.

Dot Points

Simple dot points:

  • Point 1
  • Point 2
  • Point 3

and numeric dot points:

  1. Number 1
  2. Number 2
  3. Number 3

and nested dot points:

  • A
    • A.1
    • A.2
  • B
    • B.1
    • B.2

Equations

Equations are included by using LaTeX notation and including them either between single dollar signs (inline equations) or double dollar signs (displayed equations). If you hang around the Q&A site CrossValidated you'll be familiar with this idea.

There are inline equations such as $y_i = \alpha + \beta x_i + e_i$.

And displayed formulas:

$$\frac{1}{1+\exp(-x)}$$

knitr provides self-contained HTML code that calls a MathJax script to display formulas. However, in order to include the script in my blog posts I took the script and incorporated it into my Blogger template. If you are viewing this post through syndication or an RSS reader, this may not work. You may need to view this post on my website.

Tables

Tables can be included using the following notation:

A | B | C
--- | --- | ---
1 | Male | Blue
2 | Female | Pink
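
The Markdown source that produces a table like the one above looks roughly like this (the exact version is in the post's source gist):

```
A   | B      | C
--- | ------ | -----
1   | Male   | Blue
2   | Female | Pink
```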

Hyperlinks

  • If you like this post, you may wish to subscribe to my RSS feed.

Images

Here's an example image:

[Image: Redmond Barry Building, University of Melbourne]

Code

Here is an R Markdown code chunk displayed as code:

```{r}
x <- 1:10
x
```

And then there's inline code such as x <- 1:10.

Quote

Let's quote some stuff:

To be, or not to be, that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,

Conclusion

  • R Markdown is awesome.
    • The ratio of markup to content is excellent.
    • For exploratory analyses, blog posts, and the like R Markdown will be a powerful productivity booster.
    • For journal articles, LaTeX will presumably still be required.
  • The RStudio team have made the whole process very user friendly.
    • RStudio provides useful shortcut keys for compiling to HTML, and running code chunks. These shortcut keys are presented in a clear way.
    • The incorporated extensions to Markdown, particularly formula and table support, are particularly useful.
    • The jump-to-chunk feature facilitates navigation. It helps if your code chunks have informative names.
    • Code completion on R code chunk options is really helpful. See also chunk options documentation on the knitr website.
  • Other recent posts on R Markdown include those by:
    • Christopher Gandrud
    • Markus Gesmann
    • Rstudio on R Markdown
    • Yihui Xie: I really want to thank him for developing knitr. He has also posted this example of R Markdown.

Questions

The following are a few questions I encountered along the way that might interest others.

Annoying <br/>'s

Question: I asked on the RStudio discussion site: Why does Markdown to HTML insert <br/> on new lines?

Answer: I just do a find and delete on this text for now. Specifically, I have a sed command that extracts just the content between the body tags and removes br tags. I can then readily incorporate the result into my blog posts.

sed -i -e '1,/<body>/d' -e'/^<\/body>/,$d' -e 's/<br\/>$//' filename.html

Temporarily disable caching

Question: I asked on StackOverflow about How to set cache=FALSE for a knitr markdown document and override code chunk settings?

Answer: Delete the cache folder. But there are other possible workflows.

Equivalent of Sexpr

Question: I asked on Stack Overflow about whether there is an R Markdown equivalent to Sexpr in Sweave.

Answer: Include the code between "backtick r space" and "backtick". E.g., in the source code I have calculated 2 + 2 = 4.
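
As a rough illustration (the exact wording in the gist may differ), the Markdown source for that sentence looks something like this:

```
I have calculated 2 + 2 = `r 2 + 2`.
```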

Image format

Question: When using the data URI scheme, images don't appear to display in RSS feeds of my blog. What's a good strategy?

Answer: One strategy is to upload the images to imgur. The following provides an example of exporting to imgur.

Add the following lines of code near the top of the file:

```{r optsknit}
opts_knit$set(upload.fun = imgur_upload)  # upload all images to imgur.com
```

I found that the function failed when I was at work behind a firewall, but worked at home.

Example Reproducible Report using R Markdown: Analysis of California Schools Test Data


This is a quick set of analyses of the California Test Score dataset. The post was produced using R Markdown in RStudio 0.96. The main purpose of this post is to provide a case study of using R Markdown to prepare a quick reproducible report. It provides examples of using plots, output, in-line R code, and markdown. The post is designed to be read alongside the R Markdown source code, which is available as a gist on github.

Preliminaries

  • This post builds on my earlier post which provided a guide for Getting Started with R Markdown, knitr, and RStudio 0.96
  • The dataset analysed comes from the AER package which is an accompaniment to the book Applied Econometrics with R written by Christian Kleiber and Achim Zeileis.

Load packages and data

# if necessary uncomment and install packages.  install.packages('AER')
# install.packages('psych') install.packages('Hmisc')
# install.packages('ggplot2') install.packages('relaimpo')
library(AER)       # interesting datasets
library(psych)     # describe and psych.panels
library(Hmisc)     # describe
library(ggplot2)   # plots: ggplot and qplot
library(relaimpo)  # relative importance in regression
# load the California Schools Dataset and give the dataset a shorter name
data(CASchools)
cas <- CASchools
# Convert grade to numeric
# table(cas$grades)
cas$gradesN <- cas$grades == "KK-08"
# Get the set of numeric variables
v <- setdiff(names(cas), c("district", "school", "county", "grades"))

Q 1 What does the CASchools dataset involve?

Quoting the help (i.e., ?CASchools), the data is “from all 420 K-6 and K-8 districts in California with data available for 1998 and 1999” and the variables are:

* district: character. District code.
* school: character. School name.
* county: factor indicating county.
* grades: factor indicating grade span of district.
* students: Total enrollment.
* teachers: Number of teachers.
* calworks: Percent qualifying for CalWorks (income assistance).
* lunch: Percent qualifying for reduced-price lunch.
* computer: Number of computers.
* expenditure: Expenditure per student.
* income: District average income (in USD 1,000).
* english: Percent of English learners.
* read: Average reading score.
* math: Average math score.

Let's look at the basic structure of the data frame. i.e., the number of observations and the types of values:

str(cas)
## 'data.frame':    420 obs. of  15 variables:
##  $ district   : chr  "75119" "61499" "61549" "61457" ...
##  $ school     : chr  "Sunol Glen Unified" "Manzanita Elementary" "Thermalito Union Elementary" "Golden Feather Union Elementary" ...
##  $ county     : Factor w/ 45 levels "Alameda","Butte",..: 1 2 2 2 2 6 29 11 6 25 ...
##  $ grades     : Factor w/ 2 levels "KK-06","KK-08": 2 2 2 2 2 2 2 2 2 1 ...
##  $ students   : num  195 240 1550 243 1335 ...
##  $ teachers   : num  10.9 11.1 82.9 14 71.5 ...
##  $ calworks   : num  0.51 15.42 55.03 36.48 33.11 ...
##  $ lunch      : num  2.04 47.92 76.32 77.05 78.43 ...
##  $ computer   : num  67 101 169 85 171 25 28 66 35 0 ...
##  $ expenditure: num  6385 5099 5502 7102 5236 ...
##  $ income     : num  22.69 9.82 8.98 8.98 9.08 ...
##  $ english    : num  0 4.58 30 0 13.86 ...
##  $ read       : num  692 660 636 652 642 ...
##  $ math       : num  690 662 651 644 640 ...
##  $ gradesN    : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
# Hmisc::describe(cas) # For more extensive summary statistics

Q. 2 To what extent does expenditure per student vary?

qplot(expenditure, data = cas) + xlim(0, 8000) +
    xlab("Money spent per student ($)") +
    ylab("Count of schools")

plot of chunk cas2

round(t(psych::describe(cas$expenditure)), 1)
##            [,1]
## var         1.0
## n         420.0
## mean     5312.4
## sd        633.9
## median   5214.5
## trimmed  5252.9
## mad       487.2
## min      3926.1
## max      7711.5
## range    3785.4
## skew        1.1
## kurtosis    1.9
## se         30.9

The greatest expenditure per student is around double that of the least expenditure per student.
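
As a quick check of that claim against the summary statistics above, the ratio can be computed directly (a one-line sketch):

```r
# ratio of the largest to the smallest expenditure per student
max(cas$expenditure) / min(cas$expenditure)  # about 7711.5 / 3926.1, i.e. roughly 1.96
```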

Q. 3a What predicts expenditure per student?

# Compute and format set of correlations
corExp <- cor(cas["expenditure"], cas[setdiff(v, "expenditure")])
corExp <- round(t(corExp), 2)
corExp[order(corExp[, 1], decreasing = TRUE), , drop = FALSE]
##          expenditure
## income          0.31
## read            0.22
## math            0.15
## calworks        0.07
## lunch          -0.06
## computer       -0.07
## english        -0.07
## teachers       -0.10
## students       -0.11
## gradesN        -0.17

More is spent per student in schools:

  1. where people with greater incomes live
  2. where reading scores are higher
  3. that are K-6

Q. 4 What is the relationship between district-level maths and reading scores?

ggplot(cas, aes(read, math)) + geom_point() + geom_smooth()

plot of chunk cas4

At the district level, the correlation is very strong (r = 0.92). From prior experience I'd expect correlations at the individual level in the .3 to .6 range. Thus, these results are consistent with group-level relationships being much larger than individual-level relationships.

Q. 5 What is the relationship between maths and reading after partialling out other effects?

# command has strange syntax requiring column numbers rather than variable
# names
partial.r(cas[v], c(which(names(cas[v]) == "read"), which(names(cas[v]) == "math")),
          which(!names(cas[v]) %in% c("read", "math")))
## partial correlations 
##      read math
## read 1.00 0.72
## math 0.72 1.00

The partial correlation is still very strong but is substantially reduced.

Q. 6 What fraction of a computer does each student have?

cas$compstud <- cas$computer/cas$students
describe(cas$compstud)
## cas$compstud 
##       n missing  unique    Mean     .05     .10     .25     .50     .75 
##     420       0     412  0.1359 0.05471 0.06654 0.09377 0.12546 0.16447 
##     .90     .95 
## 0.22494 0.24906 
## 
## lowest : 0.00000 0.01455 0.02266 0.02548 0.04167
## highest: 0.32770 0.34359 0.34979 0.35897 0.42083 
qplot(compstud, data = cas)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-4

The mean number of computers per student is 0.136.

Q. 7 What is a good model of the combined effect of other variables on academic performance (i.e., math and read)?

# Examine correlations between variables
psych::pairs.panels(cas[v])

plot of chunk cas7

pairs.panels shows correlations in the upper triangle, scatterplots in the lower triangle, and variable names and distributions on the main diagonal.
After examining the plot several ideas emerge.

# (a) students is a count and could be log transformed
cas$studentsLog <- log(cas$students)
# (b) teachers is not the variable of interest:
#   it is the number of students per teacher
cas$studteach <- cas$students / cas$teachers
# (c) computers is not the variable of interest:
#   it is the ratio of computers to students
# table(cas$computer==0) # Note some schools have no computers so ratio would be problematic.
# Take percentage of a computer instead
cas$compstud <- cas$computer / cas$students
# (d) math and reading are correlated highly, reduce to one variable
cas$performance <- as.numeric(
        scale(scale(cas$read) + scale(cas$math)))

Normally, I'd add all these transformations to an initial data transformation file that I call in the first block, but for the sake of the narrative, I'll leave them here.

Let's examine correlations between predictors and outcome.

m1cor <- cor(cas$performance, cas[c("studentsLog", "studteach", "calworks",
    "lunch", "compstud", "income", "expenditure", "gradesN")])
t(round(m1cor, 2))
##              [,1]
## studentsLog -0.12
## studteach   -0.23
## calworks    -0.63
## lunch       -0.87
## compstud     0.27
## income       0.71
## expenditure  0.19
## gradesN     -0.16

Let's examine the multiple regression.

m1 <- lm(performance ~ studentsLog + studteach + calworks + lunch +
    compstud + income + expenditure + grades, data = cas)
summary(m1)
## 
## Call:
## lm(formula = performance ~ studentsLog + studteach + calworks + 
##     lunch + compstud + income + expenditure + grades, data = cas)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8107 -0.2963 -0.0118  0.2712  1.5662 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.99e-01   4.98e-01    1.80    0.072 .  
## studentsLog -3.83e-02   1.91e-02   -2.01    0.045 *  
## studteach   -1.11e-02   1.59e-02   -0.70    0.487    
## calworks     1.96e-03   2.96e-03    0.66    0.508    
## lunch       -2.65e-02   1.48e-03  -17.97  < 2e-16 ***
## compstud     7.88e-01   3.86e-01    2.04    0.042 *  
## income       2.82e-02   4.89e-03    5.77  1.6e-08 ***
## expenditure  5.87e-05   4.90e-05    1.20    0.232    
## gradesKK-08 -1.21e-01   6.49e-02   -1.87    0.062 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 0.457 on 411 degrees of freedom
## Multiple R-squared: 0.795,   Adjusted R-squared: 0.791 
## F-statistic:  199 on 8 and 411 DF,  p-value: <2e-16 

And some indicators of predictor relative importance.

# calc.relimp from relaimpo package.
(m1relaimpo <- calc.relimp(m1, type = "lmg", rela = TRUE))
## Response variable: performance 
## Total response variance: 1 
## Analysis based on 420 observations 
## 
## 8 Regressors: 
## studentsLog studteach calworks lunch compstud income expenditure grades 
## Proportion of variance explained by model: 79.48%
## Metrics are normalized to sum to 100% (rela=TRUE). 
## 
## Relative importance metrics: 
## 
##                  lmg
## studentsLog 0.009973
## studteach   0.016695
## calworks    0.177666
## lunch       0.492866
## compstud    0.025815
## income      0.251769
## expenditure 0.014785
## grades      0.010432
## 
## Average coefficients for different model sizes: 
## 
##                   1X        2Xs        3Xs        4Xs        5Xs
## studentsLog -0.08771 -0.0650133 -0.0558756 -0.0519312 -4.926e-02
## studteach   -0.11918 -0.0861199 -0.0629499 -0.0462155 -3.372e-02
## calworks    -0.05473 -0.0427576 -0.0324658 -0.0233760 -1.535e-02
## lunch       -0.03199 -0.0310310 -0.0301497 -0.0293300 -2.856e-02
## compstud     4.15870  3.0673338  2.2639604  1.6844348  1.287e+00
## income       0.09860  0.0850555  0.0726892  0.0614726  5.140e-02
## expenditure  0.00030  0.0001986  0.0001374  0.0001013  8.061e-05
## grades      -0.45677 -0.3345683 -0.2529014 -0.1981200 -1.628e-01
##                    6Xs        7Xs        8Xs
## studentsLog -4.626e-02 -4.252e-02 -3.833e-02
## studteach   -2.418e-02 -1.687e-02 -1.109e-02
## calworks    -8.399e-03 -2.612e-03  1.962e-03
## lunch       -2.785e-02 -2.718e-02 -2.654e-02
## compstud     1.034e+00  8.828e-01  7.884e-01
## income       4.250e-02  3.477e-02  2.821e-02
## expenditure  6.882e-05  6.206e-05  5.871e-05
## grades      -1.414e-01 -1.291e-01 -1.215e-01

Thus, we can conclude that:

  1. Income and indicators of income (e.g., low levels of lunch vouchers) are the two main predictors. Thus, schools with greater average income tend to have better student performance.
  2. Schools with more computers per student have better student performance.
  3. Schools with fewer students per teacher have better student performance.

For more information about relative importance and the relaimpo package measures check out Ulrike Grömping's website.
Of course this is all observational data with the usual caveats regarding causal interpretation.

Now, let's look at some weird stuff.

Q. 8.1 What are common words in Californian School names?

# create a vector of the words that occur in school names
lw <- unlist(strsplit(cas$school, split = " "))
# create a table of the frequency of school names
tlw <- table(lw)
# extract cells of table with count greater than 3
tlw2 <- tlw[tlw > 3]
# sorted in decreasing order
tlw2 <- sort(tlw2, decreasing = TRUE)
# values as proportions
tlw2p <- round(tlw2/nrow(cas), 3)
# show this in a bar graph
tlw2pdf <- data.frame(word = names(tlw2p), prop = as.numeric(tlw2p),
    stringsAsFactors = FALSE)
ggplot(tlw2pdf, aes(word, prop)) + geom_bar() + coord_flip()

plot of chunk unnamed-chunk-8

# make it log counts
ggplot(tlw2pdf, aes(word, log(prop * nrow(cas)))) + geom_bar() +
    coord_flip()

plot of chunk unnamed-chunk-9

The word “Elementary” appears in almost all school names (98.3%). The word “Union” appears in a little under half (43.3%).

Other common words pertain to:

  • Directions (e.g., South, West),
  • Features of the environment (e.g., Creek, Vista, View, Valley)
  • Spanish words (e.g., rio for river; san for saint)

Q. 8.2 Is the number of letters in the school's name related to academic performance?

cas$namelen <- nchar(cas$school)
table(cas$namelen)
## 
## 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 37 38 39 
##  1  4  9 26 28 31 33 27 30 45 38 28 36 30 18 10  5  4  6  3  1  2  2  2  1 
round(cor(cas$namelen, cas[, c("read", "math")]), 2)
##      read math
## [1,] 0.03    0

The answer appears to be “no”.

Q. 8.3 Is the number of words in the school name related to academic performance?

cas$nameWordCount <- sapply(strsplit(cas$school, " "), length)
table(cas$nameWordCount)
## 
##   2   3   4   5 
## 140 202  72   6 
round(cor(cas$nameWordCount, cas[, c("read", "math")]), 2)
##      read math
## [1,] 0.05 0.01

The answer appears to be “no”.

Q. 8.4 Are schools with nice popular nature words in their name doing better academically?

tlw2p  #recall the list of popular names
## lw
## Elementary      Union       City     Valley      Joint       View 
##      0.983      0.433      0.060      0.040      0.031      0.019 
##   Pleasant        San      Creek        Oak      Santa       Lake 
##      0.017      0.017      0.014      0.014      0.014      0.012 
##   Mountain       Park        Rio      Vista      Grove   Lakeside 
##      0.012      0.012      0.012      0.012      0.010      0.010 
##      South    Unified       West 
##      0.010      0.010      0.010 
# Create a quick and dirty list of popular nature names
naturenames <- c("Valley", "View", "Creek", "Lake", "Mountain", "Park",
    "Rio", "Vista", "Grove", "Lakeside")
# work out whether the word is in the school name
schsplit <- strsplit(cas$school, " ")
cas$hasNature <- sapply(schsplit, function(X) length(intersect(X,
    naturenames)) > 0)
round(cor(cas$hasNature, cas[, c("read", "math")]), 2)
##      read math
## [1,] 0.09 0.08

So we've found a small correlation.

Let's graph the data to see what it means:

ggplot(cas, aes(hasNature, read)) + geom_boxplot() +
    geom_jitter(position = position_jitter(width = 0.1)) +
    xlab("Has a nature name") + ylab("Mean student reading score")

plot of chunk unnamed-chunk-14

So in the sample, schools with nature names have slightly better reading scores (and, if we were to graph it, maths scores). However, the number of schools having nature names is actually somewhat small (n = 61) despite the overall quite large sample size.

But is it statistically significant?

t.read <- t.test(cas[cas$hasNature, "read"], cas[!cas$hasNature, "read"])
t.math <- t.test(cas[cas$hasNature, "math"], cas[!cas$hasNature, "math"])

So, the p-value is less than .05 for reading (p = 0.046) but not quite for maths (p = 0.083). Bingo! After a little bit of data fishing we have found that reading scores are “significantly” greater for those schools with the listed nature names.

But wait: I've asked three separate exploratory questions or perhaps six if we take maths into account.

  • $\frac{.05}{3} =$ 0.0167
  • $\frac{.05}{6} =$ 0.0083

At these Bonferroni-corrected p-value thresholds, the result is non-significant. Oh well…
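
For reference, these thresholds, and the equivalent Bonferroni-adjusted p-values, can be computed directly in R (a small sketch using the t-test objects created above):

```r
# Bonferroni-corrected significance thresholds for 3 and 6 comparisons
0.05 / 3  # 0.0167
0.05 / 6  # 0.0083

# or, equivalently, adjust the observed p-values for 6 comparisons
p.adjust(c(t.read$p.value, t.math$p.value), method = "bonferroni", n = 6)
```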

Review

Anyway, the aim of this post was not to make profound statements about California schools. Rather the aim was to show how easy it is to produce quick reproducible reports with R Markdown. If you haven't already, you may want to open up the R Markdown file used to produce this post in RStudio, and compile the report yourself.

In particular, I can see R Markdown being my tool of choice for:

  • Blog posts
  • Posts to StackExchange sites
  • Materials for training workshops
  • Short consulting reports, and
  • Exploratory analyses as part of a larger project.

The real question is how far I can push Markdown before I start to miss the control of LaTeX. Markdown does permit arbitrary HTML. Anyway, if you have any thoughts about the scope of R Markdown, feel free to add a comment.

How to Convert Sweave LaTeX to knitr R Markdown: Winter Olympic Medals Example


The following post shows how to manually convert a Sweave LaTeX document into a knitr R Markdown document. The post (1) reviews many of the required changes; (2) provides an example of a document converted to R Markdown format based on an analysis of Winter Olympic Medal data up to and including 2006; and (3) discusses the pros and cons of LaTeX and Markdown for performing analyses.

Overview

The following analyses of Winter Olympic Medals data have gone through several iterations:

  1. R Script: I originally performed similar analyses in February 2010. It was a simple set of commands where you could see the console output and view the plots.
  2. LaTeX Sweave: In February 2011 I adapted the example to make it a Sweave LaTeX document. The source for this is available on github. With Sweave, I was able to create a document that weaved together text, commands, console input, console output, and figures.
  3. R Markdown: Now in June 2012 I'm using the example to review the process of converting a document from Sweave-LaTeX to R Markdown. The source code is available here on github (see the *.rmd file).

Converting from Sweave to R Markdown

The following changes were required in order to convert my LaTeX Sweave document into an R Markdown document suitable for processing with knitr and RStudio. Many of these changes are fairly obvious if you understand LaTeX and Markdown; but a few are less obvious. And obviously there are many additional changes that might be required on other documents.

R code chunks

  • R code chunk delimiters: Update from << ... >>= and @ to the R Markdown format ```{r ...} and ``` (a worked example appears after this list).
  • Inline code chunks: Update from \Sexpr{...} to either `r ...` or `r I(...)` format.
  • results=tex: Any results=tex needs to either be removed or converted to results='asis'. Note that string values of knitr options need to be quoted.
  • Boolean options: Sweave tolerates lower case true and false for code chunk options, knitr requires TRUE and FALSE.
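
As a made-up illustration of these points, based on the medals analysis later in this post, a Sweave chunk such as:

<<medaltable, results=tex, echo=false>>=
print(xtable(propf$table))
@

becomes the following in R Markdown (note the quoted 'asis' and the upper-case FALSE):

```{r medaltable, results='asis', echo=FALSE}
print(xtable(propf$table), type = "html")
```

Similarly, an inline \Sexpr{nrow(medals)} becomes `r nrow(medals)`.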

Figures and Tables

  • Floats: Remove figure and table floats (e.g., \begin{table}...\end{table}, \begin{figure}...\end{figure}). In R Markdown and HTML, there are no pages and thus content is just placed immediately in the document.
  • Figure captions: Extract content from within the \caption{} command. When using R Markdown, it is often easiest to add captions to the plot itself (e.g., using the main argument in base graphics).
  • Table captions: extract content from within the \caption{} command. Table captions can then be supplied via the caption argument to the xtable function (e.g., print(xtable(MY_DATA_FRAME, caption="MY CAPTION"), "html", caption.placement="top")). Caption placement defaults to the "bottom" of the table but can optionally be specified as "top", either as a global option or in print.xtable. Alternatively, table titles can just be included as Markdown text.
  • References: Delete table and figure labels (e.g., \label{...}). Replace table and figure references (e.g., \ref{...}) with actual numbers or other descriptive terminology. It would also be possible to implement something simple in R that stores table and figure numbers: initialise table and figure counters at the start of the document; increment the table counter each time a table is created (and likewise for figures); store the value of the counter in a variable; include that variable in the caption text using paste() or something similar; and include the counter in the text using inline R code chunks (a minimal sketch appears after this list).
  • Table content: Markdown supports HTML; so one option is to convert LaTeX tables to HTML tables using a function like print(xtable(MY_DATA_FRAME), type="html"). This is combined with the results='asis' R code chunk option.
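
A minimal sketch of that counter idea (the function name and conventions here are made up; there are many ways to do this):

```r
# Initialise counters near the top of the document
table.count <- 0
figure.count <- 0

# Increment the table counter and build a numbered caption string
next.table.caption <- function(text) {
  table.count <<- table.count + 1
  paste0("Table ", table.count, ": ", text)
}

# Then, inside a chunk with results='asis':
# print(xtable(propf$table, caption = next.table.caption("Proportion female by year")),
#       type = "html", caption.placement = "top")
```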

Basic formatting

  • Headings: if we assume section is the top level: then \section{...} becomes # ..., \subsection{...} becomes ## ... and \subsubsection{...} becomes ### ...
  • Mathematics: Update latex mathematics to $latex ... and $$latex ... $$ notation if using RStudio.
  • Paragraph delimiters: If using RStudio then remove single line breaks that were not intended to be paragraph breaks.
  • Hyperlinks: Convert LaTeX hyperlinks from \href or \url to [text](url) format.

LaTeX things

  • Comments: Remove any LaTeX comments or switch from % comment to <!-- comment -->
  • LaTeX escaped characters: Remove unnecessary escape characters (e.g., \% is just %).
  • R Markdown escaped characters: Writing about the R Markdown language in R Markdown sometimes requires the use of HTML codes for special characters such as backticks (&#96;) and backslashes (&#92;) to prevent the text from being interpreted; see here for a list of HTML character codes.
  • Header: Remove the LaTeX header information up to and including \begin{document}; extract and incorporate any relevant content such as title, abstract, author, date, etc.

R Markdown Analysis of Winter Olympic Medal Data

The following shows the output of the actual analysis after running the rmd source through Knit HTML in RStudio. If you're curious, you may wish to view the rmd source code on GitHub side by side with this post from this point on.

Import Dataset

library(xtable)
options(stringsAsFactors = FALSE)
medals <- read.csv("data/medals.csv")
medals$Year <- as.numeric(medals$Year)
medals <- medals[!is.na(medals$Year), ]

The Olympic Medals data frame includes 2311 medals from 1924 to 2006. The data was sourced from The Guardian Data Blog.

Total Medals by Year

# http://www.math.mcmaster.ca/~bolker/emdbook/chap3A.pdf
x <- aggregate(medals$Year, list(Year = medals$Year), length)
names(x) <- c("year", "medals")
x$pos <- seq(x$year)
fit <- nls(medals ~ a * pos^b + c, x, start = list(a = 10, b = 1, c = 50))

In general over the years the number of Winter Olympic medals awarded has increased. In order to model this relationship, year was converted to ordinal position. A three parameter power function seemed plausible, \( y = ax^b + c \), where \( y \) is total medals awarded and \( x \) is the ordinal position of the olympics starting at one. The best fitting parameters by least-squares were

\[ y = 0.202 x^{2.297} + 50.987. \]

The figure displays the data and the line of best fit for the model. The model predicts that 2010, 2014, and 2018 would have 271, 295, and 322 medals respectively.
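
The predictions quoted above for the next three games can be obtained from the fitted nls model along these lines (a sketch; the original source may have done this slightly differently):

```r
# predict medal counts for the three ordinal positions after 2006
round(predict(fit, newdata = data.frame(pos = max(x$pos) + 1:3)))
```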

plot(medals ~ pos, x, las = 1,
     ylab = "Total Medals Awarded",
     xlab = "Ordinal Position of Olympics",
     main = "Total medals awarded by ordinal position of Olympics with
     predicted three parameter power function fit displayed.",
     bty = "l")
lines(x$pos, predict(fit))

plot of chunk figure_of_medals

Gender Ratio by Year

medalsByYearByGender <- aggregate(medals$Year, list(Year = medals$Year,
    Event.gender = medals$Event.gender), length)
medalsByYearByGender <- medalsByYearByGender[medalsByYearByGender$Event.gender != "X", ]
propf <- list()
propf$prop <- medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"] /
    (medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"] +
     medalsByYearByGender[medalsByYearByGender$Event.gender == "M", "x"])
propf$year <- medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "Year"]
propf$propF <- format(round(propf$prop, 2))
propf$table <- with(propf, cbind(year, propF))
colnames(propf$table) <- c("Year", "Prop. Female")

The figure shows the number of medals won by males and females by year. The table shows the proportion of medals awarded to females by year. It shows a generally similar pattern for males and females. Medals increase gradually until around the late 1980s, after which the rate of increase accelerates. However, females started from a much smaller base. Thus, both the absolute difference and the percentage difference have decreased over time, to the point where, in 2006, 46% of medals were won by females.

plot(x ~ Year, medalsByYearByGender[medalsByYearByGender$Event.gender == "M", ],
     ylim = c(0, max(x)), pch = "m", col = "blue", las = 1,
     ylab = "Total Medals Awarded", bty = "l",
     main = "Total Medals Won by Gender and Year")
points(medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "Year"],
       medalsByYearByGender[medalsByYearByGender$Event.gender == "W", "x"],
       col = "red", pch = "f")

plot of chunk fgenderRatioByYear_figure

print(xtable(propf$table,
             caption = "Proportion of Medals that were awarded to Females by Year"),
      type = "html",
      caption.placement = "top",
      html.table.attributes = 'align="center"')
Proportion of Medals that were awarded to Females by Year
Year Prop. Female
1 1924 0.07
2 1928 0.08
3 1932 0.08
4 1936 0.12
5 1948 0.18
6 1952 0.23
7 1956 0.26
8 1960 0.38
9 1964 0.37
10 1968 0.37
11 1972 0.36
12 1976 0.35
13 1980 0.34
14 1984 0.36
15 1988 0.37
16 1992 0.43
17 1994 0.43
18 1998 0.44
19 2002 0.45
20 2006 0.46

Countries with the Most Medals

cmm <- list()
cmm$medals <- sort(table(medals$NOC), dec = TRUE)
cmm$country <- names(cmm$medals)
cmm$prop <- cmm$medals/sum(cmm$medals)
cmm$propF <- paste(round(cmm$prop * 100, 2), "%", sep = "")
cmm$row1 <- c("Rank", "Country", "Total", "%")
cmm$rank <- seq(cmm$medals)
cmm$include <- 1:10
cmm$table <- with(cmm, rbind(cbind(rank[include], country[include],
    medals[include], propF[include])))
colnames(cmm$table) <- cmm$row1

Norway has won the most medals with 280 (12.12%). The table shows the top 10. Russia, USSR, and EUN (Unified Team in 1992 Olympics) have a combined total of 293. Germany, GDR, and FRG have a combined medal total of 309.

print(xtable(cmm$table, caption = "Rankings of Medals Won by Country"),
      "html", include.rownames = FALSE, caption.placement = 'top',
      html.table.attributes = 'align="center"')
Rankings of Medals Won by Country
Rank Country Total %
1 NOR 280 12.12%
2 USA 216 9.35%
3 URS 194 8.39%
4 AUT 185 8.01%
5 GER 158 6.84%
6 FIN 151 6.53%
7 CAN 119 5.15%
8 SUI 118 5.11%
9 SWE 118 5.11%
10 GDR 110 4.76%

Proportion of Gold Medals by Country

Looking only at countries that have won more than 50 medals in the dataset, the figure shows the proportion of medals won that were gold, silver, or bronze.

NOC50Plus <- names(table(medals$NOC)[table(medals$NOC) > 50])
medalsSubset <- medals[medals$NOC %in% NOC50Plus, ]
medalsByMedalByNOC <- prop.table(table(medalsSubset$NOC, medalsSubset$Medal),
                                 margin = 1)
medalsByMedalByNOC <- medalsByMedalByNOC[order(medalsByMedalByNOC[, "Gold"],
         decreasing = TRUE), c("Gold", "Silver", "Bronze")]
barplot(round(t(medalsByMedalByNOC), 2), horiz = TRUE, las = 1,
        col = c("gold", "grey71", "chocolate4"),
        xlab = "Proportion of Medals",
        main = "Proportion of medals won that were gold, silver or bronze.")

plot of chunk proportion_gold

How many different countries have won medals by year?

listOfYears <- unique(medals$Year)
names(listOfYears) <- unique(medals$Year)
totalNocByYear <- sapply(listOfYears, function(X) length(table(medals[medals$Year ==
    X, "NOC"])))

The figure shows the total number of countries winning medals by year.

plot(x = names(totalNocByYear), totalNocByYear, ylim = c(0, max(totalNocByYear)),
     las = 1, xlab = "Year", main = "Total Number of Countries Winning Medals By Year",
     ylab = "Total Number of Countries", bty = "l")

plot of chunk figure_total_medals

Australia at the Winter Olympics

ausmedals <- list()
ausmedals$data <- medals[medals$NOC == "AUS", ]
ausmedals$data <- ausmedals$data[, c("Year", "City", "Discipline",
    "Event", "Medal")]
ausmedals$table <- ausmedals$data

Given that I am an Australian I decided to have a look at the Australian medal count. Australia does not get a lot of snow. Up to and including 2006, Australia has won 6 medals. It won its first medal in 1994. Of the 6 medals, 3 were bronze, 0 were silver, and 3 were gold. The table lists each of these medals.

print(xtable(ausmedals$table,
             caption = 'List of Australian Medals',
             digits = 0),
      type = 'html',
      caption.placement = 'top',
      include.rownames = FALSE,
      html.table.attributes = 'align="center"')
List of Australian Medals
Year City Discipline Event Medal
1994 Lillehammer Short Track S. 5000m relay Bronze
1998 Nagano Alpine Skiing slalom Bronze
2002 Salt Lake City Short Track S. 1000m Gold
2002 Salt Lake City Freestyle Ski. aerials Gold
2006 Turin Freestyle Ski. aerials Bronze
2006 Turin Freestyle Ski. moguls Gold

Ice Hockey

icehockey <- medals[medals$Sport == "Ice Hockey" & medals$Event.gender ==
    "M" & medals$Medal == "Gold", ]
icehockeyf <- medals[medals$Sport == "Ice Hockey" & medals$Event.gender ==
    "W" & medals$Medal == "Gold", ]
# names(table(icehockey$NOC)[table(icehockey$NOC) > 1])

The following are some statistics about Winter Olympics Ice Hockey up to and including the 2006 Winter Olympics.

  • Out of the 20 Winter Olympics that have been staged, Men's Ice Hockey has been held in 20 and the Women's in 3.
  • The USSR has won the most men's gold medals with 7 golds. It goes up to 8 if the 1992 Unified Team is included.
  • Canada has the second most golds with 6.
  • After that the only two nations to win more than one gold are Sweden (2 golds) and the United States (2 golds).
  • The table shows the countries who won gold and silver medals by year.
  • In the case of Women's Ice Hockey, Canada has won 2 golds and the United States has won 1.
icehockeygs <- medals[medals$Sport == "Ice Hockey" &
    medals$Event.gender == "M" &
    medals$Medal %in% c("Silver", "Gold"),
    c("Year", "Medal", "NOC")]
icetab <- list()
icetab$data <- reshape(icehockeygs, idvar = "Year", timevar = "Medal",
    direction = "wide")
names(icetab$data) <- c("Year", "Gold", "Silver")
print(xtable(icetab$data,
             caption = "Country Winning Gold and Silver Medals by Year in Mens Ice Hockey",
             digits = 0),
      type = "html",
      include.rownames = FALSE,
      caption.placement = "top",
      html.table.attributes = 'align="center"')
Country Winning Gold and Silver Medals by Year in Mens Ice Hockey
Year Gold Silver
1924 CAN USA
1928 CAN SWE
1932 CAN USA
1936 GBR CAN
1948 CAN TCH
1952 CAN USA
1956 URS USA
1960 USA CAN
1964 URS SWE
1968 URS TCH
1972 URS USA
1976 URS TCH
1980 USA URS
1984 URS TCH
1988 URS FIN
1992 EUN CAN
1994 SWE CAN
1998 CZE RUS
2002 CAN USA
2006 SWE FIN

Reflections on the Conversion Process

  • Markdown versus LaTeX:
    • I prefer performing analyses with Markdown to performing them with LaTeX.
    • Markdown is easier to type than LaTeX.
    • Markdown is easier to read than LaTeX.
    • It is easier with Markdown to get started with analyses.
    • Many analyses are only presented on the screen, and as such page breaks in LaTeX are a nuisance. This extends to many other features of LaTeX such as headers, figure and table placement, margins, and table formatting, particularly for long or wide tables, and so on.
    • That said, journal articles, books, and other artefacts that are bound to the model of a printed page are not going anywhere.
    • Furthermore, bibliographies, cross-references, elaborate control of table appearance, and more are all features which LaTeX makes easier than Markdown.
  • R Markdown to Sweave LaTeX:
    • The more common conversion task that I can imagine is taking some simple analyses in R Markdown and having to convert them into knitr LaTeX in order to include the content in a journal article.
    • The first time I converted between the formats, it was good to do it in a relatively manual way to get a sense of all the required changes; however, if I had a large document or was doing the task on subsequent occasions, I would look at more automated solutions using string replacement tools (e.g., sed, or even just replacement commands in a text editor such as Vim) and markup conversion tools (e.g., pandoc; see the example command after this list).
    • Perhaps if the formats get popular enough, developers will start to build dedicated conversion tools.
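
For instance, pandoc can handle much of the LaTeX-to-Markdown markup conversion in a single pass (the file names here are illustrative, and the Sweave code chunks themselves would still need the changes described earlier):

```
pandoc --from latex --to markdown medals.tex -o medals.md
```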

Additional Resources

If you liked this post, you may want to subscribe to the RSS feed of my blog. Also see:

  • This post on Getting Started with R Markdown, knitr, and Rstudio 0.96
  • This post for another Example Reproducible Report using R Markdown which analyses California Schools Test Data
  • These Assorted posts using Sweave
  • The knitr home page and knitr options page.
  • the xtable LaTeX table gallery which can also be used to generate HTML tables for inclusion in Markdown.

Saturday, 29 September 2012

Kustom Note - templates/fields for Evernote Notes



Evernote, one of my favorite and most used apps, is a great note taking, web clipping, and organizing app. The notes are free-form, though. You can make your own templates and copy/paste a template into a new note, but there is now a better way. Evernote Trunk is a collection of apps that work with Evernote. Each year, Evernote holds a conference and contest for developers, and this year Kustom Note won a Silver DevCup award. It was well deserved.

Kustom Note is an app that works with and integrates into Evernote and allows you to create custom note templates with fields, similar to databases. You can also use pre-made templates. These templates can help make your note taking more organized and efficient.

For example, you could create a template based on how you like to take meeting notes, class notes, or a template for teacher observations, lesson plans and much more. Students could set up class note templates based on the class and the format they like.



Here is a list of many of the features Kustom Note has:

  • User Friendly. No seven-level menus to go through. Once logged in you are 1 click away from taking a note.
  • Set a default notebook for each form template you have.
  • Configurable tags. You can set your tags once for each note form template, new notes will be automatically tagged for you.
  • Preset title prefix and suffix. Set that once and it will be applied to all new note titles created with that template.
  • Include fields in note title. Set fields to be included in the note title which means you get meaningful titles without doing anything.
  • SmartField - Movies. A non-conventional field that shows up as an autocomplete field that searches for movies. Select a movie and it will magically render a beautifully designed snippet with the poster, ratings, stars, and links to IMDB, Rotten Tomatoes, and to book tickets on Fandango.
  • SmartField - Music. Selecting one of your template fields to be SmartField - Music works the same way as for Movies: it displays an autocomplete field that matches against artists, tracks, and albums and renders an amazing snippet about that song or artist, with album art and links to iTunes.
  • Set your fields to be required, such fields won’t be allowed to be left blank in creating notes.
  • Flexible and constrained. Pick a type for every field and, once set, rules like email format, URL, or numeric will be fully enforced.
  • Date and Datetime easy entry. Note creation forms will automatically have date and time selectors for easier and valid values entry.
  • Preset values to select from. For field types like Multiple checkboxes, Single Select, Multiple Select or Radio Group, you create a set of possible values that will be presented on creating notes.
  • Set multiple attachments of files and images with structure and rules for a note template.
  • Take notes in style. No more bland and boring notes. With KustomNote you initially get two themes to select from with color variations (all themes are fully ENML compliant).
  • Icon Stamps. Mark your templates with icons to be easily recognizable. Notes created are automatically stamped, stored and shared with the icon selected.
  • Set email reminders. Once set for a day and time, KustomNote will send you an email reminder for that appointment or movie note.
  • Share your templates. Get some good karma and share your amazing templates. You can share your note template forms to Twitter, Facebook and LinkedIn.
  • Browse and use public templates. If you could use some help setting up templates for effective note-taking, browse the public templates created by other awesome KustomNoters, clone them and make any changes you like.
  • Everything in one place. With KustomNote you get the whole nine yards: create, update and list your notebooks and notes within the application.
  • KustomNote is mobile and everywhere. With an effective responsive design you can access KustomNote from your desktop computer, iPad or smartphone.
  • Customized notes for applications like Peek are a breeze: just a template with two fields, Question/Hint and Answer, with the Question field set to be used in the note title.
  • Evernote is huge in Japan (also). KustomNote fully supports the Japanese language for the whole interface and data entry.
See all the features for yourself at http://kustomnote.com and start creating note forms for your calorie log, workout log, travel itineraries, homework, medical appointments, your children's art collection and vaccination shots, your pets' logs, workouts and medical records... well, you get it, for everything in your life!
A new feature is the ability to integrate your Google Contacts and Google Calendar with Kustom Note.

Upcoming updates include offline access and mobile apps for iOS and Android. In the meantime, the site does have a mobile version for use on mobile devices.

Find Kustom Note in the Evernote Trunk.



Related:

Evernote for Education - resources, tips, help, ideas and more

Evernote Trunk - applications, hardware, and more that works with Evernote

10 Great, Free Apps for Students for Notetaking and Class Planning



Google Docs can import old Office formats, but exports them in the new formats

Google Docs will no longer export files into the old Microsoft Office formats (.doc, .xls, .ppt). Instead, it will download the files as .docx, .xlsx, and .pptx. You can also still download files in .odt, .rtf, .pdf, and .txt formats. This will also affect Google Apps accounts (and Google Drive, since that's where everyone's Docs are going).



You can still import these older Office file formats, but if you then export them, they will export in the new formats. This may cause an issue for some users, including schools still stuck on Windows XP and Office 2003 or earlier.


There is a workaround, though: install the free compatibility plugin from Microsoft so you can open the modern Office file types. I've already done this because I have Windows 7 at home and receive a lot of files from people using the newer versions of Office. Without this plugin, I wouldn't even be able to open these files. This is something everyone should have done by now, especially with more and more people using the newer versions of Office.

Other options are to use Google Docs instead of Office, convert Office files to Docs files when uploading (which is what I usually do), or keep the Office files in a service like Dropbox or SugarSync (which is what I do for files where I don't want to risk losing formatting in conversion).

All of the options are free and only take a few minutes to implement.

I see a lot of people complaining right now, but it really isn't as big a deal as many news outlets are making it out to be.






Edmodo and Common Sense Media release free Digital Citizenship resource for Educators

Digital Citizenship is an important topic and something that we must teach our students (and ourselves) in today's digital world.
Edmodo, a great educational app, and Common Sense Media have created a Digital Citizenship Starter Kit that teachers can download to use with their students. It is a free PDF file.

The Digital Citizenship Starter Kit includes a poster, lessons, and activities that cover topics such as privacy, internet safety and security, plagiarism, and cyberbullying. You can also join the Digital Citizenship Community on Edmodo.




In addition to the Starter Kit, Common Sense Media and Edmodo are participating in Digital Citizenship Day in New York on October 2nd. Check out the Edmodo group (for teachers and students) for more details on the day and the Town Hall for teens, hosted by MTV executive news producer and hip-hop artist Sway Calloway.


Related:

Edmodo - awesome free social learning network - has a free Digital Citizenship poster for download (not the same as the one released today)

10 Tech Skills Every Student Should Have
10 Important Skills Students need for the Future

Google Launches YouTube curriculum on Digital Citizenship

10 Technology Skills Every Educator Should Have



Google Play Store vs Apple App Store - my experience



I am an Android fan and user. However, I have a loaner iPad for testing and reviewing apps. I also have a webOS tablet. I've used the app stores for all of them, and Apple's is the worst in terms of customer experience.

1. Google's Play Store and Palm/HP's app catalog were easier to search and easier to install from (no need for iTunes!). I could even browse the sites and select an app to be downloaded to any of my devices remotely.

2. iTunes. It's sad that I had to download separate software just to search for and install an app on a different device. Searching isn't great either.

3. Redeeming a gift card/code. In both Google and webOS, you just click "buy" for an app and it asks you for a code. With Apple, you have to redeem the code first, then wait for the balance to show up in your account, and then go buy the app.

4. Reversing a mistaken install. Apple - once you buy an app, it's a nightmare to get a refund if you bought it by mistake. Google - one click and it's done. This gives you a chance to recover if you make a mistake or realize it's not the app you were looking for.

5. The app information is also better, in my opinion, on Google Play: more info about an app's permissions, features, updates, screenshots, and more. Plus, I like the related-apps search and results on Google Play better than on iTunes.

Maybe it's just me, but you would think a company like Apple that prides itself on customer experience would provide a better app store experience.




Google Field Trip - app that finds info about where you are


Field Trip
Google has a cool new app for Android called "Field Trip". It helps you discover things around you. It runs in the background, and when you get close to something interesting, it pops up with details about that location. You can use it to learn more about historical locations, buildings, stores, restaurants, and more. It takes information from a variety of sources, along with your location data, to produce these results.



This would be a great app to use with students, as they could explore their own school and neighborhood for interesting locations. It would also be great on real field trips to find out more information about where they are. You never know what they, and you, might find that's interesting or cool.

It's free on Google Play. Go get it and explore your world.



Related:

Android Resources for Education 







Friday, 28 September 2012

Converting Sweave LaTeX to knitr LaTeX: A case study


The following post documents the steps I needed to take in order to convert a project using Sweave LaTeX into one using knitr LaTeX.

Additional Resources

It is fairly straightforward to convert a document from Sweave LaTeX to knitr LaTeX. Yihui Xie on the knitr website provides the following useful resources:

  • Transition from Sweave to knitr: This document describes knitr specifically from the perspective of what is the same as Sweave and what is different from Sweave.
  • knitr options: This includes discussion of the many R code chunk options in knitr. Many are the same as in Sweave, but there are some new ones, and some modifications.
  • knitr minimal examples: These are useful for getting started with different types of knitr documents, including LaTeX.

My conversion from Sweave to knitr

The following documents the steps I needed to take in order to convert a journal article that was in Sweave LaTeX into a knitr LaTeX document. Most of this was documented in the above-mentioned links on the knitr website, but there were still a few little surprises.

  • Rnw to tex conversion: Convert R CMD Sweave myfile.rnw to Rscript -e "library(knitr); knit('myfile.rnw')" in the makefile (see this SO question).
  • global options: Replace \SweaveOpts{echo=FALSE} with \Sexpr{opts_chunk$set(echo=FALSE)}; this needed to appear before the first R code chunk in order to affect all code chunks in the file.
  • case on R code chunk options: Update true and false to TRUE and FALSE in R code chunk options.
  • results option: Update results=tex to results='asis' and, in general, ensure that text values in R code chunk options are surrounded by quotation marks.
  • message option: I needed to prevent the display of messages when certain packages were loaded, using \Sexpr{opts_chunk$set(message=FALSE)}. These messages did not previously display under Sweave.
  • hiding output: I had some R code chunks with the options print=FALSE, term=FALSE; I replaced these with results='hide'.
  • methods package: I had a densityplot() (i.e., a lattice plot) that didn't display properly. It instead showed an error: Error using packet 1: could not find function "hasArg"; apparently this is caused by the fact that the methods package doesn't load by default when using Rscript; thus I needed to put require(methods) in the first R code chunk.
  • Sweave.sty: I removed Sweave.sty from my project directory and removed the line \usepackage{Sweave} from my rnw file, as neither is needed in knitr.
  • caching: Although there are packages for enabling caching, I'd never adopted any of them. knitr makes caching very simple. I just added cache=TRUE to the global chunk options (i.e., \Sexpr{opts_chunk$set(echo=FALSE, message=FALSE, cache=TRUE)}); a minimal skeleton pulling these options together appears after this list. This reduced the time to build the PDF from around 5 seconds to 1 second. I'm also planning to incorporate some Bayesian analyses with JAGS and rjags, where I'm expecting analyses will take several minutes or longer to run. At that point, I'll really appreciate the speed benefits of caching.
  • to make or not to make: I had a custom makefile on the project that kept everything neat and tidy, copying source files into a build directory, running all necessary commands to convert from rnw to tex and then to pdf, and then opening the pdf in a viewer. This still works well. However, the default "Compile to PDF" option in RStudio was also quite good (after setting Tools > Options > Sweave > Weave Rnw files using knitr). In particular, I liked the synctex support for Sweave that allows you to move from a position in the source to the corresponding position in the PDF viewer. Also, RStudio in combination with knitr seems to do a reasonable job of keeping the main project directory tidy. A few auxiliary files are added, but not too many. I also appreciate the simplicity that a simple button brings to getting started with analyses. However, a makefile does make things more portable.
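
To tie several of these steps together, here is a minimal, hypothetical knitr .Rnw skeleton. It is a sketch for illustration only, not the actual article source: the chunk names and toy data are made up. It sets the global options via \Sexpr{} before the first chunk, loads the methods package so lattice graphics work under Rscript, and uses results='asis' in place of Sweave's results=tex.

    \documentclass{article}
    \begin{document}

    % Global chunk options, set before the first chunk so they apply to all chunks
    \Sexpr{opts_chunk$set(echo=FALSE, message=FALSE, cache=TRUE)}

    <<setup, cache=FALSE>>=
    # Rscript does not load the methods package by default, and lattice needs it.
    # cache=FALSE here so that package loading is never skipped when the other
    # chunks are restored from the cache.
    require(methods)
    library(lattice)
    @

    <<density-plot, fig.width=5, fig.height=4>>=
    # lattice plot objects returned at the top level are printed automatically by knitr
    densityplot(~ rnorm(100))
    @

    <<summary, results='asis'>>=
    # results='asis' is the knitr equivalent of Sweave's results=tex
    cat("The mean of the simulated sample was", round(mean(rnorm(100)), 2), "\n")
    @

    \end{document}

A skeleton like this would be knitted with the Rscript command shown in the first bullet above, or via RStudio's "Compile to PDF" button.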

My main conclusion from this process is that converting an ongoing Sweave LaTeX document to knitr LaTeX is fairly straightforward, and there are a number of useful benefits that arise. In particular, I really appreciate simple caching and not having to worry about Sweave.sty. Great work, Yihui Xie!

Additional Resources

  • RSS Subscription options
  • Convert Sweave LaTeX to knitr R Markdown
  • Getting started with R Markdown
  • Getting started with R
  • R Videos
  • Sweave and makefiles