Sunday, 28 August 2011

Getting started in R

OpenMx is written in a programming language called R, which is free to download.  R is a powerful language for statistical computing, but much of the documentation is written for experts, and so it can be daunting for beginners.  If you go to this website you will see instructions for how to download R.  On the left panel, select CRAN from the Download section. Do not be put off by incomprehensible terms such as "CRAN mirror": just select a download site from the list provided that is geographically close to where you are.
You may then be offered further options that you may not fully understand. Just persevere by selecting the 'Windows' option from the 'Download and install R' section, and then select 'base', which at last takes you to a page with straightforward download instructions, and a link: 'Download R 2.13.1 for Windows', which you should click (the version number of R may be different).This will download an .exe file, and if you double click that, it will start an installer for R, offering the usual options about where to install, etc. I recommend accepting defaults.
Installation of R will create a Start Menu item and an icon for R on your desktop. Double clicking on the R icon starts the program. Whew!

Starting R opens a window called R Console, in which you can type commands.
You will see a > cursor.
This cursor will not be shown in the examples below, but it indicates that the console is awaiting input from you.
At the > cursor, type:
This starts a web-based interface to on-line help. You may want to briefly explore this before going further.  As with other programming languages, you hit Enter at the end of each command.
Just to familiarise yourself with the console, type:
R evaluates the expression and you see output:
   [1]  3
The [1] at the beginning of the output line indicates that the answer is the first element of a vector. If you find this puzzling, don't worry about it for now.
Now type:
   x = 1+2
Nothing happens.  But the variable x has been assigned,  and if you now type x on the console, you will again see the output
   [1] 3
In R, the results of variable assignments are not shown automatically, but you can see them at any time by just typing the name of the variable.
The assigned variable x will remain assigned unless you explicitly remove it using the 'rm' command. Type:
And then type x  again
You now see
  Error: object 'x' not found
You can repeat an earlier command by pressing the up arrow until it reappears.  Use this method to redo the assignment x=1+2, and then type X. Again you get the error message, because R is case-sensitive, and so X and x are different variables. 

Now type:
  y = c (1, 3, 6, 7) 
and then inspect the variable y (by typing y at the console).
You will see that it is a vector of numbers [1 3 6 7].  The 'c' in the previous command is not a variable name, but rather denotes the operation of concatenation. It just instructs R to create a variable consisting of the sequence of material that follows in brackets.
Now type:
 x =
 and hit Enter.
The cursor changs to +
This is R telling you that the command is incomplete. If you now type 1+2 followed by Enter, your regular cursor returns, because the command is completed.
It can happen that you start typing a command and think better of it. To escape from an incomplete command, and restore the > cursor, just hit Escape.
Also, if a program is running and you want to stop it, you can escape with Ctrl+c.

You have been working so far in the Console window. This is used for online computations and to hold results of program output. 
Now select New Script from the File option on the menu bar. A new window opens titled R editor. This is used to hold a script, i.e. a list of R commands that don't execute until you instruct R to run them.
If you type in the Editor
and then select Run All from the Edit option on the Menu Bar, the script is executed, and you see the results of the computations in the Console window.

You can write or edit scripts directly in the Editor, but you may find it easier to use a word processor to do this and then paste the script into the Editor, as this allows for easier editing.
A note on quotes:  If you paste a script into your R console or browser,  quotes may get reformatted, causing an error.  Always check: for R, single quotes should be straight quotes, not 'smart quotes' (i.e. quotes that slant or curl in a different direction at the start and end of a quoted section). You may need to retype them if your system has reformatted them.

Important: Traditionally, R scripts use <- instead of =. The arrow is made by typing a pointed bracket and then a dash.
So, you will see instances of scripts which have commands such as
a <- 1+3
This is equivalent to
 a = 1+3
It is also possible to have the arrow going the other way , i.e., 1+3 -> a, which means the same thing.
My preference is to use '=', as it involves just one keystroke, but you will find both methods for denoting an assignment in this manual, so you become familiar with interchanging them.

Loops: A loop is a way of repeatedly executing the same code.  Suppose we wanted to print out the ten times table, we could type 1*10; 2*10, 3*10, and so on. But a simpler method is to use a loop, where we multiply 10 times a variable, myx, and specify the range of values that myx will take at the start of the loop.   
Try typing this list of commands in the editor, and then Run all:
                for (myx in c(1:10))
The first line specifies the values that myx can take, i.e. c(1:10), which is the values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. The program executes all the commands between curly brackets repeatedly, incrementing the value of myx each time it does so, until it gets to the final value, whereupon it exits the loop.

Commenting: A good script will contain many lines preceded by #.
This indicates that the line is a comment – it does not contain commands to be executed, but provides explanation of how the script works.  The demonstration scripts in this manual are heavily commented so that you can understand how they work.
Before you start running the scripts in this manual, create a project directory that will contain all of your scripts, data, and workspace for a project.  Then go to the menu and select Change dir from the File menu, and navigate to your new directory.  This means that all your work will be saved in one place.  Whenever you start up R from a file in that directory, it will continue as your working directory.

As you work through this manual you will become more familiar with R, but it is likely that you will also want to extend your knowledge beyond the very basic details given here.
There is a good introductory manual that can be downloaded from the web-page that you saw when you typed help.start(), and a short introduction on:

One last point I will put at the end of this post, so you can find it again easily, because it is a crazy but important feature of R. Other programming languages have a command you can use to clear all variables easily - such as clear(all). It is important to do that sometimes, because otherwise your variables hang around and can cause confusion, especially if you run scripts that use the same variable names. In R, you can clear all variables, but not in an intuitive way. You have to type:
I can never remember this, and have to look it up every time. It goes to show that the guys who developed R may be very smart, but they certainly aren't psychologists.

Finally, the following books are recommended:
Braun, W. J., & Murdoch, D. J. (2007). A first course in statistical programming with R. Cambridge: Cambridge University Press.
Crawley, M. J. (2007). The R Book. Chichester, UK: Wiley.
Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S, 4th edition. New York: Springer. (Do not be put off by the title: it really should be entitled 'with S and R')

PS 3rd September 2011
I've just download RStudio. It is free and it looks very promising as an interface to make R much easier to work with.

No comments:

Post a Comment