We have likely already gone through an overview of what RStudio is and how it works with R, but if you need a refresher you can read through this page.
RStudio is what is called an Integrated Development Environment (IDE) - a software suite that lets programmers interact with the language they are writing in. There are several IDEs for the R language, but RStudio is the most popular. It comes with many extra features, preinstalled or easily installable, which you may find useful later on. These include the ability to make Shiny web apps, interactive documents, slide shows and websites, and even to publish books.
This tutorial will walk you through installing R and RStudio and then give you an overview of the features of RStudio.
The first thing we need to do is install R. The easiest way to do this is to head over to the CRAN Cloud Mirror and click the “Download R for ...” link that matches your operating system. Save the installer to your computer and then run it. It shouldn’t take very long to complete.
RStudio is the interface which we will use to interact with R. It needs R to be installed first, so make sure that step is done. Once R is installed you can download and install RStudio by selecting the option relevant to you from the list available here.
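Once both installs have finished, a quick way to confirm that R itself is working is to type a couple of commands into the console (in RStudio, this is the panel with the `>` prompt). The lines below are a minimal sketch using base R only; the variable name `r_version` is just for illustration.

```r
# Store and print the installed R version as a single string
r_version <- R.version.string
print(r_version)

# Show which operating system R thinks it is running on
print(Sys.info()[["sysname"]])
```

If a version string is printed, R is installed and RStudio has found it.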
Now that R and RStudio have been installed on your computer we can open up RStudio - do this now. The RStudio interface is about as close as you can get to a native GUI for R. The basic RStudio window is initially broken up into three panels: the Console on the left, the Environment panel in the top right, and the file viewer panel (with the “Files” tab) in the bottom right.
RStudio is a great program. One of the reasons for this is that it works using projects. Think of projects as exactly that - a task or assignment that you are working on. Projects can be very small or very big. For instance, each practical that you work on is a project, even though each one is relatively small. A thesis, which may consist of many datasets and analyses, is also one project. This means that for every project that you do in real life you should also have an RStudio project to match it. RStudio projects are stored in directories (the fancy name for folders) and all the files related to that project, such as datasets and script files, are kept neatly in that folder. As you work on your project, RStudio can save your variables and objects in the .RData file. The .Rproj file is there to link all aspects of your project together within RStudio. Working with projects is a great way to keep your work organised. It makes it easy to come back to a project after a long time because nothing will get lost. It also makes it easy to share your project with another person - all you need to do is copy the folder which contains your project and pass it on, and they can then run the files (provided they have R, RStudio and all the packages that you have used).
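To see what "everything lives in the project folder" means in practice, you can ask R where it is currently looking for files. The sketch below uses base R only. Inside an open RStudio project the working directory is normally the project folder. The file name `my_data.csv` and the `data` subfolder are made up for illustration.

```r
# Where is R currently looking for files?
project_dir <- getwd()
print(project_dir)

# List the files R can see in that folder
print(list.files(project_dir))

# Build a path relative to the project folder instead of typing a full path.
# "data" and "my_data.csv" are hypothetical names used only as an example.
data_path <- file.path("data", "my_data.csv")
print(data_path)
```

Because the path is relative to the project folder, it keeps working when the whole folder is copied to another computer.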
It is a good idea to keep all of your projects in the same place. I tend to work with OneDrive and so I have created a folder called “R” in my OneDrive. This means that I will never lose my projects. Alternatively, you could create an “R” folder in your My Documents folder, create another folder called “Projects” inside it, and store your projects there. Set up this file structure now.
Suggested file structure
- My Documents > R > Projects > Unique files for your projects
- OneDrive > R > Projects > Unique files for your projects
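You can create these folders by hand in your file browser, but the same structure can also be set up from the R console. This is a minimal sketch using base R. It uses a temporary folder as a stand-in for `base_dir` so that it is safe to run anywhere; to use it for real, point `base_dir` at your own My Documents or OneDrive folder.

```r
# Stand-in location for this demo. Replace with the folder you actually
# want to use, e.g. your My Documents or OneDrive folder.
base_dir <- tempdir()

# Build the path <base_dir>/R/Projects
projects_dir <- file.path(base_dir, "R", "Projects")

# Create the folders; recursive = TRUE also creates the "R" folder if needed
dir.create(projects_dir, recursive = TRUE, showWarnings = FALSE)

# Confirm that the folder now exists
print(dir.exists(projects_dir))
```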
Back to the windows. There is a fourth window that will appear when you call up datasets or when you work with scripts. Let’s go through how to import data and what scripts are.
Working with datasets in R is quite different from working with them in Excel, but it will make your data management much more efficient and neater. R is a very logical language, so your datasets need to be arranged logically and simply. Complex layouts (e.g. merged cells, misaligned columns, long and flowery column and row names) will cause R to spit out errors at you. Your data should begin in cell A1, with each new variable added in the next column to the right. Be careful to label your data consistently, as the same thing named in different ways will cause headaches later. The following link will give you access to two example Excel files - one is badly formatted and the other is well formatted. Download these to your computer and save them in the project folder you are currently working in (where your .Rproj is located). Your file viewer panel (the bottom right panel) should look something like Figure 2 when the “Files” tab is selected.
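To make "arranged logically and simply" more concrete, here is a small sketch of what a well-formatted dataset looks like once it is inside R. The data and the names `tidy_df` and `messy_treatment` are invented for illustration and are not taken from the example Excel files.

```r
# A small made-up dataset laid out the way R expects:
# one header row, one column per variable, one row per observation.
tidy_df <- data.frame(
  plant_id  = c(1, 2, 3),
  treatment = c("control", "fertiliser", "control"),
  height_cm = c(12.4, 15.1, 11.8)
)

# Short, consistent column names with no spaces
print(names(tidy_df))

# Each column holds a single type of data
str(tidy_df)

# Inconsistent labels are treated as different values:
# "control" and "Control" count as two separate groups.
messy_treatment <- c("control", "Control", "fertiliser")
print(unique(messy_treatment))

# Making the labels consistent removes the accidental extra group
print(unique(tolower(messy_treatment)))
```

The last two lines show why consistent naming matters: R compares text exactly, so a stray capital letter silently creates a new category.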
Download the example Excel files here.
When the files are visible you can simply click on one of them and select “Import Dataset”. There are several options available on the import screen that appears next. In the bottom right is a small window containing some code - this is the code that R will run when you import the file. There is also a text input box which allows you to specify a name for your newly imported data. Name the good.xlsx file good_df and name the bad.xlsx file bad_df. Click import for each file and you will then see two new objects in the top right (Environment) panel of RStudio. Clicking on either of these will immediately open the dataset in a new fourth window in the top left of RStudio.
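The code shown in the import window is ordinary R code, so you can also run the import yourself from the console or a script. The sketch below is roughly what such an import looks like, under two assumptions: the readxl package is installed (`install.packages("readxl")`), and good.xlsx and bad.xlsx are saved in the project folder. If either assumption does not hold, it prints a message instead of failing.

```r
# The two example files and the object names suggested above
files <- c(good_df = "good.xlsx", bad_df = "bad.xlsx")

# Check that the readxl package is available without attaching it
have_readxl <- requireNamespace("readxl", quietly = TRUE)

if (have_readxl && all(file.exists(files))) {
  # Read the first sheet of each workbook into a data frame
  good_df <- readxl::read_excel(files[["good_df"]])
  bad_df  <- readxl::read_excel(files[["bad_df"]])

  # Open the dataset in the viewer (only works in an interactive session)
  if (interactive()) View(good_df)
} else {
  message("Install readxl and save the example files in the project folder first.")
}
```

Keeping the import as code in a script means you can re-run it later without clicking through the import screen again.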
Take careful note of how the two datasets have been imported. Both contain the same data but good_df is much more logical and easier to work with than bad_df.
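Besides looking at the datasets in the viewer, you can compare them from the console. The helper below, `inspect_import()`, is a hypothetical function written for this page and not part of any package. It reports the size of a dataset, its column names, and how many missing values each column has, which are the things that usually differ between a well-formatted and a badly formatted import. It is demonstrated on a small made-up data frame; you would call it on good_df and bad_df in the same way.

```r
# Hypothetical helper: summarise how a dataset was imported
inspect_import <- function(df) {
  list(
    dimensions         = dim(df),            # number of rows and columns
    column_names       = names(df),          # names R assigned to the columns
    missing_per_column = colSums(is.na(df))  # count of NA values per column
  )
}

# Stand-in data so the example runs on its own
demo_df <- data.frame(id = 1:3, value = c(2.5, NA, 4.0))
result <- inspect_import(demo_df)
print(result)

# With the imported files you would run, for example:
# inspect_import(good_df)
# inspect_import(bad_df)
```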
If you would like a recap on scripts you can go through the tutorial from Hands-On Programming with R, Part 1.2.6.