
This is part 2 of an ongoing series on why you should be using R. Future blogs will be linked here as soon as they are released.
Why plot in R and not in Excel? To a programmer the answer may seem obvious, but it is still a common question among Excel users: if you have a data set, why not just select it, press a few buttons and generate a plot? It is one of the trickiest questions to answer, especially if you have limited Excel experience, as many new-age data scientists do. Hopefully, some of the reasons below will encourage you to switch from Excel to R.
Reproducibility
How do you view the code used to create an Excel graph? Can you tell exactly what is happening? Are you able to control and modify all the aesthetics of the plot, such as changing the length of the axis ticks, or changing the font? If so, are you able to share your work with a colleague and have them easily replicate your plot without you telling them where to click and which modifications to apply?
All this is possible with R. All the code that builds the plot lives in a script. The syntax is easy to read, so you can keep track of what the code is doing without worrying about hidden functions or modifications happening in the background.
Understanding changes
In Excel it is hard to see with the naked eye what changes have been made to a graph, especially if they were minor. With R (and an easy-to-use version control system), you can see exactly which files were changed and how. Furthermore, an Excel user usually draws a graph inside a single workbook, and if the same graph is needed for a different data set, it is common to copy and paste a series of manipulations and settings into another workbook. This kind of repeated manual interaction is error-prone as well as time-consuming. With R we can avoid it by writing functions that run the same code on different data sets by simply changing the inputs, producing reliable outputs and saving us a lot of time.
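As a rough sketch of that idea, the function below wraps a scatter plot so the same code can be pointed at different data sets. The helper name plot_scatter() and the two small data frames are made up for illustration; they are not part of the film data used later in this post.

```r
library(ggplot2)

# Hypothetical helper: draws the same scatter plot for any data frame
# that contains the named x and y columns.
plot_scatter <- function(data, x_col, y_col) {
  ggplot(data, aes(x = .data[[x_col]], y = .data[[y_col]])) +
    geom_point() +
    labs(x = x_col, y = y_col) +
    theme_bw()
}

# Two made-up data sets, for illustration only
films_a <- data.frame(year = 2001:2005, no_employees = c(120, 150, 90, 200, 175))
films_b <- data.frame(year = 2010:2014, no_employees = c(80, 95, 130, 110, 160))

# Same code, different inputs
plot_a <- plot_scatter(films_a, "year", "no_employees")
plot_b <- plot_scatter(films_b, "year", "no_employees")
```

Printing plot_a or plot_b draws the graph. Any styling change made inside the function applies to every plot that uses it.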
Extensibility
Yes, there is a wide range of basic graphics available in Excel, but R has a lot more to offer. Excel has been around for a while, so it has some cool tools that have evolved over the years. However, R is open source, so extensions are widely available, and it is even easy to make your own. R also has thousands of packages that can be used to create graphics that do some really clever stuff, without all the pre-graph work. That being said, Excel is perfectly sufficient for basic, simple, straightforward plots. But what if we are looking for something beyond basic?
R’s simplicity
{ggplot2} is a plotting package for R that provides commands for building complex plots. From the R command line you can quickly set x- and y-axis labels, colour by a variable, modify grid lines, and more. Each item is added as a new layer, which lets us add and remove graph elements without affecting the rest of the plot. Interested in changing the colour gradient or scale of your plot? No problem, just use the {RColorBrewer} package, which helps you choose sensible colour schemes. Interested in changing the title of your plot? Just add a layer with ggtitle(). And there are many more.
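To make the layering idea concrete, here is a small sketch on made-up data. It uses scale_colour_brewer(), the {ggplot2} scale that draws on the {RColorBrewer} palettes; the choice of the "Dark2" palette and the title text are arbitrary.

```r
library(ggplot2)

# Made-up data, for illustration only
films <- data.frame(
  year = rep(2001:2004, times = 2),
  no_employees = c(120, 150, 90, 200, 80, 95, 130, 110),
  country = rep(c("A", "B"), each = 4)
)

# Start with the data layer only
base_plot <- ggplot(films, aes(x = year, y = no_employees)) +
  geom_point(aes(colour = country))

# Add a colour scale and a title without touching the points
styled_plot <- base_plot +
  scale_colour_brewer(palette = "Dark2") +
  ggtitle("Employees on set per year")
```

Because base_plot is left unchanged, we can try a different palette or title on top of it without rebuilding the rest of the plot.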
Comparison
Let’s create some simple plots in Excel and then create similar plots in R using {ggplot2} functions. Hopefully, by the end of this post, we will have inspired you to switch to R. Now, let’s start by loading the data and packages. The data set used below comes from a selection of films and has five columns: country, year, highest profit achieved per film, number of films produced, and number of employees on set during production.
library("ggplot2")   # for plotting
library("viridis")   # provides a range of colour palettes
library("readr")     # for loading data
library("tidyverse") # for data manipulation
# movie_data holds the film data set described above

Let’s start by making a scatter plot that compares the number of employees on set in different countries for each year.
Scatter plot
Excel
Creating the scatter plot in Excel was easy, but everything had to be done manually: selecting the data and the variables for the x- and y-axis, and then selecting the type of plot. I also needed to change the axis titles by hand. If we want to change the grid lines, that has to be done manually as well. Given this plot, is it something you could easily recreate? Do you know where to point and click to generate this visualization?
R
Here we have created a similar plot in R using the {ggplot2} package. Because the code is visible, we can easily recreate the plot above, and we can also see exactly which functions and aesthetics were applied to it.
ggplot(data = movie_data, aes(x = year, y = no_employees)) +
  geom_point(aes(color = country)) +
  labs(x = "year", y = "number of employees", color = "country") +
  theme_bw()
Theming system in {ggplot2}
Theme arguments specify the non-data features that you can control. For example, the axis.text argument controls the appearance of the axis text, such as font size, colour and font face, axis.ticks.x controls the ticks on the x-axis, and so on. The theme() function allows you to override default theme elements, such as theme(plot.title = element_text(color = "red")). Complete themes, such as theme_bw(), set all theme elements to values designed to work together.
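A minimal sketch of how the two fit together, again on made-up data: the complete theme is applied first, and theme() then overrides individual elements. The specific elements changed here are only examples.

```r
library(ggplot2)

# Made-up data, for illustration only
films <- data.frame(year = 2001:2005, no_employees = c(120, 150, 90, 200, 175))

themed_plot <- ggplot(films, aes(x = year, y = no_employees)) +
  geom_point() +
  ggtitle("Employees on set per year") +
  theme_bw() +  # complete theme: sets every element
  theme(        # overrides for individual elements
    plot.title = element_text(colour = "red"),
    axis.text = element_text(size = 10, face = "bold"),
    axis.ticks.x = element_blank()
  )
```

The order matters: a complete theme added after theme() would reset the overrides, so the call to theme() should come last.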
We can take this plot even further. Suppose we want the same plot as above, but with each country in its own panel within the same visualization. We can use the facet_wrap() function from the {ggplot2} package:
ggplot(data = movie_data, aes(x = year, y = no_employees)) +
  geom_point() +
  facet_wrap(~country, ncol = 4) +
  labs(x = "year", y = "number of employees") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
We’ve also used the axis.text.x element to adjust the angle and position of the x-axis labels so that they stay legible. Are you able to create this in Excel without copying and pasting the graph? If so, please show us how you managed to do it.
Looking at just the theme() function, we can see that there are many more features we are able to modify in R, such as axis text, font, legend size, and grid lines. As a data enthusiast, which graph do you find more aesthetically appealing? Now, let’s move on to create a histogram using Excel and R.
Histogram
Excel
The histogram was a bit more time-consuming to generate. First, we had to change the size of the bars in a normal bar graph in order to turn it into a histogram. The colours for each column had to be selected and applied manually. Adding a legend to this plot was also a manual process. Given this plot, is it something you could easily recreate?
Now, let’s create a histogram using R and the {ggplot2} package.
R
Once again, it is clear that we can easily control all the variables and aesthetics of a histogram generated using ggplot. Here we used a new function, scale_fill_viridis() from the {viridis} package, which lets us modify the colours of the histogram bars. We also used the theme_classic() function to generate a classic-looking plot with x- and y-axis lines and no grid lines, and we edited the size, colour and font of the axis text (axis.text).
ggplot(data = movie_data, aes(x = highest_profit)) +
  geom_histogram(aes(fill = country)) +
  labs(x = "annual profit (in million dollars)", y = "count", fill = "country") +
  scale_fill_viridis(discrete = TRUE) +
  theme_classic() +
  theme(axis.text = element_text(size = 10, color = "black", family = "serif"))
Now, let’s go ahead and create our final plot.
Line plot
Excel
The line plot was the most complicated to make. First, when creating the line graph, it was not clear whether the data in the year column had to be sorted into ascending order, or whether earlier years would otherwise be placed after later years. Excel was also unable to draw each country as a separate line on one graph, because some countries did not have data for all years. After much frustration with Excel, we attempted to create a very basic line plot in R.
R
With only three lines of code and very little frustration, we were able to recreate the line graph above in R.
ggplot(data = movie_data, aes(x = year, y = number_movies)) +
  geom_line(aes(colour = country)) +
  labs(x = "Years", y = "Number of movies produced")
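The two problems we hit in Excel, unsorted years and countries with missing years, can be sketched on a small made-up data set. geom_line() connects the observations of each group in order of the x variable, so the rows do not need to be sorted first, and a country with a missing year simply gets a line through the years it does have.

```r
library(ggplot2)

# Made-up data: rows are deliberately out of year order, and
# country "B" has no entry for 2002.
films <- data.frame(
  year = c(2003, 2001, 2002, 2004, 2004, 2001, 2003),
  number_movies = c(14, 10, 12, 15, 9, 5, 8),
  country = c("A", "A", "A", "A", "B", "B", "B")
)

# One line per country, drawn in order of year
line_plot <- ggplot(films, aes(x = year, y = number_movies)) +
  geom_line(aes(colour = country))
```

If the gap for country "B" should be visible rather than bridged, one option is to add the missing year as a row with NA for number_movies.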
Now, let’s add some more aesthetics to our plot, as we did for the previous ones, by changing the font size (axis.title and axis.text), changing the panel border (panel.border), and editing the size of the legend keys (legend.key.size). Here we decided to use the theme_dark() function to create a dark background, which is commonly used to make thin coloured lines pop out.
ggplot(data = movie_data, aes(x = year, y = number_movies)) +
  geom_line(aes(color = country)) +
  labs(x = "year", y = "number of movies produced", color = "country") +
  theme_dark() +
  theme(
    # linewidth replaces size for borders in ggplot2 3.4.0 and later
    panel.border = element_rect(color = "black", fill = NA, linewidth = 2),
    axis.title = element_text(size = 12, face = "bold", family = "Arial"),
    axis.text = element_text(size = 10, color = "black", family = "Arial"),
    legend.key.size = unit(0.50, "cm")
  )
When comparing R and Excel, it is important to define the level of detail you are looking for. If you want to quickly run basic statistics or create a very basic graph, Excel may be the better choice because of its easy point-and-click system. Before plotting, ask yourself: “How detailed should my visualization be? Am I plotting for a publication or not?” It is clear that in Excel we can easily select a chunk of data and create a simple chart. However, when creating more comprehensive plots, Excel can be extremely frustrating and time-consuming. It all comes down to what you need your graphics to do. For those planning to publish large amounts of complex data, time spent in R creating impressive visuals will definitely be worthwhile. It is also clear that R is not hard, and gives you more customization options than Excel.
R and Excel are beneficial in different ways. Excel is easy to learn and is the go-to program when we first come into contact with computers, and some of us get stuck there. However, the reproducibility that R offers is clearly of high importance. It is not a question of choosing between R and Excel, but of deciding which program to use for different needs.
If you’re interested in learning how to graph using R, take part in our Data Visualization with ggplot2 course.