R Programming – Shishir Kant Singh
https://shishirkant.com
Jada Sir :)

R Line Graph
https://shishirkant.com/r-line-graph/
Mon, 25 May 2020 16:34:55 +0000

A line graph represents the relationship between two variables.

Plot a line graph in R

This tutorial shows how to plot a line graph in R with the plot() function.

Syntax of plot() function

plot(x, y, ...)

where

  • x is the x coordinates of the points, or any R object with a plot method. Ex : numeric vector
  • y is the y coordinates of the points; it is optional when x alone describes the plot. Ex : numeric vector
  • … is the extra arguments that could be provided, which may contain any of the following
    • type  – type could be any of the below values
      • ‘p’ – points
      • ‘l’ – lines
      • ‘b’ – both points and lines
      • ‘c’ – for the lines part alone of both points and lines
      • ‘o’ – for both points and lines overplotted
      • ‘h’ – histogram-like vertical lines
      • ‘s’ – stair steps (horizontal first, then vertical)
      • ‘S’ – stair steps the other way round (vertical first, then horizontal)
      • ‘n’ – no plotting
    • main is the main title for the plot
    • sub is the sub title for the plot
    • xlab is the x label i.e., title for x-axis
    • ylab is the y label i.e., title for y-axis
    • asp is the aspect ratio whose value should be given by y/x
    • lwd is the line width
    • pch is the plotting symbol. It is either an integer code (0 to 25 select the built-in symbols) or a single character such as a letter
    • col gives the color of the points and lines
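
As a rough illustration of how these arguments combine, the sketch below draws the same data once per type value and sets a few of the optional arguments. The data values, the temporary file name, and the chosen lwd, pch, and col values are assumptions made for this example only. A PDF device is opened explicitly so that the script behaves the same in interactive and batch sessions.

```r
# Small illustrative data set (values are arbitrary).
x <- 1:6
y <- c(3, 7, 2, 9, 4, 6)

# Write to a temporary PDF so nothing depends on a screen device.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# One page per plot type, so the differences are easy to compare.
plot_types <- c("p", "l", "b", "o", "h", "s", "S")
for (tp in plot_types) {
  plot(x, y,
       type = tp,
       main = paste0("type = '", tp, "'"),
       xlab = "time", ylab = "cost",
       lwd = 2,            # thicker lines
       pch = 19,           # filled circle
       col = "steelblue")
}

# Close the device so that the file is completely written.
invisible(dev.off())
```

Paging through the resulting PDF shows how 's' and 'S' differ only in whether the step moves horizontally or vertically first.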

Example 1 : Line Graph

Line Graph
# R program to plot line graph
x = c(1,2,3,4,5,6,7,8,9,10,11)
y = c(22,13,5,9,25,22,26,1,9,10,2)
 
# plot function
# except x,y ; rest are optional
plot(x, y, type='b', main='First Plot Example', sub='Line Graph', xlab='time', ylab='cost', asp=1)

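When the above program is run with Rscript, the plot is written to a PDF file (Rplots.pdf in the working directory) by default. In an interactive session, it is shown in a screen device instead.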

Line Graph Plot using R programming language
Example 2 : Line, No points, Colored line
Plot Colored Line Graph
# R program to plot line graph
x = c(1,2,3,4,5,6,7,8,9,10,11)
y = c(22,13,5,9,25,22,26,1,9,10,2)
 
# plot function
# except x,y ; rest are optional
plot(x, y, type='l', col='#ff0000')

Output  Line Graph Plot

Colored Line Graph Plot using R programming language

Example 3 : Points, No Line, Colored points

Plot Colored Points Graph
# R program to plot points
x = c(1,2,3,4,5,6,7,8,9,10,11)
y = c(22,13,5,9,25,22,26,1,9,10,2)
 
# plot function
# except x,y ; rest are optional
plot(x, y, type='p', col='#0000ff')

Output  Point Graph Plot

Colored Points Graph Plot using R programming language

Example 4 : Stair Step Graph

Plot Stair Step Graph
# R program to plot stair graph
x = c(1,2,3,4,5,6,7,8,9,10,11)
y = c(22,13,5,9,25,22,26,1,9,10,2)
 
# plot function
plot(x, y, type='s', col='#0000ff')

Output  Stair Graph Plot

Stair Plot Graph using R programming language
R Pie Chart
https://shishirkant.com/r-pie-chart/
Mon, 25 May 2020 16:25:33 +0000

Pie Chart is a pictorial representation of proportions in a whole as sectors in a circle.

Draw Pie Chart in R programming language

R provides the pie() function in the base graphics package; the plotrix package adds pie3D() for three-dimensional pie charts.

Syntax of R pie function

pie(x, labels = names(x), edges = 200, radius = 0.8, clockwise = FALSE, init.angle = if(clockwise) 90 else 0, density = NULL, angle = 45, col = NULL, border = NULL, lty = NULL, main = NULL, …)

where

  • x [mandatory] is a numeric vector of non-negative values. Each value determines the proportion of the circle given to its slice.
  • labels is a character vector giving names for the slices.
  • edges is the number of edges of the polygon that approximates the circular outline of the pie.
  • radius is the radius of the circle in the pie chart.
  • clockwise is a logical value indicating whether slices are drawn clockwise (TRUE) or counter-clockwise (FALSE). Counter-clockwise is the default.
  • init.angle is the angle (in degrees, 0 to 360) at which the first slice starts.
  • density is the density of shading lines, in lines per inch.
  • angle is the slope of shading lines in degrees (counter-clockwise).
  • col is a vector of colors to be used in filling or shading the slices.
  • border, lty – arguments passed to polygon, which draws each slice.
  • main is an overall title for the plot.
  • … stands for graphical parameters that can be given as arguments to pie. They affect the main title and labels only.
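
To make the role of x and labels concrete, here is a minimal sketch that turns the slice values into percentage labels before calling pie(). The variable names slice_names, pct, and pct_labels are assumptions introduced for this illustration. The chart is drawn on a null PDF device, so no file is written.

```r
# Slice values and names, as in the diet examples of this tutorial.
x <- c(4, 1, 3, 1, 2)
slice_names <- c("breakfast", "brunch", "lunch", "snacks", "dinner")

# Each slice's share of the whole, rounded to whole percent.
pct <- round(100 * x / sum(x))
pct_labels <- paste0(slice_names, " (", pct, "%)")

# Draw on a null PDF device: nothing is saved to disk.
pdf(NULL)
pie(x, labels = pct_labels, main = "Daily Diet Plan", clockwise = TRUE)
invisible(dev.off())

print(pct_labels)
```

Because each percentage is rounded separately, the labels need not add up to exactly 100.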

Example 1 – Simple Pie Chart

Simple pie chart
# Data Pie chart
x = c(4, 1, 3, 1, 2)
labels = c('breakfast', 'brunch', 'lunch', 'snacks', 'dinner')
colors = c('#4286f4','#bb3af2','#ed2f52','#efc023','#ea7441')
 
# Give the chart file a name.
png(file = "diet.png")
 
# Plot the chart.
pie(x, labels, main='Daily Diet Plan', col=colors, init.angle=180, clockwise=TRUE)
 
# Save the file.
dev.off()
Simple Pie Chart example with R programming language

When the above program is run, the following pie chart is created as diet.png in the current working directory.

Example 2 – Pie Chart with Striped Lines
Pie Chart with Striped Lines
# Draw Pie Chart in R
# Data Pie chart
x = c(4, 1, 3, 1, 2)
labels = c('breakfast', 'brunch', 'lunch', 'snacks', 'dinner')
colors = c('#4286f4','#bb3af2','#ed2f52','#efc023','#ea7441')
 
# Give the chart file a name.
png(file = "diet.png")
 
# Plot the chart.
pie(x, labels, main='Daily Diet Plan',density=30 ,col=colors, angle=45)
 
# Save the file.
dev.off()
Example R Pie Chart with Striped Lines

Example 3 – Pie Chart with Edged Sectors

Pie Chart with Edged Sectors
# Draw Pie Chart in R
# Data Pie chart
x = c(4, 1, 3, 1, 2)
labels = c('breakfast', 'brunch', 'lunch', 'snacks', 'dinner')
colors = c('#4286f4','#bb3af2','#ed2f52','#efc023','#ea7441')
 
# Give the chart file a name.
png(file = "diet.png")
 
# Plot the chart.
pie(x, labels, main='Daily Diet Plan',edges=1 ,col=colors)
 
# Save the file.
dev.off()
Edged R Pie Chart Example

Syntax of R pie3D function

The plotrix package must be installed and loaded to use pie3D() in an R script.

pie3D(x,edges=NA,radius=1,height=0.1,theta=pi/6,start=0,border=par("fg"), col=NULL,labels=NULL,labelpos=NULL,labelcol=par("fg"),labelcex=1.5, sector.order=NULL,explode=0,shade=0.8,mar=c(4,4,4,4),pty="s",…)

where

  • x [mandatory] is a numeric vector of non-negative values. Each value determines the proportion of the circle given to its slice.
  • labels is a character vector giving names for the slices.
  • radius is the radius of the circle in the pie chart.
  • col is a vector of colors to be used in filling or shading the slices.
  • main is an overall title for the plot.
  • explode is the amount by which the sectors are pulled apart.
  • … extra arguments

Example 1 – Simple 3D Pie Chart in R
Simple 3D Pie Chart in R programming language
# Draw Pie Chart in R
# Get the library.
library(plotrix)
 
# Data for Pie chart
x = c(4, 1, 3, 1, 2)
labels = c('breakfast', 'brunch', 'lunch', 'snacks', 'dinner')
colors = c('#4286f4','#bb3af2','#ed2f52','#efc023','#ea7441')
 
# Give the chart file a name.
png(file = "diet3d.png")
 
# Plot the chart.
pie3D(x, labels=labels, explode=0.1, height=0.05, main='Daily Diet Plan', col=colors)
 
# Save the file.
dev.off()

When the above program is run, the following pie chart is created as diet3d.png in the current working directory.

Simple 3D Pie Chart example with R programming language
R Data Visualization
https://shishirkant.com/r-data-visualization/
Mon, 25 May 2020 16:09:39 +0000

What is R Data Visualization?

Using the diverse functionalities provided by R, one can create visually appealing data visualizations with only a few lines of code. Data visualization is an efficient technique of gaining insights about data through a visual medium.

  • With the help of visualization techniques, humans can easily spot hidden patterns in data which might otherwise be overlooked.
  • Using data visualization, one can work with large datasets and efficiently obtain key insights about them.

R Visualization Packages

Following are some of the essential visualization packages in R Programming:

Use of R Programming

For most of our work in R, we will use the RStudio environment.

RStudio has four panels:

  • Console – This is the actual R window. You can enter R commands here and execute them by pressing Enter.
  • Source – This is where scripts are edited, and where most of your work should happen. Ctrl+Enter sends the selected code to the console.
  • Plots/Help – Plots and help pages are shown here.
  • Workspace – Shows the objects that currently exist in your session.

Note – We need R data visualization because it provides a clear understanding of patterns in data. Also, it has the ability to detect hidden structures in data.

R Graphics

1. Standard Graphics

R's standard graphics, available through the graphics package, include several functions that produce statistical plots, such as:

  • Scatterplots
  • Boxplots
  • Piecharts
  • Barplots etc.

Each of these graphs is typically produced by a single function call.
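
The sketch below shows what "a single function call" looks like for each of the four plot types listed above. It uses the built-in mtcars data set, which is an assumption of this example rather than something the text prescribes, and draws on a null PDF device so that no file is created.

```r
# Number of cars per cylinder count in the built-in mtcars data set.
cyl_counts <- table(mtcars$cyl)

pdf(NULL)                               # null device: nothing is written to disk
plot(mtcars$wt, mtcars$mpg)             # scatterplot
boxplot(mpg ~ cyl, data = mtcars)       # boxplot of mpg for each cylinder count
pie(cyl_counts)                         # pie chart
barplot(cyl_counts)                     # barplot
invisible(dev.off())

print(cyl_counts)
```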

2. Graphics Devices

  • The output of the graphics functions depends on the active graphics device.
  • The screen is the default and most frequently used device.
  • R also provides file devices, such as the PDF device, the JPEG device, etc.
  • The user only needs to open the graphics device that she/he wants; R takes care of producing the type of output the device requires.
  • This means that the R code to produce a plot on the screen or in a PDF or PNG file is exactly the same. You only need to open the target output device first.
  • Several devices may be open at the same time, but only one is the active device.
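
The points above can be sketched as follows: the same plotting function is sent to two different file devices, and dev.cur() reports which one is active. The helper draw_plot() and the temporary file names are hypothetical and exist only for this illustration. The PDF and PostScript devices are used because they are available in every R installation.

```r
# Hypothetical helper: the plotting code is identical for every device.
draw_plot <- function() {
  plot(1:10, (1:10)^2, type = "b", main = "Same code, different device")
}

pdf_file <- tempfile(fileext = ".pdf")
ps_file  <- tempfile(fileext = ".ps")

pdf(pdf_file)          # opened first
postscript(ps_file)    # opened second, so it becomes the active device

active_name <- names(dev.cur())   # name of the active device
n_open <- length(dev.list())      # number of devices currently open

draw_plot()                       # drawn on the PostScript device
invisible(dev.set(dev.prev()))    # make the PDF device active
draw_plot()                       # same call, different output format

graphics.off()                    # close all open devices
```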

3. The basics of the grammar of graphics

Key elements of a statistical graphic:

  • Data
  • Aesthetic Mappings
  • Geometric Objects
  • Statistical Transformations
  • Scales
  • Coordinate system
  • Faceting

Now, let us discuss each of them.

3.1 Aesthetic Mappings

  • Aesthetic mappings control the relation between data variables and graphical variables.
  • For example, they map the temperature variable of a data set to the x variable of a scatter plot.
  • Likewise, they map the species of a plant to the colour of the dots in a graphic.

3.2 Geometric Objects

Geometric objects are the marks that represent the observations. In a scatter plot, for example, each observation is shown as a point, using aesthetic mappings that map two variables of the data set to the x and y variables of the plot.

3.3 Statistical Transformations

  • Statistical transformations compute summaries of the data that are then shown in the plot.
  • For example, a transformation can approximate the data by a regression line and supply the x,y coordinates of that line.
  • Another transformation counts the occurrences of certain values.

3.4 Scales

It maps the data values into values in the coordinate system of the graphics device.

3.5 Coordinate system

The coordinate system determines how positions are drawn on the plot. Common choices are:

  • Cartesian
  • Polar

3.6 Faceting

It splits the data into subgroups and draws sub-graphs for each group.
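
As a rough sketch of how the seven elements fit together, the ggplot2 call below uses one line per element. It assumes that the ggplot2 package is installed and uses the built-in iris data set; the chosen variables, the linear smoother, and the colour palette are illustrative assumptions.

```r
# Run only if ggplot2 is available, so the script does not fail without it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  p <- ggplot(iris,                                    # data
              aes(x = Sepal.Length,                    # aesthetic mappings
                  y = Petal.Length,
                  colour = Species)) +
    geom_point() +                                     # geometric object
    geom_smooth(method = "lm", formula = y ~ x,
                se = FALSE) +                          # statistical transformation
    scale_colour_brewer(palette = "Dark2") +           # scale
    coord_cartesian() +                                # coordinate system
    facet_wrap(~ Species)                              # faceting

  # print(p) would draw the plot on the active graphics device.
}
```

Removing or swapping a single line changes one element of the graphic while leaving the others untouched, which is the main appeal of the grammar.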

Data Visualization in R using ggplot2

“ggplot2 is the most widely used data visualization package of the R programming language.”

What type of data visualization in R should be used for what sort of problem? I will provide you with tips which will help you to choose the right type of chart for your specific objectives. We will also learn to implement data visualization in R using ggplot2.

  • Introduction to ggplot2
  • Customizing the look and feel

1. Introduction to ggplot2

ggplot2 is a plotting system for building professional-looking graphs quickly and with minimal code. It takes care of many of the fiddly details that make plotting difficult. ggplot2 works very differently from base R plotting, but it is flexible and powerful.

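We can create a bar chart of the mean mpg per number of cylinders using ggplot2 as follows. (A histogram, by contrast, bins a single continuous variable and is drawn with geom_histogram().)

library(magrittr)
library(dplyr)
library(ggplot2)
# Mean miles per gallon for each number of cylinders
data_bar <- mtcars %>%
  mutate(cyl = factor(cyl)) %>%
  group_by(cyl) %>%
  summarize(mean_mpg = round(mean(mpg), 2))
# stat = "identity" plots the precomputed means as bar heights
ggplot(data_bar, aes(x = cyl, y = mean_mpg)) +
  geom_bar(fill = "coral", stat = "identity")

Output:

Bar chart - R Data Visualization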

It uses data frames as input:

  • Data must be in long format. This means each row is an observation and each column is a variable.
  • Use reshape2 to get data in long format.
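
To show what "long format" means in practice, the sketch below converts a small wide table into long form. It deliberately uses base R's stack() instead of reshape2, so that it runs without extra packages. The data frame and its column names (id, q1, q2, sales, quarter) are invented for this illustration.

```r
# Wide format: one column per quarter.
wide <- data.frame(id = 1:2, q1 = c(10, 20), q2 = c(30, 40))

# stack() puts the values in one column and the source column name in another.
stacked <- stack(wide[c("q1", "q2")])

# Long format: one row per (id, quarter) observation.
long <- data.frame(
  id      = rep(wide$id, times = 2),
  quarter = as.character(stacked$ind),
  sales   = stacked$values
)

print(long)
```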

2. Important things to remember for ggplot

  • It was developed by Hadley Wickham as an implementation of the grammar of graphics.
  • ggplot is relatively complete and is a powerful graphics package.
  • It can do many things but cannot build 3D visuals.

3. How to install ggplot2 package

  • ggplot2 can be easily installed by typing:
install.packages("ggplot2")
  • Make sure that you are using the latest version of R to get the most recent version of ggplot2.

4. Applications of ggplot2

  • Aesthetics: It refers to visual attributes that affect how data is displayed in a graphic, e.g., color, point size, or line type.
  • Geometric objects: We use it for a visual representation of observations such as points, lines, polygons, etc. 
  • Faceting: It draws the same type of graph for each subset of the data.
  • Annotation: We use it to add text and/or external graphics to a ggplot.
  • Positional adjustments: It helps to reduce the overplotting of points.

5. Why ggplot2?

  • It is used professionally.
  • Easy to manipulate.
  • Has great support online.
  • Its concepts transfer to other packages and languages.

What to Learn in Data Visualization in R?

R Programming helps us to learn this art by offering a set of inbuilt functions and also libraries to build visualizations and present data. Before we move forward for the technical implementation of the visualization, let’s see first how to select the right chart type.

Selecting the Right Chart Type

There are four basic presentation types:

  • Comparison
  • Composition
  • Distribution
  • Relationship

Following are the most used charts in data visualization:

  • Scatter Plot
  • Histogram
  • Bar & Stack Bar Chart
  • Box Plot
  • Area Chart
  • Heat Map
  • Correlogram

Now we will discuss when to use each of them:

1. Scatter Plot

Use a scatter plot to see the relationship between two continuous variables.

Scatter-Plot - Data visualization in R

2. Histogram

A histogram is used to plot a single continuous variable. It breaks the data into bins and shows the frequency of each bin. We can change the bin size and see the effect it has on the visualization.

Distribution of Average Ratings per User
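
The effect of the bin size can be checked without drawing anything, because hist() returns the bin counts. The sketch below bins the mpg column of the built-in mtcars data set with two bin widths. The data set and the break points are assumptions chosen for illustration.

```r
mpg <- mtcars$mpg

# Same data, two bin widths; plot = FALSE returns the counts without drawing.
coarse <- hist(mpg, breaks = seq(10, 35, by = 5), plot = FALSE)
fine   <- hist(mpg, breaks = seq(10, 35, by = 1), plot = FALSE)

print(coarse$counts)   # 5 wide bins
print(fine$counts)     # 25 narrow bins

# Every observation falls into exactly one bin in both cases.
total_coarse <- sum(coarse$counts)
total_fine   <- sum(fine$counts)
```

Wide bins smooth the distribution, while narrow bins reveal more detail but also more noise.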

3. Bar Chart

We use bar charts to plot a categorical variable.

bar-chart-vertical - Data Visualization in R

4. Box Plot

Box plots are used to plot a continuous variable, often split by the levels of a categorical variable. They visualize the spread of the data and help detect outliers. A box plot shows five summary statistics:

  • Minimum
  • 25th percentile
  • Median
  • 75th percentile and
  • Maximum.
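
These five numbers can be computed directly with fivenum(), and boxplot.stats() reports what a box plot would actually draw. The toy vector below is an assumption made for this example. Note that in R's box plot the whiskers stop at the most extreme points that are not flagged as outliers, so the upper whisker can lie below the true maximum.

```r
# Toy data with one obvious outlier.
x <- c(1, 2, 3, 4, 100)

five  <- fivenum(x)        # minimum, lower hinge, median, upper hinge, maximum
stats <- boxplot.stats(x)  # the values a box plot draws

print(five)         # 1 2 3 4 100
print(stats$stats)  # 1 2 3 4 4: the upper whisker stops at 4
print(stats$out)    # 100 is drawn as a separate outlier point
```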

5. Area Chart

An area chart shows the continuity across a variable or data set. It is almost the same as a line chart, and we can use it for time series plots or, more generally, to plot continuous variables and analyze the underlying trends.

Area Chart - Data Visualization in R

6. Heat Map

A heat map uses the intensity of colour to encode values. It displays the relationship between two, three, or more variables in a two-dimensional image: two dimensions are laid out along the axes, and a third is shown by the intensity of colour.

Heat-Map - Data Visualization in R

7. Correlogram

A correlogram shows the level of correlation among the variables available in the dataset. The cells of the correlation matrix are shaded or coloured to show the correlation value.

correlogram - Data Visualization in R
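
A minimal sketch of the idea: compute a correlation matrix with cor() and draw it as a coloured grid with image(). The four mtcars columns are an arbitrary choice for illustration. Dedicated correlogram packages exist, but this example relies on base R only and draws on a null PDF device.

```r
# Correlations among four numeric columns of the built-in mtcars data set.
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]
cor_mat <- round(cor(vars), 2)
print(cor_mat)

# Draw the matrix as a grid whose colour encodes the correlation value.
n <- ncol(cor_mat)
pdf(NULL)  # null device: nothing is written to disk
image(1:n, 1:n, cor_mat, zlim = c(-1, 1), axes = FALSE,
      xlab = "", ylab = "", main = "Correlation matrix")
axis(1, at = 1:n, labels = colnames(cor_mat))
axis(2, at = 1:n, labels = rownames(cor_mat))
invisible(dev.off())
```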

Pros and Cons of Data Visualization in R

Let’s have a look at the advantages and disadvantages of data visualization in R programming:

Advantages of Data Visualization in R

1. Understanding

Information presented visually can be more appealing than a written document comprising text and numbers, and graphics and charts are easier to understand. Visualizations can therefore attract a wider audience and promote widespread use of business insights to arrive at better decisions.

2. Efficiency

Visualization allows us to display a lot of information in a small space. While decision making in business is inherently complex and multifaceted, displaying evaluation findings in a graph allows companies to organize lots of interrelated information in useful ways.

3. Location

Visualizations that use geographical maps and GIS are especially relevant for large businesses when location is an important factor. Maps can show business insights from different places, giving an idea of the severity of issues, the reasons behind them, and the workarounds to address them.

Disadvantages of Data Visualization in R

1. Cost

Data visualization applications can cost a decent sum of money, and small companies in particular may not be able to spend that many resources on purchasing them. Many companies also hire professionals to produce charts and reports, which increases the costs further. Small enterprises often work in resource-limited settings, where getting evaluation results in a timely manner can be of high importance.

2. Distraction

Data visualization applications can create reports and charts laden with highly complicated and fancy graphics, which may tempt users to focus more on form than on function. The overall value of a graphic is minimal if visual appeal is put first. In a resource-limited setting, it is also important to think carefully about how resources can best be used, and not to get caught up in the graphics trend without a clear purpose.

R Object Oriented Programming
https://shishirkant.com/r-object-oriented-programming/
Mon, 25 May 2020 15:45:18 +0000

What is Object-Oriented Programming?

Object-oriented programming is a programming technique that helps manage the complexity of a program. It uses techniques like abstraction, encapsulation, and polymorphism. It sets up an environment where the tasks are distributed among the different parts of the program or ‘objects’. These objects hide their internal working from other objects and only share concise details regarding what they can do and how to contact them.

For example, think of it as a company. A company may have different departments dealing with different aspects of it. One department does not share its internal workings with another. All a department needs to know is what the other departments do and how to get their help when needed. How a department does whatever it does, is not a concern to the others. This way, the tasks are distributed to appropriate departments and they all focus only on their jobs.

Similarly, in an object-oriented environment, we can think of the programs as interactions between different objects instead of steps of an algorithm.

What is Functional Programming?

Functional programming is a declarative programming style. Programs are built from expressions, which are evaluated to produce values. Expressions can be grouped together into functions to make organized, reusable routines. Functional programming is algorithmic.

Functional vs Object-Oriented Programming

Here are the key differences between functional and object-oriented programming:

  1. In functional programming, the primary unit is a function while in OOP, the primary unit is an object.
  2. Functional programming focuses on evaluating functions whereas OOP deals with objects and their interactions among each other.
  3. Data in functional programming is immutable: when we modify data, the original is not changed in place. Instead, a copy is made and the changes are applied to the copy, while the old version stays available for as long as it is referenced. OOP typically works with mutable data.
  4. Functional programming favours recursion for iterative processing, while OOP typically uses loops.
  5. Functional programming lends itself to parallel programming, because functions without side effects do not affect code running on other processors. In OOP, methods that change shared mutable state may affect parts of the program running in parallel, so parallel code needs more care.
  6. In purely functional code, independent expressions can be evaluated in any order, whereas in OOP, statements that change state must be executed in a specific order.
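
The immutability point in item 3 can be observed in R itself, where ordinary vectors are copied when a function modifies them. The sketch below is only an illustration; the function give_raise() and the salary figures are made up for this example.

```r
# Hypothetical helper: gives the first employee a raise of 100.
give_raise <- function(salaries) {
  salaries[1] <- salaries[1] + 100   # changes the function's own copy only
  salaries                           # the changed copy is returned
}

original <- c(500, 600, 700)
updated  <- give_raise(original)

print(original)   # the caller's vector is untouched
print(updated)    # the returned copy carries the change
```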

Core Concepts of Object-Oriented Programming

Object-oriented programming has a few core techniques that distinguish it from other programming paradigms. Most OOP languages don’t implement all of them but just some of these techniques instead. The most common ones are:

  1. Polymorphism: Polymorphism means "many shapes". In polymorphism, a single name can refer to different functions or objects depending on the context it is used in. Overloading and overriding are good examples of polymorphism techniques.
  2. Encapsulation: Encapsulation deals with abstraction. It ensures that the data encapsulated inside an object is not visible to other objects. Only the data and methods declared to be public knowledge are known to others.
  3. Hierarchies: In OOP, hierarchies can be of two types: Inheritance or Composition.
    • Inheritance is when a class is a ‘type of’ or a ‘special case of’ another class known as the superclass. For example, an eagle is a type of bird. Here, eagle is a subclass and bird is its superclass.
    • In Composition, a class has an instance of another class. For example, a car has wheels but wheels are not a type of car. There is no inheritance here but wheels are a part of the car so the class car should have access to the class wheels which implies composition.
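
The eagle/bird and car/wheels examples above can be sketched with R's S3 system, which is introduced later in this article. All names in the block (describe, sam, wheel, car, diameter_in) are hypothetical and serve only to illustrate the two kinds of hierarchy.

```r
# Generic function: dispatches on the class attribute of its argument.
describe <- function(x, ...) UseMethod("describe")

# Method for the superclass.
describe.bird <- function(x, ...) {
  paste(x$name, "is a bird")
}

# Method for the subclass: reuses the superclass method, then adds detail.
describe.eagle <- function(x, ...) {
  base_text <- NextMethod()
  paste(base_text, "and, more specifically, an eagle")
}

# Inheritance: the class vector lists the subclass first, then the superclass.
sam <- structure(list(name = "Sam"), class = c("eagle", "bird"))

# Composition: a car object holds wheel objects as one of its parts.
wheel <- structure(list(diameter_in = 16), class = "wheel")
car <- structure(list(model = "hatchback",
                      wheels = list(wheel, wheel, wheel, wheel)),
                 class = "car")

eagle_text <- describe(sam)
print(eagle_text)
```

The eagle object is also a bird, as inherits(sam, "bird") confirms, whereas the wheels are merely stored inside the car and share no class with it.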

Object-Oriented Programming in R

In R, we usually solve problems by dividing them into simpler functions rather than objects. Hence, functional programming is generally used in R. Most OOP languages have a single object model in place, which defines the properties and behavior of the objects we create. R has a few different object-oriented programming systems, or object models. These systems are:

  1. S3: S3 is the first and the simplest OOP system in R. It does not have many restrictions, and we can create an object by simply adding a class attribute to it. S3 classes don’t have a formally defined structure. The methods of an S3 class belong to generic functions.
  2. S4: S4 classes have a formally defined structure: every object of a given S4 class has the same slots. We use the setClass() function to create a new class and the new() function to create an object. Like S3 classes, methods in S4 classes belong to generic functions rather than the classes themselves.
  3. Reference classes(RC): Reference classes in R are similar to object-oriented classes in other programming languages. They have all the features of S4 classes with an added environment. To create a reference class, we use the setRefClass() function. The methods in a reference class belong to the class itself.

While these three systems are in the base R packages, packages on CRAN provide further systems, such as R6.

Generic Functions

Generic functions are a topic of much confusion in R. These functions are good examples of polymorphism. The most common example of a generic function is the print() function.

The function prints the arguments provided. The input argument can be a string, a number, or even an object. It does not need a description of the object, its type, or its structure. How does it do that? The print() function is a collection of various print methods dedicated to different data types and data structures in R. When we call it, the function finds the class of the input object and calls the appropriate method to print it. We can see the different methods under the generic print() function by using the methods() command. For example:

Code:

methods(print)

Output:

[1] print.acf*
[2] print.anova*
[3] print.aov*
[4] print.aovlist*
.
.
.
[189] print.xgettext*
[190] print.xngettext*
[191] print.xtabs*
see ‘?methods’ for accessing help and source code
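
Writing a generic of our own shows how this dispatch works. The sketch below defines a hypothetical generic kind_of() with methods for two built-in classes and a default method as fallback; none of these names come from R itself.

```r
# A generic only hands the call over to the method that matches the class of x.
kind_of <- function(x, ...) UseMethod("kind_of")

# Methods are ordinary functions named <generic>.<class>.
kind_of.numeric   <- function(x, ...) paste("numeric vector of length", length(x))
kind_of.character <- function(x, ...) paste("character vector of length", length(x))

# Fallback used when no class-specific method exists.
kind_of.default   <- function(x, ...) paste("object of class", class(x)[1])

num_text <- kind_of(c(1.5, 2.5))   # dispatches to kind_of.numeric
chr_text <- kind_of("hello")       # dispatches to kind_of.character
lgl_text <- kind_of(TRUE)          # no logical method, so kind_of.default runs

print(c(num_text, chr_text, lgl_text))
```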

S3 Classes

S3 classes are the most basic object-oriented classes in R. They implement the polymorphism principle of OOP but not much else. To create an S3 class, we need to add a class attribute to an object.

Code:

emp <- list(name="ramesh",age="24",
department="sales",emp_id="00495")
class(emp) <- "employee"
emp

Output:

We can create methods for generic functions as well.

Code:

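# print method for class "employee"; the signature matches the print(x, ...) generic
print.employee <- function(x, ...){
 cat("name: ", x$name, "\n")
 cat("age: ", x$age, "\n")
 cat("department: ", x$department, "\n")
 cat("Id: ", x$emp_id, "\n")
 invisible(x)  # print methods conventionally return their argument invisibly
}
print(emp)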

Output:

name:  ramesh
age:  24
department:  sales
Id:  00495

Useful Functions for S3 Objects

1. The is.object() function: the is.object() function returns TRUE if the input argument is an object of any of the object models of R.

Code:

is.object(emp)

Output:

[1] TRUE

2. The getS3method() function: the getS3method() function returns the S3 method used by a generic function for a specific type of object.

Code:

getS3method('print','employee')

Output:

S4 Classes

S4 classes take OOP in R one step further than S3 classes. They have a definite structure to them which brings some uniformity to objects in R. We can create an S4 class by using the setClass() function and make a new object using the new() function.

Code:

setClass("employeeS4", slots=list(name=
"character",age="numeric",department=
"character",emp_id="numeric"))
emp2 <- new("employeeS4",name="james",age=27,
department="accounts",emp_id=00564)
emp2

Output:


We can access and modify the slots of an S4 object using the @ or the slot() function.

Code:

emp2@name

Code:

emp2@name <- "jerry"
emp2@name

Code:

slot(emp2,"age")

Code:

slot(emp2,"age") <- 28
slot(emp2,"age")

Output:

[1] "james"
[1] "jerry"
[1] 27
[1] 28

We can create a method for a generic function for S4 classes using the setMethod() function.

Code:

setMethod("show",
"employeeS4",
function(object){
 cat("name: ", object@name, "\n")
 cat("age: ",object@age,"\n")
 cat("department: ",object@department,"\n")
 cat("Id: ",object@emp_id,"\n")
}
)

Code:

emp2

Output:

name:  jerry
age:  28
department:  accounts
Id:  564

Useful S4 Methods in R

Here are a few functions that are useful when dealing with S4 objects:

1. The isS4() function: the isS4() function returns TRUE if the input argument is an S4 object.

Code:

isS4(emp2)

Output:

[1] TRUE

2. The slotNames() function: the slotNames() function returns the names of all the slots in the input object.

Code:

slotNames(emp2)

Output:

[1] "name"       "age"        "department" "emp_id"
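
The "definite structure" of S4 classes is enforced when an object is created. The sketch below assumes a separate, hypothetical class employeeDemo, so that it does not interfere with employeeS4 above. It shows new() rejecting a slot of the wrong type, and a validity function rejecting a value that has the right type but makes no sense.

```r
library(methods)

# Hypothetical class with two typed slots and a simple validity check.
setClass("employeeDemo",
         slots = c(name = "character", age = "numeric"),
         validity = function(object) {
           if (length(object@age) == 1 && object@age < 0) {
             "age must be non-negative"
           } else {
             TRUE
           }
         })

# A well-formed object is created without complaint.
ok <- new("employeeDemo", name = "asha", age = 30)

# Wrong slot type: age is given as a character string.
wrong_type <- tryCatch(new("employeeDemo", name = "asha", age = "thirty"),
                       error = function(e) e)

# Right type, but rejected by the validity function.
negative_age <- tryCatch(new("employeeDemo", name = "asha", age = -5),
                         error = function(e) e)

print(conditionMessage(wrong_type))
print(conditionMessage(negative_age))
```

An S3 object offers no such protection, because any list can be given any class attribute.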

Reference Classes (RC)

Reference classes are like object-oriented classes in other programming languages. They have all the properties of S4 classes with some added structure and restrictions. We can create a reference class using the setRefClass() function.

Code:

employeeRC <- setRefClass("employeeRC",
fields=list(name="character",age="numeric",
department="character",emp_id="numeric"))
emp3 <- employeeRC(name="rahul",age=25,
department="human resources",emp_id=00243)
emp3

Output:

We can access and modify the field in a reference class using the $ symbol.

Code:

emp3$name

Code:

emp3$name <- "suman"
emp3$name

Output:

[1] "rahul"
[1] "suman"

We can define methods for reference classes using the methods argument of the setRefClass() function.

Code:

employeeRC <- setRefClass("employeeRC",
fields=list(name="character",age="numeric",
department="character",emp_id="numeric"),
methods=list(
show = function(){
 cat("name: ", name, "\n")
 cat("age: ", age,"\n") 
 cat("department: ", department,"\n")
 cat("Id: ", emp_id,"\n")
}
))
emp4 <- employeeRC(name="jimmy",age=26,department="RnD"
,emp_id=00342)
show(emp4)

Output:

name:  jimmy
age:  26
department:  RnD
Id:  342
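
Reference classes differ from S3 and S4 in one more way that is easy to demonstrate: their objects are mutable and are not copied on assignment. The sketch below uses a hypothetical class CounterDemo with a single field; the class, field, and method names are assumptions made for this illustration.

```r
library(methods)

# Hypothetical reference class with one field and one method.
Counter <- setRefClass("CounterDemo",
  fields = list(count = "numeric"),
  methods = list(
    increment = function() {
      count <<- count + 1   # <<- modifies the field of this object in place
      invisible(.self)
    }
  ))

a <- Counter$new(count = 0)
b <- a                 # b refers to the same object as a
c_copy <- a$copy()     # copy() creates an independent object

a$increment()

print(a$count)         # 1
print(b$count)         # 1, because a and b are the same object
print(c_copy$count)    # 0, because the copy was made before the change
```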

R Excel Files
https://shishirkant.com/r-excel-files/
Mon, 25 May 2020 14:09:54 +0000

xlsx is the file extension of a spreadsheet file format created by Microsoft for Microsoft Excel. Microsoft Excel is a widely used spreadsheet program that stores data in the .xls or .xlsx format. R can read data directly from these files with the help of Excel-specific packages such as XLConnect, xlsx, and gdata. We will use the xlsx package, which allows us not only to read data from an Excel file but also to write data to it.


Install xlsx Package

Our first task is to install the "xlsx" package with the install.packages() command. When we install the xlsx package, R also asks to install the additional packages on which it depends. These additional packages are installed with the same command and the required package name. The syntax of the install command is:

install.packages("package name")   

Example

install.packages("xlsx")  


Verifying and Loading of “xlsx” Package

In R, the grepl() and any() functions can be used together to verify that a package is installed: the combined call returns TRUE if the name is found in the output of installed.packages() and FALSE otherwise. Note that grepl() matches substrings, so the check also succeeds when "xlsx" appears only as part of another installed package's name.

For loading purposes, we use the library() function with the appropriate package name. This function also loads the additional packages that xlsx depends on.

Example

#Installing xlsx package  
install.packages("xlsx")  
# Verifying the package is installed.
any(grepl("xlsx",installed.packages()))  
# Loading the library into R workspace.  
library("xlsx")  

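
Because grepl() matches substrings anywhere in the installed.packages() matrix, a stricter check compares the package name with the row names of that matrix. The helper is_installed() below is a hypothetical name introduced for this sketch. It is tried on the stats package, which ships with every R installation, and on a name that cannot belong to any package.

```r
# Hypothetical helper: TRUE only if a package with exactly this name is installed.
is_installed <- function(pkg) {
  pkg %in% rownames(installed.packages())
}

has_stats <- is_installed("stats")               # stats is part of base R
has_bogus <- is_installed("not-a-real-package")  # not a valid package name

print(c(stats = has_stats, bogus = has_bogus))

# An alternative that avoids scanning all packages:
# requireNamespace("xlsx", quietly = TRUE)
```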

Creating an xlsx File

Once the xlsx package is loaded, we create an Excel file named employee.xlsx that holds the employee data.

Apart from this, we create a second table and name it employee_info.

Note: Both files are saved in the current working directory of the R session.

Reading the Excel File

As with a CSV file, we can read data from an Excel file. The xlsx package provides the read.xlsx() function, which takes the file name and the index of the sheet as its main arguments. This function returns the Excel data as a data frame. The syntax of the read.xlsx() function is:

read.xlsx(file, sheetIndex)  

Let’s see an example in which we read data from our employee.xlsx file.

Example

#Loading xlsx package
library("xlsx")  
# Reading the first worksheet in the file employee.xlsx.  
excel_data<- read.xlsx("employee.xlsx", sheetIndex = 1)  
print(excel_data)  


Writing data into Excel File

In R, we can also write data into an .xlsx file. The xlsx package provides the write.xlsx() function for this. The syntax of the write.xlsx() function is:

write.xlsx(data_frame, file_name, sheetName, col.names, row.names, append)

Here,

  • data_frame is the data that we want to write to the Excel file.
  • file_name is the name of the file to which we want to write the data.
  • sheetName is the name of the sheet that receives the data.
  • col.names and row.names are logical values specifying whether the column names/row names of the data frame are written to the file.
  • append is a logical value indicating whether the data should be appended to an existing file.

Let’s see an example to understand how write.xlsx() function works with its parameters.

Example

#Loading xlsx package  
library("xlsx")  
#Creating data frame  
emp.data<- data.frame( name = c("Raman","Rafia","Himanshu","jasmine","Yash"),    
salary = c(623.3,915.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11","2015-03-27")),
dept = c("Operations","IT","HR","IT","Finance"), 
stringsAsFactors = FALSE    
)    

# Writing the data frame to a new sheet (Sheet2) of employee.xlsx
write.xlsx(emp.data, file = "employee.xlsx", col.names=TRUE, row.names=TRUE,sheetName="Sheet2",append = TRUE)  

# Reading the first worksheet in the file employee.xlsx.  
excel_data<- read.xlsx("employee.xlsx", sheetIndex = 1)  
print(excel_data)  

# Reading the second worksheet in the file employee.xlsx.
excel_data<- read.xlsx("employee.xlsx", sheetIndex = 2)
print(excel_data)  

R CSV Files https://shishirkant.com/r-csv-files/?utm_source=rss&utm_medium=rss&utm_campaign=r-csv-files Mon, 25 May 2020 13:56:24 +0000 http://shishirkant.com/?p=742 R CSV Files

We shall learn R functions to:

  • Read CSV files
  • Process CSV files
  • Write CSV files

The CSV file used in the examples that follow is given below. Its first line is a header row with the column names.

sampleCSV.csv
name,age,income
Andrew,28,25.2
Mathew,23,10.5
Dany,49,11
Philip,29,21.9
John,38,44
Bing,23,11.5
Monica,29,45

You may refer to R Working Directory to change the working directory of the R workspace so that it points to the directory containing your input files (CSV files).

Read CSV Files

CSV Files are those files with values separated by commas in each row. Each row corresponds to a record or observation.

Syntax of function to read CSV File in R programming language :

read.csv(<filename>)

Example to read CSV File in R programming language :

r_readCSVexample.R - R Script File
#Example R program to read CSV File

#set working directory - the directory containing csv file
setwd("/home/arjun/workspace/r")

#read csv file
csvData = read.csv("sampleCSV.csv")

#print the data type of csvData
cat("CSV Data type : ",class(csvData), "\n\n")

print(csvData)

Terminal Output
$ Rscript r_readCSVexample.R
CSV Data type : data.frame
    name age income
1 Andrew  28   25.2
2 Mathew  23   10.5
3   Dany  49   11.0
4 Philip  29   21.9
5   John  38   44.0
6   Bing  23   11.5
7 Monica  29   45.0

Please observe that the data of the CSV file is read into an R data frame.
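
The role of the header line can be tried out without any file on disk. The following is a minimal sketch, assuming the CSV data starts with a header line; the inline text and the variable names csv_text, with_header and no_header are only illustrative stand-ins for sampleCSV.csv.

```r
# Inline stand-in for the first lines of a CSV file (assumed to begin with a header line)
csv_text <- "name,age,income
Andrew,28,25.2
Mathew,23,10.5
Dany,49,11"

# header = TRUE is the default for read.csv: the first line supplies the column names
with_header <- read.csv(text = csv_text)
print(names(with_header))
print(nrow(with_header))

# header = FALSE: the first line is treated as data and the columns are named V1, V2, V3
no_header <- read.csv(text = csv_text, header = FALSE)
print(names(no_header))
print(nrow(no_header))
```

If a CSV file has no header line, passing header = FALSE keeps the first record from being consumed as column names.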

Process data read from CSV Files

R reads a CSV file into an R data frame, so you may use all the R data frame functions to process the data.

Some of the examples are given below.

Example R program to retrieve rows based on a condition applied to column

From the csv file, we shall extract rows, whose income is equal to the maximum of income.

r_csv_analyze_data.R - R Script File
# Example R program to analyze CSV File

# set working directory - the directory containing csv file
setwd("/home/arjun/workspace/r")

# read csv file
celebrities = read.csv("sampleCSV.csv")

# retrieve rows based on a condition
maxSalariedCelebrities = subset(celebrities, income==max(income))

# print the result
print(maxSalariedCelebrities)

Terminal Output
$ Rscript r_csv_analyze_data.R
name age income
7 Monica 29 45
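
The same kind of filtering can be written in more than one way. The sketch below uses a small data frame built in place (a shortened, illustrative copy of the sample data) and shows logical indexing, combined conditions and column selection; the variable names top, young_high and names_only are assumptions made for this example.

```r
# Small illustrative data frame, built in place so that no file is needed
celebrities <- data.frame(
  name = c("Andrew", "Dany", "John", "Monica"),
  age = c(28, 49, 38, 29),
  income = c(25.2, 11, 44, 45)
)

# Logical indexing: same rows as subset(celebrities, income == max(income))
top <- celebrities[celebrities$income == max(celebrities$income), ]
print(top)

# Conditions can be combined with & (and) or | (or)
young_high <- subset(celebrities, age < 30 & income > 20)
print(young_high)

# The select argument keeps only the listed columns
names_only <- subset(celebrities, income > 20, select = c(name, income))
print(names_only)
```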

Write transformed data to CSV Files

Once we have extracted or transformed the data in a data frame, we may write the data frame to a CSV file.

Example R program to write an R data frame to CSV File

We shall use the above example, where we extracted rows with maximum income, and write the resulting rows to a CSV File.

r_csv_write_data.R - R Script File
# Example R program to write data to a CSV file

# set working directory - the directory containing csv file
setwd("/home/arjun/workspace/r")

# read csv file
celebrities = read.csv("sampleCSV.csv")

# retrieve rows based on a condition
maxSalariedCelebrities = subset(celebrities, income==max(income))

# write filtered data into a new CSV file.
write.csv(maxSalariedCelebrities,"result.csv")

Terminal Output
Rscript r_csv_write_data.R
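
By default, write.csv() also writes the row names of the data frame as an extra first column. The following sketch, which writes to a temporary directory (the path and the variable names are assumptions for illustration), shows how row.names = FALSE leaves that column out and how the file can be read back.

```r
# One-row data frame standing in for the filtered result
result <- data.frame(name = "Monica", age = 29, income = 45)

# Write to a temporary location so the working directory is left untouched
out_file <- file.path(tempdir(), "result.csv")
write.csv(result, out_file, row.names = FALSE)

# Inspect the raw lines of the file, then read it back into a data frame
print(readLines(out_file))
round_trip <- read.csv(out_file)
print(round_trip)
```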
R Data Reshaping https://shishirkant.com/r-data-reshaping/?utm_source=rss&utm_medium=rss&utm_campaign=r-data-reshaping Mon, 25 May 2020 13:41:22 +0000 http://shishirkant.com/?p=736 R data reshaping is all about changing the way in which data is organized into rows and columns. Most of the time, data processing in R takes its input as a data frame. Extracting data from the rows and columns of a data frame is easy, but there are situations when we need the data frame in a format different from the one in which we received it. R has many functions to split, merge and reshape the rows and columns of a data frame.

Why Reshape Data in R?

The data obtained from an experiment or study is often not in the layout that analytic functions expect. Usually, the data from a study has one or more columns that identify a row, followed by a number of columns that represent the values measured. The columns that identify the row can be thought of as the composite key of a database table.

Joining Columns and Rows in a Data Frame

We can combine vectors, matrices or data frames column-wise with the cbind() function and row-wise with the rbind() function.

1. cbind()

We use cbind() function to combine vector, matrix or data frame by columns.

cbind(x1,x2,…)
x1,x2: vector, matrix, data frames

2. rbind()

We use rbind() function to combine vector, matrix or data frame by rows.

rbind(x1,x2,…)
x1,x2: vector, matrix, data frames
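
A minimal sketch of both functions follows. The vectors and the names m, df, df2 and df3 are made up for illustration. Note that cbind() on plain vectors returns a matrix, so data.frame() is used here when columns of different types are needed.

```r
name <- c("Andrew", "Dany")
age <- c(28, 49)

# cbind() on plain vectors gives a matrix; mixing types coerces everything to character
m <- cbind(name, age)
print(m)

# Build a data frame first, then add a column with cbind()
df <- data.frame(name, age)
df2 <- cbind(df, income = c(25.2, 11))
print(df2)

# rbind() appends rows; the column names of both data frames must match
df3 <- rbind(df2, data.frame(name = "John", age = 38, income = 44))
print(df3)
```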

3. melt()

We use the melt() function in R to convert an object into a molten data frame. It takes data in wide format and stacks multiple columns into a single column. The melt() function has the following arguments –

melt(data, …, na.rm = FALSE, value.name = "value")
  • data – The input data that is to be melted.
  • … – Further arguments passed to or from other methods.
  • na.rm – Whether rows with missing values (NA) should be removed from the molten data.
  • value.name – The name of the column that stores the values.

In the following example, we use the mtcars data and apply the melt() function with the id variables ‘gear’ and ‘carb’ and the measured variables ‘mpg’, ‘cyl’, ‘disp’ and ‘hp’.

library(reshape2)  # provides both melt() and dcast()
library(datasets)
str(mtcars)
molted = melt(mtcars,id.vars=c("gear","carb"),measure.vars=c("mpg","cyl","disp","hp"))
str(molted)
molted[sample(nrow(molted),10),]


4. dcast()

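Once you have a molten dataset, it is ready to be cast, or reshaped. The dcast() function casts the molten data into a data frame laid out according to a formula, aggregating the values where needed. For example, to count the values for each combination of gear and carb: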

head(dcast(molted,gear+carb~variable, length))


There are three main arguments in dcast():

  • data – The molten data frame.
  • formula – The formula specifies how the data is to be cast. It has the form x_variable ~ y_variable, and each side can contain several variables joined with +.
  • fun.aggregate – The function used to aggregate the values when the casting formula maps several values to one cell (for example, length(), mean() or sum()).

What if we use only one of the variables gear or carb in dcast()?

dcast(molted,gear~variable,mean)


We can also transpose the result by swapping the two sides of the formula:

dcast(molted,variable~gear,mean)

A . (dot) in the formula stands for no variable, so the values are aggregated over everything on that side:

dcast(molted,variable~.,mean)

In the same way, we can aggregate all values by carb:

dcast(molted,carb~.,mean)

Margins, that is, row and column totals computed with the aggregation function, can be added by setting the margins argument to TRUE.

dcast(molted,variable~gear,mean,margins=TRUE)
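
To see the melt and cast steps on data small enough to check by eye, here is a hedged sketch using the reshape2 package. The data frame wide and the names long and back are invented for this example, and the code runs only when reshape2 is installed.

```r
# Runs only if the reshape2 package is available
if (requireNamespace("reshape2", quietly = TRUE)) {
  # Wide format: one row per id, one column per measurement
  wide <- data.frame(
    id = c("a", "b", "c"),
    height = c(1.5, 1.6, 1.7),
    weight = c(50, 60, 70)
  )

  # melt(): stack the measured columns into variable/value pairs
  long <- reshape2::melt(wide, id.vars = "id",
                         measure.vars = c("height", "weight"))
  print(long)

  # dcast(): spread the variable column back out, one column per variable
  back <- reshape2::dcast(long, id ~ variable, value.var = "value")
  print(back)
}
```

Because each id and variable pair occurs exactly once here, no aggregation function is needed and the cast recovers the wide layout.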

Merging Data Frames in R

To combine two data frames in R, we use the merge() function. By default, the data frames are merged on the columns whose names they have in common.

Adding Columns

To merge two data frames (datasets) horizontally, we use the merge() function. Mostly, we use it to join two data frames by one or more common key variables (by default, an inner join).

# merge two data frames by ID
total <- merge(dataframeA, dataframeB, by="ID")
# merge two data frames by ID and Country
total <- merge(dataframeA, dataframeB, by=c("ID","Country"))
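
The following sketch shows how the all, all.x and all.y arguments of merge() change the kind of join. The two small data frames and their values are illustrative assumptions, not data from the examples above.

```r
employees <- data.frame(ID = c(1, 2, 3),
                        name = c("Raman", "Rafia", "Yash"))
salaries <- data.frame(ID = c(2, 3, 4),
                       salary = c(915.2, 843.25, 700))

# Inner join (default): only the IDs present in both data frames
inner <- merge(employees, salaries, by = "ID")
print(inner)

# Left join: keep every row of employees, filling missing salaries with NA
left <- merge(employees, salaries, by = "ID", all.x = TRUE)
print(left)

# Full outer join: keep every ID from either data frame
full <- merge(employees, salaries, by = "ID", all = TRUE)
print(full)
```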
R – Strings https://shishirkant.com/r-strings/?utm_source=rss&utm_medium=rss&utm_campaign=r-strings Sun, 24 May 2020 17:38:12 +0000 http://shishirkant.com/?p=730 R Strings

Single quotes or double quotes are used to represent Strings in R programming language. We shall learn rules to write a String, embed special characters in it, and some of the common operations like concatenation of two strings, finding length of string, etc.

Rules to write R Strings

Following are the rules to write a valid string in R programming language :

  • Either single quotes or double quotes should be used to enclose a string value.
str1 = 'Hello'    str2 = "Hello"
  • Because the quote characters delimit a string, including one of them inside a string requires special care.
  • To include a single quote inside a string value, enclose the string in double quotes.
For Who's hello is that? write str1 = "Who's hello is that?"
  • To include a double quote inside a string value, enclose the string in single quotes.
For R says, "Hello". write str1 = 'R says, "Hello".'
  • A quote of the other kind may appear at the start or end of the string as is, while a quote of the same kind as the enclosing quotes must be escaped with a backslash.
For 'Hello." write str = "'Hello.\""
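
These rules can be checked with a short script. This is a minimal sketch; the variable names s1 to s4 are only for illustration.

```r
s1 <- "Who's hello is that?"   # single quote inside double quotes
s2 <- 'R says, "Hello".'       # double quotes inside single quotes
s3 <- "'Hello.\""              # the matching quote is escaped with a backslash
s4 <- 'It\'s'                  # escaping also works inside single quotes

# cat() shows the characters actually stored; print() shows the escaped form
cat(s1, s2, s3, s4, sep = "\n")
print(s3)

# The backslash is not part of the string, so it does not count towards the length
print(nchar(s3))
```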

How to find length of String in R

To find the length of a String in R, use nchar() function. Following is the syntax of nchar function.

nchar(x)

where x is of character data type

String Length in R

Example to find length of String in R

In this example, we will initialize a variable with a String value and find its length using nchar(string) function.

# Example R program to find length of string

str = 'Hello World!'

# find length using nchar(c)
length = nchar(str)

print (length)

Output
$ Rscript r_strings_length.R
[1] 12

Example to find length of a number in R

nchar automatically type casts number to character

Find length of String
# Example R program to find length of number

num = 523641

# find length using nchar(c)
length = nchar(num)

print (length)

Output
$ Rscript r_strings_length.R
[1] 6

You may try nchar(c) function for other data types to find the length.
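
As a small sketch of this, nchar() works element by element on a vector and coerces other types to character first. The vector words is an illustrative assumption.

```r
words <- c("R", "data", "")

# One length per element of the vector
print(nchar(words))

# A logical value is coerced to the string "TRUE" before counting
print(nchar(TRUE))

# length() counts the elements of the vector, not the characters
print(length(words))
```

The difference between nchar() and length() is a common source of confusion: length("Hello World!") is 1, because the vector holds a single string.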


Extract Substring from a String in R

Syntax of R substring function

substring(c, first, last)

where :

  • c is the string
  • first is the starting position of the substring (in the main string) to be extracted
  • last is the ending position of the substring (in the main string) to be extracted. last is optional; when it is omitted, the substring extends to the end of the string.

Example to extract Substring from a String in R

extract Substring from a String
# Example R program to find substring

str = 'Hello World! Welcome to learning R programming language'

# syntax: substring(c, first, last)
subStr = substring(str, 13, 23)

print (subStr)

Output
$ Rscript r_strings_substring.R
[1] " Welcome to"

Example to extract Substring from a String in R without last position provided

extract Substring from a String
# Example R program to find substring

str = 'Hello World! Welcome to learning R programming language'

# syntax: substring(c, first, last=nchar(c))
subStr = substring(str, 13)

print (subStr)

Output
$ Rscript r_strings_substring.R
[1] " Welcome to learning R programming language"

Concatenate Strings

To concatenate strings in R, use the paste() function. The syntax of the paste() function, which concatenates two or more strings, is:

paste(…, sep=" ", collapse=NULL)

where

  • paste is the function name
  • … is the input strings separated by commas
  • sep is a character string inserted between adjacent strings as a separator (a single space by default)
  • collapse is an optional character string used to join the elements of the result into a single string

Concatenate two or more Strings in R

While concatenating strings in R, we can choose the separator and the number of input strings. The following examples demonstrate different scenarios while concatenating strings in R using the paste() function.

Example: Concatenate Strings in R

In this example, we will use paste() function with default separator.

Concatenate two strings
#Example R program to concatenate two strings

str1 = 'Hello'
str2 = 'World!'

# concatenate two strings using paste function
result = paste(str1,str2)

print (result)

Output
$ Rscript r_strings_concatenate_two.R
[1] "Hello World!"

The default separator in the paste function is a space, " ". So ‘Hello’ and ‘World!’ are joined with a space in between them.

Example: Concatenate Strings in R with no separator

In this example, we shall use the paste() function with an empty separator, i.e., sep="".

Concatenate two strings with no separator
#Example R program to concatenate two strings

str1 = 'Hello'
str2 = 'World!'

#concatenate two strings using paste function
result = paste(str1,str2,sep="")

print (result)

Output
$ Rscript r_strings_concatenate_two.R
[1] "HelloWorld!"

Please observe that we have overridden the default separator in the paste function with "". So ‘Hello’ and ‘World!’ are joined with nothing in between them.

Example: Concatenate Multiple Strings

In this example, we shall use the paste() function with a hyphen as the separator, i.e., sep="-", and also pass multiple strings at once.

Concatenate more than two strings
# Example R program to concatenate multiple strings

str1 = 'Hello'
str2 = 'World!'
str3 = 'Join'
str4 = 'Me!'

# concatenate multiple strings using paste function
result = paste(str1,str2,str3,str4,sep="-")

print (result)

Output
$ Rscript r_strings_concatenate_multiple.R
[1] "Hello-World!-Join-Me!"
You may provide as many strings as required followed by the separator to the R paste() function to concatenate.
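
The examples above use only sep. The sketch below illustrates the collapse argument and the paste0() shortcut. The vector parts and the other variable names are assumptions made for this example.

```r
parts <- c("Hello", "World!", "Join", "Me!")

# collapse joins the elements of ONE vector into a single string
joined <- paste(parts, collapse = "-")
print(joined)

# sep works across arguments; shorter arguments are recycled
labels <- paste("item", 1:3, sep = "_")
print(labels)

# sep and collapse together: build the pieces, then join them
one_line <- paste("item", 1:3, sep = "_", collapse = ", ")
print(one_line)

# paste0() is shorthand for paste(..., sep = "")
tight <- paste0("Hello", "World!")
print(tight)
```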
R – List of Packages https://shishirkant.com/r-list-of-packages/?utm_source=rss&utm_medium=rss&utm_campaign=r-list-of-packages Sun, 24 May 2020 17:14:22 +0000 http://shishirkant.com/?p=727 Introduction to List of R Packages

A package in R programming language is a unit that provides required functionalities that can be utilized by loading it into the R environment. An R package is similar to a library in C, C++ or Java. Essentially, a package can bundle functions, constants, datasets, etc. that the user can apply to a particular problem. In R, a package is loaded using the library() function. If a package is not installed, it can be installed using the install.packages() function. Packages make seemingly difficult tasks easy through their ready-made functionalities.

What are R Packages?

There are many packages in R, and the selection of a package depends on its application. Though there are certain packages that are widely used due to the functionalities they provide, it isn’t the case that other packages are less important. Different packages have different purposes; some are related to statistical techniques, some pertain to visualizations, etc.

In the following section, we will look at some of the important packages in R:

1. Car

This package is the Companion to Applied Regression. It is a big package that provides various functionalities for statistical analysis. Loading this package into the R environment also brings in related packages such as MASS, stats, graphics, etc. Some of the functions in the package include Anova, avPlots, Boxplot, carPalette, densityPlot, infIndexPlot, linearHypothesis, logit, outlierTest, qqPlot, residualPlots, scatterplot, scatterplotMatrix, etc. The extensive capabilities of the package can be gauged from the number of functions it provides.

2. Corrplot

The package provides a graphical display of a correlation matrix and a confidence interval. The package also provides algorithms to perform matrix reordering. Numerous options include choosing requisite colors, text labels, color labels, layout, etc.  Various visualization methods or parameter methods in corrplot package are “circle”, “square”, “ellipse”, “number”, “shade”, “color”, and “pie”. The corrplot function incorporating various options gives a visually appealing representation of correlation amongst different variables, which, otherwise, in normal circumstances, like numbers, are difficult to interpret. Positive correlations are displayed in blue and negative correlations in red. The intensity of color and the size of the circle are proportional to the correlation coefficients.

3. DataExplorer

This package deals with automated data exploration and treatment. It provides an automated data exploration process meant for analytic tasks and predictive modeling. This is crucial as it enables the user to understand data and extract insights. Each variable in the analysis is scanned and analyzed by the package. Further, the package provides functionalities for visualization of these variables using typical graphical techniques. It also provides common data processing methods to treat and format data.

4. Gmodels

The gmodels package provides various tools in R for model fitting. It contains functions such as glh.test, which is used to test, print, or summarize a general linear hypothesis for a regression model. The function make.contrasts converts human-readable contrasts into the form that R requires for computation. The matrix returned by make.contrasts can be passed as the contrasts argument of model functions. The coefFrame function fits a model to each subgroup defined by its by argument, then returns a data frame with one row for each fit and one column for each parameter. The estimable function computes and tests contrasts and other estimable linear functions of model coefficients for lm, glm, etc. The function fit.contrast computes and tests arbitrary contrasts for regression objects.

5. Gplots

This package provides visualization functionalities through a variety of programming tools. The functions in the package work on the concept of calculation and plotting. The graphical capabilities of the package are demonstrated by functions such as bandplot, boxplot2, col2hex, ci2d, hist2d, textplot, sinkplot, balloonplot, plotCI, plotmeans, etc. These functions enable working with settings related to color, text, and other intricate graphical aspects of the visualization. They also deal with complex elements involved in statistics-based visualization, e.g. the lmplot2 and residplot functions, which enable the user to carry out detailed regression diagnosis through diagnostic plots. If multiple data series need to be plotted in the same region, but with separate axes, this is possible using the overplot function in the package.

6. Ggplot2

It is one of the best-known packages in R. It provides extensive visualization capabilities and can present the results of even complex statistical and mathematical techniques. The numerous functionalities provided by the package enable the analyst to derive insights from data visually. The package describes itself as “a system for declaratively creating graphics which is based on the Grammar of Graphics”. This grammar of graphics means that the user tells ggplot2 how variables are mapped to aesthetics and which graphical elements to use, and ggplot2 takes care of the details.

7. Lubridate

This R package makes it easier to work with dates and times. The lubridate package enables easy manipulation of date and time data. Its parse functions handle a wide variety of formats and separators, which simplifies the parsing process. One of the notable features is that the package provides functionalities to handle dates in different time zones.

8. Hmisc

Named Harrell Miscellaneous, the Hmisc package contains many functions that can be leveraged for data analysis, high-level graphics and utility operations. It also includes functions for computing sample size and power, importing and annotating datasets, imputing missing values, providing advanced table functionalities, clustering of variables, manipulation of the character string, conversion of R objects to HTML code, etc.

9. Lattice

The package offers a high-level data visualization system that was inspired by Trellis graphics. It emphasizes multivariate data. The powerful visualization capabilities of the package provide the needed graphical solution. Some of the notable help topics in the package are B_07_cloud, which covers 3D scatter plots and wireframe surface plots; D_level.colors, a function to compute false colors representing a numeric or categorical variable; B_06_levelplot, which covers level plots and contour plots; A_01_Lattice, which gives an overview of the Lattice graphics capabilities; B_09_tmd, which covers the Tukey Mean-Difference Plot; and B_11_oneway, which fits a one-way model. The package, thus, provides extensive functionalities for visualizations through various functions.

10. MatrixModels

This package allows modeling with sparse and dense ‘Matrix’ matrices. To accomplish this, it uses modular prediction and response module classes. Some of the functions provided by the package are lm.fit.sparse, which is a fitter function for sparse linear models; solveCoef, which solves for the coefficients or coefficient increment; model.Matrix, which constructs possibly sparse design or model matrices; and glm4, which fits generalized linear models.

11. Multcomp

The package allows for multiple comparisons of k groups in generalized linear models. A list of nine standard procedures, viz. Dunnett, Tukey, Sequen, AVE, Changepoint, Williams, Marcus, McDermott, and Tetrade, is provided to the user, and the user selects the comparisons based on the requirement. In addition to this, a free input interface is also provided for the contrast matrix, which allows for special comparisons. The noteworthy feature is that the comparisons themselves are not restricted to any particular design such as balanced or simple; rather, the programs are designed in such a manner that they suit multiple comparisons within the general linear model, which allows for covariates, correlated means, missing values, etc.

12. OpenMx

This package basically deals with extended structural equation modeling. It provides functionalities to create structural equation models. These models can be manipulated using programming. The models may be specified with matrices or paths such as LISREL or RAM. Some of the types of models include multiple groups, confirmatory factor, mixture distribution, categorical threshold, differential Fit functions, etc.

13. Plyr

It is a very important package that provides functionalities for data manipulation. It provides tools for splitting, applying and combining data. It comes with a set of tools that helps solve a common set of problems. E.g. sometimes we may need to break a big task into smaller tasks that are manageable, then we operate on each of the pieces and then finally, we put all the pieces back together.

14. Qcc

The package acquires significance owing to various quality analysis functionalities that it provides. It provides Shewhart quality control charts for continuous, attributes and counts data. Among other important charts are Cusum and EWMA charts and Operating characteristics curves. It also offers process capability analysis functionality. Pareto chart and cause-and-effect chart and multivariate control charts are useful tools that are provided by the package.

15. RandomForest

As the name suggests, this package is used to build random forest models. The package implements Breiman’s random forest algorithm, based on Breiman and Cutler’s original Fortran code. The algorithm is used for classification and regression. The package can also be used in unsupervised mode to assess proximities among data points.

16. Psych

It is a package meant for a special purpose. The package provides a procedure for psychological, psychometric, and personality research. Functions are primarily for multivariate analysis using various multivariate statistical techniques.

R – Packages https://shishirkant.com/r-packages/?utm_source=rss&utm_medium=rss&utm_campaign=r-packages Sun, 24 May 2020 17:11:27 +0000 http://shishirkant.com/?p=724 Introduction to R Packages

R packages are collections of predefined functions, bundled as a library, that promote reuse and reduce the amount of code needed in R programs. Many R packages are developed externally and can be installed and loaded into the R environment in order to use the functions they contain. Packages are distributed through CRAN, the R community's network of servers for the R programming language. Apart from the standard R packages, there are several external packages available for use in R programs. One of the popular graphics packages in R is ggplot2.

Where do we Find Packages?

Packages are available on the internet through different sources. However, there are certain trusted repositories from where we can download the packages.

Here are the two important repositories that are available online.

  • CRAN (Comprehensive R Archive Network): This is the official R repository, a network of FTP and web servers that contains the latest code and documentation of R. Before a package is published there, it goes through a series of tests to check that it adheres to CRAN policy.
  • GitHub: GitHub is another popular repository host, though it is not specific to R. The online community can share packages with other people, and it is used for version control as well. Packages on GitHub are openly shared and do not go through any review process.

List of Useful R Packages

There are several packages in R and can be downloaded from CRAN or GitHub. Below are the packages that can be used for specific purposes.

1. Loading the Data from External Sources

  • haven: Reads and writes data files from SAS, SPSS and Stata.
  • DBI: To establish communication between relational databases and R.
  • RSQLite: It is used to read data from SQLite relational databases.

2. Data Manipulation

  • dplyr: It is used for data manipulation such as subsetting and summarizing, provides shortcuts to access data, and can generate SQL queries when working with a database backend.
  • tidyr: It is used to convert data into tidy formats.
  • stringr: To manipulate character strings and regular expressions.
  • lubridate: To work with dates and times.

3. Data Visualization

  • rgl: To work on 3D visualizations.
  • ggvis: To create and build grammar of graphics.
  • googleVis: To use Google visualization tools in R.

4. Web-Based Packages

  • XML: To read and write XML documents in R.
  • httr: To work with HTTP connections.
  • jsonlite: To read and write JSON data.

Obtaining R Packages

We can list the packages that are available for installation by using the function below.

  • available.packages(): Returns a matrix describing the packages available in the configured repositories. CRAN hosts many thousands of packages.

CRAN has task views, which group packages under a particular topic.

Installing R Packages

We can install packages directly through the IDE or through commands. To install a package, we use the function below and specify the package name.

Syntax:

install.packages("package name")

Code:

install.packages("ggplot2")

The above code installs the ggplot2 package and its dependent packages if any.

We can install several packages at a time by specifying the package’s names under a character vector.

Syntax:

install.packages(c("package 1","package 2","package 3"))

Code:

install.packages(c("ggplot2","slidify","dplyr"))

Installing using R Studio

The advantage of using RStudio is its GUI (graphical user interface). We can choose the packages to install and their source.

Go to Tools -> Install Packages.

Loading R Packages

After installing the R package we need to load them into R, to start making use of the installed packages.

We use the below function to load the packages.

Syntax:

library(package name)

Note: The package name need not be given in quotes.

Code:

library(ggplot2)

Some packages display messages when loaded; others do not. We can see the packages that are currently attached with the help of the code below.

Code:

library(ggplot2)
search()

Output:

"package:lattice"    "package:ggplot2"    "package:makeslides"

"package:knitr"      "package:slidify"    "tools:rstudio"
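
Installing and loading are often combined so that a script installs a package only when it is missing. The following is a minimal sketch of that pattern. The package name "stats" is a stand-in chosen because it ships with R; replace it with the package you need, such as "ggplot2".

```r
# Stand-in package name; "stats" ships with R, so nothing is downloaded here
pkg <- "stats"

# requireNamespace() returns FALSE when the package is not installed
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE tells library() that pkg holds the name as a string
library(pkg, character.only = TRUE)

# search() lists the attached packages as "package:<name>" entries
is_attached <- paste0("package:", pkg) %in% search()
print(is_attached)
```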

Creating Your own Package

Before we create our own package, we should keep the following checklist in mind.

  • Organizing the code is one of the most important things while writing code in the package. We lose half the time searching for the code location instead of improving the code. Put all the files in a folder that is easily accessible.
  • Documenting the code helps you understand the purpose of code. When we don’t revisit the code often, we forget why we have written the code in a certain way. It can also help people to understand your code better when shared with them.
  • Sharing the scripts through email has become archaic. The easy way is to upload your code and distribute it on GitHub. It is possible you get feedback that can help you enhance the code.

To create your own package, we have to install the devtools package.

Code:

install.packages("devtools")

To help with the documentation we can use the below package.

Code:

install.packages("roxygen2")

After installing the devtools package, you can create your own package.

Code:

devtools::create ("packagename")

In place of "packagename", you can give any name you wish. You can now add your functions under this package.

A common convention is to save each function in its own file, named after the function.

Example:

devtools::create("firstpackage")
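
As a sketch of what one function file inside the package might contain, the example below documents a function with roxygen2 comments. The file name R/add_numbers.R and the function add_numbers are hypothetical and serve only as an illustration.

```r
# Hypothetical contents of the file R/add_numbers.R inside the package

#' Add two numbers
#'
#' @param x A numeric value.
#' @param y A numeric value.
#' @return The sum of x and y.
#' @export
add_numbers <- function(x, y) {
  x + y
}

# Quick check of the function outside the package
print(add_numbers(2, 3))
```

Lines starting with #' are roxygen2 comments. Running devtools::document() in the package directory turns them into the help files of the package.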

Distributing Package

You can distribute your package on GitHub: upload the package source to a GitHub repository, and other users can then install it with the devtools package.

The code below installs a package from GitHub.

Code:

devtools::install_github("yourusername/firstpackage")

Replace yourusername with your GitHub username and firstpackage with the name of the package you created above.

Here are the Required Files for a Package

  • Functions
  • Documentation
  • Data