R – List of Packages
Shishir Kant Singh (https://shishirkant.com), Sun, 24 May 2020

Introduction to List of R Packages

A package in the R programming language is a unit of functionality that you can use after loading it into the R environment. An R package is similar to a library in C, C++ or Java: it can bundle many components, such as functions, constants, etc., that the user can apply to a particular problem. In R, a package is loaded with the library() function. If a package is not installed, it can be installed with the install.packages() function. Packages make seemingly difficult tasks easy through their ready-made functionality.

What are R Packages?

There are many packages in R, and which one to choose depends on the application. Some packages are widely used because of the functionality they provide, but that does not make other packages less important. Different packages serve different purposes; some implement statistical techniques, some focus on visualization, and so on.

In the following section, we will look at some of the important packages in R:

1. car

The car package (Companion to Applied Regression) is a large package that provides many functions for statistical analysis, particularly regression. Loading it also loads the packages it builds on, such as MASS, stats and graphics. Its functions include Anova, avPlots, Boxplot, carPalette, densityPlot, infIndexPlot, linearHypothesis, logit, outlierTest, qqPlot, residualPlots, scatterplot and scatterplotMatrix. The sheer number of functions shows how extensive the package is.
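The following minimal sketch shows two of the car functions named above, applied to R's built-in mtcars data. It assumes car is installed from CRAN. The requireNamespace() guard skips the car calls if it is not. The model formula is an arbitrary illustration.

```r
# Fit an ordinary linear model with base R (the stats package).
fit <- lm(mpg ~ wt + hp, data = mtcars)

if (requireNamespace("car", quietly = TRUE)) {
  # Type II ANOVA table: each term is tested after the other main effects.
  anova_tab <- car::Anova(fit, type = "II")
  print(anova_tab)

  # Bonferroni-adjusted test for the most extreme studentized residual.
  print(car::outlierTest(fit))
}
```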

2. corrplot

The corrplot package provides a graphical display of a correlation matrix or a confidence interval, along with algorithms for reordering the matrix. Options let you choose colors, text labels, color labels, layout and more. The method argument of the corrplot() function selects the visualization: "circle", "square", "ellipse", "number", "shade", "color" or "pie". With these options, corrplot() gives a visually appealing picture of the correlations among many variables, which are otherwise hard to interpret from a table of numbers. By default, positive correlations are displayed in blue and negative correlations in red. The color intensity and circle size are proportional to the magnitude of the correlation coefficients.
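Here is a short sketch, assuming corrplot is installed. It computes a correlation matrix with base R's cor() and draws it with corrplot(). The column selection is an arbitrary choice for illustration.

```r
# Correlation matrix of a few numeric columns from the built-in mtcars data.
M <- cor(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")])

if (requireNamespace("corrplot", quietly = TRUE)) {
  # method can be "circle", "square", "ellipse", "number", "shade", "color" or "pie";
  # order = "hclust" reorders the variables by hierarchical clustering.
  corrplot::corrplot(M, method = "circle", order = "hclust")
}
```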

3. DataExplorer

This package automates data exploration and treatment for analytic tasks and predictive modeling. Automated exploration matters because it helps the user understand the data and extract insights. The package scans and analyzes each variable and visualizes the variables with typical graphical techniques. It also provides common data processing methods to treat and format data.

4. gmodels

The gmodels package provides various R programming tools for model fitting. Its glh.test function tests, prints or summarizes a general linear hypothesis for a regression model. The make.contrasts function converts human-readable contrasts into the form that R requires for computation. The matrix it returns can be passed as the value of the contrasts argument of model functions. The coefFrame function fits a model to each subgroup defined by its by argument. It then returns a data frame with one row for each fit and one column for each parameter. The estimable function computes and tests contrasts and other estimable linear functions of model coefficients for lm, glm and other model objects. The fit.contrast function computes and tests arbitrary contrasts for regression objects.

5. gplots

This package provides visualization functionality through a variety of programming tools for plotting data. Its functions handle both the calculations behind a plot and the plotting itself. Examples include bandplot, boxplot2, col2hex, ci2d, hist2d, textplot, sinkplot, balloonplot, plotCI and plotmeans. These functions give control over color, text and other detailed aspects of a visualization. They also handle statistics-based visualization. For example, lmplot2 and residplot let the user carry out detailed regression diagnostics through diagnostic plots. To plot multiple data series in the same region but with separate axes, use the overplot function.

6. ggplot2

ggplot2 is one of the best-known R packages. It provides extensive visualization capabilities and can present the results of even complex statistical and mathematical techniques, helping the analyst derive insights from data. The package describes itself as "a system for declaratively creating graphics, based on The Grammar of Graphics". In practice, the user supplies the data and tells ggplot2 how to map variables to aesthetics and which graphical primitives to use. ggplot2 then takes care of the details.
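The mapping idea can be sketched as follows, assuming ggplot2 is installed. Each aes() entry maps a variable to an aesthetic, and each geom_*() layer names a graphical primitive. The columns chosen from mtcars are only illustrative.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Map variables to aesthetics: wt to x, mpg to y, number of cylinders to colour.
  p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
    geom_point() +                            # graphical primitive: points
    geom_smooth(method = "lm", se = FALSE) +  # one linear fit per colour group
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")
  print(p)
}
```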

7. lubridate

This R package makes it easier to work with dates and times. Its parsing functions handle a wide variety of formats and separators, which simplifies turning raw strings and numbers into date-time objects. One notable feature is its support for handling dates and times in different time zones.
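Here is a minimal sketch of the parsing and time-zone features. It assumes lubridate is installed and that the system time-zone database includes Asia/Kolkata. The dates are arbitrary examples.

```r
if (requireNamespace("lubridate", quietly = TRUE)) {
  library(lubridate)
  # ymd() parses year-month-day dates regardless of the separator used.
  dates <- ymd(c("2020-05-24", "2020/05/24", "20200524"))
  print(dates)

  # Parse a timestamp as UTC, then view the same instant in another time zone.
  t_utc <- ymd_hms("2020-05-24 17:14:26", tz = "UTC")
  t_ist <- with_tz(t_utc, tzone = "Asia/Kolkata")
  print(t_ist)
}
```

with_tz() changes only how the instant is displayed. Its companion force_tz() keeps the clock time and changes the instant instead.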

8. Hmisc

Hmisc (Harrell Miscellaneous) contains many functions for data analysis, high-level graphics and utility operations. It also includes functions for computing sample size and power, importing and annotating datasets, imputing missing values, building advanced tables, clustering variables, manipulating character strings, converting R objects to HTML code, and more.

9. lattice

The package offers a high-level data visualization system inspired by Trellis graphics, with an emphasis on multivariate data. It covers typical graphics needs and is flexible enough for most nonstandard requirements. Its notable functions include:

  • cloud(), which produces 3D scatter plots, and wireframe(), which produces surface plots
  • level.colors(), which computes false colors representing a numeric or categorical variable
  • levelplot() and contourplot(), which generate level plots and contour plots
  • tmd(), which generates Tukey mean-difference plots
  • oneway(), which fits a one-way model

The overview help page, Lattice, describes the package's Trellis graphics capabilities as a whole.
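lattice ships with standard R installations as a recommended package, so this sketch should run without extra installation. It shows Trellis-style conditioning with xyplot() and a levelplot() of a correlation matrix. The data choices are illustrative.

```r
library(lattice)  # recommended package, shipped with standard R installations

# Trellis-style conditioning: one panel of mpg against wt per number of cylinders.
p1 <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
             xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# levelplot() draws a false-colour image of a matrix, here a correlation matrix.
p2 <- levelplot(cor(mtcars[, c("mpg", "disp", "hp", "wt")]))

# Trellis objects are only drawn when printed.
print(p1)
print(p2)
```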

10. MatrixModels

This package supports modeling with sparse and dense 'Matrix' matrices, using modular prediction and response module classes. Its functions include:

  • lm.fit.sparse, a fitter function for sparse linear models
  • solveCoef, which solves for the coefficients or coefficient increments
  • model.Matrix, which constructs possibly sparse design or model matrices
  • glm4, which fits generalized linear models

11. multcomp

The package performs multiple comparisons of k groups in general linear models. It offers nine standard procedures: Dunnett, Tukey, Sequen, AVE, Changepoint, Williams, Marcus, McDermott and Tetrade. The user selects the comparisons that fit the requirement. A free input interface for the contrast matrix also allows special comparisons. Notably, the comparisons are not restricted to balanced or simple designs. Instead, the programs are built for multiple comparisons within the general linear model, which allows for covariates, correlated means, missing values, etc.
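The following sketch of a Tukey-type comparison assumes multcomp is installed. It compares the three tension levels in R's built-in warpbreaks data. glht() and mcp() are multcomp functions; the model itself is only an illustration.

```r
# One-way ANOVA on the built-in warpbreaks data: breaks by wool tension (L, M, H).
amod <- aov(breaks ~ tension, data = warpbreaks)

if (requireNamespace("multcomp", quietly = TRUE)) {
  # All pairwise (Tukey) comparisons of the tension levels.
  tk <- multcomp::glht(amod, linfct = multcomp::mcp(tension = "Tukey"))
  print(summary(tk))  # simultaneous tests with adjusted p-values
}
```

Replacing "Tukey" with "Dunnett" would compare each level only against the first one.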

12. OpenMx

This package is for extended structural equation modeling. It provides functions to create structural equation models, which can then be manipulated programmatically. Models may be specified with matrices or with paths (for example, in LISREL or RAM notation). Supported model types include multiple-group, confirmatory factor, mixture distribution and categorical threshold models, as well as models with differential fit functions.

13. plyr

plyr provides tools for data manipulation, specifically for splitting, applying and combining data. These tools address a common pattern: you break a big task into smaller, manageable pieces, operate on each piece, and then put all the pieces back together.
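The split-apply-combine pattern can be sketched in base R as follows. plyr wraps these three steps into single functions such as ddply(). The commented line at the end shows a plyr equivalent, assuming plyr is installed.

```r
# Split: break mtcars into one data frame per number of cylinders.
pieces <- split(mtcars, mtcars$cyl)

# Apply: summarise each piece independently.
summaries <- lapply(pieces, function(d) {
  data.frame(cyl = d$cyl[1], n = nrow(d), mean_mpg = mean(d$mpg))
})

# Combine: bind the per-group results back into a single data frame.
result <- do.call(rbind, summaries)
rownames(result) <- NULL
print(result)

# With plyr installed, the same three steps collapse into one call:
# plyr::ddply(mtcars, "cyl", plyr::summarise, n = length(mpg), mean_mpg = mean(mpg))
```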

14. qcc

The qcc package is valued for its quality-control functionality. It provides Shewhart quality control charts for continuous, attribute and count data, as well as CUSUM and EWMA charts and operating characteristic curves. It also offers process capability analysis, Pareto charts, cause-and-effect charts and multivariate control charts.

15. randomForest

As the name suggests, this package builds random forest models. It implements Breiman's random forest algorithm for classification and regression, based on Breiman and Cutler's original Fortran code. It can also be used in unsupervised mode to assess proximities among data points.
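The following minimal sketch assumes randomForest is installed and covers both uses mentioned above. It runs a supervised classification on the built-in iris data and an unsupervised fit that returns a proximity matrix. The ntree value and the seed are arbitrary choices.

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(42)  # random forests are random; fix the seed for reproducibility

  # Classification: predict iris species from the four flower measurements.
  rf <- randomForest::randomForest(Species ~ ., data = iris, ntree = 200)
  print(rf)  # shows the out-of-bag (OOB) error estimate and confusion matrix

  # Unsupervised mode: omit the response and request the proximity matrix.
  rf_u <- randomForest::randomForest(iris[, 1:4], proximity = TRUE, ntree = 200)
  print(dim(rf_u$proximity))  # one row and one column per observation
}
```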

16. psych

psych is a special-purpose package that provides procedures for psychological, psychometric and personality research. Its functions are primarily for multivariate analysis using various multivariate statistical techniques.

R – Packages
Sun, 24 May 2020

Introduction to R Packages

R packages are collections of predefined functions, bundled as libraries, that make code reusable and let R programs be written with less code. Many R packages are developed externally and can be loaded into the R environment to use the functions they provide. R packages are distributed, along with the R language itself, through CRAN, a network managed by the R community. Apart from the standard packages that ship with R, many external packages are available. One of the popular graphics packages in R is ggplot2.

Where do we Find Packages?

Packages are available on the internet through different sources. However, there are certain trusted repositories from where we can download the packages.

Here are the two important repositories that are available online.

  • CRAN (Comprehensive R Archive Network): the official network of FTP and web servers that stores up-to-date versions of R's code and documentation. Before a package is published on CRAN, it must pass a series of checks that enforce the CRAN policy.
  • GitHub: a popular code hosting service that is not specific to R. Developers can share their packages with others there, and it is also used for version control. Unlike CRAN, GitHub has no review process for packages.

List of Useful R Packages

Many R packages can be downloaded from CRAN or GitHub. Below are packages grouped by the purpose they serve.

1. Loading the Data from External Sources

  • haven: reads and writes data from SAS (as well as SPSS and Stata).
  • DBI: provides a common interface for communication between R and relational databases.
  • RSQLite: embeds the SQLite database engine in R, so R can read data from and write data to SQLite databases.

2. Data Manipulation

  • dplyr: used for data manipulation such as subsetting; it provides shortcuts for accessing data and can generate SQL queries.
  • tidyr: converts data into tidy formats.
  • stringr: manipulates string expressions and character strings.
  • lubridate: works with dates and times.

3. Data Visualization

  • rgl: creates 3D visualizations.
  • ggvis: builds interactive graphics based on a grammar of graphics.
  • googleVis: uses Google's visualization tools (Google Charts) from R.

4. Web-Based Packages

  • XML: reads and writes XML documents in R.
  • httr: works with HTTP connections.
  • jsonlite: reads and writes JSON data.

Obtaining R Packages

We can list the packages available for installation from CRAN with the function below.

  • available.packages(): returns a matrix with one row per package available from the configured CRAN mirror. CRAN hosts well over 15,000 packages, and the number keeps growing.

CRAN also has task views, which group packages by topic.

Installing R Packages

We can install packages through an IDE or with commands. To install a package from the console, call the function below with the package name.

Syntax:

install.packages()

Code:

install.packages("ggplot2")

The code above installs the ggplot2 package along with any packages it depends on.

We can install several packages at once by passing their names as a character vector.

Syntax:

install.packages(c("package1", "package2", "package3"))

Code:

install.packages(c("ggplot2", "tidyr", "dplyr"))
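A script often needs to install only the packages that are missing. A small wrapper around installed.packages() and install.packages() handles this. The helper names find_missing() and install_missing() are hypothetical and not part of any package.

```r
# Hypothetical helper: names of packages in pkgs that are not installed yet.
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only the missing packages.
install_missing <- function(pkgs) {
  missing <- find_missing(pkgs)
  if (length(missing) > 0) {
    install.packages(missing)  # by default also installs required dependencies
  }
  invisible(missing)  # report which packages needed installing
}

# Usage: install_missing(c("ggplot2", "tidyr", "dplyr"))
```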

Installing using R Studio

The advantage of RStudio is its GUI (graphical user interface): we can choose which packages to install and where to install them from.

Go to Tools -> Install Packages.

Loading R Packages

After installing an R package, we need to load it into R before we can use it.

We use the function below to load a package.

Syntax:

library(packagename)

Note: the package name need not be given in quotes.

Code:

library(ggplot2)

Some packages display messages when they are loaded; others don't. To see which packages are currently attached to the session, use the search() function.

Code:

library(ggplot2)
search()

Output:

“package:lattice”    “package:ggplot2”    “package:makeslides”

“package:knitr”      “package:slidify”    “tools:rstudio”
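The sketch below shows a few variations on loading, using the tools package. tools is part of base R, so nothing needs installing. The package name notARealPackage123 is a deliberately nonexistent placeholder.

```r
# library() attaches a package and stops with an error if it is not installed.
library(tools)

# With character.only = TRUE, the package name can come from a variable.
pkg <- "tools"
library(pkg, character.only = TRUE)

# require() returns TRUE/FALSE instead of failing, which suits conditional code.
ok <- suppressWarnings(
  require("notARealPackage123", quietly = TRUE, character.only = TRUE)
)
print(ok)

# search() lists the attached packages; check that tools is among them.
print("package:tools" %in% search())
```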

Creating Your own Package

Before creating your own package, keep the following checklist in mind.

  • Organizing the code is one of the most important parts of writing a package. Without it, you can spend half your time searching for code instead of improving it. Put all the files in a folder that is easy to access.
  • Documenting the code records why it exists. If you don't revisit code often, you forget why you wrote it a certain way. Documentation also helps other people understand your code when you share it with them.
  • Sharing scripts by email is outdated. An easier way is to upload your code to GitHub and distribute it from there, where you may also get feedback that helps you improve it.

To create your own package with the approach shown here, install the devtools package.

Code:

install.packages("devtools")

The roxygen2 package helps generate documentation from comments in your code.

Code:

install.packages("roxygen2")
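roxygen2 reads specially formatted #' comments placed directly above a function. Here is a minimal sketch with a hypothetical greet() function. The tags shown (@param, @return, @examples, @export) are standard roxygen2 tags.

```r
#' Greet someone by name
#'
#' @param name A character string with the name to greet.
#' @return A character string containing the greeting.
#' @examples
#' greet("R")
#' @export
greet <- function(name) {
  paste0("Hello, ", name, "!")
}

# Running devtools::document() inside the package folder turns the #' comments
# above into an Rd help file in man/ and an export() entry in NAMESPACE.
```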

After installing devtools, you can create your own package.

Code:

devtools::create("packagename")

Replace "packagename" with the name you want. You can then add your functions to the package as .R files in its R/ directory.

It is common to give each file the same name as the function it contains.

Code:

devtools::create("firstpackage")
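To see the resulting folder structure without installing devtools, you can use base R's utils::package.skeleton(), which produces a similar skeleton. This is a swapped-in base R alternative, not what devtools::create() does. The function hello() and the temporary folder location are placeholders.

```r
# A hypothetical function to put into the new package.
hello <- function(name) paste("Hello,", name)

# Build the skeleton in a temporary folder so nothing lands in the working directory.
pkg_dir <- file.path(tempdir(), "firstpackage")
unlink(pkg_dir, recursive = TRUE)  # start fresh if an old copy is left over

utils::package.skeleton(name = "firstpackage", list = "hello",
                        path = tempdir(), environment = environment())

# The skeleton contains DESCRIPTION, NAMESPACE, R/ and man/.
print(list.files(pkg_dir, recursive = TRUE))
```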

Distributing Package

To distribute your package, push its folder to a GitHub repository. Other users can then install it directly from GitHub with the devtools package.

They use the code below to install it.

Code:

devtools::install_github("yourusername/firstpackage")

Replace yourusername with your GitHub username and firstpackage with the name of the package you created above.

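Required Files for a Package

  • Functions: R code files in the R/ directory
  • Documentation: help files in the man/ directory
  • Data: datasets in the data/ directory (optional)
  • DESCRIPTION and NAMESPACE files, which hold the package metadata and list the exported functions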