**Data Types**

Unlike SAS and SPSS, R has several different data types (structures) including vectors, factors, data frames, matrices, arrays, and lists. The data frame is most like a dataset in SAS.

**1. Vectors**

A vector is an object that contains a set of values called its elements.

**Numeric vector**

```r
x <- c(1,2,3,4,5,6)
```

*The operator `<-` assigns a value to a name; for ordinary assignment it is equivalent to the `=` sign.*
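The two operators can be compared directly. The sketch below builds the same numeric vector once with each operator. The names `x` and `y` are arbitrary. Inside a function call, `=` names an argument instead of assigning, so the two operators are interchangeable only for top-level assignment like this.

```r
# Assign the same values with each operator
x <- c(1, 2, 3, 4, 5, 6)
y = c(1, 2, 3, 4, 5, 6)

# Both objects hold identical values
identical(x, y)
# [1] TRUE
```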

**Character vector**

```r
State <- c("DL", "MU", "NY", "DL", "NY", "MU")
```

*To calculate frequencies for the State vector, you can use the table function.*
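This minimal sketch shows `table` counting how often each state code appears in the State vector defined above.

```r
State <- c("DL", "MU", "NY", "DL", "NY", "MU")

# Count how many times each state code appears
freq <- table(State)
freq
# State
# DL MU NY
#  2  2  2
```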

*To calculate the mean of a vector, you can use the mean function.*
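This sketch applies `mean` to the numeric vector from the start of this section.

```r
x <- c(1, 2, 3, 4, 5, 6)

# Arithmetic mean: (1 + 2 + ... + 6) / 6
mean(x)
# [1] 3.5
```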

*If a vector contains an NA (not available) value, the mean function returns NA. To calculate the mean excluding NA values, include the* **na.rm = TRUE** *parameter in the mean function.*
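The vector below is a hypothetical example with one missing value. It shows how `NA` propagates through `mean` and how `na.rm = TRUE` removes it before the calculation.

```r
# Hypothetical vector with one missing value
x <- c(1, 2, 3, NA, 5)

mean(x)                  # the NA propagates, so the result is NA
# [1] NA
mean(x, na.rm = TRUE)    # NA dropped first: (1 + 2 + 3 + 5) / 4
# [1] 2.75
```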

You can use subscripts to refer to elements of a vector.
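Here are a few common subscript forms: positive positions, ranges, negative positions that exclude elements, and logical conditions. They are applied to the two vectors defined earlier in this section.

```r
x <- c(1, 2, 3, 4, 5, 6)
State <- c("DL", "MU", "NY", "DL", "NY", "MU")

x[1]                   # first element
# [1] 1
x[2:4]                 # elements 2 through 4
# [1] 2 3 4
x[-1]                  # every element except the first
# [1] 2 3 4 5 6
x[x > 3]               # logical subscript: elements greater than 3
# [1] 4 5 6
State[State == "DL"]   # works the same way on a character vector
# [1] "DL" "DL"
```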

**Convert a column “x” to numeric**

```r
data$x = as.numeric(data$x)
```
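Here is a minimal sketch that assumes a hypothetical data frame `data` whose column `x` holds numbers stored as text. There is one caveat: if `x` is a factor, `as.numeric()` returns the internal level codes instead of the original values. In that case, convert through `as.character()` first.

```r
# Hypothetical data frame: column x holds numbers stored as character strings
data <- data.frame(x = c("30", "10", "20"), stringsAsFactors = FALSE)
data$x = as.numeric(data$x)
data$x
# [1] 30 10 20

# Pitfall: as.numeric() on a factor returns level codes, not the values
f <- factor(c("30", "10", "20"))   # levels are sorted: "10", "20", "30"
as.numeric(f)
# [1] 3 1 2
as.numeric(as.character(f))        # go through character to keep the values
# [1] 30 10 20
```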

**2. Factors**

R has a special data structure to store **categorical variables**. Making a variable a factor tells R that it is nominal or ordinal.
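This sketch contrasts the two kinds of factor using hypothetical data. A plain factor treats categories as unordered (nominal). Setting `ordered = TRUE` makes the order of `levels` meaningful (ordinal), so comparisons such as `<` work.

```r
# Nominal factor: categories have no order (hypothetical data)
gender <- factor(c("Male", "Female", "Female", "Male"))
levels(gender)
# [1] "Female" "Male"

# Ordinal factor: levels are given in their natural order
size <- factor(c("low", "high", "medium", "low"),
               levels = c("low", "medium", "high"),
               ordered = TRUE)
size[1] < size[2]   # "low" ranks below "high"
# [1] TRUE
```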

In its simplest form, the factor function takes only the vector to convert. In its ideal form, it also specifies the values and their labels.

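The factor function has three main parameters:

- Vector name (`x`): the vector to convert
- Values (`levels`, optional): the values the variable can take
- Value labels (`labels`, optional): a label for each value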
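This minimal sketch assumes a hypothetical vector `gender` coded as 1 and 2. The simplest form passes only the vector. The fuller form also supplies `levels` (the values) and `labels` (their value labels), so each code is shown as readable text.

```r
# Hypothetical numeric codes: 1 = Male, 2 = Female
gender <- c(1, 2, 2, 1, 2)

# Simplest form: levels are the sorted unique values
f1 <- factor(gender)
levels(f1)
# [1] "1" "2"

# Fuller form: map each value to a label
f2 <- factor(gender, levels = c(1, 2), labels = c("Male", "Female"))
levels(f2)
# [1] "Male"   "Female"
table(f2)   # frequencies are reported by label
```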

**Convert a column “x” to factor**

```r
data$x = as.factor(data$x)
```
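This sketch uses a hypothetical data frame `data` with a character column `x`. After conversion, the column's class is `factor` and its levels are the sorted distinct values.

```r
# Hypothetical data frame with a character column x
data <- data.frame(x = c("b", "a", "b", "c"), stringsAsFactors = FALSE)
data$x = as.factor(data$x)

class(data$x)
# [1] "factor"
levels(data$x)
# [1] "a" "b" "c"
```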

**3. Matrices**

All columns in a matrix must have the same mode (numeric, character, etc.) and the same length.

The **cbind** function joins columns together into a matrix.

The numbers in brackets on the left side of a printed matrix are the row numbers. The form [1, ] means row number one, and the blank after the comma means that R has displayed all the columns.
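This sketch joins two short vectors with `cbind` and prints the result, so the bracketed row labels described above are visible. The vectors are hypothetical; their names become the column names.

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)

# Join the two vectors as the columns of a 3 x 2 matrix
m <- cbind(x, y)
m
#      x y
# [1,] 1 4
# [2,] 2 5
# [3,] 3 6

m[1, ]   # row 1, all columns
```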

To see the dimensions of a matrix, use the **dim** function.
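For a matrix, `dim` returns the number of rows followed by the number of columns. Here it is applied to a small hypothetical matrix.

```r
m <- cbind(x = c(1, 2, 3), y = c(4, 5, 6))

dim(m)   # rows, then columns
# [1] 3 2
```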

To compute the correlations between the columns of a matrix, use the **cor** function.
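This minimal sketch uses hypothetical data. `cor` returns a square matrix of pairwise correlations between columns (Pearson by default). The diagonal is 1, and two exactly proportional columns also correlate at 1.

```r
# Hypothetical data: three numeric columns
m <- cbind(a = c(1, 2, 3, 4, 5),
           b = c(2, 4, 6, 8, 10),   # b = 2 * a, a perfect linear relation
           c = c(5, 3, 4, 1, 2))

r <- cor(m)    # 3 x 3 correlation matrix
round(r, 2)
```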

You can use subscripts to identify rows or columns.
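Matrix subscripts take the form `[row, column]`, and leaving either side blank selects all of it. This sketch uses a small hypothetical matrix.

```r
m <- cbind(x = c(1, 2, 3), y = c(4, 5, 6))

m[2, ]      # second row, all columns
m[, 2]      # second column, all rows
m[, "x"]    # a column selected by name
m[3, 2]     # a single element: row 3, column 2
# [1] 6
m[1:2, ]    # first two rows, all columns
```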

**4. Arrays**

Arrays are similar to matrices but can have more than two dimensions.
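This sketch builds a three-dimensional array with `array` and indexes it with one subscript per dimension. Values fill the first dimension fastest, as they do for matrices.

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers
a <- array(1:24, dim = c(2, 3, 4))

dim(a)
# [1] 2 3 4
a[1, 2, 3]   # row 1, column 2, layer 3
# [1] 15
a[, , 1]     # the first layer is an ordinary 2 x 3 matrix
```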

**5. Data Frames**

A data frame is similar to SAS and SPSS datasets. It contains variables (columns) and records (rows).

It is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.).

The **data.frame** function is used to combine variables (vectors and factors) into a data frame.
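This sketch combines three hypothetical vectors of different modes (numeric, character, and factor) into one data frame. Each vector becomes a column named after it.

```r
# Hypothetical variables of different modes
id     <- c(1, 2, 3)
State  <- c("DL", "MU", "NY")
Gender <- factor(c("Male", "Female", "Male"))

# Combine them column-wise into a data frame
df <- data.frame(id, State, Gender)
str(df)   # one line per column, showing its type
```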

**6. Lists**

A list can store a variety of objects, such as vectors, matrices, data frames, and even other lists.

You can use subscripts to select a specific component of the list.
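This sketch uses a hypothetical list holding components of different types. Double brackets `[[ ]]` and `$` return the component itself. Single brackets `[ ]` return a smaller list that contains it.

```r
# A list holding objects of different types (hypothetical contents)
lst <- list(name   = "survey",
            scores = c(80, 90, 85),
            m      = matrix(1:4, nrow = 2))

lst[[2]]           # second component, returned as the vector itself
# [1] 80 90 85
lst$name           # component selected by name
# [1] "survey"
class(lst["scores"])   # single brackets return a sub-list
# [1] "list"
```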

**How to know data type of a column**

1. **‘class’** is a property assigned to an object that determines how generic functions operate with it. It is not a mutually exclusive classification.

2. **‘mode’** is a mutually exclusive classification of objects according to their basic structure. The ‘atomic’ modes are numeric, complex, character and logical.

```r
x <- 1:16
x <- factor(x)
class(x)
# [1] "factor"
mode(x)
# [1] "numeric"
```
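To check the type of every column in a data frame at once, one option is to apply `class` to each column with `sapply`, or to inspect the whole structure with `str`. This minimal sketch uses a hypothetical data frame. The factor column shows the same class/mode split as the example above.

```r
# Hypothetical data frame with three column types
data <- data.frame(id    = 1:3,
                   score = c(80.5, 90, 85),
                   grade = factor(c("B", "A", "B")))

class(data$score)     # type of a single column
# [1] "numeric"
sapply(data, class)   # named vector: id "integer", score "numeric", grade "factor"
mode(data$grade)      # a factor's mode is numeric, as with x above
# [1] "numeric"
str(data)             # compact summary of every column
```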