R Data Types


List

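The first thing to remember is that a list is a kind of vector. An atomic vector, created with c(), is a collection of values of a single type, such as numbers, strings, or logical values. A list, created with list(), can mix types and can also hold other vectors and lists. See below: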

** A collection of numbers **

> a = c(1,2,3,4,5)   
> a   
[1] 1 2 3 4 5

** A collection of strings **

> b = c("Hello", "World!") # Character vector
> b
[1] "Hello" "World!"

** A collection of logical values **

> l = c(TRUE, TRUE, FALSE, FALSE, TRUE)  # Logical vector
> l
[1] TRUE TRUE FALSE FALSE TRUE

** A collection of lists, strings, and numbers **

> f = c(a, b, 'Bingo') # c() flattens its arguments into one vector; the numbers are coerced to strings
> f 
[1] "1"      "2"      "3"      "4"      "5"      "Hello"  "World!" "Bingo" 

> d = list(a, b, f)
> d
[[1]]
[1] 1 2 3 4 5

[[2]]
[1] "Hello" "World!"

[[3]]
[1] "1"      "2"      "3"      "4"      "5"      "Hello"  "World!" "Bingo" 

> e = list(First=1, Last='name', age=17.5)
> e
$First
[1] 1

$Last
[1] "name"

$age
[1] 17.5
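
The difference between c() and list() seen above can be checked directly. The following is a minimal sketch; the variable names nums, mixed and kept are made up for illustration and are not part of the session above.

```r
# c() builds an atomic vector, so mixed inputs are coerced to one common type
nums  <- c(1, 2, 3)
mixed <- c(nums, "Bingo")
class(nums)        # "numeric"
class(mixed)       # "character": the numbers became strings

# list() keeps every element exactly as it was passed in
kept <- list(nums, "Bingo")
class(kept[[1]])   # "numeric"
class(kept[[2]])   # "character"

is.list(mixed)     # FALSE
is.list(kept)      # TRUE
```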

** Access the elements of a vector or a list using the convention [] or [[]] **

> f[1]
[1] "1"
> d[[1]]
[1] 1 2 3 4 5
> d
[[1]]
[1] 1 2 3 4 5

[[2]]
[1] "Hello"  "World!"

[[3]]
[1] "1"      "2"      "3"      "4"      "5"      "Hello"  "World!" "Bingo" 

> e$age
[1] 17.5
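
On a list, [] and [[]] do not return the same thing. A small sketch of the difference, reusing the named list e from above (the names sub_list and value are only illustrative):

```r
e <- list(First = 1, Last = "name", age = 17.5)

sub_list <- e["age"]     # single brackets: a list of length 1
value    <- e[["age"]]   # double brackets: the element itself

class(sub_list)          # "list"
class(value)             # "numeric"
identical(value, e$age)  # TRUE: $ is shorthand for [[ ]] with a name
```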

Note: For strings, both the double and the single quotes work.

 

Factor

Factor is an important object type in R. To understand its usage and importance, you must first understand the difference between categorical and ordinal variables.

A categorical variable, also known as a nominal variable, is classified into categories that have no intrinsic ordering. For example, a Gender variable has two category values, Male and Female, but there is no agreed way of ordering them. A Hair Color variable has several category values that likewise cannot be ordered. Hence a categorical variable only lets you assign a category; it does not let you sort the values in ascending or descending order.

An ordinal variable is similar to a categorical variable, except that its categories can be arranged in an order. For example, a Volume variable has the categories Low, Medium, and High, which can be ranked from lowest to highest.

In R terminology, such a variable is stored as a factor and its categories are called levels. A nominal (unordered) factor represents a nominal variable, and an ordered factor represents an ordinal variable.

R stores a factor as a vector of integers from 1 to n, where n is the number of unique values in the variable, together with an internal vector of character strings (the original values) that is mapped to these integers.

The number of times a level appears in a given variable is its frequency.

** hairColor as a nominal variable **

> hairColor = c(rep("Black", 5), rep("Brown", 10))
> hairColor
[1] "Black" "Black" "Black" "Black" "Black" "Brown" "Brown" "Brown" "Brown"
[10] "Brown" "Brown" "Brown" "Brown" "Brown" "Brown"
> hairColor = factor(hairColor)
> hairColor
[1] Black Black Black Black Black Brown Brown Brown Brown Brown Brown Brown
[13] Brown Brown Brown
Levels: Black Brown

> summary(hairColor)
Black Brown 
    5    10 
> attributes(hairColor)
$levels
[1] "Black" "Brown"

$class
[1] "factor"
# Internally, R assigns the integer codes in alphabetical order of the levels: 1 = Black, 2 = Brown
# R now treats "hairColor" as a nominal variable
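
The integer codes and the character levels described earlier can be inspected separately. A minimal sketch, using a fresh variable named hair (an illustrative name) so that it does not depend on the session above:

```r
hair <- factor(c(rep("Black", 5), rep("Brown", 10)))

codes  <- as.integer(hair)   # the integer codes R stores internally
lvls   <- levels(hair)       # the character strings the codes map to
counts <- table(hair)        # frequency of each level

codes                        # five 1s followed by ten 2s
lvls                         # "Black" "Brown"

# Indexing the levels by the codes rebuilds the original strings
restored <- lvls[codes]
identical(restored, as.character(hair))   # TRUE
```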

** volume as an ordinal variable **

> volume = c("small", "medium", "large")
> volume
[1] "small"  "medium" "large" 
> volume = ordered(volume)
> volume
[1] small  medium large 
Levels: large < medium < small
> attributes(volume)
$levels
[1] "large"  "medium" "small" 

$class
[1] "ordered" "factor" 
# R sorts the levels alphabetically and associates 1 = large, 2 = medium, 3 = small
# (this is alphabetical order, not the natural order small < medium < large)
# R now treats "volume" as an ordinal variable
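
Because the default ordering is alphabetical, the levels usually have to be given explicitly to get the intended ranking. One way to do this, as a sketch, is the levels argument of factor() together with ordered = TRUE:

```r
volume <- factor(c("small", "medium", "large"),
                 levels  = c("small", "medium", "large"),  # intended order, lowest first
                 ordered = TRUE)

levels(volume)           # "small" "medium" "large"
as.integer(volume)       # 1 2 3
volume[1] < volume[3]    # TRUE: small ranks below large
max(volume)              # large
```

With the levels set this way, comparisons and functions such as max() follow the declared order rather than the alphabet.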

 

Data Frame

A Data Frame is like a matrix with rows and columns, and is a collection of vectors of equal length. Every value within a column has the same mode (numeric, character, factor, logical, etc.), but different columns can have different modes.

> a = c(1,2,3,4)
> b = c("Jack", "Jill", "Peter", "Mickey")
> c = c(TRUE, TRUE, FALSE, TRUE)
> dframe = data.frame(a,b,c)
> dframe
  a      b     c
1 1   Jack  TRUE
2 2   Jill  TRUE
3 3  Peter FALSE
4 4 Mickey  TRUE

> attributes(dframe)
$names
[1] "a" "b" "c"

$row.names
[1] 1 2 3 4

$class
[1] "data.frame"

> names(dframe) = c("ID", "FName", "Bool")
> names(dframe)
[1] "ID"    "FName" "Bool" 
> summary(dframe)
            ID          FName      Bool        
 Min.   :1.00   Jack  :1   Mode :logical  
 1st Qu.:1.75   Jill  :1   FALSE:1        
 Median :2.50   Mickey:1   TRUE :3        
 Mean   :2.50   Peter :1   NA's :0        
 3rd Qu.:3.25                             
 Max.   :4.00    

> dframe$ID
[1] 1 2 3 4
> dframe$FName
[1] Jack   Jill   Peter  Mickey
Levels: Jack Jill Mickey Peter
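# FName is a nominal variable (a factor). Since R 4.0.0, data.frame() keeps strings as
# character vectors unless stringsAsFactors = TRUE is passed.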
> dframe$Bool
[1]  TRUE  TRUE FALSE  TRUE
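
To see the mode of each column at a glance, the class of every column can be listed. This sketch rebuilds the data frame with the column names used above and passes stringsAsFactors = TRUE, on the assumption that the output shown earlier came from an R version older than 4.0.0, where that was the default:

```r
dframe <- data.frame(ID    = c(1, 2, 3, 4),
                     FName = c("Jack", "Jill", "Peter", "Mickey"),
                     Bool  = c(TRUE, TRUE, FALSE, TRUE),
                     stringsAsFactors = TRUE)   # convert strings to factors

col_classes <- sapply(dframe, class)   # one class per column
col_classes                            # numeric, factor, logical

nrow(dframe)                           # 4 rows
ncol(dframe)                           # 3 columns
levels(dframe$FName)                   # levels of the factor column
```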

 

 

Tables

One way to create a table is to use the table() function.

One Way Table – A table of counts for a single variable, displayed as one row

Using the hairColor factor defined above:

> hairColor = table(hairColor)
> hairColor
hairColor
Black Brown 
    5    10 
> attributes(hairColor)
$dim
[1] 2

$dimnames
$dimnames$hairColor
[1] "Black" "Brown"


$class
[1] "table"
> summary(hairColor)
Number of cases in table: 15 
Number of factors: 1 

 

Another way to create a one way table is to build a one-row matrix of numbers. The general form of a matrix() call is:

 

mymatrix <- matrix(vector, nrow=r, ncol=c, byrow=FALSE, dimnames=list(char_vector_rownames, char_vector_colnames))

> mymatrix = matrix(c(2,3,4), ncol=3, byrow=TRUE) 
> colnames(mymatrix) = c("C1","C2","C3") 
> mymatrix
     C1 C2 C3
[1,]  2  3  4

 

Two Way Table – A table with more than one row, built by cross-tabulating two variables

Note: All arguments must have the same length.

 

> color = c("red", "red", "blue", "pink") 
> shade = c("light", "dark", "dark", "light")
> mytable = table(color, shade)
> mytable
      shade
color  dark light
  blue    1     0
  pink    0     1
  red     1     1
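
Once a two way table exists, row totals, column totals and proportions can be derived from it. A brief sketch using the same color and shade vectors (the result names row_totals, col_totals, with_totals and proportions are illustrative):

```r
color <- c("red", "red", "blue", "pink")
shade <- c("light", "dark", "dark", "light")
mytable <- table(color, shade)

row_totals  <- margin.table(mytable, 1)   # totals per color
col_totals  <- margin.table(mytable, 2)   # totals per shade
with_totals <- addmargins(mytable)        # table with a Sum row and a Sum column
proportions <- prop.table(mytable)        # each cell as a share of all cases

dim(mytable)                              # 3 rows (colors) by 2 columns (shades)
with_totals
```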

Creating a table using a matrix of numbers

> myvector = c(1:20) 
> rownames = c("r1", "r2", "r3", "r4")
> colnames = c("c1", "c2", "c3", "c4", "c5") 
> mymatrix = matrix(myvector, ncol=5, byrow=TRUE, dimnames=list(rownames, colnames)) 
> mymatrix
   c1 c2 c3 c4 c5
r1  1  2  3  4  5
r2  6  7  8  9 10
r3 11 12 13 14 15
r4 16 17 18 19 20
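
The object built above is still a matrix. If an object of class "table" is needed, one option is as.table(). A minimal sketch, with the row and column names passed inline rather than through separate variables:

```r
mymatrix <- matrix(1:20, ncol = 5, byrow = TRUE,
                   dimnames = list(c("r1", "r2", "r3", "r4"),
                                   c("c1", "c2", "c3", "c4", "c5")))

mytable <- as.table(mymatrix)   # convert the matrix to a table object
class(mytable)                  # "table"

mytable["r2", "c3"]             # 8, same cell as in the matrix
margin.table(mytable, 1)        # row totals: 15 40 65 90
```

Passing the names inline also avoids creating variables called rownames and colnames, which would mask the built-in functions of the same name.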

 

Array

Arrays are similar to matrices but can have more than two dimensions. At the prompt, type help(array) for more information.
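
As a short illustration, the sketch below builds a three-dimensional array with array() and indexes it; the name arr and the dimensions are arbitrary choices:

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers (filled column by column)
arr <- array(1:24, dim = c(2, 3, 4))

dim(arr)        # 2 3 4
arr[1, 2, 3]    # row 1, column 2, layer 3 -> 15
arr[, , 1]      # the first layer, a 2 x 3 matrix holding 1 to 6
```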
