Category: R-Stats

R Packages

 

  1. Hmisc: The rcorr() function in this package produces correlation matrices and significance levels for Pearson and Spearman correlations. Input must be a matrix, and pairwise deletion is used.
  2. pastecs
  3. psych: The package facilitates summarizing statistics by a grouping variable.
  4. doBy
  5. corrgram: Contains correlation visualization functions.
  6. lattice: Provides options to condition the scatterplot matrix on a factor.
  7. gclus: Provides options to rearrange the variables so that the higher correlations are closer to the principal diagonal.
  8. scatterplot3d
  9. rgl: Creates interactive, spinning 3D scatterplots.

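Other R functions for descriptive statistics include sapply(), fivenum(), and summary().

corrgram(), pairs(), and splom() are functions for visualizing correlations (pairs() is part of base R, splom() comes from lattice, and corrgram() from corrgram).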
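As a rough illustration of the base functions named above, the sketch below uses the built-in mtcars data set (an assumption; any numeric data frame would do). It computes column means with sapply(), a five-number summary with fivenum(), per-column summaries with summary(), Pearson and Spearman correlation matrices with base cor(), and a scatterplot matrix with pairs(). The Hmisc call is left commented out because it needs that package installed.

```r
# mtcars ships with base R (datasets package); pick three numeric columns
vars <- mtcars[, c("mpg", "hp", "wt")]

# Column means via sapply()
means <- sapply(vars, mean, na.rm = TRUE)
print(means)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(vars$mpg)
print(five)

# summary() reports quartiles and the mean for each column
print(summary(vars))

# Pearson and Spearman correlation matrices with base cor()
print(cor(vars, method = "pearson"))
print(cor(vars, method = "spearman"))

# With Hmisc installed, rcorr() adds pair counts and p-values:
# Hmisc::rcorr(as.matrix(vars), type = "spearman")

# Scatterplot matrix of the same variables
pairs(vars, main = "mtcars: mpg, hp, wt")
```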

Categories: Packages, R-Stats

R-Color Palettes


R has several built-in color palette functions. Each returns a vector of n colors that can be passed to the col graphical parameter, as below.

  1. col = rainbow(n)
  2. col = heat.colors(n)
  3. col = topo.colors(n)
  4. col = terrain.colors(n)
  5. col = cm.colors(n)

n is the number of colors to generate, typically the number of values or groups being plotted. For more info, enter the following help commands:

1. help(package=colorspace)   (requires the colorspace package to be installed)

2. ?rainbow
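To see the palettes side by side, here is a small sketch that draws the same bar chart once per palette. The five data values are made up for illustration; each palette function is called with n equal to the number of bars, so every bar gets its own color.

```r
# Hypothetical data: five values to plot
values <- c(3, 7, 2, 9, 5)
n <- length(values)

# Each built-in palette function returns a character vector of n hex colors
palettes <- list(
  rainbow        = rainbow(n),
  heat.colors    = heat.colors(n),
  topo.colors    = topo.colors(n),
  terrain.colors = terrain.colors(n),
  cm.colors      = cm.colors(n)
)
print(palettes$rainbow)

# One bar chart per palette in a 2 x 3 grid; save and restore graphics settings
op <- par(mfrow = c(2, 3), mar = c(2, 2, 2, 1))
for (name in names(palettes)) {
  barplot(values, col = palettes[[name]], main = name)
}
par(op)
```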

Categories: Graphics, R-Stats

Connecting R to Oracle Database


The following three packages are required to connect to an Oracle database:

  1. DBI
  2. rJava
  3. RJDBC
    The next steps:
    1. Obtain an Oracle JDBC driver (a .jar file) compatible with the Java version you want to support.
    2. To connect to the source database, you need the following: hostname or IP address, port, service name or SID, username, and password.
    3. Use the information from step 2 to construct a JDBC connection URL, also called a Data Source Name or DSN, in the format:

    jdbc:oracle:thin:@hostname:port:sid

    or, when connecting by service name:

    jdbc:oracle:thin:@//hostname:port/service_name

    Now, execute the following steps to connect to the database:
                                       

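library(RJDBC)   # RJDBC also attaches DBI and rJava

# Create the connection driver; classPath must point to the Oracle JDBC driver .jar file
jdbcDriver = JDBC(driverClass="oracle.jdbc.OracleDriver",
                  classPath="lib/classes12.jar")

# Open the connection (replace the host, port, SID, username, and password)
jdbcConn = dbConnect(jdbcDriver, "jdbc:oracle:thin:@database.hostname.com:port:sid",
                     "username", "password")

# Close the connection
dbDisconnect(jdbcConn)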
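Step 3 can also be done in code. The sketch below is a hypothetical helper, make_oracle_url() (it is not part of RJDBC), that assembles the thin-driver URL with sprintf() in either the SID form or the service-name form. The host name, the port 1521 (Oracle's usual default listener port) and the SID ORCL are placeholder values; the resulting string would be passed as the URL argument to dbConnect().

```r
# Hypothetical helper: build an Oracle thin-driver JDBC URL from its parts
make_oracle_url <- function(host, port, sid = NULL, service = NULL) {
  if (!is.null(service)) {
    # Service-name form: jdbc:oracle:thin:@//host:port/service
    sprintf("jdbc:oracle:thin:@//%s:%d/%s", host, as.integer(port), service)
  } else if (!is.null(sid)) {
    # SID form: jdbc:oracle:thin:@host:port:sid
    sprintf("jdbc:oracle:thin:@%s:%d:%s", host, as.integer(port), sid)
  } else {
    stop("Supply either sid or service")
  }
}

# Placeholder connection details
url <- make_oracle_url("database.hostname.com", 1521, sid = "ORCL")
print(url)
```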
Categories: Data Sources

R Data Types


List

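The first thing to remember is that R distinguishes atomic vectors from lists. An atomic vector, created with c(), holds elements of a single type, such as numbers, strings, or logical values. A list, created with list(), is a generic vector that can hold elements of different types, including other vectors and lists. See below: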

** A collection of numbers **

> a = c(1,2,3,4,5)   
> a   
[1] 1 2 3 4 5

** A collection of strings **

> b = c("Hello", "World!") # Character vector
> b
[1] "Hello" "World!"

** A collection of logical values **

> l = c(TRUE, TRUE, FALSE, FALSE, TRUE)  # Logical vector
> l
[1] TRUE TRUE FALSE FALSE TRUE

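** Mixing numbers and strings: c() versus list() **

c() coerces all of its arguments to one common type (here, character), whereas list() keeps each element, including whole vectors, as it is.

> f = c(a, b, 'Bingo')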
> f 
[1] "1"      "2"      "3"      "4"      "5"      "Hello"  "World!" "Bingo" 

> d = list(a, b, f)
> d
[[1]]
[1] 1 2 3 4 5

[[2]]
[1] "Hello" "World!"

[[3]]
[1] "1"      "2"      "3"      "4"      "5"      "Hello"  "World!" "Bingo" 

> e = list(First=1, Last='name', age=17.5)
> e
$First
[1] 1

$Last
[1] "name"

$age
[1] 17.5

** Access elements using [], [[]], or $ **

> f[1]
[1] "1"
> d[[1]]
[1] 1 2 3 4 5

> e$age
[1] 17.5

Note: For strings, both double and single quotes work.
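The difference between atomic vectors and lists can be checked directly with class() and is.vector(). This sketch rebuilds the objects from the transcript above and shows that [ ] on a list returns a smaller list, while [[ ]] and $ return the element itself; the nested list is an extra illustrative object.

```r
# Atomic vectors hold one type; c() coerces mixed input to a common type
a <- c(1, 2, 3, 4, 5)
b <- c("Hello", "World!")
f <- c(a, b, "Bingo")
print(class(a))        # "numeric"
print(class(f))        # "character" (the numbers were coerced)

# A list keeps each element with its own type
d <- list(a, b, f)
e <- list(First = 1, Last = "name", age = 17.5)
print(is.vector(d))    # TRUE: a list is a generic vector

# [ ] returns a sub-list; [[ ]] and $ return the element itself
print(class(d[1]))     # "list"
print(class(d[[1]]))   # "numeric"
print(e[["age"]])      # 17.5, same as e$age

# Lists can contain other lists
nested <- list(inner = e, values = a)
print(nested$inner$Last)   # "name"
```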

 

Factor

Factor is an important object type in R. To understand its usage and importance, you must first understand the difference between categorical (nominal) and ordinal variables.

A categorical variable, also known as a nominal variable, has values that fall into categories with no intrinsic ordering. For example, a Gender variable with the categories Male and Female has no agreed way of ordering its values. Likewise, a Hair Color variable has several categories that cannot be ordered. A categorical variable therefore assigns each observation to a category but does not rank the categories in ascending or descending order.

An ordinal variable is similar to a categorical variable, except that its categories have a natural order. For example, a Volume variable with the categories Low, Medium, and High can be arranged from lowest to highest.

In R, such a variable is stored as a factor, and its distinct categories are called levels. An unordered factor represents a nominal variable, and an ordered factor represents an ordinal variable.

Internally, R stores a factor as a vector of integer codes from 1 to n, where n is the number of distinct levels, together with a character vector of the original values (the levels) to which those codes map.

The number of times each level occurs in a variable is its frequency.

** hairColor as a nominal variable **

> hairColor = c(rep("Black", 5), rep("Brown", 10))
> hairColor
[1] "Black" "Black" "Black" "Black" "Black" "Brown" "Brown" "Brown" "Brown"
[10] "Brown" "Brown" "Brown" "Brown" "Brown" "Brown"
> hairColor = factor(hairColor)
> hairColor
[1] Black Black Black Black Black Brown Brown Brown Brown Brown Brown Brown
[13] Brown Brown Brown
Levels: Black Brown

> summary(hairColor)
Black Brown 
    5    10 
> attributes(hairColor)
$levels
[1] "Black" "Brown"

$class
[1] "factor"
# Internally, R assigned the levels alphabetically: 1 = Black, 2 = Brown
# R now treats "hairColor" as a nominal variable

** volume as an ordinal variable **

> volume = c("small", "medium", "large")
> volume
[1] "small"  "medium" "large" 
> volume = ordered(volume)
> volume
[1] small  medium large 
Levels: large < medium < small
> attributes(volume)
$levels
[1] "large"  "medium" "small" 

$class
[1] "ordered" "factor" 
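# By default R sorts the levels alphabetically: 1 = large, 2 = medium, 3 = small
# R now treats "volume" as an ordinal variable, but in an alphabetical order that is not the natural one; pass levels = c("small", "medium", "large") to ordered() to set the order explicitly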
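The alphabetical default is rarely the order you want for an ordinal variable. A minimal sketch, reusing the same values, shows how the levels argument of factor() together with ordered = TRUE fixes the order, how as.integer() exposes the underlying integer codes, and how table() counts the frequency of each level.

```r
# Nominal factor: levels default to alphabetical order
hairColor <- factor(c(rep("Black", 5), rep("Brown", 10)))
print(levels(hairColor))           # "Black" "Brown"
print(as.integer(hairColor)[1:6])  # integer codes: 1 1 1 1 1 2
print(table(hairColor))            # frequency of each level

# Ordinal factor with an explicit, meaningful order
volume <- factor(c("small", "medium", "large"),
                 levels = c("small", "medium", "large"),
                 ordered = TRUE)
print(volume)                      # Levels: small < medium < large
print(as.integer(volume))          # 1 2 3
print(volume[1] < volume[3])       # TRUE: comparisons follow the level order
```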

 

Data Frame

A data frame is like a matrix with rows and columns, but it is a collection of vectors (the columns) of equal length. All values within a column share the same mode (numeric, character, factor, logical, etc.), while different columns can have different modes.

> a = c(1,2,3,4)
> b = c("Jack", "Jill", "Peter", "Mickey")
> c = c(TRUE, TRUE, FALSE, TRUE)
> dframe = data.frame(a, b, c, stringsAsFactors = TRUE)  # since R 4.0.0, strings are no longer converted to factors by default
> dframe
 a      b     c
1 1   Jack  TRUE
2 2   Jill  TRUE
3 3  Peter FALSE
4 4 Mickey  TRUE

> attributes(dframe)
$names
[1] "a" "b" "c"

$row.names
[1] 1 2 3 4

$class
[1] "data.frame"

> names(dframe) = c("ID", "FName", "Bool")
> names(dframe)
[1] "ID"    "FName" "Bool" 
> summary(dframe)
            ID          FName      Bool        
 Min.   :1.00   Jack  :1   Mode :logical  
 1st Qu.:1.75   Jill  :1   FALSE:1        
 Median :2.50   Mickey:1   TRUE :3        
 Mean   :2.50   Peter :1   NA's :0        
 3rd Qu.:3.25                             
 Max.   :4.00    

> dframe$ID
[1] 1 2 3 4
> dframe$FName
[1] Jack   Jill   Peter  Mickey
Levels: Jack Jill Mickey Peter
# FName is a factor, i.e. a nominal variable
> dframe$Bool
[1]  TRUE  TRUE FALSE  TRUE
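Here is a compact version of the same data frame with a few common inspection and indexing calls. It is a sketch: it passes stringsAsFactors = TRUE explicitly so that FName is a factor in any R version, and it names the logical vector flag (an illustrative choice) so that it does not share its name with the c() function.

```r
a <- c(1, 2, 3, 4)
b <- c("Jack", "Jill", "Peter", "Mickey")
flag <- c(TRUE, TRUE, FALSE, TRUE)
dframe <- data.frame(ID = a, FName = b, Bool = flag, stringsAsFactors = TRUE)

str(dframe)                          # one line per column, showing its type
print(sapply(dframe, class))         # ID: numeric, FName: factor, Bool: logical
print(dframe[2, ])                   # second row, all columns
print(dframe[dframe$Bool, "FName"])  # names in rows where Bool is TRUE
print(dim(dframe))                   # 4 rows, 3 columns
```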

 

 

Tables

One way to create a table is with the table() function.

One-way table: a table of counts for a single variable

Using the hairColor factor from above:

> hairColor = table(hairColor)
> hairColor
hairColor
Black Brown 
    5    10 
> attributes(hairColor)
$dim
[1] 2

$dimnames
$dimnames$hairColor
[1] "Black" "Brown"


$class
[1] "table"

> summary(hairColor)
Number of cases in table: 15 
Number of factors: 1 

 

Another way to lay out a one-way table is as a one-row matrix of numbers. The general form of matrix() is:

 

mymatrix <- matrix(vector, nrow=r, ncol=c, byrow=FALSE, dimnames=list(char_vector_rownames, char_vector_colnames))

> mymatrix = matrix(c(2,3,4), ncol=3, byrow=TRUE) 
> colnames(mymatrix) = c("C1","C2","C3") 
> mymatrix
    C1 C2 C3
[1,]  2  3  4

 

Two-way table: a cross-tabulation of two variables, with more than one row. Note: all arguments to table() must have the same length.

 

> color = c("red", "red", "blue", "pink") 
> shade = c("light", "dark", "dark", "light")
> mytable = table(color, shade)
> mytable
      shade
color  dark light
  blue    1     0
  pink    0     1
  red     1     1

Creating a table using a matrix of numbers

> myvector = c(1:20) 
> rownames = c("r1", "r2", "r3", "r4")
> colnames = c("c1", "c2", "c3", "c4", "c5") 
> mymatrix = matrix(myvector, ncol=5, byrow=TRUE, dimnames=list(rownames, colnames)) 
> mymatrix
  c1 c2 c3 c4 c5
r1  1  2  3  4  5
r2  6  7  8  9 10
r3 11 12 13 14 15
r4 16 17 18 19 20
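Once a table exists, base R can summarize it further. The sketch below reuses the color/shade vectors and the 4 x 5 matrix from above; it shows row totals with margin.table(), cell proportions with prop.table(), totals added with addmargins(), and as.table() converting a plain numeric matrix into an object of class "table".

```r
color <- c("red", "red", "blue", "pink")
shade <- c("light", "dark", "dark", "light")
mytable <- table(color, shade)

print(margin.table(mytable, 1))  # row totals per color
print(prop.table(mytable))       # cell proportions, summing to 1
print(addmargins(mytable))       # adds a Sum row and a Sum column

# A numeric matrix becomes a table object with as.table()
m <- matrix(1:20, ncol = 5, byrow = TRUE,
            dimnames = list(paste0("r", 1:4), paste0("c", 1:5)))
tab <- as.table(m)
print(class(tab))                # "table"
print(sum(tab))                  # 210
```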

 

Array

Arrays are similar to matrices but can have more than two dimensions. At the prompt, type help(array) for more information.
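A small example makes the extra dimension concrete. This sketch builds a 2 x 3 x 4 array from the numbers 1 to 24; like matrix(), array() fills values column by column, so each "layer" selected by the third index is an ordinary 2 x 3 matrix.

```r
# 2 rows, 3 columns, 4 layers, filled column-major with 1:24
arr <- array(1:24, dim = c(2, 3, 4))
print(dim(arr))        # 2 3 4
print(arr[, , 1])      # first layer: a 2 x 3 matrix holding 1:6
print(arr[2, 3, 4])    # row 2, column 3, layer 4: 24
```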

Categories: Data Types

Customize Startup


There are many ways to customize R startup; to begin with, I kept it simple. Site-wide customization is done in the initialization file Rprofile.site, found in the etc folder of the R installation directory. R sources this file at startup, before the user's own .Rprofile. I added two functions, .First() and .Last(): .First() runs at the start of the R session, and .Last() runs at the end of the session. The changes made to Rprofile.site are below.

# Things you might want to change

# options(papersize="a4")
# options(editor="notepad")
# options(pager="internal")

# set the default help type
# options(help_type="text")
  options(help_type="html")

# set a site library
# .Library.site <- file.path(chartr("\\", "/", R.home()), "site-library")

# set a CRAN mirror
# local({r <- getOption("repos")
#       r["CRAN"] <- "http://my.local.cran"
#       options(repos=r)})

# Give a fortune cookie, but only to interactive sessions
# (This would need the fortunes package to be installed.)
#  if (interactive())
#    fortunes::fortune()

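.First <- function(){
  # Attach frequently used packages
  library(Hmisc)
  library(R2HTML)
  library(Rcmdr)
  # Switch to the working folder and restore its saved workspace, if one exists;
  # envir = .GlobalEnv is needed, otherwise load() only fills .First's local environment
  setwd("C:/Softwares/RWorkspace")
  if (file.exists(".RData")) load(".RData", envir = .GlobalEnv)
  cat("\nWelcome at", date(), "\n")
  cat("\nYour Current Working Directory is:", getwd(), "\n")
}

.Last <- function(){
  # Save the workspace and the command history before exiting
  save.image()
  savehistory(".Rhistory")
  cat("\nGoodbye at ", date(), "\n")
}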
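Before editing, it helps to check which startup files your installation actually uses. This sketch only inspects paths and does not modify anything: R.home("etc") gives the etc folder of the running installation, the R_PROFILE environment variable (if set) overrides the site file location, and a per-user .Rprofile is looked up in the current directory and then in the home directory.

```r
# Site-wide startup file for this installation (R_HOME/etc/Rprofile.site)
site_profile <- file.path(R.home("etc"), "Rprofile.site")
print(site_profile)
print(file.exists(site_profile))   # FALSE until the file has been created

# If R_PROFILE is set, R reads that file instead of the default site file
print(Sys.getenv("R_PROFILE"))

# Per-user profile: .Rprofile in the current directory, else in the home directory
print(file.exists(".Rprofile"))
print(path.expand("~/.Rprofile"))

# Are .First and .Last currently defined on the search path?
print(exists(".First"))
print(exists(".Last"))
```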

Workspace


The workspace is the R working environment that holds user-defined objects, such as vectors, lists, matrices, data frames, and functions.

Note: On Windows, a file path can be specified in either of two ways:

    # R accepts Unix-style paths with forward slashes
    C:/RWorkspace/MyFile.txt

    # R treats a single backslash in a string as an escape character, so a Windows-style path needs doubled backslashes
    C:\\RWorkspace\\MyFile.txt

The following commands manage your workspace. At the prompt, you can use the Up and Down arrow keys to scroll through the command history.

 

    1. getwd()                            # Get the current working directory
    2. setwd("C:/RWorkspace")             # Set the current working directory
    3. ls()                               # List the objects in the current workspace
    4. help("ggplot2")                    # Display help on the search string, in this case ggplot2
    5. options()                          # Display the options set for the current session
    6. history()                          # Display the last 25 commands
    7. history(max.show=Inf)              # Display all previous commands
    8. savehistory(file="MyFile")         # Save the command history; the default file is ".Rhistory"
    9. loadhistory(file="MyFile")         # Load a command history; the default file is ".Rhistory"
    10. save.image()                      # Save the workspace to the default file ".RData"
    11. save(x, y, file="MyFile.RData")   # Save specific objects to a file; "x" and "y" are objects
    12. load("MyFile.RData")              # Load the saved objects into the current session
    13. library("packagename")            # Load an installed package, e.g. ggplot2 or Rcmdr
    14. q()                               # Quit R, with a prompt to save the workspace

        

    Note that R is case-sensitive and uses the current working directory when a file path is not specified.
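To make the save()/load() round trip concrete, here is a minimal sketch that works inside tempdir(), so it does not touch your real workspace. The object names x and y and the file name MyFile.RData mirror the command list above. It relies on setwd() returning the previous directory, which makes switching back easy, and on load() returning the names of the objects it restored.

```r
# Work in a temporary folder; setwd() returns the previous directory
old_wd <- setwd(tempdir())

x <- 1:10
y <- "hello"
save(x, y, file = "MyFile.RData")   # save selected objects to a file
print(ls())                         # objects currently in the workspace

rm(x, y)                            # remove them from the workspace
print(exists("x"))                  # FALSE
loaded <- load("MyFile.RData")      # restores x and y, returns their names
print(loaded)                       # "x" "y"
print(exists("x"))                  # TRUE

setwd(old_wd)                       # go back to the original directory
```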

Categories: Interface, Workspace

Quick Installation of R and R Commander on Windows


The following is a quick four-step installation of R and R Commander:

1. Download R for Windows from CRAN and install it.

2. To install R Commander, issue the following command at the R prompt:

      install.packages("Rcmdr", dependencies=TRUE)

3. R Commander works best with the single-document interface (SDI) of Rgui. Because the default installation uses the multiple-document interface (MDI), edit the Rconsole file in R's etc directory to enable SDI. Look for the lines below in the file and set MDI = no.

    ## Style
    # This can be `yes' (for MDI) or `no' (for SDI).
    #MDI = yes
    MDI = no

4. To load R Commander, issue this command at the R prompt:

     library(Rcmdr)