Econ 265: Introduction to Econometrics

Lecture 2: Getting Started with R

Moshi Alam

Prerequisites

  • R and RStudio are installed
  • Update R packages regularly
  • Required packages for today:
    • Base R (primary focus)
    • dplyr (for a few examples, e.g. the starwars data and namespace conflicts)

Today’s Agenda

  • Live coding
  • Focus on typing commands yourself
  • Avoid copy-paste to build muscle memory

Note

This is not a course in R!

RStudio Interface

RStudio Interface

Basic Operations

Basic Arithmetic

# Addition
1 + 2

# Subtraction
6 - 7

# Division
5 / 2

# Exponentiation
2^3

# Integer division
100 %/% 60  # How many whole hours in 100 minutes?

# Modulo (remainder)
100 %% 60   # How many minutes are left over?

Logic Operations

# Basic comparisons
1 > 2
1 > 2 & 1 > 0.5  # AND
1 > 2 | 1 > 0.5  # OR

# Truth testing
isTRUE(1 < 2)

Important Operators:

  • >, <: Greater than / less than
  • >=, <=: Greater than or equal to / less than or equal to
  • ==: Equality test
  • !=: Not equal
  • &: AND
  • |: OR
  • !: NOT

Logic: Important Details

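Order of Precedence:

  1. Comparison operators (>, <, ==, etc.) are evaluated first
  2. Logical operators (&, |) are evaluated next
# Be explicit: write the full comparison on each side of &
1 > 0.5 & 1 > 2   # TRUE & FALSE, so FALSE

# Not this: 2 is not a comparison; it is coerced to TRUE, so the result is TRUE
1 > 0.5 & 2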

Floating Point Numbers:

# This returns FALSE! (0.1 and 0.2 cannot be stored exactly in binary)
0.1 + 0.2 == 0.3

# Use instead:
all.equal(0.1 + 0.2, 0.3)
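One detail is easy to miss. When the values differ, all.equal() does not return FALSE. It returns a character string that describes the difference. The sketch below shows this, along with the usual isTRUE() wrapper. The explicit tolerance of 1e-9 in the last line is an arbitrary choice made for illustration.

```r
# all.equal() compares numbers with a small numerical tolerance
all.equal(0.1 + 0.2, 0.3)        # TRUE

# When values differ, it returns a text description, not FALSE
all.equal(1, 1.1)

# Wrap it in isTRUE() when you need a plain TRUE/FALSE, e.g. inside if()
isTRUE(all.equal(1, 1.1))        # FALSE

# Alternative: compare the absolute difference to a tolerance you choose
abs((0.1 + 0.2) - 0.3) < 1e-9    # TRUE
```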

Value Matching

# Check if value is in a vector
4 %in% 1:10

# Create a "not in" operator
`%ni%` = Negate(`%in%`)
4 %ni% 5:10
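%in% is vectorized. It returns one TRUE/FALSE for each element on its left-hand side, which makes it convenient for subsetting. In this sketch, vals is a hypothetical example vector.

```r
# One result per element on the left
c(2, 12, 7) %in% 1:10      # TRUE FALSE TRUE

# Keep only the elements that appear in 1:10
vals <- c(2, 12, 7)
vals[vals %in% 1:10]       # 2 7
```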

Object-Oriented Programming

Everything is an Object

Common object types in R:

  • Vectors
  • Matrices
  • Data frames
  • Lists
  • Functions
# Create different objects
vec <- 1:5
mat <- matrix(1:9, nrow=3)
df <- data.frame(x=1:3, y=4:6)

Objects Have Classes

# Create data frame
d <- data.frame(
  x = 1:2,
  y = 3:4
)

# Check properties
class(d)
typeof(d)
str(d)

Understanding Objects:

  • class(): Object’s class
  • typeof(): Object’s type
  • str(): Object’s structure
  • View(): Interactive viewing
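class() and typeof() can disagree. class() describes how R treats an object. typeof() reports how the object is stored internally. The sketch below applies both to a few kinds of objects. The most surprising case is that a data frame is stored as a list of equal-length columns.

```r
d <- data.frame(x = 1:2, y = 3:4)
class(d)                        # "data.frame"
typeof(d)                       # "list": a data frame is a list of equal-length columns
is.list(d)                      # TRUE

m <- matrix(1:4, nrow = 2)
class(m)                        # "matrix" "array" (in R 4.0 and later)
typeof(m)                       # "integer": the type of the values stored inside

f <- function(x) x
class(f)                        # "function"
typeof(f)                       # "closure"
```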

Global Environment

# Create data frame
d <- data.frame(
  x = 1:2,
  y = 3:4
)

# This fails: R looks for standalone objects named x and y, not the columns of d
lm(y ~ x)

# This works:
lm(y ~ x, data = d)

Key Points:

  • Objects you create live in the global environment
  • Functions like lm() must be told which data frame to use (data = d)
  • Different from Stata’s one-dataset-in-memory approach
  • Many data frames and other objects can exist at the same time

Assignment

Assignment Operators

Using Arrow (<-):

# Standard assignment
a <- 10 + 5

# Right assignment
10 + 5 -> a

The arrow embodies the idea of "assigned to": the value is stored in the name the arrow points to

Using Equals (=):

# Alternative assignment
b = 10 + 10

# With =, the name must be on the left
# This won't work:
# 10 + 10 = b

Assignment Choice

  • Most R users prefer <- for assignment
  • = also sets argument values inside function calls, e.g. lm(y ~ x, data = d)
  • Personal choice, but be consistent
  • = is quicker to type and familiar from other languages
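The two operators behave differently inside a function call. With =, the code only names an argument. With <-, it performs an assignment in your workspace as a side effect. The helper double_it below is hypothetical and exists only for this illustration.

```r
# Hypothetical helper, defined only for this illustration
double_it <- function(val) val * 2

# Start clean: remove any existing global object called 'val'
if (exists("val", envir = globalenv(), inherits = FALSE)) rm(val)

# '=' inside a call only matches the argument 'val'; nothing is assigned
double_it(val = 3)                                     # 6
exists("val", envir = globalenv(), inherits = FALSE)   # FALSE

# '<-' inside a call also creates 'val' in the global environment
double_it(val <- 3)                                    # 6
exists("val", envir = globalenv(), inherits = FALSE)   # TRUE
```

This difference is one reason many R users keep = for function arguments and <- for assignment.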

Programming Basics

Variables and Assignment

# Variable assignment
x <- 1
y <- "roses"
z <- function(x) { sqrt(x) }

# Using variables
z(9)
[1] 3

Control Flow: if/else

x <- 5
y <- 3

if (x > y) {
  print("x is larger than y")
} else {
  print("x is less than or equal to y")
}
[1] "x is larger than y"

Loops

# For loop
for(i in 1:5) {
  print(paste("Iteration", i))
}
[1] "Iteration 1"
[1] "Iteration 2"
[1] "Iteration 3"
[1] "Iteration 4"
[1] "Iteration 5"
# Loop over the elements of a character vector
for(fruit in c("mangos","bananas","apples")) {
  print(paste("I love", fruit))
}
[1] "I love mangos"
[1] "I love bananas"
[1] "I love apples"
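Loops and if/else can be combined. The sketch below labels the numbers 1 to 5 as even or odd. It uses the modulo operator %% from earlier and stores the results in a preallocated vector. The name parity is chosen only for this example.

```r
# Preallocate a character vector with one slot per number
parity <- character(5)

for (i in 1:5) {
  # i %% 2 == 0 means i divides evenly by 2
  if (i %% 2 == 0) {
    parity[i] <- "even"
  } else {
    parity[i] <- "odd"
  }
}

parity
```

For simple cases like this, the vectorized ifelse(1:5 %% 2 == 0, "even", "odd") gives the same result without a loop.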

Functions

# Define function
greet <- function(name = "Lord Vader") {
  paste("Hello,", name)
}

# Use function
greet()
[1] "Hello, Lord Vader"
greet("Luke")
[1] "Hello, Luke"
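Functions can take several arguments, which can be matched by position or by name. Variables created inside a function stay local to it. The function calc_revenue below is a hypothetical example.

```r
# Hypothetical helper: revenue from a price and a quantity
calc_revenue <- function(price, quantity) {
  total <- price * quantity   # 'total' exists only inside the function
  return(total)               # optional: the last evaluated value is returned anyway
}

calc_revenue(50, 10)                     # 500, arguments matched by position
calc_revenue(quantity = 10, price = 50)  # 500, arguments matched by name
```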

Working with Variables

Creating Variables

# Price of a good
price <- 50

# Quantity sold
quantity <- 10

# Calculate revenue
revenue <- price * quantity
revenue
[1] 500

Multiple Assignments

# Assign same value to multiple variables
x <- y <- 5

# Check values
x
[1] 5
y
[1] 5

Data Types

Numeric Data

# GDP growth rate
gdp_growth <- 2.5
class(gdp_growth)
[1] "numeric"
# Population (whole number)
population <- 1000L  # L makes it integer
class(population)
[1] "integer"

Text and Logical Data

# Country name (character/string)
country <- "United States"
class(country)
[1] "character"
# Logical (TRUE/FALSE)
is_developed <- TRUE
class(is_developed)
[1] "logical"
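Values can be converted between these types with the as.*() functions. Conversions that make no sense produce NA. The sketch below also shows that numeric values are stored as type "double".

```r
typeof(2.5)           # "double": numeric values are stored in double precision
typeof(1000L)         # "integer"

as.numeric("2.5")     # 2.5: text converted to a number
as.integer(3.9)       # 3: the decimal part is dropped, not rounded
as.character(1000L)   # "1000"
as.numeric(TRUE)      # 1: TRUE is stored as 1, FALSE as 0
as.numeric("abc")     # NA, with a warning, because "abc" is not a number
```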

Namespace Issues

Reserved Words

Strictly Reserved:

  • if
  • else
  • while
  • function
  • for
  • TRUE/FALSE
  • NULL
  • Inf
  • NA

Semi-Reserved:

  • c() (concatenate)
  • pi
  • Many function names
# Don't do this!
pi <- 2
c <- 4
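If you do overwrite one of these names by accident, the original is not lost. Your copy lives in the global environment and masks the built-in version. Removing your copy makes the built-in version visible again. Names used as functions behave a little differently, because R skips non-function objects when it looks up a function call.

```r
pi <- 2        # a global 'pi' now masks base R's constant
pi             # 2
base::pi       # the built-in constant is still reachable
rm(pi)         # delete the global copy
pi             # base R's value again: 3.141593

c <- 4         # a non-function object named 'c'
c(1, 2)        # still works: in a call, R skips objects that are not functions
rm(c)          # tidy up anyway to avoid confusion
```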

Namespace Conflicts

library(dplyr)

# Shows conflicts:
# filter masked from 'package:stats'
# lag masked from 'package:stats'

Two solutions:

  1. Call the version you want with package::function()
stats::filter(1:10, rep(1, 2))
  2. Reassign the name (lasts for the rest of the session)
filter <- stats::filter

Indexing

Using Square Brackets

Basic Indexing:

# Vector indexing
a <- 1:10
a[4]        # 4th element
a[c(4, 6)]  # 4th and 6th

# Matrix or data frame: [row, column]
d[1, 1]     # First row & column
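Square brackets also accept negative positions, which drop elements, and names. In this sketch, scores and its values are made up for illustration.

```r
a <- 1:10
a[-1]           # every element except the first: 2 3 ... 10
a[-c(1, 10)]    # drop the first and last elements
a[length(a)]    # the last element: 10

# Named elements can be selected by name (hypothetical values)
scores <- c(alice = 90, bob = 75)
scores["bob"]
```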

List Indexing:

my_list <- list(
  a = "hello",
  b = 1:3
)

my_list[[1]]     # First element
my_list[[2]][3]  # Third item of second element

Using Dollar Sign

# List example
my_list$a           # Access 'a' element
my_list$b[3]        # Third item of 'b'

# Data frame (starwars is a dataset that ships with dplyr; load dplyr first)
starwars$name[1]    # First name

Cleaning Up

Removing Objects

# Remove specific objects
rm(a, b)

# Remove all objects (not recommended)
rm(list = ls())

# Detach package
detach(package:dplyr)

# Close the current plot device (gives an error if no plot is open)
dev.off()

Tip

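Restarting the R session (Session > Restart R in RStudio) is better than rm(list = ls()): a restart also detaches packages and resets options, while rm(list = ls()) only deletes visible objects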

Data Structures

Overview of Data Structures

R has several basic data structures:

Dimension   Homogeneous (one type)   Heterogeneous (mixed types)
1           Vector                   List
2           Matrix                   Data frame
3+          Array                    Nested lists

Vectors

Vectors hold elements that all share the same type:

# Create vectors with c()
x <- c(1, 3, 5, 7, 8, 9)
print(x)
[1] 1 3 5 7 8 9
# Create sequence
y <- 1:10
print(y)
 [1]  1  2  3  4  5  6  7  8  9 10
# Repeat values
rep("A", times = 5)
[1] "A" "A" "A" "A" "A"
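Two more ways to build vectors are seq() and the each argument of rep(). The sketch also shows what happens when c() receives mixed types. Because a vector can hold only one type, everything is coerced to the most flexible type, which here is character.

```r
# Regular sequences with seq()
seq(from = 0, to = 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
seq(2, 10, length.out = 5)         # 2 4 6 8 10

# rep() can repeat the whole vector or each element
rep(c("A", "B"), times = 2)        # "A" "B" "A" "B"
rep(c("A", "B"), each = 2)         # "A" "A" "B" "B"

# Mixing types in c() coerces everything to character here
c(1, "a", TRUE)                    # "1" "a" "TRUE"
```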

Vector Operations

# Subsetting vectors
x <- c(1, 3, 5, 7, 8, 9)
x[1]      # First element
[1] 1
x[1:3]    # First three elements
[1] 1 3 5
x[c(1,3,4)] # Selected elements
[1] 1 5 7
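Arithmetic on vectors is vectorized: an operation applies to every element without a loop. When the lengths differ, R recycles the shorter vector. Subtracting the mean, as in the sketch below, is a typical use of this behaviour.

```r
x <- c(1, 3, 5, 7, 8, 9)
x * 2          # 2 6 10 14 16 18
x + x          # element-by-element addition: 2 6 10 14 16 18
mean(x)        # 5.5
x - mean(x)    # deviations from the mean (the single number is recycled)
x + c(10, 20)  # shorter vector recycled: 11 23 15 27 18 29
```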

Vector Logic

x <- c(1, 3, 5, 7, 8, 9)
# Logical operations
x > 3
[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE
x[x > 3]  # Subsetting with logic
[1] 5 7 8 9
sum(x > 3) # Count values > 3
[1] 4
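sum(x > 3) works because TRUE counts as 1 and FALSE as 0. By the same logic, mean() of a logical vector gives a proportion. The sketch below shows this along with a few related helpers.

```r
x <- c(1, 3, 5, 7, 8, 9)
which(x > 3)   # positions where the condition holds: 3 4 5 6
mean(x > 3)    # share of values above 3: 4/6, about 0.667
any(x > 8)     # TRUE: at least one value exceeds 8
all(x > 0)     # TRUE: every value is positive
```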

Matrices

Two-dimensional arrays whose elements all share the same type:

# Create matrix
X <- matrix(1:9, nrow = 3, ncol = 3)
print(X)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
# Matrix by rows
Y <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
print(Y)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Matrix Operations

# Matrix arithmetic
X + Y  # Element-wise addition
     [,1] [,2] [,3]
[1,]    2    6   10
[2,]    6   10   14
[3,]   10   14   18
X * Y  # Element-wise multiplication
     [,1] [,2] [,3]
[1,]    1    8   21
[2,]    8   25   48
[3,]   21   48   81
X %*% Y  # Matrix multiplication
     [,1] [,2] [,3]
[1,]   66   78   90
[2,]   78   93  108
[3,]   90  108  126
# Subsetting
X[1, 2]  # Element at row 1, column 2
[1] 4
X[1, ]   # First row
[1] 1 4 7
X[, 1]   # First column
[1] 1 2 3
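For an econometrics course, the main payoff of %*% is regression algebra. The sketch below computes OLS coefficients as beta_hat = (X'X)^(-1) X'y with t() (transpose) and solve() (inverse). It then checks the result against lm(). The data are made up, and x_obs, y_obs and X_mat are hypothetical names.

```r
# Hypothetical toy data, made up for illustration
x_obs <- c(1, 2, 3, 4, 5)
y_obs <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Design matrix: a column of 1s (intercept) next to x_obs
X_mat <- cbind(1, x_obs)

# beta_hat = (X'X)^(-1) X'y
beta_hat <- solve(t(X_mat) %*% X_mat) %*% t(X_mat) %*% y_obs
beta_hat                  # intercept about 0.05, slope about 1.99

# The same coefficients from lm()
coef(lm(y_obs ~ x_obs))
```

In practice, solve(crossprod(X_mat), crossprod(X_mat, y_obs)) computes the same thing more stably, because it avoids forming the inverse explicitly.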

Lists

Creating Lists

Lists can contain elements of different types:

# Create a list
ex_list <- list(
  a = c(1, 2, 3, 4),
  b = TRUE,
  c = "Hello!",
  d = matrix(1:4, 2, 2)
)
print(ex_list)
$a
[1] 1 2 3 4

$b
[1] TRUE

$c
[1] "Hello!"

$d
     [,1] [,2]
[1,]    1    3
[2,]    2    4

List Operations

# Access list elements
ex_list$a  # Using $
[1] 1 2 3 4
ex_list[[1]]  # Using [[]]
[1] 1 2 3 4
ex_list["a"]  # Using [] (returns a list that contains a)
$a
[1] 1 2 3 4
# Add new elements
ex_list$e <- "New element"
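The difference between [ ] and [[ ]] matters. Single brackets return a smaller list, and double brackets return the element itself. Assigning NULL removes an element. The sketch rebuilds a shorter version of ex_list so that it is self-contained.

```r
ex_list <- list(a = c(1, 2, 3, 4), b = TRUE, c = "Hello!")

class(ex_list["a"])     # "list": single brackets return a smaller list
class(ex_list[["a"]])   # "numeric": double brackets return the element itself

ex_list$c <- NULL       # assigning NULL removes an element
names(ex_list)          # "a" "b"
length(ex_list)         # 2
```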

Packages

Installing Packages

# Install a package
install.packages("tidyverse")

# Load the package
library(tidyverse)

Note

You only need to install a package once (reinstall it to update), but you need to load it with library() in each new R session

Help System

Getting Help

Basic Help:

# Full help
help(plot)

# Shorthand
?plot

# Examples
example(plot)

Package Help:

# Package vignettes
vignette("dplyr")

# List vignettes for attached packages (all = TRUE searches every installed package)
vignette(all = FALSE)

# List the demos in a package
demo(package = "graphics")

Data Frames

Introduction to Data Frames

Data frames are table-like structures: each column is a vector, and different columns can hold different types:

# Create a data frame
df <- data.frame(
  id = 1:3,
  name = c("John", "Jane", "Bob"),
  score = c(85, 92, 78)
)
print(df)
  id name score
1  1 John    85
2  2 Jane    92
3  3  Bob    78

Data Frame Operations

# Basic operations
names(df)  # Column names
[1] "id"    "name"  "score"
nrow(df)   # Number of rows
[1] 3
ncol(df)   # Number of columns
[1] 3
# Access columns
df$name
[1] "John" "Jane" "Bob" 
df[["score"]]
[1] 85 92 78
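New columns can be added with $ and computed from existing ones. Summary statistics work directly on a column. In this sketch, the passed column and the cutoff of 80 are hypothetical choices made for illustration.

```r
df <- data.frame(
  id = 1:3,
  name = c("John", "Jane", "Bob"),
  score = c(85, 92, 78)
)

df$passed <- df$score >= 80   # new logical column, computed row by row
mean(df$score)                # 85
summary(df$score)             # min, quartiles, mean and max of the column
ncol(df)                      # 4: the new column has been added
```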

Subsetting Data Frames

# Subset rows
df[df$score > 80, ]
  id name score
1  1 John    85
2  2 Jane    92
# Subset columns
df[, c("name", "score")]
  name score
1 John    85
2 Jane    92
3  Bob    78
# Using subset()
subset(df, score > 80, select = c("name", "score"))
  name score
1 John    85
2 Jane    92

Additional Resources

Practice on your own

  1. Create vectors using different methods (c(), :, seq(), rep())
  2. Practice logical operations and understand operator precedence
  3. Create a list with different types of elements and practice indexing
  4. Load a package and resolve a namespace conflict
  5. Create and remove objects from your environment

Questions?