Reading a Data Frame or Matrix from a File in R
Reading Data into a Data Frame
The read.table() and read.csv() functions are commonly used to read data into a data frame.
read.table() Function
The read.table() function is versatile and can handle various file formats by specifying parameters.
Basic Usage:
# Read data from a tab-delimited file
df <- read.table("data.txt", header=TRUE, sep="\t")
print(df)
Parameters:
- file: Path to the file to read.
- header: Logical; TRUE if the first line contains column names.
- sep: The field separator (e.g., "\t" for tab, "," for comma).
- quote: Character(s) to be treated as quotes (e.g., "" to disable quoting).
- stringsAsFactors: Logical; should character vectors be converted to factors?
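As a rough illustration of how these parameters interact, the sketch below writes a small semicolon-delimited file to a temporary location and reads it back. The file contents and column names are invented for the example and are not part of any real dataset.

```r
# Write a tiny sample file (contents are made up for illustration)
path <- tempfile(fileext = ".txt")
writeLines(c("name;city;score",
             "Ann;'New York';90",
             "Bob;Paris;85"), path)

# sep sets the delimiter; quote lists the characters that delimit quoted fields
df <- read.table(path, header = TRUE, sep = ";", quote = "'",
                 stringsAsFactors = FALSE)

# name and city stay character because stringsAsFactors = FALSE
str(df)
```

Because the single quote is declared in quote, the field 'New York' is read as one value with the quotes stripped.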
Example with Specific Delimiter:
# Read data from a comma-separated file
df <- read.table("data.csv", header=TRUE, sep=",")
print(df)
read.csv() Function
The read.csv() function is a wrapper around read.table() whose defaults suit comma-separated files: header=TRUE, sep=",", and fill=TRUE.
Basic Usage:
# Read data from a CSV file
df <- read.csv("data.csv", header=TRUE)
print(df)
Additional Parameters:
- file: Path to the file.
- header: Logical; TRUE if the file has headers.
- sep: Default is "," for CSV files.
- stringsAsFactors: Logical; default is FALSE since R 4.0.0 (earlier versions defaulted to TRUE, converting strings to factors).
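The effect of stringsAsFactors is easiest to see by reading the same file twice. The following is a minimal sketch, assuming a small made-up CSV written to a temporary file; setting the argument explicitly keeps the result the same across R versions.

```r
# Sample CSV with one text column (contents are invented for illustration)
path <- tempfile(fileext = ".csv")
writeLines(c("id,group", "1,a", "2,b", "3,a"), path)

as_text   <- read.csv(path, stringsAsFactors = FALSE)
as_factor <- read.csv(path, stringsAsFactors = TRUE)

class(as_text$group)     # "character"
class(as_factor$group)   # "factor"
levels(as_factor$group)  # "a" "b"
```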
Reading Data into a Matrix
A matrix can be built by reshaping the output of scan() with matrix(), or by converting the data frame returned by read.table() with as.matrix().
Using scan()
Basic Usage:
# Read a matrix from a space-separated file
matrix_data <- matrix(scan("matrix.txt"), nrow=3, byrow=TRUE)
print(matrix_data)
Parameters:
- scan() reads the data into a vector, which is then reshaped into a matrix using matrix().
- nrow: Number of rows in the matrix.
- byrow: Logical; if TRUE, fills the matrix by rows.
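To show why byrow matters, the sketch below reads a two-line file and reshapes the same vector both ways. The file contents are made up, and quiet=TRUE is used only to suppress the "Read N items" message from scan().

```r
# Two rows of three numbers (contents are invented for illustration)
path <- tempfile(fileext = ".txt")
writeLines(c("1 2 3", "4 5 6"), path)

# scan() returns a plain numeric vector in file order
values <- scan(path, quiet = TRUE)

by_row <- matrix(values, nrow = 2, byrow = TRUE)  # rows match the file's lines
by_col <- matrix(values, nrow = 2)                # default: fill column by column

by_row[1, ]  # 1 2 3
by_col[1, ]  # 1 3 5
```

With byrow=TRUE the matrix mirrors the layout of the file; without it, the values are laid out down the columns, which is rarely what a row-oriented text file intends.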
Using read.table()
Basic Usage:
# Read a matrix from a tab-delimited file
matrix_data <- as.matrix(read.table("matrix.txt", header=FALSE, sep="\t"))
print(matrix_data)
Parameters:
- header: Logical; FALSE if the file does not have headers.
- sep: The delimiter used in the file.
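One detail worth checking with this approach is the type of the resulting matrix: a matrix holds a single type, so any non-numeric column turns the whole result into a character matrix. The sketch below illustrates this with two made-up temporary files.

```r
# All-numeric file: the result is a numeric matrix
num_path <- tempfile(fileext = ".txt")
writeLines(c("1\t2\t3", "4\t5\t6"), num_path)
num <- as.matrix(read.table(num_path, header = FALSE, sep = "\t"))
is.numeric(num)   # TRUE
colnames(num)     # "V1" "V2" "V3" (default names assigned by read.table)

# Mixed file: one text column makes every cell a character string
mixed_path <- tempfile(fileext = ".txt")
writeLines(c("1\ta", "2\tb"), mixed_path)
mixed <- as.matrix(read.table(mixed_path, header = FALSE, sep = "\t"))
is.character(mixed)  # TRUE
```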
Additional Options
Reading from Different File Formats
Excel Files: Use the readxl package.
library(readxl)
df <- read_excel("data.xlsx")
print(df)
JSON Files: Use the jsonlite package.
library(jsonlite)
# fromJSON() returns a data frame when the JSON is an array of records
df <- fromJSON("data.json")
print(df)
Handling Large Files
fread() from the data.table package: Efficiently handles large files.
library(data.table)
df <- fread("large_data.csv")
print(df)
File Paths and URLs
Reading from a URL:
df <- read.csv("https://example.com/data.csv")
print(df)
Summary
To read a data frame or matrix from a file in R:
- For Data Frames:
  - Use read.table() for general text files with custom delimiters.
  - Use read.csv() for comma-separated values with default settings.
- For Matrices:
  - Use scan() with matrix() for simple text files.
  - Use read.table() wrapped in as.matrix() to read the file as a data frame and convert it to a matrix.
- Additional File Formats:
  - Use packages like readxl for Excel files, jsonlite for JSON files, and data.table for large CSV files.