In this post, we will look at how to use the R programming language to merge numerous/multiple Excel files in a folder.
Modules Required
dplyr package:
The dplyr package in R contains several tools for manipulating data in R that provide a uniform set of verbs, helping to resolve the most frequent data manipulation challenges.
Syntax:
install.packages(“dplyr”)
Output:
Installing package into ‘C:/Users/cirus/AppData/Local/R/win-library/4.2’ (as ‘lib’ is unspecified) trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.2/dplyr_1.0.9.zip' Content type 'application/zip' length 1297402 bytes (1.2 MB) downloaded 1.2 MB package ‘dplyr’ successfully unpacked and MD5 sums checked The downloaded binary packages are in C:\Users\cirus\AppData\Local\Temp\Rtmpo5UGo0\downloaded_packages
plyr package:
In R, the “plyr” package is used to work with data, including enhancements/improvements and modifications.
Syntax:
install.packages("plyr")
Output:
Installing package into ‘C:/Users/cirus/AppData/Local/R/win-library/4.2’ (as ‘lib’ is unspecified) trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.2/plyr_1.8.7.zip' Content type 'application/zip' length 1152098 bytes (1.1 MB) downloaded 1.1 MB package ‘plyr’ successfully unpacked and MD5 sums checked The downloaded binary packages are in C:\Users\cirus\AppData\Local\Temp\Rtmpo5UGo0\downloaded_packages
readxl package:
In R, the readxl package is used to work with excel files.
Syntax:
install.packages("readxl")
Output:
Installing package into ‘C:/Users/cirus/AppData/Local/R/win-library/4.2’ (as ‘lib’ is unspecified) --- Please select a CRAN mirror for use in this session --- trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.2/readxl_1.4.0.zip' Content type 'application/zip' length 1197616 bytes (1.1 MB) downloaded 1.1 MB package ‘readxl’ successfully unpacked and MD5 sums checked The downloaded binary packages are in C:\Users\cirus\AppData\Local\Temp\Rtmpo5UGo0\downloaded_packages
readr package:
The readr package in R is used to read files.
Syntax:
install.packages("readr")
Output:
Installing package into ‘C:/Users/cirus/AppData/Local/R/win-library/4.2’ (as ‘lib’ is unspecified) trying URL 'https://cloud.r-project.org/bin/windows/contrib/4.2/readr_2.1.2.zip' Content type 'application/zip' length 1207797 bytes (1.2 MB) downloaded 1.2 MB package ‘readr’ successfully unpacked and MD5 sums checked The downloaded binary packages are in C:\Users\cirus\AppData\Local\Temp\Rtmpo5UGo0\downloaded_packages
Functions Required
list.files() Function:
This gives a character vector of the names of files or directories in the named directory.
Syntax:
list.files(path = ".", pattern = NULL, all.files = FALSE,full.names = FALSE, recursive = FALSE, ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
lapply() Function:
The lapply() function can be used to do operations on list objects and returns a list object with the same length as the original set.
lappy() returns a list of the same length as the input list object, with each element the result of applying FUN to the corresponding element of list. Lapply in R accepts a list, vector, or data frame as input and returns a list as output.
Syntax:
lapply(X, FUN)
bind_rows() function:
The bind_rows() function is a fast implementation of the common pattern of do.call(rbind, dfs) or do.call(cbind, dfs) for combining/binding many data frames into one.
Syntax:
bind_rows(…, .id = NULL)
Let us use the below given two Excel Files here:
Excel Files
plotExcelFile.xlsx:
excelFile.xlsx:
sampleExcelFile.xlsx:
How to Merge Multiple Excel files in R
Approach:
- Import plyr package using the library() function.
- Import dplyr package using the library() function.
- Import readr package using the library() function.
- Import readxl package using the library() function.
- Pass the location of the Folder that contains excel files, pattern ( In our case it is xlsx files) to the files() function
apply lapply() and bindrows to it. - Print the above result Data
- The Exit of the Program.
Below is the implementation:
# Import plyr package using the library() function library(plyr) # Import dplyr package using the library() function library(dplyr) # Import readr package using the library() function library(readr) # Import readxl package using the library() function library(readxl) # Pass the location of Folder that containing excel files, pattern ( In our case it is xlsx files) to the files() function # apply lapply() and bindrows to it resultData <- list.files(path = "C:/Users/cirus/Downloads/excelFilesFolder", pattern = "*.xlsx", full.names = TRUE) %>% lapply(read_excel) %>% bind_rows # Print the above result Data resultData
Output:
# A tibble: 31 × 14 `Website name` Name Age City State Student Employed Married Scholorship Id x y <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> 1 Sheets Tips Vikram 21 Hyder… Tela… Yes No No Not Applic… 1234 NA NA 2 Python-Programs Ashritha 22 Pune Mumb… Yes No No Already Ta… 2344 NA NA 3 BTechGeeks Manish 19 Indore Madh… Yes No No Already Ta… 1345 NA NA 4 PythonArray Anish 20 Rajkot Guja… Yes No No Already Ta… 1356 NA NA 5 Sheets Tips Vishal 18 Ranchi Jhar… Yes No No Already Ta… 1938 NA NA 6 Python-Programs Pavan 23 Luckn… Utta… No Yes No Not Applic… 2345 NA NA 7 Sheets Tips John 32 Los A… Amer… No Yes Yes Not Applic… 5622 NA NA 8 Python-Programs Virat 34 Agra Delhi No Yes No Not Applic… 1734 NA NA 9 BTechGeeks Cary 36 Tokyo Japan No Yes Yes Already Ta… 4562 NA NA 10 PythonArray Tara 39 Helsi… Finl… No Yes Yes Not Applic… 2365 NA NA # … with 21 more rows, and 2 more variables: Gender <chr>, Passport <chr>