In this article, we will look at how to use the Pandas library to read Excel files into Pandas dataframe objects.
excelFile.xlsx:
Creating a dataframe using Excel files
- Using read_excel() function to read an Excel file
- Read Specific Sheet of an Excel file using the “sheet_name” attribute
- Read Specific Columns of an Excel File using “usecols” parameter
- Using the ‘na_values’ Argument to Handle Missing Data
- Using “skiprows” argument to Skip starting rows when Reading an Excel File
- Setting header to any row and start reading from that row, using the “header” argument
- Read Multiple Excel Sheets using “sheet_name” argument
- Read all Sheets of the Excel file together using ‘sheet_name’ argument
Method #1: Using read_excel() function to read an Excel file
Here we use the read_excel() function of the pandas module of python to read an excel file.
Approach:
- Import pandas module using the import keyword
- Pass the Excel filepath to be read as an argument to the read_excel() function of the pandas module to read an excel file.
- Here it reads 1st sheet of an excel file by default.
- Print the data of the above given excel file.
- The Exit of the Program.
Below is the implementation:
# Import pandas module using the import keyword import pandas as pd # Pass the Excel filepath to be read as an argument to the read_excel() function # of the pandas module to read an excel file. # Here it reads 1st sheet of an excel file By default gvn_exceldata = pd.read_excel('excelFile.xlsx') # Print the data of the above given excel file print(gvn_exceldata)
Output:
Website name Name Age City State Student \ 0 Sheets Tips Vikram 21 Hyderabad Telangana Yes 1 Python-Programs Ashritha 22 Pune Mumbai Yes 2 BTechGeeks Manish 19 Indore Madhya Pradesh Yes 3 PythonArray Anish 20 Rajkot Gujarat Yes 4 Sheets Tips Vishal 18 Ranchi Jharkhand Yes 5 Python-Programs Pavan 23 Lucknow Uttarpradesh No 6 Sheets Tips John 32 Los Angeles America No 7 Python-Programs Virat 34 Agra Delhi No 8 BTechGeeks Cary 36 Tokyo Japan No 9 PythonArray Tara 39 Helsinki Finland No 10 Sheets Tips Smith 42 Nairobi Kenya No 11 Python-Programs Elizabeth 66 Paris Italy No Employed Married Scholorship Id 0 No No Not Applicable 1234 1 No No Already Taken 2344 2 No No Already Taken 1345 3 No No Already Taken 1356 4 No No Already Taken 1938 5 Yes No Not Applicable 2345 6 Yes Yes Not Applicable 5622 7 Yes No Not Applicable 1734 8 Yes Yes Already Taken 4562 9 Yes Yes Not Applicable 2365 10 Yes Yes Not Applicable 2463 11 Yes Yes Not Applicable 2677
Method #2: Read Specific Sheet of an Excel file using the “sheet_name” attribute
Here we use the sheet_name
attribute of the read_excel() function to read a particular sheet from an excel file.
Approach:
- Import pandas module using the import keyword
- Pass the Excel filepath, sheet name you want to be read as arguments to the read_excel() function of the pandas module to read a specific sheet of an excel file.
- Here it reads the 2nd sheet of an excel file.
- Print the data of a particular sheet of an excel file.
- The Exit of the Program.
Below is the implementation:
# Import pandas module using the import keyword import pandas as pd # Pass the Excel filepath, sheet name you want to be read as arguments # to the read_excel() function of the pandas module to # read a specific sheet of an excel file. # Here it reads the 2nd sheet of an excel file. gvn_exceldata = pd.read_excel('excelFile.xlsx', sheet_name = "Sheet1") # Print the data of a particular sheet of an excel file print(gvn_exceldata )
Output:
Website name Name Age City State Student \ 0 Sheets Tips Vikram 21 Hyderabad Telangana Yes 1 Python-Programs Ashritha 22 Pune Mumbai Yes 2 BTechGeeks Manish 19 Indore Madhya Pradesh Yes 3 PythonArray Anish 20 Rajkot Gujarat Yes 4 Sheets Tips Vishal 18 Ranchi Jharkhand Yes 5 Python-Programs Pavan 23 Lucknow Uttarpradesh No 6 Sheets Tips John 32 Los Angeles America No 7 Python-Programs Virat 34 Agra Delhi No 8 BTechGeeks Cary 36 Tokyo Japan No 9 PythonArray Tara 39 Helsinki Finland No 10 Sheets Tips Smith 42 Nairobi Kenya No 11 Python-Programs Elizabeth 66 Paris Italy No Employed Married Scholorship Id 0 No No Not Applicable 1234 1 No No Already Taken 2344 2 No No Already Taken 1345 3 No No Already Taken 1356 4 No No Already Taken 1938 5 Yes No Not Applicable 2345 6 Yes Yes Not Applicable 5622 7 Yes No Not Applicable 1734 8 Yes Yes Already Taken 4562 9 Yes Yes Not Applicable 2365 10 Yes Yes Not Applicable 2463 11 Yes Yes Not Applicable 2677
Method #3: Read Specific Columns of an Excel File using “usecols” parameter
Here we use the usecols
argument of a read_excel() function to read the data from the specific columns of an excel file.
Approach:
- Import pandas module using the import keyword.
- Give the list of columns(you want to read) and store it in a variable.
- Pass the excel filepath, above list of columns to the usecols argument to the read_excel() function of the pandas module and store it in a variable
- Here it reads the data from the specific columns of an excel file.
- Print the data of the above given specific columns of an excel file.
- The Exit of the Program.
Below is the implementation:
# Import pandas module using the import keyword import pandas as pd # Give the list of columns(you want to read) and store it in a variable require_cols = [1, 4, 6] # Pass the excel filepath, above list of columns to the usecols argument # to the read_excel() function of the pandas module and store it in a variable # Here it reads the data from the specific columns of an excel file. specific_coldata= pd.read_excel('excelFile.xlsx', usecols = require_cols) # Print the data of the above given specific columns of an excel file. print(specific_coldata)
Output:
Name State Employed 0 Vikram Telangana No 1 Ashritha Mumbai No 2 Manish Madhya Pradesh No 3 Anish Gujarat No 4 Vishal Jharkhand No 5 Pavan Uttarpradesh Yes 6 John America Yes 7 Virat Delhi Yes 8 Cary Japan Yes 9 Tara Finland Yes 10 Smith Kenya Yes 11 Elizabeth Italy Yes
Method #4: Using the ‘na_values’ Argument to Handle Missing Data
We use the na_values argument of the read_excel() function of pandas module to handle all the missing data in a specific sheet of an excel file or complete excel file.
Here we replace all the na/NA values present in the 2nd sheet of an excel file with “No Data”.
Approach:
- Import pandas module using the import keyword
- Pass the excel filepath, na_values as “No Data”, sheet_name as arguments to the read_excel() function of the pandas module and handle the missing data.
- Here it replaces all the na/NA values in the 2nd sheet of an excel file with “No Data”.
- Print the 2nd sheet of the excel file after handling the missing data.
- The Exit of the Program.
Below is the implementation:
# Import pandas module using the import keyword import pandas as pd # Pass the excel filepath, na_values as "No Data", sheet_name as arguments # to the read_excel() function of the pandas module and handle the missing data. # Here it replaces all the na/NA values in the 2nd sheet of an excel file # with "No Data". gvn_exceldata = pd.read_excel('excelFileMissingValues.xlsx', na_values = "No Data", sheet_name ="Sheet1" ) # Print the 2nd sheet of the excel file after handling the missing data print(gvn_exceldata)
Output:
Website name Name Age City State Student \ 0 Sheets Tips Vikram 21.0 Hyderabad Telangana Yes 1 Python-Programs Ashritha 22.0 Pune Mumbai Yes 2 BTechGeeks Manish 19.0 Indore Madhya Pradesh Yes 3 PythonArray Anish 20.0 Rajkot Gujarat Yes 4 NaN NaN NaN NaN NaN NaN 5 NaN NaN NaN NaN NaN NaN 6 Sheets Tips Vishal 18.0 Ranchi Jharkhand Yes 7 Python-Programs Pavan 23.0 Lucknow Uttarpradesh No 8 Sheets Tips John 32.0 Los Angeles America No 9 Python-Programs Virat 34.0 Agra Delhi No 10 BTechGeeks Cary 36.0 Tokyo Japan No 11 PythonArray Tara 39.0 Helsinki Finland No 12 Sheets Tips Smith 42.0 Nairobi Kenya No 13 Python-Programs Elizabeth 66.0 Paris Italy No Employed Married Scholorship Id 0 No No Not Applicable 1234.0 1 No No Already Taken 2344.0 2 No No Already Taken 1345.0 3 No No Already Taken 1356.0 4 NaN NaN NaN NaN 5 NaN NaN NaN NaN 6 No No Already Taken 1938.0 7 Yes No Not Applicable 2345.0 8 Yes Yes Not Applicable 5622.0 9 Yes No Not Applicable 1734.0 10 Yes Yes Already Taken 4562.0 11 Yes Yes Not Applicable 2365.0 12 Yes Yes Not Applicable 2463.0 13 Yes Yes Not Applicable 2677.0
Method #5: Using “skiprows” argument to Skip starting rows when Reading an Excel File
Here we use the skiprows argument of the read_excel() function of the pandas module to skip the starting rows of the excel file while reading.
Approach:
- Import pandas module using the import keyword
- Pass the excel file path, sheet_name, no of rows to be skipped as arguments to the read_excel() function of the pandas module and store it in a variable.
- Here it skips the starting 3 rows of the 1st sheet while reading an excel file.
- Print the data of the 1st sheet of the excel file after skipping the first 3 rows.
- The Exit of the Program.
Below is the implementation:
# Import pandas module using the import keyword import pandas as pd # Pass the excel filepath, sheet_name, no of rows to be skipped as arguments # to the read_excel() function of the pandas module and store it in a variable. # Here it skips the starting 5 rows of the 1st sheet while reading an excel file. gvn_exceldata = pd.read_excel('excelFile.xlsx', sheet_name ="Sheet1", skiprows = 3) # Print the data of the 1st sheet of the excel file after skipping the first 5 rows. print(gvn_exceldata)
Output:
BTechGeeks Manish 19 Indore Madhya Pradesh Yes No No.1 \ 0 PythonArray Anish 20 Rajkot Gujarat Yes No No 1 Sheets Tips Vishal 18 Ranchi Jharkhand Yes No No 2 Python-Programs Pavan 23 Lucknow Uttarpradesh No Yes No 3 Sheets Tips John 32 Los Angeles America No Yes Yes 4 Python-Programs Virat 34 Agra Delhi No Yes No 5 BTechGeeks Cary 36 Tokyo Japan No Yes Yes 6 PythonArray Tara 39 Helsinki Finland No Yes Yes 7 Sheets Tips Smith 42 Nairobi Kenya No Yes Yes 8 Python-Programs Elizabeth 66 Paris Italy No Yes Yes Already Taken 1345 0 Already Taken 1356 1 Already Taken 1938 2 Not Applicable 2345 3 Not Applicable 5622 4 Not Applicable 1734 5 Already Taken 4562 6 Not Applicable 2365 7 Not Applicable 2463 8 Not Applicable 2677
Method #6: Setting header to any row and start reading from that row, using the “header” argument
Here we use the header
argument of the read_excel() function of the pandas module to set the specific row as a header.
Approach:
- Import pandas module using the import keyword
- Pass the excel file path, sheet_name, the row number to be set as header as arguments to the read_excel() function of the pandas module and store it in a variable.
- Here we set the 8th row as the header.
- Print the data of the 1st sheet of the excel file after setting a specific row as a header.
- The Exit of the Program.
Below is the implementation:
# Import pandas module using the import keyword import pandas as pd # Pass the excel filepath, sheet_name, the row number to be set as header as arguments # to the read_excel() function of the pandas module and store it in a variable. # Here we set the 8th row as header. gvn_exceldata = pd.read_excel('excelFile.xlsx', sheet_name = "Sheet1", header = 8) # Print the data of the 1st sheet of the excel file after setting specific row as a header print(gvn_exceldata)
Output:
Python-Programs Virat 34 Agra Delhi No Yes No.1 \ 0 BTechGeeks Cary 36 Tokyo Japan No Yes Yes 1 PythonArray Tara 39 Helsinki Finland No Yes Yes 2 Sheets Tips Smith 42 Nairobi Kenya No Yes Yes 3 Python-Programs Elizabeth 66 Paris Italy No Yes Yes Not Applicable 1734 0 Already Taken 4562 1 Not Applicable 2365 2 Not Applicable 2463 3 Not Applicable 2677
Method #7: Read Multiple Excel Sheets using “sheet_name” argument
Here we use the sheet_name argument of the read_excel() function of the pandas module to read the data from multiple sheets in an excel file.
We must pass the sheet name as a list to read multiple sheets.
Approach:
- Import pandas module using the import keyword
- Pass the excel file path, and multiple sheet names as a list using the sheet_name argument as arguments to the read_excel() function of the pandas module and store it in a variable.
- Here it reads the data of both the 1st and 2nd sheets.
- Print the data of both the 1st and 2nd sheets in an excel file.
- The Exit of the Program.
Below is the implementation:
# Import pandas module using the import keyword import pandas as pd # Pass the excel filepath, multiple sheet names as a list using the sheet_name argument # as arguments to the read_excel() function of the pandas module and store it in a variable. # Here it reads tha data of both the 1st and 2nd sheets. gvn_exceldata = pd.read_excel('sampleExcelFile.xlsx', sheet_name =["Sheet1","Sheet2"]) # Print the data of both the 1st and 2nd sheets in an excel file print(gvn_exceldata)
Output:
{'Sheet1': Website name Name Age City State Student \ 0 Sheets Tips Vikram 21 Hyderabad Telangana Yes 1 Python-Programs Ashritha 22 Pune Mumbai Yes 2 BTechGeeks Manish 19 Indore Madhya Pradesh Yes 3 PythonArray Anish 20 Rajkot Gujarat Yes 4 Sheets Tips Vishal 18 Ranchi Jharkhand Yes 5 Python-Programs Pavan 23 Lucknow Uttarpradesh No 6 Sheets Tips John 32 Los Angeles America No 7 Python-Programs Virat 34 Agra Delhi No 8 BTechGeeks Cary 36 Tokyo Japan No 9 PythonArray Tara 39 Helsinki Finland No 10 Sheets Tips Smith 42 Nairobi Kenya No 11 Python-Programs Elizabeth 66 Paris Italy No Employed Married Scholorship Id 0 No No Not Applicable 1234 1 No No Already Taken 2344 2 No No Already Taken 1345 3 No No Already Taken 1356 4 No No Already Taken 1938 5 Yes No Not Applicable 2345 6 Yes Yes Not Applicable 5622 7 Yes No Not Applicable 1734 8 Yes Yes Already Taken 4562 9 Yes Yes Not Applicable 2365 10 Yes Yes Not Applicable 2463 11 Yes Yes Not Applicable 2677 , 'Sheet2': Website Id 0 SheetsTips 4 1 BtechGeeks 2 2 PythonPrograms 5 3 PythonArray 1}
Method #8: Read all Sheets of the Excel file together using ‘sheet_name’ argument
Here we pass the sheet_name argument of the read_excel() function as None to read the data from all the sheets of an excel file.
Approach:
- Import pandas module using the import keyword
- Pass the excel filepath, sheet_name = None as arguments to the read_excel() function of the pandas module and store it in a variable.
- Here it reads the data from all the sheets of an excel file.
- Print the data from all the sheets of an excel file.
- The Exit of the Program.
Below is the implementation:
# Import pandas module using the import keyword import pandas as pd # Pass the excel filepath, sheet_name = None as arguments to the # read_excel() function of the pandas module and store it in a variable. # Here it reads the data from all the sheets of an excel file. gvn_exceldata = pd.read_excel('sampleExcelFile.xlsx', sheet_name =None) # Print the data from all the sheets of an excel file. print(gvn_exceldata)
Output:
{'Sheet1': Website name Name Age City State Student \ 0 Sheets Tips Vikram 21 Hyderabad Telangana Yes 1 Python-Programs Ashritha 22 Pune Mumbai Yes 2 BTechGeeks Manish 19 Indore Madhya Pradesh Yes 3 PythonArray Anish 20 Rajkot Gujarat Yes 4 Sheets Tips Vishal 18 Ranchi Jharkhand Yes 5 Python-Programs Pavan 23 Lucknow Uttarpradesh No 6 Sheets Tips John 32 Los Angeles America No 7 Python-Programs Virat 34 Agra Delhi No 8 BTechGeeks Cary 36 Tokyo Japan No 9 PythonArray Tara 39 Helsinki Finland No 10 Sheets Tips Smith 42 Nairobi Kenya No 11 Python-Programs Elizabeth 66 Paris Italy No Employed Married Scholorship Id 0 No No Not Applicable 1234 1 No No Already Taken 2344 2 No No Already Taken 1345 3 No No Already Taken 1356 4 No No Already Taken 1938 5 Yes No Not Applicable 2345 6 Yes Yes Not Applicable 5622 7 Yes No Not Applicable 1734 8 Yes Yes Already Taken 4562 9 Yes Yes Not Applicable 2365 10 Yes Yes Not Applicable 2463 11 Yes Yes Not Applicable 2677 , 'Sheet2': Website Id 0 SheetsTips 4 1 BtechGeeks 2 2 PythonPrograms 5 3 PythonArray 1}