How to Create a dataframe using Excel files?

In this article, we will look at how to use the Pandas library to read Excel files into Pandas dataframe objects.

excelFile.xlsx:

sample-excel-file-sheet1-image

Creating a dataframe using Excel files

Method #1: Using read_excel() function to read an Excel file

Here we use the read_excel() function of the pandas module of python to read an excel file.

 Approach:

  • Import pandas module using the import keyword
  • Pass the Excel filepath to be read as an argument to the read_excel() function of the pandas module to read an excel file.
  • Here it reads 1st sheet of an excel file by default.
  • Print the data of the above given excel file.
  • The Exit of the Program.

Below is the implementation:

# Import pandas module using the import keyword
import pandas as pd

# Pass the Excel filepath to be read as an argument to the read_excel() function
# of the pandas module to read an excel file.
# Here it reads 1st sheet of an excel file By default 
gvn_exceldata = pd.read_excel('excelFile.xlsx')

# Print the data of the above given excel file
print(gvn_exceldata)

Output:

      Website name        Name  Age         City           State Student  \
0       Sheets Tips     Vikram   21    Hyderabad       Telangana     Yes   
1   Python-Programs   Ashritha   22         Pune          Mumbai     Yes   
2        BTechGeeks     Manish   19       Indore  Madhya Pradesh     Yes   
3       PythonArray      Anish   20       Rajkot         Gujarat     Yes   
4       Sheets Tips     Vishal   18       Ranchi       Jharkhand     Yes   
5   Python-Programs      Pavan   23      Lucknow    Uttarpradesh      No   
6       Sheets Tips       John   32  Los Angeles         America      No   
7   Python-Programs      Virat   34         Agra           Delhi      No   
8        BTechGeeks       Cary   36        Tokyo           Japan      No   
9       PythonArray       Tara   39     Helsinki         Finland      No   
10      Sheets Tips      Smith   42      Nairobi           Kenya      No   
11  Python-Programs  Elizabeth   66        Paris           Italy      No   

   Employed Married     Scholorship    Id  
0        No      No  Not Applicable  1234  
1        No      No   Already Taken  2344  
2        No      No   Already Taken  1345  
3        No      No   Already Taken  1356  
4        No      No   Already Taken  1938  
5       Yes      No  Not Applicable  2345  
6       Yes     Yes  Not Applicable  5622  
7       Yes      No  Not Applicable  1734  
8       Yes     Yes   Already Taken  4562  
9       Yes     Yes  Not Applicable  2365  
10      Yes     Yes  Not Applicable  2463  
11      Yes     Yes  Not Applicable  2677  

Method #2:  Read Specific Sheet of an Excel file using the “sheet_name” attribute

Here we use the sheet_name attribute of the read_excel() function to read a particular sheet from an excel file.

Approach:

  • Import pandas module using the import keyword
  • Pass the Excel filepath, sheet name you want to be read as arguments to the read_excel() function of the pandas module to read a specific sheet of an excel file.
  • Here it reads the 2nd sheet of an excel file.
  • Print the data of a particular sheet of an excel file.
  • The Exit of the Program.

Below is the implementation:

# Import pandas module using the import keyword
import pandas as pd

# Pass the Excel filepath, sheet name you want to be read as arguments 
# to the read_excel() function of the pandas module to 
# read a specific sheet of an excel file.
# Here it reads the 2nd sheet of an excel file.
gvn_exceldata  = pd.read_excel('excelFile.xlsx', sheet_name = "Sheet1")

# Print the data of a particular sheet of an excel file
print(gvn_exceldata )

Output:

      Website name        Name  Age         City           State Student  \
0       Sheets Tips     Vikram   21    Hyderabad       Telangana     Yes   
1   Python-Programs   Ashritha   22         Pune          Mumbai     Yes   
2        BTechGeeks     Manish   19       Indore  Madhya Pradesh     Yes   
3       PythonArray      Anish   20       Rajkot         Gujarat     Yes   
4       Sheets Tips     Vishal   18       Ranchi       Jharkhand     Yes   
5   Python-Programs      Pavan   23      Lucknow    Uttarpradesh      No   
6       Sheets Tips       John   32  Los Angeles         America      No   
7   Python-Programs      Virat   34         Agra           Delhi      No   
8        BTechGeeks       Cary   36        Tokyo           Japan      No   
9       PythonArray       Tara   39     Helsinki         Finland      No   
10      Sheets Tips      Smith   42      Nairobi           Kenya      No   
11  Python-Programs  Elizabeth   66        Paris           Italy      No   

   Employed Married     Scholorship    Id  
0        No      No  Not Applicable  1234  
1        No      No   Already Taken  2344  
2        No      No   Already Taken  1345  
3        No      No   Already Taken  1356  
4        No      No   Already Taken  1938  
5       Yes      No  Not Applicable  2345  
6       Yes     Yes  Not Applicable  5622  
7       Yes      No  Not Applicable  1734  
8       Yes     Yes   Already Taken  4562  
9       Yes     Yes  Not Applicable  2365  
10      Yes     Yes  Not Applicable  2463  
11      Yes     Yes  Not Applicable  2677

Method #3: Read Specific Columns of an Excel File using “usecols” parameter

Here we use the usecols argument of a read_excel() function to read the data from the specific columns of an excel file.

Approach:

  • Import pandas module using the import keyword.
  • Give the list of columns(you want to read) and store it in a variable.
  • Pass the excel filepath, above list of columns to the usecols argument to the read_excel() function of the pandas module and store it in a variable
  • Here it reads the data from the specific columns of an excel file.
  • Print the data of the above given specific columns of an excel file.
  • The Exit of the Program.

Below is the implementation:

# Import pandas module using the import keyword
import pandas as pd

# Give the list of columns(you want to read) and store it in a variable
require_cols = [1, 4, 6]

# Pass the excel filepath, above list of columns to the usecols argument
# to the read_excel() function of the pandas module and store it in a variable
# Here it reads the data from the specific columns of an excel file.
specific_coldata= pd.read_excel('excelFile.xlsx', usecols = require_cols)

# Print the data of the above given specific columns of an excel file.
print(specific_coldata)

Output:

         Name           State Employed
0      Vikram       Telangana       No
1    Ashritha          Mumbai       No
2      Manish  Madhya Pradesh       No
3       Anish         Gujarat       No
4      Vishal       Jharkhand       No
5       Pavan    Uttarpradesh      Yes
6        John         America      Yes
7       Virat           Delhi      Yes
8        Cary           Japan      Yes
9        Tara         Finland      Yes
10      Smith           Kenya      Yes
11  Elizabeth           Italy      Yes

Method #4: Using the ‘na_values’ Argument to Handle Missing Data

We use the na_values argument of the read_excel() function of pandas module to handle all the missing data in a specific sheet of an excel file or complete excel file.

Here we replace all the na/NA values present in the 2nd sheet of an excel file with “No Data”.

Approach:

  • Import pandas module using the import keyword
  • Pass the excel filepath, na_values as “No Data”, sheet_name as arguments to the read_excel() function of the pandas module and handle the missing data.
  • Here it replaces all the na/NA values in the 2nd sheet of an excel file with “No Data”.
  • Print the 2nd sheet of the excel file after handling the missing data.
  • The Exit of the Program.

Below is the implementation:

# Import pandas module using the import keyword
import pandas as pd

# Pass the excel filepath, na_values as "No Data", sheet_name as arguments 
# to the read_excel() function of the pandas module and handle the missing data.
# Here it replaces all the na/NA values in the 2nd sheet of an excel file
# with "No Data".
gvn_exceldata  = pd.read_excel('excelFileMissingValues.xlsx', na_values = "No Data",
                                                    sheet_name ="Sheet1" )

# Print the 2nd sheet of the excel file after handling the missing data
print(gvn_exceldata)

Output: 

     Website name        Name   Age         City           State Student  \
0       Sheets Tips     Vikram  21.0    Hyderabad       Telangana     Yes   
1   Python-Programs   Ashritha  22.0         Pune          Mumbai     Yes   
2        BTechGeeks     Manish  19.0       Indore  Madhya Pradesh     Yes   
3       PythonArray      Anish  20.0       Rajkot         Gujarat     Yes   
4               NaN        NaN   NaN          NaN             NaN     NaN   
5               NaN        NaN   NaN          NaN             NaN     NaN   
6       Sheets Tips     Vishal  18.0       Ranchi       Jharkhand     Yes   
7   Python-Programs      Pavan  23.0      Lucknow    Uttarpradesh      No   
8       Sheets Tips       John  32.0  Los Angeles         America      No   
9   Python-Programs      Virat  34.0         Agra           Delhi      No   
10       BTechGeeks       Cary  36.0        Tokyo           Japan      No   
11      PythonArray       Tara  39.0     Helsinki         Finland      No   
12      Sheets Tips      Smith  42.0      Nairobi           Kenya      No   
13  Python-Programs  Elizabeth  66.0        Paris           Italy      No   

   Employed Married     Scholorship      Id  
0        No      No  Not Applicable  1234.0  
1        No      No   Already Taken  2344.0  
2        No      No   Already Taken  1345.0  
3        No      No   Already Taken  1356.0  
4       NaN     NaN             NaN     NaN  
5       NaN     NaN             NaN     NaN  
6        No      No   Already Taken  1938.0  
7       Yes      No  Not Applicable  2345.0  
8       Yes     Yes  Not Applicable  5622.0  
9       Yes      No  Not Applicable  1734.0  
10      Yes     Yes   Already Taken  4562.0  
11      Yes     Yes  Not Applicable  2365.0  
12      Yes     Yes  Not Applicable  2463.0  
13      Yes     Yes  Not Applicable  2677.0

Method #5: Using “skiprows” argument to Skip starting rows when Reading an Excel File

Here we use the skiprows argument of the read_excel() function of the pandas module to skip the starting rows of the excel file while reading.

Approach:

  • Import pandas module using the import keyword
  • Pass the excel file path, sheet_name, no of rows to be skipped as arguments to the read_excel() function of the pandas module and store it in a variable.
  • Here it skips the starting 3 rows of the 1st sheet while reading an excel file.
  • Print the data of the 1st sheet of the excel file after skipping the first 3 rows.
  • The Exit of the Program.

Below is the implementation:

# Import pandas module using the import keyword
import pandas as pd

# Pass the excel filepath, sheet_name, no of rows to be skipped as arguments 
# to the read_excel() function of the pandas module and store it in a variable.
# Here it skips the starting 5 rows of the 1st sheet while reading an excel file.
gvn_exceldata = pd.read_excel('excelFile.xlsx', sheet_name ="Sheet1", skiprows = 3)

# Print the data of the 1st sheet of the excel file after skipping the first 5 rows.
print(gvn_exceldata)

Output:

        BTechGeeks     Manish  19       Indore Madhya Pradesh  Yes   No No.1  \
0      PythonArray      Anish  20       Rajkot        Gujarat  Yes   No   No   
1      Sheets Tips     Vishal  18       Ranchi      Jharkhand  Yes   No   No   
2  Python-Programs      Pavan  23      Lucknow   Uttarpradesh   No  Yes   No   
3      Sheets Tips       John  32  Los Angeles        America   No  Yes  Yes   
4  Python-Programs      Virat  34         Agra          Delhi   No  Yes   No   
5       BTechGeeks       Cary  36        Tokyo          Japan   No  Yes  Yes   
6      PythonArray       Tara  39     Helsinki        Finland   No  Yes  Yes   
7      Sheets Tips      Smith  42      Nairobi          Kenya   No  Yes  Yes   
8  Python-Programs  Elizabeth  66        Paris          Italy   No  Yes  Yes   

    Already Taken  1345  
0   Already Taken  1356  
1   Already Taken  1938  
2  Not Applicable  2345  
3  Not Applicable  5622  
4  Not Applicable  1734  
5   Already Taken  4562  
6  Not Applicable  2365  
7  Not Applicable  2463  
8  Not Applicable  2677

Method #6: Setting header to any row and start reading from that row, using the “header” argument

Here we use the header argument of the read_excel() function of the pandas module to set the specific row as a header.

Approach:

  • Import pandas module using the import keyword
  • Pass the excel file path, sheet_name, the row number to be set as header as arguments to the read_excel() function of the pandas module and store it in a variable.
  • Here we set the 8th row as the header.
  • Print the data of the 1st sheet of the excel file after setting a specific row as a header.
  • The Exit of the Program.

Below is the implementation:

# Import pandas module using the import keyword
import pandas as pd

# Pass the excel filepath, sheet_name, the row number to be set as header as arguments 
# to the read_excel() function of the pandas module and store it in a variable.
# Here we set the 8th row as header.
gvn_exceldata = pd.read_excel('excelFile.xlsx', sheet_name = "Sheet1", header = 8)

# Print the data of the 1st sheet of the excel file after setting specific row as a header
print(gvn_exceldata)

Output:

  Python-Programs      Virat  34      Agra    Delhi  No  Yes No.1  \
0       BTechGeeks       Cary  36     Tokyo    Japan  No  Yes  Yes   
1      PythonArray       Tara  39  Helsinki  Finland  No  Yes  Yes   
2      Sheets Tips      Smith  42   Nairobi    Kenya  No  Yes  Yes   
3  Python-Programs  Elizabeth  66     Paris    Italy  No  Yes  Yes   

   Not Applicable  1734  
0   Already Taken  4562  
1  Not Applicable  2365  
2  Not Applicable  2463  
3  Not Applicable  2677

Method #7:  Read Multiple Excel Sheets using “sheet_name” argument

Here we use the sheet_name argument of the read_excel() function of the pandas module to read the data from multiple sheets in an excel file.

We must pass the sheet name as a list to read multiple sheets.

Approach:

  • Import pandas module using the import keyword
  • Pass the excel file path, and multiple sheet names as a list using the sheet_name argument as arguments to the read_excel() function of the pandas module and store it in a variable.
  • Here it reads the data of both the 1st and 2nd sheets.
  • Print the data of both the 1st and 2nd sheets in an excel file.
  • The Exit of the Program.

Below is the implementation:

# Import pandas module using the import keyword
import pandas as pd

# Pass the excel filepath, multiple sheet names as a list using the sheet_name argument
# as arguments to the read_excel() function of the pandas module and store it in a variable.
# Here it reads tha data of both the 1st and 2nd sheets.
gvn_exceldata = pd.read_excel('sampleExcelFile.xlsx', sheet_name =["Sheet1","Sheet2"])

# Print the data of both the 1st and 2nd sheets in an excel file
print(gvn_exceldata)

Output:

{'Sheet1':       Website name        Name  Age         City           State Student  \
0       Sheets Tips     Vikram   21    Hyderabad       Telangana     Yes   
1   Python-Programs   Ashritha   22         Pune          Mumbai     Yes   
2        BTechGeeks     Manish   19       Indore  Madhya Pradesh     Yes   
3       PythonArray      Anish   20       Rajkot         Gujarat     Yes   
4       Sheets Tips     Vishal   18       Ranchi       Jharkhand     Yes   
5   Python-Programs      Pavan   23      Lucknow    Uttarpradesh      No   
6       Sheets Tips       John   32  Los Angeles         America      No   
7   Python-Programs      Virat   34         Agra           Delhi      No   
8        BTechGeeks       Cary   36        Tokyo           Japan      No   
9       PythonArray       Tara   39     Helsinki         Finland      No   
10      Sheets Tips      Smith   42      Nairobi           Kenya      No   
11  Python-Programs  Elizabeth   66        Paris           Italy      No   

   Employed Married     Scholorship    Id  
0        No      No  Not Applicable  1234  
1        No      No   Already Taken  2344  
2        No      No   Already Taken  1345  
3        No      No   Already Taken  1356  
4        No      No   Already Taken  1938  
5       Yes      No  Not Applicable  2345  
6       Yes     Yes  Not Applicable  5622  
7       Yes      No  Not Applicable  1734  
8       Yes     Yes   Already Taken  4562  
9       Yes     Yes  Not Applicable  2365  
10      Yes     Yes  Not Applicable  2463  
11      Yes     Yes  Not Applicable  2677  , 'Sheet2':           Website  Id
0      SheetsTips   4
1      BtechGeeks   2
2  PythonPrograms   5
3     PythonArray   1}

Method #8: Read all Sheets of the Excel file together using ‘sheet_name’ argument

Here we pass the sheet_name argument of the read_excel() function as None to read the data from all the sheets of an excel file.

Approach:

  • Import pandas module using the import keyword
  • Pass the excel filepath, sheet_name = None as arguments to the read_excel() function of the pandas module and store it in a variable.
  • Here it reads the data from all the sheets of an excel file.
  • Print the data from all the sheets of an excel file.
  • The Exit of the Program.

Below is the implementation:

# Import pandas module using the import keyword
import pandas as pd

# Pass the excel filepath, sheet_name = None as arguments to the 
# read_excel() function of the pandas module and store it in a variable.
# Here it reads the data from all the sheets of an excel file.
gvn_exceldata = pd.read_excel('sampleExcelFile.xlsx', sheet_name =None)

# Print the data from all the sheets of an excel file.
print(gvn_exceldata)

Output:

{'Sheet1':       Website name        Name  Age         City           State Student  \
0       Sheets Tips     Vikram   21    Hyderabad       Telangana     Yes   
1   Python-Programs   Ashritha   22         Pune          Mumbai     Yes   
2        BTechGeeks     Manish   19       Indore  Madhya Pradesh     Yes   
3       PythonArray      Anish   20       Rajkot         Gujarat     Yes   
4       Sheets Tips     Vishal   18       Ranchi       Jharkhand     Yes   
5   Python-Programs      Pavan   23      Lucknow    Uttarpradesh      No   
6       Sheets Tips       John   32  Los Angeles         America      No   
7   Python-Programs      Virat   34         Agra           Delhi      No   
8        BTechGeeks       Cary   36        Tokyo           Japan      No   
9       PythonArray       Tara   39     Helsinki         Finland      No   
10      Sheets Tips      Smith   42      Nairobi           Kenya      No   
11  Python-Programs  Elizabeth   66        Paris           Italy      No   

   Employed Married     Scholorship    Id  
0        No      No  Not Applicable  1234  
1        No      No   Already Taken  2344  
2        No      No   Already Taken  1345  
3        No      No   Already Taken  1356  
4        No      No   Already Taken  1938  
5       Yes      No  Not Applicable  2345  
6       Yes     Yes  Not Applicable  5622  
7       Yes      No  Not Applicable  1734  
8       Yes     Yes   Already Taken  4562  
9       Yes     Yes  Not Applicable  2365  
10      Yes     Yes  Not Applicable  2463  
11      Yes     Yes  Not Applicable  2677  , 'Sheet2':           Website  Id
0      SheetsTips   4
1      BtechGeeks   2
2  PythonPrograms   5
3     PythonArray   1}

 

 

Leave a Comment