Web scraping is the process of grabbing data online. With plenty of tools available on the market, we can easily collect all the website information and save it in a doc. However, have you ever been taught how to scrape information from websites using Google Sheets? If not, then don’t worry.
Here is a detailed post, which tells you everything about web scraping information into the spreadsheet using the Sheets Tips provided on this page. Read on to find out more.
Also Read: Java Program to Print nth Element of the Array
Table of Contents |
Web Scraping in Google Sheets
Let us say, I am pulling out the top-performing pages published by Sheets Tips and want to find the authors who are driving the page views to the web. Since I want to find the authors of more than 1000+ articles, it will be extremely difficult for me to open each page to find the author’s name. To save time from this tedious task, we can simply use web scraping techniques. Let us understand how to get this done in Google Sheets using an example.
- How to Publish Google Sheets as Web Page: Share Google Sheets to Public
- Google Sheets Features 2021: Check Five Most Commonly Used Functions in Google Sheets
- How to Use IMPORTFEED in Google Sheets: Fetch Feed from URL with IMPORTFEED
Web Scraping with Google Sheets: Pull Data from Google Spreadsheet to Website
In this post, I am taking the example of the education blog, Learn CBSE. Here I want to pull out the authors’ names from the single spreadsheet for a specific set of blogs published by Learn CBSE. The steps to get this done in Google Sheets are explained below.
- 1st Step: I am taking up the random article published by Learn CBSE. Now copy the random article URL and paste it into the spreadsheet as shown in the image below.
- 2nd Step: Now the next step is to move to the page from where you have copied the URL.
- 3rd Step: Select the author’s name and right-click anywhere on the screen.
- 4th Step: Now a sub menu will open on the screen. Now choose the Inspect Element from the drop-down menu.
- 5th Step: This opens the developer inspection window, which allows us to look at the HTML element for the byline. Since we have selected the author’s name and started inspecting the website, there is 1 line of HTML code that has been highlighted.
- 6th Step: Copy the code that has been highlighted. The code that was highlighted here is as follows:<span class=”entry-author”><a href=”https://www.learncbse.in/author/veerendra/” class=”entry-author-link” rel=”author”><span class=”entry-author-name”>Veerendraspan>a>span>
- 7th Step: Now we will have to use the IMPORTXML function along with a second argument called “x-path query.” The xpath-query, //span [@class=’entry-author’], searches for span elements with the class “byline-author,” then returns the value of that element, which is our author’s name.
- 8th Step: Now copy the formula “=IMPORTXML (A1,”//span [@class=’entry-author’]”)” and paste it in cell B1, i.e., next to our URL as shown in the image below.
- 9th Step: Press the “Return” key and now you will see the results as shown below. Now drag the formula-applied cell to other parts of the cell to see other authors’ names.
This technique is extremely handy when you want to import thousands of author names into the Google Spreadsheet.
How to Pull a Table from a Website to Google Sheets?
Web scrapping in Google Sheets isn’t limited to the author names. We can import various types of data into the spreadsheet, such as tables, published dates, last modified dates, and much more.
In this example, I want to import a table from Wikipedia from the page of the “India men’s national field hockey team” which is picturized below.
Usually, most people try to copy and paste the table. However, if the table is very large, it would be difficult for the users to copy and paste the entire table. To overcome this issue, we can scrape the table from the desired website to the spreadsheet using the IMPORTHTML function.
The steps to get this done in Google Sheets are as follows:
- 1st Step: Open the Google Spreadsheet and paste the URL from which you would like to import the table.
- 2nd Step: Now the next step is to enter the formula. Move to the cell where you want to enter the formula.
- 3rd Step: Enter the formula as “=importhtml (A1, “table,” 7,”)“. We have used the number 7 since this is the 7th table on the page. Use the IMPORT function to find the table number that you would like to copy.
- 4th Step: Now press the “Return” key and that’s it. The table has been imported from the website to Google Sheets as shown below.