Scrape Google Search Results using Python BeautifulSoup With Examples

Hello, readers! Here, we will learn how to scrape Google Search results using BeautifulSoup in Python.

In this article, we will look at one of the most interesting concepts in Python: scraping a website.

So, let us begin!


What is Web Scraping?

At times, while surfing the web, we come across data that we believe will be useful to us later, and we end up copying it to the clipboard and saving it by hand each time.

Now, let's consider another scenario.

We often need data to analyze how certain factors behave for data modeling. So we begin building a dataset from scratch by copying and pasting the data.

This is where web scraping (or web crawling) comes into the picture.

Web scraping automates the repetitive task of copying and pasting data from websites. With web scraping, we can crawl through websites, extract the data we need, and save it in a customized format.

Let us now understand the working of Web Scraping in the next section.


How Does Web Scraping Work?

Let us try to understand how web scraping works through the following steps:

  • First, we write a piece of code that sends a request to the server hosting the website we want to crawl or the information we want to scrape.
  • Like a browser, the code downloads the source code of the webpage.
  • Then, instead of rendering the page the way a browser does, we filter the content by HTML tag and extract only the information we need, in the format we want.

In this way, we can download the page source and pull out just the data we need, quickly and in a form of our choosing.
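The steps above can be sketched roughly as follows. This is a minimal illustration, not a complete scraper: the function names `download_page` and `extract_text_by_tag` and the sample HTML are assumptions made for the example, and the built-in `html.parser` is used so that no extra parser needs to be installed.

```python
import requests
from bs4 import BeautifulSoup


def download_page(url, timeout=10):
    """Request the page from the server and return its HTML source as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_text_by_tag(html, tag):
    """Parse the HTML and return the text of every element with the given tag."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.find_all(tag)]


# Hypothetical page source, standing in for what download_page() would return.
sample_html = """
<html><body>
  <h1>Products</h1>
  <p>Intro text we do not need.</p>
  <h2>Laptops</h2>
  <h2>Phones</h2>
</body></html>
"""

print(extract_text_by_tag(sample_html, "h2"))  # ['Laptops', 'Phones']
```

Keeping the download and the parsing in separate functions makes it possible to try out the tag filtering on a saved or hand-written HTML string before sending any real requests.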

Let us now try to implement Web Scraping in the upcoming section.


Bulk Scraping APIs

If you are looking to build a service that scrapes search results in bulk, chances are high that Google will block you because of the unusually high number of requests. In that case, online APIs like Zenserp are a big help.

Zenserp performs searches through various IPs and proxies and allows you to focus on your logic rather than infrastructure. It also makes your job easier by supporting image search, shopping search, reverse image search, trends, etc. You can try it out: just run any search query and inspect the JSON response.


Implementing steps to Scrape Google Search results using BeautifulSoup

We will use BeautifulSoup to scrape Google Search results here.

BeautifulSoup is a Python library for parsing HTML and XML documents, such as downloaded webpages, and extracting data from them.


Scrape Google Search results for Customized search

Example 1:
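The following is a minimal sketch consistent with the steps explained in this section, not a definitive listing. The search URL pattern, the User-Agent string, and the helper names `fetch_html`, `extract_h3_titles`, and `main` are assumptions made for illustration.

```python
import requests
from bs4 import BeautifulSoup

# Assumed values: the search URL pattern and User-Agent string are illustrative.
url = "https://www.google.com/search?q=Python"
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
}


def fetch_html(url=url, headers=headers):
    """Send a GET request and return the raw HTML bytes of the response."""
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response.content


def extract_h3_titles(html, parser="lxml"):
    """Return the text of every <h3> element found in the HTML."""
    soup = BeautifulSoup(html, parser)
    return [h3.get_text(strip=True) for h3 in soup.find_all("h3")]


def main():
    """Fetch the search results page and print each h3 heading."""
    for title in extract_h3_titles(fetch_html()):
        print(title)
        print("#######")
```

Calling `main()` runs the sketch against the live site. Google's markup changes over time, and requests may be blocked or redirected to a consent page, so the list of headings can come back empty.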

Line by line explanation of the above code:

  1. Import the necessary libraries: To use BeautifulSoup for scraping, we import it from the bs4 package. We also need the Python requests library to download the webpage. The requests module sends a GET request to the server and downloads the HTML contents of the requested page.

  2. Set the URL: We need to provide the URL, i.e. the address where we want our information to be searched and scraped. Here, we use the Google search URL and append the text 'Python' so that the results are scraped for the query 'Python'.

  3. Set the User-Agent: We need to specify a User-Agent header, which lets the server identify the browser, application, and operating system making the request.

  4. Send the request: requests.get(url, headers=headers) sends the request to the web server and downloads the HTML content of the page, here the search results.

  5. Parse the response: Create a BeautifulSoup object from the downloaded content using the 'lxml' parser. The lxml package must be installed for this to work.

  6. Extract the headings: Finally, we use object.find_all('h3') to collect and display the text of every h3 heading on the results page for the query 'Python'.
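Step 2 appends the search text directly to the URL, which works for a single word like 'Python'. For queries containing spaces or special characters, the text needs to be URL-encoded first. Here is a small sketch using the standard library; the helper name `build_search_url` and the base URL are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumed base URL for the search endpoint.
BASE_URL = "https://www.google.com/search"


def build_search_url(query, base_url=BASE_URL):
    """Return base_url with the query URL-encoded into the q parameter."""
    return f"{base_url}?{urlencode({'q': query})}"


print(build_search_url("Python"))
# https://www.google.com/search?q=Python
print(build_search_url("python web scraping"))
# https://www.google.com/search?q=python+web+scraping
```

Alternatively, the requests library can do the same encoding if the query is passed as `params={'q': query}` to `requests.get`.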



Scrape Search results from a Particular Webpage

In this example, we scrape HTML tag values from a specific webpage:

Example 2:
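Since the target website is not named here, the following is a minimal sketch that runs on a hypothetical HTML snippet instead of a live page. The sample markup and the helper name `scrape_title_and_links` are assumptions; only the div class name "site" comes from the description in this section.

```python
from bs4 import BeautifulSoup

# Hypothetical page source; the class name "site" follows the description in the text.
sample_html = """
<html>
  <head><title>Example Tutorials</title></head>
  <body>
    <div class="site">
      <a href="/python">Python</a>
      <a href="/java">Java</a>
    </div>
    <div class="footer"><a href="/about">About</a></div>
  </body>
</html>
"""


def scrape_title_and_links(html, div_class="site"):
    """Return the page title and the href of every link inside matching divs."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    links = []
    for div in soup.find_all("div", class_=div_class):
        # href=True skips anchors that have no href attribute
        for anchor in div.find_all("a", href=True):
            links.append(anchor["href"])
    return title, links


title, links = scrape_title_and_links(sample_html)
print(title)  # Example Tutorials
print(links)  # ['/python', '/java']
```

For a real page, the HTML string would come from `requests.get(...).content`, and `div_class` would need to match whatever class that site actually uses.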

Here, we scrape the text of the title tag and the href value of every a tag inside the div whose class is site. The class name differs from website to website, depending on how the page's HTML is structured.



Conclusion

With this, we have come to the end of this topic. Feel free to comment below if you have any questions.

For more such posts related to Python, stay tuned and till then, Happy Learning!! 🙂


By admin

2 thoughts on “Scrape Google Search Results using Python BeautifulSoup With Examples [Latest]”
  1. Hi,

    Thank you very much for your amazing work, it's really helpful!

    I was wondering if it would be possible to scrape all Google Titles for a given set of URLs.

    I’m not a developer but I mixed your scripts 🙂

    import random

    import requests
    from bs4 import BeautifulSoup

    # Placeholder: replace yourURL.html with the page to look up
    url = "https://www.google.es/search?q=site:https://yourURL.html"
    # Pool of User-Agent strings; one is picked at random per run
    A = (
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36",
    )
    Agent = A[random.randrange(len(A))]
    headers = {"user-agent": Agent}
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, "lxml")

    # Print the text of every h3 heading (the result titles)
    for info in soup.find_all("h3"):
        print(info.text)
        print("#######")

    It works, but what would be really useful is to run this process at scale, for example for 1,000 URLs.

    Do you think it is possible?

    Best regards

  2. Congratulations, amazing work!

    I’m not a developer but just unified both scripts to a new one to extract the SERP Title for the given URL:

    import random

    import requests
    from bs4 import BeautifulSoup

    # Placeholder: replace www.yoururl.html with the page to look up
    url = "https://www.google.es/search?q=site:https://www.yoururl.html"
    # Pool of User-Agent strings; one is picked at random per run
    A = (
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36",
    )
    Agent = A[random.randrange(len(A))]
    headers = {"user-agent": Agent}
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, "lxml")

    # Print the text of every h3 heading (the result titles)
    for info in soup.find_all("h3"):
        print(info.text)
        print("#######")

    What could be really helpful is to run this script at scale (1000 URLs, for example). I've tried to do that with an array and a for statement, but it didn't work (I don't know Python at all).

    Do you think it could be possible?

    Best regards and thanks again
