February 17, 2024

Introduction

Robot Framework, combined with Selenium, lets you automate web scraping jobs: navigating websites, interacting with elements, and gathering data. Robot Framework’s keyword-driven approach simplifies test creation and maintenance, while Selenium provides the ability to interact with web elements and perform actions such as clicking buttons, filling in forms, and validating text. This post is a simple guide to getting started with data scraping using Selenium in Robot Framework.

If you’re new to Robot Framework or would like to explore its basics, you might find our previous blog post, Robot Framework Basics: Step-By-Step Test Tutorial For Beginners, helpful. It covers installation steps, executing basic test cases, and key concepts.

Selenium Script Using Robot Framework

The first example is a simple script that uses SeleniumLibrary to log in to a Postify account and extract the text of an element from the page.

Install SeleniumLibrary

SeleniumLibrary is a Robot Framework library that lets you control web browsers. You can install it with the following command.

				
					pip install robotframework-seleniumlibrary
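
Before writing any tests, it can help to confirm that the packages are importable from the Python environment that Robot Framework will use. The sketch below is only an illustration: the helper name is_installed is hypothetical, and it assumes the two packages expose the top-level modules robot and SeleniumLibrary.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "robot" is Robot Framework's module; "SeleniumLibrary" is assumed to be
    # the module installed by the robotframework-seleniumlibrary package.
    for name in ("robot", "SeleniumLibrary"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If either module is reported as missing, the pip command was most likely run in a different Python environment from the one used here.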

				
			
				
With SeleniumLibrary installed, the login test looks like this:

*** Settings ***
Library   SeleniumLibrary

*** Variables ***
${username}            <your email address of Postify account>  # Example: abc@gmail.com
${password}            <password of the Postify account>  # Example: 12345678

*** Test Cases ***
Login Test
   [Documentation]  Test the login functionality
   [Tags]  login_check
   Log    Hello
   # Open the login page in Chrome
   Open Browser  https://www.postify.ai/login  chrome
   # Fill in the credentials; the fields are located by their IDs
   Input Text  email  ${username}
   Input Text  password  ${password}
   Click Button  class=login100-form-btn
   # Give the page time to load before reading the element
   Sleep    3s
   ${element_text}    Get Text    xpath:/html/body/div[6]/div/div/div/div/div/div/div[1]/form/div/h2
   Log    ${element_text}
   Close Browser
				
			
    • We begin by importing SeleniumLibrary in the Settings section. Then we define the username and password variables, which store our Postify account credentials. If you haven’t signed up for Postify yet, you can do so on the Postify website.
    • Next, we create a test case called Login Test for logging in to Postify. We open the Chrome browser and navigate to the login page. We enter the login details into the fields with the IDs ‘email’ and ‘password’. Then we click the login button, which we locate by its login100-form-btn class.
    • After that, we extract the text from a specific element on the page using the XPath ("/html/body/div[6]/div/div/div/div/div/div/div[1]/form/div/h2") and log it for verification.
    • Finally, we close the Chrome browser to complete the test.

How to get XPath?

To find an element’s XPath, right-click the element on the web page and choose ‘Inspect’ or ‘Inspect Element’ from the menu that appears. This opens Chrome Developer Tools, where the HTML code of the selected element is highlighted in the Elements tab. Right-click the highlighted code, hover over ‘Copy’, and then select ‘Copy XPath’ or ‘Copy full XPath’ to copy the element’s XPath expression.

When you run the test, you will see output in your terminal similar to the provided image.

Error Case

If the run fails instead, the error typically indicates that the test case Login Test could not locate the expected element (an h2 element within a form) using the specified XPath locator. This can happen for several reasons, such as a change in the web page’s structure, an incorrect XPath, or the element not being present when the test looks for it.
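
One common cause of this failure is a full XPath that spells out every ancestor of the element, so that a single added wrapper breaks it. The sketch below illustrates the idea offline using Python’s standard xml.etree.ElementTree module on two made-up page snippets. The markup, the path strings, and the find_heading helper are all hypothetical and are not taken from the Postify page. ElementTree supports only a subset of XPath, but that is enough to show why a shorter, relative expression tends to survive layout changes.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for a login page (not the real site).
PAGE_V1 = """
<html><body>
  <div><form><div><h2>Welcome back</h2></div></form></div>
</body></html>
"""

# The same page after a redesign wraps the form in one extra <div>.
PAGE_V2 = """
<html><body>
  <div><div><form><div><h2>Welcome back</h2></div></form></div></div>
</body></html>
"""

# Full-style path: spells out every ancestor below <html>.
ABSOLUTE_PATH = "./body/div/form/div/h2"
# Relative path: finds the <h2> under any <form>, wherever the form sits.
RELATIVE_PATH = ".//form/div/h2"


def find_heading(page: str, path: str):
    """Return the heading text matched by path, or None if nothing matches."""
    root = ET.fromstring(page.strip())
    element = root.find(path)
    return None if element is None else element.text


if __name__ == "__main__":
    for label, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
        print(label, "absolute:", find_heading(page, ABSOLUTE_PATH))
        print(label, "relative:", find_heading(page, RELATIVE_PATH))
```

In a Robot Framework test, the same idea would mean preferring a shorter locator such as xpath://form//h2 over the copied full XPath, although the exact expression depends on the real page and has to be checked against it.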

Robot Framework Script to Get The URLs Of Google Images

				
					
*** Settings ***
Library   SeleniumLibrary
Library   OperatingSystem

*** Variables ***
${BROWSER}    Chrome
${URL}        https://www.google.com/
${IMAGE_LOCATOR}    css:img.rg_i.Q4LuWd
${SAVE_PATH}    image_src.txt

*** Test Cases ***
Get Images Using Selenium
    Open Browser    ${URL}    ${BROWSER}
    Sleep    3s
    Maximize Browser Window
    Sleep    5s
    # Search for "Google images" and follow the result to Google Images
    Input Text    name=q    Google images
    Sleep    5s
    Click Button    name=btnK
    Sleep    3s
    Click Element     class=LC20lb
    Sleep    3s
    # Search Google Images for "falcon"
    Input Text    name=q    falcon
    Sleep    3s
    Click Button    xpath://button[@aria-label='Google Search']
    Sleep    5s
    # Collect the image elements and save the src attribute of each one
    ${image_elements}=    Get WebElements    ${IMAGE_LOCATOR}
    FOR    ${element}    IN    @{image_elements}
        ${image_source}=    Get Element Attribute    ${element}    src
        Log    Image Source: ${image_source}
        Log To Console    Image Source: ${image_source}
        Append To File    ${SAVE_PATH}    ${image_source}\n
    END
    # Close the browser once all sources have been saved
    Close Browser
				
			
Explanation of the above code:
    • In the Settings section, we import SeleniumLibrary for browser automation and the OperatingSystem library for file operations.
    • The Variables section defines the variables used in the test case:
    • BROWSER holds the browser used for automation (Chrome in this case).
    • URL stores the URL of the Google homepage.
    • IMAGE_LOCATOR specifies the CSS selector for the image elements on the Google Images results page.
    • SAVE_PATH defines the path of the file where the image sources will be saved.
    • The test case Get Images Using Selenium automates the process of searching for images of ‘falcon’ on Google.
    • It opens the Google homepage, searches for Google Images, follows the result, enters ‘falcon’ in the search bar, and clicks the search button.
    • After the search results load, it extracts the source URL of each image and writes it to the log and the console.
    • Additionally, it appends each image source URL to the file image_src.txt, which is the value assigned to the variable SAVE_PATH.
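
Because the test appends to the file on every run, the saved file can accumulate duplicates. The src values may also include entries that are not http(s) links, such as empty values or inline data: URIs. That is an assumption about the page, not something the script checks. A small clean-up pass can handle both cases. The following sketch uses a hypothetical helper, clean_image_sources, to keep only http(s) URLs and drop duplicates while preserving their order.

```python
from pathlib import Path


def clean_image_sources(lines):
    """Keep unique http(s) URLs from an iterable of lines, in first-seen order."""
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        # Skip blanks, "None" placeholders, and inline data: URIs.
        if not url.startswith(("http://", "https://")):
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    # image_src.txt is the SAVE_PATH used by the test above.
    path = Path("image_src.txt")
    if path.exists():
        lines = path.read_text(encoding="utf-8").splitlines()
        for url in clean_image_sources(lines):
            print(url)
    else:
        print(f"{path} not found; run the Robot Framework test first.")
```

The helper takes plain lines rather than a file path, so it can be tried on a short list of strings before pointing it at a real output file.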

Verifying the output
    • Inspect the directory located at /media/project/Test. It will contain a directory named after the search query, and within that directory you’ll find a urls.txt file containing the URLs of the Google Images results for that query. Note that the script as shown above writes to the path in SAVE_PATH (image_src.txt), so check that file if you run it unchanged.

    • The following is a sample image of the urls.txt file.

Here is the GitHub link containing the code mentioned above.

Conclusion

This script demonstrates how Robot Framework, combined with SeleniumLibrary, can automate the process of fetching URLs from Google Images. However, it’s important to use such automation responsibly and ensure compliance with website terms of service and legal regulations regarding web scraping. Whether you’re testing software or scraping data from the web, Robot Framework saves time and effort, making automation efficient and effective in various situations.


Pragnakalp Techlabs: Your trusted partner in Python, AI, NLP, Generative AI, ML, and Automation. Our skilled experts have successfully delivered robust solutions to satisfied clients, driving innovation and success.