Introduction
Data scraping with Selenium in Robot Framework lets you automate web scraping jobs. Selenium drives a real browser, so your scripts can navigate websites, interact with elements, and collect data from them. Robot Framework’s keyword-driven approach keeps test creation readable and easy to maintain. Selenium, used through SeleniumLibrary, handles interactions with web elements, such as clicking buttons, filling in forms, and validating text. Here’s a simple guide to getting started with data scraping using Selenium in Robot Framework.
If you’re new to Robot Framework or want to review its basics, you might find our previous blog post, Robot Framework Basics: Step-By-Step Test Tutorial For Beginners, helpful. It is a comprehensive guide to getting started with Robot Framework and covers installation, running basic test cases, and key concepts.
Selenium Script Using Robot Framework
Here’s a simple script that uses SeleniumLibrary to log in to Postify and read the text of a heading on the page.
Install SeleniumLibrary
SeleniumLibrary is a Robot Framework library that lets you control web browsers through Selenium. You can install it with pip. This also installs Robot Framework and Selenium as dependencies:
```bash
pip install robotframework-seleniumlibrary
```
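```robotframework
*** Settings ***
Library    SeleniumLibrary

*** Variables ***
# Fill in your own Postify credentials before running the test.
${username}    # Example: abc@gmail.com
${password}    # Example: 12345678

*** Test Cases ***
Login Test
    [Documentation]    Test the login functionality
    [Tags]    login_check
    Log    Hello
    Open Browser    https://www.postify.ai/login    chrome
    # "email" and "password" are matched against the fields' id (or name) attribute.
    Input Text    email    ${username}
    # Input Password works like Input Text but keeps the value out of the log.
    Input Password    password    ${password}
    Click Button    class=login100-form-btn
    Sleep    3s
    ${element_text}=    Get Text    xpath:/html/body/div[6]/div/div/div/div/div/div/div[1]/form/div/h2
    Log    ${element_text}
    # The teardown closes the browser even if an earlier step fails.
    [Teardown]    Close Browser
```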
- We begin by importing SeleniumLibrary in the Settings section. Then we define the username and password variables, which hold our Postify account credentials. If you haven’t signed up for Postify yet, you can do so here.
- Next, we create a test case called Login Test that logs in to Postify. It opens the Chrome browser, navigates to the login page, and enters the login details into the fields with the IDs ‘email’ and ‘password’. Then it clicks the login button, located by its login100-form-btn class.
- After that, we extract the text from a specific element on the page using the XPath /html/body/div[6]/div/div/div/div/div/div/div[1]/form/div/h2 and log it for verification.
How to get XPath?
To find an element’s XPath, right-click the element on the web page and choose ‘Inspect’ or ‘Inspect Element’ from the menu. This opens Chrome Developer Tools, and the element’s HTML is highlighted in the Elements tab. Right-click the highlighted code, hover over ‘Copy’, and select ‘Copy XPath’ or ‘Copy full XPath’. ‘Copy full XPath’ gives an absolute path starting at /html, like the one used above. ‘Copy XPath’ usually gives a shorter path anchored on a nearby id when one exists.
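‘Copy full XPath’ returns an absolute path, such as the /html/body/div[6]/... locator in the login script. Any change to one of the element’s ancestors breaks a path like that. The Python sketch below uses the standard library’s `xml.etree.ElementTree` to compare an absolute path with a relative one. ElementTree understands only a subset of XPath. The sketch runs on two made-up page snippets and illustrates the idea only. It does not model the real Postify page.

```python
import xml.etree.ElementTree as ET

# A simplified, made-up page (well-formed XHTML so ElementTree can parse it).
PAGE_V1 = """
<html><body>
  <div><form><h2>Welcome back</h2></form></div>
</body></html>
"""

# The same page after a hypothetical redesign wraps the form in an extra div.
PAGE_V2 = """
<html><body>
  <div><div class="wrapper"><form><h2>Welcome back</h2></form></div></div>
</body></html>
"""

def find_text(page: str, path: str):
    """Return the text of the first element matching `path`, or None."""
    root = ET.fromstring(page.strip())  # root is the <html> element
    element = root.find(path)
    return element.text if element is not None else None

# Absolute path in the style of "Copy full XPath" (written relative to <html>,
# because ElementTree paths are evaluated from the root element).
ABSOLUTE = "./body/div/form/h2"
# Relative path: "an h2 somewhere inside a form".
RELATIVE = ".//form//h2"

for name, page in [("v1", PAGE_V1), ("v2", PAGE_V2)]:
    print(name, find_text(page, ABSOLUTE), find_text(page, RELATIVE))
# Prints:
# v1 Welcome back Welcome back
# v2 None Welcome back
```

In SeleniumLibrary, the relative form would be written as a locator like `xpath://form//h2`.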
Finally, we close the Chrome browser to complete the test.
When you run the test, you will see output similar to the provided image in your terminal.
The output shows that the Login Test test case failed because Selenium couldn’t locate the expected element (an h2 element inside a form) with the specified XPath locator. This can happen for several reasons. The web page’s structure may have changed, the XPath may be incorrect, or the element may not have been present (or not yet loaded) when the locator was evaluated.
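A fixed `Sleep    3s` either wastes time or is too short when the page loads slowly. In the second case, the element is not there when the locator runs. SeleniumLibrary provides explicit waits for this situation, such as `Wait Until Element Is Visible` and `Wait Until Page Contains Element`. The Python sketch below shows the idea behind such keywords: poll a condition until it holds, or fail once a timeout expires. `wait_until` is a hypothetical helper and is not part of SeleniumLibrary.

```python
import time
from typing import Callable

def wait_until(condition: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.2) -> None:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Raises TimeoutError if the condition never becomes true, so a missing
    element produces a clear failure instead of a later, confusing one.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)

# Usage: simulate an element that "appears" after roughly half a second.
start = time.monotonic()
wait_until(lambda: time.monotonic() - start > 0.5, timeout=2.0)
print("element is visible")
```

In a Robot Framework script, the same idea looks like `Wait Until Element Is Visible    ${locator}    timeout=10s` in place of a fixed `Sleep`. Here `${locator}` stands for whichever locator you are waiting on.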
Robot Framework Script to Get The URLs Of Google Images
```robotframework
*** Settings ***
Library    SeleniumLibrary
Library    OperatingSystem

*** Variables ***
${BROWSER}          Chrome
${URL}              https://www.google.com/
# Google generates these class names, and they can change without notice.
${IMAGE_LOCATOR}    css:img.rg_i.Q4LuWd
${SAVE_PATH}        image_src.txt

*** Test Cases ***
Get Images Using Selenium
    Open Browser    ${URL}    ${BROWSER}
    Sleep    3s
    Maximize Browser Window
    Sleep    5s
    # Search for "Google images" and open the first result.
    Input Text    name=q    Google images
    Sleep    5s
    Click Button    name=btnK
    Sleep    3s
    Click Element    class=LC20lb
    Sleep    3s
    # On the Google Images page, search for "falcon".
    Input Text    name=q    falcon
    Sleep    3s
    Click Button    xpath://button[@aria-label='Google Search']
    Sleep    5s
    # Collect the image elements and record each one's src attribute.
    ${image_elements}=    Get WebElements    ${IMAGE_LOCATOR}
    FOR    ${element}    IN    @{image_elements}
        ${image_source}=    Get Element Attribute    ${element}    src
        Log    Image Source: ${image_source}
        Log To Console    Image Source: ${image_source}
        Append To File    ${SAVE_PATH}    ${image_source}\n
    END
    # Close the browser even if an earlier step fails.
    [Teardown]    Close Browser
```
Explanation of the above code:
In the Settings section, we import ‘SeleniumLibrary’ for browser automation and the ‘OperatingSystem’ library for file operations.
The Variables section defines the variables used in the test case:
- BROWSER holds the browser used for automation (Chrome in this case).
- URL stores the URL of the Google website.
- IMAGE_LOCATOR specifies the CSS selector for the image elements on the Google Images search page.
- SAVE_PATH defines the file path where the image sources will be saved.
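The selector `css:img.rg_i.Q4LuWd` matches `<img>` elements whose class attribute contains both `rg_i` and `Q4LuWd`, in any order and possibly alongside other classes. Google generates these class names, so the locator may stop matching after a site update. The Python sketch below uses only the standard library’s `html.parser` and runs on a made-up snippet rather than real Google markup. It shows what `Get WebElements` followed by `Get Element Attribute    ...    src` does with such a selector. `ImageSrcCollector` is a hypothetical name.

```python
from html.parser import HTMLParser

class ImageSrcCollector(HTMLParser):
    """Collect the src of <img> tags that carry all of the required classes."""

    def __init__(self, required_classes):
        super().__init__()
        self.required = set(required_classes)
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # class is a space-separated list; order and extra classes do not matter.
        classes = set((attributes.get("class") or "").split())
        if self.required <= classes:
            # src may be absent (for example on lazily loaded images) -> None.
            self.sources.append(attributes.get("src"))

SNIPPET = """
<img class="rg_i Q4LuWd" src="https://example.com/a.jpg">
<img class="Q4LuWd extra rg_i" src="https://example.com/b.jpg">
<img class="rg_i" src="https://example.com/not-matched.jpg">
<img class="rg_i Q4LuWd">
"""

collector = ImageSrcCollector(["rg_i", "Q4LuWd"])
collector.feed(SNIPPET)
print(collector.sources)
# ['https://example.com/a.jpg', 'https://example.com/b.jpg', None]
```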
The Get Images Using Selenium test case automates a search for images of ‘falcons’ on Google:
- It opens the Google homepage, searches for ‘Google images’, opens the Google Images result, enters ‘falcon’ in the search bar, and clicks the search button.
- From the search results, it extracts the source (src) URL of each image and logs it to both the test log and the console.
- It also appends each image source URL to image_src.txt, the file assigned to the SAVE_PATH variable.
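Append To File adds to the file on every run, so repeated runs accumulate duplicate lines. A src value can also be empty or an inline `data:` URI instead of a link. If you prefer to post-process the collected values in Python, you could use a keyword along the lines of the sketch below. `save_image_sources` and the module name `image_utils.py` are hypothetical. A Python module imported with `Library    image_utils.py` exposes its functions as keywords, so this one would be callable as `Save Image Sources`. To use it, the FOR loop would collect the src values into a list instead of appending each one to the file.

```python
from pathlib import Path
from typing import Iterable, Optional

def save_image_sources(sources: Iterable[Optional[str]], path: str) -> int:
    """Write unique http(s) image URLs to `path`, one per line.

    The file is overwritten (unlike Append To File), so results from an
    earlier run are not mixed in. Returns the number of URLs written.
    """
    # Skip missing values and inline data: URIs; keep only real links.
    links = [s for s in sources if s and s.startswith(("http://", "https://"))]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(links))
    Path(path).write_text("".join(f"{url}\n" for url in unique), encoding="utf-8")
    return len(unique)

# Usage with made-up values standing in for the collected src attributes.
count = save_image_sources(
    [
        "https://example.com/a.jpg",
        None,
        "data:image/jpeg;base64,AAAA",
        "https://example.com/a.jpg",
        "https://example.com/b.jpg",
    ],
    "image_src.txt",
)
print(count)  # 2
```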
Verifying the output
Open image_src.txt, the file set in SAVE_PATH. Because it is a relative path, Append To File creates it in the current working directory, which is normally the directory you ran robot from. It should contain the URLs of the Google Images results for the search query, one per line. Since the keyword appends, running the test again adds new lines to the same file.
The following is a sample image of the output file.
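To check the file programmatically rather than by eye, a short Python sketch can count the lines and flag any that are not http(s) URLs. For example, the text `None` may be written when an image has no src attribute. The file name image_src.txt comes from SAVE_PATH. The function names `is_http_url` and `check_url_file` are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple
from urllib.parse import urlparse

def is_http_url(text: str) -> bool:
    """True if `text` has an http/https scheme and a host name."""
    parsed = urlparse(text)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

def check_url_file(path: str) -> Tuple[int, List[str]]:
    """Return (number of non-empty lines, lines that are not http(s) URLs)."""
    text = Path(path).read_text(encoding="utf-8")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    invalid = [line for line in lines if not is_http_url(line)]
    return len(lines), invalid

# Usage: check the file written by the test, if it exists in this directory.
if Path("image_src.txt").exists():
    total, invalid = check_url_file("image_src.txt")
    print(f"{total} lines, {len(invalid)} not http(s) URLs")
```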
Here is the GitHub link containing the code mentioned above.
Conclusion
This script demonstrates how Robot Framework, combined with SeleniumLibrary, can automate the process of fetching URLs from Google Images. However, it’s important to use such automation responsibly and ensure compliance with website terms of service and legal regulations regarding web scraping. Whether you’re testing software or scraping data from the web, Robot Framework saves time and effort, making automation efficient and effective in various situations.