Automating Web Tasks with Playwright in Python
In web development, automating repetitive tasks is not just a luxury; it is a necessity. One of the most powerful tools for web automation is Playwright, a browser automation library that started out as a Node.js library and now has official Python bindings. It supports robust end-to-end testing by driving web pages the way a real user would. In this post, we'll explore how to use Playwright in Python to automate a common web task: logging in to a website and verifying that its features work.
Why Choose Playwright?
Playwright stands out for its ability to drive multiple browser engines (Chromium, Firefox, and WebKit) through a single API. It is fast, reliable, and can run tests in headless mode (without a GUI), which is well suited to automated test environments.
Setting Up Your Environment
Before diving into the code, ensure your environment is set up. You’ll need Python installed on your system, along with the Playwright package and its browser binaries.
1. Install Playwright using pip:
pip install playwright
2. Run the Playwright command to install the necessary browser binaries:
playwright install
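As a quick sanity check after installation, a short script can confirm that the package is importable before you run anything browser-related. This is a minimal sketch that uses only the standard library; the helper name playwright_available is ours and is not part of Playwright. It does not check that the browser binaries were downloaded.

```python
import importlib.util


def playwright_available() -> bool:
    """Return True if the 'playwright' package can be imported."""
    return importlib.util.find_spec("playwright") is not None


if __name__ == "__main__":
    if playwright_available():
        print("Playwright is installed.")
    else:
        print("Playwright is missing; run: pip install playwright")
```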
Writing the Automation Script
Here, we test the resume parser and the invoice parser on the DocSaar site (https://www.docsaar.com/).
Let's break the task down into a simple script that logs in to the site and checks these features. We'll assume the website has a login form and several pages or functions that we want to verify.
Step 1: Import Playwright and Start a Browser Session
In the full script (see the Code section), the "playwright_setup" method launches the browser selected by browser_type, creates a new browser context, and opens a page in it.
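The idea of picking a browser engine by name can be sketched on its own, as below. This is a minimal illustration rather than the script's exact code: open_session and SUPPORTED_BROWSERS are names made up for this example, and the function relies on the fact that playwright.chromium, playwright.firefox, and playwright.webkit all expose the same launch() method.

```python
SUPPORTED_BROWSERS = ("chromium", "firefox", "webkit")


def open_session(playwright, browser_type="chromium", headless=True):
    """Launch a browser and return (browser, context, page).

    `playwright` is the object yielded by `sync_playwright()`.
    """
    if browser_type not in SUPPORTED_BROWSERS:
        raise ValueError(f"Unsupported browser type: {browser_type}")
    # All three engines share the same launch() API, so look the launcher up by name.
    launcher = getattr(playwright, browser_type)
    browser = launcher.launch(headless=headless)
    context = browser.new_context()
    page = context.new_page()
    return browser, context, page


def main():
    # Imported here so the helper above can be read and tested without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser, context, page = open_session(playwright, "firefox")
        page.goto("https://www.docsaar.com/login")
        print(page.title())
        context.close()
        browser.close()
```

Calling main() opens the login page in headless Firefox and prints its title; switching to another engine only requires changing the string passed to open_session.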
Step 2: Log In to the Website
In the "login" method in the code, we open the login page, fill in the email ID and password, click the Login button, and then check whether the dashboard title is visible to confirm that the login succeeded.
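The login step can also be written without fixed sleeps and without hard-coded credentials. The sketch below is one possible variant, not the script's own method: it reuses the selectors from the script in this post, reads the credentials from two environment variables whose names (DOCSAAR_EMAIL, DOCSAAR_PASSWORD) are hypothetical, and uses the locator's wait_for() call to wait for the dashboard title.

```python
import os

LOGIN_URL = "https://www.docsaar.com/login"
# Selector copied from the script in this post; adjust it if the page markup differs.
DASHBOARD_SELECTOR = 'h1[side-menuclass="page-title"]'


def load_credentials(env=None):
    """Return (email, password) from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    email = env.get("DOCSAAR_EMAIL", "")
    password = env.get("DOCSAAR_PASSWORD", "")
    if not email or not password:
        raise RuntimeError("Set DOCSAAR_EMAIL and DOCSAAR_PASSWORD before running")
    return email, password


def login(page, email, password, timeout_ms=10_000):
    """Fill in the login form and report whether the dashboard title appears."""
    page.goto(LOGIN_URL)
    page.locator('input[name="email"]').fill(email)
    page.locator('input[name="password"]').fill(password)
    page.get_by_role("button", name="Login").click()
    try:
        # Blocks until the element is visible; raises if the timeout expires first.
        page.locator(DASHBOARD_SELECTOR).wait_for(state="visible", timeout=timeout_ms)
    except Exception as exc:
        return {"success": False, "error": f"Dashboard not visible: {exc}"}
    return {"success": True, "error": None}
```

With a Playwright page in hand, login(page, *load_credentials()) returns the same kind of success/error dictionary that the script uses.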
Step 3: Perform and Verify Functionalities
After logging in, you will typically want to check that certain features work as expected. This could include navigating to different sections and performing tasks specific to your domain.
Here we test the resume parser and the invoice parser. First, we navigate to the relevant page and upload a file: a resume on the resume parser page and an invoice on the invoice parser page. After uploading the file, we click the Submit button to process it. The code then waits up to 40 seconds (this limit can be adjusted) for processing to finish. Once the file is processed, we check whether a result is displayed for it. If it is, we conclude that the feature is working as expected.
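The "wait up to 40 seconds" logic is a plain polling loop, and it can be factored into a small reusable helper. The following is a minimal sketch using only the standard library; wait_until and its parameters are names invented for this example.

```python
import time


def wait_until(condition, timeout=40.0, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Call `condition` repeatedly until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False if the timeout expired first.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With a Playwright page, one way to use it is wait_until(lambda: not page.locator(".ulProgress").is_visible(), timeout=40, sleep=lambda s: page.wait_for_timeout(s * 1000)), which checks the progress indicator once per second and pauses through Playwright's own wait_for_timeout rather than time.sleep.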
To test resume parsing, we go to the resume parser page, upload a resume, and click the Submit button.
The resume is then processed to generate a response, which can take 30-40 seconds.
Once processing is complete, the results for the parsed resume should be displayed.
The invoice parser is tested the same way: we upload an invoice file and check that a result is returned for it.
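Because the two checks differ only in the menu link, the uploaded file, and the result element, they can share one helper. The sketch below is one way to do that, assuming the same selectors as the script in this post; parse_document, check_resume_parser, and check_invoice_parser are names made up for this example. It waits for the progress indicator to be hidden and then for the result element to become visible, so a slow start of the indicator is still covered by the second wait.

```python
def parse_document(page, menu_href, file_path, result_selector, timeout_ms=40_000):
    """Upload `file_path` on the page behind `menu_href` and check that a result appears."""
    try:
        # Open the parser page from the side menu.
        page.click(f'a.side-menu__item[href="{menu_href}"]')
        # Attach the file to the upload input and submit it.
        page.set_input_files('input[type="file"]', file_path)
        page.get_by_role("button", name="Submit").click()
        # Wait for the progress indicator to go away, then for the result to show up.
        page.locator(".ulProgress").wait_for(state="hidden", timeout=timeout_ms)
        page.locator(result_selector).wait_for(state="visible", timeout=timeout_ms)
    except Exception as exc:
        return {"success": False, "error": f"Playwright Error: {exc}"}
    return {"success": True, "error": None}


def check_resume_parser(page, resume_path):
    return parse_document(page, "/chatgpt_resume_parsing", resume_path,
                          "#chatgpt_resume_parsing_result_data")


def check_invoice_parser(page, invoice_path):
    return parse_document(page, "/chatgpt_invoice_parsing", invoice_path,
                          "#chatgpt_invoice_parsing_result_data")
```

Adding a check for a third parser would then only need another two-line wrapper with its own link and result selector.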
Step 4: Clean Up
Always close the browser context and the browser once your tasks are complete to free up resources. Do this before the sync_playwright() block exits, while Playwright is still running.
context.close()
browser.close()
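To make sure the cleanup also runs when a step raises an exception, the two close() calls can be registered up front. The sketch below uses contextlib.ExitStack from the standard library for this; run_task is a name invented for this example, and launching Chromium is just an assumption to keep it short.

```python
from contextlib import ExitStack


def run_task(playwright, task, headless=True):
    """Run task(page) in a fresh Chromium session and always close the session."""
    with ExitStack() as stack:
        browser = playwright.chromium.launch(headless=headless)
        stack.callback(browser.close)  # registered first, so it runs last
        context = browser.new_context()
        stack.callback(context.close)  # registered last, so it runs first
        page = context.new_page()
        return task(page)
```

It is meant to be called inside a with sync_playwright() as playwright: block, for example run_task(playwright, lambda page: page.goto("https://www.docsaar.com/login")). The context is closed before the browser whether the task returns normally or raises.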
Code
import time

from playwright.sync_api import sync_playwright

# Fill in your own credentials and file paths before running.
LOGIN_EMAIL = ''
LOGIN_PASSWORD = ''
RESUME_FILE_PATH = '/path/to/resume/file'
INVOICE_FILE_PATH = '/path/to/invoice/file'


class DocsaarFunctionality:
    # Domain login URL
    BASE_URL = "https://www.docsaar.com/login"

    def __init__(self, playwright, browser_type="chromium", headless=True):
        self.playwright = playwright
        self.browser_type = browser_type
        self.headless = headless
        self.page = None

    def getting_request(self):
        # Open the login page and return the navigation response.
        response = self.page.goto(self.BASE_URL)
        time.sleep(1)
        return response

    def check_status(self, response):
        # HTTP status code of the navigation response.
        return response.status

    def playwright_setup(self):
        # Launch the requested browser engine.
        if self.browser_type == "chromium":
            browser = self.playwright.chromium.launch(headless=self.headless)
        elif self.browser_type == "firefox":
            browser = self.playwright.firefox.launch(headless=self.headless)
        elif self.browser_type == "webkit":
            browser = self.playwright.webkit.launch(headless=self.headless)
        else:
            raise ValueError(f"Unsupported browser type: {self.browser_type}")
        # Create an isolated browser context and open a page in it.
        context = browser.new_context()
        page = context.new_page()
        self.page = page
        self.browser = browser
        return browser, context, page

    # Log in to the domain with the provided email and password.
    def login(self):
        result = {}
        try:
            response = self.getting_request()
            status_code = self.check_status(response)
            if status_code == 200:
                self.page.locator('input[name="email"]').click()
                self.page.locator('input[name="email"]').fill(LOGIN_EMAIL)  # Email for login
                self.page.locator('input[name="password"]').click()
                self.page.locator('input[name="password"]').fill(LOGIN_PASSWORD)  # Password for login
                self.page.get_by_role("button", name="Login").click()  # Click the Login button
                time.sleep(5)
                # A visible dashboard title means we were redirected to the Home page after logging in.
                dashboard_title_element = self.page.query_selector('h1[side-menuclass="page-title"]')
                if dashboard_title_element and dashboard_title_element.is_visible():
                    result["success"] = True
                    result["error"] = None
                    return result
                result["success"] = False
                result["error"] = 'Invalid user name or password'
                return result
            else:
                # The login page itself did not load correctly.
                result["success"] = False
                result["error"] = status_code
                return result
        except Exception as e:
            result["success"] = False
            result["error"] = f"Playwright Error: {e}"
            return result

    # Resume Parser
    def resume_parser(self):
        result = {}
        try:
            # Navigate to the resume parser page.
            self.page.click('a.side-menu__item[href="/chatgpt_resume_parsing"]')
            time.sleep(5)
            # Locate the file input and upload the file.
            file_input = self.page.query_selector('input[type="file"]')
            if file_input:
                file_input.set_input_files(RESUME_FILE_PATH)
            else:
                result["success"] = False
                result["error"] = "Unable to find the file upload input"
                return result
            time.sleep(3)
            # Submit the file for processing.
            self.page.get_by_role("button", name="Submit").click()
            time.sleep(5)
            # Poll the progress indicator for up to max_duration seconds.
            start_time = time.time()
            max_duration = 40
            while time.time() - start_time < max_duration:
                uploading_element = self.page.query_selector('.ulProgress')
                if uploading_element:
                    if uploading_element.is_visible():
                        print("PDF is processing ...")
                    else:
                        print("PDF processing is completed!")
                        break
                else:
                    result["success"] = False
                    result["error"] = "Unable to find the file processing indicator"
                    return result
                self.page.wait_for_timeout(1000)
            # Check whether we got a result for the file.
            result_element = self.page.query_selector('#chatgpt_resume_parsing_result_data')
            if result_element:
                if result_element.is_visible():
                    result["success"] = True
                    result["error"] = None
                    result["message"] = "Successfully got the results for the resume"
                    time.sleep(10)  # Keep the result on screen briefly in headed mode.
                    return result
                else:
                    result["success"] = False
                    result["error"] = "Failed to get the result for the resume"
                    return result
            else:
                print("Failed to complete the process")
                result["success"] = False
                result["error"] = "Failed to complete the process"
                return result
        except Exception as e:
            result["success"] = False
            result["error"] = f"Playwright Error: {e}"
            return result

    # Invoice Parser
    def invoice_parser(self):
        result = {}
        try:
            # Navigate to the invoice parser page.
            self.page.click('a.side-menu__item[href="/chatgpt_invoice_parsing"]')
            time.sleep(5)
            # Locate the file input and upload the file.
            file_input = self.page.query_selector('input[type="file"]')
            if file_input:
                file_input.set_input_files(INVOICE_FILE_PATH)
            else:
                result["success"] = False
                result["error"] = "Unable to find the file upload input"
                return result
            time.sleep(3)
            # Submit the file for processing.
            self.page.get_by_role("button", name="Submit").click()
            time.sleep(5)
            # Poll the progress indicator for up to max_duration seconds.
            start_time = time.time()
            max_duration = 40
            while time.time() - start_time < max_duration:
                uploading_element = self.page.query_selector('.ulProgress')
                if uploading_element:
                    if uploading_element.is_visible():
                        print("PDF is processing ...")
                    else:
                        print("PDF processing is completed!")
                        break
                else:
                    result["success"] = False
                    result["error"] = "Unable to find the file processing indicator"
                    return result
                self.page.wait_for_timeout(1000)
            # Check whether we got a result for the file.
            result_element = self.page.query_selector('#chatgpt_invoice_parsing_result_data')
            if result_element:
                if result_element.is_visible():
                    result["success"] = True
                    result["error"] = None
                    result["message"] = "Successfully got the results for the invoice"
                    time.sleep(10)  # Keep the result on screen briefly in headed mode.
                    return result
                else:
                    result["success"] = False
                    result["error"] = "Failed to get the result for the invoice"
                    return result
            else:
                print("Failed to complete the process")
                result["success"] = False
                result["error"] = "Failed to complete the process"
                return result
        except Exception as e:
            result["success"] = False
            result["error"] = f"Playwright Error: {e}"
            return result


if __name__ == "__main__":
    try:
        with sync_playwright() as playwright:
            docsaar_functionality = DocsaarFunctionality(playwright, browser_type='chromium', headless=False)
            browser, context, page = docsaar_functionality.playwright_setup()
            try:
                login_response = docsaar_functionality.login()
                print("login_response-->", login_response)
                if login_response['success']:
                    resume_parser_response = docsaar_functionality.resume_parser()
                    print("resume_parser_response-->", resume_parser_response)
                    time.sleep(2)
                    invoice_parser_response = docsaar_functionality.invoice_parser()
                    print("invoice_parser_response-->", invoice_parser_response)
                    time.sleep(2)
            finally:
                # Close the session while Playwright is still running.
                context.close()
                browser.close()
    except Exception as e:
        print(f"An error occurred: {e}")
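When a script like this runs in an automated environment, it helps to turn the result dictionaries into a process exit code so that a failed check is noticed. The following is a minimal sketch of that idea; the function name exit_code and the PASS/FAIL output format are choices made for this example.

```python
def exit_code(results):
    """Print one line per check and return 0 if all succeeded, 1 otherwise.

    `results` maps a check name to a dict with "success" and "error" keys.
    """
    failed = False
    for name, result in results.items():
        if result.get("success"):
            print(f"{name}: PASS")
        else:
            failed = True
            print(f"{name}: FAIL ({result.get('error')})")
    return 1 if failed else 0
```

In the main block, the three responses could be collected as {"login": login_response, "resume_parser": resume_parser_response, "invoice_parser": invoice_parser_response} and the return value passed to sys.exit().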
Conclusion
Using Playwright in Python to automate web tasks is a robust and efficient way to run end-to-end tests and confirm that your website's features work correctly. This script provides a basic framework that can be expanded to match the needs and complexity of your web application. With Playwright, you can automate almost any web interaction, making your testing process faster and more reliable.