May 27, 2024

Automating Web Tasks with Playwright in Python

In the dynamic landscape of web development, automating repetitive tasks is not just a luxury but a necessity. One of the most powerful tools for web automation is Playwright, a browser automation library from Microsoft that began as a Node.js library and now has official Python bindings. It supports robust end-to-end testing by driving web pages the way a real user would. In this post, we’ll explore how to use Playwright in Python to automate a common web task: logging into a website and verifying its functionality.

Why Choose Playwright?

Playwright stands out for its ability to drive multiple browser engines (Chromium, Firefox, and WebKit) through a single API. It is fast, reliable, and can run tests in headless mode (without a GUI), which suits automated test environments.

Setting Up Your Environment

Before diving into the code, ensure your environment is set up. You’ll need Python installed on your system, along with the Playwright package and its browser binaries.

1. Install Playwright using pip:

				
					pip install playwright
				
			

2. Run the Playwright command to install the necessary browser binaries:

				
					playwright install
				
			

Writing the Automation Script

Here, we are going to test the resume parser and invoice parser on the DocSaar domain (https://www.docsaar.com/).

Let’s break down the task into a simple script that logs into a domain and checks certain functionalities. We’ll assume the website has a login form and several pages or functions we want to verify.

Step 1: Import Playwright and Start a Browser Session

In the “playwright_setup” function, we launch the browser, create a browser context, and open a page for the session.
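
The browser selection in that function can also be written as a lookup by name. The following is a minimal sketch rather than the script's actual implementation: “launch_browser” and “SUPPORTED_BROWSERS” are hypothetical names introduced here, and the “playwright” argument is assumed to be the object yielded by “sync_playwright()”.

```python
SUPPORTED_BROWSERS = ("chromium", "firefox", "webkit")


def launch_browser(playwright, browser_type="chromium", headless=True):
    """Launch a browser by name and return (browser, context, page).

    `playwright` is assumed to be the object yielded by `sync_playwright()`.
    """
    if browser_type not in SUPPORTED_BROWSERS:
        raise ValueError(f"Unsupported browser type: {browser_type}")

    # playwright.chromium, playwright.firefox and playwright.webkit expose
    # the same launch() method, so the attribute can be looked up by name.
    browser = getattr(playwright, browser_type).launch(headless=headless)
    context = browser.new_context()  # isolated session (cookies, storage)
    page = context.new_page()        # a single tab inside that session
    return browser, context, page
```

Inside a “with sync_playwright() as p:” block, this would be called as “browser, context, page = launch_browser(p, "firefox")”.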

Step 2: Log In to the Website

In the “login” function, we open the login page, fill in the email ID and password, and submit the form. We then check whether the dashboard heading is visible to confirm that the login succeeded.
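
As a rough illustration of this step, the sketch below takes the page as an argument and waits for a post-login element instead of sleeping for a fixed time. It is an assumption-laden sketch, not the script's own code: the function signature, the “dashboard_selector” parameter, and the 10-second default are hypothetical, while the form selectors are the ones the script uses.

```python
def login(page, login_url, email, password, dashboard_selector, timeout_ms=10_000):
    """Fill the login form and report whether the dashboard became visible."""
    response = page.goto(login_url)
    if response is None or not response.ok:
        status = getattr(response, "status", None)
        return {"success": False, "error": f"Login page returned status {status}"}

    # fill() focuses the field itself, so a separate click() is not needed.
    page.locator('input[name="email"]').fill(email)
    page.locator('input[name="password"]').fill(password)
    page.get_by_role("button", name="Login").click()

    try:
        # Wait for the post-login element rather than sleeping a fixed time.
        page.locator(dashboard_selector).wait_for(state="visible", timeout=timeout_ms)
    except Exception as exc:  # Playwright raises its own TimeoutError here
        return {"success": False, "error": f"Dashboard not visible: {exc}"}

    return {"success": True, "error": None}
```

Returning the same “success”/“error” dictionary on every path keeps the caller's check simple, at the cost of hiding the exception type.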

Step 3: Perform and Verify Functionalities

After logging in, you would typically want to check if certain functionalities are working as expected. This could include navigating to different sections and performing some tasks specific to your domain.

Here we are testing the resume parser and the invoice parser. First, we navigate to the relevant page and upload a file: a resume on the resume parser page, an invoice on the invoice parser page. After uploading the file, we click the submit button to process it. The code then polls for up to 40 seconds (adjustable via “max_duration”) to check whether the file has been processed. Once processing finishes, we verify that a result is displayed for the file. If it is, we can conclude that the functionality is working as expected.

To test resume parsing, we go to the resume parser page, upload a resume, and click the submit button.
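
The upload-and-submit step can be sketched as a small helper. This is an illustrative sketch under stated assumptions: “upload_and_submit” is a hypothetical name, the page is assumed to contain a single file input, and the early file-existence check is an addition that the script itself does not perform.

```python
from pathlib import Path


def upload_and_submit(page, file_path, submit_label="Submit"):
    """Attach a local file to the page's file input and click the submit button."""
    path = Path(file_path)
    if not path.is_file():
        # Fail early with a clear message instead of a browser-side error.
        raise FileNotFoundError(f"No such file: {path}")

    # set_input_files() fills the <input type="file"> directly, so no
    # native file-chooser dialog has to be driven.
    page.locator('input[type="file"]').set_input_files(str(path))
    page.get_by_role("button", name=submit_label).click()
```

The same helper would serve the invoice parser page by passing the invoice file path instead.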

After clicking on the submit button, the resume will be processed to generate a response. It can take 30-40 seconds to generate the response.
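
Waiting for a slow operation like this is usually done by polling with a deadline. The helper below is a generic sketch of that pattern; “wait_until” is a hypothetical name, and the 40-second default simply mirrors the limit used in the script.

```python
import time


def wait_until(condition, timeout=40.0, interval=1.0):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False
```

With a Playwright page, it might be used as “wait_until(lambda: not page.locator(".ulProgress").is_visible())”. The boolean return value lets the caller tell a timeout apart from a finished upload. Playwright also offers a built-in alternative: “page.locator(".ulProgress").wait_for(state="hidden", timeout=40_000)”.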

After the resume is processed, we should be able to get the results for the parsed resume.

The invoice parser is tested the same way: we upload an invoice file and check that a result is returned for it.

Step 4: Clean Up

Always make sure to close the browser session after your tasks are completed to free up resources.

				
					context.close()
browser.close()
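
One way to guarantee that cleanup runs even when a step raises an exception is to wrap the session in a context manager. The following is a minimal sketch, assuming Chromium and the object yielded by “sync_playwright()”; “browser_session” is a hypothetical helper, not part of Playwright or of the script.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(playwright, headless=True):
    """Yield a page and close the context and browser afterwards, even on error."""
    browser = playwright.chromium.launch(headless=headless)
    context = None
    try:
        context = browser.new_context()
        yield context.new_page()
    finally:
        # Close in reverse order of creation.
        if context is not None:
            context.close()
        browser.close()
```

Code that uses the page then sits inside “with browser_session(p) as page:”, and no explicit close calls are needed.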
				
			
Code
				
					import time
from playwright.sync_api import sync_playwright

LOGIN_EMAIL = '<YOUR_EMAIL>'
LOGIN_PASSWORD = '<YOUR_PASSWORD>'
RESUME_FILE_PATH = '/path/to/resume/file'
INVOICE_FILE_PATH = '/path/to/invoice/file'



class DocsaarFunctionality:
    # Domain login url
    BASE_URL = "https://www.docsaar.com/login"

    def __init__(self, playwright, browser_type="chromium", headless=True):
        self.playwright = playwright
        self.browser_type = browser_type
        self.headless = headless
        self.page = None

    def getting_request(self):
        response = self.page.goto(self.BASE_URL)
        time.sleep(1)

        return response

    def check_status(self, response):
        return response.status

    def playwright_setup(self):

        if self.browser_type == "chromium":
            browser = self.playwright.chromium.launch(headless=self.headless)
        elif self.browser_type == "firefox":
            browser = self.playwright.firefox.launch(headless=self.headless)
        elif self.browser_type == "webkit":
            browser = self.playwright.webkit.launch(headless=self.headless)
        else:
            raise ValueError(f"Unsupported browser type: {self.browser_type}")

        context = browser.new_context()
        page = context.new_page()

        self.page = page
        self.browser = browser

        return browser, context, page

    # Login functionality to log in to the domain with the provided email and password.
    def login(self):
        json = {}
        try:
            response = self.getting_request()
            status_code = self.check_status(response)

            if status_code == 200:

                self.page.locator("input[name=\"email\"]").click()
                self.page.locator("input[name=\"email\"]").fill(LOGIN_EMAIL) # Add email for login
                self.page.locator("input[name=\"password\"]").click()
                self.page.locator("input[name=\"password\"]").fill(LOGIN_PASSWORD) # Add password for login
                self.page.get_by_role("button", name="Login").click() # Click on login button

                time.sleep(5)
                try:
                    # Check whether we are able to successfully log in to the domain and are redirected to the Home page (Dashboard visible).
                    dashboard_title_element = self.page.query_selector('h1[side-menuclass="page-title"]')
                    if dashboard_title_element:
                        is_visible = dashboard_title_element.is_visible()
                        if is_visible:

                            json["success"] = True
                            json["error"] = None
                            return json

                    json["success"] = False
                    json["error"] = 'Invalid user name or password'
                    return json

                except Exception as e:

                    json["success"] = False
                    json["error"] = f"Playwright error :{e}"

                    return json

            else:

                json["success"] = False
                json["error"] = status_code

                return json

        except Exception as e:
            json["success"] = False
            json["error"] = f"Playwright Error: {e}"

            return json


    # Resume Parser
    def resume_parser(self):
        json = {}

        try:
            # Navigate to the resume parser page.
            self.page.click('a.side-menu__item[href="/chatgpt_resume_parsing"]')
            time.sleep(5)

            # Locate the file input and upload the resume.
            file_input = self.page.query_selector('input[type="file"]')
            if file_input:
                file_input.set_input_files(RESUME_FILE_PATH)
            else:
                json["success"] = False
                json["error"] = "Unable to find the file upload path"

                return json
            time.sleep(3)

            # Submit the file, then poll the progress indicator.
            self.page.get_by_role("button", name="Submit").click()

            time.sleep(5)

            start_time = time.time()
            max_duration = 40

            while time.time() - start_time < max_duration:
                uploading_element = self.page.query_selector('.ulProgress')
                if uploading_element:
                    is_visible = uploading_element.is_visible()
                    if is_visible:
                        print("pdf is processing ..")
                    else:
                        print("pdf processing is completed!")
                        break
                else:
                    json["success"] = False
                    json["error"] = "Unable to find the file processing indicator"

                    return json

                self.page.wait_for_timeout(1000)

            # Check whether we are able to get the result for the file or not.
            result_element = self.page.query_selector('#chatgpt_resume_parsing_result_data')
            if result_element:
                is_visible = result_element.is_visible()
                if is_visible:
                    json["success"] = True
                    json["error"] = None
                    json["message"] = "Successfully retrieved the results for the resume"
                    time.sleep(10)

                    return json
                else:
                    json["success"] = False
                    json["error"] = "Failed to get the result for the resume"

                    return json
            else:
                print("Failed to complete the process")
                json["success"] = False
                json["error"] = "Failed to complete the process"

                return json

        except Exception as e:
            json["success"] = False
            json["error"] = f"Playwright Error: {e}"

            return json

    # Invoice Parser
    def invoice_parser(self):
        json = {}

        try:
            # Navigate to the invoice parser page.
            self.page.click('a.side-menu__item[href="/chatgpt_invoice_parsing"]')
            time.sleep(5)

            # Locate the file input and upload the invoice.
            file_input = self.page.query_selector('input[type="file"]')
            if file_input:
                file_input.set_input_files(INVOICE_FILE_PATH)
            else:
                json["success"] = False
                json["error"] = "Unable to find the file upload path"

                return json
            time.sleep(3)

            # Submit the file, then poll the progress indicator.
            self.page.get_by_role("button", name="Submit").click()

            time.sleep(5)

            start_time = time.time()
            max_duration = 40

            while time.time() - start_time < max_duration:
                uploading_element = self.page.query_selector('.ulProgress')
                if uploading_element:
                    is_visible = uploading_element.is_visible()
                    if is_visible:
                        print("pdf is processing ..")
                    else:
                        print("pdf processing is completed!")
                        break
                else:
                    json["success"] = False
                    json["error"] = "Unable to find the file processing indicator"

                    return json

                self.page.wait_for_timeout(1000)

            # Check whether we are able to get the result for the file or not.
            result_element = self.page.query_selector('#chatgpt_invoice_parsing_result_data')
            if result_element:
                is_visible = result_element.is_visible()
                if is_visible:
                    json["success"] = True
                    json["error"] = None
                    json["message"] = "Successfully retrieved the results for the invoice"
                    time.sleep(10)

                    return json
                else:
                    json["success"] = False
                    json["error"] = "Failed to get the result for the invoice"

                    return json
            else:
                print("Failed to complete the process")
                json["success"] = False
                json["error"] = "Failed to complete the process"

                return json

        except Exception as e:
            json["success"] = False
            json["error"] = f"Playwright Error: {e}"

            return json



if __name__=="__main__":
    context = None
    browser = None
    try:
        with sync_playwright() as playwright:
            docsaar_functionality = DocsaarFunctionality(playwright, browser_type='chromium', headless=False)
            browser, context, page = docsaar_functionality.playwright_setup()

            login_response = docsaar_functionality.login()
            print("login_response-->", login_response)

            # Run the parser checks only after a successful login.
            if login_response['success']:
                resume_parser_response = docsaar_functionality.resume_parser()
                print("resume_parser_response-->", resume_parser_response)
                time.sleep(2)
                invoice_parser_response = docsaar_functionality.invoice_parser()
                print("invoice_parser_response-->", invoice_parser_response)
                time.sleep(2)

            if context:
                context.close()
            if browser:
                browser.close()
    except Exception as e:
        print(f"An error occurred: {e}")
				
			

Conclusion

Using Playwright in Python to automate web tasks is a robust and efficient way to perform end-to-end testing and ensure your website’s functionalities are performing correctly. This script provides a basic framework that can be expanded based on specific needs and the complexity of the web application. With Playwright, you can automate almost any web interaction, making your testing process faster and more reliable.

