Python for beginners: how to control the Web

A journey to easy web automation

Problem: Submitting homework means navigating a maze of web pages so complex that I have sent an assignment to the wrong place several times. And although the process takes only 1-2 minutes, it sometimes feels like an insurmountable obstacle (for example, when I have finished an assignment far too late at night and can barely remember my password).

Solution: Use Python to submit completed assignments automatically! Ideally, I could save an assignment, press a few keys, and have my work uploaded in seconds. At first this sounded too good to be true, but then I discovered Selenium, a tool that can be used with Python to navigate the web.

Any time we find ourselves repeating tedious actions on the web with the same sequence of steps, it is a great chance to write a program to automate the process. With Selenium and Python, we write the script once and can then run it as many times as we like, saving ourselves from repeating the same task (and, in my case, eliminating the chance of submitting an assignment to the wrong place)!

Here I will walk through the solution I developed to submit my assignments automatically (and correctly). Along the way, we will cover the basics of using Python and Selenium to control the web programmatically. Although this program works (I use it every day!), it is tailored to my setup, so you will not be able to copy and paste the code for your own application. The general techniques, however, apply to a wide range of situations. (If you want to see the full code, it is available on GitHub.)

Approach

Before we get to the fun part of automating the web, we need to work out the general structure of our solution. Starting to program without a plan is a great way to waste many hours and end up frustrated. I want to write a program that submits completed course assignments to the correct place on Canvas (my university's "learning management system"). To start with, I need a way to tell the program the name of the assignment to submit and the class. I took a simple approach and created a folder for completed assignments with a child folder for each class. In the child folders, I place the finished document, named after the specific assignment. The program can work out the class from the folder name and the assignment from the document name.
Here is an example where the class is EECS491 and the assignment is "Assignment 3 – Inference in Larger Graphical Models".

image

File structure (left) and Complete Assignment (right).

The first part of the program is a loop that goes through the folders to find the assignment and class, which we store in a Python tuple:

# os for file management
import os

# Build tuple of (class, file) to turn in
submission_dir = "completed_assignments"
file_tup = None
for directory in sorted(os.listdir(submission_dir)):
    file_list = os.listdir(os.path.join(submission_dir, directory))
    if file_list:
        # Keep the first document found in this class folder
        file_tup = (directory, file_list[0])

print(file_tup)

('EECS491', 'Assignment 3 – Inference in Larger Graphical Models.txt')
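
A variant of this lookup, sketched here with pathlib, collects every pending (class, file) pair rather than a single tuple. It assumes the folder layout described above (one sub-folder per class); the function name find_pending is made up for illustration and is not part of the original program.

```python
from pathlib import Path


def find_pending(submission_dir):
    """Return a sorted list of (class_name, file_name) pairs.

    Assumes one sub-folder per class, each holding finished documents.
    """
    pending = []
    for class_dir in sorted(Path(submission_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files next to the class folders
        for document in sorted(class_dir.iterdir()):
            if document.is_file():
                pending.append((class_dir.name, document.name))
    return pending
```

Calling find_pending("completed_assignments") returns an empty list when nothing is waiting, which is easier to check for than a variable that was never assigned.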

This takes care of file management, and the program now knows the class and the assignment to turn in. The next step is to use Selenium to navigate to the correct web page and upload the assignment.

Web control with Selenium

To get started with Selenium, we import the library and create a web driver, which is a browser controlled by our program. In this case, I will use Chrome as my browser and send the driver to the Canvas website, where I submit assignments.

from selenium import webdriver
# Using Chrome to access web
driver = webdriver.Chrome()
# Open the website
driver.get('https://canvas.case.edu')

When we open the Canvas web page, we are greeted by the first obstacle: a login box! To get past it, we need to fill in an ID and a password and click the login button.

image

Imagine the web driver as a person who has never seen a web page before: we need to tell it exactly where to click, what to type, and which buttons to press. There are several ways to tell our web driver which elements to find, all of which use selectors. A selector is a unique identifier for an element on a web page. To find the selector for a particular element, say the "CWRU ID" box, we need to inspect the web page. In Chrome, this is done by pressing "Ctrl + Shift + I" or by right-clicking on any element and selecting "Inspect". This opens the Chrome Developer Tools, an extremely useful application that shows the HTML underlying any web page.

To find a selector for the "CWRU ID" box, I right-clicked in the box, clicked "Inspect", and saw the following in the developer tools. The highlighted line corresponds to the id_box element (this line is called an HTML tag).

image

This HTML may look overwhelming, but we can ignore most of the information and focus on the id="username" and name="username" parts (these are known as attributes of the HTML tag).
To select the ID box with our web driver, we can use either the id or the name attribute we found in the developer tools. Web drivers in Selenium have many different methods for selecting elements on a web page, and there are often several ways to select the very same element:

# Select the id box
id_box = driver.find_element_by_name('username')
# Equivalent Outcome! 
id_box = driver.find_element_by_id('username')
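
A note for newer installations: the find_element_by_* helpers used throughout this article were removed in Selenium 4.3, where the same lookups are written as driver.find_element(By.NAME, 'username'), with By imported from selenium.webdriver.common.by. The sketch below shows that calling style. To stay free of imports it passes the plain strings "name" and "id", which are the values of By.NAME and By.ID; the helper name select_id_box is hypothetical.

```python
def select_id_box(driver):
    """Locate the ID box the Selenium 4 way, by name and then by id."""
    # "name" and "id" are the string values of By.NAME and By.ID
    by_name = driver.find_element("name", "username")
    by_id = driver.find_element("id", "username")
    # Both lookups target the same <input> element on the login page
    return by_name, by_id
```

In a real script with Selenium installed, the By constants are the more readable choice.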

Our program now has access to id_box, and we can interact with it in various ways, such as typing in keys or clicking (if we have selected a button).

# Send id information
id_box.send_keys('my_username')

We carry out the same process for the password box and the login button, selecting each based on what we see in the Chrome developer tools. Then we send information to the elements or click on them as needed.

# Find password box
pass_box = driver.find_element_by_name('password')
# Send password
pass_box.send_keys('my_password')
# Find login button
login_button = driver.find_element_by_name('submit')
# Click login
login_button.click()
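
Hard-coding a password in the script makes it easy to leak. One possible alternative, sketched here with only the standard library, reads the credentials from environment variables and falls back to a prompt. The variable names CANVAS_USER and CANVAS_PASSWORD and the function load_credentials are assumptions for illustration, not part of the original program.

```python
import os
from getpass import getpass


def load_credentials(env=None, prompt=getpass):
    """Return (username, password) without storing them in the script."""
    env = os.environ if env is None else env
    username = env.get("CANVAS_USER") or input("CWRU ID: ")
    # getpass hides the typed characters, unlike input()
    password = env.get("CANVAS_PASSWORD") or prompt("Password: ")
    return username, password
```

The returned values can then be passed to send_keys in place of the literal strings.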

Once we are logged in, we are greeted by this slightly intimidating dashboard:

image

Again we need to guide the program through the web page by specifying exactly which elements to click and what information to enter. In this case, I tell the program to select Courses from the menu on the left, and then the class corresponding to the assignment I need to turn in:

# Find and click on list of courses
courses_button = driver.find_element_by_id('global_nav_courses_link')
courses_button.click()
# Get the name of the folder
folder = file_tup[0]
    
# Class to select depends on folder
if folder == 'EECS491':
    class_select = driver.find_element_by_link_text('Artificial Intelligence: Probabilistic Graphical Models (100/10039)')
elif folder == 'EECS531':
    class_select = driver.find_element_by_link_text('Computer Vision (100/10040)')
# Click on the specific class
class_select.click()

The program finds the correct class using the name of the folder we stored in the first step. In this case, I use the selection method find_element_by_link_text to find the specific class. The link text for an element is just another selector we can find by inspecting the page:

image
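
With more classes, the if/elif chain grows quickly, and a folder it does not recognise leaves class_select undefined. One way to keep the mapping in a single place is a dictionary lookup, sketched below with the two classes from this article; COURSE_LINKS and link_text_for are illustrative names.

```python
# Folder name -> link text shown in the Canvas course list
COURSE_LINKS = {
    "EECS491": "Artificial Intelligence: Probabilistic Graphical Models (100/10039)",
    "EECS531": "Computer Vision (100/10040)",
}


def link_text_for(folder):
    """Return the course link text for a folder, or fail loudly."""
    try:
        return COURSE_LINKS[folder]
    except KeyError:
        raise ValueError(f"No course link configured for folder {folder!r}") from None
```

The result of link_text_for(folder) can be handed directly to the link-text lookup, so adding a class means adding one dictionary entry.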

This workflow may seem a little tedious, but remember that we only have to do it once, when we write our program! After that, we can hit "Run" as many times as we want, and the program will navigate through all these pages for us.
We use the same inspect page, select element, interact with element process to get through a couple more screens. Finally, we reach the assignment submission page:

image

At this point I could see the finish line, but initially this screen perplexed me. I could click on the "Choose File" box easily enough, but how was I supposed to select the actual file to upload? The answer turns out to be incredibly simple! We locate the Choose File box using a selector and use the send_keys method to pass the exact path of the file (called file_location in the code below) to the box:

# Choose File button
choose_file = driver.find_element_by_name('attachments[0][uploaded_data]')
# Name of the document found in the first step
file_name = file_tup[1]
# Complete path of the file
file_location = os.path.join(submission_dir, folder, file_name)
# Send the file location to the button
choose_file.send_keys(file_location)

By sending the exact path of the file, we skip the whole process of navigating through folders to find the right file. After sending the path, we get the following screen showing that our file has been uploaded and is ready for submission.

image
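
Browser drivers typically expect a full path for file inputs rather than one relative to the working directory, so it can help to resolve and verify the path before handing it to send_keys. A minimal sketch, assuming the same folder layout as above; upload_path is a hypothetical helper, not part of the original script.

```python
import os


def upload_path(submission_dir, folder, file_name):
    """Build an absolute path to the document and check that it exists."""
    path = os.path.abspath(os.path.join(submission_dir, folder, file_name))
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Nothing to upload at {path}")
    return path
```

Failing here, before the browser is involved, gives a clearer error than a rejected upload.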

Now we select the "Submit Assignment" button, click it, and our assignment is turned in!

# Locate submit button and click
submit_assignment = driver.find_element_by_id('submit_file_button')
submit_assignment.click()

image

Cleaning up

File management is always a critical step, and I want to make sure I do not resubmit or lose old assignments. I decided the best solution was to keep the single file to be submitted in the completed_assignments folder and to move files to a submitted_assignments folder once they have been turned in. The final bit of code uses the os module to move the completed assignment to its new location.

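# Folder for assignments that have already been turned in
submitted_dir = 'submitted_assignments'
# Assumed here: the file keeps its original name after submission
submitted_file_name = file_name
# Location of files after submission
submitted_file_location = os.path.join(submitted_dir, submitted_file_name)
# os.rename moves the file to its new location
os.rename(file_location, submitted_file_location)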
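
os.rename fails if the destination folder does not exist, and it cannot move a file across file systems. A slightly more defensive sketch uses shutil.move instead; archive_submission is a made-up helper name, and keeping the original file name is an assumption.

```python
import os
import shutil


def archive_submission(file_location, submitted_dir):
    """Move a submitted file into submitted_dir and return its new path."""
    os.makedirs(submitted_dir, exist_ok=True)
    destination = os.path.join(submitted_dir, os.path.basename(file_location))
    if os.path.exists(destination):
        # Refuse to overwrite an assignment archived earlier
        raise FileExistsError(f"{destination} already exists")
    shutil.move(file_location, destination)
    return destination
```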

All of the code is packaged into a single script that I can run from the command line. To limit the opportunity for mistakes, I submit only one assignment at a time, which is not a big deal given that the program takes only about 5 seconds to run!

Here is what it looks like when I run the program:

image

The program gives me a chance to make sure this is the correct assignment before uploading. After the program finishes, I get the following output:

image
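
The confirmation step mentioned above could be as small as a yes/no prompt. The article does not show how the original script implements it, so the following is only one plausible sketch; confirm_submission and its ask parameter are illustrative.

```python
def confirm_submission(class_name, file_name, ask=input):
    """Ask the user to approve the upload; only 'y' or 'yes' proceeds."""
    answer = ask(f"Submit '{file_name}' to {class_name}? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

Defaulting to "no" means that pressing Enter by accident never submits anything.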

While the program is running, I can watch Python go to work for me:

image

Conclusions

Automating the web with Python works well for many tasks, both general ones and those in my field of data science. For example, we could use Selenium to download new data files automatically every day (assuming the website does not have an API). Although writing the script may look time-consuming at first glance, the payoff is that the computer can repeat the sequence as many times as we want, in exactly the same way. The program will never lose focus and wander off to Twitter. It will faithfully carry out the same steps with perfect consistency (which works well until the website changes).

I should mention that you do want to be careful before you automate critical tasks. This example is relatively low-risk because I can always go back and resubmit assignments, and I usually double-check the program's work. Websites change, and if you do not update the program to match, you may end up with a script that does something completely different from what you originally intended!

In terms of payback, this program saves me about 30 seconds per assignment and took 2 hours to write. So once I have used it to turn in more than 240 assignments, I come out ahead on time! The real payoff, however, is in designing a cool solution to a problem and learning a lot in the process. Although my time might have been spent more efficiently working on assignments rather than figuring out how to turn them in automatically, I thoroughly enjoyed the challenge. Few things are as satisfying as solving problems, and Python turns out to be a pretty good tool for doing exactly that.
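
The break-even figure follows from dividing the time invested by the time saved per run. A quick check of the arithmetic, using the 2 hours and 30 seconds quoted above (break_even_uses is just an illustrative name):

```python
def break_even_uses(build_hours, seconds_saved_per_use):
    """Number of runs after which the time saved equals the time invested."""
    build_seconds = build_hours * 3600
    return build_seconds / seconds_saved_per_use


print(break_even_uses(2, 30))  # 240.0
```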
