How Selenium Works: Episodes 1 - 2

This translation was prepared ahead of the start of the “Java QA Engineer” course.

Episode 1 – Transport

On the last weekend of January 2020 I had a conversation about one of the problems in Selenium, and in response to my explanation someone said “Why don’t you just do that …”. That prompted me to write a series of articles explaining the commands in Selenium WebDriver and why we ended up with the design we have today.

I will repeat this in every episode of this series: a great deal of thought, sometimes to the point of madness, has gone into every tiny detail of Selenium.

Why?

Through a mix of luck and how well it does what it was built to do, Selenium is used by millions of people around the world. It is how a huge range of companies, from small startups to Microsoft and Google, make sure their sites work in every browser.

How does Selenium interact with the browser?

Over the years, we have settled on HTTP as the way to talk to the browser. We created a REST-ish API (REST-ish meaning “in the spirit of REST”) that every client binding can use and, we hope, get the same results from.

HTTP and REST-ish? Seriously?

Yeah …

Let’s start with the HTTP part. When we first started, we had to support a different way of talking to each browser, based on whatever approach worked best for it. For Internet Explorer, for example, we wrote COM code. It worked well, but it still gives us nightmares. For Firefox, we wrote a monster that read commands line by line, which, thankfully, could do a lot thanks to Mozilla’s “make your own browser” approach. Opera let us in through its DevTools protocol.

This meant that, especially in WebDriver’s early years, we had to support N:M bindings, where N is the number of language bindings and M the number of browsers we support. That road does not lead to a good product. We decided we needed something every language would understand, and something fairly reliable. So the choice fell on HTTP, and we started building the JSONWireProtocol.
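To make the N:M problem concrete, here is a back-of-the-envelope sketch. The counts of languages and browsers are illustrative assumptions, not figures from the Selenium project. Without a shared protocol, every language binding needs its own integration with every browser; with a common wire protocol, each binding only needs one HTTP client and each browser only needs one server that understands the protocol.

```python
def integrations_needed(languages: int, browsers: int, shared_protocol: bool) -> int:
    """Count the browser integrations a project has to maintain.

    Without a shared protocol, every (language, browser) pair needs bespoke code.
    With one, each language needs a single protocol client and each browser a
    single protocol server.
    """
    if shared_protocol:
        return languages + browsers
    return languages * browsers


# Illustrative numbers only: 5 language bindings, 4 browsers.
print(integrations_needed(5, 4, shared_protocol=False))  # 20
print(integrations_needed(5, 4, shared_protocol=True))   # 9
```

The gap grows quickly: every new language or browser adds a whole row or column of work in the first case, but only one more component in the second.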

Within JSONWireProtocol we built a REST-ish interface that speaks JSON. I say REST-ish because it does not follow every principle of REST, but it embodies them to a degree that makes it a powerful tool for our purposes.
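To give a feel for what “REST-ish over HTTP with JSON” means in practice, the sketch below builds the requests a client binding would send for two commands. It uses the endpoint shapes from the later W3C WebDriver specification (the standardised successor to JSONWireProtocol), so paths and payloads may differ from the original JSONWireProtocol in places. The session id is a made-up placeholder, and nothing is actually sent over the network.

```python
import json
from typing import Tuple


def new_session_request(browser_name: str) -> Tuple[str, str, str]:
    """Build the HTTP method, path and JSON body for a New Session command."""
    body = {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}
    return "POST", "/session", json.dumps(body)


def navigate_request(session_id: str, url: str) -> Tuple[str, str, str]:
    """Build the request for Navigate To, the command behind driver.get(url)."""
    return "POST", f"/session/{session_id}/url", json.dumps({"url": url})


# "abc123" is a hypothetical session id; a real one comes back from New Session.
method, path, body = navigate_request("abc123", "https://www.theautomatedtester.co.uk")
print(method, path, body)
# POST /session/abc123/url {"url": "https://www.theautomatedtester.co.uk"}
```

Because each command is just a method, a path and a JSON body, any language with an HTTP client and a JSON library can drive any browser that exposes such a server.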

What about the current state of things?

The web, the internet and the world have moved on. So why hasn’t Selenium?

That is a good question, and the truth is that we are trying to modernize. Unfortunately, networks have a habit of working right up until they don’t. HTTP is a fairly resilient protocol. It also lets people build testing clusters without worrying about how multiplexing will work. That is why Selenium Grid was created, and it remains a good choice for organizing testing across multiple devices and machines.
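One reason plain HTTP suits clusters is that every command after New Session carries the session id in its URL, so a router in front of many machines can forward each request on its own, without holding a connection open. The sketch below is a hypothetical, much-simplified illustration of that idea, not how Selenium Grid is actually implemented; the node addresses and session ids are invented.

```python
import re
from typing import Dict, Optional

# Matches the session id segment in paths like /session/<id>/url.
SESSION_PATH = re.compile(r"^/session/([^/]+)(?:/|$)")


def route(path: str, sessions: Dict[str, str]) -> Optional[str]:
    """Return the address of the node that owns the session named in `path`."""
    match = SESSION_PATH.match(path)
    if match is None:
        return None  # e.g. POST /session, which the router must assign to a node itself
    return sessions.get(match.group(1))  # None if the session is unknown


# Hypothetical mapping from session id to the node running that browser.
sessions = {"abc123": "http://node-a:5555", "def456": "http://node-b:5555"}
print(route("/session/def456/url", sessions))  # http://node-b:5555
```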

But it looks more like a network.

There are tools, such as Puppeteer, that use the Chrome DevTools Protocol to control Chrome. They do some things better than Selenium, as a consequence of how they chose to communicate with the browser. Unfortunately, this is a protocol proprietary to Chrome, and making it available to other browsers is not in Google’s interest.

In addition, leaving aside the interesting design choices made by the Google team, there is the problem that we must keep a connection open at all times. In this case it uses WebSockets, but recall my comment above that the network works until it doesn’t. WebSockets will constantly need to reconnect. There is also the question of how much traffic goes up and down this channel.

This is not a problem for Puppeteer when you only interact with something on your local machine, but if you bring in a CI service such as CircleCI or Travis CI and something like AWS Device Farm, Sauce Labs or BrowserStack, suddenly the internet sits between you and your runner, and all of this data has to reach its destination.

The W3C Browser Testing and Tools Working Group, made up of browser vendors and Selenium folks, is working out how all of this should look, so that we get cross-browser compatibility from the very start, without resorting to strange hacks and browser-specific one-offs.

Want to know more?

WebDriver specification: transport details.

Episode 2 – Navigation
In this episode, we will look at the huge amount of work that goes into navigation.

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://www.theautomatedtester.co.uk")

The code above looks quite simple, right? Well, it isn’t!

In fact, this brings me to an interview pet peeve. If someone ever asks you to describe what happens in the browser when you type a URL and press Enter, there is a good chance they have no real idea of what actually happens during navigation. Anyway … back to Selenium and its navigation.

driver.get

As soon as we send the request through the transport layer, the real fun begins. We need to work out where we want to go and tell the browser to go there for us. Unfortunately, this leads to the first of many problems for automation.

Certificates

If you have ever worked at a large corporation where technology is not the core business, you know what kind of pain the development team can face. Now double that pain, and you can imagine what testers and automation engineers have to deal with.

One of these pains is certificates. Companies are stingy and will use self-signed certificates and other oddities, especially in the early days of Selenium, before services like Let’s Encrypt existed. Even now, most developers and QA teams rarely have permission to change the configuration of their test environments or CI servers. We needed a way around certificates. (This is one of the first reasons Selenium came to be seen as a security risk.)

So each of the browsers skips over badly configured certificates during automation. If they did not, many testers and developers would have no way to test their sites.
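In today’s W3C WebDriver specification, this behaviour is something a session asks for explicitly, through the acceptInsecureCerts capability in the New Session request. The sketch below only builds that capabilities payload with the standard library; the helper name session_capabilities is hypothetical. Selenium 4’s Python options objects expose the same flag as accept_insecure_certs.

```python
import json


def session_capabilities(browser_name: str, accept_insecure_certs: bool) -> dict:
    """Build a W3C New Session body that opts in to (or out of) trusting bad TLS certificates."""
    return {
        "capabilities": {
            "alwaysMatch": {
                "browserName": browser_name,
                # When true, untrusted and self-signed certificates are trusted
                # on navigation for the whole session.
                "acceptInsecureCerts": accept_insecure_certs,
            }
        }
    }


print(json.dumps(session_capabilities("firefox", True)))
```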

Now that we have got around the first problem, we can move on to loading the page.

Loading

Once the certificates are dealt with, we get the page to load. Fortunately, we do not need to do anything more complicated than the equivalent of

location = "https://www.theautomatedtester.co.uk";
// or
window.location.href = "https://www.theautomatedtester.co.uk";

Done …

When Selenium “completes” the command, we get a response back. So we just need to wait for the page to load. But, to be completely honest, what does “finished loading” actually mean?

The browser fires several different events. We know when the page has been displayed and what its readyState is. Selenium checks all of this and also waits for DOMContentLoaded.
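A much simplified, client-side picture of “wait until the page reports it has loaded” is a loop that polls document.readyState until it reaches a target value or a timeout expires. The sketch below stays testable by taking the state as a callable; with a real driver you might pass lambda: driver.execute_script("return document.readyState"). The helper name and defaults are assumptions, and this is an illustration of the idea, not how Selenium or the browser drivers implement it internally.

```python
import time
from typing import Callable

# Order in which document.readyState advances during a normal page load.
READY_STATES = ("loading", "interactive", "complete")


def wait_for_ready_state(
    get_state: Callable[[], str],
    target: str = "complete",
    timeout: float = 10.0,
    poll_interval: float = 0.1,
) -> str:
    """Poll get_state() until it reaches `target` (or a later state) or time runs out."""
    wanted = READY_STATES.index(target)
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in READY_STATES and READY_STATES.index(state) >= wanted:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"readyState stuck at {state!r}, wanted {target!r}")
        time.sleep(poll_interval)


# Simulated page that moves through the states on successive polls.
states = iter(["loading", "interactive", "complete"])
print(wait_for_ready_state(lambda: next(states), poll_interval=0.01))  # complete
```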

And then a problem arises if you are on a page and try to jump to an anchor on that same page. Let’s look at the following example.

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://www.theautomatedtester.co.uk")
driver.get("https://www.theautomatedtester.co.uk#someAnchor")

Oh, just look: none of the page load events fire! Why? The browser is being efficient and simply scrolls to the anchor. This means we cannot rely solely on events coming from the page.

So we need special code for this case, and an understanding of what “ready” means here. If we get it wrong, we end up with lots of flaky tests. Flaky tests make people grumpy, and we do not want grumpy developers; there are enough of those in the world already without flaky tests.
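One ingredient of that special case is spotting when the new URL differs from the current one only by its fragment, because then the browser just scrolls and no load events will arrive. Here is a minimal sketch using the standard library’s urldefrag; the function name is hypothetical, and real drivers handle more edge cases than this.

```python
from urllib.parse import urldefrag


def is_fragment_only_navigation(current_url: str, target_url: str) -> bool:
    """True when target_url points into the same document as current_url.

    In that case the browser scrolls to the anchor instead of loading a new
    page, so a driver should not wait for load events that will never fire.
    """
    current_base, _ = urldefrag(current_url)
    target_base, target_fragment = urldefrag(target_url)
    return current_base == target_base and target_fragment != ""


print(is_fragment_only_navigation(
    "https://www.theautomatedtester.co.uk",
    "https://www.theautomatedtester.co.uk#someAnchor",
))  # True
```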

Once these checks are done, you can change how we treat these events, if you want pages to load faster, by using Page Load Strategies. This is more of a power-user feature, so I would not worry about it for now, but it does affect how quickly navigation commands return.
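For reference, the W3C WebDriver specification defines three page load strategies, selected with the pageLoadStrategy capability: "normal" waits for document.readyState to become "complete", "eager" waits only for "interactive", and "none" does not wait for the document at all. The sketch below simply encodes that mapping next to a capabilities builder; the helper names are hypothetical. In Selenium 4’s Python bindings the same value can be set through options.page_load_strategy.

```python
from typing import Dict, Optional

# readyState each W3C page load strategy waits for before a navigation returns.
# None means the command returns without waiting for the document at all.
STRATEGY_TARGET_STATE: Dict[str, Optional[str]] = {
    "normal": "complete",
    "eager": "interactive",
    "none": None,
}


def capabilities_with_strategy(browser_name: str, strategy: str) -> dict:
    """Build New Session capabilities that select a page load strategy."""
    if strategy not in STRATEGY_TARGET_STATE:
        raise ValueError(f"unknown page load strategy: {strategy!r}")
    return {
        "capabilities": {
            "alwaysMatch": {"browserName": browser_name, "pageLoadStrategy": strategy}
        }
    }


print(STRATEGY_TARGET_STATE["eager"])  # interactive
```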

What about JavaScript frameworks and navigation

This is where all the “fun” is concentrated. Many frameworks keep loading a lot of content after the initial page load. If you have ever worked on, or just used, a single-page application, you have seen elements appear as they load. Unfortunately, this means you cannot simply rely on the navigation command returning. You will need to add a WebDriverWait to your code, as shown below, to make sure your test is in the correct state before it sets out to do what it needs to do.

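from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Firefox()
driver.get("https://www.theautomatedtester.co.uk")
# Poll for up to 10 seconds until the element exists; find_element raises
# NoSuchElementException until then, which WebDriverWait ignores by default.
element = WebDriverWait(driver, 10).until(lambda d: d.find_element(By.ID, "someId"))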

Conclusion

When a page loads, do not always rely on Selenium returning only once the page has finished loading. If you need an element on the page, wait for it explicitly. Just be aware of how JavaScript on the page can change it after the initial load.

