How to take screenshots of a web page in Python

We will take screenshots with the Chrome browser (any other browser works too) and the selenium module for Python.

The screenshots here are taken on Ubuntu 20.04. The approach itself does not depend on the operating system and should also work on macOS or Windows (I have not checked Windows, so treat that part as unverified).

The instructions are written specifically for Ubuntu, because everything is simpler on macOS. For example, there you don't need to install Chrome from the command line or run it in headless mode.

Install Chrome

Downloading:

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb

Install it (you will be asked for your Linux user password during installation):

sudo apt install ./google-chrome-stable_current_amd64.deb

Install Chromium

As an alternative to Chrome:

sudo apt-get install chromium-browser

Install selenium

Install it from inside a virtual environment:

pip install selenium

Installing the chromedriver driver

The driver installation has one peculiarity: the driver must be the same version as the browser. Since the browser updates itself constantly, you would have to find a matching driver after every update. To avoid this, you can use webdriver-manager.

Installation:

pip install webdriver-manager
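
Once the package is installed, it can download a matching driver and hand its path to Selenium. The following is a minimal sketch, assuming Selenium 4, where the path is wrapped in a Service object (on Selenium 3 the path is passed to webdriver.Chrome directly, as in the final code of this article). The function name make_chrome_driver is hypothetical, and the imports sit inside the function only so that the snippet can be loaded without the packages installed.

```python
def make_chrome_driver():
    """Start Chrome with a driver that webdriver-manager fetches on demand."""
    # Third-party packages, imported lazily (an assumption of this sketch).
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    # install() downloads a driver for the installed browser (or reuses a
    # cached copy) and returns the path to the driver executable.
    driver_path = ChromeDriverManager().install()
    return webdriver.Chrome(service=Service(executable_path=driver_path))
```

The returned driver is used like any other Selenium driver; call driver.quit() when you are done with it.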

Installing the driver manually

Sometimes the method above does not find the browser; the errors this produces are listed further down in the article. I never managed to run Google Chrome through webdriver-manager on Ubuntu.

a)
Download the driver from the official website:

https://chromedriver.chromium.org/

Move it to a directory for executables that is on the PATH:

sudo mv chromedriver /usr/local/bin/

Make it executable:

sudo chmod +x /usr/local/bin/chromedriver

b)
You can also try installing the driver with this command:

sudo apt install chromium-chromedriver

Then, in the code, use:

driver = webdriver.Chrome()

or pass the path explicitly if the driver is not found:

driver = webdriver.Chrome('/usr/local/bin/chromedriver')

Debugging

If webdriver refuses to see the browser or the driver, a few commands may help.

Find out which versions are installed:

chromedriver -v
google-chrome --version
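
The output of these two commands can also be compared in code. Below is a small sketch that pulls the major version out of each string and checks that they agree. It assumes that only the major version has to match, which is what the error message later in the article suggests. The function names and the sample strings in the usage note are illustrative, not real output from any particular machine.

```python
import re


def major_version(version_output):
    """Return the major version number found in a --version style string."""
    match = re.search(r"(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError("no version number in: {!r}".format(version_output))
    return int(match.group(1))


def versions_match(browser_output, driver_output):
    """True when the browser and the driver share the same major version."""
    return major_version(browser_output) == major_version(driver_output)
```

For example, versions_match("Google Chrome 85.0.4183.83", "ChromeDriver 85.0.4183.87") is true, while a browser at version 86 with the same driver is reported as a mismatch.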

Find out where the browsers are installed:

whereis chromium
whereis chromium-browser
whereis google-chrome
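
A similar lookup can be done from Python with the standard library. This sketch uses shutil.which, which, unlike whereis, only searches the directories on PATH. The list of names is an assumption based on the commands above plus chromedriver.

```python
import shutil

# Executable names taken from the whereis commands above, plus the driver.
CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "chromedriver")


def locate_executables(names=CANDIDATES):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}
```

A value of None for a browser that you know is installed is a hint that the process running your script has a different PATH than your interactive shell.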

Final code

Ideally a single ChromeDriverManager call would be enough, but in the real world what the docs describe does not always work. As I wrote above, Google Chrome did not work for me on Ubuntu 20.04. More precisely, one screenshot was taken, and after that webdriver no longer saw the browser.

So I installed Chromium, found the path to the driver with the whereis chromium command, and then commented out ChromeDriverManager. Instead, I passed the driver path through the executable_path parameter.

I also used an environment variable to switch between the two. On macOS (local development) everything worked well, so I kept ChromeDriverManager there, and on the remote server I used the method above.

The environment variable can be set in the Flask config files like this:

os.environ['SERVER_ENV'] = 'local'
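
The switch itself can be isolated in a small helper, which makes it easy to check without starting a browser. This is only a sketch: the function name and the convention of returning None for "use ChromeDriverManager" are hypothetical, and the path is the one used for the remote server later in this article.

```python
import os

# Driver path used on the remote server in this article; adjust as needed.
SERVER_DRIVER_PATH = "/snap/bin/chromium.chromedriver"


def driver_path_for_environment(environ=os.environ):
    """Return None to mean 'use ChromeDriverManager', else a fixed driver path."""
    if environ.get("SERVER_ENV") == "local":
        return None
    return SERVER_DRIVER_PATH
```

Passing a plain dict, for example driver_path_for_environment({"SERVER_ENV": "local"}), lets you try both branches without touching the real environment.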

The whole code turned out like this:

import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType

# Note: this targets the Selenium 3 API (executable_path) and an older
# webdriver-manager; newer releases of both changed these imports and arguments.

def save_screenshot(link, img):
    # Save only the visible part of the open page to the file path img
    try:
        chrome_options = Options()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--no-sandbox')

        if os.environ.get('SERVER_ENV') == 'local':
            # Local development: let webdriver-manager fetch the driver
            driver = webdriver.Chrome(ChromeDriverManager(chrome_type=ChromeType.GOOGLE, cache_valid_range=5).install(), options=chrome_options)
        else:
            # Remote server: use the Chromium driver at a fixed path
            # driver = webdriver.Chrome(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM, cache_valid_range=5).install(), options=chrome_options)
            driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver', options=chrome_options)

        driver.get(link)
        driver.save_screenshot(img)
        driver.quit()

        return True
    except Exception as e:
        print('Error while saving screenshot: {}'.format(e))
        # Report failure explicitly instead of returning None
        return False

cache_valid_range=5 sets the cache lifetime in days: the downloaded driver is reused for that many days before webdriver-manager looks for a new version and installs it.
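
One thing to watch in the function above: if driver.get() or save_screenshot() raises, driver.quit() is never reached, which can leave a headless browser process running. A possible variant, sketched below, moves the cleanup into a finally block. The make_driver parameter is hypothetical and introduced only for illustration: it is any zero-argument callable that returns a Selenium driver, which also lets you try the function with a stand-in object.

```python
def save_screenshot_safely(link, img, make_driver):
    """Save a screenshot of link to img and always shut the browser down.

    make_driver is a zero-argument callable returning a Selenium-like driver
    (a hypothetical parameter, not part of the original function).
    """
    driver = None
    try:
        driver = make_driver()
        driver.get(link)
        # Selenium's save_screenshot returns True on success, False on I/O error.
        return bool(driver.save_screenshot(img))
    except Exception as e:
        print("Error while saving screenshot: {}".format(e))
        return False
    finally:
        # Runs on success and on failure, so no browser process is left behind.
        if driver is not None:
            driver.quit()
```

With the code above, make_driver could be a small function that builds webdriver.Chrome with the same options and driver path as before.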

Error: This version of ChromeDriver only supports Chrome version

This error occurs if the driver and browser versions do not match:

raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 85
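
If you log this exception, the version the driver expects can be read straight from the message. The helper below is a small sketch, assuming the wording of the message stays as shown above; the function name is hypothetical.

```python
import re


def supported_chrome_version(error_message):
    """Extract the Chrome major version named in a ChromeDriver mismatch error."""
    match = re.search(r"only supports Chrome version (\d+)", error_message)
    return int(match.group(1)) if match else None
```

Comparing that number with the output of google-chrome --version shows whether the browser or the driver is the one that needs updating.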

Error: The process started from chrome location

On an Ubuntu server you may get this error:

The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.

If the script is launched from the command line on a server, the browser has to be started in headless mode. This is done by adding options:

chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')

Error: ValueError: Could not get version for Chrome

Another error:

ValueError: Could not get version for Chrome with this command: google-chrome

It can occur if the chrome_type=ChromeType.GOOGLE argument is not specified, although specifying it does not always help.

Error: google-chrome: not found

I could not fix this error on Ubuntu 20.04:

/bin/sh: 1: google-chrome: not found
/bin/sh: 1: google-chrome-stable: not found
Could not get version for Chrome with this command: google-chrome --version || google-chrome-stable --version
subprocess 931222 exited with code 127

For some unknown reason webdriver-manager refused to see the browser, although it was installed and had execute permissions. I solved this with the alternative described above in the article: Chromium with an explicit path to the driver.
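
The log shows the shell command that was run: google-chrome --version, falling back to google-chrome-stable --version. Exit code 127 is the shell's status for "command not found", which usually points to the executable not being on the PATH of the process running the script. The sketch below repeats that check from Python so it can be run in the same environment as the screenshot code. The function name is hypothetical, and this is not how webdriver-manager itself is implemented.

```python
import subprocess


def first_browser_version(commands=("google-chrome", "google-chrome-stable")):
    """Return the --version output of the first command that runs, else None."""
    for command in commands:
        try:
            result = subprocess.run(
                [command, "--version"],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
                timeout=10,
            )
        except (OSError, subprocess.TimeoutExpired):
            # OSError covers the "command not found" case behind exit code 127.
            continue
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout.strip()
    return None
```

If this returns None under the server's user but works in your own shell, compare the PATH of the two environments before reinstalling anything.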
