How to take screenshots of a web page in Python
We will take screenshots with the Chrome browser (any other browser would also work) and the selenium module for Python.
The screenshots here are taken on Ubuntu 20.04. The approach should not depend on the operating system and should also work on macOS or Windows, but I have not checked it carefully there, so this is not guaranteed.
The instructions are given specifically for Ubuntu because everything is easier on macOS. For example, there you don't need to install Chrome from the command line or run it in hidden (headless) mode.
Install Chrome
Downloading:
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
Install (during installation, enter the Linux user password):
sudo apt install …/google-chrome-stable_current_amd64.deb
Install Chromium
As an alternative to Chrome:
sudo apt-get install chromium-browser
Install selenium
Install it inside a virtual environment:
pip install selenium
Installing the chromedriver driver
There is one catch when installing the driver: its version must match the browser's version. Because the browser is updated constantly, you would have to find the right driver again after every update. To avoid this, you can use webdriver-manager.
Installation:
pip install webdriver-manager
Installing the driver manually
Sometimes the above method does not find the browser; the errors this produces are listed further down in the article. I never managed to run Google Chrome through webdriver-manager.
a)
Install the driver from the official website:
https://chromedriver.chromium.org/
Move it to the directory of executable files:
mv chromedriver /usr/local/bin/
Make it executable:
chmod +x /usr/local/bin/chromedriver
b)
You can also try installing the driver with apt:
sudo apt install chromium-chromedriver
Then, in the code, use:
driver = webdriver.Chrome()
or, if the driver is not found, pass its path explicitly:
driver = webdriver.Chrome('/usr/local/bin/chromedriver')
Debugging
If WebDriver cannot find the browser or the driver, the following commands may help.
Find out which version is installed:
chromedriver -v
google-chrome --version
Find out browser locations:
whereis chromium
whereis chromium-browser
whereis google-chrome
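The same lookup can be done from Python, which is handy when the check has to run on the server next to the application. This is a minimal sketch using only the standard library; the list of candidate names is an assumption based on the commands above, and locate_binaries is a hypothetical helper name. Note that shutil.which searches only the directories on PATH, so it may report fewer locations than whereis.

```python
import shutil

# Candidate executable names (an assumption); adjust to what is installed.
CANDIDATES = ["google-chrome", "chromium", "chromium-browser", "chromedriver"]


def locate_binaries(names=CANDIDATES):
    """Return a dict mapping each name to its full path on PATH, or None."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_binaries().items():
        print("{:<18} {}".format(name, path or "not found on PATH"))
```

A name that maps to None is one that a command such as google-chrome --version would also fail to find.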
Final code
Ideally ChromeDriverManager alone would be enough, but in the real world what the docs describe does not always work. As I wrote above, Google Chrome did not work for me on Ubuntu 20.04. More precisely, one screenshot was taken, and after that the browser was no longer found.
So I installed Chromium, found the path to the driver with the command whereis chromium, and then commented out ChromeDriverManager. Instead, I passed the driver path through the executable_path parameter.
In addition, I used an environment variable to switch between the two setups. On macOS (local development) everything worked well, so I kept ChromeDriverManager there, and on the remote server I used the approach described above.
The environment variable can be set in the Flask config files like this:
os.environ['SERVER_ENV'] = 'local'
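Reading the variable back is then a single lookup. Below is a minimal sketch of how the choice between the two setups could be made. SERVER_ENV comes from the line above, while the helper name use_driver_manager and the decision to treat an unset variable as "not local" are assumptions made for illustration.

```python
import os


def use_driver_manager(environ=os.environ):
    """True when SERVER_ENV is 'local', i.e. ChromeDriverManager should be used.

    Any other value, or an unset variable, means the fixed driver path is used.
    """
    return environ.get("SERVER_ENV") == "local"


if __name__ == "__main__":
    print("use ChromeDriverManager:", use_driver_manager())
```

Passing the mapping as a parameter keeps the function easy to check with a plain dict instead of the real environment.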
The whole code turned out like this:
# Uses the Selenium 3 / webdriver-manager 3 style API; later releases may differ
import os

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType


def save_screenshot(link, img):
    # Save only the visible part of the open page to the file img
    try:
        chrome_options = Options()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--no-sandbox')
        if os.environ.get('SERVER_ENV') == 'local':
            # Local development: let webdriver-manager fetch a matching driver
            driver = webdriver.Chrome(ChromeDriverManager(chrome_type=ChromeType.GOOGLE, cache_valid_range=5).install(), options=chrome_options)
        else:
            # driver = webdriver.Chrome(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM, cache_valid_range=5).install(), options=chrome_options)
            # Remote server: use the Chromium driver already installed on the system
            driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver', options=chrome_options)
        driver.get(link)
        driver.save_screenshot(img)
        driver.quit()
        return True
    except Exception as e:
        print('Error while saving screenshot: {}'.format(e))
        return False
cache_valid_range=5 sets the cache lifetime in days: the downloaded driver is reused for that many days before webdriver-manager looks for a new version and installs it.
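One weakness of the function above is that if driver.get() raises, driver.quit() is never reached and a headless browser process may stay running on the server. A possible variant closes the browser in a finally block. This is only a sketch: the function name and the make_driver parameter (a caller-supplied factory) are hypothetical and not part of Selenium.

```python
def save_screenshot_safely(link, img, make_driver):
    """Open link, save the visible part of the page to img, always quit.

    make_driver is any zero-argument callable returning a WebDriver-like
    object with get(), save_screenshot() and quit() methods.
    """
    driver = make_driver()
    try:
        driver.get(link)
        # Selenium's save_screenshot returns True on success, False on I/O error.
        return bool(driver.save_screenshot(img))
    finally:
        # Runs on success and on failure, so no browser process is left behind.
        driver.quit()
```

With Selenium installed, make_driver could be something like lambda: webdriver.Chrome(options=chrome_options). Exceptions are deliberately not swallowed here, so the caller decides how to log them.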
Error: This version of ChromeDriver only supports Chrome version
This error occurs if the driver and browser versions do not match:
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 85
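Since the error message names a major version, the mismatch can be detected before Selenium is started. The sketch below parses the text printed by chromedriver -v and google-chrome --version and compares the major numbers. The helper names major_version and versions_match are hypothetical, and the exact output format of the two commands is an assumption (a product name followed by a dotted version).

```python
import re


def major_version(version_output):
    """Extract the major version from text such as
    'ChromeDriver 85.0.4183.87 (...)' or 'Google Chrome 85.0.4183.102'.

    Returns an int, or None when no dotted version number is found.
    """
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def versions_match(driver_output, browser_output):
    """True only when both major versions were found and are equal."""
    driver_major = major_version(driver_output)
    browser_major = major_version(browser_output)
    return driver_major is not None and driver_major == browser_major
```

The two input strings could be obtained with subprocess.check_output(['chromedriver', '-v'], text=True) and the equivalent call for the browser.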
Error: ubuntu The process started from chrome location
On an Ubuntu server you may get this error:
ubuntu The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.
If the script is launched from the command line, where there is no graphical display, the browser has to be started in hidden (headless) mode. This is done by adding options:
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
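If the same code also runs on a desktop machine, the flags can be chosen at run time. A minimal sketch that builds the argument list depending on whether a display is available: checking the DISPLAY variable is a heuristic assumed here for Linux, and chrome_arguments is a hypothetical helper, not part of Selenium.

```python
import os


def chrome_arguments(environ=os.environ):
    """Return Chrome command-line flags suited to the current environment.

    Heuristic (an assumption of this sketch): on Linux, an unset or empty
    DISPLAY variable means there is no X server, so Chrome must run headless.
    """
    args = []
    if not environ.get("DISPLAY"):
        args.append("--headless")
        # The article pairs --no-sandbox with --headless on the server.
        args.append("--no-sandbox")
    return args
```

Each returned flag can then be passed to chrome_options.add_argument() in a loop.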
Error: ValueError: Could not get version for Chrome
Another error:
ValueError: Could not get version for Chrome with this command: google-chrome
It may occur if the parameter chrome_type=ChromeType.GOOGLE is not specified, although specifying it does not always help.
Error: google-chrome: not found
I could not fix this error on Ubuntu 20.04:
/bin/sh: 1: google-chrome: not found
/bin/sh: 1: google-chrome-stable: not found
Could not get version for Chrome with this command: google-chrome --version || google-chrome-stable --version
subprocess 931222 exited with code 127
For some unknown reason the browser was not found, although it was installed and had execute permissions. The workaround was the alternative described earlier in the article: install Chromium and point Selenium at its driver.