Beautiful Soup - Python parsing

An overview of using the BeautifulSoup module to parse HTML in Python. The current version is Beautiful Soup 4.

The article will be updated …

Installation and integration

Installation:

pip install beautifulsoup4

Import:

from bs4 import BeautifulSoup

Basic usage, with the page markup in content:

soup = BeautifulSoup(content, 'html.parser')
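To make the line above concrete, here is a minimal sketch of the whole round trip. The markup in content is a made-up sample standing in for a downloaded page, and the names page_title and first_href are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; in practice this would be the downloaded page.
content = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1 class="title"><span>Hello</span> world</h1>
    <a class="main" href="/first">First link</a>
  </body>
</html>
"""

# html.parser ships with Python, so no extra package is required.
soup = BeautifulSoup(content, "html.parser")

page_title = soup.title.string  # text inside the <title> tag
first_href = soup.a["href"]     # href attribute of the first <a> tag

print(page_title, first_href)
```

The soup object is the entry point for every search shown in the rest of the article.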

Or with the lxml parser:

soup = BeautifulSoup(content, 'lxml')

This requires the lxml package to be installed separately (for example, with pip install lxml).

Find a link by the exact text inside it:

link = soup.find('a', string='Some text inside the link')
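A small sketch of the text search, assuming a made-up two-link fragment. In Beautiful Soup 4.4.0 and later the keyword is spelled string; text is the older name for the same argument. The match is exact, and find returns None when nothing matches, so the result is worth checking before use.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with two links.
content = '<p><a href="/a">Some text inside the link</a> <a href="/b">Other</a></p>'
soup = BeautifulSoup(content, "html.parser")

# Exact match against the link's text; returns the <a> tag itself.
link = soup.find("a", string="Some text inside the link")

# No link has this text, so find returns None instead of raising.
missing = soup.find("a", string="No such text")

href = link["href"] if link is not None else None
print(href, missing)
```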

Find the first link with class main:

link = soup.find('a', {'class': 'main'})

Find all links with class main:

links = soup.find_all('a', {'class': 'main'})
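The difference between the two calls can be sketched on a made-up fragment: find returns the first matching tag (or None), while find_all returns a list of every match. The class_ keyword used here is an equivalent spelling of the {'class': ...} dictionary. A tag with several classes still matches if any one of them is main.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two links carry the class "main", one does not.
content = """
<a class="main" href="/one">One</a>
<a class="main big" href="/two">Two</a>
<a class="other" href="/three">Three</a>
"""
soup = BeautifulSoup(content, "html.parser")

# First match only.
first = soup.find("a", {"class": "main"})

# Every match, in document order.
all_main = soup.find_all("a", class_="main")

hrefs = [a["href"] for a in all_main]
print(first["href"], hrefs)
```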

Get the text of the first link with class main:

link = soup.find('a', {'class': 'main'}).text

Find the h1 tag with class title, then take the text of its first inner element:

title = soup.find('h1', {'class': 'title'}).find_all(True)[0].text
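Chained calls like the one above fail with an AttributeError or IndexError when the tag or its inner element is absent. A more defensive sketch, on a made-up heading, checks each step; find_all(True) is used here to collect every nested tag in document order.

```python
from bs4 import BeautifulSoup

# Hypothetical heading with one nested element.
content = '<h1 class="title"><span>Inner</span> rest</h1>'
soup = BeautifulSoup(content, "html.parser")

title = None
h1 = soup.find("h1", {"class": "title"})
if h1 is not None:
    # True matches any tag, so this lists all tags nested in the heading.
    inner = h1.find_all(True)
    if inner:
        title = inner[0].text

# The same lookup for a class that does not exist simply yields None.
absent = soup.find("h1", {"class": "no-such-class"})
print(title, absent)
```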

Find the div whose text is "Country", go up to its parent, and collect all links inside that parent:

links = soup.find('div', string='Country').parent.find_all('a')
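This pattern is useful when a label and its values sit side by side in a shared container. A sketch under that assumption, with an invented layout and label text:

```python
from bs4 import BeautifulSoup

# Hypothetical layout: a label div next to the links it describes.
content = """
<div class="row">
  <div>Country</div>
  <a href="/us">USA</a>
  <a href="/fr">France</a>
</div>
"""
soup = BeautifulSoup(content, "html.parser")

# Only the inner div matches: the outer one has several children,
# so it has no single string to compare against.
label = soup.find("div", string="Country")

# Step up to the shared container and gather its links.
links = label.parent.find_all("a") if label is not None else []
names = [a.text for a in links]
print(names)
```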

Get the srcset attribute of the image with class poster:

srcset = soup.find('img', {'class': 'poster'})['srcset']
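Square-bracket access raises KeyError when the attribute is missing, while get returns None. The sketch below, on made-up image tags, shows both cases and then splits the srcset value into its URLs. The comma split is deliberately naive and assumes the URLs themselves contain no commas.

```python
from bs4 import BeautifulSoup

# Hypothetical images: only the poster has a srcset attribute.
content = (
    '<img class="poster" src="p.jpg" srcset="p-1x.jpg 1x, p-2x.jpg 2x">'
    '<img class="thumb" src="t.jpg">'
)
soup = BeautifulSoup(content, "html.parser")

poster = soup.find("img", {"class": "poster"})
srcset = poster.get("srcset")  # the raw attribute string

# Each candidate is "URL descriptor"; keep only the URL part.
urls = [part.strip().split()[0] for part in srcset.split(",")]

# get() returns None for a missing attribute instead of raising KeyError.
thumb_srcset = soup.find("img", {"class": "thumb"}).get("srcset")
print(urls, thumb_srcset)
```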