Beautiful Soup – Python parsing
An overview of using the Beautiful Soup module to parse HTML in Python. The current version is Beautiful Soup 4.
The article will be updated …
Installation and integration
Installation:
pip install beautifulsoup4
Import:
from bs4 import BeautifulSoup
Usage example:
soup = BeautifulSoup(content, 'html.parser')
Or via the lxml library:
soup = BeautifulSoup(content, 'lxml')
The lxml parser is a separate package and must be installed additionally:
pip install lxml
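As a rough illustration of both parser choices, here is a minimal sketch. The content string is a made-up sample rather than a real page, and the fallback to html.parser when lxml is missing is just one possible way to handle an optional dependency.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample markup standing in for a downloaded page.
content = "<html><body><h1 class='title'><span>Hello</span></h1></body></html>"

# The built-in parser ships with Python and needs no extra packages.
soup = BeautifulSoup(content, "html.parser")
print(soup.h1.span.text)

# lxml is optional: fall back to html.parser if it is not installed.
try:
    soup_lxml = BeautifulSoup(content, "lxml")
except FeatureNotFound:
    soup_lxml = BeautifulSoup(content, "html.parser")

# The class attribute is multi-valued, so it comes back as a list.
print(soup_lxml.find("h1")["class"])
```

Beautiful Soup raises FeatureNotFound when the requested parser is unavailable, so catching it keeps a script usable on machines without lxml.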
Find link
Find a link by the text inside it:
link = soup.find('a', text='Some text inside the link')
Find the first link with the class main:
link = soup.find('a', {'class': 'main'})
Find all links with the class main:
links = soup.findAll('a', {'class': 'main'})
Get the text of the link with the class main:
text = soup.find('a', {'class': 'main'}).text
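The link lookups above can be tried on a small document. This is only a sketch: the HTML is invented for the example, and it uses find_all and string=, which are the current names for findAll and the text= argument in Beautiful Soup 4.

```python
from bs4 import BeautifulSoup

# Invented sample markup for illustration only.
html = """
<a class="main" href="/first">First link</a>
<a class="main" href="/second">Second link</a>
<a href="/other">Some text inside the link</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Match a link by its exact text.
by_text = soup.find("a", string="Some text inside the link")

# First link with class "main", then every link with that class.
first_main = soup.find("a", {"class": "main"})
all_main = soup.find_all("a", {"class": "main"})

# find() returns None when nothing matches, so check before using the result.
missing = soup.find("a", {"class": "absent"})

print(by_text["href"])
print(first_main.text)
print([a["href"] for a in all_main])
print(missing is None)
```

Because find() returns None on a miss, chaining .text directly onto it raises AttributeError for pages that lack the element; a None check avoids that.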
Find Tag
Find the h1 tag with the class title, then take the text of its first inner element:
title = soup.find('h1', {'class': 'title'}).findChildren()[0].text
Find the div with the text "Country", then find all links inside its parent element:
links = soup.find('div', text='Country').parent.findAll('a')
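To make the two tag lookups concrete, here is a small sketch on invented markup. The class names and the page layout are assumptions for the example, and find_all(True) is used as the current equivalent of findChildren() to collect all nested tags.

```python
from bs4 import BeautifulSoup

# Invented sample markup; a real page will be laid out differently.
html = """
<h1 class="title"><span>Movie name</span><small>2020</small></h1>
<div class="row">
  <div>Country</div>
  <a href="/us">USA</a>
  <a href="/fr">France</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Text of the first element nested inside the h1.
h1 = soup.find("h1", {"class": "title"})
title = h1.find_all(True)[0].text

# Locate the label div by its text, step up to its parent,
# then collect every link inside that parent.
label = soup.find("div", string="Country")
country_links = label.parent.find_all("a")

print(title)
print([a.text for a in country_links])
```

The string filter only matches a tag whose sole content is that text, which is why the inner div is found here and not the outer one.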
Find Image
Get the srcset attribute of the image with the class poster:
srcset = soup.find('img', {'class': 'poster'})['srcset']
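A srcset value usually holds several comma-separated candidates, so a common next step is to split it into URLs. The sketch below assumes a made-up img tag and uses a deliberately naive comma split, which would break on URLs that themselves contain commas.

```python
from bs4 import BeautifulSoup

# Invented sample markup for illustration only.
html = '<img class="poster" src="/p.jpg" srcset="/p-1x.jpg 1x, /p-2x.jpg 2x">'
soup = BeautifulSoup(html, "html.parser")

img = soup.find("img", {"class": "poster"})

# .get() returns None for a missing attribute, while img["srcset"] would raise KeyError.
srcset = img.get("srcset") if img is not None else None

# Naive split: each candidate is "URL descriptor", keep only the URL part.
urls = []
if srcset:
    for candidate in srcset.split(","):
        parts = candidate.split()
        if parts:
            urls.append(parts[0])

print(urls)
```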