1 - What is BeautifulSoup?
BeautifulSoup is a Python library that lets you extract information from a web page, or from an HTML or XML document, with a few lines of code.
2 - How to install BeautifulSoup?
Installing BeautifulSoup is fairly simple: use the pip command-line utility and type:
pip install beautifulsoup4
You should also install, from the command line with pip, the parser modules that BeautifulSoup relies on to read documents. The examples in this tutorial use lxml; html5lib is an alternative parser.
Install the lxml module:
pip install lxml
Install the html5lib module:
pip install html5lib
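To check which parsers BeautifulSoup can actually find after installation, a small sketch like the following may help. It assumes that requesting a parser that is not installed raises bs4.FeatureNotFound; the helper name available_parsers is made up for this illustration and is not part of the library.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def available_parsers(candidates=("html.parser", "lxml", "html5lib")):
    """Return the parser names from `candidates` that BeautifulSoup can load.

    `available_parsers` is a hypothetical helper, not a bs4 function.
    """
    found = []
    for name in candidates:
        try:
            # Parsing a tiny document fails early if the parser is missing.
            BeautifulSoup("<p>test</p>", name)
        except FeatureNotFound:
            continue
        found.append(name)
    return found


print(available_parsers())
```

"html.parser" ships with Python itself, so it should always appear in the list; lxml and html5lib appear only once they have been installed with pip.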
3 - Example of using BeautifulSoup to extract information from a website
As an example, we will retrieve the contents of all the h2 tags from a given URL. The example also needs the requests library (pip install requests) to download the page.
import requests
from bs4 import BeautifulSoup
# Download the page and parse its HTML with the lxml parser
req = requests.get('https://www.my-courses.net/2020/02/the-python-numpy-library.html')
soup = BeautifulSoup(req.text, "lxml")
# Print the text of every h2 tag found in the page
for h2_tag in soup.find_all('h2'):
    print(h2_tag.text)
The output is:
Social Icons
Pages
Monday, February 24, 2020
1. About numpy
2. Matrix or table with numpy
3. The sublibrary linear algebra
Category Of Mobile Courses
Python Courses
Python For Data Science
Python-Exercises-with-solutions
Facebook Followers
Computer and internet Glossary
Follow by Email
HTML Courses
Javascript
Node.JS Courses
Followers
Total Views
Blog Archive
Contact me
My Courses
Blog Archive
Sample text
Sample Text
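The same extraction can be tried without a network connection by parsing an HTML string directly. The sketch below uses Python's built-in "html.parser" so that it runs even when lxml is not installed; the sample HTML and the helper name tag_texts are invented for this illustration.

```python
from bs4 import BeautifulSoup

# Invented sample document, loosely modelled on the page used above.
SAMPLE_HTML = """
<html><body>
  <h2>1. About numpy</h2>
  <p>NumPy is a <b>numerical</b> library.</p>
  <h2> 2. Matrix or table with numpy </h2>
</body></html>
"""


def tag_texts(html, tag_name, parser="html.parser"):
    """Return the stripped text of every `tag_name` tag found in `html`.

    `tag_texts` is a hypothetical helper, not a bs4 function.
    """
    soup = BeautifulSoup(html, parser)
    return [tag.get_text(strip=True) for tag in soup.find_all(tag_name)]


for title in tag_texts(SAMPLE_HTML, "h2"):
    print(title)
```

Using get_text(strip=True) instead of .text removes the whitespace around each heading, which tends to make the printed list easier to read.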
Note
With BeautifulSoup you can extract the content of any tag from a web page.
Example: extracting the content of all bold tags (b)
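import requests
from bs4 import BeautifulSoup
# Download the page and parse its HTML with the lxml parser
req = requests.get('https://www.my-courses.net/2020/02/the-python-numpy-library.html')
soup = BeautifulSoup(req.text, "lxml")
# Print the text of every b (bold) tag found in the page
for bold_tag in soup.find_all('b'):
    print(bold_tag.text)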
The output is:
Official documentation
numpy.linspace()
linear algebra
Younes Derfoufimy-courses.net
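Since find_all also accepts a list of tag names, the two examples above could be combined into a single pass over the document. The following is only a sketch on an invented HTML string, using the built-in "html.parser"; the helper name texts_by_tag is hypothetical.

```python
from bs4 import BeautifulSoup

# Invented sample markup for illustration only.
html = "<h2>Title</h2><p>Some <b>bold</b> text and <i>italic</i> text.</p><h2>Next</h2>"
soup = BeautifulSoup(html, "html.parser")


def texts_by_tag(soup, tag_names):
    """Group the text of the matching tags by tag name (hypothetical helper)."""
    result = {name: [] for name in tag_names}
    # Passing a list to find_all matches any tag whose name is in the list.
    for tag in soup.find_all(list(tag_names)):
        result[tag.name].append(tag.get_text(strip=True))
    return result


print(texts_by_tag(soup, ["h2", "b"]))
```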
Note
The BeautifulSoup library has many other features. For more information, see the official website:
https://www.crummy.com/software/BeautifulSoup/
my-courses.net