Updated On : Jan-01,2022 Tags beautifulsoup, scraping, htmlparsing, xmlparsing
beautifulsoup - Scrape,Parse & Analyze Web Pages in Python

BeautifulSoup: Detailed Guide to Parse & Search HTML Web Pages

Beautifulsoup is a Python library that helps developers parse HTML and XML files quite easily. Its API helps in searching, navigating, and modifying the parsed tree of a document. Beautifulsoup is commonly used to parse data from scraped web pages, which is particularly useful for websites that do not provide REST APIs for the information users need. The library itself cannot scrape web pages; it can only parse pages that have already been downloaded. To download a page, we need to use libraries like urllib, requests, etc. Behind the scenes, beautifulsoup uses other Python libraries (html.parser, lxml, html5lib) to parse the DOM structure of a web page. The API of beautifulsoup is intuitive and easy to use. The current version is beautifulsoup4, which is the recommended version and works with Python 3.

As a part of this tutorial, we'll cover in detail the API of beautifulsoup library. We'll be covering the majority of functionalities provided by it. The tutorial is designed with a simple HTML document to make things easier to understand and grasp. This tutorial is specifically designed to retrieve tags and strings from the given HTML document. It does not concentrate on methods that are used to modify HTML documents. We have a different tutorial where we cover how to modify HTML documents using beautifulsoup. Please feel free to explore it from the below link.

Below we have highlighted important sections of the tutorial to give an overview of the material covered.

Important Sections of Tutorial

  1. Create BeautifulSoup Object to Easily Parse HTML of Web Page
  2. How to Access Individual HTML Tags?
  3. How to Access Attributes of Individual HTML Tag?
    • Access Tag Attributes using 'attrs' Attribute
    • Access Tag Attributes by Treating Tag Object as Dictionary
    • Access Tag Attributes using get() Method
    • Access Tag Attributes using get_attribute_list() Method
  4. How to Check if Some Attribute is Present in HTML Tag?
  5. How to Access Text of Individual HTML Tag?
    • Access Text using 'text' Attribute of Tag Object
    • Access Text using 'get_text()' Method of Tag Object
    • Access Text using 'getText()' Method of Tag Object
    • Access Text using 'strings' and 'stripped_strings' Attributes of Tag Object
  6. How to Retrieve Name of the HTML Tag?
  7. How to Access Children of HTML Tag as a List?
    • Access Children using 'contents' Attribute
    • Access Children using 'children' Attribute
    • Access Children using 'descendants' Attribute
  8. How to Access Parent/Parents of HTML Tag?
    • Access Single Direct Parent using 'parent' Attribute
    • Recursively Access All Parents till Root of Document using 'parents' Attribute of Tag Object
  9. How to Find Siblings of HTML Tag?
    • Access Next Sibling using 'next_sibling' Attribute
    • Access Previous Sibling using 'previous_sibling' Attribute
    • Access All Next Siblings using 'next_siblings' Attribute
    • Access All Previous Siblings using 'previous_siblings' Attribute
  10. How to Parse HTML Document Forward/Backward from Given HTML Tag?
    • Parse Forward using 'next', 'next_element' and 'next_elements' Attributes
      • 'next' Attribute
      • 'next_element' Attribute
      • 'next_elements' Attribute
    • Parse Backward using 'previous', 'previous_element' and 'previous_elements' Attributes
      • 'previous' Attribute
      • 'previous_element' Attribute
      • 'previous_elements' Attribute
  11. How to Search for Specific HTML Tag in an HTML Document ?
    • Find First Tag of the Specified Name
    • Find All Tags of Specified Name
    • Find Next Element in Document for Given HTML Tag (Works like 'next' Attribute)
    • Find Previous Element in Document for Given HTML Tag (Works like 'previous' Attribute)
    • Find All Next Elements in Document for Given HTML Tag (Works like 'next_elements' Attribute)
    • Find All Previous Elements in Document for Given HTML Tag (Works like 'previous_elements' Attribute)
    • Find Parent of Given HTML Tag (Works like 'parent' Attribute)
    • Find Parents of Given HTML Tag (Works like 'parents' Attribute)
    • Find Next Sibling of Given HTML Tag (Works like 'next_sibling' Attribute)
    • Find Previous Sibling of Given HTML Tag (Works like 'previous_sibling' Attribute)
    • Find All Next Siblings of Given HTML Tag (Works like 'next_siblings' Attribute)
    • Find All Previous Siblings of Given HTML Tag (Works like 'previous_siblings' Attribute)

Installation

  • pip install beautifulsoup4
  • easy_install beautifulsoup4
  • apt-get install python-bs4 (for Python 2)
  • apt-get install python3-bs4 (for Python 3)

In the below cell, we have imported beautifulsoup library and printed the version of it that we'll be using in this tutorial.

In [1]:
import bs4

print("BeautifulSoup Version : {}".format(bs4.__version__))
BeautifulSoup Version : 4.10.0

1. Create BeautifulSoup Object to Easily Parse HTML of Web Page

In order to use beautifulsoup to parse HTML documents, we need to create a BeautifulSoup object which has information about parsed DOM tree of the HTML document. We can then call various methods and attributes available through this BeautifulSoup object to search and retrieve information from an HTML document.

Below we have created a sample HTML document that we'll be using throughout our tutorial. We'll parse this document and explain various methods/attributes of the BeautifulSoup object on it.

In [2]:
sample_html= '''<html>
<head>
    <title>CoderzColumn : Developed for Developers by Developers for the betterment of Development.</title>
    <script src="static/script1.js"></script>
    <script src="static/script2.js"></script>
    <link rel="stylesheet" href="static/stylesheet.css" type="text/css" />
</head>
<body>
    <p id='start'>Welcome to CoderzColumn</p>
    <p id='main_para'>We regularly publish tutorials on various topics 
    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining 
    how to use various Python libraries.</p>
    <p id='sub_para'>Below are list of Important Sections of Our Website : </p>
        <ul>
            <li><a href='https://coderzcolumn.com/blogs'>Blogs</a></li>
            <li><a href='https://coderzcolumn.com/tutorials'>Tutorials</a></li>
            <li><a href='https://coderzcolumn.com/about'>About</a></li>
            <li><a href='https://coderzcolumn.com/contact-us'>Contact US</a></li>
        </ul>
    <p id='end'>Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any 
    information about any article or want us to publish article on particular topic.</p>
</body>
</html>'''

We can create a BeautifulSoup object by calling its constructor. The first argument is the whole HTML/XML document, given as a string or as a file-like object pointing to an HTML/XML file. The second argument is a string specifying which underlying parser library to use. The possible values for the second argument are 'lxml', 'lxml-xml', 'html.parser' and 'html5lib'. When we create a BeautifulSoup object, it parses the whole input document and creates a DOM-like (tree-like) structure whose nodes are primarily of the two types of objects mentioned below.

  1. Tag - This object holds information about single HTML tag. The HTML tag can have other tags in it which will be accessible through this object. At the highest level, the main Tag object is html tag which has information about the whole document. All other tags in the HTML document can be accessed from this tag.
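  2. NavigableString - The text between the start and end of a particular tag is stored as a NavigableString object. Attribute values, by contrast, are returned as plain Python strings (or as lists of strings for multi-valued attributes such as rel and class).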

The underlying parsers are lenient and will usually build a tree even from imperfect markup, but the results are most predictable when the document is valid, i.e., all tags have end tags, etc. We'll explain how we can retrieve information from Tag and NavigableString objects in our upcoming sections.
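
The two node types described above can be checked directly. The following is a minimal sketch on a small fragment that is not part of this tutorial's sample document; the names fragment, tiny_soup, paragraph and children are illustrative only.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical fragment used only for this illustration.
fragment = "<p id='greeting'>Hello <b>world</b></p>"
tiny_soup = BeautifulSoup(fragment, "html.parser")

paragraph = tiny_soup.p
children = list(paragraph.children)

print(isinstance(paragraph, Tag))                # the <p> element is a Tag
print(isinstance(children[0], NavigableString))  # the text 'Hello ' is a NavigableString
print(isinstance(children[1], Tag))              # <b>world</b> is a nested Tag
```

Here the paragraph holds one string node followed by one nested tag, which is the general shape of the tree that the rest of the tutorial navigates.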

Below we have created BeautifulSoup object for our HTML document. We have then printed it as well.

In [3]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(sample_html, 'html.parser')

print(soup)
<html>
<head>
<title>CoderzColumn : Developed for Developers by Developers for the betterment of Development.</title>
<script src="static/script1.js"></script>
<script src="static/script2.js"></script>
<link href="static/stylesheet.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<p id="start">Welcome to CoderzColumn</p>
<p id="main_para">We regularly publish tutorials on various topics
    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
    how to use various Python libraries.</p>
<p id="sub_para">Below are list of Important Sections of Our Website : </p>
<ul>
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
<li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
<li><a href="https://coderzcolumn.com/about">About</a></li>
<li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
</ul>
<p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
    information about any article or want us to publish article on particular topic.</p>
</body>
</html>

2. How to Access Individual HTML Tags?

In this section, we'll explain how we can access individual HTML tags by treating tag names as properties of the BeautifulSoup object. The same works on any Tag object: accessing a tag name as a property returns the Tag object representing that tag.

If there is more than one HTML tag with the same name, this property call retrieves the first one. Later sections of the tutorial explain how to retrieve all tags with a particular name.

Below we have called the html property on our soup object to retrieve the whole HTML document. The majority of calls to methods and properties of the BeautifulSoup object return an object of type Tag or NavigableString. Whenever we print a Tag or NavigableString object, it prints the content of that HTML tag.

In [4]:
whole_page_src = soup.html

print("Object Type : {}".format(type(whole_page_src)))

print(whole_page_src)
Object Type : <class 'bs4.element.Tag'>
<html>
<head>
<title>CoderzColumn : Developed for Developers by Developers for the betterment of Development.</title>
<script src="static/script1.js"></script>
<script src="static/script2.js"></script>
<link href="static/stylesheet.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<p id="start">Welcome to CoderzColumn</p>
<p id="main_para">We regularly publish tutorials on various topics
    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
    how to use various Python libraries.</p>
<p id="sub_para">Below are list of Important Sections of Our Website : </p>
<ul>
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
<li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
<li><a href="https://coderzcolumn.com/about">About</a></li>
<li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
</ul>
<p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
    information about any article or want us to publish article on particular topic.</p>
</body>
</html>

Below we have retrieved the script tag from the document. As there is more than one script tag, it returns the first one. The next few cells show a few more examples of retrieving other tags from the soup object by treating their names as properties.

In [5]:
soup.script
Out[5]:
<script src="static/script1.js"></script>
In [6]:
soup.link
Out[6]:
<link href="static/stylesheet.css" rel="stylesheet" type="text/css"/>
In [7]:
soup.title
Out[7]:
<title>CoderzColumn : Developed for Developers by Developers for the betterment of Development.</title>
In [8]:
soup.body
Out[8]:
<body>
<p id="start">Welcome to CoderzColumn</p>
<p id="main_para">We regularly publish tutorials on various topics
    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
    how to use various Python libraries.</p>
<p id="sub_para">Below are list of Important Sections of Our Website : </p>
<ul>
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
<li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
<li><a href="https://coderzcolumn.com/about">About</a></li>
<li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
</ul>
<p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
    information about any article or want us to publish article on particular topic.</p>
</body>
In [9]:
soup.p
Out[9]:
<p id="start">Welcome to CoderzColumn</p>
In [10]:
soup.li
Out[10]:
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
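
One detail worth keeping in mind with this style of access is what happens when no tag with the given name exists. The sketch below uses a small hypothetical document (snippet, doc and the other variable names are illustrative) to show that the first match is returned, that a missing tag yields None rather than an error, and that property access can be chained.

```python
from bs4 import BeautifulSoup

# Hypothetical document used only for this illustration.
snippet = "<html><body><p>first</p><p>second</p></body></html>"
doc = BeautifulSoup(snippet, "html.parser")

first_p = doc.p        # first <p> in the document
missing = doc.table    # there is no <table>, so this is None
chained = doc.body.p   # property access also works on Tag objects

print(first_p.text)
print(missing)
print(chained.text)
```

Because a missing tag gives None, chaining through a tag that may be absent (for example doc.table.tr) would raise an AttributeError, so it can be safer to check for None first.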

3. How to Access Attributes of Individual HTML Tag?

In this section, we'll explain how we can retrieve the attributes and values of those attributes of HTML tag.

Access Tag Attributes using 'attrs' Attribute

Each Tag object has a property named attrs which returns a dictionary that has all attributes of that HTML tag. We can also modify an attribute's value and add a new attribute through this dictionary.

In [11]:
soup.link.attrs
Out[11]:
{'rel': ['stylesheet'], 'href': 'static/stylesheet.css', 'type': 'text/css'}
In [12]:
soup.script.attrs
Out[12]:
{'src': 'static/script1.js'}
In [13]:
soup.a.attrs
Out[13]:
{'href': 'https://coderzcolumn.com/blogs'}
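
Since attrs is an ordinary dictionary, changing it changes the tag itself. The following is a small sketch on a hypothetical anchor tag (link_soup and anchor are illustrative names) that updates one attribute and adds another.

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document used only for this illustration.
link_soup = BeautifulSoup('<a href="/blogs">Blogs</a>', "html.parser")
anchor = link_soup.a

anchor.attrs["href"] = "/tutorials"   # modify an existing attribute
anchor.attrs["target"] = "_blank"     # add a new attribute

print(anchor.attrs)
print(anchor)
```

Printing the tag afterwards shows the updated attributes in the serialized markup.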

Access Tag Attributes by Treating Tag Object as Dictionary

Another way to retrieve the value of the particular attribute is by treating Tag object as a dictionary. Below we have explained with simple examples how to retrieve attribute values this way.

In [14]:
soup.link["type"]
Out[14]:
'text/css'
In [15]:
soup.link["href"]
Out[15]:
'static/stylesheet.css'
In [16]:
soup.a["href"]
Out[16]:
'https://coderzcolumn.com/blogs'

Access Tag Attributes using get() Method

The get() method can also be used to retrieve the value of an attribute. We just need to call get() on a Tag object with an attribute name, and it returns the value of that attribute of the HTML tag.

In [17]:
soup.link.get("href")
Out[17]:
'static/stylesheet.css'
In [18]:
soup.link.get("type")
Out[18]:
'text/css'
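
The practical difference between dictionary-style access and get() shows up when the attribute is absent. The sketch below, on a hypothetical tag, assumes the usual dictionary-like behaviour of Tag objects: indexing raises KeyError, while get() returns None or a supplied default.

```python
from bs4 import BeautifulSoup

# Hypothetical tag used only for this illustration.
tag = BeautifulSoup('<a href="/about">About</a>', "html.parser").a

href = tag.get("href")                  # attribute present
missing = tag.get("target")             # attribute absent, returns None
fallback = tag.get("target", "_self")   # attribute absent, returns the default

try:
    tag["target"]                       # dictionary-style access on a missing attribute
    raised = False
except KeyError:
    raised = True

print(href, missing, fallback, raised)
```

This makes get() the more convenient choice when scraping pages where an attribute may or may not be present.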

Access Tag Attributes using get_attribute_list() Method

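The get_attribute_list() method of the Tag object always returns the value of an attribute as a list. This is convenient when an attribute can have more than one value, because the calling code does not need to check whether it received a string or a list.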

In [19]:
soup.link.get_attribute_list("type")
Out[19]:
['text/css']
In [20]:
soup.link.get_attribute_list("href")
Out[20]:
['static/stylesheet.css']
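
The method is most useful with attributes such as class, which can hold several values. A minimal sketch on a hypothetical paragraph tag (the markup and the name para are illustrative):

```python
from bs4 import BeautifulSoup

# Hypothetical tag with a multi-valued class attribute.
para = BeautifulSoup('<p class="note warning" id="intro">Text</p>', "html.parser").p

class_direct = para["class"]                    # multi-valued attribute comes back as a list
class_list = para.get_attribute_list("class")   # also a list
id_direct = para["id"]                          # single-valued attribute is a plain string
id_list = para.get_attribute_list("id")         # wrapped in a list for uniform handling

print(class_direct, class_list)
print(id_direct, id_list)
```

With get_attribute_list() both attributes can be processed with the same loop, regardless of how many values they hold.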

4. How to Check if Some Attribute is Present in HTML Tag?

We can check whether an attribute is present in an HTML tag using the has_attr() method. It takes an attribute name as input and returns True if the attribute is present in the Tag object, and False otherwise.

In [149]:
soup.a.has_attr("href")
Out[149]:
True
In [153]:
soup.a
Out[153]:
<a href="https://coderzcolumn.com/blogs">Blogs</a>
In [150]:
soup.a.has_attr("id")
Out[150]:
False
In [151]:
soup.a.has_attr("target")
Out[151]:
False
In [152]:
soup.p.has_attr("id")
Out[152]:
True
In [154]:
soup.p
Out[154]:
<p id="start">Welcome to CoderzColumn</p>
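
A typical use of has_attr() is to guard attribute access when some tags lack the attribute. The sketch below uses a hypothetical fragment and the find_all() method, which is covered in the search section of this tutorial; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the second anchor has no href attribute.
fragment = '<a href="/blogs">Blogs</a><a name="top">Top</a>'
anchors = BeautifulSoup(fragment, "html.parser").find_all("a")

# Keep only the links that actually carry an href.
links = [a["href"] for a in anchors if a.has_attr("href")]

print(links)
```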

5. How to Access Text of Individual HTML Tag?

There are various ways to access the text of the HTML tag in BeautifulSoup. We'll explain them one by one next.

  • Using text property of Tag object.
  • Using get_text() method of Tag object.
  • Using getText() method of Tag object.
  • Using strings and stripped_strings properties of Tag object.

Access Text using 'text' Attribute of Tag Object

We can access the total text of Tag or soup object by just calling text property on them. It'll recursively retrieve the text of all tags inside a particular HTML tag to form the total text of the HTML tag.

Below we are retrieving the total text of the HTML doc by calling the text property on the soup object. It retrieves all text present in the HTML doc, including the title text inside the head.

In [21]:
soup.text
Out[21]:
'\n\nCoderzColumn : Developed for Developers by Developers for the betterment of Development.\n\n\n\n\n\nWelcome to CoderzColumn\nWe regularly publish tutorials on various topics \n    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining \n    how to use various Python libraries.\nBelow are list of Important Sections of Our Website : \n\nBlogs\nTutorials\nAbout\nContact US\n\nPlease feel free to send us mail @ coderzcolumn07@gmail.com if you need any \n    information about any article or want us to publish article on particular topic.\n\n'
In [22]:
print(soup.text)

CoderzColumn : Developed for Developers by Developers for the betterment of Development.





Welcome to CoderzColumn
We regularly publish tutorials on various topics
    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
    how to use various Python libraries.
Below are list of Important Sections of Our Website :

Blogs
Tutorials
About
Contact US

Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
    information about any article or want us to publish article on particular topic.


In [23]:
print(soup.title.text)
CoderzColumn : Developed for Developers by Developers for the betterment of Development.
In [24]:
print(soup.body.text.strip())
Welcome to CoderzColumn
We regularly publish tutorials on various topics
    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
    how to use various Python libraries.
Below are list of Important Sections of Our Website :

Blogs
Tutorials
About
Contact US

Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
    information about any article or want us to publish article on particular topic.
In [25]:
print(soup.p.text)
Welcome to CoderzColumn

Access Text using get_text() Method of Tag Object

The second way of retrieving the text of a particular Tag is by calling get_text() method on it.

In [26]:
print(soup.body.get_text().strip())
Welcome to CoderzColumn
We regularly publish tutorials on various topics
    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
    how to use various Python libraries.
Below are list of Important Sections of Our Website :

Blogs
Tutorials
About
Contact US

Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
    information about any article or want us to publish article on particular topic.
In [27]:
print(soup.ul.get_text().strip())
Blogs
Tutorials
About
Contact US

Access Text using getText() Method of Tag Object

The getText() method works just like get_text() method.

In [28]:
print(soup.ul.getText().strip())
Blogs
Tutorials
About
Contact US
In [29]:
print(soup.li.get_text().strip())
Blogs
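
Besides the plain call shown above, get_text() accepts separator and strip arguments, which can replace the manual .strip() calls. The following is a small sketch on a hypothetical list fragment (fragment and items are illustrative names).

```python
from bs4 import BeautifulSoup

# Hypothetical fragment used only for this illustration.
fragment = "<ul>\n<li>Blogs</li>\n<li>Tutorials</li>\n</ul>"
items = BeautifulSoup(fragment, "html.parser").ul

raw = items.get_text()                                # keeps the newlines between tags
joined = items.get_text(separator=" | ", strip=True)  # strips each piece, then joins

print(repr(raw))
print(joined)
```

With strip=True, whitespace-only pieces are dropped before joining, so the separator appears only between real text.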

Access Text using 'strings' and 'stripped_strings' Attributes of Tag Object

The Tag object has two more properties that return the text of it.

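  • strings - It returns a generator that yields every string inside the Tag object, including strings that contain only newline characters.
  • stripped_strings - It works like the strings property but strips leading and trailing whitespace from each string and skips strings that contain only whitespace. Newlines in the middle of a string are kept.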
In [30]:
soup.ul.strings
Out[30]:
<generator object Tag._all_strings at 0x7f1f51e050c0>
In [31]:
list(soup.ul.strings)
Out[31]:
['\n', 'Blogs', '\n', 'Tutorials', '\n', 'About', '\n', 'Contact US', '\n']
In [32]:
list(soup.body.strings)
Out[32]:
['\n',
 'Welcome to CoderzColumn',
 '\n',
 'We regularly publish tutorials on various topics \n    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining \n    how to use various Python libraries.',
 '\n',
 'Below are list of Important Sections of Our Website : ',
 '\n',
 '\n',
 'Blogs',
 '\n',
 'Tutorials',
 '\n',
 'About',
 '\n',
 'Contact US',
 '\n',
 '\n',
 'Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any \n    information about any article or want us to publish article on particular topic.',
 '\n']
In [33]:
list(soup.ul.stripped_strings)
Out[33]:
['Blogs', 'Tutorials', 'About', 'Contact US']
In [34]:
list(soup.body.stripped_strings)
Out[34]:
['Welcome to CoderzColumn',
 'We regularly publish tutorials on various topics \n    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining \n    how to use various Python libraries.',
 'Below are list of Important Sections of Our Website :',
 'Blogs',
 'Tutorials',
 'About',
 'Contact US',
 'Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any \n    information about any article or want us to publish article on particular topic.']
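
The difference between the two properties is easiest to see on a short fragment. The sketch below (hypothetical markup, illustrative names) shows that stripped_strings trims only the ends of each string and drops whitespace-only strings, while line breaks inside a string survive.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with padding and an internal line break.
fragment = "<div>\n<p>  line one\n   line two  </p>\n</div>"
block = BeautifulSoup(fragment, "html.parser").div

all_strings = list(block.strings)
clean_strings = list(block.stripped_strings)

print(all_strings)
print(clean_strings)
```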

6. How to Retrieve Name of the HTML Tag?

We can retrieve the name of any HTML tag by calling name property on Tag object itself.

In [35]:
soup.ul.name
Out[35]:
'ul'
In [36]:
soup.title.name
Out[36]:
'title'
In [37]:
soup.title.parent.name
Out[37]:
'head'
In [38]:
soup.body.parent.name
Out[38]:
'html'

7. How to Access Children of HTML Tag as a List?

In this section, we'll explain how we can retrieve various HTML tags which are present inside of given HTML tags. These are generally referred to as children of HTML tag. There are various ways to retrieve children of the given HTML tag.

  • Using contents property of Tag object.
  • Using children property of Tag object.
  • Using descendants property of Tag object.

Access Children using 'contents' Attribute

The contents property of Tag object returns list of Tag and NavigableString objects which are children of the HTML tag.

Below we have retrieved children of a few HTML tags by calling contents property.

In [39]:
for content in soup.ul.contents:
    print(type(content))

soup.ul.contents
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
Out[39]:
['\n',
 <li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>,
 '\n',
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>,
 '\n',
 <li><a href="https://coderzcolumn.com/about">About</a></li>,
 '\n',
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>,
 '\n']
In [40]:
for content in soup.ul.contents:
    if isinstance(content, bs4.element.Tag):
        print([type(c) for c in content.contents],content.contents)
[<class 'bs4.element.Tag'>] [<a href="https://coderzcolumn.com/blogs">Blogs</a>]
[<class 'bs4.element.Tag'>] [<a href="https://coderzcolumn.com/tutorials">Tutorials</a>]
[<class 'bs4.element.Tag'>] [<a href="https://coderzcolumn.com/about">About</a>]
[<class 'bs4.element.Tag'>] [<a href="https://coderzcolumn.com/contact-us">Contact US</a>]
In [41]:
for content in soup.head.contents:
    print(type(content))

soup.head.contents
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
Out[41]:
['\n',
 <title>CoderzColumn : Developed for Developers by Developers for the betterment of Development.</title>,
 '\n',
 <script src="static/script1.js"></script>,
 '\n',
 <script src="static/script2.js"></script>,
 '\n',
 <link href="static/stylesheet.css" rel="stylesheet" type="text/css"/>,
 '\n']
In [42]:
for content in soup.p.contents:
    print(type(content))

soup.p.contents
<class 'bs4.element.NavigableString'>
Out[42]:
['Welcome to CoderzColumn']
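
As the outputs above show, contents usually mixes Tag objects with newline strings. One possible way to keep only the tags is an isinstance check, sketched here on a hypothetical list (the names listing and tag_children are illustrative).

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical fragment used only for this illustration.
listing = BeautifulSoup("<ul>\n<li>A</li>\n<li>B</li>\n</ul>", "html.parser").ul

# contents is a plain list, so len() and indexing work on it.
total_nodes = len(listing.contents)

# Keep only the Tag children and drop the whitespace strings.
tag_children = [node for node in listing.contents if isinstance(node, Tag)]

print(total_nodes)
print([node.name for node in tag_children])
```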

Access Children using 'children' Attribute

Another way to retrieve all children of the HTML tag is by calling children property of Tag object. Below we have explained the usage with a few simple examples.

In [43]:
soup.ul.children
Out[43]:
<list_iterator at 0x7f1f51dd86a0>
In [44]:
for child in soup.ul.children:
    print(type(child))

list(soup.ul.children)
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
Out[44]:
['\n',
 <li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>,
 '\n',
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>,
 '\n',
 <li><a href="https://coderzcolumn.com/about">About</a></li>,
 '\n',
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>,
 '\n']
In [45]:
for child in soup.head.children:
    print(type(child))

list(soup.head.children)
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
Out[45]:
['\n',
 <title>CoderzColumn : Developed for Developers by Developers for the betterment of Development.</title>,
 '\n',
 <script src="static/script1.js"></script>,
 '\n',
 <script src="static/script2.js"></script>,
 '\n',
 <link href="static/stylesheet.css" rel="stylesheet" type="text/css"/>,
 '\n']

Access Children using 'descendants' Attribute

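The descendants property of the Tag object goes one step further than children. It returns a generator that yields not only the direct children of an HTML tag but also their children, recursively, in document order.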

In [46]:
soup.ul.descendants
Out[46]:
<generator object Tag.descendants at 0x7f1f51e05ed0>
In [47]:
for descendant in soup.ul.descendants: ## Depth first (document order)
    print(type(descendant))

list(soup.ul.descendants)
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
Out[47]:
['\n',
 <li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>,
 <a href="https://coderzcolumn.com/blogs">Blogs</a>,
 'Blogs',
 '\n',
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>,
 <a href="https://coderzcolumn.com/tutorials">Tutorials</a>,
 'Tutorials',
 '\n',
 <li><a href="https://coderzcolumn.com/about">About</a></li>,
 <a href="https://coderzcolumn.com/about">About</a>,
 'About',
 '\n',
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>,
 <a href="https://coderzcolumn.com/contact-us">Contact US</a>,
 'Contact US',
 '\n']
In [48]:
for descendant in soup.head.descendants:
    print(type(descendant))

list(soup.head.descendants)
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
Out[48]:
['\n',
 <title>CoderzColumn : Developed for Developers by Developers for the betterment of Development.</title>,
 'CoderzColumn : Developed for Developers by Developers for the betterment of Development.',
 '\n',
 <script src="static/script1.js"></script>,
 '\n',
 <script src="static/script2.js"></script>,
 '\n',
 <link href="static/stylesheet.css" rel="stylesheet" type="text/css"/>,
 '\n']
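
To make the contrast with children concrete, the sketch below walks the same hypothetical list with both properties. The fragment has no whitespace between tags so that only the structure is visible; all names are illustrative.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical fragment without whitespace between tags.
markup = "<ul><li><a href='/a'>A</a></li><li><a href='/b'>B</a></li></ul>"
menu = BeautifulSoup(markup, "html.parser").ul

# children stops at the first level.
direct = [child.name for child in menu.children]

# descendants keeps going down each branch before moving to the next one.
deep = [node.name if isinstance(node, Tag) else str(node) for node in menu.descendants]

print(direct)
print(deep)
```

Each li is followed immediately by its own a tag and text before the next li appears, which is the depth-first, document-order traversal.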

8. How to Access Parent/Parents of HTML Tag?

In this section, we'll explain how we can retrieve parent details of a given HTML tag. We can retrieve details about the immediate parent of the given HTML tag or all parents of the HTML tag till the root of the document which is the parent of all tags. The Tag object provides two properties to retrieve parents’ details.

  • parent
  • parents

Access Single Direct Parent using 'parent' Attribute

The parent property returns just the immediate direct parent of the given HTML tag, i.e., the HTML tag that contains it.

In [49]:
soup.li
Out[49]:
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
In [50]:
parent = soup.li.parent

print(type(parent))

parent
<class 'bs4.element.Tag'>
Out[50]:
<ul>
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
<li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
<li><a href="https://coderzcolumn.com/about">About</a></li>
<li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
</ul>
In [51]:
soup.a
Out[51]:
<a href="https://coderzcolumn.com/blogs">Blogs</a>
In [52]:
soup.a.parent
Out[52]:
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
In [53]:
soup.title
Out[53]:
<title>CoderzColumn : Developed for Developers by Developers for the betterment of Development.</title>
In [54]:
soup.title.parent
Out[54]:
<head>
<title>CoderzColumn : Developed for Developers by Developers for the betterment of Development.</title>
<script src="static/script1.js"></script>
<script src="static/script2.js"></script>
<link href="static/stylesheet.css" rel="stylesheet" type="text/css"/>
</head>
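
It can help to know what parent returns at the top of the tree. In the sketch below (hypothetical document, illustrative names), the parent of the html tag is the BeautifulSoup object itself, whose name is '[document]', and that object has no parent.

```python
from bs4 import BeautifulSoup

# Hypothetical document used only for this illustration.
doc = BeautifulSoup("<html><body><p>Hi</p></body></html>", "html.parser")

p_parent_name = doc.p.parent.name   # 'body'
top = doc.html.parent               # the BeautifulSoup object itself
top_name = top.name                 # '[document]'
root_parent = doc.parent            # None, nothing sits above the document

print(p_parent_name, top_name, root_parent)
```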

Recursively Access All Parents till Root of Document using 'parents' Attribute of Tag Object

The parents property returns a generator over all ancestors of the given HTML tag. It starts with the immediate direct parent, continues with the parent of that parent, and ends with the BeautifulSoup object that represents the whole document.

In [55]:
soup.a
Out[55]:
<a href="https://coderzcolumn.com/blogs">Blogs</a>
In [56]:
soup.a.parents
Out[56]:
<generator object PageElement.parents at 0x7f1f51e05138>
In [57]:
total_parents = list(soup.a.parents)

print("Total Number of Parents : {}".format(len(total_parents)))

print("=========First Immediate Parent =====================")
print(total_parents[0])

print("\n=========Parent of Immediate Parent (Grand Parent) =================")
print(total_parents[1])
Total Number of Parents : 5
=========First Immediate Parent =====================
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>

=========Parent of Immediate Parent (Grand Parent) =================
<ul>
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
<li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
<li><a href="https://coderzcolumn.com/about">About</a></li>
<li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
</ul>
In [58]:
soup.ul
Out[58]:
<ul>
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
<li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
<li><a href="https://coderzcolumn.com/about">About</a></li>
<li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
</ul>
In [59]:
total_parents =  list(soup.ul.parents)

print("Total Number of Parents : {}".format(len(total_parents)))

print("=========First Immediate Parent =====================")
print(total_parents[0])

print("\n=========Parent of Immediate Parent (Grand Parent) =================")
print(total_parents[1])
Total Number of Parents : 3
=========First Immediate Parent =====================
<body>
<p id="start">Welcome to CoderzColumn</p>
<p id="main_para">We regularly publish tutorials on various topics
    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
    how to use various Python libraries.</p>
<p id="sub_para">Below are list of Important Sections of Our Website : </p>
<ul>
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
<li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
<li><a href="https://coderzcolumn.com/about">About</a></li>
<li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
</ul>
<p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
    information about any article or want us to publish article on particular topic.</p>
</body>

=========Parent of Immediate Parent (Grand Parent) =================
<html>
<head>
<title>CoderzColumn : Developed for Developers by Developers for the betterment of Development.</title>
<script src="static/script1.js"></script>
<script src="static/script2.js"></script>
<link href="static/stylesheet.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<p id="start">Welcome to CoderzColumn</p>
<p id="main_para">We regularly publish tutorials on various topics
    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
    how to use various Python libraries.</p>
<p id="sub_para">Below are list of Important Sections of Our Website : </p>
<ul>
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
<li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
<li><a href="https://coderzcolumn.com/about">About</a></li>
<li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
</ul>
<p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
    information about any article or want us to publish article on particular topic.</p>
</body>
</html>

9. How to Find Siblings of HTML Tag?

In this section, we'll explain how to retrieve the siblings of a given HTML tag. Siblings are nodes that sit at the same level of the tree and share the same immediate parent as the given HTML tag. A sibling can be another tag or a string (NavigableString), such as the newline between two tags. The Tag object provides the following properties to retrieve siblings.

  • next_sibling
  • previous_sibling
  • next_siblings
  • previous_siblings

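Access Next Sibling using 'next_sibling' Attribute

The next_sibling property of a Tag object returns the single node that immediately follows the given HTML tag at the same level. That node is often a NavigableString holding the whitespace between two tags (such as '\n'), so we may need to call next_sibling twice to reach the next tag.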

In [60]:
soup.p
Out[60]:
<p id="start">Welcome to CoderzColumn</p>
In [61]:
out = soup.p.next_sibling

print(type(out))

out
<class 'bs4.element.NavigableString'>
Out[61]:
'\n'
In [62]:
soup.p.next_sibling.next_sibling
Out[62]:
<p id="main_para">We regularly publish tutorials on various topics
    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
    how to use various Python libraries.</p>
In [63]:
soup.li
Out[63]:
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
In [64]:
soup.li.next_sibling
Out[64]:
'\n'
In [65]:
soup.li.next_sibling.next_sibling
Out[65]:
<li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
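
Since next_sibling often lands on a whitespace string, a small helper that keeps moving until it reaches a Tag can be handy. Below is a minimal sketch of that idea. The helper name next_tag_sibling and the tiny HTML snippet are our own assumptions for illustration and are not part of beautifulsoup.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical snippet, much smaller than the tutorial's document.
HTML = """<ul>
<li>Blogs</li>
<li>Tutorials</li>
</ul>"""

soup = BeautifulSoup(HTML, "html.parser")


def next_tag_sibling(tag):
    """Return the first following sibling that is a Tag, or None."""
    sibling = tag.next_sibling
    while sibling is not None and not isinstance(sibling, Tag):
        sibling = sibling.next_sibling
    return sibling


first_li = soup.li
raw_sibling = first_li.next_sibling          # the newline between the two <li> tags
tag_sibling = next_tag_sibling(first_li)     # the second <li>
last_sibling = next_tag_sibling(tag_sibling) # nothing follows the second <li>

print(repr(raw_sibling))
print(tag_sibling)
print(last_sibling)
```

The helper returns None when no tag follows, which mirrors what next_sibling itself does at the end of a parent.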

Access Previous Sibling using 'previous_sibling' Attribute

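The previous_sibling property of a Tag object returns the node that immediately precedes the given HTML tag at the same level. As with next_sibling, it is often a whitespace string rather than a tag.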

In [66]:
out = soup.ul.previous_sibling

print(type(out))

out
<class 'bs4.element.NavigableString'>
Out[66]:
'\n'
In [67]:
soup.ul.previous_sibling.previous_sibling
Out[67]:
<p id="sub_para">Below are list of Important Sections of Our Website : </p>
In [68]:
soup.li
Out[68]:
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
In [69]:
soup.script
Out[69]:
<script src="static/script1.js"></script>
In [70]:
soup.script.previous_sibling.previous_sibling
Out[70]:
<title>CoderzColumn : Developed for Developers by Developers for the betterment of Development.</title>

Access All Next Siblings using 'next_siblings' Attribute

The next_siblings property returns a generator over all siblings that come after the given HTML tag, in document order. It yields both tags and the strings between them.

In [71]:
soup.li
Out[71]:
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
In [72]:
soup.li.next_siblings
Out[72]:
<generator object PageElement.next_siblings at 0x7f1f51e3d1b0>
In [73]:
for sibling in soup.li.next_siblings:
    print(type(sibling))

list(soup.li.next_siblings)
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
Out[73]:
['\n',
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>,
 '\n',
 <li><a href="https://coderzcolumn.com/about">About</a></li>,
 '\n',
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>,
 '\n']
In [74]:
soup.p
Out[74]:
<p id="start">Welcome to CoderzColumn</p>
In [75]:
for sibling in soup.p.next_siblings:
    print(type(sibling))

list(soup.p.next_siblings)
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
Out[75]:
['\n',
 <p id="main_para">We regularly publish tutorials on various topics
     (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
     how to use various Python libraries.</p>,
 '\n',
 <p id="sub_para">Below are list of Important Sections of Our Website : </p>,
 '\n',
 <ul>
 <li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
 <li><a href="https://coderzcolumn.com/about">About</a></li>
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
 </ul>,
 '\n',
 <p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
     information about any article or want us to publish article on particular topic.</p>,
 '\n']
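
When only the tags are of interest, the whitespace strings can be filtered out with an isinstance check. Below is a small sketch under that assumption. The HTML snippet is a shortened, hypothetical version of the document used in this tutorial.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical, shortened document.
HTML = """<body><p id="start">Welcome</p>
<p id="main_para">Tutorials</p>
<ul><li>Blogs</li></ul>
<p id="end">Contact</p>
</body>"""

soup = BeautifulSoup(HTML, "html.parser")
start = soup.p

# The generator yields tags and the newline strings between them.
all_siblings = list(start.next_siblings)

# Keep only Tag objects and drop the NavigableString entries.
tag_siblings = [s for s in all_siblings if isinstance(s, Tag)]
sibling_names = [t.name for t in tag_siblings]

print(len(all_siblings))
print(sibling_names)
```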

Access All Previous Siblings using 'previous_siblings' Attribute

The previous_siblings property returns a generator over all siblings that come before the given HTML tag. It yields the nearest sibling first, so the results appear in reverse document order.

In [76]:
soup.ul.previous_siblings
Out[76]:
<generator object PageElement.previous_siblings at 0x7f1f51e3d318>
In [77]:
for sibling in soup.ul.previous_siblings:
    print(type(sibling))

list(soup.ul.previous_siblings)
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
Out[77]:
['\n',
 <p id="sub_para">Below are list of Important Sections of Our Website : </p>,
 '\n',
 <p id="main_para">We regularly publish tutorials on various topics
     (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
     how to use various Python libraries.</p>,
 '\n',
 <p id="start">Welcome to CoderzColumn</p>,
 '\n']
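
Because previous_siblings starts from the nearest sibling, we may want to reverse the result to get the siblings in the order they appear in the document. Below is a minimal sketch of this. The HTML snippet is a hypothetical one written without whitespace between tags so that only Tag objects are returned.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with no whitespace between tags.
HTML = "<div><p>one</p><p>two</p><p>three</p><ul></ul></div>"

soup = BeautifulSoup(HTML, "html.parser")
ul = soup.ul

# previous_siblings walks backward, so the closest <p> comes first.
nearest_first = [p.get_text() for p in ul.previous_siblings]

# Reversing restores the order in which the tags appear in the document.
document_order = list(reversed(nearest_first))

print(nearest_first)
print(document_order)
```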

10. How to Parse HTML Document Forward/Backward from Given HTML Tag?

We can retrieve all elements which get parsed after the given HTML tag and all elements which already got parsed before the given HTML tag.

  1. Forward Parse till End of Document
  2. Backward Parse till Root of Document

The Tag object has various properties which can be used to perform forward parse and backward parse.

  • Forward Parse - next, next_element, next_elements
  • Backward Parse - previous, previous_element, previous_elements

Parse Forward using 'next', 'next_element' and 'next_elements' Attributes

'next' Attribute

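The next property of a Tag object returns the element that was parsed immediately after the given HTML tag. This element can be a Tag or a NavigableString. When the tag has contents, it is the first child of the tag, as the examples below show.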

In [78]:
out = soup.p.next

print(type(out))

out
<class 'bs4.element.NavigableString'>
Out[78]:
'Welcome to CoderzColumn'
In [79]:
soup.p.next.next
Out[79]:
'\n'
In [80]:
soup.p.next.next.next
Out[80]:
<p id="main_para">We regularly publish tutorials on various topics
    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
    how to use various Python libraries.</p>
In [81]:
soup.ul.next
Out[81]:
'\n'
In [82]:
soup.ul.next.next
Out[82]:
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
'next_element' Attribute

The next_element property of a Tag object works exactly like the next property and returns the element that was parsed immediately after the given HTML tag.

In [83]:
out = soup.p.next_element

print(type(out))

out
<class 'bs4.element.NavigableString'>
Out[83]:
'Welcome to CoderzColumn'
In [84]:
soup.p.next_element.next_element
Out[84]:
'\n'
In [85]:
soup.p.next_element.next_element.next_element
Out[85]:
<p id="main_para">We regularly publish tutorials on various topics
    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
    how to use various Python libraries.</p>
In [86]:
soup.ul.next_element
Out[86]:
'\n'
In [87]:
soup.ul.next_element.next_element
Out[87]:
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
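
It is easy to confuse next_element with next_sibling. The sketch below, which uses a small hypothetical HTML snippet, shows the difference: next_element steps into the tag, whereas next_sibling stays at the same level.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical snippet with no whitespace between the two paragraphs.
HTML = "<p id='start'>Welcome</p><p id='main'>Body</p>"

soup = BeautifulSoup(HTML, "html.parser")
start = soup.p

inside = start.next_element  # the string inside the first <p>
beside = start.next_sibling  # the second <p>, at the same level

print(type(inside).__name__, repr(inside))
print(beside)
```
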
'next_elements' Attribute

The next_elements property returns a generator over all elements (tags and strings) that were parsed after the given HTML tag, up to the end of the document.

In [88]:
soup.ul.next_elements
Out[88]:
<generator object PageElement.next_elements at 0x7f1f51e3d408>
In [89]:
for elem in soup.ul.next_elements:
    print(type(elem))

list(soup.ul.next_elements)
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
Out[89]:
['\n',
 <li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>,
 <a href="https://coderzcolumn.com/blogs">Blogs</a>,
 'Blogs',
 '\n',
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>,
 <a href="https://coderzcolumn.com/tutorials">Tutorials</a>,
 'Tutorials',
 '\n',
 <li><a href="https://coderzcolumn.com/about">About</a></li>,
 <a href="https://coderzcolumn.com/about">About</a>,
 'About',
 '\n',
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>,
 <a href="https://coderzcolumn.com/contact-us">Contact US</a>,
 'Contact US',
 '\n',
 '\n',
 <p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
     information about any article or want us to publish article on particular topic.</p>,
 'Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any \n    information about any article or want us to publish article on particular topic.',
 '\n',
 '\n']
In [90]:
for elem in soup.a.next_elements:
    print(type(elem))

list(soup.a.next_elements)
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
Out[90]:
['Blogs',
 '\n',
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>,
 <a href="https://coderzcolumn.com/tutorials">Tutorials</a>,
 'Tutorials',
 '\n',
 <li><a href="https://coderzcolumn.com/about">About</a></li>,
 <a href="https://coderzcolumn.com/about">About</a>,
 'About',
 '\n',
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>,
 <a href="https://coderzcolumn.com/contact-us">Contact US</a>,
 'Contact US',
 '\n',
 '\n',
 <p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
     information about any article or want us to publish article on particular topic.</p>,
 'Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any \n    information about any article or want us to publish article on particular topic.',
 '\n',
 '\n']
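
Note that next_elements does not stop at the closing tag of the element it was called on. The sketch below compares it with the descendants property on a small hypothetical snippet to make this visible.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: a list followed by a paragraph.
HTML = "<ul><li><a href='/blogs'>Blogs</a></li></ul><p id='end'>Bye</p>"

soup = BeautifulSoup(HTML, "html.parser")
ul = soup.ul

# Everything parsed after <ul>, including the <p> that follows the list.
following = list(ul.next_elements)

# Only the elements nested inside <ul>.
inside = list(ul.descendants)

print(len(following), len(inside))
print(following[-1])
```

If we only need what is nested inside a tag, descendants is therefore the safer choice.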

Parse Backward using 'previous', 'previous_element' and 'previous_elements' Attributes

'previous' Attribute

The previous property returns the element (a Tag or a NavigableString) that was parsed immediately before the given HTML tag.

In [91]:
out = soup.ul.previous

print(type(out))

out
<class 'bs4.element.NavigableString'>
Out[91]:
'\n'
In [92]:
soup.ul.previous.previous
Out[92]:
'Below are list of Important Sections of Our Website : '
In [93]:
soup.ul.previous.previous.previous
Out[93]:
<p id="sub_para">Below are list of Important Sections of Our Website : </p>
'previous_element' Attribute

The previous_element property works exactly like previous property.

In [94]:
out = soup.ul.previous_element

print(type(out))

out
<class 'bs4.element.NavigableString'>
Out[94]:
'\n'
In [95]:
soup.ul.previous_element.previous_element
Out[95]:
'Below are list of Important Sections of Our Website : '
In [96]:
soup.ul.previous_element.previous_element.previous_element
Out[96]:
<p id="sub_para">Below are list of Important Sections of Our Website : </p>
'previous_elements' Attribute

The previous_elements property returns a generator over all elements (tags and strings) that were parsed before the given HTML tag, nearest first.

In [97]:
soup.ul.previous_elements
Out[97]:
<generator object PageElement.previous_elements at 0x7f1f51e3d390>
In [98]:
for elem in soup.ul.previous_elements:
    print(type(elem))

list(soup.ul.previous_elements)[:7]
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
<class 'bs4.element.NavigableString'>
<class 'bs4.element.Tag'>
Out[98]:
['\n',
 'Below are list of Important Sections of Our Website : ',
 <p id="sub_para">Below are list of Important Sections of Our Website : </p>,
 '\n',
 'We regularly publish tutorials on various topics \n    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining \n    how to use various Python libraries.',
 <p id="main_para">We regularly publish tutorials on various topics
     (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
     how to use various Python libraries.</p>,
 '\n']
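
One consequence of walking backward in parse order is that the ancestors of a tag show up in previous_elements, because their opening tags were parsed earlier. Below is a small sketch on a hypothetical snippet that keeps only the Tag objects to make this easier to see.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical snippet with a head and a body.
HTML = "<html><head><title>T</title></head><body><p>Hi</p><ul></ul></body></html>"

soup = BeautifulSoup(HTML, "html.parser")
ul = soup.ul

# Keep only tags; strings such as 'Hi' and 'T' are dropped.
tags_before = [e.name for e in ul.previous_elements if isinstance(e, Tag)]

# <p> is the nearest tag, followed by <body>, which is an ancestor of <ul>.
print(tags_before)
```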

11. How to Search for Specific HTML Tag in an HTML Document ?

In this section, we'll explain the various find_*() methods of the Tag object that can be used to search for a particular HTML tag or tags. Below is the list of find_*() methods that we'll discuss next.

  • find(name=None, attrs={}, recursive=True, text=None)
  • find_all(name=None, attrs={}, recursive=True, text=None, limit=None)
  • find_next(name=None, attrs={}, text=None)
  • find_all_next(name=None, attrs={}, text=None, limit=None)
  • find_previous(name=None, attrs={}, text=None)
  • find_all_previous(name=None, attrs={}, text=None, limit=None)
  • find_parent(name=None, attrs={})
  • find_parents(name=None, attrs={}, limit=None)
  • find_next_sibling(name=None, attrs={}, text=None)
  • find_previous_sibling(name=None, attrs={}, text=None)
  • find_next_siblings(name=None, attrs={}, text=None, limit=None)
  • find_previous_siblings(name=None, attrs={}, text=None, limit=None)

The name parameter is the name of the HTML tag that we are searching for. The attrs parameter accepts a dictionary that restricts the search to tags with particular attribute values: each key is an attribute name and each value is the value that attribute must have. The text parameter accepts a string and restricts the search to tags whose string content matches it; newer Beautiful Soup 4 releases accept the same filter under the name string. The recursive parameter, accepted only by find() and find_all(), controls whether the search covers all descendants or only direct children. The limit parameter, available on the methods that return several results, caps the number of results. Any other keyword argument is treated as an attribute filter, so we can pass id to any of these methods to retrieve a tag by its id.
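
To make the attribute filters concrete, below is a minimal sketch showing that a keyword argument and an attrs dictionary select the same tag. The HTML snippet and its class names are hypothetical. The class_ keyword (with a trailing underscore) is the form beautifulsoup accepts for the class attribute, since class is a reserved word in Python.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with id and class attributes.
HTML = """<p id="start" class="intro">Welcome</p>
<p id="end" class="outro">Bye</p>"""

soup = BeautifulSoup(HTML, "html.parser")

# Keyword argument: treated as a filter on the 'id' attribute.
by_keyword = soup.find("p", id="end")

# Equivalent filter written as an attrs dictionary.
by_attrs = soup.find("p", attrs={"id": "end"})

# 'class' is reserved in Python, so the keyword form is class_.
by_class = soup.find("p", class_="intro")

print(by_keyword is by_attrs)
print(by_class)
```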

Find First Tag of the Specified Name

The find() method lets us find the first occurrence of a given HTML tag. We can provide a tag name, attributes, and text to narrow the search to a particular HTML tag. It returns None when no tag matches. Below we have explained the usage of the method with various examples.

In [99]:
out = soup.find("a")

print(type(out))

out
<class 'bs4.element.Tag'>
Out[99]:
<a href="https://coderzcolumn.com/blogs">Blogs</a>
In [100]:
soup.ul.find("a")
Out[100]:
<a href="https://coderzcolumn.com/blogs">Blogs</a>
In [101]:
soup.find(id="start")
Out[101]:
<p id="start">Welcome to CoderzColumn</p>
In [102]:
soup.find("a", text="Tutorials")
Out[102]:
<a href="https://coderzcolumn.com/tutorials">Tutorials</a>
In [103]:
soup.find("script", attrs={"src": "static/script1.js"})
Out[103]:
<script src="static/script1.js"></script>
In [104]:
soup.find("script", attrs={"src": "static/script2.js"})
Out[104]:
<script src="static/script2.js"></script>
In [105]:
soup.find("a", attrs={"href": "https://coderzcolumn.com/about"})
Out[105]:
<a href="https://coderzcolumn.com/about">About</a>
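
Since find() returns None when nothing matches, chaining attribute access directly onto its result can raise an error on pages that lack the tag. Below is a small sketch of a guarded lookup. The HTML snippet and the variable names are our own assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet that contains a link but no table.
HTML = "<ul><li><a href='/blogs'>Blogs</a></li></ul>"

soup = BeautifulSoup(HTML, "html.parser")

link = soup.find("a")       # a Tag
missing = soup.find("table")  # None, because there is no <table>

# Check for None before reading attributes of the result.
href = link.get("href") if link is not None else None
missing_href = missing.get("href") if missing is not None else None

print(href)
print(missing_href)
```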

Find All Tags of Specified Name

The find_all() method returns all tags of given name. We can specify attributes and text if we want to retrieve tags that satisfy particular attribute values and text. Below we have explained with a few examples how we can use find_all() method.

In [106]:
out = soup.find_all("a")

print(type(out))

for i in out:
    print(type(i))

out
<class 'bs4.element.ResultSet'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
Out[106]:
[<a href="https://coderzcolumn.com/blogs">Blogs</a>,
 <a href="https://coderzcolumn.com/tutorials">Tutorials</a>,
 <a href="https://coderzcolumn.com/about">About</a>,
 <a href="https://coderzcolumn.com/contact-us">Contact US</a>]
In [107]:
soup.find_all("li")
Out[107]:
[<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>,
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>,
 <li><a href="https://coderzcolumn.com/about">About</a></li>,
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>]
In [108]:
soup.find_all(id=["start", "end"])
Out[108]:
[<p id="start">Welcome to CoderzColumn</p>,
 <p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
     information about any article or want us to publish article on particular topic.</p>]
In [109]:
soup.ul.find_all("li", limit=2)
Out[109]:
[<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>,
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>]
In [110]:
soup.find_all("li", text=["Blogs","Tutorials", "About"])
Out[110]:
[<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>,
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>,
 <li><a href="https://coderzcolumn.com/about">About</a></li>]
In [111]:
soup.find_all("li", text=["Blogs","Tutorials", "about"])
Out[111]:
[<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>,
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>]
In [112]:
soup.find_all("a", attrs={"href":"https://coderzcolumn.com/tutorials"})
Out[112]:
[<a href="https://coderzcolumn.com/tutorials">Tutorials</a>]
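
Besides plain strings and lists, the filters of find_all() also accept compiled regular expressions and functions. Below is a minimal sketch of these filter types. The HTML snippet uses hypothetical example.com links, and the function name is_contact_link is our own.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical snippet (not the tutorial's document).
HTML = """<ul>
<li><a href="https://example.com/blogs">Blogs</a></li>
<li><a href="https://example.com/about">About</a></li>
<li><a href="/contact-us">Contact US</a></li>
</ul>"""

soup = BeautifulSoup(HTML, "html.parser")

# A compiled regex keeps only links whose href starts with https://
absolute_links = soup.find_all("a", href=re.compile(r"^https://"))

# A list of names matches any tag whose name is in the list.
ul_and_links = [tag.name for tag in soup.find_all(["ul", "a"])]


# A function receives each Tag and returns True for the ones to keep.
def is_contact_link(tag):
    return tag.name == "a" and "contact" in tag.get("href", "")


contact_links = soup.find_all(is_contact_link)

print(len(absolute_links))
print(ul_and_links)
print(contact_links)
```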

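Find Next Element in Document for Given HTML Tag (Works like 'next' Attribute)

The find_next() method searches the same sequence of elements as the next property of the Tag object, but it returns only Tag objects: strings are skipped, and the first tag that matches the given filters is returned.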

In [113]:
out = soup.ul.find_next()

print(type(out))

out
<class 'bs4.element.Tag'>
Out[113]:
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
In [114]:
soup.ul.find_next("a")
Out[114]:
<a href="https://coderzcolumn.com/blogs">Blogs</a>
In [115]:
soup.ul.find_next("a", text="Blogs")
Out[115]:
<a href="https://coderzcolumn.com/blogs">Blogs</a>
In [116]:
soup.ul.find_next("a", attrs={"href": "https://coderzcolumn.com/blogs"}, text="Blogs")
Out[116]:
<a href="https://coderzcolumn.com/blogs">Blogs</a>
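
The difference between next_element and find_next() is easiest to see side by side. Below is a small sketch on a hypothetical snippet where a newline follows the opening ul tag.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with a newline right after <ul>.
HTML = "<ul>\n<li><a href='/blogs'>Blogs</a></li>\n</ul>"

soup = BeautifulSoup(HTML, "html.parser")
ul = soup.ul

raw_next = ul.next_element  # the newline string
tag_next = ul.find_next()   # the first Tag after <ul>, which is <li>

print(repr(raw_next))
print(tag_next.name)
```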

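Find Previous Element in Document for Given HTML Tag (Works like 'previous' Attribute)

The find_previous() method searches the same sequence of elements as the previous property of the Tag object that we explained in the earlier section, but it returns only Tag objects: strings are skipped, and the first tag that matches the given filters is returned.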

In [117]:
out = soup.ul.find_previous()

print(type(out))

out
<class 'bs4.element.Tag'>
Out[117]:
<p id="sub_para">Below are list of Important Sections of Our Website : </p>
In [118]:
soup.ul.find_previous("p")
Out[118]:
<p id="sub_para">Below are list of Important Sections of Our Website : </p>
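
Because find_previous() walks backward in parse order, it can return a parent of the tag and not only tags that sit before it on the page. Below is a minimal sketch on a hypothetical snippet that shows both cases.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: a paragraph, a newline, then a list.
HTML = "<p id='sub'>Sections</p>\n<ul><li>Blogs</li></ul>"

soup = BeautifulSoup(HTML, "html.parser")
ul = soup.ul

raw_previous = ul.previous_element  # the newline string before <ul>
tag_previous = ul.find_previous()   # strings are skipped, so this is <p>

# For <li>, the element parsed just before it is its own parent <ul>.
li_previous = soup.li.find_previous()

print(repr(raw_previous))
print(tag_previous)
print(li_previous.name)
```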

Find All Next Elements in Document for Given HTML Tag (Works like 'next_elements' Attribute)

We can retrieve all tags that were parsed after the given tag using find_all_next() method. We can filter tags if we want to retrieve only tags by given name or attributes.

In [119]:
out = soup.ul.find_all_next()

print(type(out))

out
<class 'bs4.element.ResultSet'>
Out[119]:
[<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>,
 <a href="https://coderzcolumn.com/blogs">Blogs</a>,
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>,
 <a href="https://coderzcolumn.com/tutorials">Tutorials</a>,
 <li><a href="https://coderzcolumn.com/about">About</a></li>,
 <a href="https://coderzcolumn.com/about">About</a>,
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>,
 <a href="https://coderzcolumn.com/contact-us">Contact US</a>,
 <p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
     information about any article or want us to publish article on particular topic.</p>]
In [120]:
soup.ul.find_all_next("a")
Out[120]:
[<a href="https://coderzcolumn.com/blogs">Blogs</a>,
 <a href="https://coderzcolumn.com/tutorials">Tutorials</a>,
 <a href="https://coderzcolumn.com/about">About</a>,
 <a href="https://coderzcolumn.com/contact-us">Contact US</a>]
In [121]:
soup.ul.find_all_next(id="end")
Out[121]:
[<p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
     information about any article or want us to publish article on particular topic.</p>]
In [122]:
soup.ul.find_all_next("a", text=["Blogs", "About"])
Out[122]:
[<a href="https://coderzcolumn.com/blogs">Blogs</a>,
 <a href="https://coderzcolumn.com/about">About</a>]
In [123]:
soup.ul.find_all_next("a", attrs={"href": "https://coderzcolumn.com/blogs"}, text="Blogs")
Out[123]:
[<a href="https://coderzcolumn.com/blogs">Blogs</a>]

Find All Previous Elements in Document for Given HTML Tag (Works like 'previous_elements' Attribute)

The find_all_previous() method returns all tags that were parsed before given HTML tag. We can filter tags by providing tag names or attribute details.

In [124]:
out = soup.ul.find_all_previous()

print(type(out))

out[:5]
<class 'bs4.element.ResultSet'>
Out[124]:
[<p id="sub_para">Below are list of Important Sections of Our Website : </p>,
 <p id="main_para">We regularly publish tutorials on various topics
     (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
     how to use various Python libraries.</p>,
 <p id="start">Welcome to CoderzColumn</p>,
 <body>
 <p id="start">Welcome to CoderzColumn</p>
 <p id="main_para">We regularly publish tutorials on various topics
     (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
     how to use various Python libraries.</p>
 <p id="sub_para">Below are list of Important Sections of Our Website : </p>
 <ul>
 <li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
 <li><a href="https://coderzcolumn.com/about">About</a></li>
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
 </ul>
 <p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
     information about any article or want us to publish article on particular topic.</p>
 </body>,
 <link href="static/stylesheet.css" rel="stylesheet" type="text/css"/>]
In [125]:
soup.ul.find_all_previous("p")
Out[125]:
[<p id="sub_para">Below are list of Important Sections of Our Website : </p>,
 <p id="main_para">We regularly publish tutorials on various topics
     (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
     how to use various Python libraries.</p>,
 <p id="start">Welcome to CoderzColumn</p>]
In [126]:
soup.ul.find_all_previous(id="start")
Out[126]:
[<p id="start">Welcome to CoderzColumn</p>]
In [127]:
soup.ul.find_all_previous("script")
Out[127]:
[<script src="static/script2.js"></script>,
 <script src="static/script1.js"></script>]
In [128]:
soup.ul.find_all_previous("p", text="Welcome to CoderzColumn")
Out[128]:
[<p id="start">Welcome to CoderzColumn</p>]
In [129]:
soup.ul.find_all_previous("p", text=["Welcome to CoderzColumn",
                                     "Below are list of Important Sections of Our Website : "])
Out[129]:
[<p id="sub_para">Below are list of Important Sections of Our Website : </p>,
 <p id="start">Welcome to CoderzColumn</p>]
In [130]:
soup.ul.find_all_previous("p", attrs={"id":"start"})
Out[130]:
[<p id="start">Welcome to CoderzColumn</p>]

Find Parent of Given HTML Tag (Works like 'parent' Attribute)

We can retrieve the parent of the given HTML tag using find_parent() method which works like parent property we explained earlier.

In [133]:
soup.li.find_parent()
Out[133]:
<ul>
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
<li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
<li><a href="https://coderzcolumn.com/about">About</a></li>
<li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
</ul>
In [137]:
soup.a.find_parent(name="li")
Out[137]:
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>

Find Parents of Given HTML Tag (Works like 'parents' Attribute)

The find_parents() method returns list of all parents of given HTML tag. We can filter parents by specifying the name and attribute details in the method.

In [138]:
soup.a.find_parents(name="li")
Out[138]:
[<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>]
In [139]:
soup.a.find_parents(name=["li","ul"])
Out[139]:
[<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>,
 <ul>
 <li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
 <li><a href="https://coderzcolumn.com/about">About</a></li>
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
 </ul>]
In [143]:
all_parents = soup.a.find_parents()

print("Total Parents : {}".format(len(all_parents)))
Total Parents : 5
In [144]:
all_parents[:3]
Out[144]:
[<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>,
 <ul>
 <li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
 <li><a href="https://coderzcolumn.com/about">About</a></li>
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
 </ul>,
 <body>
 <p id="start">Welcome to CoderzColumn</p>
 <p id="main_para">We regularly publish tutorials on various topics
     (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
     how to use various Python libraries.</p>
 <p id="sub_para">Below are list of Important Sections of Our Website : </p>
 <ul>
 <li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
 <li><a href="https://coderzcolumn.com/about">About</a></li>
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
 </ul>
 <p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
     information about any article or want us to publish article on particular topic.</p>
 </body>]
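
To see the whole chain of parents at a glance, we can print only their names. Below is a small sketch on a hypothetical snippet. The last entry, '[document]', is the name of the BeautifulSoup object itself, which is why the parent count above is one more than the number of enclosing tags.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with a single nested link.
HTML = "<html><body><ul><li><a href='/blogs'>Blogs</a></li></ul></body></html>"

soup = BeautifulSoup(HTML, "html.parser")
link = soup.a

# Names of all parents, from the nearest one outward.
parent_names = [parent.name for parent in link.find_parents()]

# find_parent() with a name skips over nearer parents that do not match.
closest_list = link.find_parent("ul")

print(parent_names)
print(closest_list.name)
```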

Find Next Sibling of Given HTML Tag (Works like 'next_sibling' Attribute)

We can use the find_next_sibling() method to retrieve the first sibling tag that comes after the given HTML tag. It works like the next_sibling property of the Tag object, except that it skips strings and accepts filters.

In [159]:
soup.li.find_next_sibling()
Out[159]:
<li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
In [160]:
soup.p.find_next_sibling()
Out[160]:
<p id="main_para">We regularly publish tutorials on various topics
    (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
    how to use various Python libraries.</p>
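
Below is a minimal sketch that contrasts next_sibling with find_next_sibling() on a small hypothetical snippet, with and without a tag name filter.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with newlines between the siblings.
HTML = "<p id='start'>Welcome</p>\n<ul></ul>\n<p id='end'>Bye</p>"

soup = BeautifulSoup(HTML, "html.parser")
start = soup.p

raw_sibling = start.next_sibling            # the newline string
any_sibling = start.find_next_sibling()     # first sibling tag: <ul>
p_sibling = start.find_next_sibling("p")    # first sibling named p

print(repr(raw_sibling))
print(any_sibling.name)
print(p_sibling)
```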

Find Previous Sibling of Given HTML Tag (Works like 'previous_sibling' Attribute)

The find_previous_sibling() method returns the first sibling tag that comes before the given HTML tag. It works like the previous_sibling property of the Tag object, except that it skips strings and accepts filters.

In [169]:
soup.find(id="end").find_previous_sibling()
Out[169]:
<ul>
<li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
<li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
<li><a href="https://coderzcolumn.com/about">About</a></li>
<li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
</ul>
In [170]:
soup.ul.find_previous_sibling()
Out[170]:
<p id="sub_para">Below are list of Important Sections of Our Website : </p>

Find All Next Siblings of Given HTML Tag (Works like 'next_siblings' Attribute)

The find_next_siblings() method retrieves all siblings of given HTML tag that were parsed after it. We can provide tag name and attribute details if we want to filter siblings based on those details.

In [171]:
soup.p.find_next_siblings()
Out[171]:
[<p id="main_para">We regularly publish tutorials on various topics
     (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
     how to use various Python libraries.</p>,
 <p id="sub_para">Below are list of Important Sections of Our Website : </p>,
 <ul>
 <li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
 <li><a href="https://coderzcolumn.com/about">About</a></li>
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
 </ul>,
 <p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
     information about any article or want us to publish article on particular topic.</p>]
In [172]:
soup.p.find_next_siblings(name="p")
Out[172]:
[<p id="main_para">We regularly publish tutorials on various topics
     (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
     how to use various Python libraries.</p>,
 <p id="sub_para">Below are list of Important Sections of Our Website : </p>,
 <p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
     information about any article or want us to publish article on particular topic.</p>]
In [173]:
soup.p.find_next_siblings(id="end")
Out[173]:
[<p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
     information about any article or want us to publish article on particular topic.</p>]
In [176]:
soup.p.find_next_siblings("ul")
Out[176]:
[<ul>
 <li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
 <li><a href="https://coderzcolumn.com/about">About</a></li>
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
 </ul>]
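
The limit parameter that we used with find_all() earlier can also be passed to find_next_siblings() to stop after a fixed number of matches. Below is a small sketch on a hypothetical snippet.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with three paragraphs.
HTML = "<p id='a'>1</p><p id='b'>2</p><p id='c'>3</p>"

soup = BeautifulSoup(HTML, "html.parser")
first = soup.p

# All following <p> siblings.
all_following = [p["id"] for p in first.find_next_siblings("p")]

# Only the first match, still returned as a list-like result.
limited = [p["id"] for p in first.find_next_siblings("p", limit=1)]

print(all_following)
print(limited)
```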

Find All Previous Siblings of Given HTML Tag (Works like 'previous_siblings' Attribute)

The find_previous_siblings() method retrieves all siblings of a given HTML tag that were parsed before it. We can provide tag name and attribute details if we want to filter siblings based on those details.

In [177]:
soup.find(id="end")
Out[177]:
<p id="end">Please feel free to send us mail @ coderzcolumn07@gmail.com if you need any
    information about any article or want us to publish article on particular topic.</p>
In [178]:
soup.find(id="end").find_previous_siblings(name="p")
Out[178]:
[<p id="sub_para">Below are list of Important Sections of Our Website : </p>,
 <p id="main_para">We regularly publish tutorials on various topics
     (Python, Machine learning, Data Visualization, Digital Marketing, etc.) regularly explaining
     how to use various Python libraries.</p>,
 <p id="start">Welcome to CoderzColumn</p>]
In [179]:
soup.find(id="end").find_previous_siblings(id="start")
Out[179]:
[<p id="start">Welcome to CoderzColumn</p>]
In [180]:
soup.find(id="end").find_previous_siblings("ul")
Out[180]:
[<ul>
 <li><a href="https://coderzcolumn.com/blogs">Blogs</a></li>
 <li><a href="https://coderzcolumn.com/tutorials">Tutorials</a></li>
 <li><a href="https://coderzcolumn.com/about">About</a></li>
 <li><a href="https://coderzcolumn.com/contact-us">Contact US</a></li>
 </ul>]
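
The sibling searches above can be reproduced outside the notebook with a minimal sketch like the one below. The HTML string is a hypothetical, trimmed-down stand-in for the tutorial's sample page (the ids are reused, the text is shortened), and the variable names are only illustrative. It also shows the `limit` argument, which caps how many matches are returned.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the tutorial's sample page.
html = """
<body>
<p id="start">Welcome</p>
<p id="main_para">Main paragraph</p>
<ul><li>Blogs</li><li>Tutorials</li></ul>
<p id="end">Closing paragraph</p>
</body>
"""

soup = BeautifulSoup(html, "html.parser")
end_tag = soup.find(id="end")

# Previous siblings come back nearest-first (reverse document order).
prev_ids = [p["id"] for p in end_tag.find_previous_siblings("p")]

# limit=1 keeps only the closest matching previous sibling.
nearest = end_tag.find_previous_siblings("p", limit=1)

# Next siblings of the first <p> come back in document order.
next_ids = [p["id"] for p in soup.p.find_next_siblings("p")]

print(prev_ids)
print(nearest)
print(next_ids)
```

Comparing `prev_ids` with `next_ids` makes the ordering difference between the two methods easy to see: one walks backward from the tag, the other walks forward.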

This concludes our tutorial on parsing an HTML document and retrieving information about its tags using the beautifulsoup library. Please feel free to let us know your views in the comments section.

Sunny Solanki