.contents and .children


A tag's .contents attribute returns the tag's child nodes as a list:

head_tag = soup.head
head_tag
# <head><title>The Dormouse's story</title></head>

head_tag.contents
# [<title>The Dormouse's story</title>]

title_tag = head_tag.contents[0]
title_tag
# <title>The Dormouse's story</title>
title_tag.contents
# [u'The Dormouse's story']
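
The snippet above relies on a `soup` object created earlier. As a minimal self-contained sketch, the same steps can be reproduced as follows, assuming Beautiful Soup 4 is installed and using Python's built-in `html.parser`. The `html_doc` string is a shortened, hypothetical stand-in for the original sample document.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample document (an assumption, not the original HTML).
html_doc = (
    "<html><head><title>The Dormouse's story</title></head>"
    "<body><p>text</p></body></html>"
)
soup = BeautifulSoup(html_doc, "html.parser")

head_tag = soup.head
head_children = head_tag.contents      # a plain Python list of direct children
title_tag = head_children[0]           # the <title> tag
title_text = title_tag.contents[0]     # the string inside <title>

print(head_children)
print(title_tag.name)
print(title_text)
```

Because `.contents` is an ordinary list, it supports indexing, slicing, and `len()`.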

The BeautifulSoup object itself always has children. In this case, the <html> tag is a child of the BeautifulSoup object:

len(soup.contents)
# 1
soup.contents[0].name
# u'html'

Strings do not have a .contents attribute, because a string cannot have child nodes:

text = title_tag.contents[0]
text.contents
# AttributeError: 'NavigableString' object has no attribute 'contents'
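
When walking a tree that mixes tags and strings, one way to avoid this error is to check the node type before touching `.contents`. The sketch below is only an illustration: `child_count` is a hypothetical helper, not part of Beautiful Soup, and the HTML fragment is made up for the example.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Made-up fragment: one string child and one tag child inside <p>.
soup = BeautifulSoup("<p>Hello <b>world</b></p>", "html.parser")


def child_count(node):
    """Hypothetical helper: number of direct children, 0 for strings."""
    if isinstance(node, NavigableString):
        return 0                      # strings cannot contain other nodes
    if isinstance(node, Tag):
        return len(node.contents)     # tags expose their children as a list
    return 0


counts = [child_count(node) for node in soup.p.contents]
print(counts)
```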

A tag's .children generator lets you loop over the tag's direct children:

for child in title_tag.children:
    print(child)
    # The Dormouse's story
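
The following sketch contrasts the two attributes, assuming Beautiful Soup 4 with the built-in `html.parser`. The list markup is invented for illustration. It treats `.children` as something to iterate over and uses `.contents` when a list is needed.

```python
from bs4 import BeautifulSoup

# Invented fragment with three direct children under <ul>.
soup = BeautifulSoup(
    "<ul><li>one</li><li>two</li><li>three</li></ul>", "html.parser"
)
ul_tag = soup.ul

# Iterate over the direct children without building an intermediate list.
texts = [child.get_text() for child in ul_tag.children]

# .contents holds the same nodes, but as a list that can be indexed.
first_item = ul_tag.contents[0]

print(texts)
print(first_item.name)
```

Both attributes cover direct children only; nested descendants are not included.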