PyQuery: A Tutorial on Parsing Web Pages with Python


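1. Installation

  • PyQuery is a powerful and flexible library for parsing web pages

  • It is a Python library modeled closely on jQuery

  • Its syntax is almost identical to jQuery's, so the jQuery documentation is a useful reference for further operations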

pip install pyquery

2. Initializing from a String

html = '''

'''

from pyquery import PyQuery as pq

doc = pq(html)
print(doc)
print(type(doc))
print(doc('li'))

3. Initializing from a URL

from pyquery import PyQuery as pq

# Fetch the page over HTTP and decode the response as UTF-8
doc = pq(url="http://www.baidu.com", encoding='utf-8')
print(doc('head'))

4. Initializing from a File

from pyquery import PyQuery as pq

doc = pq(filename='index.html')
print(doc)

5. CSS Selectors

html = '''

'''

from pyquery import PyQuery as pq

doc = pq(html)
print(doc('#container .fadeIn'))

6. Finding Child Elements

html = '''

'''

from pyquery import PyQuery as pq

doc = pq(html)
items = doc('#container')
lis = items.find('li')
print(type(lis))
print(lis)

7. Sibling Elements

html = '''

'''

from pyquery import PyQuery as pq

doc = pq(html)
div = doc('#container .post-thumb')
print(div.siblings())

8. Getting Attributes

html = '''

'''

from pyquery import PyQuery as pq

doc = pq(html)
a = doc('#container .post-content a')
print(a)
print(a.attr('href'))
print(a.attr.href)

9. Getting Text

html = '''

'''

from pyquery import PyQuery as pq

doc = pq(html)
a = doc('#container .post-content a').text()
print(a)

10. Class Operations

html = '''

'''

from pyquery import PyQuery as pq

doc = pq(html)
li = doc('#container li')
print(li)
li.removeClass('fadeIn')
print(li)
li.addClass('fadeIn')
print(li)
