Scraping COVID-19 Data with the BeautifulSoup Library


The data is requested from the official DXY (丁香园) website.

Step 1: import the libraries

import requests
import json
from bs4 import BeautifulSoup
import re

Step 2: fetch the page and parse it

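# Download the page; the timeout keeps the script from hanging on a dead connection
response = requests.get('https://ncov.dxy.cn/ncovh5/view/pneumonia', timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
home_page = response.content.decode()  # decode the response bytes as UTF-8

# Parse the HTML and take the text of the element whose id is 'getAreaStat'
soup = BeautifulSoup(home_page, 'lxml')
script = soup.find(id='getAreaStat')
text = script.text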
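
The lookup by id can be tried offline on a small inline page. This is only a sketch: SAMPLE_HTML is a made-up stand-in whose shape is an assumption about the real page, script_text_by_id is a hypothetical helper, and the built-in 'html.parser' is swapped in for 'lxml' so that nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
<script id="getAreaStat">try { window.getAreaStat = [{"provinceName": "A", "confirmedCount": 1}]}catch(e){}</script>
</body></html>
"""


def script_text_by_id(html, element_id):
    """Return the string inside the element with this id, or None if it is absent."""
    # 'html.parser' ships with Python; 'lxml' can be used instead if it is installed.
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(id=element_id)
    if tag is None:
        # find() returns None when nothing matches, so check before reading the tag
        return None
    return tag.string


text = script_text_by_id(SAMPLE_HTML, "getAreaStat")
missing = script_text_by_id(SAMPLE_HTML, "noSuchId")
```

Checking for None before reading the tag avoids an AttributeError if the site ever renames or removes the element.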

Step 3: extract the JSON string

# The greedy match runs from the first '[' to the last ']', i.e. the whole JSON array
json_str = re.findall(r'\[.+\]', text)[0]
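
To see what the regular expression captures, it can be run on a short sample string. The sketch below uses a made-up script body (its shape is an assumption about the page) and adds a guard for the case where nothing matches, which the one-liner above does not handle.

```python
import json
import re

# Made-up script body, modeled on the assumed shape of the page's data.
text = 'try { window.getAreaStat = [{"provinceName": "A", "cities": [{"cityName": "B"}]}]}catch(e){}'

# '.' does not match newlines by default, so this assumes the array sits on one line.
# The greedy '.+' spans from the first '[' to the last ']', keeping nested arrays intact.
matches = re.findall(r'\[.+\]', text)
if not matches:
    raise ValueError('no JSON array found in the script text')

json_str = matches[0]
data = json.loads(json_str)  # parsing confirms that the captured text is valid JSON
```

A non-greedy pattern such as r'\[.+?\]' would stop at the first ']' and cut the nested "cities" array short, which is why the greedy form is used here.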

Step 4: parse the JSON into Python objects and write them to a file

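# Parse the JSON string into Python lists and dicts
last_day_corona_virus = json.loads(json_str)

# ensure_ascii=False writes Chinese characters as-is instead of \uXXXX escapes
with open('./last_day_corona_virus.json', 'w', encoding='utf8') as fp:
    json.dump(last_day_corona_virus, fp, ensure_ascii=False)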
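
The effect of ensure_ascii=False can be checked with a small round trip. This is a minimal sketch with a made-up record and a temporary file, not the real scraped data.

```python
import json
import os
import tempfile

# Made-up sample record; the real file holds whatever the page returned.
records = [{"provinceName": "湖北省", "confirmedCount": 1}]

# Write to a temporary directory so the example leaves the working directory untouched.
path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(path, "w", encoding="utf8") as fp:
    json.dump(records, fp, ensure_ascii=False)

# Read the raw file text back, then parse it again.
with open(path, encoding="utf8") as fp:
    raw = fp.read()
loaded = json.loads(raw)

# With the default ensure_ascii=True, non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(records)
```

Both forms load back to the same Python objects; ensure_ascii=False only makes the file readable in a text editor.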