Scraping epidemic data with the BeautifulSoup library
The data is requested from the official DXY (丁香园) website.
Step 1: import the libraries
import requests
import json
from bs4 import BeautifulSoup
import re
Step 2: fetch the page and parse it
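# Download the page and decode the raw bytes (UTF-8 by default)
response = requests.get('https://ncov.dxy.cn/ncovh5/view/pneumonia')
home_page = response.content.decode()
# Parse the HTML with the lxml parser
soup = BeautifulSoup(home_page, 'lxml')
# The data is embedded in the tag whose id is 'getAreaStat'
script = soup.find(id='getAreaStat')
text = script.text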
Step 3: extract the JSON string from the script text
json_str = re.findall(r'\[.+\]', text)[0]
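To see what the regular expression does, here is a small offline sketch. The sample_text string is a made-up stand-in for the script contents (the wrapper code and field names are assumptions for illustration, not the actual page payload); the pattern itself is the one used above.

```python
import json
import re

# Hypothetical stand-in for the text of the tag with id 'getAreaStat'.
# The wrapper and the field names are illustrative assumptions only.
sample_text = (
    'try { window.getAreaStat = '
    '[{"provinceName": "湖北", "cities": [{"cityName": "武汉"}]}]'
    ' }catch(e){}'
)

# '.+' is greedy, so the match runs from the first '[' to the LAST ']'.
# That keeps the nested "cities" array inside the outer array.
matches = re.findall(r'\[.+\]', sample_text)
if not matches:
    raise ValueError("no JSON array found in the script text")
json_str = matches[0]

# The extracted string is plain JSON and can be parsed directly.
areas = json.loads(json_str)
print(areas[0]["provinceName"])
```

Note that '.' does not match a newline by default, so if the script text ever spans several lines the pattern would need the re.DOTALL flag. Checking that the match list is non-empty before indexing [0] also gives a clearer error than an IndexError if the page layout changes.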
Step 4: convert the JSON into Python objects and write them to a file
last_day_corona_virus = json.loads(json_str)
with open('./last_day_corona_virus.json', 'w', encoding='utf8') as fp:
    json.dump(last_day_corona_virus, fp, ensure_ascii=False)
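As a quick check of the write step, the sketch below round-trips a small made-up record through json.dump and back. It writes to a temporary directory instead of ./last_day_corona_virus.json, and the record contents are illustrative assumptions rather than real data.

```python
import json
import os
import tempfile

# Made-up record standing in for the parsed data; field names are illustrative.
data = [{"provinceName": "湖北", "confirmedCount": 1}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")

    # ensure_ascii=False writes Chinese characters as-is
    # instead of \uXXXX escape sequences.
    with open(path, "w", encoding="utf8") as fp:
        json.dump(data, fp, ensure_ascii=False)

    # Read the file back as text so the stored form can be inspected.
    with open(path, encoding="utf8") as fp:
        raw = fp.read()

# Parsing the stored text should give back an equal Python object.
loaded = json.loads(raw)
print(raw)
print(loaded == data)
```

Without ensure_ascii=False the file would still load correctly, but the province names would be stored as escape sequences and be hard to read in a text editor.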