Building a Data Collection and Analysis System Based on the Shanxi Admissions Website (山西招生网)
pip install requests beautifulsoup4
import requests
url = 'https://www.sxzs.com/'
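response = requests.get(url, timeout=10)  # time out instead of hanging indefinitely
response.raise_for_status()  # stop early on an HTTP error status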
html_content = response.text
print(html_content[:500])  # print the first 500 characters
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')
# Collect the link text of every <a> tag whose href contains 'school'
school_names = [a.get_text(strip=True) for a in soup.find_all('a') if 'school' in a.get('href', '')]
print(school_names)
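The link filter above can be tried offline before pointing it at the live site. The following is a minimal sketch, assuming that school pages are linked through hrefs containing 'school'; the helper name `extract_school_links`, the sample HTML, and the paths in it are hypothetical and do not reflect the real page structure of www.sxzs.com.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_school_links(html, base_url):
    """Return (name, absolute_url) pairs for links whose href contains 'school'."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    # href=True skips <a> tags that have no href attribute at all
    for a in soup.find_all('a', href=True):
        href = a['href']
        name = a.get_text(strip=True)
        if 'school' in href and name:
            # Resolve relative links against the page URL
            results.append((name, urljoin(base_url, href)))
    return results


# Hypothetical markup, used only to exercise the function offline
sample_html = '''
<ul>
  <li><a href="/school/1001.html"> Example University A </a></li>
  <li><a href="/news/2024.html">Admissions notice</a></li>
  <li><a href="/school/1002.html">Example College B</a></li>
  <li><a>No link here</a></li>
</ul>
'''

print(extract_school_links(sample_html, 'https://www.sxzs.com/'))
```

Keeping the URL alongside the name makes it possible to visit each school page later; the substring test on the href would need to be adjusted once the actual link pattern of the site is known.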
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
response = requests.get(url, headers=headers, timeout=10)
import time
time.sleep(2)  # wait 2 seconds after each request
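When several pages are fetched, the delay is usually placed inside the loop so that consecutive requests are spaced out. The sketch below shows one possible way to do this; `fetch_all` and its parameters are hypothetical names, and the fetch function is passed in so the loop can be exercised without network access.

```python
import time
from typing import Callable, Dict, Iterable


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each URL in order, pausing between consecutive requests."""
    pages = {}
    for index, page_url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one
            sleep(delay_seconds)
        pages[page_url] = fetch(page_url)
    return pages
```

With the objects defined earlier, this could be called as `fetch_all([url], fetch=lambda u: requests.get(u, headers=headers, timeout=10).text)`. The 2-second default mirrors the delay used above and is not a documented limit of the site.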
import csv
with open('schools.csv', mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['School Name'])
    writer.writerows([[name] for name in school_names])
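Scraped link text often contains blanks and duplicates, so it can help to clean the list before saving and to read the file back as a check. This is a minimal sketch under that assumption; `save_school_names` and `load_school_names` are hypothetical helpers, not part of the original script.

```python
import csv
from typing import Iterable, List


def save_school_names(names: Iterable[str], path: str) -> List[str]:
    """Write unique, non-empty names to a one-column CSV and return them."""
    cleaned = []
    seen = set()
    for name in names:
        name = name.strip()
        # Skip empty strings and names already written, keeping first-seen order
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    with open(path, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(['School Name'])
        writer.writerows([name] for name in cleaned)
    return cleaned


def load_school_names(path: str) -> List[str]:
    """Read the names back, skipping the header row."""
    with open(path, newline='', encoding='utf-8') as file:
        reader = csv.reader(file)
        next(reader, None)  # skip the header row
        return [row[0] for row in reader if row]
```

If the file is meant to be opened in Excel on Windows, `encoding='utf-8-sig'` is a common alternative, since it writes a byte order mark that helps Excel detect UTF-8.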