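Column: Dance with GenAI

Everything about generative AI (AIGC)

AI Stock Trading: Using Kimi to Batch-Scrape the Headlines Section of NetEase Finance

Dance with GenAI · WeChat Official Account · 2024-06-01 07:26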


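Task and goal: batch-scrape the headlines (要闻) section of NetEase Finance.

The headlines sit in the div tag with class="tab_body current".

Each title and link is an a tag inside that div, for example: <a href="https://www.163.com/dy/article/J2UIO5DD051188EA.html">华为急需找到“松弛感”</a>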
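To make the target structure concrete, here is a minimal sketch that pulls title and link pairs out of a div with class="tab_body current" using only Python's standard library. The sample HTML is a hypothetical, cut-down stand-in for the real page, and the names CurrentTabLinks and extract_links are made up for this illustration; the article itself uses selenium for this step.

```python
from html.parser import HTMLParser

# Hypothetical, cut-down sample of the markup described above.
SAMPLE_HTML = """
<div class="tab_body current">
  <a href="https://www.163.com/dy/article/J2UIO5DD051188EA.html">华为急需找到“松弛感”</a>
</div>
<div class="tab_body">
  <a href="https://example.com/other.html">not in the current tab</a>
</div>
"""


class CurrentTabLinks(HTMLParser):
    """Collect (title, href) pairs for <a> tags inside div.tab_body.current."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # div nesting depth inside the target div (0 = outside)
        self.href = None  # href of the <a> currently open, if any
        self.text = []    # text fragments of the <a> currently open
        self.links = []   # collected (title, href) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            if self.depth > 0:
                self.depth += 1  # nested div inside the target div
            elif "tab_body" in classes and "current" in classes:
                self.depth = 1   # entered the target div
        elif tag == "a" and self.depth > 0 and attrs.get("href"):
            self.href = attrs["href"]
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
        elif tag == "a" and self.href is not None:
            self.links.append(("".join(self.text).strip(), self.href))
            self.href = None


def extract_links(html):
    """Return the (title, href) pairs found in the current tab of the given HTML."""
    parser = CurrentTabLinks()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
print(links)
```

Only the link inside the div carrying both class names is collected; the link in the plain tab_body div is ignored.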

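Step 1: enter the following prompt in Kimi:

You are a Python web-scraping expert. Write a Python script that completes the following scraping task:

Create a new Excel file, 163money.xlsx, in the folder F:\aivideo.

Set the chromedriver path to "D:\Program Files\chromedriver125\chromedriver.exe".

Open the page https://money.163.com/ with selenium.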

Request headers:

:authority: money.163.com
:method: GET
:path:
:scheme: https
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate, br, zstd
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8
Cache-Control: max-age=0
Cookie: __root_domain_v=.163.com; _qddaz=QD.484716194472545; _ntes_origin_from=sogou; _ntes_nuid=af525edefbd3d363cd5876dbe902d85b; s_n_f_l_n3=01cb466d244b7db51716517702302; _antanalysis_s_id=1716517702995; UserProvince=%u5168%u56FD; ne_analysis_trace_id=1716518441938; vinfo_n_f_l_n3=01cb466d244b7db5.1.0.1716517702302.0.1716518672850
If-Modified-Since: Fri, 24 May 2024 02:40:05 GMT
Priority: u=0, i
Referer: https://www.sogou.com/link?url=hedJjaC291M4oaTBlXc5yCiioa5eDVgw
Sec-Ch-Ua: "Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"
Sec-Ch-Ua-Mobile:
Sec-Ch-Ua-Platform: "Windows"
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: cross-site
Sec-Fetch-User:
Upgrade-Insecure-Requests:
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36

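Parse the page source and print it.

Locate the div tag with class="tab_body current".

Within that div, locate all a tags; extract each a tag's href as the page download URL and save it to column 2 of 163money.xlsx.

Extract each a tag's text as the page file name and save it to column 1 of 163money.xlsx.

Notes:

Print progress information to the screen at every step.

Pause for a random 1-10 seconds after each page is parsed.

Set request headers to cope with anti-scraping measures.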
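The prompt pastes the request headers exactly as copied from the browser's DevTools panel, with each name and its value on separate lines. As a rough sketch of how such a block could be turned into a usable dictionary, the helper below (parse_devtools_headers, a name invented for this example) pairs names with values, drops the HTTP/2 pseudo-headers that start with a colon, and skips names whose value is missing. The sample input is an abridged copy of the headers above.

```python
# Abridged copy of the DevTools header block: name on one line, value on the next.
RAW_HEADERS = """\
:authority:
money.163.com
:method:
GET
Accept-Language:
zh-CN,zh;q=0.9,en;q=0.8
Sec-Ch-Ua-Mobile:
Sec-Ch-Ua-Platform:
"Windows"
User-Agent:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36
"""


def parse_devtools_headers(raw):
    """Turn alternating name/value lines into a dict of ordinary headers."""
    headers = {}
    name = None
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A header name; remember it until its value line arrives.
            # If another name follows immediately, the earlier one had no value.
            name = line[:-1]
        elif name is not None:
            # Skip HTTP/2 pseudo-headers such as :authority and :method.
            if not name.startswith(":"):
                headers[name] = line
            name = None
    return headers


headers = parse_devtools_headers(RAW_HEADERS)
for key, value in headers.items():
    print(f"{key}: {value}")

# One possible way to apply them with Selenium 4 and Chrome (an assumption,
# not something the article shows):
#   driver.execute_cdp_cmd("Network.enable", {})
#   driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {"headers": headers})
```

Because selenium drives a real Chrome instance, the browser already sends most of these headers on its own, so overriding them may be unnecessary in practice.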

Step 2: review the Python code that Kimi generated:

import pandas as pd
import time
import random
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.common.exceptions import StaleElementReferenceException

# Set the chromedriver path explicitly, as the prompt asks
service = Service(executable_path=r"D:\Program Files\chromedriver125\chromedriver.exe")

# Start Chrome using the chromedriver configured above
driver = webdriver.Chrome(service=service)

# Open the page
url = "https://money.163.com/"
driver.get(url)

# Print the page source
print("Page source:")
print(driver.page_source)

# Locate the div tag with class="tab_body current"
try:
    # The class attribute holds two class names, so match both with a CSS selector
    tab_body = driver.find_element(By.CSS_SELECTOR, "div.tab_body.current")
    print("Located the div tag with class='tab_body current'")
except Exception as e:
    print("Error locating the div tag with class='tab_body current':", e)
    driver.quit()
    # Stop here: tab_body is undefined, so the lookup below would fail
    raise

# Locate all a tags and extract their information
a_tags = tab_body.find_elements(By.TAG_NAME, "a")
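The listing breaks off at the a_tags lookup. The following is a minimal sketch of how the remaining steps requested in the prompt might look; it is not the code Kimi produced. The function names collect_rows, polite_pause and save_rows, as well as the column labels "title" and "url", are assumptions made for this example.

```python
import random
import time


def collect_rows(a_tags):
    """Build [title, url] rows from link elements, skipping links without text or href."""
    rows = []
    for a in a_tags:
        title = (a.text or "").strip()
        href = a.get_attribute("href")
        if title and href:
            print(f"Found: {title} -> {href}")
            rows.append([title, href])
    return rows


def polite_pause(low=1.0, high=10.0):
    """Sleep for a random interval (1-10 seconds by default), as the prompt requests."""
    delay = random.uniform(low, high)
    print(f"Pausing for {delay:.1f} seconds")
    time.sleep(delay)
    return delay


def save_rows(rows, path=r"F:\aivideo\163money.xlsx"):
    """Write titles to column 1 and URLs to column 2 of the Excel file."""
    # Imported lazily; writing .xlsx with pandas also needs openpyxl installed.
    import pandas as pd

    df = pd.DataFrame(rows, columns=["title", "url"])
    df.to_excel(path, index=False)
    print(f"Saved {len(df)} rows to {path}")


# Possible usage with the a_tags list from the listing above:
#   rows = collect_rows(a_tags)
#   polite_pause()
#   save_rows(rows)
#   driver.quit()
```

collect_rows only relies on each element having a text attribute and a get_attribute method, which selenium's WebElement provides. The script visits a single page, so the random pause matters mainly if it is later extended to open each collected link.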
