Web Scraping in a Google Chrome Extension (JavaScript + Chrome APIs)

2023-05-15, Front-end development questions, 跟版网

Question

What are the best options for scraping a page that is not currently open in a tab, from within a Google Chrome extension, using JavaScript and whatever other technologies are available? Other JavaScript libraries are also acceptable.

The important thing is to mask the scraping so that it behaves like a normal web request: no indications of AJAX or XMLHttpRequest, such as an X-Requested-With: XMLHttpRequest or Origin header.

The scraped content must be accessible from JavaScript for further manipulation and presentation within the extension, most probably as a string.

Are there any hooks in WebKit/Chrome-specific APIs that can be used to make a normal web request and get the result for further manipulation?

var pageContent = getPageContent(url); // TODO: Implement
var items = $(pageContent).find('.item');
// Display items with further selections

Bonus points for making this work with a local file on disk, for initial debugging. But if that is the only thing standing in the way of a solution, disregard the bonus points.

Recommended answer

Try XHR2 responseType = "document", and fall back on (new DOMParser).parseFromString(responseText, getResponseHeader("Content-Type")) with my text/html patch (https://gist.github.com/1129031). See https://gist.github.com/1138724 for an example of how I detect responseType = "document" support (synchronously checking response === null on an object URL created from a text/html blob).
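A minimal sketch of this first suggestion, assuming an extension page (or any browser context) where XMLHttpRequest and DOMParser exist. The names getPageDocument, getPageDocumentViaParser and mimeTypeForParser are hypothetical, not part of any API. Instead of the gist's blob-based detection, it swaps in a simpler check: assign responseType = "document" and read it back, since Web IDL makes browsers ignore unsupported values. It also strips parameters from the Content-Type header, because parseFromString expects a bare MIME type such as text/html rather than "text/html; charset=utf-8".

```javascript
// Sketch with hypothetical helper names: load a page as a DOM Document.
// Assumes a browser/extension context providing XMLHttpRequest and DOMParser.

// MIME types accepted by DOMParser.parseFromString().
var PARSER_TYPES = ["text/html", "text/xml", "application/xml",
                    "application/xhtml+xml", "image/svg+xml"];

// Reduce a Content-Type header ("text/html; charset=utf-8") to a bare MIME
// type the parser accepts; anything unknown or missing is treated as HTML.
function mimeTypeForParser(contentType) {
  var type = String(contentType || "").split(";")[0].trim().toLowerCase();
  return PARSER_TYPES.indexOf(type) !== -1 ? type : "text/html";
}

// Fallback path: download the body as text and parse it ourselves.
function getPageDocumentViaParser(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  xhr.onload = function () {
    var type = mimeTypeForParser(xhr.getResponseHeader("Content-Type"));
    callback(null, new DOMParser().parseFromString(xhr.responseText, type));
  };
  xhr.onerror = function () { callback(new Error("Request failed: " + url)); };
  xhr.send();
}

// Preferred path: let XHR2 parse the response (responseType = "document").
function getPageDocument(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  xhr.responseType = "document";
  // An unsupported responseType value is ignored, so read it back to detect support.
  if (xhr.responseType !== "document") {
    getPageDocumentViaParser(url, callback);
    return;
  }
  xhr.onload = function () {
    if (xhr.response) {
      callback(null, xhr.response);
    } else {
      // response is null e.g. for content that is not HTML/XML; retry as text.
      getPageDocumentViaParser(url, callback);
    }
  };
  xhr.onerror = function () { callback(new Error("Request failed: " + url)); };
  xhr.send();
}
```

With such a helper, the question's pseudocode becomes asynchronous, for example getPageDocument(url, function (err, doc) { var items = doc.querySelectorAll(".item"); }); and doc.documentElement.outerHTML returns the markup as a string if a string is what the extension needs.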

Use the Chrome webRequest API to remove X-Requested-With and similar headers (such as Origin) that would give the request away.
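A minimal sketch of the header-stripping idea, assuming a Manifest V2 extension whose background page registers the listener and whose manifest grants the "webRequest" and "webRequestBlocking" permissions plus host permissions (for example "<all_urls>"); Manifest V3 replaces blocking webRequest with declarativeNetRequest. stripScrapingHeaders and HEADERS_TO_STRIP are hypothetical names. Only touching requests with tabId === -1 is an assumption meant to limit the change to requests that do not belong to a tab, such as those the extension itself issues. "extraHeaders" is included because Chrome's webRequest documentation lists Origin among the headers that are only visible and modifiable with it.

```javascript
// Sketch: strip request headers that mark a request as AJAX / cross-origin.
// Hypothetical names; assumes a Manifest V2 extension with blocking webRequest.

var HEADERS_TO_STRIP = ["x-requested-with", "origin"];

// onBeforeSendHeaders listener: returns a BlockingResponse-style object.
function stripScrapingHeaders(details) {
  // tabId is -1 for requests not tied to a tab (e.g. from the background page);
  // leave ordinary browsing traffic untouched.
  if (details.tabId !== -1 || !details.requestHeaders) {
    return {};
  }
  var headers = details.requestHeaders.filter(function (header) {
    return HEADERS_TO_STRIP.indexOf(header.name.toLowerCase()) === -1;
  });
  return { requestHeaders: headers };
}

// Register only when running inside Chrome with the webRequest API available.
if (typeof chrome !== "undefined" && chrome.webRequest) {
  chrome.webRequest.onBeforeSendHeaders.addListener(
    stripScrapingHeaders,
    { urls: ["<all_urls>"] },
    ["blocking", "requestHeaders", "extraHeaders"]
  );
}
```

Keeping the filtering in a plain function separate from the registration makes the header logic easy to check outside the browser, while the addListener call stays a thin wrapper.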
