Problem description
I have been working with dc.js and crossfilter.js. My dataset is large, a 60 MB CSV with 550,000 rows, and I am running into many problems with it, such as browser crashes.
So I'm trying to understand how dc and crossfilter deal with large datasets. http://dc-js.github.io/dc.js/
The example on their main site runs very smoothly. In the Timeline > Memory view of the browser's developer tools, it peaks at about 34 MB and slowly decreases over time.
My project uses 300-500 MB of memory per dropdown selection when it loads a JSON file and renders the entire visualization.
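So, two questions:
- What is the backend for the dc site example? Is it possible to find the exact backend file?
- How can I reduce the data load my application puts on my RAM? The application runs very slowly and eventually crashes.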
Recommended answer
Hi, you can try loading the data and filtering it on the server. I faced a similar problem when my dataset was too big for the browser to handle. A few weeks back I posted a question about implementing this: "Using dc.js on the clientside with crossfilter on the server".
Here is an overview of it.
On the client side, you create fake dimensions and fake groups that provide the basic functionality dc.js expects (https://github.com/dc-js/dc.js/wiki/FAQ#filter-the-data-before-its-charted). You create your dc.js charts on the client side and plug in the fake dimensions and groups wherever they are required.
On the server side, you have crossfilter running (https://www.npmjs.org/package/crossfilter). This is where you create your actual dimensions and groups.
The fake dimensions have a .filter() function that sends an ajax request to the server to perform the actual filtering. The filtering information can be encoded as a query string. You also need a .all() function on your fake groups to return the results of the filtering.
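A minimal sketch of the client-side half described above, under stated assumptions. The names `encodeFilters` and `createRemoteCrossfilter`, the `/api/groups` endpoint, and the response shape (an object mapping group names to arrays of `{ key, value }`) are all hypothetical; they are not part of dc.js or crossfilter. Real dc.js charts may also call other methods on a dimension beyond the two shown here, depending on the dc.js version, so treat this as an outline rather than a drop-in implementation.

```javascript
// Encode the active filters, e.g. { year: [2001, 2005], carrier: "AA" },
// as a query string. JSON-encoding each value keeps ranges and scalars distinct.
function encodeFilters(filters) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(filters)) {
    params.set(name, JSON.stringify(value));
  }
  return params.toString();
}

// Hypothetical helper: keeps the filter state on the client and caches the
// latest group results returned by the server-side crossfilter.
// `fetchJson` is injectable so the sketch can be exercised without a server.
function createRemoteCrossfilter(
  endpoint,
  fetchJson = (url) => fetch(url).then((response) => response.json())
) {
  const filters = {}; // dimension name -> current filter value
  let results = {};   // group name -> [{ key, value }, ...]

  // Ask the server to apply the current filters and return all group results.
  async function refresh() {
    const query = encodeFilters(filters);
    results = await fetchJson(query ? `${endpoint}?${query}` : endpoint);
    return results;
  }

  // Fake dimension: records the filter locally, then delegates to the server.
  function fakeDimension(name) {
    return {
      filter(value) {
        if (value === null || value === undefined) {
          delete filters[name]; // a null filter clears this dimension
        } else {
          filters[name] = value;
        }
        return refresh();
      },
      filterAll() {
        delete filters[name];
        return refresh();
      },
    };
  }

  // Fake group: all() returns the most recent server results for this group.
  function fakeGroup(name) {
    return {
      all() {
        return results[name] || [];
      },
    };
  }

  return { fakeDimension, fakeGroup, refresh };
}
```

Because the request is asynchronous, `.all()` keeps returning the previous results until the response arrives. The charts therefore need to be redrawn once the promise returned by `.filter()` resolves, for example with `dc.redrawAll()`.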