Parsing a large JSON file in .NET

Question

So far I have used Json.NET's "JsonConvert.DeserializeObject(json)" method, which worked quite well, and to be honest I didn't need anything more than that.

I am working on a background (console) application that constantly downloads JSON content from different URLs and then deserializes the result into a list of .NET objects.

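    using (WebClient client = new WebClient())
    {
        // Download the whole response into memory as a single string.
        string json = client.DownloadString(stringUrl);

        // Deserialize the entire array in one call.
        var result = JsonConvert.DeserializeObject<List<Contact>>(json);
    }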

The simple code snippet above probably doesn't look perfect, but it does the job. When the file is large (15,000 contacts, a 48 MB file), JsonConvert.DeserializeObject no longer works: that line throws a JsonReaderException.

The downloaded JSON content is an array, and a sample looks like the one below. Contact is a container class for the deserialized JSON object.

[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
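The Contact class itself is not shown in the question. A minimal sketch of what it might look like, assuming it models only the two fields visible in the sample (the property names and the attributes are assumptions, not the asker's actual code):

```csharp
using Newtonsoft.Json;

// Hypothetical container class: only the two fields from the sample are modeled.
public class Contact
{
    // Maps the lowercase JSON key "firstname" to a conventionally named property.
    [JsonProperty("firstname")]
    public string FirstName { get; set; }

    // Maps the lowercase JSON key "lastname" likewise.
    [JsonProperty("lastname")]
    public string LastName { get; set; }
}
```

Json.NET matches property names case-insensitively by default, so the attributes are probably optional here; they are spelled out only to make the mapping explicit.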

My initial guess was that it runs out of memory. Just out of curiosity, I tried to parse it as a JArray, which caused the same exception.

I have started to dig into the Json.NET documentation and read similar threads. As I haven't managed to produce a working solution yet, I decided to post a question here.

UPDATE: While deserializing line by line, I got the same error: " [. Path '', line 600003, position 1." So I downloaded two of the files and checked them in Notepad++. I noticed that if the array has more than 12,000 elements, the array is closed with "]" after the 12,000th element and another array starts with "[". In other words, the JSON looks exactly like this:

[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
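The failure can be reproduced on a small scale. The sketch below is an illustration rather than the asker's code: it feeds two back-to-back arrays to JsonConvert.DeserializeObject and prints whatever Json.NET exception comes back. The inline JSON string and the Contact shape are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical container class matching the sample fields.
public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Two arrays back to back, mimicking the downloaded payload on a tiny scale.
        string json =
            "[{\"firstname\":\"a\",\"lastname\":\"b\"}]\n" +
            "[{\"firstname\":\"c\",\"lastname\":\"d\"}]";

        try
        {
            JsonConvert.DeserializeObject<List<Contact>>(json);
            Console.WriteLine("Parsed without error");
        }
        catch (JsonException ex)
        {
            // JsonException is the base class of Json.NET's reader and serializer exceptions.
            Console.WriteLine(ex.GetType().Name + ": " + ex.Message);
        }
    }
}
```

The exception message should include the line and position at which the reader gave up, which is how the second "[" was located in the update above.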

Recommended answer

As you've correctly diagnosed in your update, the issue is that the JSON has a closing ] followed immediately by an opening [ that starts the next set. This makes the JSON invalid when taken as a whole, and that is why Json.NET throws an error.

Fortunately, this problem seems to come up often enough that Json.NET has a special setting to deal with it. If you use a JsonTextReader directly to read the JSON, you can set its SupportMultipleContent property to true and then use a loop to deserialize each item individually.

This should allow you to process the non-standard JSON successfully and in a memory-efficient manner, regardless of how many arrays there are or how many items each array contains.

    using (WebClient client = new WebClient())
    using (Stream stream = client.OpenRead(stringUrl))
    using (StreamReader streamReader = new StreamReader(stream))
    using (JsonTextReader reader = new JsonTextReader(streamReader))
    {
        // Allow more than one top-level JSON value in the same stream.
        reader.SupportMultipleContent = true;

        var serializer = new JsonSerializer();
        while (reader.Read())
        {
            // Skip array tokens; deserialize one object at a time.
            if (reader.TokenType == JsonToken.StartObject)
            {
                Contact c = serializer.Deserialize<Contact>(reader);
                Console.WriteLine(c.FirstName + " " + c.LastName);
            }
        }
    }

Full demo: https://dotnetfiddle.net/2TQa8p
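If the contacts need to be consumed elsewhere rather than printed inside the loop, the same technique can be wrapped in an iterator. The following is a sketch under stated assumptions: ContactReader and ReadContacts are hypothetical names, the Contact shape is guessed from the sample, and a StringReader with inline JSON stands in for the network stream so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical container class matching the sample fields.
public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class ContactReader
{
    // Hypothetical helper: lazily yields one Contact per JSON object in the input,
    // no matter how many top-level arrays the input contains.
    public static IEnumerable<Contact> ReadContacts(TextReader input)
    {
        using (var reader = new JsonTextReader(input) { SupportMultipleContent = true })
        {
            var serializer = new JsonSerializer();
            while (reader.Read())
            {
                // Array start/end tokens are skipped; each object is deserialized on its own.
                if (reader.TokenType == JsonToken.StartObject)
                {
                    yield return serializer.Deserialize<Contact>(reader);
                }
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Inline stand-in for the downloaded payload: two concatenated arrays.
        string json =
            "[{\"firstname\":\"a\",\"lastname\":\"b\"}]\n" +
            "[{\"firstname\":\"c\",\"lastname\":\"d\"}]";

        using (var input = new StringReader(json))
        {
            foreach (Contact c in ContactReader.ReadContacts(input))
            {
                Console.WriteLine(c.FirstName + " " + c.LastName);
            }
        }
    }
}
```

Because the iterator yields contacts one at a time, only the object currently being deserialized needs to be held in memory. With a real download, the StringReader would be replaced by a StreamReader over the response stream, as in the answer's snippet.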
