python - How to write to a JSON file incrementally

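I am writing a program that has to generate a very large JSON file. I know the conventional approach is to dump a list of dictionaries with json.dump(), but the list is too large: even total memory plus swap space cannot hold it before it is dumped. Is there a way to stream it into a JSON file, that is, to write the data to the file incrementally?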

Solution

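I know this is a year late, but the question is still open, and I'm surprised that the iterencode() method of json.JSONEncoder hasn't been mentioned.

The potential problem with iterencode in this case is that you want to process a large data set iteratively by using a generator, and the JSON encoder does not serialise generators.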
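The limitation is easy to reproduce. A minimal sketch, assuming Python 3 and a hypothetical generator named number_generator (not part of the original answer); handing the generator straight to iterencode() raises a TypeError as soon as the chunks are consumed:

```python
import json


def number_generator():
    # Hypothetical stand-in for a generator over a large data set.
    for i in range(3):
        yield {'hello_world': i}


try:
    # iterencode() is lazy, so the error only surfaces once chunks are pulled.
    chunks = list(json.JSONEncoder().iterencode(number_generator()))
    generator_error = None
except TypeError as exc:
    generator_error = exc
    print('Cannot serialise generator:', exc)
```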

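The way around this is to subclass the list type and override the __iter__ magic method so that it yields the output of the generator.

Below is an example of such a list subclass.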

    class StreamArray(list):
        """
        Converts a generator into a list object that can be JSON serialisable
        while still retaining the iterative nature of a generator.

        I.e. it converts it to a list without having to exhaust the generator
        and keep its contents in memory.
        """
        def __init__(self, generator):
            self.generator = generator
            # Non-zero so the encoder does not treat the list as empty.
            self._len = 1

        def __iter__(self):
            self._len = 0
            for item in self.generator:
                yield item
                self._len += 1

        def __len__(self):
            """
            The JSON encoder looks for this method to confirm whether or not
            the object can be encoded.
            """
            return self._len

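From here on it is very simple to use. Get the generator handle, pass it into the StreamArray class, pass the stream array object to iterencode() and iterate over the chunks. The chunks are JSON-formatted output that can be written directly to the file.

Example usage: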

    import json

    # Function that will iteratively generate a large set of data.
    def large_list_generator_func():
        for i in range(5):
            chunk = {'hello_world': i}
            print('Yielding chunk:', chunk)
            yield chunk

    # Write the contents to file (StreamArray is the class defined above):
    with open('/tmp/streamed_write.json', 'w') as outfile:
        large_generator_handle = large_list_generator_func()
        stream_array = StreamArray(large_generator_handle)
        for chunk in json.JSONEncoder().iterencode(stream_array):
            print('Writing chunk:', chunk)
            outfile.write(chunk)

The output shows that the yields and the writes happen in turn.

    Yielding chunk: {'hello_world': 0}
    Writing chunk: [
    Writing chunk: {
    Writing chunk: "hello_world"
    Writing chunk: :
    Writing chunk: 0
    Writing chunk: }
    Yielding chunk: {'hello_world': 1}
    Writing chunk: ,
    Writing chunk: {
    Writing chunk: "hello_world"
    Writing chunk: :
    Writing chunk: 1
    Writing chunk: }
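If subclassing list feels too closely tied to how the encoder walks lists, a plainer alternative (not the method described above) is to write the array brackets and separators by hand and serialise one item at a time with json.dumps(). A minimal sketch, assuming Python 3; write_json_array and numbered_records are hypothetical names chosen for illustration, and an in-memory io.StringIO stands in for the output file:

```python
import io
import json


def write_json_array(items, outfile):
    """Write any iterable to outfile as one JSON array, one item at a time."""
    outfile.write('[')
    for index, item in enumerate(items):
        if index:
            # Separator goes before every item except the first.
            outfile.write(', ')
        outfile.write(json.dumps(item))
    outfile.write(']')


def numbered_records(count):
    # Hypothetical generator standing in for a large data source.
    for i in range(count):
        yield {'hello_world': i}


buffer = io.StringIO()
write_json_array(numbered_records(5), buffer)
streamed_text = buffer.getvalue()
```

Only one item is held in memory at a time, and the result parses back with json.loads() as an ordinary list. Replacing the StringIO with a file opened by open(path, 'w') should give the same behaviour on disk.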
