HDF5 - concurrency, compression & I/O performance

Question

I have the following questions about HDF5 performance and concurrency:

1. Does HDF5 support concurrent write access?
2. Concurrency considerations aside, how does HDF5 perform in terms of I/O (and do compression rates affect that performance)?
3. Since I use HDF5 with Python, how does its performance compare to SQLite?

References:

• http://www.sqlite.org/faq.html#q5
• Can a SQLite file be locked on an NFS filesystem?
                • http://pandas.pydata.org/

Answer

Updated to use pandas 0.13.1

1) No. See http://pandas.pydata.org/pandas-docs/dev/io.html#notes-caveats. There are various ways to work around this, e.g. have your different threads/processes write out their computation results, then have a single process combine them.
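A minimal sketch of that workaround, assuming pandas with PyTables installed (all file and key names below are made up for illustration): each worker writes its results to its own HDF5 file, and a single process concatenates them afterwards.

```python
import os
import tempfile

import numpy as np
import pandas as pd

workdir = tempfile.mkdtemp()

def worker(i):
    # stand-in for an expensive computation done in its own process
    part = pd.DataFrame(np.random.randn(1000, 2), columns=list('AB'))
    # each worker owns its file, so there is no concurrent-write problem
    part.to_hdf(os.path.join(workdir, 'part_%d.h5' % i),
                key='results', mode='w')

# run sequentially here for brevity; in practice these would be
# separate processes (e.g. a multiprocessing.Pool)
for i in range(4):
    worker(i)

# a single process combines the pieces
parts = [pd.read_hdf(os.path.join(workdir, 'part_%d.h5' % i), 'results')
         for i in range(4)]
combined = pd.concat(parts, ignore_index=True)
print(combined.shape)  # (4000, 2)
```

Because each file has exactly one writer, no locking is needed; only the final combine step touches all the files, and it only reads.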

2) Depending on the type of data you store, how you store it, and how you want to retrieve it, HDF5 can offer vastly better performance. Storing float data in an HDFStore as a single array, compressed (in other words, not in a format that allows querying), will be stored and read amazingly fast. Even the table format (which slows down write performance) offers quite good write performance. You can look at PyTables (which is what HDFStore uses under the hood) for some detailed comparisons: http://www.pytables.org/. [The original answer included a PyTables benchmark chart here.]
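As a sketch of the compressed, non-queryable fixed format described above (file and key names are my own; assumes PyTables with blosc support is installed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100000, 2), columns=list('AB'))

# fixed format + blosc compression: fastest to write and read back,
# but the frame can only be retrieved whole -- no on-disk querying
df.to_hdf('test_blosc.h5', key='df', mode='w',
          complib='blosc', complevel=9)

back = pd.read_hdf('test_blosc.h5', 'df')
print(back.shape)  # (100000, 2)
```

Compression trades a little CPU for much less disk I/O, which on most hardware is a net win for numeric data like this.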

(And since PyTables 2.3, queries are indexed, so performance is actually much better than shown there.) So to answer your questions: if you want any kind of performance, HDF5 is the way to go.
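To illustrate those indexed queries, a hedged sketch of the table format with `data_columns` enabled, which lets `read_hdf` evaluate a `where` clause on disk rather than loading everything (file and key names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100000, 2), columns=list('AB'))

# table format: slower writes, but columns listed as data_columns are
# indexed on disk and can be filtered with a where clause
df.to_hdf('test_query.h5', key='df', format='table',
          data_columns=True, mode='w')

# only matching rows are read from disk
subset = pd.read_hdf('test_query.h5', 'df', where='A > 0')
print((subset['A'] > 0).all())  # True
```

This is the trade-off the answer describes: you pay at write time for the table layout, and get queryability in return.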

Writing:

                In [14]: %timeit test_sql_write(df)
                1 loops, best of 3: 6.24 s per loop
                
                In [15]: %timeit test_hdf_fixed_write(df)
                1 loops, best of 3: 237 ms per loop
                
                In [16]: %timeit test_hdf_table_write(df)
                1 loops, best of 3: 901 ms per loop
                
                In [17]: %timeit test_csv_write(df)
                1 loops, best of 3: 3.44 s per loop
                

Reading:

                In [18]: %timeit test_sql_read()
                1 loops, best of 3: 766 ms per loop
                
                In [19]: %timeit test_hdf_fixed_read()
                10 loops, best of 3: 19.1 ms per loop
                
                In [20]: %timeit test_hdf_table_read()
                10 loops, best of 3: 39 ms per loop
                
                In [22]: %timeit test_csv_read()
                1 loops, best of 3: 620 ms per loop
                

Here is the code:

import os
import sqlite3

import numpy as np
import pandas as pd
from pandas import DataFrame
from pandas.io import sql  # pandas 0.13-era API (write_frame/read_frame)

# 1,000,000 rows, two float64 columns
df = DataFrame(np.random.randn(1000000, 2), columns=list('AB'))

def test_sql_write(df):
    if os.path.exists('test.sql'):
        os.remove('test.sql')
    sql_db = sqlite3.connect('test.sql')
    sql.write_frame(df, name='test_table', con=sql_db)
    sql_db.close()

def test_sql_read():
    sql_db = sqlite3.connect('test.sql')
    sql.read_frame("select * from test_table", sql_db)
    sql_db.close()

def test_hdf_fixed_write(df):
    df.to_hdf('test_fixed.hdf', 'test', mode='w')

def test_hdf_fixed_read():
    pd.read_hdf('test_fixed.hdf', 'test')

def test_hdf_table_write(df):
    df.to_hdf('test_table.hdf', 'test', format='table', mode='w')

def test_hdf_table_read():
    pd.read_hdf('test_table.hdf', 'test')

def test_csv_write(df):
    df.to_csv('test.csv', mode='w')

def test_csv_read():
    pd.read_csv('test.csv', index_col=0)

Of course, YMMV.
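Outside IPython the `%timeit` magic is unavailable; a rough equivalent using the standard-library `timeit` module (the smaller frame and file names are my own, so absolute numbers will differ from those quoted above):

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100000, 2), columns=list('AB'))

# best of 3 single runs, mirroring what %timeit reports
t_hdf = min(timeit.repeat(lambda: df.to_hdf('t.h5', key='t', mode='w'),
                          number=1, repeat=3))
t_csv = min(timeit.repeat(lambda: df.to_csv('t.csv', mode='w'),
                          number=1, repeat=3))
print('hdf: %.3fs  csv: %.3fs' % (t_hdf, t_csv))
```

Taking the minimum of several repeats reduces noise from other processes, which is the same convention IPython's `%timeit` uses.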

