Append data to HDF5 file with Pandas, Python

Question

Asked by Karl

I have large pandas DataFrames with financial data. I have no problem appending and concatenating additional columns and DataFrames to my .h5 file.

The financial data is updated every minute, so I need to append a row of data to all of the existing tables inside my .h5 file every minute.

Here is what I have tried so far, but no matter what I do, it overwrites the .h5 file instead of just appending data.

HDFStore way:

#we open the hdf5 file
save_hdf = HDFStore('test.h5') 

ohlcv_candle.to_hdf('test.h5')

#we give the dataframe a key value
#format=table so we can append data
save_hdf.put('name_of_frame',ohlcv_candle, format='table',  data_columns=True)

#we print our dataframe by calling the hdf file with the key
#just doing this as a test
print(save_hdf['name_of_frame'])

The other way I have tried it, to_hdf:

#format=t so we can append data , mode=r+ to specify the file exists and
#we want to append to it
tohlcv_candle.to_hdf('test.h5',key='this_is_a_key', mode='r+', format='t')

#again just printing to check if it worked 
print(pd.read_hdf('test.h5', key='this_is_a_key'))

Here is what one of the DataFrames looks like after being read back with read_hdf:

           time     open     high      low    close     volume           PP  
0    1505305260  3137.89  3147.15  3121.17  3146.94   6.205397  3138.420000   
1    1505305320  3146.86  3159.99  3130.00  3159.88   8.935962  3149.956667   
2    1505305380  3159.96  3160.00  3159.37  3159.66   4.524017  3159.676667   
3    1505305440  3159.66  3175.51  3151.08  3175.51   8.717610  3167.366667   
4    1505305500  3175.25  3175.53  3170.44  3175.53   3.187453  3173.833333

The next time I get data (every minute), I would like a row of it added at index 5 of all my columns, then 6, then 7, and so on, without having to read and manipulate the entire file in memory, as that would defeat the point of doing this. If there is a better way of solving this, do not be shy to recommend it.

P.S. Sorry for the formatting of that table in here.

Answer 1

Answered by MaxU

pandas.HDFStore.put() has a parameter append, which defaults to False; that default tells Pandas to overwrite instead of appending.

So try this:

store = pd.HDFStore('test.h5')

store.append('name_of_frame', ohlcv_candle, format='t',  data_columns=True)

We can also use store.put(..., append=True), but the data in the file should also have been created in table format:

store.put('name_of_frame', ohlcv_candle, format='t', append=True, data_columns=True)

NOTE: appending works only for the table format (format='t' is an alias for format='table').
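
As a rough illustration of the answer above, the sketch below appends one extra row with store.append and then reads back only the last row from disk. The column values, the temporary file path, and the use of get_storer(...).nrows and select(..., start=...) to avoid loading the whole table are assumptions added for the example, not part of the original answer.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for ohlcv_candle.
first_rows = pd.DataFrame(
    {"time": [1505305260, 1505305320], "close": [3146.94, 3159.88]}
)
# The new minute's row, given the next index value explicitly.
next_row = pd.DataFrame({"time": [1505305380], "close": [3159.66]}, index=[2])

# Temporary location so the sketch does not touch a real test.h5.
path = os.path.join(tempfile.mkdtemp(), "test.h5")

with pd.HDFStore(path) as store:
    # The first append creates the table; later appends add rows to it.
    store.append("name_of_frame", first_rows, format="table", data_columns=True)
    store.append("name_of_frame", next_row, format="table", data_columns=True)

    # The row count comes from the table's metadata, not from loading the data.
    nrows = store.get_storer("name_of_frame").nrows
    # Read only the last row from disk.
    last_row = store.select("name_of_frame", start=nrows - 1)

print(nrows)
print(last_row)
```

Run once a minute with the new row, the second store.append call is the only write that is needed; nothing already in the table has to be read into memory first.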

Answer 2

Answered by Nikhil VJ

tohlcv_candle.to_hdf('test.h5',key='this_is_a_key', append=True, mode='r+', format='t')

You need to pass another argument, append=True, to specify that the data is to be appended to existing data if found under that key, instead of overwriting it.

Without this, the default is False, and if it encounters an existing table under 'this_is_a_key' then it overwrites it.
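
To make the difference concrete, here is a minimal sketch that writes to the same key twice, once with append=True and once without. The one-row frames and the temporary file path are made up for the example.

```python
import os
import tempfile

import pandas as pd

# Hypothetical one-row frames standing in for the minute-by-minute data.
row_a = pd.DataFrame({"close": [3146.94]}, index=[0])
row_b = pd.DataFrame({"close": [3159.88]}, index=[1])

path = os.path.join(tempfile.mkdtemp(), "test.h5")

# Create the table, then append a second row under the same key.
row_a.to_hdf(path, key="this_is_a_key", format="table")
row_b.to_hdf(path, key="this_is_a_key", format="table", append=True)
appended = pd.read_hdf(path, key="this_is_a_key")

# The same call without append=True replaces the data under the key.
row_b.to_hdf(path, key="this_is_a_key", format="table")
replaced = pd.read_hdf(path, key="this_is_a_key")

print(len(appended), len(replaced))
```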

The mode= argument applies only at file level, telling whether the file as a whole is to be overwritten or appended to.

One file can have any number of keys, so a mode='a', append=False setting means only one key gets overwritten while the other keys stay.
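
The file-level effect of mode= can be seen by listing the keys after each write. This is only a sketch; the key names and the temporary path are invented for the illustration.

```python
import os
import tempfile

import pandas as pd

row = pd.DataFrame({"close": [3146.94]})
path = os.path.join(tempfile.mkdtemp(), "test.h5")

# mode='a' keeps the existing file, so the second key is added next to the first.
row.to_hdf(path, key="first_key", format="table", mode="a")
row.to_hdf(path, key="second_key", format="table", mode="a")
with pd.HDFStore(path, mode="r") as store:
    keys_after_a = sorted(store.keys())

# mode='w' recreates the whole file, so the earlier keys are gone.
row.to_hdf(path, key="third_key", format="table", mode="w")
with pd.HDFStore(path, mode="r") as store:
    keys_after_w = sorted(store.keys())

print(keys_after_a)
print(keys_after_w)
```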

I had a similar experience to yours and found the additional append argument in the reference doc. After setting it, it now appends properly for me.

Ref: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html

Note: HDF5 won't do anything with the DataFrame's index when appending; the index values are stored as they are, not renumbered. We need to iron those out before putting the data in or when we take it out.
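
A small sketch of what that note means in practice, assuming each new row arrives as a fresh DataFrame with the default index starting at 0. The key name "candles", the sample values, and the two clean-up options shown (reset_index and set_index on the time column) are illustrative choices, not something the original answer prescribes.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data; both frames use the default index starting at 0.
frame = pd.DataFrame(
    {"time": [1505305260, 1505305320], "close": [3146.94, 3159.88]}
)
new_row = pd.DataFrame({"time": [1505305380], "close": [3159.66]})

path = os.path.join(tempfile.mkdtemp(), "test.h5")

with pd.HDFStore(path) as store:
    store.append("candles", frame, format="table")
    store.append("candles", new_row, format="table")
    raw = store.select("candles")

# The appended row kept its own index label, so 0 now appears twice.
print(list(raw.index))

# Option when taking the data out: renumber the rows.
tidy = raw.reset_index(drop=True)
# Another option: use the timestamp column as the index.
by_time = raw.set_index("time")
print(list(tidy.index))
```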
