How do I incrementally add rows in Pandas Dataframe?
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/40232520/
Question by Akshay
I am calculating the open-high-low-close (OHLC) values of trade data for every 15-minute interval from 9:15 to 15:30, and I want to store each interval's OHLC values as a new row in a DataFrame.
ohlc = pd.DataFrame(columns=('Open','High','Low','Close'))
for row in ohlc:
    ohlc.loc[10] = pd.DataFrame([[candle_open_price,candle_high_price,candle_low_price,candle_close_price]])
But this fails with the following error:
ValueError: cannot set a row with mismatched columns
I just want to incrementally store the OHLC data I have calculated for each 15-minute interval as rows of the new ohlc DataFrame.
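The ValueError above most likely comes from assigning a whole one-row DataFrame to ohlc.loc[10]: when .loc adds a new row, pandas expects one value per column, and a DataFrame of length 1 does not match the four columns. A minimal sketch, assuming the four candle prices are plain floats (the values below are hypothetical), that assigns a list instead and uses len(ohlc) as the next row label:

```python
import pandas as pd

# Empty frame with the four OHLC columns, as in the question
ohlc = pd.DataFrame(columns=['Open', 'High', 'Low', 'Close'])

# Hypothetical candle values for two 15-minute intervals
candles = [
    (100.0, 102.5, 99.5, 101.0),
    (101.0, 103.0, 100.5, 102.0),
]

for candle_open_price, candle_high_price, candle_low_price, candle_close_price in candles:
    # A flat list (one value per column) assigned to a new label enlarges the frame by one row
    ohlc.loc[len(ohlc)] = [candle_open_price, candle_high_price,
                           candle_low_price, candle_close_price]

# Columns of an initially empty frame have object dtype; convert them to float
ohlc = ohlc.astype(float)
print(ohlc)
```

Enlarging row by row is fine for a few dozen candles per day, but each assignment may copy data, so for large inputs collecting rows in a list and building the frame once is usually faster.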
EDIT
import numpy as np
import pandas as pd
import datetime as dt
import matplotlib as plt
import dateutil.parser
tradedata = pd.read_csv('ICICIBANK_TradeData.csv', index_col=False,
                        names=['Datetime','Price'],
                        header=0)
tradedata['Datetime'] = pd.to_datetime(tradedata['Datetime'])
first_trd_time = tradedata['Datetime'][0]
last_time = dateutil.parser.parse('2016-01-01 15:30:00.000000')
candle_time = 15;
candle_number = 0
while(first_trd_time < last_time):
    candledata = tradedata[(tradedata['Datetime']>first_trd_time) & (tradedata['Datetime']<first_trd_time+dt.timedelta(minutes=candle_time))]
    first_trd_time = first_trd_time+dt.timedelta(minutes=candle_time)
    candle_open_price = candledata.iloc[0]['Price']
    candle_open_time = candledata.iloc[0]['Datetime']
    candle_close_price = candledata.iloc[-1]['Price']
    candle_close_time = candledata.iloc[-1]['Datetime']
    candle_high_price = candledata.loc[candledata['Price'].idxmax()]['Price']
    candle_high_time = candledata.loc[candledata['Price'].idxmax()]['Datetime']
    candle_low_price = candledata.loc[candledata['Price'].idxmin()]['Price']
    candle_low_time = candledata.loc[candledata['Price'].idxmin()]['Datetime']
    ohlc = pd.DataFrame(columns=('Open','High','Low','Close'))
    ohlc_data = pd.DataFrame()
    if(candle_number == 0):
        ohlc = pd.DataFrame(np.array([[0, 0, 0, 0]]), columns=['Open', 'High', 'Low', 'Close']).append(ohlc, ignore_index=True)
        candle_number = candle_number + 1
        print "Zeroth Candle"
    else:
        ohlc.ix[candle_number] = (candle_open_price,candle_open_price,candle_open_price,candle_open_price)
        print "else part with incrementing candle_number"
        candle_number = candle_number + 1
    print "first_trd_time"
    print first_trd_time
    print candle_number
print "Success!"
With this code, the error is:
ValueError: cannot set by positional indexing with enlargement
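In the edited code, ohlc appears to be re-created as an empty DataFrame on every pass through the while loop, and .ix (deprecated in pandas 0.20 and removed in 1.0) is then asked to set position candle_number on that empty frame, which is most likely what triggers the enlargement error. A common pattern is to collect one record per candle in a plain Python list and build the DataFrame once, after the loop. The sketch below assumes a tradedata frame with Datetime and Price columns as in the question; the synthetic ticks are made up, and the half-open window [start, start + 15 min) is an assumption (the original uses strict inequalities on both ends, which drops ticks that fall exactly on a window boundary). Windows without trades are skipped, since .iloc[0] would fail on them.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical tick data standing in for ICICIBANK_TradeData.csv: one trade per minute
times = pd.date_range('2016-01-01 09:15', '2016-01-01 15:29', freq='1min')
rng = np.random.default_rng(0)
tradedata = pd.DataFrame({'Datetime': times,
                          'Price': 250 + rng.standard_normal(len(times)).cumsum()})

candle_time = dt.timedelta(minutes=15)
start = tradedata['Datetime'].iloc[0]
last_time = pd.Timestamp('2016-01-01 15:30')

rows = []  # one dict per 15-minute candle
while start < last_time:
    end = start + candle_time
    # Assumed half-open window [start, end)
    candledata = tradedata[(tradedata['Datetime'] >= start) & (tradedata['Datetime'] < end)]
    if not candledata.empty:  # skip windows with no trades
        rows.append({
            'Time': start,
            'Open': candledata['Price'].iloc[0],
            'High': candledata['Price'].max(),
            'Low': candledata['Price'].min(),
            'Close': candledata['Price'].iloc[-1],
        })
    start = end

# Build the DataFrame once, after the loop
ohlc = pd.DataFrame(rows, columns=['Time', 'Open', 'High', 'Low', 'Close']).set_index('Time')
print(ohlc.head())
```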
Answer by jezrael
IIUC, you can append a DataFrame for each row to a list of DataFrames dfs and then concat them into df1:
ohlc = pd.DataFrame(columns=('Open','High','Low','Close'))
dfs = []
for row in ohlc.iterrows():
    df = pd.DataFrame([candle_open_price,candle_high_price,
                       candle_low_price,candle_close_price]).T
    dfs.append(df)
df1 = pd.concat(dfs, ignore_index=True)
print (df1)
Then concat df1 to the original DataFrame ohlc:
df2 = pd.concat([ohlc,df1])
print (df2)
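The snippets above rely on candle_*_price being named Series, as in the sample that follows. If they are plain scalars, as in the question, pd.DataFrame([...]).T yields a one-row frame whose columns are 0 to 3, which would not line up with Open/High/Low/Close when concatenated with ohlc. A minimal sketch of the same list-then-concat idea for scalar prices (the values are hypothetical) that names the columns explicitly:

```python
import pandas as pd

ohlc = pd.DataFrame([[1, 2, 3, 5], [7, 7, 8, 9]],
                    columns=['Open', 'High', 'Low', 'Close'])

# Hypothetical per-candle scalar prices (open, high, low, close)
candles = [(1.5, 8.0, 0.5, 4.0), (10.0, 12.0, 9.0, 11.0)]

dfs = []
for candle_open_price, candle_high_price, candle_low_price, candle_close_price in candles:
    # One-row DataFrame with explicit column names so it aligns with ohlc
    df = pd.DataFrame([[candle_open_price, candle_high_price,
                        candle_low_price, candle_close_price]],
                      columns=ohlc.columns)
    dfs.append(df)

# Concatenate all new rows at once, then append them to the original frame
df1 = pd.concat(dfs, ignore_index=True)
df2 = pd.concat([ohlc, df1], ignore_index=True)
print(df2)
```

Calling pd.concat once on the whole list, rather than once per row, avoids copying the growing frame on every iteration.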
Sample (for testing, the same data is added in each iteration of the loop):
#sample data
candle_open_price = pd.Series([1.5,10],
                              name='Open',
                              index=pd.DatetimeIndex(['2016-01-02','2016-01-03']) )
candle_high_price = pd.Series([8,9],
                              name='High',
                              index=pd.DatetimeIndex(['2016-01-02','2016-01-03']))
candle_low_price = pd.Series([0,12],
                             name='Low',
                             index=pd.DatetimeIndex(['2016-01-02','2016-01-03']))
candle_close_price = pd.Series([4,5],
                               name='Close',
                               index=pd.DatetimeIndex(['2016-01-02','2016-01-03']))
data = np.array([[1,2,3,5],[7,7,8,9],[10,8,9,3]])
idx = pd.DatetimeIndex(['2016-01-08','2016-01-09','2016-01-10'])
ohlc = pd.DataFrame(data=data,
                    columns=('Open','High','Low','Close'),
                    index=idx)
print (ohlc)
Open High Low Close
2016-01-08 1 2 3 5
2016-01-09 7 7 8 9
2016-01-10 10 8 9 3
dfs = []
for row in ohlc.iterrows():
    df = pd.DataFrame([candle_open_price,candle_high_price,
                       candle_low_price,candle_close_price]).T
    #print (df)
    dfs.append(df)
df1 = pd.concat(dfs)
print (df1)
Open High Low Close
2016-01-02 1.5 8.0 0.0 4.0
2016-01-03 10.0 9.0 12.0 5.0
2016-01-02 1.5 8.0 0.0 4.0
2016-01-03 10.0 9.0 12.0 5.0
2016-01-02 1.5 8.0 0.0 4.0
2016-01-03 10.0 9.0 12.0 5.0
df2 = pd.concat([ohlc,df1])
print (df2)
Open High Low Close
2016-01-08 1.0 2.0 3.0 5.0
2016-01-09 7.0 7.0 8.0 9.0
2016-01-10 10.0 8.0 9.0 3.0
2016-01-02 1.5 8.0 0.0 4.0
2016-01-03 10.0 9.0 12.0 5.0
2016-01-02 1.5 8.0 0.0 4.0
2016-01-03 10.0 9.0 12.0 5.0
2016-01-02 1.5 8.0 0.0 4.0
2016-01-03 10.0 9.0 12.0 5.0
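As an alternative that avoids the explicit loop (this is not what the answer above proposes), pandas can compute fixed-interval bars directly with Series.resample(...).ohlc(). A sketch assuming tick data with Datetime and Price columns as in the question (the ticks below are made up), restricted to the 9:15 to 15:30 session with between_time:

```python
import pandas as pd

# Hypothetical ticks; in the question these come from ICICIBANK_TradeData.csv
tradedata = pd.DataFrame({
    'Datetime': pd.to_datetime(['2016-01-01 09:15:10', '2016-01-01 09:20:00',
                                '2016-01-01 09:29:59', '2016-01-01 09:31:00',
                                '2016-01-01 09:40:00']),
    'Price': [100.0, 102.0, 101.0, 103.0, 99.0],
})

prices = tradedata.set_index('Datetime')['Price']
# Keep only the trading session (both ends are inclusive by default)
prices = prices.between_time('09:15', '15:30')

# One row per 15-minute bin with open, high, low and close columns
ohlc = prices.resample('15min').ohlc()
print(ohlc)
```

The resulting columns are lowercase (open, high, low, close), a 15-minute bin with no trades yields NaN rather than being skipped, and a tick exactly at 15:30 would form its own 15:30 bin.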