pandas 按键更新pandas DataFrame
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/13924972/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Updating pandas DataFrame by key
提问by garrett
I have a dataframe of historical stock trades. The frame has columns like ['ticker', 'date', 'cusip', 'profit', 'security_type']. Initially:
我有一个历史股票交易数据框。该框架具有诸如 ['ticker', 'date', 'cusip', '赢利', 'security_type'] 之类的列。最初:
trades['cusip'] = np.nan
trades['security_type'] = np.nan
I have historical config files that I can load into frames that have columns like ['ticker', 'cusip', 'date', 'name', 'security_type', 'primary_exchange'].
我有历史配置文件,我可以将其加载到具有 ['ticker'、'cusip'、'date'、'name'、'security_type'、'primary_exchange'] 等列的框架中。
I would like to UPDATE the trades frame with the cusip and security_type from config, but only where the ticker and date match.
我想使用配置中的 cusip 和 security_type 更新交易框架,但仅限于股票代码和日期匹配的地方。
I thought I could do something like:
我以为我可以做这样的事情:
pd.merge(trades, config, on=['ticker', 'date'], how='left')
But that doesn't update the columns, it just adds the config columns to trades.
但这不会更新列,它只是将配置列添加到交易中。
The following works, but I think there has to be a better way. If not, I will probably do it outside of pandas.
以下工作,但我认为必须有更好的方法。如果没有,我可能会在Pandas之外做。
for date in trades['date'].unique():
config = get_config_file_as_df(date)
## config['date'] == date
for ticker in trades['ticker'][trades['date'] == date]:
trades['cusip'][
(trades['ticker'] == ticker)
& (trades['date'] == date)
] \
= config['cusip'][config['ticker'] == ticker].values[0]
trades['security_type'][
(trades['ticker'] == ticker)
& (trades['date'] == date)
] \
= config['security_type'][config['ticker'] == ticker].values[0]
回答by unutbu
Suppose you have this setup:
假设你有这个设置:
import pandas as pd
import numpy as np
import datetime as DT
nan = np.nan
trades = pd.DataFrame({'ticker' : ['IBM', 'MSFT', 'GOOG', 'AAPL'],
'date' : pd.date_range('1/1/2000', periods = 4),
'cusip' : [nan, nan, 100, nan]
})
trades = trades.set_index(['ticker', 'date'])
print(trades)
# cusip
# ticker date
# IBM 2000-01-01 NaN
# MSFT 2000-01-02 NaN
# GOOG 2000-01-03 100 # <-- We do not want to overwrite this
# AAPL 2000-01-04 NaN
config = pd.DataFrame({'ticker' : ['IBM', 'MSFT', 'GOOG', 'AAPL'],
'date' : pd.date_range('1/1/2000', periods = 4),
'cusip' : [1,2,3,nan]})
config = config.set_index(['ticker', 'date'])
# Let's permute the index to show `DataFrame.update` correctly matches rows based on the index, not on the order of the rows.
new_index = sorted(config.index)
config = config.reindex(new_index)
print(config)
# cusip
# ticker date
# AAPL 2000-01-04 NaN
# GOOG 2000-01-03 3
# IBM 2000-01-01 1
# MSFT 2000-01-02 2
Then you can update NaN values in tradeswith values from configusing the DataFrame.updatemethod. Note that DataFrame.updatematches rows based on indices (which is why set_indexwas called above).
然后,您可以使用该方法的trades值更新 NaN 值。请注意,匹配基于索引的行(这就是上面调用的原因)。configDataFrame.updateDataFrame.updateset_index
trades.update(config, join = 'left', overwrite = False)
print(trades)
# cusip
# ticker date
# IBM 2000-01-01 1
# MSFT 2000-01-02 2
# GOOG 2000-01-03 100 # If overwrite = True, then 100 is overwritten by 3.
# AAPL 2000-01-04 NaN

