pandas: Update a pandas DataFrame by key

Question

Asked by garrett

I have a dataframe of historical stock trades. The frame has columns like ['ticker', 'date', 'cusip', 'profit', 'security_type']. Initially:

trades['cusip'] = np.nan
trades['security_type'] = np.nan

I have historical config files that I can load into frames that have columns like ['ticker', 'cusip', 'date', 'name', 'security_type', 'primary_exchange'].

I would like to UPDATE the trades frame with the cusip and security_type from config, but only where the ticker and date match.

I thought I could do something like:

pd.merge(trades, config, on=['ticker', 'date'], how='left')

But that doesn't update the columns, it just adds the config columns to trades.
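
As a minimal sketch of that behavior, assuming small made-up frames (the tickers, dates, and cusip values below are hypothetical, not from the real data): because `cusip` exists in both frames and is not a merge key, `pd.merge` keeps both copies and disambiguates them with the default `_x` and `_y` suffixes instead of filling the existing column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, only to illustrate the merge behavior.
trades = pd.DataFrame({
    'ticker': ['IBM', 'MSFT'],
    'date': pd.to_datetime(['2000-01-01', '2000-01-02']),
    'cusip': [np.nan, np.nan],
})
config = pd.DataFrame({
    'ticker': ['IBM', 'MSFT'],
    'date': pd.to_datetime(['2000-01-01', '2000-01-02']),
    'cusip': [1, 2],
})

merged = pd.merge(trades, config, on=['ticker', 'date'], how='left')

# The overlapping non-key column appears twice, once per input frame.
print(merged.columns.tolist())
# ['ticker', 'date', 'cusip_x', 'cusip_y']
```

Here `cusip_x` is still all NaN (the column from `trades`) and `cusip_y` holds the values from `config`.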

The following works, but I think there has to be a better way. If not, I will probably do it outside of pandas.

for date in trades['date'].unique():
    config = get_config_file_as_df(date)
    # Every row of this config satisfies config['date'] == date.
    for ticker in trades.loc[trades['date'] == date, 'ticker']:
        # Rows of trades that belong to this (ticker, date) pair.
        mask = (trades['ticker'] == ticker) & (trades['date'] == date)
        match = config[config['ticker'] == ticker]
        # Assign through .loc; chained indexing such as
        # trades['cusip'][mask] = ... may write to a temporary copy.
        trades.loc[mask, 'cusip'] = match['cusip'].values[0]
        trades.loc[mask, 'security_type'] = match['security_type'].values[0]

Answer 1

Answered by unutbu

Suppose you have this setup:

import pandas as pd
import numpy as np

nan = np.nan

trades = pd.DataFrame({'ticker' : ['IBM', 'MSFT', 'GOOG', 'AAPL'],
                       'date' : pd.date_range('1/1/2000', periods = 4), 
                       'cusip' : [nan, nan, 100, nan]
                       })
trades = trades.set_index(['ticker', 'date'])
print(trades)
#                    cusip
# ticker date             
# IBM    2000-01-01    NaN
# MSFT   2000-01-02    NaN
# GOOG   2000-01-03    100  # <-- We do not want to overwrite this
# AAPL   2000-01-04    NaN

config = pd.DataFrame({'ticker' : ['IBM', 'MSFT', 'GOOG', 'AAPL'],
                       'date' : pd.date_range('1/1/2000', periods = 4),
                       'cusip' : [1,2,3,nan]})
config = config.set_index(['ticker', 'date'])

# Let's permute the index to show `DataFrame.update` correctly matches rows based on the index, not on the order of the rows.
new_index = sorted(config.index)
config = config.reindex(new_index)    
print(config)
#                    cusip
# ticker date             
# AAPL   2000-01-04    NaN
# GOOG   2000-01-03      3
# IBM    2000-01-01      1
# MSFT   2000-01-02      2

Then you can update the NaN values in `trades` with values from `config` using the `DataFrame.update` method. Note that `DataFrame.update` matches rows based on indices (which is why `set_index` was called above).

trades.update(config, join = 'left', overwrite = False)
print(trades)

#                    cusip
# ticker date             
# IBM    2000-01-01      1
# MSFT   2000-01-02      2
# GOOG   2000-01-03    100 # If overwrite = True, then 100 is overwritten by 3.
# AAPL   2000-01-04    NaN
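
The same idea can be applied to frames shaped like the ones in the question, where `ticker` and `date` are ordinary columns and both `cusip` and `security_type` need filling. The following is only a sketch under stated assumptions: the sample values (`'C1'`, `'stock'`, the profits, the names) are hypothetical, the placeholder columns are created with object dtype so that string values can be written into them, and each `(ticker, date)` pair is assumed to appear at most once in `config`.

```python
import pandas as pd

# Hypothetical sample data shaped like the frames described in the question.
trades = pd.DataFrame({
    'ticker': ['IBM', 'MSFT', 'GOOG'],
    'date': pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03']),
    'profit': [10.0, -5.0, 2.5],
    # Object dtype placeholders, to be filled from config.
    'cusip': pd.Series([None, None, None], dtype=object),
    'security_type': pd.Series([None, None, None], dtype=object),
})
config = pd.DataFrame({
    'ticker': ['MSFT', 'IBM'],          # different row order than trades
    'date': pd.to_datetime(['2000-01-02', '2000-01-01']),
    'cusip': ['C2', 'C1'],
    'security_type': ['stock', 'stock'],
    'name': ['Microsoft', 'IBM'],       # extra column, not copied over
})

keys = ['ticker', 'date']

# Index both frames by the key columns so update can align the rows,
# and restrict config to the columns that should be copied.
trades = trades.set_index(keys)
trades.update(config.set_index(keys)[['cusip', 'security_type']])

# Turn the keys back into ordinary columns.
trades = trades.reset_index()
print(trades)
```

Rows without a match in `config` (GOOG here) keep their missing values, and columns that exist only in `config` (such as `name`) are ignored because `update` only touches columns already present in `trades`.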
