DataFrame of DataFrames in Python (Pandas)

Question

Asked by Stephen

The idea here is that for every year, I am able to create three dataframes (df1, df2, df3), each containing different firms and stock prices ('firm' and 'price' are the two columns in df1 to df3). I would like to use another dataframe (named 'store' below) to store the three dataframes for every year.

Here is my code:

import pandas as pd

store = pd.DataFrame(list(range(1967, 2014)), columns=['year'])
for year in range(1967, 2014):
    # ... some code that generates df1, df2 and df3 correctly ...
    # try to put each year's frames into the matching row of 'store'
    store.loc[store['year'] == year, 'df1'] = df1
    store.loc[store['year'] == year, 'df2'] = df2
    store.loc[store['year'] == year, 'df3'] = df3

I don't get any error or warning from this code. But in the 'store' dataframe, the columns 'df1', 'df2' and 'df3' are all NaN.

Answer 1

Accepted answer by Ami Tavory

I think that pandas offers better alternatives to what you're suggesting (rationale below).

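For one, there's the pandas.Panel data structure, which was meant for exactly the kind of thing you're doing here. (Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so it is not available in current versions.)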

However, as Wes McKinney (the Pandas author) noted in his book Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, multi-dimensional indices, to a large extent, offer a better alternative.
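
As a minimal sketch of the multi-level index idea, assuming a few made-up frames (the `frames` dict, its contents and the level name `row` are hypothetical), `pd.concat` accepts a dict of DataFrames and turns tuple keys into the outer levels of a MultiIndex, so a Panel-like structure fits in one ordinary DataFrame:

```python
import pandas as pd

# Hypothetical toy data standing in for the real df1/df2/df3 of two years.
frames = {
    (1967, "df1"): pd.DataFrame({"firm": ["A", "B"], "price": [10.0, 12.5]}),
    (1967, "df2"): pd.DataFrame({"firm": ["C"], "price": [7.0]}),
    (1968, "df1"): pd.DataFrame({"firm": ["A"], "price": [11.0]}),
}

# Tuple keys become the outer index levels; 'names' labels all three levels
# (year, origin, and each frame's original row number).
panel_like = pd.concat(frames, names=["year", "origin", "row"])
print(panel_like)

# Cross-section on the two outer levels: the rows of df1 for 1967.
print(panel_like.xs((1967, "df1")))
```

Each (year, origin) pair plays the role one cell of 'store' was meant to play, but the data stays in a single frame that selection and groupby can work on directly.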

Consider the following alternative to your code:

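import pandas as pd

dfs = []
for year in range(1967, 2014):
    # ... some code that generates df1, df2 and df3 ...
    # tag each frame with its year and the name of the frame it came from
    df1['year'] = year
    df1['origin'] = 'df1'
    df2['year'] = year
    df2['origin'] = 'df2'
    df3['year'] = year
    df3['origin'] = 'df3'
    dfs.extend([df1, df2, df3])
# stack everything into one long DataFrame; ignore_index gives a fresh 0..n-1 index
df = pd.concat(dfs, ignore_index=True)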

This gives you a DataFrame with 4 columns: 'firm', 'price', 'year', and 'origin'.
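
To see the shape of the result without the real data, here is a runnable sketch of the same loop. It assumes a hypothetical `make_frames(year)` helper in place of the elided code that builds df1, df2 and df3, uses a shortened year range, and swaps the in-place column assignments for `DataFrame.assign`, which returns a tagged copy instead of mutating the generated frames:

```python
import pandas as pd


def make_frames(year):
    """Hypothetical stand-in for the asker's code that builds df1, df2, df3."""
    df1 = pd.DataFrame({"firm": ["A", "B"], "price": [float(year % 100), 20.0]})
    df2 = pd.DataFrame({"firm": ["C"], "price": [5.0]})
    df3 = pd.DataFrame({"firm": ["D", "E"], "price": [1.0, 2.0]})
    return df1, df2, df3


dfs = []
for year in range(1967, 1970):  # shortened range for the example
    for origin, frame in zip(["df1", "df2", "df3"], make_frames(year)):
        # assign() adds the 'year' and 'origin' columns to a copy of the frame
        dfs.append(frame.assign(year=year, origin=origin))

df = pd.concat(dfs, ignore_index=True)
print(df.columns.tolist())  # ['firm', 'price', 'year', 'origin']
print(df)
```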

This gives you the flexibility to:

Organize the data hierarchically by, say, 'year' and 'origin' with df.set_index(['year', 'origin']), or by 'origin' and 'price' with df.set_index(['origin', 'price'])
Perform groupby operations on different levels
In general, slice and dice the data in many different ways.
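
The operations listed above might look like this in practice. This is a sketch on a small hypothetical frame with the four columns described earlier; the values and variable names are made up for illustration:

```python
import pandas as pd

# Hypothetical long-format data: one row per firm, tagged with year and origin.
df = pd.DataFrame({
    "firm": ["A", "B", "C", "A", "C"],
    "price": [10.0, 20.0, 5.0, 11.0, 6.0],
    "year": [1967, 1967, 1967, 1968, 1968],
    "origin": ["df1", "df1", "df2", "df1", "df2"],
})

# Hierarchical index on (year, origin); sorting keeps label lookups efficient.
by_year_origin = df.set_index(["year", "origin"]).sort_index()

# All 1967 rows that came from df1.
rows_1967_df1 = by_year_origin.loc[(1967, "df1")]

# Mean price per origin across all years (groupby on a column).
mean_by_origin = df.groupby("origin")["price"].mean()

# Mean price per (year, origin) pair (groupby on index levels).
mean_by_level = by_year_origin.groupby(level=["year", "origin"])["price"].mean()

# Cross-section: the df2 rows of every year, indexed by year.
df2_rows = by_year_origin.xs("df2", level="origin")

print(rows_1967_df1, mean_by_origin, mean_by_level, df2_rows, sep="\n\n")
```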

What you're suggesting in the question treats one dimension (origin) arbitrarily differently from the others, and it's hard to think of an advantage to this. If a split along some dimension is necessary due to, e.g., performance, you can combine DataFrames better with standard Python data structures:

A dictionary mapping each year to a DataFrame with the other three dimensions.
Three DataFrames, one for each origin, each having three dimensions.
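
Both splits can be derived from the long frame with `groupby`. A minimal sketch, assuming the same kind of hypothetical data as above; keeping the three per-origin frames in a dict keyed by origin is an illustrative choice, not something the answer prescribes:

```python
import pandas as pd

# Hypothetical long-format frame with the four columns from the answer.
df = pd.DataFrame({
    "firm": ["A", "B", "C", "D", "A", "C"],
    "price": [10.0, 20.0, 5.0, 7.0, 11.0, 6.0],
    "year": [1967, 1967, 1967, 1967, 1968, 1968],
    "origin": ["df1", "df1", "df2", "df3", "df1", "df2"],
})

# Option 1: dict mapping each year to a frame with the other three columns.
by_year = {year: group.drop(columns="year") for year, group in df.groupby("year")}

# Option 2: one frame per origin, each keeping firm, price and year.
by_origin = {origin: group.drop(columns="origin") for origin, group in df.groupby("origin")}

print(by_year[1968])
print(by_origin["df2"])
```

If a single frame is needed again later, `pd.concat` on either dict's values recombines the pieces.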
