DataFrame of DataFrames in Python (Pandas)
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/35932060/
Asked by Stephen
The idea here is that for every year, I can create three DataFrames (df1, df2, df3), each containing different firms and stock prices ('firm' and 'price' are the two columns in df1~df3). I would like to use another DataFrame (named 'store' below) to store the three DataFrames for every year.
Here is my code:
store = pd.DataFrame(list(range(1967, 2014)), columns=['year'])
for year in range(1967, 2014):
    # ... some code that generates df1, df2 and df3 correctly ...
    store.loc[store['year'] == year, 'df1'] = df1
    store.loc[store['year'] == year, 'df2'] = df2
    store.loc[store['year'] == year, 'df3'] = df3
I don't get any error or warning from this code. But in the 'store' DataFrame, the columns 'df1', 'df2' and 'df3' contain only NaN values.
Accepted answer by Ami Tavory
I think that pandas offers better alternatives to what you're suggesting (rationale below).
For one, there's the pandas.Panel data structure, which was meant for things like what you're doing here (note that Panel was deprecated in later pandas releases and has since been removed).
However, as Wes McKinney (the Pandas author) noted in his book Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, multi-dimensional indices, to a large extent, offer a better alternative.
Consider the following alternative to your code:
dfs = []
for year in range(1967, 2014):
    # ... some code that generates df1, df2 and df3 ...
    df1['year'] = year
    df1['origin'] = 'df1'
    df2['year'] = year
    df2['origin'] = 'df2'
    df3['year'] = year
    df3['origin'] = 'df3'
    dfs.extend([df1, df2, df3])
df = pd.concat(dfs)
This gives you a DataFrame with 4 columns: 'firm', 'price', 'year', and 'origin'.
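To make the loop above concrete, here is a minimal runnable sketch. The helper make_frames and the firm names and prices it returns are hypothetical stand-ins for the elided code that builds df1, df2 and df3; only the tagging and pd.concat steps mirror the answer.

```python
import pandas as pd


def make_frames(year):
    """Hypothetical stand-in for the code that builds df1, df2 and df3."""
    frames = {}
    for i, origin in enumerate(["df1", "df2", "df3"]):
        frames[origin] = pd.DataFrame({
            "firm": [f"firm_{origin}_a", f"firm_{origin}_b"],
            # Made-up prices, only here so the example has data to stack.
            "price": [float(year % 100 + i), float(year % 100 + i + 10)],
        })
    return frames


dfs = []
for year in range(1967, 2014):
    for origin, frame in make_frames(year).items():
        # assign() returns a copy with the two tag columns appended.
        dfs.append(frame.assign(year=year, origin=origin))

# Stack everything into one long table; ignore_index gives a clean 0..n-1 index.
df = pd.concat(dfs, ignore_index=True)
print(df.columns.tolist())
print(len(df))
```

Using assign instead of in-place column assignment is a design choice of this sketch: it leaves the per-year frames untouched.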
This gives you the flexibility to:
Organize hierarchically by, say, 'year' and 'origin': df.set_index(['year', 'origin']), or by, say, 'origin' and 'price': df.set_index(['origin', 'price'])
Do groupbys according to different levels
In general, slice and dice the data in many different ways.
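As a rough illustration of the hierarchical index and the groupby options, the sketch below uses a tiny made-up table with the same four columns. The firms and prices are invented for the example.

```python
import pandas as pd

# Small made-up long-format table: two years, two origins, two firms each.
df = pd.DataFrame({
    "firm":   ["A", "B", "C", "D", "A", "B", "C", "D"],
    "price":  [10.0, 20.0, 30.0, 40.0, 12.0, 22.0, 32.0, 42.0],
    "year":   [1967, 1967, 1967, 1967, 1968, 1968, 1968, 1968],
    "origin": ["df1", "df1", "df2", "df2", "df1", "df1", "df2", "df2"],
})

# Hierarchical (MultiIndex) organization by year, then origin.
indexed = df.set_index(["year", "origin"]).sort_index()

# All rows that came from df2 in 1968.
subset = indexed.loc[(1968, "df2"), :]

# Group by one index level ...
mean_by_year = indexed.groupby(level="year")["price"].mean()

# ... or by several columns of the flat table.
mean_by_pair = df.groupby(["year", "origin"])["price"].mean()

print(subset)
print(mean_by_year)
print(mean_by_pair)
```

Sorting the index after set_index is not strictly required, but pandas can warn about slow lookups on an unsorted MultiIndex.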
What you're suggesting in the question makes one dimension (origin) arbitrarily different, and it's hard to think of an advantage to this. If a split along some dimension is necessary due to, e.g., performance, you can combine DataFrames better with standard Python data structures:
A dictionary mapping each year to a DataFrame with the other three dimensions.
Three DataFrames, one for each origin, each having three dimensions.
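Both splits can be derived from the combined table with groupby. The following is a minimal sketch on made-up data; the names by_year and by_origin are illustrative, not part of the original answer.

```python
import pandas as pd

# Small made-up long-format table with the four columns discussed above.
df = pd.DataFrame({
    "firm":   ["A", "B", "C", "D", "A", "B", "C", "D"],
    "price":  [10.0, 20.0, 30.0, 40.0, 12.0, 22.0, 32.0, 42.0],
    "year":   [1967, 1967, 1967, 1967, 1968, 1968, 1968, 1968],
    "origin": ["df1", "df1", "df2", "df2", "df1", "df1", "df2", "df2"],
})

# Dictionary mapping each year to a DataFrame with the remaining columns.
by_year = {
    year: group.drop(columns="year")
    for year, group in df.groupby("year")
}

# One DataFrame per origin, each keeping the other three columns.
by_origin = {
    origin: group.drop(columns="origin")
    for origin, group in df.groupby("origin")
}

print(by_year[1968])
print(by_origin["df1"])
```

A plain dict keeps each piece an ordinary DataFrame, which avoids storing DataFrames inside the cells of another DataFrame.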