Original question: http://stackoverflow.com/questions/23521511/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas: Creating DataFrame from Series
Asked by BMichell
My current code is shown below - I'm importing a MAT file and trying to create a DataFrame from variables within it:
mat = loadmat(file_path)  # load mat-file
Variables = mat.keys()  # identify variable names
df = pd.DataFrame  # Initialise DataFrame
for name in Variables:
    B = mat[name]
    s = pd.Series(B[:, 1])
So within the loop I can create a series of each variable (they're arrays with two columns - so the values I need are in column 2)
My question is how do I append the series to the dataframe? I've looked through the documentation and none of the examples seem to fit what I'm trying to do.
Best Regards,
Ben
Answered by TomAugspurger
No need to initialize an empty DataFrame (you weren't even doing that, you'd need pd.DataFrame() with the parens).
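To make the point about the parentheses concrete, here is a small sketch; the variable names not_a_frame and empty_frame are only illustrative and do not come from the question:

```python
import pandas as pd

# Without parentheses, the name is bound to the DataFrame class itself.
not_a_frame = pd.DataFrame

# With parentheses, the constructor runs and returns an empty instance.
empty_frame = pd.DataFrame()

print(isinstance(not_a_frame, pd.DataFrame))  # False
print(isinstance(empty_frame, pd.DataFrame))  # True
print(empty_frame.empty)  # True
```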
Instead, to create a DataFrame where each series is a column,
- make a list of Series, series, and
- concatenate them horizontally with df = pd.concat(series, axis=1)
Something like:
series = [pd.Series(mat[name][:, 1]) for name in Variables]
df = pd.concat(series, axis=1)
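For anyone who wants to try this without a MAT file, the following is a minimal sketch. The mat dict is a made-up stand-in for the loadmat result, and passing keys= to pd.concat is my addition so the columns carry the variable names rather than 0, 1, ...:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for loadmat(file_path): two-column arrays keyed by name.
mat = {
    "a": np.array([[0, 10], [1, 11], [2, 12]]),
    "b": np.array([[0, 20], [1, 21], [2, 22]]),
}
variables = list(mat.keys())

# One Series per variable, built from the second column of each array.
series = [pd.Series(mat[name][:, 1]) for name in variables]

# axis=1 places the Series side by side; keys= labels each resulting column.
df = pd.concat(series, axis=1, keys=variables)
print(df)
```

Without keys=, the result has the same values but integer column labels.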
Answered by Happy001
I guess another way, possibly faster, to achieve this is:
1) Use a dict comprehension to get the desired dict (i.e., taking the 2nd column of each array)
2) Then use pd.DataFrame to create an instance directly from the dict, without looping over each column and concatenating.
Assuming your mat looks like this (you can ignore this since your mat is loaded from file):
In [135]: mat = {'a': np.random.randint(5, size=(4,2)),
   .....:        'b': np.random.randint(5, size=(4,2))}
In [136]: mat
Out[136]:
{'a': array([[2, 0],
        [3, 4],
        [0, 1],
        [4, 2]]), 'b': array([[1, 0],
        [1, 1],
        [1, 0],
        [2, 1]])}
Then you can do:
In [137]: df = pd.DataFrame({name: mat[name][:, 1] for name in mat})
In [138]: df
Out[138]:
   a  b
0  0  0
1  4  1
2  1  0
3  2  1
[4 rows x 2 columns]
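One caveat if mat really comes from scipy.io.loadmat (the question does not show the import, so this is an assumption): the returned dict normally also holds metadata entries such as __header__, __version__ and __globals__, which are not two-column arrays. A hedged sketch of the same dict comprehension that skips them, using a hand-built dict in place of a real file:

```python
import numpy as np
import pandas as pd

# Hypothetical dict shaped like loadmat output, including metadata entries.
mat = {
    "__header__": b"MATLAB 5.0 MAT-file",
    "__version__": "1.0",
    "__globals__": [],
    "a": np.array([[2, 0], [3, 4], [0, 1], [4, 2]]),
    "b": np.array([[1, 0], [1, 1], [1, 0], [2, 1]]),
}

# Keep only real variables and take the second column of each array.
df = pd.DataFrame(
    {name: arr[:, 1] for name, arr in mat.items() if not name.startswith("__")}
)
print(df)
```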
Answered by Jaan
Here is how to create a DataFrame where each series is a row.
For a single Series (resulting in a single-row DataFrame):
series = pd.Series([1,2], index=['a','b'])
df = pd.DataFrame([series])
For multiple series with identical indices:
cols = ['a','b']
list_of_series = [pd.Series([1,2],index=cols), pd.Series([3,4],index=cols)]
df = pd.DataFrame(list_of_series, columns=cols)
For multiple series with possibly different indices:
list_of_series = [pd.Series([1,2],index=['a','b']), pd.Series([3,4],index=['a','c'])]
df = pd.concat(list_of_series, axis=1).transpose()
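As a runnable version of the last case, this sketch shows what happens to labels that appear in only one Series: the result gets a column for every label seen, and the missing positions become NaN.

```python
import pandas as pd

list_of_series = [
    pd.Series([1, 2], index=["a", "b"]),
    pd.Series([3, 4], index=["a", "c"]),
]

# Concatenate as columns (aligning on the index labels), then flip to rows.
df = pd.concat(list_of_series, axis=1).transpose()
print(df)

# Row 0 has no value for 'c' and row 1 has none for 'b'; both are NaN.
print(df.isna())
```

Because NaN is a float, the values in this result are stored as floats even though the inputs were integers.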
To create a DataFrame where each series is a column, see the answers by others. Alternatively, one can create a DataFrame where each series is a row, as above, and then use df.transpose(). However, the latter approach is inefficient if the columns have different data types.
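A small sketch of the dtype issue, using two made-up Series (ints and strs are illustrative names). As far as I know, building rows first forces each intermediate column to hold mixed values, so after the transpose the integer data is left with object dtype, while pd.concat keeps the integer dtype:

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
strs = pd.Series(["x", "y", "z"])

# Column-wise: each Series keeps its own dtype.
by_concat = pd.concat([ints, strs], axis=1)

# Row-wise, then transpose: values pass through mixed-type columns first.
by_rows = pd.DataFrame([ints, strs]).transpose()

print(by_concat.dtypes)
print(by_rows.dtypes)
```

The values are the same in both frames; only the storage differs, which matters for speed and memory on larger data.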