Python Pandas: Creating a DataFrame from Series

Question

Asked by BMichell

My current code is shown below - I'm importing a MAT file and trying to create a DataFrame from variables within it:

mat = loadmat(file_path)  # load mat-file
Variables = mat.keys()    # identify variable names

df = pd.DataFrame         # Initialise DataFrame

for name in Variables:

    B = mat[name]
    s = pd.Series (B[:,1])

So within the loop I can create a Series from each variable (each one is an array with two columns, and the values I need are in the second column).

My question is how do I append the series to the dataframe? I've looked through the documentation and none of the examples seem to fit what I'm trying to do.

Best Regards,

Ben

Answer 1

Answered by TomAugspurger

No need to initialize an empty DataFrame (you weren't even doing that; you'd need pd.DataFrame() with the parens).
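The difference between the class and an instance is easy to check. A minimal sketch (the variable names are only for illustration):

```python
import pandas as pd

not_a_frame = pd.DataFrame      # binds the class itself; no instance is created
empty_frame = pd.DataFrame()    # calls the constructor, giving an empty DataFrame

print(isinstance(not_a_frame, type))          # True: it is the class object
print(isinstance(empty_frame, pd.DataFrame))  # True: it is an actual DataFrame
print(empty_frame.empty)                      # True: no rows or columns yet
```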

Instead, to create a DataFrame where each series is a column:

make a list of Series, called series, and
concatenate them horizontally with df = pd.concat(series, axis=1)

Something like:

series = [pd.Series(mat[name][:, 1]) for name in Variables]
df = pd.concat(series, axis=1)
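
One caveat when applying this to a real file: the dictionary returned by scipy.io.loadmat also contains metadata entries such as __header__, __version__ and __globals__, which are not two-column arrays. The sketch below uses a hand-built stand-in dictionary instead of a file (the variable names speed and torque and their values are made up) and passes keys= to pd.concat so the columns carry the variable names.

```python
import numpy as np
import pandas as pd

# Stand-in for the dict returned by loadmat(file_path); names and values are hypothetical.
mat = {
    "__header__": b"MATLAB 5.0 MAT-file",
    "__version__": "1.0",
    "__globals__": [],
    "speed": np.array([[0.0, 1.5], [1.0, 2.5], [2.0, 3.5]]),
    "torque": np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]]),
}

# Skip the metadata entries, whose names start with a double underscore.
names = [name for name in mat if not name.startswith("__")]

# One Series per variable, taken from the second column of each array.
series = [pd.Series(mat[name][:, 1]) for name in names]

# keys= labels each resulting column with its variable name.
df = pd.concat(series, axis=1, keys=names)
print(df)
```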

Answer 2

Answered by Happy001

I guess another way, possibly faster, to achieve this is: 1) use a dict comprehension to build the desired dict (i.e., taking the second column of each array); 2) then use pd.DataFrame to create an instance directly from the dict, without looping over each column and concatenating.

Assuming your mat looks like this (you can ignore this step, since your mat is loaded from a file):

In [135]: mat = {'a': np.random.randint(5, size=(4,2)),
   .....: 'b': np.random.randint(5, size=(4,2))}

In [136]: mat
Out[136]: 
{'a': array([[2, 0],
        [3, 4],
        [0, 1],
        [4, 2]]), 'b': array([[1, 0],
        [1, 1],
        [1, 0],
        [2, 1]])}

Then you can do:

In [137]: df = pd.DataFrame({name: mat[name][:, 1] for name in mat})

In [138]: df
Out[138]: 
   a  b
0  0  0
1  4  1
2  1  0
3  2  1

[4 rows x 2 columns]
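
This works when every array has the same number of rows. If the variables in the file had different lengths (an assumption; the question does not say), the constructor would reject a dict of plain arrays. A hedged sketch with made-up data shows the failure and one way around it, wrapping each column in a Series so pandas aligns on the index:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two-column arrays with different numbers of rows.
mat = {
    "a": np.array([[0, 1], [0, 2], [0, 3]]),
    "b": np.array([[0, 7], [0, 8]]),
}

# Plain arrays of unequal length are rejected by the DataFrame constructor.
try:
    pd.DataFrame({name: mat[name][:, 1] for name in mat})
    raised = False
except ValueError:
    raised = True
print(raised)

# Wrapping each column in a Series aligns on the index and pads with NaN.
df = pd.DataFrame({name: pd.Series(mat[name][:, 1]) for name in mat})
print(df)
```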

Answer 3

Answered by Jaan

Here is how to create a DataFrame where each series is a row.

For a single Series (resulting in a single-row DataFrame):

series = pd.Series([1,2], index=['a','b'])
df = pd.DataFrame([series])

For multiple series with identical indices:

cols = ['a','b']
list_of_series = [pd.Series([1,2],index=cols), pd.Series([3,4],index=cols)]
df = pd.DataFrame(list_of_series, columns=cols)

For multiple series with possibly different indices:

list_of_series = [pd.Series([1,2],index=['a','b']), pd.Series([3,4],index=['a','c'])]
df = pd.concat(list_of_series, axis=1).transpose()
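
To see what the last case produces, here is the same snippet as a self-contained sketch. The columns of the result are the union of the two indices, and positions a Series does not cover are filled with NaN:

```python
import pandas as pd

list_of_series = [
    pd.Series([1, 2], index=["a", "b"]),
    pd.Series([3, 4], index=["a", "c"]),
]

# Concatenating along axis=1 aligns on the union of the indices; transposing
# then turns each original Series into a row.
df = pd.concat(list_of_series, axis=1).transpose()
print(df)

# The first Series has no "c" and the second has no "b", so those cells are NaN.
print(df.isna())
```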

To create a DataFrame where each series is a column, see the other answers. Alternatively, one can create a DataFrame where each series is a row, as above, and then use df.transpose(). However, the latter approach is inefficient if the columns have different data types.
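
One visible side effect of transposing mixed data types is that the result falls back to the generic object dtype. A small sketch with made-up data (the column names n and label are hypothetical):

```python
import pandas as pd

# Made-up column-oriented data with one integer and one string column.
df = pd.DataFrame({"n": [1, 2], "label": ["x", "y"]})
print(df.dtypes)    # "n" has an integer dtype

# Transposing a mixed-dtype frame goes through a single object array,
# so every column of the result has dtype object.
flipped = df.transpose()
print(flipped.dtypes)
```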
