Python: How do I select an element from an array column of a DataFrame?

Question

Asked by jankos

I have the following data frame:

# dtype=object is needed because the inner lists have different lengths
pa = pd.DataFrame({'a': np.array([[1., 4.], [2.], [3., 4., 5.]], dtype=object)})

I want to select column 'a' and then take only one particular element from each cell (e.g. the first: 1., 2., 3.).

What do I need to add to:

pa.loc[:,['a']]

?

Answer 1

Accepted answer by b10n

pa.loc[row] selects the row with label row.

pa.loc[row, col] selects the cells at the intersection of row and col.

pa.loc[:, col] selects all rows of the column named col. This works, but it is not the idiomatic way to refer to a single column of a DataFrame; for that, use pa['a'].
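
As a rough illustration of the three forms of .loc described above, here is a minimal sketch. It assumes the same data as in the question, built from a plain Python list of lists; the variable names are only illustrative.

```python
import pandas as pd

# Same data as in the question: each cell of column 'a' holds a list.
pa = pd.DataFrame({'a': [[1., 4.], [2.], [3., 4., 5.]]})

row_0 = pa.loc[0]              # the row labelled 0, returned as a Series
cell = pa.loc[0, 'a']          # the single cell at row 0, column 'a' (a list)
col_loc = pa.loc[:, 'a']       # all rows of column 'a', as a Series
col_idiomatic = pa['a']        # the same column, spelled the usual way
col_frame = pa.loc[:, ['a']]   # a list of labels gives a one-column DataFrame

print(cell)
print(type(col_loc).__name__, type(col_frame).__name__)
```

Note that pa.loc[:, ['a']] from the question returns a DataFrame rather than a Series, because the column label is wrapped in a list.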

Since the cells of your column contain lists, you can use the vectorized string methods, which also index into lists, to access the elements of those lists like so:

pa['a'].str[0] #first value in lists
pa['a'].str[-1] #last value in lists
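
A self-contained sketch of the same idea, assuming the question's data built from a plain list of lists. The comparison with an explicit .map call is added here only for illustration and is not part of the original answer.

```python
import pandas as pd

pa = pd.DataFrame({'a': [[1., 4.], [2.], [3., 4., 5.]]})

first = pa['a'].str[0]     # first element of each list
last = pa['a'].str[-1]     # last element of each list
second = pa['a'].str[1]    # missing value where a list is too short

# Equivalent, more explicit spelling of the first-element lookup.
first_explicit = pa['a'].map(lambda xs: xs[0])

print(first.tolist())      # [1.0, 2.0, 3.0]
print(last.tolist())       # [4.0, 2.0, 5.0]
```

Indexing past the end of a short list through .str yields a missing value instead of raising, which the explicit lambda version would not do.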

Answer 2

Answered by unutbu

Storing lists as values in a Pandas DataFrame tends to be a mistake because it prevents you from taking advantage of fast NumPy or Pandas vectorized operations.

Therefore, you might be better off converting your DataFrame of lists of numbers into a wider DataFrame with native NumPy dtypes:

import numpy as np
import pandas as pd

# dtype=object is needed because the inner lists have different lengths
pa = pd.DataFrame({'a': np.array([[1., 4.], [2.], [3., 4., 5.]], dtype=object)})
df = pd.DataFrame(pa['a'].values.tolist())
#      0    1    2
# 0  1.0  4.0  NaN
# 1  2.0  NaN  NaN
# 2  3.0  4.0  5.0

Now, you could select the first column like this:

In [36]: df.iloc[:, 0]
Out[36]: 
0    1.0
1    2.0
2    3.0
Name: 0, dtype: float64

or the first row like this:

In [37]: df.iloc[0, :]
Out[37]: 
0    1.0
1    4.0
2    NaN
Name: 0, dtype: float64

If you wish to drop NaNs, use .dropna():

In [38]: df.iloc[0, :].dropna()
Out[38]: 
0    1.0
1    4.0
Name: 0, dtype: float64

and .tolist() to retrieve the values as a list:

In [39]: df.iloc[0, :].dropna().tolist()
Out[39]: [1.0, 4.0]

but if you wish to leverage NumPy/Pandas for speed, you'll want to express your calculation as vectorized operations on df itself, without converting back to Python lists.
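
To make "vectorized operations on df itself" a little more concrete, here is a small sketch. The particular calculations (row means, per-row counts, doubling the first column) are hypothetical examples chosen for illustration and do not come from the original answer.

```python
import pandas as pd

pa = pd.DataFrame({'a': [[1., 4.], [2.], [3., 4., 5.]]})
df = pd.DataFrame(pa['a'].tolist())   # wide frame, short rows padded with NaN

row_means = df.mean(axis=1)     # mean of each original list; NaN is skipped
row_counts = df.count(axis=1)   # number of non-NaN values, i.e. list lengths
doubled_first = df[0] * 2       # arithmetic on the "first element" column

print(row_means.tolist())       # [2.5, 2.0, 4.0]
print(row_counts.tolist())      # [2, 1, 3]
```

Each of these works on whole float64 columns at once, which is the kind of operation that a column of Python lists cannot offer.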
