Python: How do I select an element in an array column of a data frame?
Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/26069235/
How do I select an element in an array column of a data frame?
Asked by jankos
I have the following data frame:
pa = pd.DataFrame({'a': np.array([[1., 4.], [2.], [3., 4., 5.]], dtype=object)})
I want to select column 'a' and then only a particular element of each array (e.g. the first: 1., 2., 3.).
What do I need to add to:
pa.loc[:,['a']]
?
Accepted answer by b10n
pa.loc[row] selects the row with label row.
pa.loc[row, col] selects the cells at the intersection of row and col.
pa.loc[:, col] selects all rows and the column named col. Note that although this works, it is not the idiomatic way to refer to a column of a dataframe. For that you should use pa['a'].
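As a rough illustration of the three forms of .loc described above, here is a minimal sketch. It assumes the column is built from a plain Python list of lists rather than np.array, and the variable names (row_1, cell, and so on) are only for illustration.

```python
import pandas as pd

# Assumption: a plain list of lists stands in for the question's array column.
pa = pd.DataFrame({'a': [[1., 4.], [2.], [3., 4., 5.]]})

row_1 = pa.loc[1]           # Series holding the row with label 1
cell = pa.loc[1, 'a']       # the single cell at row 1, column 'a' (a list)
col_loc = pa.loc[:, 'a']    # all rows of column 'a', as a Series
col_idiom = pa['a']         # the idiomatic way to get the same Series
frame = pa.loc[:, ['a']]    # a list of labels returns a DataFrame instead

print(cell)
print(col_loc.equals(col_idiom))
```

Passing a list of labels, as in the question's pa.loc[:,['a']], keeps the result two-dimensional, which is why it comes back as a DataFrame rather than a Series.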
Now, you have lists in the cells of your column, so you can use the vectorized string methods to access the elements of those lists, like so:
pa['a'].str[0]   # first value in each list
pa['a'].str[-1]  # last value in each list
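To make the behaviour of the .str accessor on list cells concrete, here is a small self-contained sketch. It assumes the column is built from a plain list of lists, and the names first, last, and second are illustrative only.

```python
import pandas as pd

# Assumption: a plain list of lists stands in for the question's array column.
pa = pd.DataFrame({'a': [[1., 4.], [2.], [3., 4., 5.]]})

first = pa['a'].str[0]    # first element of each list
last = pa['a'].str[-1]    # last element of each list
second = pa['a'].str[1]   # lists that are too short yield NaN

print(first.tolist())
print(last.tolist())
print(second.tolist())
```

Indexing past the end of a shorter list produces NaN instead of raising an error, so no length check is needed before selecting a position.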
Answer by unutbu
Storing lists as values in a Pandas DataFrame tends to be a mistake because it prevents you from taking advantage of fast NumPy or Pandas vectorized operations.
Therefore, you might be better off converting your DataFrame of lists of numbers into a wider DataFrame with native NumPy dtypes:
import numpy as np
import pandas as pd
# dtype=object is required for ragged input on recent NumPy versions
pa = pd.DataFrame({'a': np.array([[1., 4.], [2.], [3., 4., 5.]], dtype=object)})
df = pd.DataFrame(pa['a'].values.tolist())
#      0    1    2
# 0  1.0  4.0  NaN
# 1  2.0  NaN  NaN
# 2  3.0  4.0  5.0
Now, you could select the first column like this:
In [36]: df.iloc[:, 0]
Out[36]:
0    1.0
1    2.0
2    3.0
Name: 0, dtype: float64
or the first row like this:
In [37]: df.iloc[0, :]
Out[37]:
0    1.0
1    4.0
2    NaN
Name: 0, dtype: float64
If you wish to drop NaNs, use .dropna():
In [38]: df.iloc[0, :].dropna()
Out[38]:
0    1.0
1    4.0
Name: 0, dtype: float64
and .tolist() to retrieve the values as a list:
In [39]: df.iloc[0, :].dropna().tolist()
Out[39]: [1.0, 4.0]
But if you wish to leverage NumPy/Pandas for speed, you'll want to express your calculation as vectorized operations on df itself, without converting back to Python lists.
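As an illustration of what vectorized operations on df itself could look like, here is a short sketch. The per-row sum, count, and mean are assumed examples and do not come from the original answer.

```python
import pandas as pd

# Rebuild the wide frame directly; shorter rows are padded with NaN.
df = pd.DataFrame([[1., 4.], [2.], [3., 4., 5.]])

row_sums = df.sum(axis=1)      # NaN values are skipped by default
row_counts = df.count(axis=1)  # number of non-NaN entries per row
row_means = df.mean(axis=1)    # mean over the values actually present

print(row_sums.tolist())
print(row_counts.tolist())
print(row_means.tolist())
```

Each of these runs over the whole frame at once, so there is no Python-level loop over the individual lists.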

