Python: Pandas DataFrame column to list
Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/23748995/
Pandas DataFrame column to list
Asked by user3646105
I am pulling a subset of data from a column based on conditions in another column being met.
I can get the correct values back, but they come back as a pandas.core.frame.DataFrame. How do I convert that to a list?
import pandas as pd
tst = pd.read_csv(r'C:\SomeCSV.csv')  # raw string so the backslash is not treated as an escape
lookupValue = tst['SomeCol'] == "SomeValue"
ID = tst[lookupValue][['SomeCol']]
#How To convert ID to a list
Answered by Akavall
You can use the Series.to_list method.
For example:
import pandas as pd
df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9],
                   'b': [3, 5, 6, 2, 4, 6, 7, 8, 7, 8, 9]})
print(df['a'].to_list())
Output:
[1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]
To drop duplicates, you can do one of the following:
>>> df['a'].drop_duplicates().to_list()
[1, 3, 5, 7, 4, 6, 8, 9]
>>> list(set(df['a'])) # as pointed out by EdChum
[1, 3, 4, 5, 6, 7, 8, 9]
Answered by ShikharDua
The above solution is good if all the data is of the same dtype. NumPy arrays are homogeneous containers. When you do df.values, the output is a NumPy array. So if the data has int and float columns in it, the output will have either int or float, and the columns will lose their original dtypes. Consider df:
   a  b
0  1  4
1  2  5
2  3  6

a    float64
b      int64
So if you want to keep the original dtypes, you can do something like:
row_list = df.to_csv(None, header=False, index=False).split('\n')
This returns each row as a string:
['1.0,4', '2.0,5', '3.0,6', '']
Then split each row to get a list of lists. Each element after splitting is a string, so we need to convert it to the required datatype.
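def f(row_str):
    # Split one CSV row and cast each field back to its column's type.
    row_list = row_str.split(',')
    return [float(row_list[0]), int(row_list[1])]

# Skip the trailing empty string; list() is needed because map is lazy in Python 3.
df_list_of_list = list(map(f, row_list[:-1]))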
[[1.0, 4], [2.0, 5], [3.0, 6]]
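To make the dtype issue concrete, the sketch below first shows df.values upcasting the int column to float, and then one alternative that avoids the CSV round trip. The use of DataFrame.itertuples here is a suggestion added for illustration, not something the original answer proposes, and it assumes the same two-column df.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4, 5, 6]})

# df.values builds a single NumPy array, so the int column is upcast to float.
upcast = df.values.tolist()

# itertuples walks the frame row by row, so integers stay integers
# and floats stay floats.
rows = [list(row) for row in df.itertuples(index=False, name=None)]

print(upcast)
print(rows)
```

This avoids hand-written casts per column, which the string-based approach needs whenever the column layout changes.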
Answered by zhql0907
You can use pandas.Series.tolist, e.g.:
import pandas as pd
df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
Run:
>>> df['a'].tolist()
You will get:
[1, 2, 3]
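A small follow-up sketch, assuming pandas 0.24 or later, where Series.to_list is available alongside Series.tolist: both spellings return the same plain Python list, and the elements are Python scalars rather than NumPy types.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

via_tolist = df['a'].tolist()
via_to_list = df['a'].to_list()  # alternative spelling, same result

print(via_tolist)
print(type(via_tolist))
print(type(via_tolist[0]))
```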
Answered by MarredCheese
I'd like to clarify a few things:
1. As other answers have pointed out, the simplest thing to do is use pandas.Series.tolist(). I'm not sure why the top voted answer leads off with using pandas.Series.values.tolist() since, as far as I can tell, it adds syntax/confusion with no added benefit.
2. tst[lookupValue][['SomeCol']] is a dataframe (as stated in the question), not a series (as stated in a comment to the question). This is because tst[lookupValue] is a dataframe, and slicing it with [['SomeCol']] asks for a list of columns (a list that happens to have a length of 1), resulting in a dataframe being returned. If you remove the extra set of brackets, as in tst[lookupValue]['SomeCol'], then you are asking for just that one column rather than a list of columns, and thus you get a series back.
3. You need a series to use pandas.Series.tolist(), so you should definitely skip the second set of brackets in this case. FYI, if you ever end up with a one-column dataframe that isn't easily avoidable like this, you can use pandas.DataFrame.squeeze() to convert it to a series.
4. tst[lookupValue]['SomeCol'] is getting a subset of a particular column via chained slicing. It slices once to get a dataframe with only certain rows left, and then it slices again to get a certain column. You can get away with it here since you are just reading, not writing, but the proper way to do it is tst.loc[lookupValue, 'SomeCol'] (which returns a series).
5. Using the syntax from #4, you could reasonably do everything in one line:
ID = tst.loc[tst['SomeCol'] == 'SomeValue', 'SomeCol'].tolist()
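The remark about chained slicing matters most when assigning rather than reading. The sketch below is an illustration under assumed names (df, colA, colB and mask are made up for the example, not taken from the question): a single .loc call selects the rows and the column in one step, so the assignment is applied to df itself instead of to an intermediate object.

```python
import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 1], 'colB': [4, 5, 6]})
mask = df['colA'] == 1

# One indexing operation: rows chosen by the mask, column chosen by label.
df.loc[mask, 'colB'] = 0
print(df['colB'].tolist())

# Reading works the same way and yields a Series, ready for .tolist().
ids = df.loc[mask, 'colA'].tolist()
print(ids)
```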
Demo code:
import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 1],
                   'colB': [4, 5, 6]})
filter_value = 1

print("df")
print(df)
print(type(df))

# Boolean mask selecting the rows to keep
rows_to_keep = df['colA'] == filter_value
print("\ndf['colA'] == filter_value")
print(rows_to_keep)
print(type(rows_to_keep))

# Single brackets: returns a Series
result = df[rows_to_keep]['colB']
print("\ndf[rows_to_keep]['colB']")
print(result)
print(type(result))

# Double brackets: returns a one-column DataFrame
result = df[rows_to_keep][['colB']]
print("\ndf[rows_to_keep][['colB']]")
print(result)
print(type(result))

# squeeze() turns the one-column DataFrame into a Series
result = df[rows_to_keep][['colB']].squeeze()
print("\ndf[rows_to_keep][['colB']].squeeze()")
print(result)
print(type(result))

# .loc selects rows and column in one step
result = df.loc[rows_to_keep, 'colB']
print("\ndf.loc[rows_to_keep, 'colB']")
print(result)
print(type(result))

result = df.loc[df['colA'] == filter_value, 'colB']
print("\ndf.loc[df['colA'] == filter_value, 'colB']")
print(result)
print(type(result))

ID = df.loc[rows_to_keep, 'colB'].tolist()
print("\ndf.loc[rows_to_keep, 'colB'].tolist()")
print(ID)
print(type(ID))

ID = df.loc[df['colA'] == filter_value, 'colB'].tolist()
print("\ndf.loc[df['colA'] == filter_value, 'colB'].tolist()")
print(ID)
print(type(ID))
Result:
df
   colA  colB
0     1     4
1     2     5
2     1     6
<class 'pandas.core.frame.DataFrame'>

df['colA'] == filter_value
0     True
1    False
2     True
Name: colA, dtype: bool
<class 'pandas.core.series.Series'>

df[rows_to_keep]['colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df[rows_to_keep][['colB']]
   colB
0     4
2     6
<class 'pandas.core.frame.DataFrame'>

df[rows_to_keep][['colB']].squeeze()
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[rows_to_keep, 'colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[df['colA'] == filter_value, 'colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[rows_to_keep, 'colB'].tolist()
[4, 6]
<class 'list'>

df.loc[df['colA'] == filter_value, 'colB'].tolist()
[4, 6]
<class 'list'>