Python: Pandas DataFrame column to list

Question

Asked by user3646105

I am pulling a subset of data from a column based on conditions in another column being met.

I can get the correct values back, but they come back as a pandas.core.frame.DataFrame. How do I convert that to a list?

import pandas as pd

tst = pd.read_csv(r'C:\SomeCSV.csv')  # raw string so the backslash is not treated as an escape

lookupValue = tst['SomeCol'] == "SomeValue"
ID = tst[lookupValue][['SomeCol']]
#How To convert ID to a list

Answer 1

Answered by Akavall

You can use the Series.to_list method.

For example:

import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9],
                   'b': [3, 5, 6, 2, 4, 6, 7, 8, 7, 8, 9]})

print(df['a'].to_list())

Output:

[1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]

To drop duplicates you can do one of the following:

>>> df['a'].drop_duplicates().to_list()
[1, 3, 5, 7, 4, 6, 8, 9]
>>> list(set(df['a']))  # as pointed out by EdChum; set() does not preserve the original order
[1, 3, 4, 5, 6, 7, 8, 9]
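If you want the de-duplicated values in their order of first appearance (which set() does not guarantee), Series.unique() followed by tolist() is another option. A minimal sketch, rebuilding the same df so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9],
                   'b': [3, 5, 6, 2, 4, 6, 7, 8, 7, 8, 9]})

# unique() returns a NumPy array with values in order of first appearance
unique_in_order = df['a'].unique().tolist()
print(unique_in_order)  # [1, 3, 5, 7, 4, 6, 8, 9]

# to_list() and tolist() are the same method; elements are plain Python ints
assert df['a'].to_list() == df['a'].tolist()
print(type(unique_in_order[0]))  # <class 'int'>
```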

Answer 2

Answered by ShikharDua

The above solution works well if all the data is of the same dtype. NumPy arrays are homogeneous containers, and df.values returns a NumPy array. So if the data contains both int and float columns, the output is upcast to a single common dtype (float, in this case), and the columns lose their original dtypes. Consider df:

     a  b
0  1.0  4
1  2.0  5
2  3.0  6

a    float64
b      int64
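To see the upcasting concretely, here is a quick check (a sketch that rebuilds the df shown above, with column a stored as floats and column b as ints):

```python
import pandas as pd

# Same frame as above: 'a' is float64, 'b' is int64
df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4, 5, 6]})
print(df.dtypes)

arr = df.values      # a single NumPy array for the whole frame
print(arr.dtype)     # float64: the int column has been upcast
rows = arr.tolist()
print(rows)          # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```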

So if you want to keep the original dtypes, you can do something like:

row_list = df.to_csv(None, header=False, index=False).split('\n')

This returns each row as a string:

['1.0,4', '2.0,5', '3.0,6', '']

Then split each row to get a list of lists. Each element after splitting is a string, so we need to convert it to the required datatype.

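def f(row_str):
    # Convert each CSV field back to its column's original type
    row_list = row_str.split(',')
    return [float(row_list[0]), int(row_list[1])]

# Skip the trailing empty string left by the final newline;
# list() is needed in Python 3, where map() returns an iterator
df_list_of_list = list(map(f, row_list[:-1]))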

[[1.0, 4], [2.0, 5], [3.0, 6]]
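A shorter route to the same result, without the CSV round trip and without hard-coding each column's type, is to iterate over rows with DataFrame.itertuples(index=False, name=None). Its values are drawn from each column separately, and iterating a Series yields Python scalars, so per-column types survive. This is a swapped-in technique rather than part of the original answer; a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4, 5, 6]})

# name=None gives plain tuples; each value comes from its own column,
# so 'a' stays float and 'b' stays int
df_list_of_list = [list(row) for row in df.itertuples(index=False, name=None)]
print(df_list_of_list)  # [[1.0, 4], [2.0, 5], [3.0, 6]]
```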

Answer 3

Answered by zhql0907

You can use pandas.Series.tolist.

e.g.:

import pandas as pd
df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})

Run:

>>> df['a'].tolist()

You will get

[1, 2, 3]
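This matters for the question's code: tolist() is a Series method, and selecting with double brackets returns a one-column DataFrame, which has no such method. A small sketch contrasting the two (the .values.tolist() line is included only to show the nested result you get from a one-column frame):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

print(type(df['a']))    # Series: single brackets select one column
print(type(df[['a']]))  # DataFrame: double brackets select a list of columns

flat = df['a'].tolist()
print(flat)  # [1, 2, 3]

# Going through .values on a one-column DataFrame gives one inner list per row
nested = df[['a']].values.tolist()
print(nested)  # [[1], [2], [3]]

# A DataFrame has no tolist() method at all
print(hasattr(df[['a']], 'tolist'))  # False
```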

Answer 4

Answered by MarredCheese

I'd like to clarify a few things:

1. As other answers have pointed out, the simplest thing to do is use pandas.Series.tolist(). I'm not sure why the top voted answer leads off with using pandas.Series.values.tolist(), since as far as I can tell it adds syntax and confusion with no added benefit.
2. tst[lookupValue][['SomeCol']] is a dataframe (as stated in the question), not a series (as stated in a comment to the question). This is because tst[lookupValue] is a dataframe, and slicing it with [['SomeCol']] asks for a list of columns (a list that happens to have a length of 1), so a dataframe is returned. If you remove the extra set of brackets, as in tst[lookupValue]['SomeCol'], you are asking for just that one column rather than a list of columns, and thus you get a series back.
3. You need a series to use pandas.Series.tolist(), so you should definitely skip the second set of brackets in this case. FYI, if you ever end up with a one-column dataframe that isn't as easily avoidable as this one, you can use pandas.DataFrame.squeeze() to convert it to a series.
4. tst[lookupValue]['SomeCol'] gets a subset of a particular column via chained indexing. It slices once to get a dataframe with only certain rows left, and then slices again to get a certain column. You can get away with it here since you are only reading, not writing, but the proper way to do it is tst.loc[lookupValue, 'SomeCol'] (which returns a series).
5. Using the syntax from #4, you could reasonably do everything in one line: ID = tst.loc[tst['SomeCol'] == 'SomeValue', 'SomeCol'].tolist()
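Applied to the question's setup, the one-line .loc approach looks like this. The contents of tst below are made up for illustration (the question reads them from a CSV file), and OtherCol is a hypothetical second column showing the "condition on one column, values from another" case the question describes:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv(...); OtherCol is invented for the example
tst = pd.DataFrame({'SomeCol': ['SomeValue', 'x', 'SomeValue'],
                    'OtherCol': [10, 20, 30]})

# The condition and the selected column can be the same...
ID = tst.loc[tst['SomeCol'] == 'SomeValue', 'SomeCol'].tolist()
print(ID)  # ['SomeValue', 'SomeValue']

# ...or different: filter on SomeCol, return values from OtherCol
other = tst.loc[tst['SomeCol'] == 'SomeValue', 'OtherCol'].tolist()
print(other)  # [10, 30]

# If you already hold a one-column DataFrame, squeeze it into a Series first.
# axis=1 squeezes only the column axis, so a single matching row still yields a Series.
one_col = tst[tst['SomeCol'] == 'SomeValue'][['OtherCol']]
print(one_col.squeeze(axis=1).tolist())  # [10, 30]
```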

Demo Code:

import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 1],
                   'colB': [4, 5, 6]})
filter_value = 1

print("df")
print(df)
print(type(df))

rows_to_keep = df['colA'] == filter_value
print("\ndf['colA'] == filter_value")
print(rows_to_keep)
print(type(rows_to_keep))

result = df[rows_to_keep]['colB']
print("\ndf[rows_to_keep]['colB']")
print(result)
print(type(result))

result = df[rows_to_keep][['colB']]
print("\ndf[rows_to_keep][['colB']]")
print(result)
print(type(result))

result = df[rows_to_keep][['colB']].squeeze()
print("\ndf[rows_to_keep][['colB']].squeeze()")
print(result)
print(type(result))

result = df.loc[rows_to_keep, 'colB']
print("\ndf.loc[rows_to_keep, 'colB']")
print(result)
print(type(result))

result = df.loc[df['colA'] == filter_value, 'colB']
print("\ndf.loc[df['colA'] == filter_value, 'colB']")
print(result)
print(type(result))

ID = df.loc[rows_to_keep, 'colB'].tolist()
print("\ndf.loc[rows_to_keep, 'colB'].tolist()")
print(ID)
print(type(ID))

ID = df.loc[df['colA'] == filter_value, 'colB'].tolist()
print("\ndf.loc[df['colA'] == filter_value, 'colB'].tolist()")
print(ID)
print(type(ID))

Result:

df
   colA  colB
0     1     4
1     2     5
2     1     6
<class 'pandas.core.frame.DataFrame'>

df['colA'] == filter_value
0     True
1    False
2     True
Name: colA, dtype: bool
<class 'pandas.core.series.Series'>

df[rows_to_keep]['colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df[rows_to_keep][['colB']]
   colB
0     4
2     6
<class 'pandas.core.frame.DataFrame'>

df[rows_to_keep][['colB']].squeeze()
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[rows_to_keep, 'colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[df['colA'] == filter_value, 'colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[rows_to_keep, 'colB'].tolist()
[4, 6]
<class 'list'>

df.loc[df['colA'] == filter_value, 'colB'].tolist()
[4, 6]
<class 'list'>
