Python: Get a list from pandas DataFrame column headers

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/19482970/

Time: 2020-08-19 13:52:31  Source: igfitidea

Get list from pandas DataFrame column headers

python pandas dataframe

Asked by natsuki_2002

I want to get a list of the column headers from a pandas DataFrame. The DataFrame will come from user input so I won't know how many columns there will be or what they will be called.

For example, if I'm given a DataFrame like this:

>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7

I would want to get a list like this:

>>> header_list
['y', 'gdp', 'cap']

Accepted answer by Simeon Visser

You can get the values as a list by doing:

list(my_dataframe.columns.values)

You can also simply use the following (as shown in EdChum's answer):

list(my_dataframe)
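
As a quick check that both spellings return the same labels, here is a minimal sketch. The sample values are an assumption (only the first three rows of the question's table are reproduced), and `from_values` / `from_frame` are illustrative names.

```python
import pandas as pd

# Assumed sample data: the first three rows of the table from the question.
my_dataframe = pd.DataFrame({"y": [1, 2, 8], "gdp": [2, 3, 7], "cap": [5, 9, 2]})

# Convert the underlying NumPy array of labels to a list.
from_values = list(my_dataframe.columns.values)

# Iterating over a DataFrame yields its column labels, so list() collects them.
from_frame = list(my_dataframe)

print(from_values)
print(from_frame)
```

Both calls return the labels in column order, so either one should work when the column names are not known in advance.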

Answer by BrenBarn

That's available as my_dataframe.columns.

Answer by user21988

n = []
for i in my_dataframe.columns:
    n.append(i)
print(n)

Answer by EdChum

There is a built-in method, which is the most performant:

my_dataframe.columns.values.tolist()

.columns returns an Index, .columns.values returns an array, and the array has a helper method .tolist() that returns a list.

If performance is not as important to you, Index objects define a .tolist() method that you can call directly:

my_dataframe.columns.tolist()
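
To make the chain of types concrete, the following sketch inspects each intermediate object. The sample data and the variable names `columns`, `values`, and `as_list` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Assumed sample data with the column names from the question.
my_dataframe = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

columns = my_dataframe.columns        # a pandas Index
values = my_dataframe.columns.values  # the NumPy array behind the Index
as_list = values.tolist()             # a plain Python list

print(type(columns).__name__, type(values).__name__, type(as_list).__name__)

# Calling tolist() directly on the Index gives the same labels.
print(my_dataframe.columns.tolist() == as_list)
```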

The difference in performance is obvious:

%timeit df.columns.tolist()
16.7 μs ± 317 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit df.columns.values.tolist()
1.24 μs ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
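
Outside IPython, a comparison like this could be reproduced with the standard `timeit` module. This is only a sketch: the small sample frame and the repetition count `n` are assumptions, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Assumed sample data; the answer does not say which df was timed.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

n = 10_000  # number of repetitions per measurement (arbitrary choice)
t_index = timeit.timeit(lambda: df.columns.tolist(), number=n)
t_values = timeit.timeit(lambda: df.columns.values.tolist(), number=n)

print(f"columns.tolist():        {t_index / n * 1e6:.2f} us per call")
print(f"columns.values.tolist(): {t_values / n * 1e6:.2f} us per call")
```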


For those who hate typing, you can just call list on df, like so:

list(df)

Answer by Sascha Gottfried

A DataFrame follows the dict-like convention of iterating over the "keys" of the object.

my_dataframe.keys()

Create a list of keys/columns, using either the object method to_list() or the Pythonic way:

my_dataframe.keys().to_list()
list(my_dataframe.keys())

Basic iteration on a DataFrame returns column labels:

[column for column in my_dataframe]
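
The dict-like behaviour described above can be sketched as follows. The sample data and the names `labels_from_keys`, `labels_from_iteration`, and `has_gdp` are assumptions for illustration.

```python
import pandas as pd

# Assumed sample data with the column names from the question.
my_dataframe = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# keys() returns the column Index, so it holds the same labels as .columns.
same_labels = my_dataframe.keys().equals(my_dataframe.columns)

labels_from_keys = list(my_dataframe.keys())
labels_from_iteration = [column for column in my_dataframe]

# Membership tests check column labels, as they check keys on a dict.
has_gdp = "gdp" in my_dataframe

print(same_labels, labels_from_keys, labels_from_iteration, has_gdp)
```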

Do not convert a DataFrame into a list, just to get the column labels. Do not stop thinking while looking for convenient code samples.

import numpy as np
import pandas as pd

xlarge = pd.DataFrame(np.arange(100000000).reshape(10000, 10000))
list(xlarge)  # compute time and memory consumption depend on dataframe size - O(N)
list(xlarge.keys())  # constant time operation - O(1)

Answer by tegan

Did some quick tests, and perhaps unsurprisingly the built-in version using dataframe.columns.values.tolist() is the fastest:

In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 μs per loop

In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 μs per loop

In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 μs per loop

In [4]: %timeit list(df.columns.values)
10000 loops, best of 3: 38.4 μs per loop

(I still really like list(dataframe) though, so thanks EdChum!)

Answer by fixxxer

It gets even simpler (as of pandas 0.16.0):

df.columns.tolist()

will give you the column names in a nice list.

Answer by Alexander

>>> list(my_dataframe)
['y', 'gdp', 'cap']

To list the columns of a dataframe while in debugger mode, use a list comprehension:

>>> [c for c in my_dataframe]
['y', 'gdp', 'cap']

By the way, you can get a sorted list simply by using sorted:

>>> sorted(my_dataframe)
['cap', 'gdp', 'y']

Answer by Anton Protopopov

Interestingly, df.columns.values.tolist() is almost 3 times faster than df.columns.tolist(), though I thought they were the same:

In [97]: %timeit df.columns.values.tolist()
100000 loops, best of 3: 2.97 μs per loop

In [98]: %timeit df.columns.tolist()
10000 loops, best of 3: 9.67 μs per loop

Answer by firelynx

In the Notebook

For data exploration in the IPython notebook, my preferred way is this:

sorted(df)

This produces an easy-to-read, alphabetically ordered list.

In a code repository

In code, I find it more explicit to do:

df.columns

Because it tells others reading your code what you are doing.
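
As one illustration of using df.columns explicitly in code, here is a minimal sketch that checks a user-supplied DataFrame for expected columns. The helper `missing_columns` and the label "population" are hypothetical and do not come from the original answer.

```python
import pandas as pd


def missing_columns(df, required):
    """Return the required labels that are absent from df.columns (hypothetical helper)."""
    present = set(df.columns)
    return [name for name in required if name not in present]


# Assumed sample data with the column names from the question.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

print(missing_columns(df, ["y", "gdp", "population"]))
```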
