Python: get a list from pandas DataFrame column headers

Question

Asked by natsuki_2002

I want to get a list of the column headers from a pandas DataFrame. The DataFrame will come from user input so I won't know how many columns there will be or what they will be called.

For example, if I'm given a DataFrame like this:

>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7

I would want to get a list like this:

>>> header_list
['y', 'gdp', 'cap']

Answer 1

Accepted answer by Simeon Visser

You can get the values as a list by doing:

list(my_dataframe.columns.values)

You can also simply use the following (as shown in EdChum's answer):

list(my_dataframe)
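
As a minimal sketch, the snippet below rebuilds a frame with the same column names as the question's example and shows that both expressions give the same list. Only the first three rows are copied, and building the frame from a dict is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame (first three rows only).
my_dataframe = pd.DataFrame({"y": [1, 2, 8], "gdp": [2, 3, 7], "cap": [5, 9, 2]})

# Both forms produce a plain Python list of the column labels, in column order.
from_values = list(my_dataframe.columns.values)
from_frame = list(my_dataframe)

print(from_values)
print(from_frame)
```

Neither call needs to know the number of columns or their names in advance, which matches the user-input scenario in the question.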

Answer 2

Answered by BrenBarn

The column labels are available as my_dataframe.columns, which returns an Index object rather than a plain list.
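
A short sketch of what that attribute gives you, assuming a small illustrative frame (the data is made up): the result is a pandas Index, which supports membership tests and iteration directly and can be converted when a real list is required.

```python
import pandas as pd

# Illustrative frame; the values are placeholders.
my_dataframe = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

cols = my_dataframe.columns
print(type(cols).__name__)   # the class name of the returned object
print("gdp" in cols)         # membership test works on the Index itself

header_list = list(cols)     # convert only when a real list is needed
print(header_list)
```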

Answer 3

Answered by user21988

# Collect the column labels one at a time.
n = []
for i in my_dataframe.columns:
    n.append(i)
print(n)

Answer 4

Answered by EdChum

There is a built-in method that is the most performant:

my_dataframe.columns.values.tolist()

.columns returns an Index, .columns.values returns an array, and the array has a helper function .tolist() that returns a list.

If performance is not as important to you, Index objects define a .tolist() method that you can call directly:

my_dataframe.columns.tolist()

The difference in performance is obvious:

%timeit df.columns.tolist()
16.7 μs ± 317 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit df.columns.values.tolist()
1.24 μs ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For those who hate typing, you can just call list on df, like so:

list(df)
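
To reproduce the comparison outside IPython, here is a rough sketch using the standard timeit module. The frame contents and the repeat count are assumptions chosen for illustration, and the absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

# Illustrative frame; timings depend on the machine and the pandas version.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

candidates = {
    "df.columns.tolist()": lambda: df.columns.tolist(),
    "df.columns.values.tolist()": lambda: df.columns.values.tolist(),
    "list(df)": lambda: list(df),
}

repeats = 10_000
for label, func in candidates.items():
    seconds = timeit.timeit(func, number=repeats)
    print(f"{label}: {seconds / repeats * 1e6:.2f} µs per call")

# Whatever the timings, all three expressions return the same labels.
results = [func() for func in candidates.values()]
```

The gap is measured in microseconds per call, so it mostly matters when the call sits inside a tight loop.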

Answer 5

Answered by Sascha Gottfried

A DataFrame follows the dict-like convention of iterating over the “keys” of the object.

my_dataframe.keys()

Create a list of keys/columns, using either the object method to_list() or the Pythonic way:

my_dataframe.keys().to_list()
list(my_dataframe.keys())

Basic iteration on a DataFrame returns column labels:

[column for column in my_dataframe]

Do not convert a DataFrame into a list just to get the column labels. Do not stop thinking while looking for convenient code samples.

import numpy as np
import pandas as pd

xlarge = pd.DataFrame(np.arange(100000000).reshape(10000, 10000))
list(xlarge)  # compute time and memory consumption depend on dataframe size - O(N)
list(xlarge.keys())  # constant time operation - O(1)
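
A much smaller sketch of the same idea, assuming a 3x4 frame built without explicit column names so that pandas assigns integer labels. It shows that keys(), plain iteration, and .columns all refer to the same labels, and that labels are not necessarily strings. In the pandas versions I am aware of, iterating a DataFrame walks its column Index rather than its data, so it is worth measuring on your own frame before relying on the complexity comments above.

```python
import numpy as np
import pandas as pd

# A small frame built without explicit column names, so pandas assigns 0, 1, 2, 3.
small = pd.DataFrame(np.arange(12).reshape(3, 4))

labels_from_iter = [column for column in small]
labels_from_keys = list(small.keys())

print(labels_from_iter)
print(labels_from_keys)
print(small.keys().equals(small.columns))
```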

Answer 6

Answered by tegan

I did some quick tests, and perhaps unsurprisingly the built-in version using dataframe.columns.values.tolist() is the fastest:

In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 μs per loop

In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 μs per loop

In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 μs per loop

In [4]: %timeit list(df.columns.values)
10000 loops, best of 3: 38.4 μs per loop

(I still really like list(dataframe) though, so thanks EdChum!)

Answer 7

Answered by fixxxer

It gets even simpler (as of pandas 0.16.0):

df.columns.tolist()

will give you the column names in a nice list.

Answer 8

Answered by Alexander

>>> list(my_dataframe)
['y', 'gdp', 'cap']

To list the columns of a dataframe while in debugger mode, use a list comprehension:

>>> [c for c in my_dataframe]
['y', 'gdp', 'cap']

By the way, you can get a sorted list simply by using sorted:

>>> sorted(my_dataframe)
['cap', 'gdp', 'y']
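
One caveat when the labels come from user input: they may mix types. The sketch below assumes a hypothetical frame with both string and integer labels, where plain sorted() would fail in Python 3, and sorts on the string form of each label instead. The key=str choice is one possible workaround, not the only one.

```python
import pandas as pd

# Hypothetical frame whose labels mix strings and integers, as user input might.
mixed = pd.DataFrame([[1, 2, 3]], columns=["gdp", 2020, "cap"])

original_order = list(mixed)
# sorted(mixed) would raise TypeError here because str and int cannot be compared,
# so sort on the string form of each label instead.
sorted_labels = sorted(mixed, key=str)

print(original_order)
print(sorted_labels)
```

Note that sorted() returns a new list and leaves the column order of the frame itself unchanged.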

Answer 9

Answered by Anton Protopopov

Interestingly, df.columns.values.tolist() is almost 3 times faster than df.columns.tolist(), although I thought they were the same:

In [97]: %timeit df.columns.values.tolist()
100000 loops, best of 3: 2.97 μs per loop

In [98]: %timeit df.columns.tolist()
10000 loops, best of 3: 9.67 μs per loop

Answer 10

Answered by firelynx

In the Notebook

For data exploration in the IPython notebook, my preferred way is this:

sorted(df)

This produces an easy-to-read, alphabetically ordered list.

In a code repository

In code, I find it more explicit to use:

df.columns

This tells others reading your code what you are doing.
