Python: Get a list from pandas DataFrame column headers
Original source: http://stackoverflow.com/questions/19482970/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Get list from pandas DataFrame column headers
Asked by natsuki_2002
I want to get a list of the column headers from a pandas DataFrame. The DataFrame will come from user input so I won't know how many columns there will be or what they will be called.
For example, if I'm given a DataFrame like this:
>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7
I would want to get a list like this:
>>> header_list
['y', 'gdp', 'cap']
Accepted answer by Simeon Visser
You can get the values as a list by doing:
list(my_dataframe.columns.values)
Alternatively, you can simply use (as shown in EdChum's answer):
list(my_dataframe)
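As a minimal sketch, the two calls above can be tried on a frame rebuilt from the question's sample data. The name header_list comes from the question; the way my_dataframe is constructed here is an assumption made for illustration.

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the post).
my_dataframe = pd.DataFrame({
    "y":   [1, 2, 8, 3, 6, 4, 8, 9, 6, 10],
    "gdp": [2, 3, 7, 4, 7, 8, 2, 9, 6, 10],
    "cap": [5, 9, 2, 7, 7, 3, 8, 10, 4, 7],
})

# Both expressions yield the column labels in their original order.
header_list = list(my_dataframe.columns.values)
also_headers = list(my_dataframe)

print(header_list)   # ['y', 'gdp', 'cap']
print(also_headers)  # ['y', 'gdp', 'cap']
```

Neither call needs to know the number of columns or their names in advance, which matches the user-input scenario in the question.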
Answer by BrenBarn
That's available as my_dataframe.columns.
Answer by user21988
n = []
for i in my_dataframe.columns:
    n.append(i)
print(n)
Answer by EdChum
There is a built-in method, which is the most performant:
my_dataframe.columns.values.tolist()
.columns returns an Index, .columns.values returns an array, and the array has a helper method .tolist() that returns a list.
If performance is not as important to you, Index objects define a .tolist() method that you can call directly:
my_dataframe.columns.tolist()
The difference in performance is obvious:
%timeit df.columns.tolist()
16.7 μs ± 317 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df.columns.values.tolist()
1.24 μs ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
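The %timeit lines above need IPython. As a rough sketch, a similar comparison can be run in a plain Python script with the standard timeit module. The frame df and the loop count n below are assumptions chosen for illustration, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical small frame; only the column labels matter here.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# Both approaches produce the same plain Python list.
assert df.columns.tolist() == df.columns.values.tolist()

# Time each approach; absolute numbers vary by machine and pandas version.
n = 10_000
via_index = timeit.timeit(lambda: df.columns.tolist(), number=n)
via_values = timeit.timeit(lambda: df.columns.values.tolist(), number=n)

print(f"df.columns.tolist():        {via_index / n * 1e6:.2f} us per call")
print(f"df.columns.values.tolist(): {via_values / n * 1e6:.2f} us per call")
```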
For those who hate typing, you can just call list on df, like so:
list(df)
Answer by Sascha Gottfried
A DataFrame follows the dict-like convention of iterating over the “keys” of the object.
my_dataframe.keys()
Create a list of keys/columns, using either the object method to_list() or the Pythonic list():
my_dataframe.keys().to_list()
list(my_dataframe.keys())
Basic iteration on a DataFrame returns the column labels:
[column for column in my_dataframe]
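The dict-like behaviour described above can be sketched as follows. The frame is a hypothetical one built for illustration; keys(), iteration, and the in operator all refer to column labels.

```python
import pandas as pd

# Hypothetical frame for illustration.
my_dataframe = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# keys() returns the column Index, much as dict.keys() returns a dict's keys.
keys = my_dataframe.keys()
print(keys.equals(my_dataframe.columns))  # True

# Iterating the frame walks over column labels, not rows.
labels = [column for column in my_dataframe]
print(labels)  # ['y', 'gdp', 'cap']

# Membership tests also check column labels, as with a dict.
print("gdp" in my_dataframe)  # True
print(2 in my_dataframe)      # False: 2 is a cell value, not a label
```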
Do not convert a DataFrame into a list just to get the column labels. Do not stop thinking while looking for convenient code samples.
import numpy as np
import pandas as pd

xlarge = pd.DataFrame(np.arange(100000000).reshape(10000, 10000))
list(xlarge)         # compute time and memory consumption depend on dataframe size - O(N)
list(xlarge.keys())  # constant time operation - O(1)
Answer by tegan
I did some quick tests, and perhaps unsurprisingly the built-in version using dataframe.columns.values.tolist() is the fastest:
In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 μs per loop
In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 μs per loop
In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 μs per loop
In [4]: %timeit list(df.columns.values)
10000 loops, best of 3: 38.4 μs per loop
(I still really like list(dataframe) though, so thanks EdChum!)
Answer by fixxxer
It gets even simpler (as of pandas 0.16.0):
df.columns.tolist()
will give you the column names in a nice list.
Answer by Alexander
>>> list(my_dataframe)
['y', 'gdp', 'cap']
To list the columns of a dataframe while in debugger mode, use a list comprehension:
>>> [c for c in my_dataframe]
['y', 'gdp', 'cap']
By the way, you can get a sorted list simply by using sorted:
>>> sorted(my_dataframe)
['cap', 'gdp', 'y']
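One caveat may be worth keeping in mind, since the question mentions user-supplied data: sorted compares the labels with each other, so a frame whose labels mix strings and numbers raises a TypeError in Python 3. The sketch below uses a hypothetical frame named mixed; sorting by key=str is one possible workaround, not the only one.

```python
import pandas as pd

# Hypothetical frame whose labels mix strings and integers.
mixed = pd.DataFrame([[1, 2, 3]], columns=["gdp", 2020, "cap"])

try:
    sorted(mixed)  # has to compare 'gdp' with 2020
except TypeError as exc:
    print(f"sorted() failed: {exc}")

# Sorting by the string form of each label avoids the comparison error.
print(sorted(mixed, key=str))  # [2020, 'cap', 'gdp']
```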
Answer by Anton Protopopov
Interestingly, df.columns.values.tolist() is almost 3 times faster than df.columns.tolist(), although I thought they were the same:
In [97]: %timeit df.columns.values.tolist()
100000 loops, best of 3: 2.97 μs per loop
In [98]: %timeit df.columns.tolist()
10000 loops, best of 3: 9.67 μs per loop
Answer by firelynx
In the Notebook
For data exploration in the IPython notebook, my preferred way is this:
sorted(df)
This produces an easy-to-read, alphabetically ordered list.
In a code repository
In code, I find it more explicit to use:
df.columns
This tells others reading your code what you are doing.
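As an illustration of the explicit style in library code, the sketch below checks that a frame contains the columns a function expects. The helper missing_columns and the label "population" are hypothetical names introduced here; they are not part of pandas or of the original answer.

```python
import pandas as pd


def missing_columns(df: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required labels that are absent from df.columns."""
    present = set(df.columns)
    return [name for name in required if name not in present]


df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})
print(missing_columns(df, ["y", "gdp", "population"]))  # ['population']
```

Spelling out df.columns makes it clear to a reader that the function only inspects labels and never touches the data itself.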