Python: How to get pandas.DataFrame columns containing a specific dtype
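Disclaimer: this page is a Chinese/English bilingual translation of a popular StackOverflow question, released under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow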
Original URL: http://stackoverflow.com/questions/24901766/
Question by Charlie_M
I'm using df.columns.values to make a list of column names, which I then iterate over to make charts, etc., but when I set this up I overlooked the non-numeric columns in the df. Now, I'd much rather not simply drop those columns from the df (or a copy of it). Instead, I would like to find a slick way to eliminate them from the list of column names.
Now I have:
names = df.columns.values
What I'd like to get to is something that behaves like:
names = df.columns.values(column_type=float64)
Is there any slick way to do this? I suppose I could make a copy of the df and drop those non-numeric columns before calling columns.values, but that strikes me as clunky.
Any input or suggestions are welcome. Thanks.
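For reference, the behaviour sketched in the question (a column_type filter on the column names) can be written as a small helper. column_names is hypothetical, not a pandas method; this is just a sketch that compares each entry of df.dtypes with the requested type, using a small made-up frame.

```python
import numpy as np
import pandas as pd


def column_names(df: pd.DataFrame, column_type) -> np.ndarray:
    """Hypothetical helper: names of the columns whose dtype equals column_type.

    column_type may be a dtype string such as "float64" or a numpy type such
    as np.float64; numpy dtype objects compare equal to both forms.
    """
    mask = np.array([dtype == column_type for dtype in df.dtypes], dtype=bool)
    # Boolean indexing on the underlying array keeps the .values-style return type.
    return df.columns.values[mask]


# Illustrative data: one float64, one int64 and one object column.
df = pd.DataFrame({
    "x": [1.0, 2.0],
    "y": [1, 2],
    "label": pd.Series(["a", "b"], dtype=object),
})

names = column_names(df, "float64")
print(names)
```

Only exact dtype matches are returned, so a float32 column would not be picked up by "float64".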
Accepted answer by Woody Pride
Someone will possibly give you a better answer than this, but one thing I tend to do, if all my numeric data are int64 or float64 objects, is create a dict of the column data types and then use its values to build the list of columns.
So, for example, in a dataframe where I have columns of type float64, int64 and object, you can first look at the data types like so:
DF.dtypes
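To make the output concrete, here is a sketch with a small made-up DF containing one float64, one int64 and one object column (the column names and values are illustrative assumptions). DF.dtypes is a Series indexed by column name, and dict() turns it into a name-to-dtype mapping.

```python
import pandas as pd

# Illustrative data; the object column is made explicit so the dtype is
# "object" regardless of the pandas version's default string handling.
DF = pd.DataFrame({
    "price": [1.5, 2.25, 3.0],
    "count": [1, 2, 3],
    "name": pd.Series(["a", "b", "c"], dtype=object),
})

# A Series: index = column names, values = numpy dtype objects.
print(DF.dtypes)

# Converting it to a dict maps each column name to its dtype.
dtype_map = dict(DF.dtypes)
print({col: str(dt) for col, dt in dtype_map.items()})
```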
If they conform to the convention whereby the non-numeric columns are all of object type (as they are in my dataframes), you can do the following to get a list of the numeric columns:
[key for key in dict(DF.dtypes) if dict(DF.dtypes)[key] in ['float64', 'int64']]
It's just a simple list comprehension, nothing fancy. Again, whether this works for you will depend on how you set up your dataframe.
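A self-contained version of this answer's approach (dict of dtypes plus a list comprehension), assuming a small made-up DF. It builds the dict once rather than on every iteration. Note that the membership test only matches the exact strings listed, so a column stored as, say, int32 or float32 would be left out.

```python
import pandas as pd

# Illustrative data with float64, int64 and object columns.
DF = pd.DataFrame({
    "price": [1.5, 2.25, 3.0],
    "count": [1, 2, 3],
    "name": pd.Series(["a", "b", "c"], dtype=object),
})

# Build the name -> dtype mapping once.
dtypes = dict(DF.dtypes)

# numpy dtypes compare equal to their string names, so `in` works on a list of strings.
numeric_names = [key for key in dtypes if dtypes[key] in ["float64", "int64"]]
print(numeric_names)
```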
Answer by chrisb
There's a new feature in 0.14.1, select_dtypes, to select columns by dtype by providing a list of dtypes to include or exclude.
For example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randn(1000),
                   'b': range(1000),
                   'c': ['a'] * 1000,
                   'd': pd.date_range('2000-1-1', periods=1000)})
df.select_dtypes(['float64','int64'])
Out[129]:
a b
0 0.153070 0
1 0.887256 1
2 -1.456037 2
3 -1.147014 3
...
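Since the question only needs the names, a sketch combining select_dtypes with .columns may help. It also shows the exclude side mentioned above. The five-row frame mirrors the example, except that the "c" column is made explicitly object so that exclude="object" reliably drops it; include="number" is select_dtypes' documented shorthand for all numeric dtypes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": np.random.randn(5),
    "b": range(5),
    "c": pd.Series(["a"] * 5, dtype=object),
    "d": pd.date_range("2000-1-1", periods=5),
})

# Keep every numeric column, whatever its bit width.
numeric_names = df.select_dtypes(include="number").columns.tolist()

# Equivalent here: drop the object and datetime columns instead.
kept_by_exclusion = df.select_dtypes(exclude=["object", "datetime"]).columns.tolist()

for name in numeric_names:
    print(name)  # e.g. build one chart per numeric column here
```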
Answer by Arthur Zennig
dtypes is a pandas Series. That means it has index and values attributes. If you only need the column names:
headers = df.dtypes.index
This returns a pandas Index (not a plain list) containing the column names of the df dataframe; wrap it in list() if you need a list.
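Because dtypes is a Series, it can also be filtered before taking its index, which ties this answer back to the question. A sketch with a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0],
    "y": [1, 2],
    "label": pd.Series(["a", "b"], dtype=object),
})

headers = df.dtypes.index      # a pandas Index with the same labels as df.columns
header_list = list(headers)    # convert if a plain Python list is needed

# Boolean-mask the dtypes Series, then take the index: only float64 column names.
float_headers = df.dtypes[df.dtypes == "float64"].index.tolist()
```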
Answer by J11
To get the column names from a pandas dataframe in Python 3 (here I am creating a dataframe from a fileName.csv file):
>>> import pandas as pd
>>> df = pd.read_csv('fileName.csv')
>>> columnNames = list(df.head(0))
>>> print(columnNames)
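A runnable sketch of the same idea that replaces fileName.csv with an in-memory CSV (the data are made up). Iterating over a DataFrame yields its column labels, which is why list(df.head(0)) works; the same trick applied to the result of select_dtypes gives only the numeric names.

```python
import io

import pandas as pd

# Stand-in for fileName.csv so the example runs without a file on disk.
csv_text = "id,height,name\n1,1.80,ann\n2,1.65,bob\n"
df = pd.read_csv(io.StringIO(csv_text))

# These two are equivalent ways of listing every column name.
columnNames = list(df.head(0))
same_names = list(df.columns)

# Only the numeric column names.
numeric_names = list(df.select_dtypes(include="number"))
print(columnNames, numeric_names)
```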
Answer by Ritik Raj Srivastava
You can also get the column names from a pandas data frame by assigning them yourself and reading back df.columns; printing it shows the column names along with a dtype. Here I'll read a CSV file from https://mlearn.ics.uci.edu/databases/autos/imports-85.data, but because it is read with header=None, you have to define a header that contains the column names.
import pandas as pd
url="https://mlearn.ics.uci.edu/databases/autos/imports-85.data"
df=pd.read_csv(url,header = None)
headers=["symboling","normalized-losses","make","fuel-type","aspiration","num-of-doors","body-style",
"drive-wheels","engine-location","wheel-base","length","width","height","curb-weight","engine-type",
"num-of-cylinders","engine-size","fuel-system","bore","stroke","compression-ratio","horsepower","peak-rpm"
,"city-mpg","highway-mpg","price"]
df.columns=headers
print(df.columns)
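A sketch that runs offline, using two made-up headerless rows and only four of the columns above. It assumes, for illustration, that missing values are written as "?". A numeric column containing such a placeholder loads as non-numeric, which would hide it from a dtype-based filter; passing na_values="?" to read_csv is one way to keep it numeric.

```python
import io

import pandas as pd

# Two made-up, headerless rows; "?" stands in for a missing value.
raw = "3,?,alfa-romero,13495\n1,164,audi,?\n"

# Without na_values, "?" forces those columns to a non-numeric dtype.
df_raw = pd.read_csv(io.StringIO(raw), header=None)

# With na_values="?", the placeholder becomes NaN and the columns parse as numbers.
df = pd.read_csv(io.StringIO(raw), header=None, na_values="?")
df.columns = ["symboling", "normalized-losses", "make", "price"]

print(df.dtypes)
numeric_headers = df.select_dtypes(include="number").columns.tolist()
```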