Filter pandas dataframe with specific column names in python
Note: this page reproduces a popular StackOverflow question. It is provided under the CC BY-SA 4.0 license: you are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/48198021/
Question by J Cena
I have a pandas dataframe and a list, as follows:
mylist = ['nnn', 'mmm', 'yyy']
mydata =
   xxx  yyy  zzz  nnn  ddd  mmm
0    0   10    5    5    5    5
1    1    9    2    3    4    4
2    2    8    8    7    9    0
Now, I want to get only the columns mentioned in mylist and save them as a csv file.
i.e.
   yyy  nnn  mmm
0   10    5    5
1    9    3    4
2    8    7    0
My current code is as follows.
import pandas as pd

mydata = pd.read_csv(input_file, header=0)
for item in mylist:
    mydata_new = mydata[item]
print(mydata_new)
mydata_new.to_csv(file_name)
It seems to me that my new dataframe produces wrong results. Where am I going wrong? Please help me!
Answer by cs95
Just pass a list of column names to index df:
df[['nnn', 'mmm', 'yyy']]
   nnn  mmm  yyy
0    5    5   10
1    3    4    9
2    7    0    8
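As a rough sketch of this on the question's data, the snippet below rebuilds the sample frame by hand (an assumption; the asker reads it from input_file) and compares a list selection with a single-label selection. The list form returns a DataFrame whose columns follow the list order, while a bare label returns a Series.

```python
import pandas as pd

# Rebuild the question's sample data by hand (values copied from the table above).
mydata = pd.DataFrame({
    "xxx": [0, 1, 2],
    "yyy": [10, 9, 8],
    "zzz": [5, 2, 8],
    "nnn": [5, 3, 7],
    "ddd": [5, 4, 9],
    "mmm": [5, 4, 0],
})

mylist = ["nnn", "mmm", "yyy"]

# A list inside [] selects several columns at once and returns a DataFrame,
# with the columns in the order given by the list.
subset = mydata[mylist]
print(subset)

# A single label returns a Series; a one-element list returns a DataFrame.
print(type(mydata["yyy"]).__name__)    # Series
print(type(mydata[["yyy"]]).__name__)  # DataFrame
```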
If you need to handle non-existent column names in your list, try filtering with df.columns.isin:
df.loc[:, df.columns.isin(['nnn', 'mmm', 'yyy', 'zzzzzz'])]
   yyy  nnn  mmm
0   10    5    5
1    9    3    4
2    8    7    0
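One detail worth noting about the isin approach: the boolean mask follows the frame's columns, so the result keeps the frame's own column order (yyy first above) and silently drops names that do not exist. The sketch below, on the same hand-built sample data, shows that behavior and one possible alternative (a list comprehension, not part of the answer) that keeps the list's order instead.

```python
import pandas as pd

mydata = pd.DataFrame({
    "xxx": [0, 1, 2], "yyy": [10, 9, 8], "zzz": [5, 2, 8],
    "nnn": [5, 3, 7], "ddd": [5, 4, 9], "mmm": [5, 4, 0],
})
wanted = ["nnn", "mmm", "yyy", "zzzzzz"]  # "zzzzzz" does not exist

# isin builds a boolean mask over mydata.columns: missing names are ignored,
# and the selected columns keep the frame's own order.
by_mask = mydata.loc[:, mydata.columns.isin(wanted)]
print(list(by_mask.columns))  # ['yyy', 'nnn', 'mmm']

# To keep the order of the list instead, drop the missing names first.
present = [c for c in wanted if c in mydata.columns]
by_list = mydata[present]
print(list(by_list.columns))  # ['nnn', 'mmm', 'yyy']
```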
Answer by Tai
You can just put mylist inside [] and pandas will select those columns for you.
mydata_new = mydata[mylist]
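Applied to the asker's original task, the whole loop collapses into one selection followed by to_csv. In this minimal sketch, input_file and file_name are replaced by an in-memory CSV string and an io.StringIO buffer so it runs on its own; passing index=False is an assumption about the desired output, since without it the row labels 0, 1, 2 are written as an extra column.

```python
import io

import pandas as pd

# Stand-in for input_file: the question's data as CSV text (assumption).
csv_text = """xxx,yyy,zzz,nnn,ddd,mmm
0,10,5,5,5,5
1,9,2,3,4,4
2,8,8,7,9,0
"""
mylist = ["nnn", "mmm", "yyy"]

mydata = pd.read_csv(io.StringIO(csv_text), header=0)

# One selection replaces the whole loop.
mydata_new = mydata[mylist]

# Write to a buffer here; pass a real path (the question's file_name) to write a file.
out = io.StringIO()
mydata_new.to_csv(out, index=False)
print(out.getvalue())
```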
Not sure whether your yyy is a typo.
Your code goes wrong because it reassigns mydata_new to a new Series on every iteration of the loop:
for item in mylist:
    mydata_new = mydata[item]  # <-
Thus, after the loop, mydata_new holds only a single Series (the last column in mylist) rather than the whole df you want.
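To make the reassignment concrete, this sketch replays the loop on hand-built sample data and checks what mydata_new holds afterwards. If a loop is genuinely needed (for example, to transform each column first), one option, shown here as an illustration rather than taken from the answer, is to collect the Series and join them once with pd.concat(..., axis=1).

```python
import pandas as pd

mydata = pd.DataFrame({
    "xxx": [0, 1, 2], "yyy": [10, 9, 8], "zzz": [5, 2, 8],
    "nnn": [5, 3, 7], "ddd": [5, 4, 9], "mmm": [5, 4, 0],
})
mylist = ["nnn", "mmm", "yyy"]

# The original loop: each pass overwrites mydata_new with one column.
for item in mylist:
    mydata_new = mydata[item]
print(type(mydata_new).__name__, mydata_new.name)  # Series yyy

# If a loop is really needed, collect the columns and join them once.
columns = [mydata[item] for item in mylist]
joined = pd.concat(columns, axis=1)
print(list(joined.columns))  # ['nnn', 'mmm', 'yyy']
```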
If some names in the list are not in your data frame, you can always check for them with:
len(set(mylist) - set(mydata.columns)) > 0
and print them out:
print(set(mylist) - set(mydata.columns))
Then see if there are typos or other unintended behaviors.
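A small sketch of that check, using a hypothetical list with a deliberate typo ("yyz"). It also shows why the check is useful: in current pandas versions, selecting with mydata[mylist] when a name is missing raises a KeyError rather than returning the columns that do exist.

```python
import pandas as pd

mydata = pd.DataFrame({"yyy": [10, 9, 8], "nnn": [5, 3, 7], "mmm": [5, 4, 0]})
mylist = ["nnn", "mmm", "yyy", "yyz"]  # "yyz" is a deliberate typo

# Names requested in mylist that are not columns of mydata.
missing = set(mylist) - set(mydata.columns)
if missing:  # same test as len(missing) > 0
    print(sorted(missing))  # ['yyz']

# Plain list indexing does not skip unknown names; it raises KeyError.
try:
    mydata[mylist]
except KeyError as err:
    print("KeyError:", err)
```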