Source: StackOverflow, original question: http://stackoverflow.com/questions/48292656/. This content is provided under the CC BY-SA 4.0 license: you are free to use and share it, but you must apply the same license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow.

Time: 2020-09-14 05:03:47  Source: igfitidea

Pandas select unique values from column

Tags: python-3.x, pandas

Question by chowpay

I was able to load a CSV into a Jupyter notebook by doing this:

import pandas as pd
csvData = pd.read_csv("logfile.csv")

My data looks like this:

event_timestamp   ip               url
2018-01-10 00:00  111.111.111.111  http://webpage1.com
2018-01-10 00:00  222.222.222.222  http://webpage2.com
...
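To reproduce the question without the original logfile.csv, the sample rows can be loaded from an in-memory string. This is a minimal sketch that assumes the file is comma-separated with a header row (the question does not show the real delimiter); the third row is a hypothetical repeat of the first IP, added only so there is a duplicate to remove later.

```python
import io

import pandas as pd

# Sample rows from the question, assumed to be comma-separated.
# The third row is hypothetical: a repeat of 111.111.111.111 so that
# unique() has a duplicate to drop.
SAMPLE_CSV = """event_timestamp,ip,url
2018-01-10 00:00,111.111.111.111,http://webpage1.com
2018-01-10 00:00,222.222.222.222,http://webpage2.com
2018-01-10 00:01,111.111.111.111,http://webpage1.com
"""

# pd.read_csv accepts any file-like object, so StringIO stands in for "logfile.csv".
csvData = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(csvData.shape)             # (3, 3)
print(csvData.columns.tolist())  # ['event_timestamp', 'ip', 'url']
```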

I got a list of the IPs:

list_ips = csvData[["ip"]]

What I'm trying to do is get the unique values. Normally I would do:

list_ips.unique()

But in this case I get this error:

AttributeError: 'DataFrame' object has no attribute 'unique'

(I can use list_ips.head() and it will list a few IPs but it's not a unique list)
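The error comes from how the column is selected: indexing with a list of labels (double brackets) returns a one-column DataFrame, while a single label returns a Series, and only Series has unique(). A minimal sketch with a small hypothetical frame standing in for csvData:

```python
import pandas as pd

# Hypothetical stand-in for csvData; the values are illustrative only.
csvData = pd.DataFrame({
    "ip": ["111.111.111.111", "222.222.222.222", "111.111.111.111"],
    "url": ["http://webpage1.com", "http://webpage2.com", "http://webpage1.com"],
})

as_frame = csvData[["ip"]]  # list of labels -> DataFrame with one column
as_series = csvData["ip"]   # single label -> Series

print(type(as_frame).__name__)      # DataFrame
print(type(as_series).__name__)     # Series
print(hasattr(as_frame, "unique"))  # False, hence the AttributeError
print(hasattr(as_series, "unique")) # True
```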

Thanks

EDIT: My problem was that I actually had:

list_ips = csvData[["ip"]]

So I removed one set of brackets, and it became:

list_ips = csvData["ip"]

Then I was able to follow Wen's example and do:

list_ips.unique().tolist()

Output:

['111.111.111.111','222.222.222.222'...]
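Series.unique() returns the distinct values in order of first appearance (it does not sort them), and tolist() turns that array into a plain Python list. A small sketch with hypothetical IPs, ordered so the non-sorted behaviour is visible:

```python
import pandas as pd

# Hypothetical IP column; the first value is deliberately not the smallest.
ips = pd.Series(["222.222.222.222", "111.111.111.111", "222.222.222.222"], name="ip")

uniq = ips.unique()      # distinct values, in order of first appearance
as_list = uniq.tolist()  # plain Python list

print(as_list)  # ['222.222.222.222', '111.111.111.111']
```

If a sorted list is wanted, sorted(ips.unique()) is one straightforward option.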

Answer by YOBEN_S

You need to select the column correctly (with a single label such as 'ip', which returns a Series), then apply unique:

csvData['ip'].unique().tolist()
Out[677]: ['111.111.111.111', '222.222.222.222']

Answer by Julian Rachman

You are running into this problem because unique() is not a DataFrame method: pd.read_csv("logfile.csv") returns a DataFrame, and DataFrame has no unique attribute. Since csvData is a DataFrame (not a list), you can select all IPs with csvData['ip'], which gives a Series, and then get the unique IPs with csvData['ip'].unique().
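If you already have a one-column DataFrame (for example from csvData[["ip"]]), you can still get the unique values without reselecting from the original frame. This sketch uses hypothetical data and shows two standard options: drop_duplicates() keeps the DataFrame shape, while squeeze(axis="columns") turns the single column into a Series, which does have unique().

```python
import pandas as pd

# Hypothetical stand-in for csvData[["ip"]], a one-column DataFrame.
ip_frame = pd.DataFrame({"ip": ["111.111.111.111", "222.222.222.222", "111.111.111.111"]})

# Option 1: drop duplicate rows (keeps the first occurrence), then take the column as a list.
via_drop = ip_frame.drop_duplicates()["ip"].tolist()

# Option 2: squeeze the single column into a Series, then call unique().
via_squeeze = ip_frame.squeeze(axis="columns").unique().tolist()

print(via_drop)     # ['111.111.111.111', '222.222.222.222']
print(via_squeeze)  # ['111.111.111.111', '222.222.222.222']
```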