Python: print the unique values in every column in a pandas DataFrame
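Disclaimer: this page is a Chinese/English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/27241253/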
Question by yoshiserry
I have a dataframe (df) and want to print the unique values from each column in the dataframe.
I need to substitute the variable (i) [column name] into the print statement:
column_list = df.columns.values.tolist()
for column_name in column_list:
    print(df."[column_name]".unique()
Update
When I use the code below, I get "Unexpected EOF Parsing" with no extra details:
column_list = sorted_data.columns.values.tolist()
for column_name in column_list:
    print(sorted_data[column_name].unique()
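The "unexpected EOF" message in the update comes from the last line of that snippet: print( opens a parenthesis that is never closed, so Python reaches the end of the input while still waiting for a ")". Python 3.10 and later report the same problem as "'(' was never closed". A minimal corrected sketch follows; sorted_data here is a small hypothetical stand-in for the asker's DataFrame, and the compile() check only demonstrates that the original line is a syntax error.

```python
import pandas as pd

# Hypothetical stand-in for the question's sorted_data
sorted_data = pd.DataFrame({"colour": ["red", "blue", "red"], "size": [1, 1, 2]})

# Same list of column names as columns.values.tolist()
column_list = sorted_data.columns.tolist()
for column_name in column_list:
    # Both parentheses are closed: print( ... .unique() )
    print(sorted_data[column_name].unique())

# Demonstrate that the original line, with one ")" missing, does not compile
broken_line = "print(sorted_data[column_name].unique()\n"
try:
    compile(broken_line, "<question>", "exec")
    broken_is_syntax_error = False
except SyntaxError as err:
    broken_is_syntax_error = True
    print("SyntaxError:", err.msg)
```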
What is the difference between your syntax, YS-L (above), and the following:
for column_name in sorted_data:
    print(column_name)
    s = sorted_data[column_name].unique()
    for i in s:
        print(str(i))
Answer by YS-L
It can be written more concisely like this:
for col in df:
    print(df[col].unique())
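The shorter loop works because iterating over a DataFrame yields its column labels, so for col in df visits the same names as the column_list built in the question. That also covers the difference asked about in the update: both versions walk the same columns, while the nested version additionally prints each column name and prints each unique value on its own line instead of the whole array at once. A small sketch with made-up data:

```python
import pandas as pd

# Toy data, made up for illustration
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "x"]})

# Iterating a DataFrame yields its column labels, same as df.columns
labels_from_iteration = [col for col in df]
labels_from_columns = df.columns.tolist()

for col in df:
    # One line per column: the whole array of unique values
    print(df[col].unique())

for col in df:
    # Column name, then one line per unique value, as in the question's nested loop
    print(col)
    for value in df[col].unique():
        print(value)
```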
Generally, you can access a column of the DataFrame through indexing with the [] operator (e.g. df['col']) or through attribute access (e.g. df.col).
Attribute access makes the code a bit more concise when the target column name is known beforehand, but it has several caveats: for example, it does not work when the column name is not a valid Python identifier (e.g. df.123), or when it clashes with a built-in DataFrame attribute (e.g. df.index). The [] notation, on the other hand, always works.
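To illustrate those caveats, here is a small sketch with a made-up DataFrame that has one ordinary column name, one that clashes with the index attribute, and one that contains a space. Attribute access only reaches the first; [] reaches all three.

```python
import pandas as pd

# Toy DataFrame (made up for illustration): an ordinary name, a name that
# clashes with a DataFrame attribute, and a name that is not a valid identifier
df = pd.DataFrame({
    "price": [1, 2, 2],
    "index": ["a", "b", "b"],
    "unit cost": [3, 3, 4],
})

# Attribute access works for an ordinary, non-clashing name
price_uniques = df.price.unique()

# df.index is the DataFrame's row index, not the column named "index"
row_index = df.index

# [] indexing reaches both problematic columns
index_col_uniques = df["index"].unique()
unit_cost_uniques = df["unit cost"].unique()

print(price_uniques, index_col_uniques, unit_cost_uniques)
```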
Answer by A.Kot
If you're trying to create multiple separate dataframes as mentioned in your comments, create a dictionary of dataframes:
import pandas as pd
df_dict = {col: pd.DataFrame(df[col].unique(), columns=[col]) for col in df.columns}
Then you can access any dataframe easily using the name of the column:
df_dict['column_name']  # 'column_name' stands for any column label in df
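A short usage sketch of the dictionary-of-DataFrames approach; the city/year data is made up for illustration. Each entry is a one-column DataFrame holding the unique values of the column with the same name.

```python
import pandas as pd

# Toy data, made up for illustration
df = pd.DataFrame({"city": ["Oslo", "Rome", "Oslo"], "year": [2020, 2021, 2020]})

# Dictionary of one-column DataFrames, one per column (as in the answer above)
df_dict = {col: pd.DataFrame(df[col].unique(), columns=[col]) for col in df.columns}

# Look up the unique values of a column by its name
print(df_dict["city"])
print(df_dict["year"])
```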
Answer by bhavin
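import pandas as pd

# Unique values of the first 7 columns of the DataFrame card
cu = []  # one array of unique values per column
i = []   # the matching column names
for cn in card.columns[:7]:
    cu.append(card[cn].unique())
    i.append(cn)
# One row per column of card; transpose so each original column is a column again
# (columns with fewer unique values are padded with missing values)
pd.DataFrame(cu, index=i).T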
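A possible variant that covers every column rather than the first seven: build a dict of Series and let pandas align them on their index, so columns with fewer unique values are padded with NaN. This is a sketch with made-up data, not part of the original answer.

```python
import pandas as pd

# Toy data, made up for illustration
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "z"]})

# One Series of unique values per column; Series of different lengths are
# aligned on their index, so shorter ones are padded with NaN
table = pd.DataFrame({col: pd.Series(df[col].unique()) for col in df.columns})
print(table)
```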
Answer by mgoldwasser
We can make this even more concise:
df.describe(include='all').loc['unique', :]
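Pandas describe gives a few key statistics about each column, but we can just grab the 'unique' statistic and leave it at that. Note that this statistic is the number of distinct values in each column, not the values themselves.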
Note that the unique count is NaN for numeric columns; if you want to include those columns as well, you can cast everything to object first:
df.astype('object').describe(include='all').loc['unique', :]
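A sketch comparing the two describe calls with DataFrame.nunique(), which returns the number of distinct values per column directly (excluding missing values by default); the toy data is made up for illustration.

```python
import pandas as pd

# Toy data, made up for illustration
df = pd.DataFrame({"a": [1, 2, 2, 3], "b": ["x", "x", "y", "y"]})

# Mixed dtypes: the numeric column "a" gets NaN in the "unique" row
mixed = df.describe(include="all").loc["unique", :]

# Casting to object first yields a unique count for every column
all_counts = df.astype("object").describe(include="all").loc["unique", :]

# nunique() gives the number of distinct values per column directly
direct_counts = df.nunique()

print(mixed, all_counts, direct_counts, sep="\n\n")
```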
Answer by Sam Shanmukh
Or, in short, it can be written as:
for val in df['column_name'].unique():
    print(val)
Answer by Simon Lo
The code below gives you the unique values of each column. I find it very useful when I want to take a deeper look at a DataFrame:
for col in list(df):
    print(col)
    print(df[col].unique())
You can also sort the unique values:
import numpy as np
for col in list(df):
    print(col)
    # np.sort raises TypeError if a column mixes values that can't be compared, e.g. strings and NaN
    print(np.sort(df[col].unique()))
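Because np.sort compares the values directly, it can fail on object columns that mix strings with missing values. A hedged alternative is to sort with pandas instead: Series.sort_values sets missing values aside and places them last (na_position="last"). This is a sketch, and the toy DataFrame is made up for illustration.

```python
import pandas as pd

# Toy data with missing values, made up for illustration
df = pd.DataFrame({"name": ["b", None, "a", "b"], "score": [3.0, 1.0, float("nan"), 3.0]})

sorted_uniques = {}
for col in df.columns:
    # drop_duplicates keeps one copy of each value (missing values included once);
    # sort_values places missing values last instead of comparing them with strings
    sorted_uniques[col] = df[col].drop_duplicates().sort_values(na_position="last").tolist()
    print(col)
    print(sorted_uniques[col])
```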