Python: print the unique values in every column of a pandas DataFrame

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/27241253/

print the unique values in every column in a pandas dataframe

Tags: python, for-loop, pandas

Asked by yoshiserry

I have a dataframe (df) and want to print the unique values from each column in the dataframe.

I need to substitute the loop variable (the column name) into the print statement:

column_list = df.columns.values.tolist()
for column_name in column_list:
    print(df."[column_name]".unique()

Update

When I use this, I get "Unexpected EOF Parsing" with no extra details.

column_list = sorted_data.columns.values.tolist()
for column_name in column_list:
      print(sorted_data[column_name].unique()

What is the difference between your syntax, YS-L (above), and the code below?

for column_name in sorted_data:
    print(column_name)
    s = sorted_data[column_name].unique()
    for i in s:
        print(str(i))

Answered by YS-L

It can be written more concisely like this:

for col in df:
    print(df[col].unique())
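
As a minimal sketch of why the shorter loop is equivalent: iterating over a DataFrame yields its column labels, so it visits the same names as df.columns.values.tolist(). The sample frame and the names labels_from_iteration and labels_from_columns are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "x"]})

# Iterating over a DataFrame yields the column labels.
labels_from_iteration = [col for col in df]

# The longer form used in the question produces the same labels.
labels_from_columns = df.columns.values.tolist()

# Print each column name followed by its unique values.
for col in df:
    print(col, df[col].unique())
```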

Generally, you can access a column of the DataFrame through indexing using the [] operator (e.g. df['col']), or through attribute access (e.g. df.col).

Attribute access makes the code a bit more concise when the target column name is known beforehand, but it has several caveats. For example, it does not work when the column name is not a valid Python identifier (e.g. df.123) or clashes with a built-in DataFrame attribute (e.g. df.index). The [] notation, on the other hand, should always work.
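
The caveats can be seen in a small sketch. The sample frame is invented for illustration: it has one ordinary column, one column whose name clashes with the DataFrame's own index attribute, and one whose name contains a space.

```python
import pandas as pd

# Hypothetical sample data chosen to trigger the caveats.
df = pd.DataFrame({
    "col": [1, 2, 2],
    "index": ["a", "b", "b"],
    "my col": [True, False, True],
})

# For an ordinary name, both access styles return the same column.
by_bracket = df["col"].unique()
by_attribute = df.col.unique()

# df.index is the row index of the frame, not the column named "index".
row_index = df.index
index_column = df["index"]

# A name with a space is not a valid identifier, so only [] can reach it.
spaced = df["my col"].unique()
```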

Answered by A.Kot

If you're trying to create multiple separate dataframes as mentioned in your comments, create a dictionary of dataframes:

df_dict = {col: pd.DataFrame(df[col].unique(), columns=[col]) for col in df.columns}

Then you can access any dataframe easily using the name of the column:

df_dict['column_name']  # substitute an actual column name
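
A short usage sketch on made-up data, assuming the dictionary is keyed by column name as described. The frame contents and the name fruit_uniques are illustrative only.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"fruit": ["apple", "pear", "apple"], "qty": [1, 1, 2]})

# One single-column DataFrame of unique values per original column.
df_dict = {
    col: pd.DataFrame(df[col].unique(), columns=[col]) for col in df.columns
}

# Look up the unique values of one column by its name.
fruit_uniques = df_dict["fruit"]
print(fruit_uniques)
```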

Answered by bhavin

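import pandas as pd

# 'card' is the answerer's DataFrame; only its first 7 columns are used.
unique_values = []
column_names = []
for cn in card.columns[:7]:
    unique_values.append(card[cn].unique())
    column_names.append(cn)

# One row per column, transposed so each column lists its unique values.
pd.DataFrame(unique_values, index=column_names).T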

Answered by mgoldwasser

We can make this even more concise:

df.describe(include='all').loc['unique', :]

The pandas describe method gives a few key statistics about each column, but we can just grab the 'unique' statistic (the number of distinct values) and leave it at that.

Note that this will give a unique count of NaN for numeric columns. If you want to include those columns as well, you can do something like this:

df.astype('object').describe(include='all').loc['unique', :]
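
A small sketch of what these two calls return, on invented data. Note that the result is the number of distinct values per column, not the values themselves. DataFrame.nunique, which is not mentioned in the answer, is shown alongside as another way to get the same counts.

```python
import pandas as pd

# Hypothetical sample data: one text column and one numeric column.
df = pd.DataFrame({"name": ["a", "b", "a"], "score": [1, 2, 2]})

# Without the cast, the numeric column gets NaN for 'unique'.
raw_counts = df.describe(include="all").loc["unique", :]

# Casting to object makes describe report 'unique' for every column.
object_counts = df.astype("object").describe(include="all").loc["unique", :]

# nunique() returns the per-column count of distinct values directly.
nunique_counts = df.nunique()

print(raw_counts)
print(object_counts)
print(nunique_counts)
```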

Answered by Sam Shanmukh

Or in short it can be written as:

for val in df['column_name'].unique():
    print(val)

Answered by Simon Lo

The code below prints a list of unique values for each field. I find it very useful when you want to take a deeper look at the data frame:

for col in list(df):
    print(col)
    print(df[col].unique())

You can also sort the unique values if you want them to be sorted:

import numpy as np
for col in list(df):
    print(col)
    print(np.sort(df[col].unique()))
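
A variation on the sorted loop, as a sketch only: it assumes missing values should be left out of the listing, so it calls dropna() first, and it swaps in Python's built-in sorted() in place of np.sort. The sample frame and the name sorted_uniques are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({
    "city": ["Oslo", "Bergen", "Oslo"],
    "temp": [3.0, np.nan, 1.0],
})

sorted_uniques = {}
for col in df:
    # Drop missing values, take the unique values, then sort them.
    sorted_uniques[col] = sorted(df[col].dropna().unique())
    print(col)
    print(sorted_uniques[col])
```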