Python: print the unique values in each column of a Pandas DataFrame

Question

Asked by yoshiserry

I have a dataframe (df) and want to print the unique values from each column in the dataframe.

I need to substitute the variable (i) [column name] into the print statement:

column_list = df.columns.values.tolist()
for column_name in column_list:
    print(df."[column_name]".unique()

Update

When I use this, I get "Unexpected EOF Parsing" with no extra details.

column_list = sorted_data.columns.values.tolist()
for column_name in column_list:
      print(sorted_data[column_name].unique()

What is the difference between your syntax, YS-L (above), and the code below?

for column_name in sorted_data:
    print(column_name)
    s = sorted_data[column_name].unique()
    for i in s:
        print(str(i))

Answer 1

Answered by YS-L

It can be written more concisely like this:

for col in df:
    print(df[col].unique())

Generally, you can access a column of the DataFrame through indexing using the [] operator (e.g. df['col']), or through attribute access (e.g. df.col).

Attribute access makes the code a bit more concise when the target column name is known beforehand, but it has several caveats. For example, it does not work when the column name is not a valid Python identifier (e.g. df.123), or when it clashes with a built-in DataFrame attribute (e.g. df.index). The [] notation, on the other hand, should always work.
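
To make the caveats above concrete, here is a minimal sketch. The sample DataFrame and its column names ("city", "index", and the integer 123) are made up for illustration and are not from the original question.

```python
import pandas as pd

# Hypothetical sample data; the column names are chosen to trigger the caveats.
df = pd.DataFrame(
    {"city": ["Oslo", "Rome", "Oslo"], "index": [10, 20, 10], 123: [1.5, 1.5, 2.5]}
)

# Attribute access works for a name that is a valid identifier and does not clash.
city_values = df.city.unique()

# "index" clashes with a built-in attribute: df.index is the row index, not the column.
row_index = df.index
index_column_values = df["index"].unique()

# 123 is not a valid identifier (df.123 is a syntax error), so only [] can reach it.
numeric_name_values = df[123].unique()

# Looping with [] therefore works for every column, whatever its name.
unique_by_column = {col: df[col].unique().tolist() for col in df}
print(unique_by_column)
```

Because the loop only ever uses df[col], it does not need to know in advance whether a column name is safe for attribute access.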

Answer 2

Answered by A.Kot

If you're trying to create multiple separate dataframes as mentioned in your comments, create a dictionary of dataframes:

# Map each column name to a one-column DataFrame of that column's unique values.
df_dict = {col: pd.DataFrame(df[col].unique(), columns=[col]) for col in df.columns}

Then you can access any dataframe easily using the name of the column:

df_dict['column_name']
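
As a rough end-to-end sketch of this approach, assuming a small made-up DataFrame (the "fruit" and "count" columns are hypothetical):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"fruit": ["apple", "pear", "apple"], "count": [3, 3, 5]})

# One single-column DataFrame of unique values per original column.
df_dict = {col: pd.DataFrame(df[col].unique(), columns=[col]) for col in df.columns}

# Look up the unique values of one column by its name.
fruit_uniques = df_dict["fruit"]
print(fruit_uniques)
```

Each entry is an independent DataFrame, so columns with different numbers of unique values do not need to be padded to a common length.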

Answer 3

Answered by bhavin

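# Collect the unique values of the first seven columns of the DataFrame `card`.
cu = []  # one array of unique values per column
i = []   # the matching column names
for cn in card.columns[:7]:
    cu.append(card[cn].unique())
    i.append(cn)

# Build one row per column name, then transpose so that each column lists
# its own unique values (shorter columns are padded with missing values).
pd.DataFrame(cu, index=i).T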

Answer 4

Answered by mgoldwasser

We can make this even more concise:

df.describe(include='all').loc['unique', :]

Pandas describe gives a few key statistics about each column, but we can just grab the 'unique' statistic, which is the number of distinct values in each column (not the values themselves), and leave it at that.

Note that this will give a unique count of NaN for numeric columns. If you want to include those columns as well, you can do something like this:

df.astype('object').describe(include='all').loc['unique', :]
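
A small sketch of the difference between the two calls, using a made-up DataFrame with one text column and one numeric column (both column names are hypothetical):

```python
import pandas as pd

# Hypothetical sample data: one object (string) column and one numeric column.
df = pd.DataFrame({"name": ["a", "b", "a"], "score": [1, 2, 2]})

# Without the cast, 'unique' is only computed for non-numeric columns.
mixed = df.describe(include="all").loc["unique", :]

# Casting everything to object makes describe() count distinct values everywhere.
as_object = df.astype("object").describe(include="all").loc["unique", :]

print(mixed)
print(as_object)
```

Here `mixed` holds a count for "name" and NaN for "score", while `as_object` holds a count for both columns.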

Answer 5

Answered by Sam Shanmukh

Or in short it can be written as:

for val in df['column_name'].unique():
    print(val)

Answer 6

Answered by Simon Lo

The code below gives you a list of unique values for each field. I find it very useful when you want to take a deeper look at the data frame:

for col in list(df):
    print(col)
    print(df[col].unique())

You can also sort the unique values if you want them to be sorted:

import numpy as np
for col in list(df):
    print(col)
    print(np.sort(df[col].unique()))
