Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/44105375/

Date: 2020-09-14 03:39:37  Source: igfitidea

How to create a dictionary of key: column_name and value: unique values in column in Python from a DataFrame

Tags: python, list, pandas, dictionary

Question by Shuvayan Das

I am trying to create a dictionary of key:value pairs where the key is the column name of a DataFrame and the value is a list containing all the unique values in that column. Ultimately I want to be able to filter out key:value pairs from the dict based on conditions. This is what I have been able to do so far:

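# footwear_data is the DataFrame; col_list holds its column names
for col in col_list[1:]:
    _list = []
    _list.append(footwear_data[col].unique())   # unique values of this column
    list_name = ''.join([str(col),'_list'])     # e.g. 'Color' -> 'Color_list'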

product_list = ['shoe','footwear']
color_list = []
size_list = []

Here product, color and size are all column names, and the dict keys should be named accordingly, like color_list etc. Ultimately I will need to access each key:value_list pair in the dictionary. Expected output:

KEY           VALUE
color_list:   ["red", "blue", "black"]
size_list:    ["9", "XL", "32", "10 inches"]

Can someone please help me with this? A snapshot of the data is attached (image: data_frame).

Accepted answer by Chiheb Nexus

With a DataFrame like this:

import pandas as pd
df = pd.DataFrame([["Women", "Slip on", 7, "Black", "Clarks"], ["Women", "Slip on", 8, "Brown", "Clarcks"], ["Women", "Slip on", 7, "Blue", "Clarks"]], columns= ["Category", "Sub Category", "Size", "Color", "Brand"])

print(df)

Output:

  Category Sub Category  Size  Color    Brand
0    Women      Slip on     7  Black   Clarks
1    Women      Slip on     8  Brown  Clarcks
2    Women      Slip on     7   Blue   Clarks

You can create your new dict by mapping the DataFrame's columns to lists, as in this example:

new_dict = {"color_list": list(df["Color"]), "size_list": list(df["Size"])}
# OR:
#new_dict = {"color_list": [k for k in df["Color"]], "size_list": [k for k in df["Size"]]}

print(new_dict)

Output:

{'color_list': ['Black', 'Brown', 'Blue'], 'size_list': [7, 8, 7]}
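Hard-coding each key works for a couple of columns. If the question's "<column>_list" naming should be generated instead, a sketch like the one below could do it. Lowercasing the column name is an assumption based on the expected color_list/size_list keys, and the wanted list is a hypothetical choice of columns.

```python
import pandas as pd

df = pd.DataFrame(
    [["Women", "Slip on", 7, "Black", "Clarks"],
     ["Women", "Slip on", 8, "Brown", "Clarcks"],
     ["Women", "Slip on", 7, "Blue", "Clarks"]],
    columns=["Category", "Sub Category", "Size", "Color", "Brand"],
)

# Hypothetical selection of columns to export
wanted = ["Color", "Size"]

# Key = lowercased column name + "_list"; .tolist() gives plain Python values
new_dict = {f"{col.lower()}_list": df[col].tolist() for col in wanted}
print(new_dict)  # {'color_list': ['Black', 'Brown', 'Blue'], 'size_list': [7, 8, 7]}
```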

To get only the unique values, you can use set, as in this example:

new_dict = {"color_list": list(set(df["Color"])), "size_list": list(set(df["Size"]))}
print(new_dict)

Output:

{'color_list': ['Brown', 'Blue', 'Black'], 'size_list': [8, 7]}
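Because a set is unordered, the order of the values above is not guaranteed. If the order of first appearance matters, Series.unique() keeps it; a plain-Python alternative is dict.fromkeys, since dict keys are unique and keep insertion order (Python 3.7+). A minimal sketch on a reduced version of the sample data:

```python
import pandas as pd

df = pd.DataFrame({"Color": ["Black", "Brown", "Blue"], "Size": [7, 8, 7]})

# Series.unique() returns values in order of first appearance
new_dict = {
    "color_list": df["Color"].unique().tolist(),
    "size_list": df["Size"].unique().tolist(),
}
print(new_dict)  # {'color_list': ['Black', 'Brown', 'Blue'], 'size_list': [7, 8]}

# Plain-Python alternative: duplicate keys collapse, first occurrence wins
sizes = list(dict.fromkeys(df["Size"].tolist()))
print(sizes)  # [7, 8]
```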

Or, as @Ami Tavory suggests in his answer, to get the unique values of every column in your DataFrame you can simply do this:

new_dict = {k:list(df[k].unique()) for k in df.columns}
print(new_dict)

Output:

{'Brand': ['Clarks', 'Clarcks'],
 'Category': ['Women'],
 'Color': ['Black', 'Brown', 'Blue'],
 'Size': [7, 8],
 'Sub Category': ['Slip on']}
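The question also asks about filtering key/value pairs based on conditions. Once the dictionary exists, a second dict comprehension can do that. The condition used here (keep only columns with more than one distinct value) is just an illustrative assumption; any predicate on the key or the list would work the same way.

```python
import pandas as pd

df = pd.DataFrame(
    [["Women", "Slip on", 7, "Black", "Clarks"],
     ["Women", "Slip on", 8, "Brown", "Clarcks"],
     ["Women", "Slip on", 7, "Blue", "Clarks"]],
    columns=["Category", "Sub Category", "Size", "Color", "Brand"],
)

# Unique values per column, in order of first appearance
uniques = {k: df[k].unique().tolist() for k in df.columns}

# Example condition (an assumption): drop columns that hold a single value
varying = {k: v for k, v in uniques.items() if len(v) > 1}
print(varying)
```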

Answer by Ami Tavory

> I am trying to create a dictionary of key:value pairs where key is the column name of a dataframe and value will be a list containing all the unique values in that column.

You could use a simple dictionary comprehension for that.

Say you start with:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1], 'b': [1, 4, 5]})

Then the following comprehension solves it:

>>> {c: list(df[c].unique()) for c in df.columns}
{'a': [1, 2], 'b': [1, 4, 5]}
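unique() returns a NumPy array, so list(...) produces NumPy scalars rather than built-in ints (with NumPy 2 and later they print as np.int64(1) instead of 1, and json.dumps rejects them). Calling .tolist() on the array converts the values to plain Python types. A small sketch with the same DataFrame:

```python
import json

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1], 'b': [1, 4, 5]})

# .tolist() converts NumPy scalars to built-in Python ints/floats/strs
uniques = {c: df[c].unique().tolist() for c in df.columns}

print(uniques)              # {'a': [1, 2], 'b': [1, 4, 5]}
print(json.dumps(uniques))  # {"a": [1, 2], "b": [1, 4, 5]}
```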

Answer by arnold

If I understand your question correctly, you may need a set instead of a list. In this piece of code, you could apply set to get the unique values of the given column.

result = {}
for col in col_list[1:]:
    list_name = ''.join([str(col), '_list'])
    # apply set() to the column's values; set(list_name) would only
    # give the set of characters in the name string
    result[list_name] = set(footwear_data[col])

Sample of usage

>>> a_list = [7, 8, 7, 9, 10, 9]
>>> set(a_list)
{8, 9, 10, 7}
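As the sample shows, a set does not keep the input order, and its iteration order is an implementation detail. If a reproducible order is wanted, sorted() on the set returns a new ascending list; this assumes the values are mutually comparable (mixing ints and strings in one column would make sorted() raise TypeError).

```python
a_list = [7, 8, 7, 9, 10, 9]

# set() removes duplicates but has no guaranteed order
unique_values = set(a_list)

# sorted() returns a new list in ascending order
ordered = sorted(unique_values)
print(ordered)  # [7, 8, 9, 10]
```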

Answer by Waqar

Here is how I did it; let me know if it helps.

import pandas as pd

df = pd.read_csv("/path/to/csv/file")

dic = {}
for x in list(df):  # list(df) gives the column names
    # set() drops duplicates; list() turns the result back into a flat list
    dic[str(x) + "_list"] = list(set(df[x]))

print(dic)

Output (the order of values inside each list can differ between runs, because sets are unordered):

{'Category_list': ['M', 'W'], 'Sub_list': ['SO', 'FOR'], 'Size_list': ['9', '8', '10 inches', 'XL', '7'], 'Color_list': ['Blue', 'Orange', 'Black', 'Red'], 'Brand_list': ['Clarks']}

My CSV file:

Category,Sub,Size,Color,Brand
W,SO,7,Blue,Clarks
W,SO,7,Blue,Clarks
W,SO,7,Black,Clarks
W,SO,8,Orange,Clarks
W,FOR,8,Red,Clarks
M,FOR,9,Black,Clarks
M,FOR,10 inches,Blue,Clarks
M,FOR,XL,Blue,Clarks
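To try this without a file on disk, the CSV above can be read from a string with io.StringIO. This sketch swaps Waqar's set for Series.unique() so each list keeps the order of first appearance. Note that read_csv stores the Size column as strings here, because the column mixes numbers with values like "10 inches".

```python
import io

import pandas as pd

csv_text = """Category,Sub,Size,Color,Brand
W,SO,7,Blue,Clarks
W,SO,7,Blue,Clarks
W,SO,7,Black,Clarks
W,SO,8,Orange,Clarks
W,FOR,8,Red,Clarks
M,FOR,9,Black,Clarks
M,FOR,10 inches,Blue,Clarks
M,FOR,XL,Blue,Clarks
"""

df = pd.read_csv(io.StringIO(csv_text))

# One "<column>_list" key per column, values in order of first appearance
dic = {f"{col}_list": df[col].unique().tolist() for col in df.columns}
print(dic)
```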