Python ValueError: Length of values does not match length of index | Pandas DataFrame.unique()

Question

Asked by Mayeul sgc

I am trying to get a new dataset, or change the values of the current dataset's columns to their unique values. Here is an example of what I am trying to get:

   A B
 -----
0| 1 1
1| 2 5
2| 1 5
3| 7 9
4| 7 9
5| 8 9

Wanted Result    Not Wanted Result
       A B            A B
     -----          -----
    0| 1 1         0| 1 1
    1| 2 5         1| 2 5
    2| 7 9         2| 
    3| 8           3| 7 9
                   4|
                   5| 8

I don't really care about the index, but it seems to be the problem. My code so far is pretty simple. I tried two approaches: one with a new DataFrame and one without.

import pandas as pd

# With a new DataFrame
def UniqueResults(dataframe):
    df = pd.DataFrame()
    for col in dataframe:
        S = pd.Series(dataframe[col].unique())
        df[col] = S.values
    return df

# Without a new DataFrame
def UniqueResults(dataframe):
    for col in dataframe:
        dataframe[col] = dataframe[col].unique()
    return dataframe

I get the error "Length of values does not match length of index" both times.

Answer 1

Answered by Psidom

The error comes up when you try to assign a list or numpy array to a data frame column and its length differs from the number of rows in the data frame. It can be reproduced as follows:

A data frame with four rows:

df = pd.DataFrame({'A': [1,2,3,4]})

Now try to assign a list/array of two elements to it:

df['B'] = [3,4]   # or df['B'] = np.array([3,4])

Both raise:

ValueError: Length of values does not match length of index

This is because the data frame has four rows, but the list or array has only two elements.
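The reproduction above can be wrapped in a `try`/`except` so the message can be inspected without stopping the script. This is only a minimal sketch: the variable name `error_message` is made up for illustration, and the exact wording of the message may vary between pandas versions (some releases also include the two lengths).

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]})   # four rows

error_message = None
try:
    df['B'] = [3, 4]                      # only two values for four rows
except ValueError as exc:
    error_message = str(exc)              # keep the message for inspection

print(error_message)
```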

Workaround (use with caution): convert the list/array to a pandas Series. On assignment, index labels that are missing from the Series are filled with NaN:

df['B'] = pd.Series([3,4])

df
#   A     B
#0  1   3.0
#1  2   4.0
#2  3   NaN          # NaN because the value at index 2 and 3 doesn't exist in the Series
#3  4   NaN
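One likely reason for the "use with caution" note is that assigning a Series aligns on index labels, not on position. The sketch below, using made-up frames named `df` and `short`, shows two cases where that may be surprising: a frame whose labels do not match the Series at all, and a Series that is longer than the frame.

```python
import pandas as pd

# Frame whose index labels are 10..13 rather than the default 0..3
df = pd.DataFrame({'A': [1, 2, 3, 4]}, index=[10, 11, 12, 13])

# The Series gets the default labels 0 and 1, none of which exist in df,
# so every row of the new column ends up as NaN
df['B'] = pd.Series([3, 4])
print(df['B'].isna().all())

# A Series that is too long raises no error: label 2 has no matching row
# in the frame and its value is dropped silently
short = pd.DataFrame({'A': [1, 2]})
short['B'] = pd.Series([7, 8, 9])
print(short['B'].tolist())
```

If positional assignment is what you want, the lengths have to match, which is exactly what the ValueError enforces for lists and arrays.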

For your specific problem, if you don't care about the index or the correspondence of values between columns, you can reset the index of each column after dropping the duplicates (here df is the data frame from the question):

df.apply(lambda col: col.drop_duplicates().reset_index(drop=True))

#   A     B
#0  1   1.0
#1  2   5.0
#2  7   9.0
#3  8   NaN
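Put together as a function and applied to the data from the question, this could look like the sketch below. The function names `unique_results` and `unique_results_from_dict` are chosen here for illustration. The second function is an alternative that stays closer to the loop in the question: it builds the frame from a dict of Series, so pandas pads the shorter columns with NaN instead of raising.

```python
import pandas as pd

def unique_results(dataframe):
    """Return each column's unique values, packed to the top and padded with NaN."""
    return dataframe.apply(lambda col: col.drop_duplicates().reset_index(drop=True))

def unique_results_from_dict(dataframe):
    """Same idea, built from one Series of unique values per column."""
    return pd.DataFrame({col: pd.Series(dataframe[col].unique()) for col in dataframe})

# The data frame from the question
df = pd.DataFrame({'A': [1, 2, 1, 7, 7, 8],
                   'B': [1, 5, 5, 9, 9, 9]})

result = unique_results(df)
alternative = unique_results_from_dict(df)
print(result)
print(alternative)
```

Neither function modifies the input frame. Note that values in the same row of the result no longer belong to the same original row.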
