Notice: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/42382263/

Date: 2020-08-19 21:38:06  Source: igfitidea

ValueError: Length of values does not match length of index | Pandas DataFrame.unique()

Tags: python, pandas, dataframe

Asked by Mayeul sgc

I am trying to get a new dataset, or change the values of the current dataset's columns to their unique values. Here is an example of what I am trying to get:

   A B
 -----
0| 1 1
1| 2 5
2| 1 5
3| 7 9
4| 7 9
5| 8 9

Wanted Result    Not Wanted Result
       A B            A B
     -----          -----
    0| 1 1         0| 1 1
    1| 2 5         1| 2 5
    2| 7 9         2| 
    3| 8           3| 7 9
                   4|
                   5| 8

I don't really care about the index, but it seems to be the problem. My code so far is pretty simple. I tried two approaches: one that builds a new DataFrame and one that doesn't.

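import pandas as pd

# With a new DataFrame
def UniqueResults(dataframe):
    df = pd.DataFrame()
    for col in dataframe:
        S = pd.Series(dataframe[col].unique())
        df[col] = S.values  # ValueError once a column has a different number of unique values than the first
    return df

# Without a new DataFrame
def UniqueResults(dataframe):
    for col in dataframe:
        dataframe[col] = dataframe[col].unique()  # ValueError if the column contains duplicates
    return dataframe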

I get the error "Length of values does not match length of index" both times.

Answered by Psidom

The error comes up when you try to assign a list or NumPy array to a data frame column and its length differs from the number of rows in the data frame. It can be reproduced as follows:

A data frame of four rows:

df = pd.DataFrame({'A': [1,2,3,4]})

Now try to assign a list/array of two elements to it:

df['B'] = [3,4]   # or df['B'] = np.array([3,4])

Both raise the same error:

ValueError: Length of values does not match length of index

This is because the data frame has four rows but the list and the array have only two elements.
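The reproduction above can be put into one self-contained snippet. This is only a minimal sketch: the variable names `values` and `raised` are made up for illustration, and the exact wording of the message may differ between pandas versions, so the code relies only on the exception type.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]})   # four rows
values = [3, 4]                          # only two elements

try:
    df['B'] = values                     # length 2 vs. 4 rows
    raised = False
except ValueError as err:
    raised = True
    print(err)                           # wording may vary by pandas version

# A list whose length matches the number of rows is accepted.
df['B'] = [3, 4, 5, 6]
```

The failed assignment leaves the data frame unchanged, so the second assignment starts from the original four-row frame.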

Workaround (use with caution): convert the list/array to a pandas Series first. The assignment then aligns on the index, and any index label missing from the Series is filled with NaN:

df['B'] = pd.Series([3,4])

df
#   A     B
#0  1   3.0
#1  2   4.0
#2  3   NaN          # NaN because the value at index 2 and 3 doesn't exist in the Series
#3  4   NaN
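The reason for the "use with caution" note is that a Series is matched to the data frame by index label, not by position. The following sketch illustrates this with made-up index labels (`[2, 3]` and `[0, 99]` are assumptions chosen for the example):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]})

# The Series carries its own labels, so the values land in rows 2 and 3,
# not in rows 0 and 1.
df['B'] = pd.Series([3, 4], index=[2, 3])

# A label that does not exist in df's index (99) is dropped without an error.
df['C'] = pd.Series([10, 20], index=[0, 99])

print(df)
```

If the values are meant to fill the rows in order, it is safer to make sure the Series has the default 0..n-1 index before assigning it.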


For your specific problem, if you don't care about the index or about which values in different columns belong to the same row, you can drop the duplicates in each column and then reset that column's index:

df.apply(lambda col: col.drop_duplicates().reset_index(drop=True))

#   A     B
#0  1   1.0
#1  2   5.0
#2  7   9.0
#3  8   NaN
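The same idea can also be used to repair the first approach from the question. The sketch below is one possible variant, not the answer's own code: it wraps each column's unique values in a Series and lets the DataFrame constructor pad the shorter columns with NaN. The helper name `unique_results` is hypothetical, and the sample data is the table from the question.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'A': [1, 2, 1, 7, 7, 8],
                   'B': [1, 5, 5, 9, 9, 9]})

def unique_results(dataframe):
    # Hypothetical helper: each column becomes a Series of its unique values
    # with a fresh 0..n-1 index; the constructor aligns the Series on that
    # index and fills the missing positions of shorter columns with NaN.
    return pd.DataFrame({col: pd.Series(dataframe[col].unique())
                         for col in dataframe})

result = unique_results(df)
print(result)
```

As with the `apply` version, the rows of the result no longer correspond to rows of the original data, and a column that receives NaN is converted to float.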