pandas: compare column elements with a value and create a selection

Question

Asked by mati

In a dataframe I would like to compare the elements of a column with a value and sort the elements which pass the comparison into a new column.

df = pandas.DataFrame([{'A':3,'B':10},
                       {'A':2, 'B':30},
                       {'A':1,'B':20},
                       {'A':2,'B':15},
                       {'A':2,'B':100}])

df['C'] = [x for x in df['B'] if x > 18]

I can't figure out what's wrong or why I get:

ValueError: Length of values does not match length of index

Answer 1

Accepted answer by Saranya Krishnamurthy

As Darren mentioned, all columns in a DataFrame should have the same length.

When you try print([x for x in df['B'] if x > 18]), you get only three values: [30, 20, 100]. But the DataFrame has five rows in its index. That is why you get the "Length of values does not match length of index" error.
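
To make the mismatch concrete, here is a minimal sketch that compares the two lengths before attempting the assignment. The variable names `filtered` and `raised` are just for illustration and are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 2, 1, 2, 2], 'B': [10, 30, 20, 15, 100]})

# The comprehension drops every value that fails the test,
# so the result is shorter than the DataFrame.
filtered = [x for x in df['B'] if x > 18]
print(len(filtered), len(df))

raised = False
try:
    # Assigning a list whose length differs from len(df) fails.
    df['C'] = filtered
except ValueError as err:
    raised = True
    print(err)
```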

You can change your code as follows:

df['C'] = [x if x > 18 else None for x in df['B']]
print(df)

You will get:

   A    B      C
0  3   10    NaN
1  2   30   30.0
2  1   20   20.0
3  2   15    NaN
4  2  100  100.0

Answer 2

Answered by jezrael

I think you can use loc with boolean indexing:

print (df)
   A    B
0  3   10
1  2   30
2  1   20
3  2   15
4  2  100

print (df['B'] > 18)
0    False
1     True
2     True
3    False
4     True
Name: B, dtype: bool

df.loc[df['B'] > 18, 'C'] = df['B']
print (df)
   A    B      C
0  3   10    NaN
1  2   30   30.0
2  1   20   20.0
3  2   15    NaN
4  2  100  100.0

If you need to select rows by a condition, use boolean indexing:

print (df[df['B'] > 18])
   A    B
1  2   30
2  1   20
4  2  100

If you need something faster, use where:

df['C'] = df.B.where(df['B'] > 18)
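
As a small sketch of how where behaves: it keeps values where the condition is True and fills the rest with NaN by default, or with whatever you pass as `other`. The column name `D` and the fill value 0 are only illustrative choices here.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 2, 1, 2, 2], 'B': [10, 30, 20, 15, 100]})
mask = df['B'] > 18

# Default: rows failing the condition become NaN (column turns float).
df['C'] = df['B'].where(mask)

# With `other`, failing rows get that value instead of NaN.
df['D'] = df['B'].where(mask, other=0)

print(df)
```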

Timings (len(df) = 50k):

In [1367]: %timeit (a(df))
The slowest run took 8.34 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.14 ms per loop

In [1368]: %timeit (b(df1))
100 loops, best of 3: 15.5 ms per loop

In [1369]: %timeit (c(df2))
100 loops, best of 3: 2.93 ms per loop

Code for timings:

import pandas as pd

df = pd.DataFrame([{'A': 3, 'B': 10},
                   {'A': 2, 'B': 30},
                   {'A': 1, 'B': 20},
                   {'A': 2, 'B': 15},
                   {'A': 2, 'B': 100}])
print(df)
# Repeat the five rows 10,000 times to get 50k rows
df = pd.concat([df]*10000).reset_index(drop=True)
df1 = df.copy()
df2 = df.copy()

def a(df):
    # Series.where
    df['C'] = df.B.where(df['B'] > 18)
    return df

def b(df1):
    # list comprehension
    df1['C'] = [x if x > 18 else None for x in df1['B']]
    return df1

def c(df2):
    # loc with a boolean mask
    df2.loc[df2['B'] > 18, 'C'] = df2['B']
    return df2

print(a(df))
print(b(df1))
print(c(df2))
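
If you are not in IPython and cannot use %timeit, something like the following sketch with the standard timeit module should give a comparable picture. The function names (`use_where`, `use_list`, `use_loc`) and the repeat count of 20 are my own assumptions, and the absolute numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

base = pd.DataFrame({'A': [3, 2, 1, 2, 2], 'B': [10, 30, 20, 15, 100]})
big = pd.concat([base] * 10000).reset_index(drop=True)  # 50k rows

def use_where(frame):
    frame['C'] = frame['B'].where(frame['B'] > 18)
    return frame

def use_list(frame):
    frame['C'] = [x if x > 18 else None for x in frame['B']]
    return frame

def use_loc(frame):
    frame.loc[frame['B'] > 18, 'C'] = frame['B']
    return frame

runs = 20
for func in (use_where, use_list, use_loc):
    frame = big.copy()  # each approach works on its own copy
    seconds = timeit.timeit(lambda: func(frame), number=runs)
    print(func.__name__, seconds / runs, 'seconds per call')
```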

Answer 3

Answered by Darren Cook

All columns in a DataFrame have to be the same length. Because you are filtering away some values, you are trying to insert fewer values into column C than are in columns A and B.

So, your two options are to start a new DataFrame for C:

dfC = [x for x in df['B'] if x > 18]
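
Note that the line above gives a plain Python list. If you would rather keep the original row labels, a sketch along these lines selects the values as a Series and then wraps them in a one-column DataFrame. The name `selected` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 2, 1, 2, 2], 'B': [10, 30, 20, 15, 100]})

# Boolean mask on rows, column 'B' only: the original index is preserved.
selected = df.loc[df['B'] > 18, 'B']

# Wrap the Series in a separate one-column DataFrame named 'C'.
dfC = selected.to_frame(name='C')
print(dfC)
```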

or put some dummy value in the column for when x is not greater than 18. E.g.:

df['C'] = np.where(df['B'] > 18, True, False)

Or even:

df['C'] = np.where(df['B'] > 18, 'Yay', 'Nay')
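
For completeness, here is a self-contained sketch of the np.where approach including the imports. The `flag` column name is hypothetical, and using np.nan as the fallback is just one possible dummy value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [3, 2, 1, 2, 2], 'B': [10, 30, 20, 15, 100]})

# np.where(condition, value_if_true, value_if_false) returns an array
# with one entry per row, so its length always matches the index.
df['flag'] = np.where(df['B'] > 18, 'Yay', 'Nay')

# The same call can carry the original value through, with NaN as the dummy.
df['C'] = np.where(df['B'] > 18, df['B'], np.nan)

print(df)
```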

P.S. Also take a look at "Pandas conditional creation of a series/dataframe column" for other ways to do this.
