Original question: http://stackoverflow.com/questions/16153530/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Conditional replacement in pandas
Asked by hmelberg
I have a dataframe spanning several years and at some point they changed the codes for ethnicity. So I need to recode the values conditional on the year - which is another column in the same dataframe. For instance 1 to 3, 2 to 3, 3 to 4 and so on:
old = [1, 2, 3, 4, 5, 91]
new = [3, 3, 4, 2, 1, 6]
And this is only done for the years 1996 to 2001. The values for the other years in the same column (ethnicity) must not be changed. Hoping to avoid too many inefficient loops, I tried:
recode_years = range(1996,2002)
for year in recode_years:
    df['ethnicity'][df.year==year].replace(old, new, inplace=True)
But the original values in the dataframe did not change. The replace method itself replaced and returned the new values correctly, but the inplace option does not seem to affect the original dataframe when applied to a conditional selection. This may be obvious to experienced pandas users, but surely there must be some simple way of doing this instead of looping over every single element?
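A minimal sketch of why the original frame stays untouched, assuming a small invented frame (only the column names come from the question): selecting rows with a boolean mask builds a new Series, so an in-place `replace` on that selection edits the new object rather than `df`.

```python
import pandas as pd

# Invented sample data; only the column names come from the question.
df = pd.DataFrame({"year": [1995, 1996, 2001], "ethnicity": [1, 1, 2]})

old = [1, 2, 3, 4, 5, 91]
new = [3, 3, 4, 2, 1, 6]

# Boolean selection returns a new Series holding copies of the matching rows.
selected = df["ethnicity"][df["year"] == 1996]

# The in-place replace changes that new Series only.
selected.replace(old, new, inplace=True)

print(selected.tolist())         # the selection holds the recoded value
print(df["ethnicity"].tolist())  # the frame still holds the original codes
```

In the loop above the selection is never bound to a name, so the recoded copy is discarded as soon as the statement finishes.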
Edit (x2): Here is an example of another approach which also did not work ("Length of replacements must equal series length" and "TypeError: array cannot be safely cast to required type"):
from pandas import DataFrame
oldNewMap = {1:2, 2:3}
df2 = DataFrame({"year":[2000,2000,2000,2001,2001,2001],"ethnicity":[1,2,1,2,3,1]})
df2['ethnicity'][df2.year==2000] = df2['ethnicity'][df2.year==2000].map(oldNewMap)
Edit: It seems to be a problem specific to the installation/version, since this works fine on my other computer.
Answered by BrenBarn
It may just be simpler to do it a different way:
oldNewMap = {1: 3, 2: 3, 3: 4, 4: 2, 5: 1, 91: 6}
df['ethnicity'][df.year==year] = df['ethnicity'][df.year==year].map(oldNewMap)
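On more recent pandas versions, chained assignment of the form `df['ethnicity'][mask] = ...` is discouraged and may not write back to the original frame. A minimal sketch of the same recode with a single `.loc` assignment and no loop follows. The sample data is invented for illustration, and the `old_new_map.get(code, code)` fallback is an added assumption so that codes absent from the map are left unchanged rather than turned into NaN.

```python
import pandas as pd

# Invented sample data; only the column names come from the question.
df = pd.DataFrame({
    "year": [1995, 1996, 1999, 2001, 2002, 2000],
    "ethnicity": [1, 1, 3, 91, 2, 7],
})

old_new_map = {1: 3, 2: 3, 3: 4, 4: 2, 5: 1, 91: 6}

# Rows from 1996 to 2001; between() includes both endpoints by default.
mask = df["year"].between(1996, 2001)

# Recode only the selected rows; values missing from the map stay as they are.
df.loc[mask, "ethnicity"] = df.loc[mask, "ethnicity"].map(
    lambda code: old_new_map.get(code, code)
)

print(df)
```

Because the row mask and the column label go into one `.loc` call, the assignment targets `df` directly instead of an intermediate selection.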

