Original URL: http://stackoverflow.com/questions/22245171/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): StackOverflow
How to lowercase a pandas dataframe string column if it has missing values?
Asked by P.Escondido
The following code does not work: np.nan is a float, so the lambda fails with an AttributeError when it reaches the missing value.
import pandas as pd
import numpy as np
df=pd.DataFrame(['ONE','Two', np.nan],columns=['x'])
xLower = df["x"].map(lambda x: x.lower())
How should I tweak it to get xLower = ['one','two',np.nan] ? Efficiency is important since the real data frame is huge.
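To see the failure concretely, here is a small sketch that reproduces it. It assumes an explicit dtype=object (not in the question) so the column behaves the same way across pandas versions.

```python
import numpy as np
import pandas as pd

# dtype=object is an assumption added to pin the behaviour across pandas versions
df = pd.DataFrame(['ONE', 'Two', np.nan], columns=['x'], dtype=object)

# The missing value is stored as a float NaN, not as a string
missing = df['x'].iloc[2]
print(type(missing))

error_message = None
try:
    df['x'].map(lambda x: x.lower())
except AttributeError as err:
    # float has no .lower() method, so the lambda fails on the NaN row
    error_message = str(err)

print(error_message)
```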
Accepted answer by behzad.nouri
Use pandas' vectorized string methods; as the documentation says:
These methods exclude missing/NA values automatically.
.str.lower() is the very first example there:
>>> df['x'].str.lower()
0 one
1 two
2 NaN
Name: x, dtype: object
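A self-contained version of the accepted answer that runs as a script. Writing the result back into the column is an extra step added here for illustration, not part of the answer itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['ONE', 'Two', np.nan], columns=['x'])

# Vectorized accessor: no hand-written lambda, and NaN is skipped automatically
x_lower = df['x'].str.lower()

# Optionally replace the original column with the lowercased values
df['x'] = x_lower
print(df)
```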
Answer by Wojciech Walczak
A possible solution:
import pandas as pd
import numpy as np
df=pd.DataFrame(['ONE','Two', np.nan],columns=['x'])
xLower = df["x"].map(lambda x: x if type(x)!=str else x.lower())
print(xLower)
And the result:
0 one
1 two
2 NaN
Name: x, dtype: object
I'm not sure about its efficiency, though.
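Since efficiency was the asker's main concern, it is worth measuring rather than guessing. This sketch compares the map approach above with the vectorized .str.lower() on synthetic data (300,000 rows, an assumption, not the asker's frame). Timings depend on the machine and pandas version, so no numbers are shown here; the point is to check that both give the same values and to time them on your own data.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption): 300,000 rows of strings with NaN mixed in
s = pd.Series(['ONE', 'Two', np.nan] * 100_000, dtype=object)

def lower_with_map(series):
    # Element-wise Python lambda, as in the answer above
    return series.map(lambda x: x if type(x) != str else x.lower())

def lower_with_str(series):
    # Vectorized string accessor; NaN is skipped automatically
    return series.str.lower()

map_result = lower_with_map(s)
str_result = lower_with_str(s)

# Both approaches should agree, including where the NaNs are
print(map_result.isna().equals(str_result.isna()))

for name, fn in [('map', lower_with_map), ('str.lower', lower_with_str)]:
    seconds = timeit.timeit(lambda: fn(s), number=5)
    print(f'{name}: {seconds / 5:.4f} s per call')
```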
Answer by Mike W
Another possible solution, in case the column holds numbers as well as strings, is to convert everything to text with astype(str) before calling .str.lower(). Otherwise, since a number is not a string, .str.lower() returns NaN for it. Note that to_string(na_rep='') renders the whole Series as one plain string rather than returning a Series, so to blank out the missing values and still get a Series back, fill them before converting:
import pandas as pd
import numpy as np
df = pd.DataFrame(['ONE', 'Two', np.nan, 2], columns=['x'])
# fillna('') makes the missing value show up as an empty string after conversion
xSecureLower = df['x'].fillna('').astype(str).str.lower()
xLower = df['x'].str.lower()
Then we have:
>>> xSecureLower
0 one
1 two
2
3 2
Name: x, dtype: object
and not:
>>> xLower
0 one
1 two
2 NaN
3 NaN
Name: x, dtype: object
Edit:
If you don't want to lose the NaNs, using map is better (from @wojciech-walczak's answer and @cs95's comment). It will look something like this:
xSecureLower = df['x'].map(lambda x: x.lower() if isinstance(x,str) else x)
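To compare the three behaviours side by side on the mixed column from this answer, here is a short sketch (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Mixed column from the answer: strings, a missing value and an integer
s = pd.Series(['ONE', 'Two', np.nan, 2])

# .str.lower(): non-strings (the NaN and the 2) both come back as NaN
vectorized = s.str.lower()

# fillna + astype(str): everything becomes text, the missing value becomes ''
blanked = s.fillna('').astype(str).str.lower()

# map with an isinstance guard: strings lowercased, NaN and 2 kept as they were
mapped = s.map(lambda x: x.lower() if isinstance(x, str) else x)

for label, result in [('str.lower', vectorized), ('fillna+astype', blanked), ('map', mapped)]:
    print(label, result.tolist())
```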
Answer by Ch HaXam
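Select your DataFrame column and simply apply str.lower(); it returns a new Series and leaves NaN in place:
df = data['x']  # data is your DataFrame; df here is the selected column (a Series)
newdf = df.str.lower()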
Answer by Farid
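You can also try this, which lowercases every string cell in the whole DataFrame and leaves other values alone (on pandas >= 2.1, DataFrame.applymap is deprecated in favour of DataFrame.map, which takes the same function):
df = df.applymap(lambda s: s.lower() if type(s) == str else s)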
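A minimal sketch of the same cell-wise idea on a small hypothetical frame, choosing DataFrame.map where it exists (pandas >= 2.1) and falling back to applymap on older versions. The helper name lower_cell and the column names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two text columns with gaps and one numeric column
df = pd.DataFrame({
    'a': ['ONE', 'Two', np.nan],
    'b': ['X', np.nan, 'yZ'],
    'n': [1, 2, 3],
})

def lower_cell(value):
    # Lowercase strings; leave NaN, numbers and anything else untouched
    return value.lower() if isinstance(value, str) else value

# DataFrame.map (pandas >= 2.1) replaces the deprecated DataFrame.applymap
if hasattr(pd.DataFrame, 'map'):
    lowered = df.map(lower_cell)
else:
    lowered = df.applymap(lower_cell)

print(lowered)
```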
Answer by deepesh
Maybe use a list comprehension:
import pandas as pd
import numpy as np
df = pd.DataFrame(['ONE', 'Two', np.nan], columns=['Name'])
# Lowercase strings only, so NaN stays NaN instead of becoming the text 'nan'
df['Name'] = [i.lower() if isinstance(i, str) else i for i in df['Name']]
print(df)
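A point worth checking with list comprehensions here: wrapping each item in str() before lowering, as in [str(i).lower() for i in ...], turns NaN into the literal text 'nan', which is no longer detected as missing. This sketch shows the difference between that and an isinstance guard.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['ONE', 'Two', np.nan], columns=['Name'])

# str() first: the missing value becomes the literal text 'nan'
naive = [str(i).lower() for i in df['Name']]

# isinstance guard: strings are lowercased, NaN is passed through unchanged
guarded = [i.lower() if isinstance(i, str) else i for i in df['Name']]

print(naive)
print(guarded)
```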
Answer by cs95
Pandas >= 0.25: Remove Case Distinctions with str.casefold
Starting from v0.25, I recommend using the "vectorized" string method str.casefold if you're dealing with Unicode data (it works for plain ASCII and Unicode strings alike):
s = pd.Series(['lower', 'CAPITALS', np.nan, 'SwApCaSe'])
s.str.casefold()
0 lower
1 capitals
2 NaN
3 swapcase
dtype: object
Also see related GitHub issue GH25405.
casefold lends itself to more aggressive case-folding comparison. It also handles NaNs gracefully (just as str.lower does).
But why is this better?
The difference shows up with Unicode. Taking the example from the Python str.casefold docs:
Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. For example, the German lowercase letter 'ß' is equivalent to "ss". Since it is already lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".
Compare the output of lower,
s = pd.Series(["der Fluß"])
s.str.lower()
0 der fluß
dtype: object
Versus casefold:
s.str.casefold()
0 der fluss
dtype: object
Also see Python: lower() vs. casefold() in string matching and converting to lowercase.
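A runnable sketch tying the two comparisons together, assuming pandas >= 0.25 (for str.casefold). The second string, 'DER FLUSS', is made up here to show why casefold matters for case-insensitive matching.

```python
import numpy as np
import pandas as pd

s = pd.Series(['der Fluß', 'DER FLUSS', np.nan])

lowered = s.str.lower()     # 'ß' is already lowercase, so it survives unchanged
folded = s.str.casefold()   # 'ß' folds to 'ss'; NaN is passed through

# Case-insensitive comparison: only casefold makes the two spellings match
matches_lower = lowered.iloc[0] == lowered.iloc[1]
matches_fold = folded.iloc[0] == folded.iloc[1]

print(matches_lower, matches_fold)
```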
Answer by Ashutosh Shankar
Use the apply function:
xLower = df['x'].apply(lambda x: x.lower() if isinstance(x, str) else x)
Answer by Aravinda_gn
# Apply a lambda function; the isinstance check leaves NaN untouched
df['original_category'] = df['original_category'].apply(lambda x: x.lower() if isinstance(x, str) else x)

