Python: How to lowercase a pandas DataFrame string column if it has missing values?

Question

Asked by P.Escondido

The following code does not work.

import pandas as pd
import numpy as np
df=pd.DataFrame(['ONE','Two', np.nan],columns=['x']) 
xLower = df["x"].map(lambda x: x.lower())

How should I tweak it to get xLower = ['one','two',np.nan]? Efficiency is important since the real data frame is huge.

Answer 1

Accepted answer by behzad.nouri

Use pandas vectorized string methods. As the documentation says:

these methods exclude missing/NA values automatically

.str.lower() is the very first example there:

>>> df['x'].str.lower()
0    one
1    two
2    NaN
Name: x, dtype: object
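
To make the contrast concrete, here is a minimal sketch using the question's data frame. It shows the original `map` call failing on the missing value while the `.str` accessor passes it through. The names `failed` and `lowered` are illustrative only and do not come from the original posts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['ONE', 'Two', np.nan], columns=['x'])

# np.nan is a float, so calling .lower() on it raises AttributeError.
try:
    df['x'].map(lambda x: x.lower())
    failed = False
except AttributeError:
    failed = True

# The .str accessor skips missing values and leaves them as NaN.
lowered = df['x'].str.lower()

print(failed)
print(lowered.tolist())
```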

Answer 2

Answer by Wojciech Walczak

A possible solution:

import pandas as pd
import numpy as np

df=pd.DataFrame(['ONE','Two', np.nan],columns=['x']) 
xLower = df["x"].map(lambda x: x if type(x)!=str else x.lower())
print (xLower)

And the result:

0    one
1    two
2    NaN
Name: x, dtype: object

Not sure about the efficiency though.
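
One way to check the efficiency on your own machine is a rough timing sketch like the one below. The test data (300,000 rows, one third missing) and the helper names `with_map` and `with_str` are assumptions made for illustration. Timings vary with the pandas version and the data, so no particular winner is assumed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 300,000 rows, one third of them missing.
s = pd.Series(['ONE', 'Two', np.nan] * 100_000)

def with_map():
    # Element-wise Python function with an explicit type check.
    return s.map(lambda x: x.lower() if isinstance(x, str) else x)

def with_str():
    # Vectorized string accessor; missing values are skipped.
    return s.str.lower()

# Sanity check: both approaches agree once NaN is replaced by a placeholder.
same = with_map().fillna('').tolist() == with_str().fillna('').tolist()

t_map = timeit.timeit(with_map, number=3)
t_str = timeit.timeit(with_str, number=3)
print(f"same result: {same}")
print(f"map: {t_map:.3f}s  .str.lower(): {t_str:.3f}s")
```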

Answer 3

Answer by Mike W

Another possible solution, in case the column contains numbers as well as strings, is to use astype(str).str.lower() or to_string(na_rep=''). Otherwise, because a number is not a string, .str.lower() returns NaN for it. Note that to_string returns a single formatted string rather than a Series. For example:

import pandas as pd
import numpy as np
df=pd.DataFrame(['ONE','Two', np.nan,2],columns=['x']) 
xSecureLower = df['x'].to_string(na_rep='').lower()
xLower = df['x'].str.lower()

Then we have:

>>> xSecureLower
0    one
1    two
2   
3      2
Name: x, dtype: object

and not:

>>> xLower
0    one
1    two
2    NaN
3    NaN
Name: x, dtype: object

Edit:

If you don't want to lose the NaNs, then using map is better (following @wojciech-walczak's answer and @cs95's comment). It will look something like this:

xSecureLower = df['x'].map(lambda x: x.lower() if isinstance(x,str) else x)
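
The differences between these approaches on a mixed column can be sketched as follows. The variable names `plain`, `guarded`, and `rendered` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

s = pd.Series(['ONE', 'Two', np.nan, 2], name='x')

# .str.lower() returns NaN for anything that is not a string,
# so the number 2 is lost.
plain = s.str.lower()

# map with an isinstance check lowercases strings only and leaves
# NaN and numbers untouched.
guarded = s.map(lambda v: v.lower() if isinstance(v, str) else v)

# to_string renders the whole Series as one text block (a str),
# which is useful for display but not for further Series operations.
rendered = s.to_string(na_rep='').lower()

print(plain.tolist())
print(guarded.tolist())
print(type(rendered).__name__)
```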

Answer 4

Answer by Ch HaXam

Copy your DataFrame column and simply apply .str.lower():

df=data['x']
newdf=df.str.lower()

Answer 5

Answer by Farid

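You can also try this, which applies the function to every cell of the DataFrame (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):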

df= df.applymap(lambda s:s.lower() if type(s) == str else s)
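
A version-tolerant sketch of the same idea is shown below. The two-column data frame and the names `lower_if_str` and `elementwise` are hypothetical. The `hasattr` check is just one possible way to pick between `DataFrame.map` and the older `applymap`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one text column and one numeric column.
df = pd.DataFrame({'name': ['ONE', 'Two', np.nan], 'n': [1, 2, 3]})

def lower_if_str(value):
    # Only real strings are lowercased; NaN and numbers pass through.
    return value.lower() if isinstance(value, str) else value

# DataFrame.map exists from pandas 2.1; fall back to applymap before that.
elementwise = df.map if hasattr(df, 'map') else df.applymap
result = elementwise(lower_if_str)

print(result)
```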

Answer 6

Answer by deepesh

Maybe use a list comprehension:

import pandas as pd
import numpy as np
df=pd.DataFrame(['ONE','Two', np.nan],columns=['Name'])
# Note: str(i) turns np.nan into the string 'nan'
df['Name'] = [str(i).lower() for i in df['Name']]

print(df)

Answer 7

Answer by cs95

Pandas >= 0.25: Remove Case Distinctions with `str.casefold`

Starting from v0.25, I recommend using the "vectorized" string method str.casefold if you're dealing with Unicode data (it works for plain strings and Unicode alike):

s = pd.Series(['lower', 'CAPITALS', np.nan, 'SwApCaSe'])
s.str.casefold()

0       lower
1    capitals
2         NaN
3    swapcase
dtype: object

Also see related GitHub issue GH25405.

casefold lends itself to more aggressive case-folding comparison. It also handles NaNs gracefully (just as str.lower does).

But why is this better?

The difference is seen with Unicode. Taking the example in the Python str.casefold docs:

Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. For example, the German lowercase letter 'ß' is equivalent to "ss". Since it is already lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".

Compare the output of lower for this Series:

s = pd.Series(["der Fluß"])
s.str.lower()

0    der fluß
dtype: object

Versus casefold:

s.str.casefold()

0    der fluss
dtype: object

Also see Python: lower() vs. casefold() in string matching and converting to lowercase.
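
Where this matters in practice is case-insensitive matching. The following is a minimal sketch, assuming two hypothetical Series `left` and `right` that should be compared without regard to case:

```python
import pandas as pd

# Hypothetical data: the same words written with different casing.
left = pd.Series(['der Fluß', 'CAPITALS'])
right = pd.Series(['DER FLUSS', 'capitals'])

# casefold maps 'ß' to 'ss', so both spellings compare equal.
folded_match = left.str.casefold() == right.str.casefold()

# lower keeps 'ß' as it is, so the first pair does not match.
lowered_match = left.str.lower() == right.str.lower()

print(folded_match.tolist())
print(lowered_match.tolist())
```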

Answer 8

Answer by Ashutosh Shankar

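Use the apply function, lowercasing only actual strings so that NaN does not raise an error:

Xlower = df['x'].apply(lambda x: x.lower() if isinstance(x, str) else x).head(10)  # preview the first 10 rows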

Answer 9

Answer by Aravinda_gn

# Apply a lambda function, skipping non-string values such as NaN

df['original_category'] = df['original_category'].apply(lambda x: x.lower() if isinstance(x, str) else x)
