Convert whole dataframe from lower case to upper case with Pandas

Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/39512002/

Tags: python, pandas, type-conversion, uppercase, lowercase

Asked by Federico Gentile

I have a dataframe like the one displayed below:

import pandas as pd

# Create an example dataframe about a fictional army
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, columns=['regiment', 'company', 'deaths', 'battles', 'size'])

My goal is to convert every string in the dataframe to upper case, so that, for example, 'Nighthawks' becomes 'NIGHTHAWKS' and the size 'l' becomes 'L'.

Note: all columns have dtype object, and that must not change; every column in the output must still be of dtype object. I want to avoid converting each column one by one. If possible, I would like to do it over the whole dataframe at once.

What I have tried so far, without success:

df.str.upper()
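
The attempt above fails because the .str accessor is defined on a Series, not on a DataFrame. A minimal sketch of the difference, using a small made-up frame (an assumption for illustration, not the data from the question):

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the frame from the question).
df = pd.DataFrame({"size": ["l", "ll", "m"]})

# A DataFrame has no .str accessor, so this raises AttributeError.
try:
    df.str.upper()
    failed = False
except AttributeError:
    failed = True

# A single column is a Series, which does have .str.
upper_sizes = df["size"].str.upper().tolist()
print(failed, upper_sizes)
```

This is why the answers below either go column by column or apply a function element by element.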

Answered by Nehal J Wani

astype(str) converts every value in each series to a string, the .str accessor on the converted series exposes the string methods, and upper() is then called through it. Note that after this, the dtype of all columns changes to object, and values that were numbers (such as 52) are now strings ('52').

In [17]: df
Out[17]: 
     regiment company deaths battles size
0  Nighthawks     1st    kkk       5    l
1  Nighthawks     1st     52      42   ll
2  Nighthawks     2nd     25       2    l
3  Nighthawks     2nd    616       2    m

In [18]: df.apply(lambda x: x.astype(str).str.upper())
Out[18]: 
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M

You can later convert the 'battles' column to numeric again, using to_numeric():

In [42]: df2 = df.apply(lambda x: x.astype(str).str.upper())

In [43]: df2['battles'] = pd.to_numeric(df2['battles'])

In [44]: df2
Out[44]: 
     regiment company deaths  battles size
0  NIGHTHAWKS     1ST    KKK        5    L
1  NIGHTHAWKS     1ST     52       42   LL
2  NIGHTHAWKS     2ND     25        2    L
3  NIGHTHAWKS     2ND    616        2    M

In [45]: df2.dtypes
Out[45]: 
regiment    object
company     object
deaths      object
battles      int64
size        object
dtype: object
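
One consequence of astype(str) that is easy to miss is that the element types change, not just the case. The following is a small sketch on a single column taken from the question's data; the variable names `before` and `after` are only for illustration:

```python
import pandas as pd

# One mixed column from the question: strings and integers side by side.
df = pd.DataFrame({"deaths": ["kkk", 52, "25", 616]})

# Python type of each element before the conversion.
before = [type(v).__name__ for v in df["deaths"]]

# The approach from this answer: cast to str, then upper-case.
df2 = df.apply(lambda x: x.astype(str).str.upper())

# Python type of each element after the conversion.
after = [type(v).__name__ for v in df2["deaths"]]

print(before)
print(after)
```

If the integers need to stay integers, an element-wise check on the value type avoids the cast altogether.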

Answered by VincentQT

This can be solved with the following applymap operation:

# Note: DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
df = df.applymap(lambda s: s.upper() if isinstance(s, str) else s)
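
As a hedged sketch of the same element-wise idea that runs on both older and newer pandas, the snippet below picks DataFrame.map when it exists and falls back to applymap otherwise. The helper name `upper_if_str` and the two-column frame are assumptions made for this example:

```python
import pandas as pd

# Two columns from the question's data: one mixed, one all strings.
df = pd.DataFrame({"deaths": ["kkk", 52, "25", 616],
                   "size": ["l", "ll", "l", "m"]})

def upper_if_str(value):
    """Upper-case strings and return every other value unchanged."""
    return value.upper() if isinstance(value, str) else value

# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
if hasattr(pd.DataFrame, "map"):
    result = df.map(upper_if_str)
else:
    result = df.applymap(upper_if_str)

print(result)
```

Unlike the astype(str) approach, the integers in the 'deaths' column stay integers here.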

Answered by IanS

Since the .str accessor only works on a Series, you can apply it to each column individually and then concatenate:

In [6]: pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
Out[6]: 
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M


Edit: performance comparison

In [10]: %timeit df.apply(lambda x: x.astype(str).str.upper())
100 loops, best of 3: 3.32 ms per loop

In [11]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
100 loops, best of 3: 3.32 ms per loop

Both answers perform equally on a small dataframe.

In [15]: df = pd.concat(10000 * [df])

In [16]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
10 loops, best of 3: 104 ms per loop

In [17]: %timeit df.apply(lambda x: x.astype(str).str.upper())
10 loops, best of 3: 130 ms per loop

On a large dataframe my answer is slightly faster.
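
The comparison above can be reproduced outside IPython with the standard timeit module. This is only a sketch: the frame, the function names `with_apply` and `with_concat`, and the repetition counts are assumptions, and the timings will vary by machine and pandas version.

```python
import timeit
import pandas as pd

# A small made-up frame, repeated to get a larger one.
small = pd.DataFrame({"regiment": ["Nighthawks", "Nighthawks"],
                      "size": ["l", "m"]})
large = pd.concat(1000 * [small], ignore_index=True)

def with_apply(frame):
    # Column-wise apply, as in the first answer.
    return frame.apply(lambda x: x.astype(str).str.upper())

def with_concat(frame):
    # Per-column conversion followed by concat, as in this answer.
    return pd.concat([frame[col].astype(str).str.upper() for col in frame.columns], axis=1)

# Both approaches should produce the same frame.
same_result = with_apply(large).equals(with_concat(large))

apply_seconds = timeit.timeit(lambda: with_apply(large), number=5)
concat_seconds = timeit.timeit(lambda: with_concat(large), number=5)
print(f"same result: {same_result}")
print(f"apply: {apply_seconds:.4f}s, concat: {concat_seconds:.4f}s")
```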

Answered by Ganesh Bhat

Try this:

df2 = df2.apply(lambda x: x.str.upper() if x.dtype == "object" else x)  
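
One caveat with calling .str.upper() directly on an object column that mixes strings and numbers, as the 'deaths' column in the question does: the non-string entries come back as missing values. A minimal sketch of the effect and of one possible workaround (the `where` fallback is an assumption of this example, not part of the answer):

```python
import pandas as pd

# The mixed column from the question.
df = pd.DataFrame({"deaths": ["kkk", 52, "25", 616]})

# .str.upper() only handles the strings; 52 and 616 become missing values.
naive = df["deaths"].str.upper()
n_missing = int(naive.isna().sum())

# Keep the original value wherever the string method produced a missing value.
fixed = naive.where(naive.notna(), df["deaths"])

print(naive.tolist())
print(fixed.tolist())
```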

Answered by Alex Montoya

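If you want to preserve the dtype, apply the string methods only to columns of dtype object. Note that isinstance(x, object) is true for every Python object, so it cannot tell text columns from numeric ones; compare the column's dtype instead:

df.apply(lambda x: x.str.upper().str.strip() if x.dtype == object else x)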
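
To see why the check has to look at the column's dtype rather than use isinstance(x, object), here is a small sketch. The frame and the helper `is_text_column` are hypothetical names introduced for this example:

```python
import pandas as pd

# Made-up frame with one text column (with stray spaces) and one integer column.
df = pd.DataFrame({"size": ["l ", " m"], "battles": [5, 2]})

# Every Python object is an instance of object, so this is True for all columns.
always_true = all(isinstance(df[col], object) for col in df.columns)

# Using .str on an integer column raises AttributeError.
try:
    df["battles"].str.upper()
    numeric_ok = True
except AttributeError:
    numeric_ok = False

def is_text_column(column):
    """Hypothetical helper: True for object or string dtype columns."""
    return pd.api.types.is_object_dtype(column) or pd.api.types.is_string_dtype(column)

# Only text columns are upper-cased and stripped; the rest pass through unchanged.
result = df.apply(lambda x: x.str.upper().str.strip() if is_text_column(x) else x)
print(result)
```

With this check the integer column keeps its integer dtype.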

Answered by Shaz

Applying a function to every cell of every row is very slow. Instead, get the column names and loop over them, converting the text of each column to upper case in one step.

The code below uses a vectorized string operation per column, which is faster than applying a function cell by cell.

# `dataset` is the dataframe to convert; the .str accessor requires text columns.
for column in dataset.columns:
    dataset[column] = dataset[column].str.upper()