Python: Convert an entire dataframe from lower case to upper case with Pandas

Question

Asked by Federico Gentile

I have a dataframe like the one displayed below:

import pandas as pd

# Create an example dataframe about a fictional army
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, columns=['regiment', 'company', 'deaths', 'battles', 'size'])

My goal is to transform every single string inside of the dataframe to upper case so that it looks like this:

Note: all dtypes are object and must not be changed; the output must contain only object columns. I want to avoid converting every column one by one and would like to do it across the whole dataframe, if possible.

What I have tried so far, without success:

df.str.upper()

Answer 1

Answered by Nehal J Wani

astype(str) casts each Series to strings (stored with dtype object), and the .str accessor then exposes string methods on the converted Series, so upper() can be called on every value. Note that after this, the dtype of all columns is object.

In [17]: df
Out[17]: 
     regiment company deaths battles size
0  Nighthawks     1st    kkk       5    l
1  Nighthawks     1st     52      42   ll
2  Nighthawks     2nd     25       2    l
3  Nighthawks     2nd    616       2    m

In [18]: df.apply(lambda x: x.astype(str).str.upper())
Out[18]: 
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M

You can later convert the 'battles' column to numeric again, using to_numeric():

In [42]: df2 = df.apply(lambda x: x.astype(str).str.upper())

In [43]: df2['battles'] = pd.to_numeric(df2['battles'])

In [44]: df2
Out[44]: 
     regiment company deaths  battles size
0  NIGHTHAWKS     1ST    KKK        5    L
1  NIGHTHAWKS     1ST     52       42   LL
2  NIGHTHAWKS     2ND     25        2    L
3  NIGHTHAWKS     2ND    616        2    M

In [45]: df2.dtypes
Out[45]: 
regiment    object
company     object
deaths      object
battles      int64
size        object
dtype: object
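
One point worth checking against the question's requirement: the column dtype stays object, but astype(str) turns the integer entries themselves into strings. The following is a minimal sketch on a cut-down version of the example dataframe (the variable names types_before and types_after are only illustrative):

```python
import pandas as pd

# Cut-down version of the example dataframe with a mixed column
df = pd.DataFrame({"deaths": ["kkk", 52, "25", 616],
                   "size": ["l", "ll", "l", "m"]})

# Same approach as in the answer: cast to str, then upper-case
upper = df.apply(lambda col: col.astype(str).str.upper())

# Compare the Python type of each element before and after
types_before = [type(v).__name__ for v in df["deaths"]]
types_after = [type(v).__name__ for v in upper["deaths"]]
print(types_before)  # ['str', 'int', 'str', 'int']
print(types_after)   # ['str', 'str', 'str', 'str']
```

If the integers must remain integers, an element-wise check on each value is the safer route.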

Answer 2

Answered by VincentQT

This can be solved with the following applymap operation, which upper-cases string cells and leaves every other value untouched:

df = df.applymap(lambda s: s.upper() if isinstance(s, str) else s)
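
A self-contained sketch of the element-wise approach, upper-casing string cells as the question asks. DataFrame.applymap has been deprecated since pandas 2.1 in favour of DataFrame.map, so the sketch picks whichever method the installed version provides. The helper name upper_if_str is hypothetical and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({"regiment": ["Nighthawks", "Nighthawks"],
                   "deaths": ["kkk", 52],
                   "battles": [5, "42"]})

def upper_if_str(value):
    """Upper-case strings; return any other value unchanged."""
    return value.upper() if isinstance(value, str) else value

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap
elementwise = df.map if hasattr(df, "map") else df.applymap
result = elementwise(upper_if_str)
print(result)
```

Because each cell is checked individually, integers such as 52 keep their original type.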

Answer 3

Answered by IanS

Since the str accessor only works on a Series, you can apply it to each column individually and then concatenate the results:

In [6]: pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
Out[6]: 
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M

Edit: performance comparison

In [10]: %timeit df.apply(lambda x: x.astype(str).str.upper())
100 loops, best of 3: 3.32 ms per loop

In [11]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
100 loops, best of 3: 3.32 ms per loop

Both answers perform equally on a small dataframe.

In [15]: df = pd.concat(10000 * [df])

In [16]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
10 loops, best of 3: 104 ms per loop

In [17]: %timeit df.apply(lambda x: x.astype(str).str.upper())
10 loops, best of 3: 130 ms per loop

On a large dataframe my answer is slightly faster.

Answer 4

Answered by Ganesh Bhat

Try this:

df2 = df2.apply(lambda x: x.str.upper() if x.dtype == "object" else x)
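
One caveat when a column mixes strings and numbers, as 'deaths' and 'battles' do in the question: on an object column, the .str accessor returns NaN for entries that are not strings. The sketch below illustrates this on a single Series and shows an element-wise alternative that keeps the numbers. The variable names are only illustrative.

```python
import pandas as pd

# A mixed object column like 'deaths' in the question
mixed = pd.Series(["kkk", 52, "25", 616], dtype=object)

# The .str accessor upper-cases strings but yields NaN for the integers
via_accessor = mixed.str.upper()
print(via_accessor.tolist())

# Checking each value individually leaves the integers as they are
safe = mixed.map(lambda v: v.upper() if isinstance(v, str) else v)
print(safe.tolist())
```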

Answer 5

Answered by Alex Montoya

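If you want to preserve the dtypes, test each column before converting it. Note that isinstance(x, object) is true for every Python object, so it cannot tell text columns apart from numeric ones; compare the column's dtype instead:

df.apply(lambda x: x.str.upper().str.strip() if x.dtype == object else x)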

Answer 6

Answered by Shaz

Applying a function to every cell is very slow. Instead, get the column names and loop over them, converting the text of each column to upper case in one step.

The code below uses a vectorized operation on each column, which is faster than applying a function cell by cell.

for column in dataset.columns:
    dataset[column] = dataset[column].str.upper()
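
The loop as written assumes every column holds text; on a numeric column the .str accessor raises an AttributeError. Below is a minimal sketch that skips non-text columns, shown with upper case to match the question. The sample data is made up for illustration, and the dtype checks come from pandas.api.types.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype

# Illustrative data: one text column, one integer column
dataset = pd.DataFrame({"size": ["l", "ll", "m"],
                        "battles": [5, 42, 2]})

for column in dataset.columns:
    col = dataset[column]
    # Only touch columns stored as object or as a string dtype
    if is_object_dtype(col) or is_string_dtype(col):
        dataset[column] = col.str.upper()

print(dataset)
```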
