Pandas dataframe: truncate string fields
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/42752914/
Asked by Botond
I have a dataframe and would like to truncate each field to at most 20 characters. I've been naively trying the following:
df = df.astype(str).apply(lambda x: x[:20])
However, it has no effect whatsoever. If, on the other hand, I wanted to append a 'Y' to each field, this works like a charm:
df = df.astype(str).apply(lambda x: x+'Y')
What am I doing wrong?
Answered by smile-on
A simple one-liner to trim a long string field in a Pandas DataFrame:
df['short_str'] = df['long_str'].str.slice(0,3)
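As a small self-contained sketch of that one-liner (the sample values here are made up for illustration), slicing a single string column might look like this:

```python
import pandas as pd

# Hypothetical sample data; the column names mirror the one-liner above.
df = pd.DataFrame({'long_str': ['abcdef', 'xy', 'pandas']})

# .str.slice(0, 3) keeps at most the first 3 characters of every value;
# values that are already shorter are left as they are.
df['short_str'] = df['long_str'].str.slice(0, 3)
print(df)
```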
Answered by MaxU
You can use the .str.slice() method:
Demo:
In [177]: df = pd.DataFrame({
     ...:     'a': pd.util.testing.rands_array(30, 10),
     ...:     'b': pd.util.testing.rands_array(30, 10),
     ...: })
     ...:
In [178]: df
Out[178]:
a b
0 Mlf6nOsC8S6vv8OxW5ZOWifg3EoqAb XSGLdkaewwZlNeZ4uTTivi2nMQFc6S
1 0E4XCBaYFBTSalUMPGpXmke6dQGbkW KlHuVhbNgQL9HLHYQq3fEdqEIciOhX
2 URODJeLA0uLvcKBEXPyrmnnNU40MDl NaY8LURHjgmT1pRrDnbPAeLZq3ANaL
3 OYA1ahlwVtEVnDOAkZgxNkbvZ7W8Rf mIzkeLhM7SqYH17vGDzL6DJjSYftGs
4 uFC1shE02UfxS0VhDASmF8vh9XxFYX fQOxjDjFehTNT27seOtCAAPW0as9Up
5 Ja33vQym6L0Ko2Kcf8cg7OMBKMitg5 iGdCvYTyZlR23NeeTAjG1PoL8mWm3j
6 iNZdXaVpB4zXClxTLt738DY7i6xs6p q9VKg5fZdItmUpZiQrR6XW5WHmd33l
7 WWnViRRMPkbXNQOHeqGmzETDpGPRl9 t3I8Ve3ybCJcXajF8pydnwNZQWslTN
8 5oMFy2PBe1zUIE3XdraMwlrd5MKcx2 gSLtgXJwiS1HugLORXherFT4l1k5QV
9 weV8BlyJrtRbWpSCxSbj8cSyZxusFR ylLWort9o8mHWQQ3JB1Twb0xRbLhot
In [179]: df.apply(lambda x: x.str.slice(0, 20))
Out[179]:
a b
0 Mlf6nOsC8S6vv8OxW5ZO XSGLdkaewwZlNeZ4uTTi
1 0E4XCBaYFBTSalUMPGpX KlHuVhbNgQL9HLHYQq3f
2 URODJeLA0uLvcKBEXPyr NaY8LURHjgmT1pRrDnbP
3 OYA1ahlwVtEVnDOAkZgx mIzkeLhM7SqYH17vGDzL
4 uFC1shE02UfxS0VhDASm fQOxjDjFehTNT27seOtC
5 Ja33vQym6L0Ko2Kcf8cg iGdCvYTyZlR23NeeTAjG
6 iNZdXaVpB4zXClxTLt73 q9VKg5fZdItmUpZiQrR6
7 WWnViRRMPkbXNQOHeqGm t3I8Ve3ybCJcXajF8pyd
8 5oMFy2PBe1zUIE3XdraM gSLtgXJwiS1HugLORXhe
9 weV8BlyJrtRbWpSCxSbj ylLWort9o8mHWQQ3JB1T
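pd.util.testing.rands_array may not be available in newer pandas releases, so the demo above might not run as written there. The following is a rough equivalent that builds the random strings with the standard library instead; rand_str is a hypothetical helper written for this sketch, not a pandas function, and the generated strings will differ from the ones shown above.

```python
import random
import string

import pandas as pd


def rand_str(length, rng):
    # Hypothetical helper: random alphanumeric string of the given length.
    alphabet = string.ascii_letters + string.digits
    return ''.join(rng.choice(alphabet) for _ in range(length))


rng = random.Random(0)  # fixed seed so reruns produce the same strings
df = pd.DataFrame({
    'a': [rand_str(30, rng) for _ in range(10)],
    'b': [rand_str(30, rng) for _ in range(10)],
})

# Truncate every column to at most 20 characters, as in the demo.
short = df.apply(lambda col: col.str.slice(0, 20))
print(short)
```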
Answered by jezrael
I think you need the .str accessor to index into the strings:
df = df.astype(str).apply(lambda x: x.str[:20])
Sample:
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9],
                   'D': [1, 3, 5],
                   'E': [5, 3, 6],
                   'F': [7, 4, 3]}) * 1000
print(df)
A B C D E F
0 1000 4000 7000 1000 5000 7000
1 2000 5000 8000 3000 3000 4000
2 3000 6000 9000 5000 6000 3000
df = df.astype(str).apply(lambda x: x.str[:2])
print(df)
A B C D E F
0 10 40 70 10 50 70
1 20 50 80 30 30 40
2 30 60 90 50 60 30
Another solution with applymap:
df = df.astype(str).applymap(lambda x: x[:2])
print(df)
A B C D E F
0 10 40 70 10 50 70
1 20 50 80 30 30 40
2 30 60 90 50 60 30
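Newer pandas versions deprecate applymap in favour of DataFrame.map, which behaves the same way element by element. A hedged sketch that should work with either, using a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1000, 2000], 'B': [4000, 5000]})
str_df = df.astype(str)

# Use DataFrame.map where this pandas version has it, otherwise applymap.
elementwise = str_df.map if hasattr(str_df, 'map') else str_df.applymap

# The lambda receives one cell (a plain Python str) at a time,
# so ordinary string slicing is enough here.
truncated = elementwise(lambda cell: cell[:2])
print(truncated)
```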
The problem with your solution is that x[:20] selects only the first 20 rows of each column, because apply passes each column to the function as a whole Series rather than cell by cell.
You can test this with a custom function:
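def f(x):
    # x is an entire column (a Series), not a single cell
    print(x)
    # so x[:2] slices the first two rows of that column
    print(x[:2])
    return x[:2]

df = df.astype(str).apply(f)
print(df)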
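To tie this back to the question, here is a minimal sketch (with made-up data) of why the first attempt appears to do nothing while appending 'Y' works: apply hands each column to the lambda as a Series, so [:20] slices rows, whereas + 'Y' is applied to every value in the column.

```python
import pandas as pd

# Three rows of 30-character strings (illustrative data only).
df = pd.DataFrame({'a': ['x' * 30, 'y' * 30, 'z' * 30]})

# Row slice: the column has only 3 rows, so [:20] returns all of them unchanged.
rows = df.apply(lambda col: col[:20])

# Character slice through the .str accessor: each value is cut to 20 characters.
chars = df.apply(lambda col: col.str[:20])

# String concatenation is applied to every value, which is why + 'Y' worked.
suffixed = df.apply(lambda col: col + 'Y')

print(rows['a'].str.len().tolist())
print(chars['a'].str.len().tolist())
print(suffixed['a'].str.len().tolist())
```

The three printed lists should show string lengths of 30, 20, and 31 respectively, so only the .str version actually truncates the values.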