Original question: http://stackoverflow.com/questions/34376896/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas DataFrames: how to wrap text with no whitespace
Asked by user1956609
I'm viewing a Pandas DataFrame in a Jupyter Notebook, and my DataFrame contains URL request strings that can be hundreds of characters long without any whitespace separating characters.
Pandas seems to wrap text in a cell only when there's whitespace, as shown in the attached picture:
If there isn't whitespace, the string is displayed on a single line, and if there isn't enough space my options are either to see a '...' or to set display.max_colwidth to a huge number, which leaves me with a hard-to-read table and a lot of scrolling.
Is there a way to force Pandas to wrap text, say, every 100 characters, regardless of whether there is whitespace?
Answered by paulo.filip3
You can set
import pandas as pd
pd.set_option('display.max_colwidth', 0)
and then each column will be just as wide as it needs to be in order to fully display its content. It will not wrap the text content of the cells though (unless they contain spaces).
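If changing the option for the whole session is undesirable, it can also be applied temporarily. The sketch below assumes pandas 1.0 or later, where None is the documented value for "no limit"; the column name `url` and the sample data are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: one long string without any whitespace.
df = pd.DataFrame({"url": ["http://example.com/?q=" + "x" * 200]})

default_width = pd.get_option("display.max_colwidth")

# Inside the block the cell is shown untruncated; the previous
# setting is restored automatically when the block exits.
with pd.option_context("display.max_colwidth", None):
    full_repr = repr(df)

# Back to the previous setting, so long cells are shortened with '...'.
truncated_repr = repr(df)
```

As with the global setting, this only controls truncation; it does not make the cell wrap.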
Answered by mr_snuffles
If you're only in this for ad-hoc, temporary display purposes in Jupyter, you can simply insert whitespace every 100 characters:
chunk_size = 100
# Join fixed-size slices of each value with a space so the cell can wrap
data['new_column'] = [' '.join(val[i:i + chunk_size] for i in range(0, len(val), chunk_size)) for val in data['old_column']]
That said, the underlying reason this is a problem in the first place seems to be that multiple features are collapsed into a single column. It's hard to say without seeing your larger dataset, but if the values all follow the same pattern, I'd strongly suggest splitting this out into multiple features (browser, browser version, OS, OS version, etc.), which will make any additional work with this dataset easier.
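The whitespace-insertion idea above can be sketched as a small standalone helper. The name `insert_breaks` and its parameters are hypothetical, chosen only for this illustration.

```python
def insert_breaks(text, chunk_size=100, sep=" "):
    """Return `text` with `sep` inserted after every `chunk_size` characters."""
    return sep.join(
        text[i:i + chunk_size] for i in range(0, len(text), chunk_size)
    )


# Hypothetical input: 250 characters with no whitespace at all.
long_value = "a" * 250
chunked = insert_breaks(long_value, chunk_size=100)
```

Applied to a column, this could be written as data['old_column'].map(insert_breaks), assuming every value in the column is a string.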
Answered by O.Suleiman
You can use the str.wrap method:
df['user_agent'] = df['user_agent'].str.wrap(100) #to set max line width of 100
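To see what str.wrap does to a string without whitespace, here is a minimal sketch on made-up data. It relies on str.wrap breaking over-long words by default and joining the pieces with newline characters.

```python
import pandas as pd

# Hypothetical sample: a 25-character string with no whitespace or hyphens.
s = pd.Series(["abcdefghijklmnopqrstuvwxy"])

# Each piece is at most 10 characters; pieces are joined with "\n".
wrapped = s.str.wrap(10)
pieces = wrapped.iloc[0].split("\n")
print(pieces)
```

Whether those newlines actually show up as separate lines depends on how the frame is rendered. In a notebook, something like df.style.set_properties(**{'white-space': 'pre-wrap'}) is a commonly used way to have them honoured, though that is an assumption about the front end rather than part of this answer.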
Answered by vestland
If you don't mind solving this before you put the whole thing into a dataframe, you can do it as described here. In your particular case, if you'd like each line to be 10 characters long, you would have:
import pandas as pd
# Input
line = 'Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0; GomezAgent 3.0) like Gecko'
n = 10
# Split
line = [line[i:i+n] for i in range(0, len(line), n)]
# The rest is easy
df = pd.DataFrame(line)
print(df)
Without the white spaces, you'll get:
By the way, the white space at the beginning of the last row occurs because the last chunk has fewer than 10 characters, unlike the preceding rows, and the values are right-aligned. In Jupyter you could remedy this by using df.style.set_properties(**{'text-align': 'left'}).
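Where styled output is not rendered (for example, a plain print in a terminal), one possible alternative is to pad the short last chunk so every row has the same length. This is only a sketch, and the column name `chunk` is made up for the example.

```python
import pandas as pd

line = 'Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0; GomezAgent 3.0) like Gecko'
n = 10

# Split into fixed-size chunks; the last one may be shorter than n.
chunks = [line[i:i + n] for i in range(0, len(line), n)]
df = pd.DataFrame({'chunk': chunks})

# Pad on the right with spaces so all rows are exactly n characters long,
# which makes the printed column look left-aligned.
df['chunk'] = df['chunk'].str.ljust(n)
print(df)
```

The trailing spaces become part of the data, so this suits display only, not further processing.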
Answered by Pato Navarro
You can create a new column with the first 100 characters of the data:
data['new_column'] = [i[:100] for i in data['old_column']]
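The same truncation can also be written with the vectorised .str accessor. Unlike the list comprehension, it passes missing values through instead of raising an error. The sample frame below is invented for illustration.

```python
import pandas as pd

# Hypothetical data: a long string, a short one, and a missing value.
data = pd.DataFrame({"old_column": ["a" * 150, "short", None]})

# Slice every string to its first 100 characters; missing values stay missing.
data["new_column"] = data["old_column"].str[:100]
```

Note that this discards everything past the first 100 characters rather than wrapping it.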