Set max string length in pandas

Note: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/27722658/

Time: 2020-09-13 22:48:15  Source: igfitidea

python, pandas

Asked by bcollins

I want my dataframe to auto-truncate strings which are longer than a certain length.

Basically:

pd.set_option('auto_truncate_string_exceeding_this_length', 255)

Any ideas? I have hundreds of columns and don't want to iterate over every data point. If this can be achieved during import that would also be fine (e.g. pd.read_csv())

Thanks.

Answer by EdChum

I'm not sure you can do this on the whole df, but the following works after loading:

In [21]:

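import numpy as np
import pandas as pd
from pandas.api.types import is_string_dtype

df = pd.DataFrame({"a": ['jasjdhadasd'] * 5, "b": np.arange(5)})
df
Out[21]:
             a  b
0  jasjdhadasd  0
1  jasjdhadasd  1
2  jasjdhadasd  2
3  jasjdhadasd  3
4  jasjdhadasd  4
In [22]:

# Truncate only the string columns; the numeric column b is left as-is
for col in df:
    if is_string_dtype(df[col]):
        df[col] = df[col].str.slice(0, 5)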
df
Out[22]:
       a  b
0  jasjd  0
1  jasjd  1
2  jasjd  2
3  jasjd  3
4  jasjd  4
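The loop above can be wrapped into a small reusable function so it scales to hundreds of columns without touching individual cells. This is a minimal sketch, assuming pandas.api.types.is_string_dtype is an adequate test for "text column"; truncate_strings is a hypothetical helper name, not a pandas API. It returns a copy so the original frame is not modified:

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def truncate_strings(df: pd.DataFrame, max_len: int) -> pd.DataFrame:
    """Return a copy of df with every string column cut to max_len characters.

    Hypothetical helper: non-string columns (ints, floats, dates, ...) are
    returned unchanged.
    """
    out = df.copy()
    for col in out.columns:
        if is_string_dtype(out[col]):
            # Series.str.slice works on the whole column at once; NaN stays NaN
            out[col] = out[col].str.slice(0, max_len)
    return out


df = pd.DataFrame({"a": ["jasjdhadasd"] * 5, "b": range(5)})
short = truncate_strings(df, 5)
print(short)
```

Returning a copy costs memory on very wide frames; assigning back to the same frame, as the loop above does, avoids that.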

EDIT

I think if you specified the dtypes in the args to read_csv, then you could set the max length:

df = pd.read_csv('file.csv', dtype=(np.str, maxlen))

I will try this and confirm shortly.

UPDATE

Sadly, you cannot specify the length; an error is raised if you try this:

NotImplementedError: the dtype <U5 is not supported for parsing

when attempting to pass the argument dtype=(str,5).
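Since read_csv rejects a length-limited string dtype, one workaround is to read the columns as plain str (which read_csv does accept, without a length) and truncate right after loading. A minimal sketch, assuming every column may be treated as text; the inline CSV data is made up and stands in for 'file.csv':

```python
import io

import pandas as pd

# Made-up stand-in for 'file.csv'
csv_text = "a,b\njasjdhadasd,0\nxyz,1\n"

# dtype=str (no length) is accepted: every column is read as text
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Truncate every column; each lambda call slices one whole column
df = df.apply(lambda s: s.str.slice(0, 5))
print(df)
```

The trade-off is that numeric columns also come back as strings, so they would need converting back if you want to do arithmetic on them.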

Answer by ilmucio

pd.set_option('display.max_colwidth', 255)
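Note that display.max_colwidth only limits how wide a column is printed; the stored strings are not shortened. A small sketch illustrating this, using pd.option_context so the setting is restored afterwards (the width 8 is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"a": ["jasjdhadasd"] * 2})

# option_context changes the setting only inside the with-block
with pd.option_context("display.max_colwidth", 8):
    shown = repr(df)  # long cells are printed shortened, ending in "..."

print(shown)
print(len(df.loc[0, "a"]))  # the stored value still has all 11 characters
```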

Answer by Ali Faizan

You can use read_csv converters. Let's say you want to truncate the column named abc; you can pass a dictionary that maps it to a function like this:

import pandas as pd

def auto_truncate(val):
    return val[:255]  # keep at most the first 255 characters

df = pd.read_csv('file.csv', converters={'abc': auto_truncate})
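A self-contained version of the converter approach, assuming a small inline CSV in place of 'file.csv' and a limit of 5 instead of 255 so the effect is visible; MAX_LEN and the data are illustrative:

```python
import io

import pandas as pd

MAX_LEN = 5  # illustrative limit; the answer uses 255


def auto_truncate(val):
    # read_csv hands each raw cell of the column to the converter as a string
    return val[:MAX_LEN]


# Made-up stand-in for 'file.csv'
csv_text = "abc,n\njasjdhadasd,1\nxy,2\n"
df = pd.read_csv(io.StringIO(csv_text), converters={"abc": auto_truncate})
print(df)
```

Columns without a converter, like n here, are parsed as usual.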

If different columns need different maximum lengths:

df = pd.read_csv('file.csv', converters={'abc': lambda x: x[:255], 'xyz': lambda x: x[:512]})

Make sure the column holds string values. Column indices can also be used instead of names as keys in the converters dict.
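With hundreds of columns, the converters dict can be built programmatically instead of by hand. A sketch, assuming every column should get the same limit: read only the header (nrows=0) to count the columns, then key the converters by column position. The truncate helper and the inline data are hypothetical:

```python
import io

import pandas as pd


def truncate(val, max_len=5):
    # Hypothetical helper; 5 is an illustrative limit
    return val[:max_len]


# Made-up stand-in for a wide CSV file
csv_text = "c0,c1,c2\naaaaaaaa,bbbbbbbb,cccccccc\n"

# Read only the header row to learn how many columns there are
n_cols = len(pd.read_csv(io.StringIO(csv_text), nrows=0).columns)

# Integer keys refer to column positions, so one dict covers every column
converters = {i: truncate for i in range(n_cols)}
df = pd.read_csv(io.StringIO(csv_text), converters=converters)
print(df)
```

With a real file, the header read is cheap because nrows=0 stops before any data rows.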
