Set max string length in pandas
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/27722658/
Asked by bcollins
I want my dataframe to auto-truncate strings which are longer than a certain length.
Basically:
pd.set_option('auto_truncate_string_exceeding_this_length', 255)
Any ideas? I have hundreds of columns and don't want to iterate over every data point. If this can be achieved during import, that would also be fine (e.g. with pd.read_csv()).
Thanks.
Answer by EdChum
I'm not sure you can do this on the whole df; the following would work after loading:
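In [21]:
import numpy as np
import pandas as pd
from pandas.api.types import is_string_dtype

df = pd.DataFrame({"a": ['jasjdhadasd'] * 5, "b": np.arange(5)})
df
Out[21]:
             a  b
0  jasjdhadasd  0
1  jasjdhadasd  1
2  jasjdhadasd  2
3  jasjdhadasd  3
4  jasjdhadasd  4
In [22]:
# Truncate every string column to its first 5 characters;
# is_string_dtype is the public pandas check for string-typed columns
for col in df:
    if is_string_dtype(df[col]):
        df[col] = df[col].str.slice(0, 5)
df
Out[22]:
       a  b
0  jasjd  0
1  jasjd  1
2  jasjd  2
3  jasjd  3
4  jasjd  4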
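For reuse across several frames, the same per-column idea can be wrapped in a small function. This is only a sketch: the name truncate_strings and the sample data are made up for illustration, and it assumes string columns hold only strings (a mixed object column may behave differently depending on the pandas version).

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def truncate_strings(df, max_len):
    """Return a copy of df with every string column cut to max_len characters."""
    out = df.copy()
    for col in out.columns:
        # Only touch columns that pandas reports as string-typed
        if is_string_dtype(out[col]):
            out[col] = out[col].str.slice(0, max_len)
    return out


# Hypothetical sample data: one string column, one integer column
df = pd.DataFrame({"a": ["jasjdhadasd"] * 3, "b": range(3)})
short = truncate_strings(df, 5)
```

Because the function works on a copy, the original df keeps its full-length strings and only short is truncated.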
EDIT
I think if you specified the dtypes in the args to read_csv, then you could set the max length:
df = pd.read_csv('file.csv', dtype=(str, maxlen))
I will try this and confirm shortly.
UPDATE
Sadly, you cannot specify the length. Passing the arg dtype=(str,5) raises an error:
NotImplementedError: the dtype <U5 is not supported for parsing
Answer by ilmucio
pd.set_option('display.max_colwidth', 255)
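Note that display.max_colwidth controls only how values are printed; it does not change the data stored in the frame. A minimal sketch of the difference, using a made-up 33-character value and a display width of 10 chosen just for illustration:

```python
import pandas as pd

# Hypothetical sample: a single 33-character string
df = pd.DataFrame({"a": ["jasjdhadasd" * 3]})

# Temporarily limit the displayed column width to 10 characters
with pd.option_context("display.max_colwidth", 10):
    shown = repr(df)  # the printed form is shortened with an ellipsis

# The stored value is untouched by the display option
stored_length = len(df.loc[0, "a"])
```

If the strings themselves need to be shorter (for example before writing to a database), they still have to be sliced explicitly.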
Answer by Ali Faizan
You can use read_csv converters. Let's say you want to truncate the column named abc; you can pass a dictionary that maps the column name to a function, like:
def auto_truncate(val):
    return val[:255]

df = pd.read_csv('file.csv', converters={'abc': auto_truncate})
If you have columns with different lengths:
df = pd.read_csv('file.csv', converters={'abc': lambda x: x[:255], 'xyz': lambda x: x[:512]})
Make sure the column type is string. A column index can also be used instead of a name in the converters dict.
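With hundreds of columns, the converters dict can be built programmatically instead of by hand. The sketch below reads only the header row first and then maps every column to the same truncating function. The CSV content, MAX_LEN, and the truncate helper are assumptions for illustration. Since every column passes through a converter, every value stays a string, so numeric columns would not be parsed as numbers with this approach.

```python
import io

import pandas as pd

MAX_LEN = 5  # hypothetical limit for this example

# Hypothetical CSV content standing in for 'file.csv'
csv_text = "abc,xyz\njasjdhadasd,short\nqwertyuiop,anotherlongvalue\n"


def truncate(val):
    # Converters receive each cell as a string
    return val[:MAX_LEN]


# Read only the header row to learn the column names
columns = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Map every column name to the same truncating converter
converters = {col: truncate for col in columns}
df = pd.read_csv(io.StringIO(csv_text), converters=converters)
```

To truncate only some columns, restrict the dict to those names and let read_csv infer the types of the rest.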

