pandas: setting a maximum string length

Question

Asked by bcollins

I want my dataframe to auto-truncate strings which are longer than a certain length.

Basically, something like this:

pd.set_option('auto_truncate_string_exceeding_this_length', 255)

Any ideas? I have hundreds of columns and don't want to iterate over every data point. If this can be achieved during import, that would also be fine (e.g. with pd.read_csv()).

Thanks.

Answer 1

Answered by EdChum

I'm not sure you can do this on the whole df in one step, but the following works after loading:

In [21]:

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ['jasjdhadasd']*5, "b": np.arange(5)})
df
Out[21]:
             a  b
0  jasjdhadasd  0
1  jasjdhadasd  1
2  jasjdhadasd  2
3  jasjdhadasd  3
4  jasjdhadasd  4
In [22]:

for col in df:
    # truncate only the columns that hold strings
    if pd.api.types.is_string_dtype(df[col]):
        df[col] = df[col].str.slice(0, 5)
df
Out[22]:
       a  b
0  jasjd  0
1  jasjd  1
2  jasjd  2
3  jasjd  3
4  jasjd  4
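With hundreds of columns, it may be convenient to wrap the loop above in a reusable function that leaves the original frame untouched. This is only a sketch, not part of the original answer: the helper name truncate_strings and the sample data are made up for illustration, and it assumes that any column reported as string-like really contains only strings (or missing values).

```python
import numpy as np
import pandas as pd


def truncate_strings(df, max_len):
    """Return a copy of df with every string column cut to max_len characters."""
    out = df.copy()
    # Collect the string-like columns; numeric columns are left alone.
    str_cols = [c for c in out.columns if pd.api.types.is_string_dtype(out[c])]
    for col in str_cols:
        # .str.slice works on the whole column at once and keeps missing values.
        out[col] = out[col].str.slice(0, max_len)
    return out


df = pd.DataFrame({"a": ["jasjdhadasd"] * 5, "b": np.arange(5)})
short = truncate_strings(df, 5)
print(short)
```

The per-column work is still vectorized by pandas, so the Python-level loop runs once per column rather than once per data point.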

EDIT

I think if you specified the dtypes in the args to read_csv then you could set the max length:

df = pd.read_csv('file.csv', dtype=(str, maxlen))

I will try this and confirm shortly.

UPDATE

Sadly, you cannot specify the length; an error is raised if you try this:

NotImplementedError: the dtype <U5 is not supported for parsing

This happens when attempting to pass the argument dtype=(str, 5).

Answer 2

Answered by ilmucio

pd.set_option('display.max_colwidth', 255)
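As far as I can tell, display.max_colwidth only controls how cells are rendered when a frame is printed; the stored values keep their full length. A small sketch to check this, with a made-up one-column frame and a limit of 20 chosen purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": ["x" * 300]})

# Apply the display option temporarily and capture the printed form.
with pd.option_context("display.max_colwidth", 20):
    shown = repr(df)

print(shown)                    # the cell is shortened in the printed output
print(len(df.loc[0, "a"]))      # the stored string is still 300 characters
```

So this option helps with readability in the console, but the data itself would still need to be truncated with one of the other approaches.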

Answer 3

Answered by Ali Faizan

You can use read_csv converters. Let's say you want to truncate a column named abc; you can pass a dictionary that maps the column name to a function, like this:

def auto_truncate(val):
    # converters receive each cell as a string
    return val[:255]

df = pd.read_csv('file.csv', converters={'abc': auto_truncate})

If you have columns with different lengths:

df = pd.read_csv('file.csv', converters={'abc': lambda x: x[:255], 'xyz': lambda x: x[:512]})

Make sure the column type is string. A column index can also be used instead of a name in the converters dict.
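For a file with many text columns, the converters dict can be built from the header instead of being typed out by hand. The following is a sketch under stated assumptions: the CSV content, the MAX_LEN value of 5 and the rule that every column except n holds text are all invented for this example, and an in-memory buffer stands in for 'file.csv'.

```python
import io

import pandas as pd

MAX_LEN = 5  # illustrative limit; the question asks for 255

# Stand-in for 'file.csv' so the example runs on its own.
csv_text = "abc,xyz,n\nabcdefghij,klmnopqrst,1\nshort,uvwxyzabcd,2\n"


def auto_truncate(val):
    # Converters receive the raw cell text as a string.
    return val[:MAX_LEN]


# Read only the header row to learn the column names.
columns = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Assumption for this example: every column except "n" is text.
text_columns = [c for c in columns if c != "n"]

df = pd.read_csv(
    io.StringIO(csv_text),
    converters={c: auto_truncate for c in text_columns},
)
print(df)
```

Columns without a converter, such as n here, go through the normal type inference.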
