pandas: Truncate column width in pandas

Question

Asked by Luke

I'm reading large CSV files into pandas, some of them with string columns thousands of characters long. Is there a quick way to limit the width of a column, i.e. keep only the first 100 characters?

Answer 1

Answered by DSM

If you can read the whole thing into memory, you can use the str accessor for vectorized operations:

>>> df = pd.read_csv("toolong.csv")
>>> df
   a                       b  c
0  1  1256378916212378918293  2

[1 rows x 3 columns]
>>> df["b"] = df["b"].str[:10]
>>> df
   a           b  c
0  1  1256378916  2

[1 rows x 3 columns]

Also note that you can get a Series of string lengths using:

>>> df["b"].str.len()
0    10
Name: b, dtype: int64
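
As a self-contained sketch of the same idea applied to the 100-character limit from the question: the CSV text, the `MAX_WIDTH` name, and the use of `io.StringIO` in place of a real file are assumptions made for illustration only.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk.
csv_text = "a,b,c\n1," + "x" * 250 + ",2\n3,short,4\n"

MAX_WIDTH = 100  # the limit mentioned in the question

df = pd.read_csv(io.StringIO(csv_text))

# Slice every value in column "b" to at most MAX_WIDTH characters.
# Values that are already shorter are left unchanged.
df["b"] = df["b"].str[:MAX_WIDTH]

# Check the resulting lengths with .str.len().
lengths = df["b"].str.len()
print(lengths.tolist())  # [100, 5]
```

Slicing through `.str` leaves missing values as missing, so a column containing NaN should not raise here.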

I was originally wondering if

>>> pd.read_csv("toolong.csv", converters={"b": lambda x: x[:5]})
   a      b  c
0  1  12563  2

[1 rows x 3 columns]

would be better, but I don't actually know whether the converters are called row by row or after the fact on the whole column.
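
One way to look into that question empirically is to have the converter record what it receives. The following is only a rough sketch: the `truncate` helper, the `calls` list, and the sample data are made up for illustration, and the observed behaviour may depend on the pandas version and parser engine.

```python
import io

import pandas as pd

# Hypothetical sample data; column "b" holds plain text of varying length.
csv_text = "a,b,c\n1,abcdefghijklmnop,2\n3,qrstuvwxyz,4\n5,hi,6\n"

calls = []  # records every argument the converter is handed


def truncate(value, width=5):
    """Keep only the first `width` characters of a single cell."""
    calls.append(value)
    return value[:width]


df = pd.read_csv(io.StringIO(csv_text), converters={"b": truncate})

print(df["b"].tolist())  # ['abcde', 'qrstu', 'hi']

# Compare the number of converter calls with the number of rows to see
# whether the function was applied per value or once per column.
print(len(calls), "converter calls for", len(df), "rows")
```

Inspecting `calls` afterwards shows the raw arguments the converter received, which gives a hint about how it is being invoked.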
