Original source: http://stackoverflow.com/questions/42396530/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Convert object to string in pandas
Asked by subro
I have a column in a pandas DataFrame with the values below:
print (df.xx)
1 5679558
2 (714) 254
3 0
4 00000000
5 000000000
6 00000000000
7 000000001
8 000000002
9 000000003
10 000000004
11 000000005
print (df.dtypes)
xx object
I tried the following to convert this column to a numeric type:
try:
    print(df.xx.apply(str).astype(int))
except ValueError:
    pass
I also tried this:
tin.tin = tin.tin.to_string().astype(int)
But this gives me a MemoryError, as I have 3M rows.
Can somebody help me strip the special characters and convert the column to int64?
Accepted answer by EdChum
You can test whether each string isdigit and then use the resulting boolean mask to convert only those rows in a vectorised manner, using to_numeric with the param errors='coerce':
In [88]:
df.loc[df['xxx'].str.isdigit(), 'xxx'] = pd.to_numeric(df['xxx'], errors='coerce')
df
Out[88]:
xxx
0 5.67956e+06
1 (714) 254
2 0
3 0
4 0
5 0
6 1
7 2
8 3
9 4
10 5
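The output above leaves rows such as "(714) 254" untouched, while the question also asks for special characters to be stripped. The following is a minimal sketch of one way to do that, not part of the original answer: it assumes that every non-digit character can simply be dropped, and the sample data and the xx_int column name are hypothetical.

```python
import pandas as pd

# Hypothetical sample mirroring some of the values shown in the question.
df = pd.DataFrame({"xx": ["5679558", "(714) 254", "0", "00000000", "000000001"]})

# Remove every non-digit character, e.g. "(714) 254" becomes "714254".
digits_only = df["xx"].astype(str).str.replace(r"\D", "", regex=True)

# Anything that still cannot be parsed becomes NaN; the nullable Int64 dtype
# then stores those rows as <NA> instead of forcing the column to float.
df["xx_int"] = pd.to_numeric(digits_only, errors="coerce").astype("Int64")

print(df)
```

Whether dropping the punctuation is appropriate depends on what the values mean; if malformed entries should be discarded rather than repaired, the isdigit mask from the answer is the safer choice.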
Answer by omri_saadon
You could split your huge dataframe into chunks. For example, this function does that and lets you choose the chunk size:
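def splitDataFrameIntoSmaller(df, chunkSize=10000):
    # Split df into consecutive row slices of at most chunkSize rows each.
    listOfDf = list()
    # Ceiling division avoids an empty trailing chunk when len(df) is an exact multiple of chunkSize.
    numberChunks = (len(df) + chunkSize - 1) // chunkSize
    for i in range(numberChunks):
        listOfDf.append(df[i*chunkSize:(i+1)*chunkSize])
    return listOfDf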
After you have the chunks, you can apply your function to each chunk separately.
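As a rough illustration of applying a function chunk by chunk and reassembling the result, here is a small sketch. The helper to_int_or_na, the sample data, and the xx_int column are hypothetical, and the chunks are produced inline with range stepping rather than with the function from the answer.

```python
import pandas as pd

def to_int_or_na(chunk):
    # Hypothetical per-chunk function: strings that are not numeric become <NA>.
    return pd.to_numeric(chunk["xx"], errors="coerce").astype("Int64")

df = pd.DataFrame({"xx": ["5679558", "(714) 254", "0", "000000001", "000000002"]})
chunk_size = 2  # tiny on purpose so the example yields several chunks

# Slice positionally, convert each chunk, then concatenate the pieces.
pieces = [to_int_or_na(df.iloc[start:start + chunk_size])
          for start in range(0, len(df), chunk_size)]

# The pieces keep their original index labels, so the assignment aligns by index.
df["xx_int"] = pd.concat(pieces)

print(df)
```

Because each piece keeps the index of the rows it came from, the concatenated result lines up with the original dataframe without any extra bookkeeping.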