Python: how to round/remove trailing ".0" zeros in a pandas column?
Original source: http://stackoverflow.com/questions/42403907/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
How to round/remove trailing ".0" zeros in a pandas column?
Asked by medev21
I'm trying to see if I can remove the trailing zeros from this phone number column.
Example:
0
1 8.00735e+09
2 4.35789e+09
3 6.10644e+09
The dtype of this column is object, and when I tried to round it I got an error. I checked a couple of values and know they are in this format: "8007354384.0". I want to get rid of the trailing zero together with the decimal point.
Sometimes I receive the data in this format and sometimes I don't; in that case the values are plain integers. I would like to check whether the phone column has a trailing zero and, if so, remove it.
I have this code but I'm stuck on how to check for trailing zeros for each row.
data.ix[data.phone.str.contains('.0'), 'phone']
I get an error: ValueError: cannot index with vector containing NA / NaN values. I believe the issue is that some rows have empty data, which I do sometimes receive. The code above should be able to skip an empty row.
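One possible way around the NaN indexing error described above, as a minimal sketch. The column name phone comes from the question, but the sample values are made up for illustration. Passing na=False to str.contains makes missing entries count as non-matches, and the escaped, anchored pattern keeps '.0' from being read as "any character followed by 0".

```python
import numpy as np
import pandas as pd

# Hypothetical sample: strings with and without a trailing ".0", plus a missing value.
data = pd.DataFrame({"phone": ["8007354384.0", np.nan, "4357890000", "6106440000.0"]})

# na=False treats missing entries as "no match", so the mask contains no NaN.
mask = data["phone"].str.contains(r"\.0$", regex=True, na=False)

# Strip the trailing ".0" only on the rows that matched.
data.loc[mask, "phone"] = data.loc[mask, "phone"].str.replace(r"\.0$", "", regex=True)

result = data["phone"].tolist()
print(result)
```

The missing row is left untouched, which matches the requirement that an empty row should be skipped rather than turned into 0.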
Does anybody have any suggestions? I'm new to pandas, but so far it's a useful library. Your help will be appreciated.
Note: in the example above, the first row is empty, which I do sometimes get. I just want to make sure it is not represented as 0 for the phone number.
Also, empty data is considered a string, so the column is a mix of floats and strings if some rows are empty.
Answered by piRSquared
Use astype(np.int64):
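import numpy as np
import pandas as pd
s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])
# errors='coerce' turns anything non-numeric (such as '') into NaN instead of raising
mask = pd.to_numeric(s, errors='coerce').notnull()
s.loc[mask] = s.loc[mask].astype(np.int64)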
s
0
1 8007350000
2 4357890000
3 6106440000
dtype: object
Answered by Ken Wei
In Pandas/NumPy, integers are not allowed to take NaN values, and arrays/series (including dataframe columns) are homogeneous in their datatype, so having a column of NumPy-backed integers where some entries are None/np.nan is downright impossible.
EDIT: data.phone.astype('object') should do the trick; in this case, Pandas treats your column as a series of generic Python objects rather than a specific datatype (e.g. str/float/int), at the cost of performance if you intend to run any heavy computations with this data (probably not in your case).
Assuming you want to keep those NaN entries, your approach of converting to strings is a valid possibility:
data.phone.astype(str).str.split('.', expand = True)[0]
should give you what you're looking for (there are alternative string methods you can use, such as .replace or .extract, but .split seems the most straightforward in this case).
Alternatively, if you are only interested in the display of floats (unlikely, I'd suppose), you can do pd.set_option('display.float_format','{:.0f}'.format), which doesn't actually affect your data.
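A small sketch of the split approach on a float column with a missing entry. The sample values are hypothetical. One caveat worth illustrating: on many pandas versions, astype(str) turns NaN into the text 'nan', so the sketch masks those rows back to missing afterwards with where. That extra step is an addition for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical float column with one missing phone number.
s = pd.Series([np.nan, 8007354384.0, 4357890000.0])

# Convert to text and keep only the part before the decimal point.
as_text = s.astype(str).str.split(".", expand=True)[0]

# Restore real missing values wherever the original entry was NaN.
cleaned = as_text.where(s.notna())
print(cleaned)
```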
Answered by Некто
Just do
data['phone'] = data['phone'].astype(str)
data['phone'] = data['phone'].str.replace(r'\.0$', '', regex=True)
which runs a regex lookup on all entries in the column and replaces a trailing '.0' with an empty string. The dot is escaped and the pattern anchored at the end, because an unescaped '.0' would match any character followed by 0, anywhere in the value. For example
import pandas as pd
data = pd.DataFrame(
data = [['bob','39384954.0'],['Lina','23827484.0']],
columns = ['user','phone'], index = [1,2]
)
data['phone'] = data['phone'].astype(str)
# escape the dot and anchor at the end so only a trailing '.0' is removed
data['phone'] = data['phone'].str.replace(r'\.0$', '', regex=True)
print(data)
user phone
1 bob 39384954
2 Lina 23827484
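To see why the exact pattern matters when '.0' is interpreted as a regular expression, here is a sketch using Python's standard re module. The sample number is made up and chosen to contain zeros in the middle.

```python
import re

value = "1020304.0"  # hypothetical phone-like value with interior zeros

# Unescaped: "." matches any character, so every "<char>0" pair is removed.
loose = re.sub(".0", "", value)

# Escaped and anchored: only a literal ".0" at the very end is removed.
strict = re.sub(r"\.0$", "", value)

print(loose)   # 4
print(strict)  # 1020304
```

Whether pandas' str.replace treats the pattern as a regex by default depends on the pandas version, so passing regex=True or regex=False explicitly avoids surprises.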
Answered by Brohm
This answer by cs95 removes the trailing ".0" in one line.
df = df.round(decimals=0).astype(object)
Answered by U10-Forward
Try str.isnumeric with astype and loc:
s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])
c = s.str.isnumeric().astype(bool)
s.loc[c] = s.loc[c].astype(np.int64)
print(s)
And now:
print(s)
Outputs:
0
1 8007350000
2 4357890000
3 6106440000
dtype: object
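The mask above relies on how .str methods treat non-string entries. A more explicit variant, offered only as a sketch and not as the answer's method, is to test each value's type directly and convert just the floats:

```python
import numpy as np
import pandas as pd

s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])

# True where the entry is a float, False for the empty string.
is_float = s.map(lambda v: isinstance(v, float))

# Convert only the float rows; the empty string is left as-is.
s.loc[is_float] = s.loc[is_float].astype(np.int64)

print(s.tolist())
```

Note that isinstance(v, float) is also True for NaN, which cannot be cast to an integer, so this sketch assumes empty rows arrive as '' as in the question.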
Answered by S.V
Here is a solution using pandas nullable integers (the solution assumes that input Series values are either empty strings or floating point numbers):
import pandas as pd, numpy as np
s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])
s.replace('', np.nan).astype('Int64')
Output (pandas-0.25.1):
0 NaN
1 8007350000
2 4357890000
3 6106440000
dtype: Int64
Advantages of the solution:
- The output values are either integers or missing values (not 'object' data type)
- Efficient
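If the column can also contain numeric strings such as '8007354384.0', one variation (an assumption beyond the answer above, which expects only empty strings or floats) is to run the values through pd.to_numeric with errors='coerce' before casting to the nullable Int64 dtype:

```python
import pandas as pd

# Hypothetical mixed input: empty string, numeric string, float, and int.
raw = pd.Series(['', '8007354384.0', 4357890000.0, 6106440000])

# Non-numeric entries (here the empty string) become NaN.
numeric = pd.to_numeric(raw, errors='coerce')

# Nullable integer dtype: whole numbers plus a proper missing-value marker.
phones = numeric.astype('Int64')
print(phones)
```

The cast from float to Int64 is only exact for whole numbers that a float can represent precisely, which holds for ordinary phone-number lengths.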
Answered by erncyp
import numpy as np
import pandas as pd
s = pd.Series([ None, np.nan, '',8.00735e+09, 4.35789e+09, 6.10644e+09])
s_new = s.fillna('').astype(str).str.replace(".0","",regex=False)
s_new
Here I filled null values with an empty string, converted the series to string type, and replaced .0 with an empty string.
This outputs:
0
1
2
3 8007350000
4 4357890000
5 6106440000
dtype: object
Answered by chrisckwong821
import numpy as np
tt = 8.00735e+09
time = int(np.format_float_positional(tt)[:-1])
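The slice [:-1] above drops the trailing decimal point that np.format_float_positional leaves by default. As a sketch of an alternative, the function's trim parameter can do the same without slicing; trim='-' removes trailing zeros and a trailing decimal point:

```python
import numpy as np

tt = 8.00735e+09

# trim='-' strips trailing zeros and the trailing decimal point.
text = np.format_float_positional(tt, trim='-')
number = int(text)
print(text)
```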
Answered by Marcel Flygare
It depends on the data format in which the telephone number is stored.
If it is in a numeric format, changing to an integer might solve the problem:
import pandas as pd
df = pd.DataFrame({'TelephoneNumber': [123.0, 234]})
# int64 rather than int32: 10-digit phone numbers overflow a 32-bit integer
df['TelephoneNumber'] = df['TelephoneNumber'].astype('int64')
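A note on integer width for the numeric case: a typical 10-digit phone number is larger than the biggest value a 32-bit integer can hold, so a 64-bit type is the safer choice. A quick check, using made-up phone numbers:

```python
import numpy as np
import pandas as pd

# Largest value a signed 32-bit integer can represent.
limit = np.iinfo(np.int32).max

df = pd.DataFrame({'TelephoneNumber': [8007354384.0, 4357890000.0]})

# Do all values fit into int32? For 10-digit numbers like these, no.
fits_int32 = bool((df['TelephoneNumber'] <= limit).all())

# int64 holds them without overflow.
df['TelephoneNumber'] = df['TelephoneNumber'].astype('int64')
print(limit, fits_int32, df['TelephoneNumber'].tolist())
```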
If it is really a string you can replace and re-assign the column.
df2 = pd.DataFrame({'TelephoneNumber': ['123.0', '234']})
# escaped dot, anchored at the end: only a trailing '.0' is removed
df2['TelephoneNumber'] = df2['TelephoneNumber'].str.replace(r'\.0$', '', regex=True)
Answered by Shyam Bhimani
Pandas assigns a data type automatically by looking at the data. When the data is mixed, for example some rows are NaN and some hold int values, it will most likely assign dtype: object or float64.
EX 1:
import pandas as pd
data = [['tom', 10934000000], ['nick', 1534000000], ['juli', 1412000000]]
df = pd.DataFrame(data, columns = ['Name', 'Phone'])
>>> df
Name Phone
0 tom 10934000000
1 nick 1534000000
2 juli 1412000000
>>> df.dtypes
Name object
Phone int64
dtype: object
In the above example pandas assumes data type int64 because no row has NaN and every row in the Phone column holds an integer value.
EX 2:
>>> data = [['tom'], ['nick', 1534000000], ['juli', 1412000000]]
>>> df = pd.DataFrame(data, columns = ['Name', 'Phone'])
>>> df
Name Phone
0 tom NaN
1 nick 1.534000e+09
2 juli 1.412000e+09
>>> df.dtypes
Name object
Phone float64
dtype: object
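The two examples can be condensed into a quick dtype check. This is only a sketch with the same kind of values; the variable names are made up:

```python
import numpy as np
import pandas as pd

# All rows present: pandas infers an integer dtype.
complete = pd.Series([10934000000, 1534000000, 1412000000])

# One missing row: the integers are upcast to float so NaN can be stored.
with_gap = pd.Series([np.nan, 1534000000, 1412000000])

complete_dtype = str(complete.dtype)
gap_dtype = str(with_gap.dtype)
print(complete_dtype, gap_dtype)
```

That upcast to float64 is where the trailing ".0" comes from when such a column is later printed or converted to text.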
To answer your actual question: to get rid of the .0 at the end, you can do something like this.
Solution 1:
>>> data = [['tom', 9785000000.0], ['nick', 1534000000.0], ['juli', 1412000000]]
>>> df = pd.DataFrame(data, columns = ['Name', 'Phone'])
>>> df
Name Phone
0 tom 9.785000e+09
1 nick 1.534000e+09
2 juli 1.412000e+09
>>> df['Phone'] = df['Phone'].astype(int).astype(str)
>>> df
Name Phone
0 tom 9785000000
1 nick 1534000000
2 juli 1412000000
Solution 2:
>>> df['Phone'] = df['Phone'].astype(str).str.replace('.0', '', regex=False)
>>> df
Name Phone
0 tom 9785000000
1 nick 1534000000
2 juli 1412000000
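One caveat with Solution 1, shown as a sketch with made-up values: astype(int) raises when the column contains NaN. If missing rows should stay empty, as the question asks, a per-value conversion is one option (the helper lambda is illustrative, not part of the answer):

```python
import numpy as np
import pandas as pd

phone = pd.Series([np.nan, 1534000000.0, 1412000000.0])

# Casting a float column with NaN straight to int fails.
try:
    phone.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# Convert value by value, leaving missing entries as an empty string.
as_text = phone.map(lambda v: "" if pd.isna(v) else str(int(v)))
print(cast_failed, as_text.tolist())
```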