Python: how to round/remove trailing ".0" zeros in a pandas column?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/42403907/

Time: 2020-08-19 21:39:16  Source: igfitidea

How to round/remove trailing ".0" zeros in a pandas column?

python, pandas

Asked by medev21

I'm trying to see if I can remove the trailing zeros from this phone number column.

Example:

0
1      8.00735e+09
2      4.35789e+09
3      6.10644e+09

The dtype of this column is object, and I tried to round it but I am getting an error. I checked a couple of the values and I know they are in this format, "8007354384.0", and I want to get rid of the trailing zero along with the decimal point.

Sometimes I receive the data in this format and sometimes I don't; in that case the values are integers. I would like to check whether the phone column has a trailing zero and, if so, remove it.

I have this code but I'm stuck on how to check for trailing zeros for each row.

data.ix[data.phone.str.contains('.0'), 'phone']

I get an error => *** ValueError: cannot index with vector containing NA / NaN values. I believe the issue is that some rows have empty data, which I do sometimes receive. The code above should be able to skip an empty row.

Does anybody have any suggestions? I'm new to pandas, but so far it's a useful library. Your help will be appreciated.

Note: in the example provided above, the first row is empty, which is something I do sometimes get. I want to make sure this is not represented as 0 for the phone number.

Also, empty data is considered a string, so if some rows are empty the column is a mix of floats and strings.

Answered by piRSquared

Use astype(np.int64):

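import numpy as np
import pandas as pd

s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])
# Empty strings become NaN in to_numeric, so the mask selects only the numeric rows
mask = pd.to_numeric(s).notnull()
s.loc[mask] = s.loc[mask].astype(np.int64)
s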

0              
1    8007350000
2    4357890000
3    6106440000
dtype: object
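
As a side note on the error in the question, here is a minimal sketch, assuming the phone values arrive as strings with some missing entries. The ValueError arises because str.contains returns NaN for missing rows; passing na=False turns those into False so the mask can be used for indexing. The sample frame is made up for illustration, .loc is used because .ix was removed in pandas 1.0, and the pattern is escaped and anchored so that it only matches a trailing ".0".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one value with ".0", one missing, one clean
data = pd.DataFrame({'phone': ['8007354384.0', np.nan, '4357891234']})

# na=False makes missing rows count as "no match" instead of NaN
mask = data['phone'].str.contains(r'\.0$', na=False)

# Drop the last two characters (".0") only on the matching rows
data.loc[mask, 'phone'] = data.loc[mask, 'phone'].str[:-2]
print(data)
```

Rows with missing data are left as they are, so an empty phone number is never turned into 0.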

Answered by Ken Wei

In Pandas/NumPy, integers are not allowed to take NaN values, and arrays/series (including dataframe columns) are homogeneous in their datatype, so a column with a plain NumPy integer dtype where some entries are None/np.nan is not possible.

EDIT: data.phone.astype('object') should do the trick; in this case, Pandas treats your column as a series of generic Python objects rather than a specific datatype (e.g. str/float/int), at the cost of performance if you intend to run any heavy computations with this data (probably not in your case).

Assuming you want to keep those NaN entries, your approach of converting to strings is a valid possibility:

data.phone.astype(str).str.split('.', expand = True)[0]

should give you what you're looking for (there are alternative string methods you can use, such as .replace or .extract, but .split seems the most straightforward in this case).

Alternatively, if you are only interested in the display of floats (unlikely I'd suppose), you can do pd.set_option('display.float_format','{:.0f}'.format), which doesn't actually affect your data.
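
To illustrate the points above, here is a small sketch with a made-up float column containing one missing entry (the values are assumptions, not the asker's data). The direct integer cast fails because of the NaN, while the string route keeps the missing entry, although it comes out as the text 'nan'.

```python
import numpy as np
import pandas as pd

# Hypothetical phone column: floats with one missing value
phone = pd.Series([np.nan, 8007354384.0, 4357891234.0])

# A NumPy-backed integer column cannot hold NaN, so the direct cast raises
try:
    phone.astype(np.int64)
    cast_failed = False
except (ValueError, TypeError):
    cast_failed = True

# String route: keep everything before the decimal point
as_text = phone.astype(str).str.split('.', expand=True)[0]
print(cast_failed)
print(as_text.tolist())
```

If the literal 'nan' is not wanted, it can be replaced afterwards, for example with an empty string.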

Answered by Некто

Just do

data['phone'] = data['phone'].astype(str)
data['phone'] = data['phone'].str.replace(r'\.0$', '', regex=True)

which uses a regex lookup on all entries in the column and replaces a trailing '.0' with an empty string. The dot is escaped and the pattern is anchored with $ because an unescaped '.0' would match any character followed by a zero. For example

import pandas as pd

data = pd.DataFrame(
    data = [['bob','39384954.0'],['Lina','23827484.0']],
    columns = ['user','phone'], index = [1,2]
)

data['phone'] = data['phone'].astype(str)
data['phone'] = data['phone'].str.replace(r'\.0$', '', regex=True)
print(data)

   user     phone
1   bob  39384954
2  Lina  23827484
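
Whether the pattern passed to str.replace is treated as a regular expression matters a great deal here. The sketch below, using made-up phone strings, compares an unescaped '.0' pattern in regex mode with an escaped, end-anchored one. The regex keyword is passed explicitly because its default has changed between pandas versions.

```python
import pandas as pd

# Hypothetical values: one with a trailing ".0", one already clean
phones = pd.Series(['8007350000.0', '4035550100'])

# Unescaped: "." matches any character, so digits followed by 0 are eaten too
naive = phones.str.replace('.0', '', regex=True)

# Escaped and anchored: only a literal ".0" at the end of the string is removed
anchored = phones.str.replace(r'\.0$', '', regex=True)

print(naive.tolist())
print(anchored.tolist())
```

The anchored version leaves numbers that contain zeros intact, which is what a phone column needs.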

Answered by Brohm

This answer by cs95 removes the trailing ".0" in one line.

df = df.round(decimals=0).astype(object)

Answered by U10-Forward

Try str.isnumeric with astype and loc:

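import numpy as np
import pandas as pd

s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])
# isnumeric() is False for '' and NaN for the floats; astype(bool) turns NaN into True
c = s.str.isnumeric().astype(bool)
s.loc[c] = s.loc[c].astype(np.int64)
print(s)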

Outputs:

0              
1    8007350000
2    4357890000
3    6106440000
dtype: object

Answered by S.V

Here is a solution using pandas nullable integers (the solution assumes that the input Series values are either empty strings or floating point numbers):

import pandas as pd, numpy as np
s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])
s.replace('', np.nan).astype('Int64')

Output (pandas-0.25.1):

0           NaN
1    8007350000
2    4357890000
3    6106440000
dtype: Int64

Advantages of the solution:

  • The output values are either integers or missing values (not 'object' data type)
  • Efficient
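
If some values might not be clean floats, one possible variation (a sketch, not part of the original answer) is to coerce with pd.to_numeric first and then cast to the nullable Int64 dtype. The sample values are made up, and the final cast to the 'string' dtype assumes pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical input: an empty string mixed with float phone numbers
s = pd.Series(['', 8007354384.0, 4357891234.0], dtype=object)

# errors='coerce' turns anything unparseable into NaN; '' also becomes NaN
nums = pd.to_numeric(s, errors='coerce')

# Nullable integer dtype: whole numbers plus a proper missing-value marker
ints = nums.astype('Int64')

# Optional: phone numbers as text, with missing entries still missing
text = ints.astype('string')
print(ints)
print(text)
```

Keeping the missing marker avoids an empty phone number being stored as 0.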

Answered by erncyp

import numpy as np
import pandas as pd

s = pd.Series([ None, np.nan, '',8.00735e+09,  4.35789e+09, 6.10644e+09])

s_new = s.fillna('').astype(str).str.replace(".0","",regex=False)
s_new

Here I filled the null values with an empty string, converted the series to string type, and replaced .0 with an empty string.
This outputs:

0              
1              
2              
3    8007350000
4    4357890000
5    6106440000
dtype: object

Answered by chrisckwong821

import numpy as np
tt = 8.00735e+09
time = int(np.format_float_positional(tt)[:-1])

Answered by Marcel Flygare

It depends on the format in which the telephone number is stored.

If it is in a numeric format, changing it to an integer might solve the problem:

df = pd.DataFrame({'TelephoneNumber': [123.0, 234]})
# int64 rather than int32, so that ten-digit phone numbers do not overflow
df['TelephoneNumber'] = df['TelephoneNumber'].astype('int64')
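
One thing to watch when casting phone numbers to an integer dtype is the width of the integer. The sketch below, with made-up ten-digit numbers, checks the limits with np.iinfo and then casts to a 64-bit integer.

```python
import numpy as np
import pandas as pd

number = 8007354384  # hypothetical ten-digit phone number

# Compare against the largest value each integer type can hold
fits_int32 = number <= np.iinfo(np.int32).max
fits_int64 = number <= np.iinfo(np.int64).max
print(fits_int32, fits_int64)

df = pd.DataFrame({'TelephoneNumber': [8007354384.0, 4357891234.0]})
df['TelephoneNumber'] = df['TelephoneNumber'].astype('int64')
print(df)
```

A 32-bit integer tops out at 2147483647, so it is too small for many ten-digit numbers.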

If it is really a string you can replace and re-assign the column.

df2 = pd.DataFrame({'TelephoneNumber': ['123.0', '234']})
# Escape the dot and anchor at the end so that only a trailing ".0" is removed
df2['TelephoneNumber'] = df2['TelephoneNumber'].str.replace(r'\.0$', '', regex=True)

Answered by Shyam Bhimani

Pandas assigns a data type automatically by inspecting the data. When a column holds mixed data, for example some rows are NaN and some hold int values, it will most likely be assigned dtype object or float64.

EX 1:

import pandas as pd

data = [['tom', 10934000000], ['nick', 1534000000], ['juli', 1412000000]]
df = pd.DataFrame(data, columns = ['Name', 'Phone'])

>>> df
   Name        Phone
0   tom  10934000000
1  nick   1534000000
2  juli   1412000000

>>> df.dtypes
Name     object
Phone     int64
dtype: object

In the above example pandas assumes data type int64, because none of the rows has NaN and every row in the Phone column has an integer value.

EX 2:

>>> data = [['tom'], ['nick', 1534000000], ['juli', 1412000000]]
>>> df = pd.DataFrame(data, columns = ['Name', 'Phone'])
>>> df
   Name         Phone
0   tom           NaN
1  nick  1.534000e+09
2  juli  1.412000e+09

>>> df.dtypes
Name      object
Phone    float64
dtype: object

To answer your actual question: to get rid of the .0 at the end, you can do something like this.

Solution 1:

>>> data = [['tom', 9785000000.0], ['nick', 1534000000.0], ['juli', 1412000000]]
>>> df = pd.DataFrame(data, columns = ['Name', 'Phone'])
>>> df
   Name         Phone
0   tom  9.785000e+09
1  nick  1.534000e+09
2  juli  1.412000e+09

>>> df['Phone'] = df['Phone'].astype(int).astype(str)
>>> df
   Name       Phone
0   tom  9785000000
1  nick  1534000000
2  juli  1412000000

Solution 2:

>>> df['Phone'] = df['Phone'].astype(str).str.replace('.0', '', regex=False)
>>> df
   Name       Phone
0   tom  9785000000
1  nick  1534000000
2  juli  1412000000
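
For a column that mixes empty strings, missing values, floats, integers and strings ending in ".0", as described in the question, the ideas above can be combined in a per-value function. This is only a sketch: normalize_phone is a hypothetical helper name and the sample values are made up.

```python
import math
import pandas as pd

def normalize_phone(value):
    """Hypothetical helper: return the number as a digit string, or '' if missing."""
    if value is None or value == '':
        return ''
    if isinstance(value, float):
        # NaN marks a missing entry; other floats are whole phone numbers
        return '' if math.isnan(value) else str(int(value))
    text = str(value)
    # Strip a trailing ".0" from values that arrived as text
    return text[:-2] if text.endswith('.0') else text

phones = pd.Series(['', 8007354384.0, '4357891234.0', 6106440000, None], dtype=object)
cleaned = phones.map(normalize_phone)
print(cleaned.tolist())
```

A Python-level function like this is slower than the vectorized string methods, but it makes the handling of each input type explicit.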