Convert timedelta64[ns] column to seconds in Python Pandas DataFrame
Original question: http://stackoverflow.com/questions/26456825/
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute the content to the original authors on Stack Overflow (not to this page).
Asked by Nyxynyx
A pandas DataFrame column duration contains timedelta64[ns] values as shown. How can you convert them to seconds?
0 00:20:32
1 00:23:10
2 00:24:55
3 00:13:17
4 00:18:52
Name: duration, dtype: timedelta64[ns]
I tried the following:
print df[:5]['duration'] / np.timedelta64(1, 's')
but got the error:
Traceback (most recent call last):
File "test.py", line 16, in <module>
print df[0:5]['duration'] / np.timedelta64(1, 's')
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 130, in wrapper
"addition and subtraction, but the operator [%s] was passed" % name)
TypeError: can only operate on a timedeltas for addition and subtraction, but the operator [__div__] was passed
I also tried:
print df[:5]['duration'].astype('timedelta64[s]')
but received the error:
Traceback (most recent call last):
File "test.py", line 17, in <module>
print df[:5]['duration'].astype('timedelta64[s]')
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 934, in astype
values = com._astype_nansafe(self.values, dtype)
File "C:\Python27\lib\site-packages\pandas\core\common.py", line 1653, in _astype_nansafe
raise TypeError("cannot astype a timedelta from [%s] to [%s]" % (arr.dtype,dtype))
TypeError: cannot astype a timedelta from [timedelta64[ns]] to [timedelta64[s]]
Accepted answer by unutbu
This works properly in the current version of Pandas (version 0.14):
In [132]: df[:5]['duration'] / np.timedelta64(1, 's')
Out[132]:
0 1232
1 1390
2 1495
3 797
4 1132
Name: duration, dtype: float64
Here is a workaround for older versions of Pandas/NumPy:
In [131]: df[:5]['duration'].values.view('<i8')/10**9
Out[131]: array([1232, 1390, 1495, 797, 1132], dtype=int64)
timedelta64 and datetime64 data are stored internally as 8-byte ints (dtype
'<i8'). So the above views the timedelta64s as 8-byte ints and then does integer
division to convert nanoseconds to seconds.
Note that you need NumPy version 1.7 or newer to work with datetime64/timedelta64s.
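The two approaches in this answer can be compared side by side. The following is a minimal sketch, assuming a recent pandas and NumPy; the sample DataFrame is hypothetical and only mirrors the durations shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the durations in the question.
df = pd.DataFrame({"duration": pd.to_timedelta(
    ["00:20:32", "00:23:10", "00:24:55", "00:13:17", "00:18:52"])})

# Dividing by a one-second timedelta yields the duration in seconds as floats.
seconds_float = df["duration"] / np.timedelta64(1, "s")

# Force nanosecond resolution, view the raw 8-byte integers (nanoseconds),
# then floor-divide by 10**9 to get whole seconds.
nanoseconds = df["duration"].to_numpy().astype("timedelta64[ns]").view("i8")
seconds_int = nanoseconds // 10**9

print(seconds_float.tolist())  # [1232.0, 1390.0, 1495.0, 797.0, 1132.0]
print(seconds_int.tolist())    # [1232, 1390, 1495, 797, 1132]
```

The sketch uses `//` rather than `/` because on Python 3 a single slash performs true division and would return floats.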
Answer by Gunay Anach
I just realized this is an old thread, but I am leaving this here in case wanderers like me, who click only on the top 5 search engine results, end up here.
Make sure that your types are correct.
If you want to convert a datetime to seconds, sum the seconds contributed by the hour, minute and second of each datetime object (this works when the duration falls within a single date):
- hours - hours x 3600 = seconds
- minutes - minutes x 60 = seconds
- seconds - seconds
linear_df['duration'].dt.hour*3600 + linear_df['duration'].dt.minute*60 + linear_df['duration'].dt.second
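The hour/minute/second sum above applies to datetime columns only; a timedelta column does not expose `.dt.hour` or `.dt.minute`. A minimal sketch, assuming a datetime64 column named start_dt with hypothetical values:

```python
import pandas as pd

# Hypothetical timestamps; the column name start_dt follows this answer.
linear_df = pd.DataFrame({"start_dt": pd.to_datetime(
    ["2016-12-30 17:47:33", "2016-12-31 09:33:27"])})

# Seconds elapsed since midnight for each timestamp.
t = linear_df["start_dt"].dt
seconds_since_midnight = t.hour * 3600 + t.minute * 60 + t.second

print(seconds_since_midnight.tolist())  # [64053, 34407]
```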
If you want to convert a timedelta to seconds, use the one below.
linear_df[:5]['duration'].astype('timedelta64[s]')
I got it to work like this:
start_dt and end_dt columns are in this format:
import datetime
linear_df[:5]['start_dt']
0 1970-02-22 21:32:48.000
1 2016-12-30 17:47:33.216
2 2016-12-31 09:33:27.931
3 2016-12-31 09:52:53.486
4 2016-12-31 10:29:44.611
Name: start_dt, dtype: datetime64[ns]
My duration was in timedelta64[ns] format, obtained by subtracting the start datetime values from the end datetime values.
linear_df['duration'] = linear_df['end_dt'] - linear_df['start_dt']
The resulting duration column looks like this:
linear_df[:5]['duration']
0 0 days 00:00:14
1 2 days 17:44:50.558000
2 0 days 15:37:28.418000
3 0 days 18:45:45.727000
4 0 days 19:21:27.159000
Name: duration, dtype: timedelta64[ns]
Using pandas, I got the duration between two dates as seconds in float form, which makes it easier to compare or filter durations afterwards.
linear_df[:5]['duration'].astype('timedelta64[s]')
0 14.0
1 236690.0
2 56248.0
3 67545.0
4 69687.0
Name: duration, dtype: float64
In my case, I wanted to get all durations longer than 1 second.
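A filter for durations longer than 1 second might look like the sketch below. The data is hypothetical; only the column names start_dt, end_dt and duration come from this answer. It uses `dt.total_seconds()` because, as far as I know, the result of `astype('timedelta64[s]')` has changed between pandas versions (newer releases keep a timedelta dtype rather than returning floats).

```python
import pandas as pd

# Hypothetical start/end timestamps; column names follow this answer.
linear_df = pd.DataFrame({
    "start_dt": pd.to_datetime(["2016-12-31 09:00:00.000",
                                "2016-12-31 09:00:00.000",
                                "2016-12-31 09:00:00.000"]),
    "end_dt": pd.to_datetime(["2016-12-31 09:00:00.500",
                              "2016-12-31 09:00:14.000",
                              "2017-01-02 09:00:00.000"]),
})
linear_df["duration"] = linear_df["end_dt"] - linear_df["start_dt"]

# Float seconds for every row, then a boolean mask for the filter.
duration_seconds = linear_df["duration"].dt.total_seconds()
longer_than_one_second = linear_df[duration_seconds > 1]

print(duration_seconds.tolist())  # [0.5, 14.0, 172800.0]
print(longer_than_one_second.index.tolist())  # [1, 2]
```

Comparing the timedelta column directly, as in `linear_df["duration"] > pd.Timedelta(seconds=1)`, should give the same mask without converting to floats first.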
Hope it helps.
Answer by Pardhu
We can simply use the pandas apply() function:
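def get_seconds(time_delta):
    # .seconds is only the seconds component (0 to 86399); whole days are excluded.
    return time_delta.seconds

def get_microseconds(time_delta):
    # Microseconds component of the timedelta (the fractional part of a second).
    return time_delta.microseconds

time_delta_series = df['duration']
converted_series = time_delta_series.apply(get_seconds)
print(converted_series)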
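One caveat with this approach: the `seconds` attribute of a timedelta excludes whole days, so it differs from the total duration once a value reaches 24 hours. A small sketch with hypothetical values illustrates the difference:

```python
import pandas as pd

# Hypothetical durations; the second one spans more than a day.
durations = pd.Series(pd.to_timedelta(["0 days 00:00:14", "2 days 17:44:50"]))

# Seconds component only (days are dropped).
seconds_component = durations.dt.seconds

# Full duration expressed in seconds.
total = durations.dt.total_seconds()

print(seconds_component.tolist())  # [14, 63890]
print(total.tolist())              # [14.0, 236690.0]
```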
Answer by wwii
Use the Series dt accessor to get access to the methods and attributes of a datetime (timedelta) series.
>>> s
0 -1 days +23:45:14.304000
1 -1 days +23:46:57.132000
2 -1 days +23:49:25.913000
3 -1 days +23:59:48.913000
4 00:00:00.820000
dtype: timedelta64[ns]
>>>
>>> s.dt.total_seconds()
0 -885.696
1 -782.868
2 -634.087
3 -11.087
4 0.820
dtype: float64
There are other Pandas Series accessors for String, Categorical, and Sparse data types.
Answer by AntoineP
Use the total_seconds() function:
df['durationSeconds'] = df['duration'].dt.total_seconds()

