Original source: http://stackoverflow.com/questions/51101432/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
timedelta to string type in pandas dataframe
Asked by Chenrui Su
I have a dataframe df, and its first column is timedelta64.
df.info():
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 686 entries, 0 to 685
Data columns (total 6 columns):
0 686 non-null timedelta64[ns]
1 686 non-null object
2 686 non-null object
3 686 non-null object
4 686 non-null object
5 686 non-null object
If I print(df[0][2]), for example, it gives me 0 days 05:01:11. However, I don't want the 0 days field. I only want 05:01:11 to be printed. Could someone teach me how to do this? Thanks so much!
Answered by jezrael
It is possible with:
df['duration1'] = df['duration'].astype(str).str[-18:-10]
But this solution is not general: if the input is 3 days 05:01:11, it removes the 3 days too.
So the solution works correctly only for timedeltas shorter than one day.
A more general solution is to create a custom format:
import numpy as np
import pandas as pd

N = 10
np.random.seed(11230)
rng = pd.date_range('2017-04-03 15:30:00', periods=N, freq='13.5H')
df = pd.DataFrame({'duration': np.abs(np.random.choice(rng, size=N) -
                                      np.random.choice(rng, size=N))})
# Slice approach: drops the days part, so it is only correct below one day.
df['duration1'] = df['duration'].astype(str).str[-18:-10]

def f(x):
    # Split the total seconds into hours (which may exceed 24), minutes and seconds.
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return ('{}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds))

df['duration2'] = df['duration'].apply(f)
print (df)
         duration duration1  duration2
0 2 days 06:00:00  06:00:00   54:00:00
1 2 days 19:30:00  19:30:00   67:30:00
2 1 days 03:00:00  03:00:00   27:00:00
3 0 days 00:00:00  00:00:00    0:00:00
4 4 days 12:00:00  12:00:00  108:00:00
5 1 days 03:00:00  03:00:00   27:00:00
6 0 days 13:30:00  13:30:00   13:30:00
7 1 days 16:30:00  16:30:00   40:30:00
8 0 days 00:00:00  00:00:00    0:00:00
9 1 days 16:30:00  16:30:00   40:30:00
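As a quick check of the divmod arithmetic, here is a minimal sketch that applies the same idea to the value from the question and to a multi-day value. The name format_hms and the sample series are made up for illustration, and the sketch assumes non-negative timedeltas; it also returns an empty string for missing values (NaT), which the function above does not handle.

```python
import pandas as pd

def format_hms(td):
    """Format a Timedelta as H:MM:SS, letting hours run past 24; NaT becomes ''."""
    if pd.isnull(td):
        return ''
    total = int(td.total_seconds())           # whole seconds, fractions dropped
    hours, remainder = divmod(total, 3600)    # hours may exceed 24
    minutes, seconds = divmod(remainder, 60)
    return f'{hours}:{minutes:02d}:{seconds:02d}'

# Hypothetical sample data: the question's value, a multi-day value, a missing value.
durations = pd.Series(pd.to_timedelta(['0 days 05:01:11', '3 days 05:01:11', None]))
formatted = durations.apply(format_hms)
print(formatted.tolist())
```

For 3 days 05:01:11 the hours come out as 3 * 24 + 5 = 77, so nothing is silently dropped.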
Answered by jayreed1
datetime.timedelta already formats the way you'd like. The crux of this issue is that Pandas internally converts to numpy.timedelta64.
import pandas as pd
from datetime import timedelta
time_1 = timedelta(days=3, seconds=3400)
time_2 = timedelta(days=0, seconds=3400)
print(time_1)
print(time_2)
times = pd.Series([time_1, time_2])
# Times are converted to Numpy timedeltas.
print(times)
# Convert to string after converting to datetime.timedelta.
times = times.apply(
    lambda numpy_td: str(timedelta(seconds=numpy_td.total_seconds())))
print(times)
So, convert to a datetime.timedelta and then str (to prevent conversion back to numpy.timedelta64) before printing.
3 days, 0:56:40
0:56:40
0   3 days 00:56:40
1   0 days 00:56:40
dtype: timedelta64[ns]
0    3 days, 0:56:40
1            0:56:40
dtype: object
I came here looking for answers to the same question, so I felt I should add further clarification. : )
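A possible shortcut for the same idea, as a sketch rather than part of the original answer: pandas Timedelta objects have a to_pytimedelta() method, so the round trip through total_seconds() can be skipped. The variable name as_text is made up, and the sketch assumes the column has no missing values (NaT).

```python
import pandas as pd
from datetime import timedelta

times = pd.Series([timedelta(days=3, seconds=3400),
                   timedelta(days=0, seconds=3400)])

# Each element arrives as a pandas Timedelta; convert it to a standard-library
# datetime.timedelta and let str() apply the standard-library formatting.
as_text = times.apply(lambda td: str(td.to_pytimedelta()))
print(as_text.tolist())
```

Note that datetime.timedelta only has microsecond resolution, so any nanosecond part is lost in the conversion.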
Answered by Optimesh
Given OP is ok with an object column (a little verbose):
def splitter(td):
    # Keep only the last space-separated piece, i.e. the hh:mm:ss part.
    td = str(td).split(' ')[-1:][0]
    return td

df['split'] = df['timediff'].apply(splitter)
Basically we're taking the timedelta column, transforming the contents to a string, then splitting the string (creates a list) and taking the last item of that list, which would be the hh:mm:ss component.
Note that specifying ' ' for what to split by is redundant here.
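To illustrate the splitting step on a single value (a small sketch with a made-up sample value): str() of a pandas Timedelta looks like "3 days 05:01:11", so splitting on whitespace and taking the last piece leaves the clock part. It also shows that the days are dropped, the same limitation as the slicing approach.

```python
import pandas as pd

td = pd.Timedelta('3 days 05:01:11')   # hypothetical sample value
text = str(td)                          # '3 days 05:01:11'

# split() with no argument splits on any whitespace, so the explicit ' ' is optional here.
pieces_default = text.split()
pieces_explicit = text.split(' ')

last = pieces_default[-1]               # the hh:mm:ss component; the days are discarded
print(pieces_default, last)
```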
Alternative one-liner:
df['split2'] = df['timediff'].astype('str').str.split().str[-1]
which is very similar, but not very pretty IMHO. Also, the output includes milliseconds, which is not the case in the first solution. I'm not sure what the reason for that is (please comment if you do). If your data is big it might be worthwhile to time these different approaches.
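If you want to time the two approaches, a rough sketch with the standard timeit module could look like this. The test frame, the function names and the repeat count are all arbitrary choices for illustration; results will vary with data size and pandas version.

```python
import timeit
import pandas as pd

# Hypothetical test data: one timedelta every 7 seconds across a single day.
df = pd.DataFrame({'timediff': pd.to_timedelta(list(range(0, 86400, 7)), unit='s')})

def with_apply():
    # Row-by-row: stringify each Timedelta and keep the last whitespace-separated piece.
    return df['timediff'].apply(lambda td: str(td).split()[-1])

def with_str_accessor():
    # Column-wise: stringify the whole column, then split through the .str accessor.
    return df['timediff'].astype(str).str.split().str[-1]

t_apply = timeit.timeit(with_apply, number=5)
t_accessor = timeit.timeit(with_str_accessor, number=5)
print(f'apply: {t_apply:.3f}s  str accessor: {t_accessor:.3f}s')
```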
Answered by Simon G.
Here's a short and robust version using apply():
df['timediff_string'] = df['timediff'].apply(
    lambda x: f'{x.components.hours:02d}:{x.components.minutes:02d}:{x.components.seconds:02d}'
    if not pd.isnull(x) else ''
)
This leverages the components attribute of pandas Timedelta objects and also handles empty values (NaT).
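One thing worth knowing about components, shown as a small sketch with a made-up sample value: hours is the hour-of-day part only, and whole days are reported separately in components.days. For timedeltas of one day or longer, the format above therefore drops the days unless you fold them back in.

```python
import pandas as pd

td = pd.Timedelta('3 days 05:01:11')   # hypothetical sample value
c = td.components                       # days, hours, minutes, seconds, ...

# Clock part only: the 3 days are not reflected in c.hours.
clock_only = f'{c.hours:02d}:{c.minutes:02d}:{c.seconds:02d}'

# One way to keep the days: fold them into the hour count.
total_hours = c.days * 24 + c.hours
with_days = f'{total_hours}:{c.minutes:02d}:{c.seconds:02d}'

print(clock_only, with_days)
```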
If the timediff column does not contain pandas Timedelta objects, you can convert it:
df['timediff'] = pd.to_timedelta(df['timediff'])
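For example, assuming the column holds strings (the frame name raw and its values are made up for this sketch), the conversion could look like this. Missing entries become NaT, which the apply() version above turns into an empty string.

```python
import pandas as pd

# Hypothetical input: durations stored as text, with one missing entry.
raw = pd.DataFrame({'timediff': ['05:01:11', '1 days 03:00:00', None]})

raw['timediff'] = pd.to_timedelta(raw['timediff'])

first = raw['timediff'].iloc[0]     # now a pandas Timedelta
missing = raw['timediff'].iloc[2]   # NaT for the missing entry
print(raw['timediff'].dtype, first, missing)
```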