timedelta to string type in pandas dataframe

Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/51101432/

Date: 2020-09-14 05:45:29  Source: igfitidea

python pandas

Question by Chenrui Su

I have a dataframe df, and its first column is timedelta64:

df.info():

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 686 entries, 0 to 685
Data columns (total 6 columns):
0    686 non-null timedelta64[ns]
1    686 non-null object
2    686 non-null object
3    686 non-null object
4    686 non-null object
5    686 non-null object

If I print(df[0][2]), for example, it gives me 0 days 05:01:11. However, I don't want the 0 days field; I only want 05:01:11 to be printed. Could someone teach me how to do this? Thanks so much!

Answer by jezrael

It is possible with:

df['duration1'] = df['duration'].astype(str).str[-18:-10]

But this solution is not general: if the input is 3 days 05:01:11, it removes the 3 days too.

So the solution works correctly only for timedeltas shorter than one day.

A more general solution is to create a custom format:

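import numpy as np
import pandas as pd

# Sample data: 10 random, non-negative timedeltas.
N = 10
np.random.seed(11230)
rng = pd.date_range('2017-04-03 15:30:00', periods=N, freq='13.5H')
df = pd.DataFrame({'duration': np.abs(np.random.choice(rng, size=N) -
                                      np.random.choice(rng, size=N))})

# Slice hh:mm:ss out of the string form. This assumes each string ends in
# 'hh:mm:ss.000000000' (nanosecond digits), so any day count is dropped.
df['duration1'] = df['duration'].astype(str).str[-18:-10]

def f(x):
    # Total hours:minutes:seconds, with whole days folded into the hours.
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '{}:{:02d}:{:02d}'.format(int(hours), int(minutes), int(seconds))

df['duration2'] = df['duration'].apply(f)
print(df)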

         duration duration1  duration2
0 2 days 06:00:00  06:00:00   54:00:00
1 2 days 19:30:00  19:30:00   67:30:00
2 1 days 03:00:00  03:00:00   27:00:00
3 0 days 00:00:00  00:00:00    0:00:00
4 4 days 12:00:00  12:00:00  108:00:00
5 1 days 03:00:00  03:00:00   27:00:00
6 0 days 13:30:00  13:30:00   13:30:00
7 1 days 16:30:00  16:30:00   40:30:00
8 0 days 00:00:00  00:00:00    0:00:00
9 1 days 16:30:00  16:30:00   40:30:00
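
The same total-hours format can also be built on the whole column without apply(). This is a sketch under stated assumptions, not part of jezrael's answer: the helper name format_hms is hypothetical, and it assumes the column contains no NaT values and no negative durations. It replaces the per-element divmod in f() with Series.dt.total_seconds() and integer arithmetic on whole Series.

```python
import pandas as pd


def format_hms(durations: pd.Series) -> pd.Series:
    """Format a timedelta64 Series as H:MM:SS, folding days into the hours."""
    # Whole seconds as integers (fractional seconds are truncated).
    total = durations.dt.total_seconds().astype("int64")
    hours = total // 3600
    minutes = (total % 3600) // 60
    seconds = total % 60
    # Zero-pad minutes and seconds; hours stay unpadded, like f() above.
    return (
        hours.astype(str)
        + ":"
        + minutes.astype(str).str.zfill(2)
        + ":"
        + seconds.astype(str).str.zfill(2)
    )


s = pd.Series(pd.to_timedelta(["2 days 06:00:00", "0 days 13:30:00", "0 days 00:00:00"]))
print(format_hms(s).tolist())
# ['54:00:00', '13:30:00', '0:00:00']
```

Because the arithmetic runs on whole Series rather than one Timedelta at a time, it may be faster on large frames, although that has not been timed here.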

Answer by jayreed1

datetime.timedelta already formats the way you'd like. The crux of this issue is that pandas internally converts the values to NumPy timedelta64.

import pandas as pd
from datetime import timedelta

time_1 = timedelta(days=3, seconds=3400)
time_2 = timedelta(days=0, seconds=3400)
print(time_1)
print(time_2)

times = pd.Series([time_1, time_2])

# Times are converted to Numpy timedeltas.
print(times)

# Convert to string after converting to datetime.timedelta.
times = times.apply(
    lambda numpy_td: str(timedelta(seconds=numpy_td.total_seconds())))

print(times)

So, convert each value to a datetime.timedelta and then to str (to prevent conversion back to a NumPy timedelta64) before printing.

3 days, 0:56:40
0:56:40

0   3 days 00:56:40
1   0 days 00:56:40
dtype: timedelta64[ns]

0    3 days, 0:56:40
1            0:56:40
dtype: object

I came here looking for answers to the same question, so I felt I should add further clarification. :)
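
A pandas Timedelta can also be turned back into a datetime.timedelta directly with Timedelta.to_pytimedelta(), skipping the total_seconds() round trip used above. This is a minimal sketch reusing the same two sample values; note that datetime.timedelta only has microsecond resolution, so any sub-microsecond part cannot be carried over.

```python
from datetime import timedelta

import pandas as pd

times = pd.Series([timedelta(days=3, seconds=3400), timedelta(seconds=3400)])

# Elements of a timedelta64 Series come back as pandas Timedelta objects;
# to_pytimedelta() converts each one to datetime.timedelta, whose str()
# gives the "3 days, 0:56:40" / "0:56:40" style.
as_text = times.apply(lambda td: str(td.to_pytimedelta()))
print(as_text.tolist())
# ['3 days, 0:56:40', '0:56:40']
```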

Answer by Optimesh

Given that the OP is OK with an object column (a little verbose):

def splitter(td):
    # '0 days 05:01:11' -> '05:01:11': keep the last space-separated piece.
    td = str(td).split(' ')[-1:][0]
    return td


df['split'] = df['timediff'].apply(splitter)

Basically, we take the timedelta column, convert its contents to strings, split each string (which creates a list), and take the last item of that list, which is the hh:mm:ss component.

Note that specifying ' ' as the separator to split on is redundant here.
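
To see what splitter() keeps, it helps to look at the string form of a single pandas Timedelta. This is a minimal sketch with a hypothetical helper last_token() that mirrors splitter(): the last space-separated token is the clock part, so a duration of one day or more loses its day count, the same limitation noted for the slicing approach in the first answer.

```python
import pandas as pd


def last_token(td: pd.Timedelta) -> str:
    # str(Timedelta) looks like "3 days 05:01:11"; keep the text after the last space.
    return str(td).split()[-1]


for value in ["0 days 05:01:11", "3 days 05:01:11"]:
    td = pd.Timedelta(value)
    print(repr(str(td)), "->", last_token(td))
# '0 days 05:01:11' -> 05:01:11
# '3 days 05:01:11' -> 05:01:11
```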

Alternative one-liner:

df['split2'] = df['timediff'].astype('str').str.split().str[-1]

which is very similar, but not very pretty IMHO. Also, the output includes fractional seconds, which is not the case in the first solution. I'm not sure what the reason for that is (please comment if you know). If your data is big, it might be worthwhile to time these different approaches.

Answer by Simon G.

Here's a short and robust version using apply():

df['timediff_string'] = df['timediff'].apply(
    lambda x: f'{x.components.hours:02d}:{x.components.minutes:02d}:{x.components.seconds:02d}'
              if not pd.isnull(x) else ''
)

This leverages the components attribute of pandas Timedelta objects and also handles empty values (NaT).

If the timediff column does not contain pandas Timedelta objects, you can convert it:

df['timediff'] = pd.to_timedelta(df['timediff'])
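
The components idea also works on a whole column through the Series.dt.components accessor, which returns a DataFrame with days, hours, minutes and seconds columns (plus sub-second ones). This is a sketch under stated assumptions, not Simon G.'s code: the helper hhmmss() and its include_days flag are hypothetical. With include_days=True, whole days are folded into the hour count so longer durations are not truncated; NaT rows become empty strings, as in the apply() version above.

```python
import pandas as pd


def hhmmss(durations: pd.Series, include_days: bool = False) -> pd.Series:
    # One row per element; NaT rows come back as NaN, so fill them with 0
    # temporarily to allow an integer cast (they are blanked out below).
    comp = durations.dt.components.fillna(0).astype("int64")
    hours = comp["hours"]
    if include_days:
        # Fold whole days into the hour count, e.g. 2 days 06:00 -> 54 hours.
        hours = hours + comp["days"] * 24
    text = (
        hours.map("{:02d}".format)
        + ":"
        + comp["minutes"].map("{:02d}".format)
        + ":"
        + comp["seconds"].map("{:02d}".format)
    )
    # Replace the placeholder text for NaT rows with an empty string.
    return text.where(durations.notna(), "")


s = pd.Series(pd.to_timedelta(["0 days 05:01:11", "2 days 06:00:00", pd.NaT]))
print(hhmmss(s).tolist())
# ['05:01:11', '06:00:00', '']
print(hhmmss(s, include_days=True).tolist())
# ['05:01:11', '54:00:00', '']
```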