pandas: timedelta to string type in a pandas DataFrame

Question

Asked by Chenrui Su

I have a dataframe df, and its first column is timedelta64.

df.info():

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 686 entries, 0 to 685
Data columns (total 6 columns):
0    686 non-null timedelta64[ns]
1    686 non-null object
2    686 non-null object
3    686 non-null object
4    686 non-null object
5    686 non-null object

If I print(df[0][2]), for example, it gives me 0 days 05:01:11. However, I don't want the 0 days field. I only want 05:01:11 to be printed. Could someone teach me how to do this? Thanks so much!

Answer 1

Answered by jezrael

It is possible with:

df['duration1'] = df['duration'].astype(str).str[-18:-10]

But this solution is not general: if the input is 3 days 05:01:11, it removes the 3 days too.

So this solution works correctly only for timedeltas shorter than one day.
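If you do want to drop the days field, one option is to check first that doing so is lossless. Below is a minimal sketch, assuming a timedelta64[ns] Series with no missing values; hms_within_one_day is a hypothetical helper name, not a pandas API. It reads hours, minutes and seconds from Series.dt.components instead of slicing the string, so it does not depend on the string layout astype(str) produces. Sub-second parts are ignored.

```python
import pandas as pd


def hms_within_one_day(s: pd.Series) -> pd.Series:
    """Hypothetical helper: 'HH:MM:SS' without the days field, only when lossless.

    Raises ValueError if any duration is negative or at least one day long,
    because then dropping the days field would change the value shown.
    """
    if s.isna().any():
        raise ValueError("missing values (NaT) are not handled in this sketch")
    if ((s < pd.Timedelta(0)) | (s >= pd.Timedelta(days=1))).any():
        raise ValueError("durations must be in [0, 1 day) to drop the days field")
    # DataFrame with days, hours, minutes, seconds, milliseconds, ... columns
    comps = s.dt.components
    return (
        comps["hours"].astype(str).str.zfill(2)
        + ":" + comps["minutes"].astype(str).str.zfill(2)
        + ":" + comps["seconds"].astype(str).str.zfill(2)
    )


s = pd.Series(pd.to_timedelta(["05:01:11", "00:00:09"]))
print(hms_within_one_day(s))
```

The explicit check trades convenience for safety: a 3 days 05:01:11 value raises an error instead of silently printing as 05:01:11.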

A more general solution is to create a custom format:

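import numpy as np
import pandas as pd

# Sample data: N random absolute differences between timestamps spaced 13.5 hours apart
N = 10
np.random.seed(11230)
rng = pd.date_range('2017-04-03 15:30:00', periods=N, freq='13.5H')
df = pd.DataFrame({'duration': np.abs(np.random.choice(rng, size=N) -
                                      np.random.choice(rng, size=N))})

# Slicing approach: only correct for durations shorter than one day
df['duration1'] = df['duration'].astype(str).str[-18:-10]

def f(x):
    # Work from total seconds so that whole days are folded into the hours field
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '{}:{:02d}:{:02d}'.format(int(hours), int(minutes), int(seconds))

df['duration2'] = df['duration'].apply(f)
print(df)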

         duration duration1  duration2
0 2 days 06:00:00  06:00:00   54:00:00
1 2 days 19:30:00  19:30:00   67:30:00
2 1 days 03:00:00  03:00:00   27:00:00
3 0 days 00:00:00  00:00:00    0:00:00
4 4 days 12:00:00  12:00:00  108:00:00
5 1 days 03:00:00  03:00:00   27:00:00
6 0 days 13:30:00  13:30:00   13:30:00
7 1 days 16:30:00  16:30:00   40:30:00
8 0 days 00:00:00  00:00:00    0:00:00
9 1 days 16:30:00  16:30:00   40:30:00
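apply(f) calls a Python function once per row. Below is a vectorised sketch of the same H:MM:SS layout (hours may exceed 24), built on Series.dt.total_seconds() and integer arithmetic. format_total_hours is a hypothetical name. It assumes non-negative durations, as in the np.abs example above. f as written is not designed for NaT; this sketch maps NaT to an empty string.

```python
import pandas as pd


def format_total_hours(s: pd.Series) -> pd.Series:
    """Vectorised 'H:MM:SS' formatting where hours may exceed 24 (like f above).

    Assumes non-negative durations; NaT becomes an empty string.
    """
    missing = s.isna()
    # Whole seconds as integers; astype truncates fractional seconds, like int() in f
    secs = s.fillna(pd.Timedelta(0)).dt.total_seconds().astype("int64")
    hours = secs // 3600
    minutes = (secs % 3600) // 60
    seconds = secs % 60
    out = (
        hours.astype(str)
        + ":" + minutes.astype(str).str.zfill(2)
        + ":" + seconds.astype(str).str.zfill(2)
    )
    # Put back an empty string wherever the input was NaT
    return out.mask(missing, "")


s = pd.Series(pd.to_timedelta(["2 days 06:00:00", "0 days 13:30:00"]))
print(format_total_hours(s))
```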

Answer 2

Answered by jayreed1

datetime.timedelta already formats the way you'd like. The crux of this issue is that pandas internally converts the values to numpy timedelta64 (dtype timedelta64[ns]).

import pandas as pd
from datetime import timedelta

time_1 = timedelta(days=3, seconds=3400)
time_2 = timedelta(days=0, seconds=3400)
print(time_1)
print(time_2)

times = pd.Series([time_1, time_2])

# Times are converted to Numpy timedeltas.
print(times)

# Convert to string after converting to datetime.timedelta.
times = times.apply(
    lambda numpy_td: str(timedelta(seconds=numpy_td.total_seconds())))

print(times)

So, convert to a datetime.timedelta and then to str (to prevent conversion back to numpy timedelta64) before printing.

3 days, 0:56:40
0:56:40

0   3 days 00:56:40
1   0 days 00:56:40
dtype: timedelta64[ns]

0    3 days, 0:56:40
1            0:56:40
dtype: object

I came here looking for answers to the same question, so I felt I should add further clarification. : )
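A variant of the same idea: the elements handed to apply are pandas Timedelta objects, and Timedelta.to_pytimedelta() converts each one straight to a datetime.timedelta. This sketch should avoid the float round-trip through total_seconds(). Nanoseconds are dropped either way, because datetime.timedelta only has microsecond resolution.

```python
from datetime import timedelta

import pandas as pd

times = pd.Series([timedelta(days=3, seconds=3400), timedelta(days=0, seconds=3400)])

# Elements of a timedelta64[ns] Series come back as pandas Timedelta objects,
# whose str() always includes the "N days" part.
print(type(times.iloc[0]).__name__)  # Timedelta

# Timedelta.to_pytimedelta() returns a plain datetime.timedelta, whose str()
# omits the days part when it is zero.
as_text = times.apply(lambda td: str(td.to_pytimedelta()))
print(as_text)
```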

Answer 3

Answered by Optimesh

Given that the OP is OK with an object column (a little verbose):

def splitter(td):
    # str(td) looks like '0 days 05:01:11'; keep the last space-separated token
    td = str(td).split(' ')[-1]
    return td


df['split'] = df['timediff'].apply(splitter)

Basically we're taking the timedelta column, transforming the contents to a string, then splitting the string (creates a list) and taking the last item of that list, which would be the hh:mm:ss component.

Note that specifying ' ' as the separator is redundant here.
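A small usage example of splitter on hypothetical data. It shows one consequence of keeping only the last token: like the slicing approach in Answer 1, it drops whole days, so 1 days 05:01:11 and 05:01:11 produce the same string.

```python
import pandas as pd


def splitter(td):
    # str(td) looks like '1 days 05:01:11'; keep the last space-separated token
    return str(td).split(' ')[-1]


# Hypothetical data: one duration longer than a day, one shorter
df = pd.DataFrame({'timediff': pd.to_timedelta(['1 days 05:01:11', '05:01:11'])})
df['split'] = df['timediff'].apply(splitter)
print(df)
```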

Alternative one-liner:

df['split2'] = df['timediff'].astype('str').str.split().str[-1]

which is very similar, but not very pretty, IMHO. Also, the output includes fractional seconds, which is not the case with the first solution. I'm not sure what the reason for that is (please comment if you know). If your data is big, it might be worthwhile to time these different approaches.
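One way to do that timing is a rough sketch using the standard-library timeit module. The data size and the random durations are arbitrary assumptions, and the timings will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 20,000 random durations under one day, in whole seconds
rng = np.random.default_rng(0)
s = pd.Series(pd.to_timedelta(rng.integers(0, 86_400, size=20_000), unit="s"))


def via_apply(s):
    # Python-level call per element, as in splitter
    return s.apply(lambda td: str(td).split(' ')[-1])


def via_str_accessor(s):
    # The one-liner: astype('str') followed by the .str accessor
    return s.astype('str').str.split().str[-1]


for fn in (via_apply, via_str_accessor):
    per_run = timeit.timeit(lambda: fn(s), number=3) / 3
    print(f"{fn.__name__}: {per_run:.3f} s per run")
```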

Answer 4

Answered by Simon G.

Here's a short and robust version using apply():

df['timediff_string'] = df['timediff'].apply(
    lambda x: f'{x.components.hours:02d}:{x.components.minutes:02d}:{x.components.seconds:02d}'
              if not pd.isnull(x) else ''
)

This leverages the components attribute of pandas Timedelta objects and also handles empty values (NaT).

If the timediff column does not contain pandas Timedelta objects, you can convert it:

df['timediff'] = pd.to_timedelta(df['timediff'])
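The example below puts the two steps together on hypothetical data stored as strings; the column values are made up for illustration. pd.to_timedelta turns the missing entry into NaT, which the lambda then maps to an empty string. Note that components.days is not used, so whole days are dropped.

```python
import pandas as pd

# Hypothetical input: durations stored as strings, with one missing value
df = pd.DataFrame({'timediff': ['05:01:11', '1 days 02:03:04', None]})

# Convert to timedelta64[ns]; the missing value becomes NaT
df['timediff'] = pd.to_timedelta(df['timediff'])

# Format from Timedelta.components; the days component is not used
df['timediff_string'] = df['timediff'].apply(
    lambda x: f'{x.components.hours:02d}:{x.components.minutes:02d}:{x.components.seconds:02d}'
              if not pd.isnull(x) else ''
)
print(df)
```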
