I have a df with the usual timestamps as its index:

    2011-04-01 09:30:00
    2011-04-01 09:30:10
    ...
    2011-04-01 09:36:20
    ...
    2011-04-01 09:37:30

How can I create a column holding the same timestamps, but rounded up to the next 5-minute interval? Like this:

    index                new_col
    2011-04-01 09:30:00  2011-04-01 09:35:00
    2011-04-01 09:30:10  2011-04-01 09:35:00
    2011-04-01 09:36:20  2011-04-01 09:40:00
    2011-04-01 09:37:30  2011-04-01 09:40:00
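The table implies a specific rule: every timestamp, including one that already sits exactly on a boundary (09:30:00), moves up to the following 5-minute mark. A minimal per-timestamp sketch of that rule using only the standard datetime module might look like this; next_5min_boundary is a hypothetical name, not something from the original post:

```python
from datetime import datetime, timedelta

FIVE_MIN = timedelta(minutes=5)

def next_5min_boundary(t: datetime) -> datetime:
    """Hypothetical helper: the first 5-minute boundary strictly after t."""
    # Floor t to its 5-minute boundary: zero the seconds and trim the minute...
    floored = t.replace(minute=t.minute - t.minute % 5, second=0, microsecond=0)
    # ...then step forward one interval, so an exact boundary still moves up.
    return floored + FIVE_MIN

for ts in ["2011-04-01 09:30:00", "2011-04-01 09:30:10",
           "2011-04-01 09:36:20", "2011-04-01 09:37:30"]:
    print(ts, next_5min_boundary(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")))
```

Adding a timedelta rather than setting minute + 5 directly lets the carry into the next hour (09:57 -> 10:00) happen automatically.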

Solution
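The round_to_5min(t) solution using timedelta arithmetic is correct, but complicated and very slow. Instead, make use of the fact that a pandas Timestamp is stored as an int64 count of nanoseconds since the epoch: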
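
    import numpy as np
    import pandas as pd

    ns5min = 5 * 60 * 1000000000  # 5 minutes in nanoseconds
    # Floor-divide by 5 minutes, step to the next boundary, convert back (assumes ns resolution)
    df['new_col'] = pd.to_datetime((df.index.astype(np.int64) // ns5min + 1) * ns5min)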
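To see the one-liner reproduce the question's table, here is a small run on those four timestamps. The "value" column is placeholder data, and the index is built with explicit nanosecond resolution because the integer arithmetic assumes int64 nanoseconds. As a cross-check it also uses pandas' built-in DatetimeIndex.floor (floor to the 5-minute mark, then add 5 minutes), which should give the same result:

```python
import numpy as np
import pandas as pd

# The question's timestamps, forced to nanosecond resolution.
idx = pd.DatetimeIndex(np.array(
    ["2011-04-01T09:30:00", "2011-04-01T09:30:10",
     "2011-04-01T09:36:20", "2011-04-01T09:37:30"], dtype="datetime64[ns]"))
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)  # placeholder values

ns5min = 5 * 60 * 1_000_000_000  # 5 minutes in nanoseconds
df["new_col"] = pd.to_datetime((df.index.astype(np.int64) // ns5min + 1) * ns5min)

# Same result with built-in rounding: floor to the boundary, then add one step.
alt = df.index.floor("5min") + pd.Timedelta(minutes=5)
assert (df["new_col"].values == alt.values).all()

# Note: ceil("5min") would leave 09:30:00 unchanged instead of moving it to 09:35:00.
print(df["new_col"].dt.strftime("%H:%M:%S").tolist())
# ['09:35:00', '09:35:00', '09:40:00', '09:40:00']
```

The "+ 1" in the integer version is what pushes exact boundaries forward; a plain ceil does not, which is why floor plus one interval is the matching built-in form here.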

Let's compare the speed:

    rng = pd.date_range('1/1/2014', '1/2/2014', freq='s')
    print(len(rng))  # 86401

    # IPython %timeit
    %timeit pd.to_datetime((rng.astype(np.int64) // ns5min + 1) * ns5min)
    # 1000 loops, best of 3: 1.01 ms per loop

    %timeit rng.map(round_to_5min)
    # 1 loops, best of 3: 1.03 s per loop

That's about 1000 times faster!
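The timing above uses IPython's %timeit and a round_to_5min function from another answer that is not shown here. To rerun the comparison as a plain script, one option is the sketch below. Its round_to_5min body is a hypothetical stand-in (floor to the 5-minute mark, add 5 minutes), not the code that was originally timed, and absolute timings will depend on your machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

NS_5MIN = 5 * 60 * 1_000_000_000  # 5 minutes in nanoseconds
FIVE_MIN = pd.Timedelta(minutes=5)

def round_to_5min(t):
    """Hypothetical per-element stand-in, not the original answer's code."""
    floored = t.replace(minute=t.minute - t.minute % 5, second=0, microsecond=0)
    return floored + FIVE_MIN

def vectorized(idx):
    """Integer-nanosecond version from the answer."""
    return pd.to_datetime((idx.astype(np.int64) // NS_5MIN + 1) * NS_5MIN)

# One day at 1-second frequency, forced to nanosecond resolution.
rng = pd.date_range("2014-01-01", "2014-01-02", freq="s").astype("datetime64[ns]")
print(len(rng))  # 86401

# Both approaches should agree before timing them.
assert (vectorized(rng) == rng.map(round_to_5min)).all()

for name, fn in [("vectorized", lambda: vectorized(rng)),
                 ("map(round_to_5min)", lambda: rng.map(round_to_5min))]:
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{name}: {best:.4f} s")
```

Checking that both versions agree before timing them guards against benchmarking a fast but wrong implementation.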