C# LINQ: aggregate and group by periods of time
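Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow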
原文地址: http://stackoverflow.com/questions/8856266/
LINQ aggregate and group by periods of time
Question by Jason Sturges
I'm trying to understand how LINQ can be used to group data by intervals of time and then, ideally, aggregate each group.
I've found numerous examples with explicit date ranges, but I'm trying to group by periods such as 5 minutes, 1 hour or 1 day.
For example, I have a class that wraps a DateTime with a value:
public class Sample
{
public DateTime timestamp;
public double value;
}
These observations are contained as a series in a List collection:
List<Sample> series;
So, to group by hourly periods and aggregate the values by average, I'm trying something like this:
var grouped = from s in series
group s by new TimeSpan(1, 0, 0) into g
select new { timestamp = g.Key, value = g.Average(s => s.value) };
This is fundamentally flawed, as it groups by the TimeSpan itself: every element gets the same key, so everything ends up in a single group. I can't understand how to use the TimeSpan (or any data type representing an interval) in the query.
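The grouping key has to be computed from each element's own timestamp, with the interval only deciding which bucket that timestamp falls into. A minimal sketch, assuming buckets are counted from DateTime.MinValue (tick 0), which for any interval that divides 24 hours lines up with midnight; `TimeBuckets`, `FloorTo` and `AverageBy` are hypothetical names, not framework APIs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Sample
{
    public DateTime timestamp;
    public double value;
}

public static class TimeBuckets
{
    // Rounds dt down to the start of the interval that contains it.
    // Buckets are counted from tick 0, so intervals that divide 24 hours
    // evenly also line up with midnight. The DateTimeKind is preserved.
    public static DateTime FloorTo(DateTime dt, TimeSpan interval)
    {
        if (interval <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(interval));
        return new DateTime(dt.Ticks - dt.Ticks % interval.Ticks, dt.Kind);
    }

    // Groups the series by interval start and averages each group's values.
    public static List<(DateTime Start, double Average)> AverageBy(
        IEnumerable<Sample> series, TimeSpan interval)
    {
        return series
            .GroupBy(s => FloorTo(s.timestamp, interval))
            .Select(g => (Start: g.Key, Average: g.Average(s => s.value)))
            .OrderBy(r => r.Start)
            .ToList();
    }
}
```

With this helper, hourly averages are `TimeBuckets.AverageBy(series, TimeSpan.FromHours(1))`, and 5-minute or 1-day buckets only need a different TimeSpan.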
Accepted answer by BrokenGlass
You could round the timestamp down to the previous boundary (i.e. the closest 5-minute boundary in the past) and use that as your grouping key:
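var groups = series.GroupBy(x =>
{
    var stamp = x.timestamp;
    // Move back to the previous 5-minute boundary
    stamp = stamp.AddMinutes(-(stamp.Minute % 5));
    // Drop seconds, milliseconds and sub-millisecond ticks, so that
    // timestamps in the same interval produce identical keys
    stamp = stamp.AddTicks(-(stamp.Ticks % TimeSpan.TicksPerMinute));
    return stamp;
})
.Select(g => new { TimeStamp = g.Key, Value = g.Average(s => s.value) })
.ToList();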
The code above achieves this by grouping on a modified timestamp: the minutes are moved back to the previous 5-minute boundary, and the seconds and everything smaller are removed. The same approach can of course be used for other periods, e.g. hours and days.
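As the answer notes, the same component-stripping works for hours and days. A sketch with hypothetical helper names, assuming `n` divides 60 (for minutes) or 24 (for hours) so that every slot has the same length:

```csharp
using System;

public static class Truncate
{
    // Start of the n-minute slot within the current hour (n should divide 60).
    public static DateTime ToMinutes(DateTime dt, int n) =>
        new DateTime(dt.Year, dt.Month, dt.Day, dt.Hour,
                     dt.Minute - dt.Minute % n, 0, dt.Kind);

    // Start of the n-hour slot within the current day (n should divide 24).
    public static DateTime ToHours(DateTime dt, int n) =>
        dt.Date.AddHours(dt.Hour - dt.Hour % n);

    // Midnight of the same day; DateTime.Date already drops the time of day.
    public static DateTime ToDay(DateTime dt) => dt.Date;
}
```

Any of these can be used directly as the key, e.g. `series.GroupBy(s => Truncate.ToHours(s.timestamp, 1))`.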
Edit:
Based on this made-up sample input:
var series = new List<Sample>();
series.Add(new Sample() { timestamp = DateTime.Now.AddMinutes(3) });
series.Add(new Sample() { timestamp = DateTime.Now.AddMinutes(4) });
series.Add(new Sample() { timestamp = DateTime.Now.AddMinutes(5) });
series.Add(new Sample() { timestamp = DateTime.Now.AddMinutes(6) });
series.Add(new Sample() { timestamp = DateTime.Now.AddMinutes(7) });
series.Add(new Sample() { timestamp = DateTime.Now.AddMinutes(15) });
Three groups were produced for me: one with the grouping timestamp 3:05, one with 3:10 and one with 3:20 pm (your results will vary depending on the current time).
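Because the sample input uses DateTime.Now, its output depends on when it runs. A deterministic sketch, assuming a hypothetical fixed start time of 15:06:30 in place of DateTime.Now (chosen so the offsets fall into three buckets at 3:05, 3:10 and 3:20 pm); it uses the same 5-minute key as the answer, with everything below the minute removed via AddTicks. `FiveMinuteDemo` is an illustrative name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Sample
{
    public DateTime timestamp;
    public double value;
}

public static class FiveMinuteDemo
{
    // Back to the previous 5-minute boundary, then drop seconds and smaller.
    public static DateTime Key(DateTime stamp)
    {
        stamp = stamp.AddMinutes(-(stamp.Minute % 5));
        return stamp.AddTicks(-(stamp.Ticks % TimeSpan.TicksPerMinute));
    }

    // Builds the answer's sample offsets from a fixed start and counts per group.
    public static List<(DateTime TimeStamp, int Count)> Run(DateTime start)
    {
        var series = new List<Sample>();
        foreach (var offset in new[] { 3, 4, 5, 6, 7, 15 })
            series.Add(new Sample { timestamp = start.AddMinutes(offset) });

        return series
            .GroupBy(x => Key(x.timestamp))
            .Select(g => (TimeStamp: g.Key, Count: g.Count()))
            .ToList();
    }
}
```

Starting at 15:06:30, the timestamps are 15:09:30, 15:10:30 to 15:13:30, and 15:21:30, which land in the 15:05, 15:10 and 15:20 groups with 1, 4 and 1 samples respectively.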
Answer by Michael
For grouping by hour, you need to group by the hour part of your timestamp, which can be done like this:
var groups = from s in series
let groupKey = new DateTime(s.timestamp.Year, s.timestamp.Month, s.timestamp.Day, s.timestamp.Hour, 0, 0)
group s by groupKey into g select new
{
TimeStamp = g.Key,
Value = g.Average(a=>a.value)
};
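The same hourly grouping in method syntax, as a sketch. Passing `s.timestamp.Kind` to the constructor is an addition not in the answer; without it the key's Kind is always Unspecified. `HourlyGrouping.ByHour` is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Sample
{
    public DateTime timestamp;
    public double value;
}

public static class HourlyGrouping
{
    // Key = the timestamp with minutes and everything smaller set to zero.
    public static List<(DateTime TimeStamp, double Value)> ByHour(IEnumerable<Sample> series) =>
        series
            .GroupBy(s => new DateTime(s.timestamp.Year, s.timestamp.Month, s.timestamp.Day,
                                       s.timestamp.Hour, 0, 0, s.timestamp.Kind))
            .Select(g => (TimeStamp: g.Key, Value: g.Average(a => a.value)))
            .ToList();
}
```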
Answer by Duane McKinney
I'm very late to the game on this one, but I came across this while searching for something else, and I thought I had a better way.
series.GroupBy (s => s.timestamp.Ticks / TimeSpan.FromHours(1).Ticks)
.Select (s => new {
series = s
,timestamp = s.First ().timestamp
,average = s.Average (x => x.value )
}).Dump();
Here is a sample LINQPad program you can use to validate and test it:
void Main()
{
List<Sample> series = new List<Sample>();
Random random = new Random(DateTime.Now.Millisecond);
for (DateTime i = DateTime.Now.AddDays(-5); i < DateTime.Now; i += TimeSpan.FromMinutes(1))
{
series.Add(new UserQuery.Sample(){ timestamp = i, value = random.NextDouble() * 100 });
}
//series.Dump();
series.GroupBy (s => s.timestamp.Ticks / TimeSpan.FromHours(1).Ticks)
.Select (s => new {
series = s
,timestamp = s.First ().timestamp
,average = s.Average (x => x.value )
}).Dump();
}
// Define other methods and classes here
public class Sample
{
public DateTime timestamp;
public double value;
}
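In the query above, `s.First().timestamp` is the timestamp of the first sample in each group, not the start of the hour. A minimal sketch, assuming you want the bucket boundary instead: the key is the bucket number, so multiplying it back by the bucket size gives the start tick. `TickBuckets.Group` is a hypothetical helper, and `.Dump()` is left out because it is a LINQPad extension rather than part of LINQ:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Sample
{
    public DateTime timestamp;
    public double value;
}

public static class TickBuckets
{
    public static List<(DateTime Start, double Average, int Count)> Group(
        IEnumerable<Sample> series, TimeSpan bucket)
    {
        long size = bucket.Ticks;
        return series
            .GroupBy(s => s.timestamp.Ticks / size)   // integer division = bucket number
            .Select(g => (
                Start: new DateTime(g.Key * size),     // bucket boundary (Kind is Unspecified)
                Average: g.Average(x => x.value),
                Count: g.Count()))
            .ToList();
    }
}
```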
Answer by MemeDeveloper
I'd suggest using new DateTime() to avoid any issues with sub-millisecond differences:
var versionsGroupedByRoundedTimeAndAuthor = db.Versions.GroupBy(g =>
new
{
UserID = g.Author.ID,
Time = RoundUp(g.Timestamp, TimeSpan.FromMinutes(2))
});
With
private DateTime RoundUp(DateTime dt, TimeSpan d)
{
return new DateTime(((dt.Ticks + d.Ticks - 1) / d.Ticks) * d.Ticks);
}
N.B. Here I am grouping by Author.ID as well as by the rounded timestamp.
The RoundUp function is taken from @dtb's answer here: https://stackoverflow.com/a/7029464/661584
You can read about why equality down to the millisecond doesn't always mean DateTime equality here: Why does this unit test fail when testing DateTime equality?
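RoundUp labels each group with the end of its interval: a key covers the half-open range (key - d, key], so a timestamp that is already on a boundary keeps its own value. A minimal in-memory sketch of the composite key, assuming a hypothetical `Revision` class in place of the answer's Versions entity (whether a custom rounding method can run inside a database query depends on your LINQ provider, so this version runs on a plain list):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the entity in the answer.
public class Revision
{
    public int AuthorId;
    public DateTime Timestamp;
}

public static class RevisionGrouping
{
    // Same arithmetic as the RoundUp helper in the answer, keeping dt.Kind.
    public static DateTime RoundUp(DateTime dt, TimeSpan d) =>
        new DateTime((dt.Ticks + d.Ticks - 1) / d.Ticks * d.Ticks, dt.Kind);

    // Groups by author and by the rounded-up 2-minute boundary. Anonymous types
    // compare by value, so equal (AuthorId, Time) pairs share one group.
    public static List<(int AuthorId, DateTime Time, int Count)> Group(IEnumerable<Revision> revisions) =>
        revisions
            .GroupBy(r => new { r.AuthorId, Time = RoundUp(r.Timestamp, TimeSpan.FromMinutes(2)) })
            .Select(g => (g.Key.AuthorId, g.Key.Time, g.Count()))
            .ToList();
}
```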
Answer by Jan
Even though I am really late, here are my 2 cents:
I wanted to round the time values to the nearest 5-minute boundary, down or up:
10:31 --> 10:30
10:33 --> 10:35
10:36 --> 10:35
This can be achieved by converting to ticks, rounding with Math.Round(), and converting back to a DateTime:
public DateTime GetShiftedTimeStamp(DateTime timeStamp, int minutes)
{
return
new DateTime(
Convert.ToInt64(
Math.Round(timeStamp.Ticks / (decimal)TimeSpan.FromMinutes(minutes).Ticks, 0, MidpointRounding.AwayFromZero)
* TimeSpan.FromMinutes(minutes).Ticks));
}
The shifted timestamp can then be used as the LINQ grouping key, as in the answers above.
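Going through decimal works, but the same nearest-value rounding can be done with integer tick arithmetic alone. A sketch, assuming the interval has an even number of ticks (true for any whole number of seconds, since one second is 10,000,000 ticks) so that half an interval is exact; `NearestRounding.RoundToNearest` is a hypothetical name:

```csharp
using System;

public static class NearestRounding
{
    // Rounds to the nearest multiple of interval. An exact midpoint goes up,
    // which for positive tick counts matches MidpointRounding.AwayFromZero.
    public static DateTime RoundToNearest(DateTime dt, TimeSpan interval)
    {
        long size = interval.Ticks;
        long rounded = (dt.Ticks + size / 2) / size * size;
        return new DateTime(rounded, dt.Kind);
    }
}
```

With a 5-minute interval this maps 10:31 to 10:30, 10:33 to 10:35 and 10:36 to 10:35, matching the examples above.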
Answer by Migit
I improved on BrokenGlass's answer by making it more generic and adding safeguards. With his current answer, if you choose an interval of 9, it will not do what you'd expect. The same goes for any number that 60 is not divisible by. For this example, I'm using 9 and starting at midnight (0:00).
- Everything from 0:00 to 0:08:59.999 will be put into the 0:00 group, as you'd expect. This continues until you reach the group that starts at 0:54.
- At 0:54, it will only group things from 0:54 to 0:59:59.999 instead of going up to 1:02:59.999.
For me, this is a massive issue.
I'm not sure how to fix that, but you can add safeguards.
Changes:
- Any minute interval for which 60 % [interval] equals 0 is acceptable. The if statements below enforce this.
Hour intervals work as well.
double minIntervalAsDouble = Convert.ToDouble(minInterval);
if (minIntervalAsDouble <= 0)
{
    string message = "minInterval must be a positive number, exiting";
    Log.getInstance().Info(message);
    throw new Exception(message);
}
else if (minIntervalAsDouble < 60.0 && 60.0 % minIntervalAsDouble != 0)
{
    string message = "60 must be divisible by minInterval...exiting";
    Log.getInstance().Info(message);
    throw new Exception(message);
}
else if (minIntervalAsDouble >= 60.0 && (24.0 % (minIntervalAsDouble / 60.0)) != 0 && (24.0 % (minIntervalAsDouble / 60.0) != 24.0))
{
    //hour part must be divisible...
    string message = "If minInterval is greater than 60, 24 must be divisible by minInterval/60 (hour value)...exiting";
    Log.getInstance().Info(message);
    throw new Exception(message);
}
var groups = datas.GroupBy(x =>
{
    if (minInterval < 60)
    {
        var stamp = x.Created;
        stamp = stamp.AddMinutes(-(stamp.Minute % minInterval));
        stamp = stamp.AddMilliseconds(-stamp.Millisecond);
        stamp = stamp.AddSeconds(-stamp.Second);
        return stamp;
    }
    else
    {
        var stamp = x.Created;
        int hourValue = minInterval / 60;
        stamp = stamp.AddHours(-(stamp.Hour % hourValue));
        stamp = stamp.AddMilliseconds(-stamp.Millisecond);
        stamp = stamp.AddSeconds(-stamp.Second);
        stamp = stamp.AddMinutes(-stamp.Minute);
        return stamp;
    }
}).Select(o => new { o.Key, min = o.Min(f => f.Created), max = o.Max(f => f.Created), o }).ToList();
Put whatever you'd like in the select statement! I included min/max because it made testing easier.
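One possible fix for the 9-minute problem, sketched under the assumption that buckets should restart at midnight rather than at every full hour: floor the time of day instead of the minute of the hour. `DayAnchoredBuckets.FloorFromMidnight` is a hypothetical helper; intervals that divide 24 hours evenly (9 minutes does, since 1440 / 9 = 160) give equal-sized buckets all day:

```csharp
using System;

public static class DayAnchoredBuckets
{
    // Floors dt to an interval counted from midnight of the same day, so a
    // 9-minute interval gives 0:00, 0:09, ..., 0:54, 1:03, ... instead of
    // restarting at every full hour. If the interval does not divide 24 hours,
    // the last bucket of each day is shorter.
    public static DateTime FloorFromMidnight(DateTime dt, TimeSpan interval)
    {
        if (interval <= TimeSpan.Zero || interval > TimeSpan.FromDays(1))
            throw new ArgumentOutOfRangeException(nameof(interval));
        long sinceMidnight = dt.TimeOfDay.Ticks;
        return dt.Date.AddTicks(sinceMidnight - sinceMidnight % interval.Ticks);
    }
}
```

Used as the key, e.g. `datas.GroupBy(x => DayAnchoredBuckets.FloorFromMidnight(x.Created, TimeSpan.FromMinutes(9)))`, it also covers multi-hour intervals without a separate hour branch.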
Answer by vipes
I know this doesn't directly answer the question, but I was searching for a very similar solution: aggregating candle data for stocks/cryptocurrencies from a smaller period into a larger one (5, 10, 15, 30 minutes). You can't simply step back from the current minute X candles at a time, because the timestamps of the aggregated periods won't be consistent. You also have to make sure there's enough data at the start and end of the list to fill a complete candle of the larger period. With that in mind, this is the solution I came up with. (It assumes that the candles of the smaller period, given by rawPeriod, are sorted by ascending Timestamp.)
public class Candle
{
public long Id { get; set; }
public Period Period { get; set; }
public DateTime Timestamp { get; set; }
public double High { get; set; }
public double Low { get; set; }
public double Open { get; set; }
public double Close { get; set; }
public double BuyVolume { get; set; }
public double SellVolume { get; set; }
}
public enum Period
{
Minute = 1,
FiveMinutes = 5,
QuarterOfAnHour = 15,
HalfAnHour = 30
}
private List<Candle> AggregateCandlesIntoRequestedTimePeriod(Period rawPeriod, Period requestedPeriod, List<Candle> candles)
{
    if (rawPeriod != requestedPeriod)
    {
        // Length of the requested candle in minutes
        int requestedMinutes = (int) requestedPeriod;
        // Number of raw candles that make up one complete requested candle
        int candlesPerBucket = requestedMinutes / (int) rawPeriod;
        candles = candles
            .GroupBy(g => new { TimeBoundary = new DateTime(g.Timestamp.Year, g.Timestamp.Month, g.Timestamp.Day, g.Timestamp.Hour, (g.Timestamp.Minute / requestedMinutes) * requestedMinutes, 0) })
            // Drop incomplete candles at the start and end of the list
            .Where(g => g.Count() == candlesPerBucket)
            .Select(s => new Candle
            {
                Period = requestedPeriod,
                Timestamp = s.Key.TimeBoundary,
                High = s.Max(z => z.High),
                Low = s.Min(z => z.Low),
                Open = s.First().Open,
                Close = s.Last().Close,
                BuyVolume = s.Sum(z => z.BuyVolume),
                SellVolume = s.Sum(z => z.SellVolume),
            })
            .OrderBy(o => o.Timestamp)
            .ToList();
    }
    return candles;
}
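A self-contained sketch of the same idea, assuming a hypothetical trimmed-down `Bar` record instead of Candle and plain minute counts instead of the Period enum. The bucket is computed from ticks, and the completeness check compares against targetMinutes / rawMinutes, so it also works when the raw candles are longer than one minute:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down candle type for this sketch.
public sealed record Bar(DateTime Timestamp, double Open, double High, double Low, double Close, double Volume);

public static class BarAggregation
{
    // Aggregates bars of rawMinutes into bars of targetMinutes.
    // Assumes the input is sorted by Timestamp; incomplete buckets are dropped.
    public static List<Bar> Aggregate(IReadOnlyList<Bar> bars, int rawMinutes, int targetMinutes)
    {
        if (rawMinutes <= 0 || targetMinutes <= 0 || targetMinutes % rawMinutes != 0)
            throw new ArgumentException("targetMinutes must be a positive multiple of rawMinutes");

        long size = TimeSpan.FromMinutes(targetMinutes).Ticks;
        int barsPerBucket = targetMinutes / rawMinutes;

        return bars
            .GroupBy(b => b.Timestamp.Ticks / size)           // bucket number
            .Where(g => g.Count() == barsPerBucket)           // keep complete buckets only
            .Select(g => new Bar(
                new DateTime(g.Key * size, g.First().Timestamp.Kind),
                g.First().Open,                               // relies on sorted input
                g.Max(b => b.High),
                g.Min(b => b.Low),
                g.Last().Close,
                g.Sum(b => b.Volume)))
            .OrderBy(b => b.Timestamp)
            .ToList();
    }
}
```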

