Just sharing some of my inconsequential lunch conversations with you...

Friday, April 04, 2008

Yet another LINQ success story

I tend to overuse new technology just for fun, but here's a genuinely good reason to use LINQ: I have a huge log file (EL logger) that mixes profiling data with other log entries. LINQ is a great fit for this class of problem:

// needs: using System; using System.Linq; using System.Text.RegularExpressions;
// sample: "Message: [00:00:34.2389312] Initialize "
var regex = new Regex(@"Message: \[(?<hour>\d\d):(?<minute>\d\d):(?<second>\d\d)\.(?<milisecond>\d\d\d\d)\d+\] (?<message>\w+)");

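var groupedMessages =
    from message in
        // inner query: keep only the profiling lines and parse them
        from line in ReadLinesFromFile(@"D:\Logs\Link.MetaHeuristics.Business.log")
        let match = regex.Match(line)
        where match.Success
        select new
        {
            Message = match.Groups["message"].Value,
            // The constructor arguments are missing from this listing; building the
            // TimeSpan from the captured groups is an assumed reconstruction.
            // The last group holds four fractional digits, hence the division by 10.
            Time = new TimeSpan(
                0,
                int.Parse(match.Groups["hour"].Value),
                int.Parse(match.Groups["minute"].Value),
                int.Parse(match.Groups["second"].Value),
                int.Parse(match.Groups["milisecond"].Value) / 10)
        }
    // outer query: one row of statistics per message
    group message by message.Message into g
    select new
    {
        Message = g.Key,
        AverageTime = new TimeSpan((long)g.Average(x => x.Time.Ticks)),
        MaxTime = new TimeSpan((long)g.Max(x => x.Time.Ticks)),
        MinTime = new TimeSpan((long)g.Min(x => x.Time.Ticks)),
        SumTime = new TimeSpan((long)g.Sum(x => x.Time.Ticks)),
        CountOf = g.Count()
    };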

Console.WriteLine("{0,38} {1,5} {2,14} {3,14}", "Message", "Count", "SumTime", "AverageTime");
foreach (var item in groupedMessages.OrderByDescending(p => p.SumTime.Ticks))
    Console.WriteLine("{0,38} {1,5} {2,14} {3,14}", item.Message, item.CountOf, item.SumTime, item.AverageTime);
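
To see what the pattern actually extracts, here is a minimal sketch that runs it against the sample line. The pattern is the one from the query, except that the dot before the fraction is escaped. The RegexCheck class and its Parse helper are hypothetical names used only for illustration, and the way the TimeSpan is built (four fractional digits divided by 10 to get milliseconds) is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCheck
{
    // Pattern from the query, with the dot before the fraction escaped.
    static readonly Regex Pattern = new Regex(
        @"Message: \[(?<hour>\d\d):(?<minute>\d\d):(?<second>\d\d)\.(?<milisecond>\d\d\d\d)\d+\] (?<message>\w+)");

    // Hypothetical helper: returns null when the line is not a profiling entry.
    public static Tuple<string, TimeSpan> Parse(string line)
    {
        Match match = Pattern.Match(line);
        if (!match.Success)
            return null;

        // Assumption: the group captures four fractional digits,
        // so dividing by 10 yields whole milliseconds.
        var time = new TimeSpan(
            0,
            int.Parse(match.Groups["hour"].Value),
            int.Parse(match.Groups["minute"].Value),
            int.Parse(match.Groups["second"].Value),
            int.Parse(match.Groups["milisecond"].Value) / 10);

        return Tuple.Create(match.Groups["message"].Value, time);
    }

    public static void Main()
    {
        var parsed = Parse("Message: [00:00:34.2389312] Initialize ");
        Console.WriteLine("{0} -> {1}", parsed.Item1, parsed.Item2);
    }
}
```

Lines that don't match simply come back as null, which mirrors the `where match.Success` filter in the query.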

I'm using Don Box's ReadLinesFromFile helper - yes, the WideFinder C# naive implementation:

// LINQ-compatible streaming I/O helper
// taken from: http://www.pluralsight.com/blogs/dbox/archive/2007/10/09/48719.aspx

public static IEnumerable<string> ReadLinesFromFile(string filename)
{
    using (StreamReader reader = new StreamReader(filename))
    {
        while (true)
        {
            string s = reader.ReadLine();

            // ReadLine returns null at end of file
            if (s == null)
                break;

            yield return s;
        }
    }
}

It took about 11 s to consolidate a 500 MB log file on an old Pentium D. Not impressive, I know, but the code is highly readable and easy to write. And last but not least, it's declarative enough to allow future PLINQ optimizations :)

PS: the PLINQ tests will need some refactoring first, as I/O is probably the current bottleneck (CPU usage doesn't get to 40%...).
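
For the curious, a rough sketch of what the PLINQ version might look like, assuming the AsParallel() operator from System.Linq as shipped in .NET Framework 4. This is not the code I measured: the ParallelLogStats class, the Stat type and the simplified pattern (whole timestamp in one group, parsed with TimeSpan.Parse) are all made up for illustration. If I/O really is the bottleneck, parallelizing the matching alone probably won't buy much.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParallelLogStats
{
    // Simplified pattern (an assumption, not the one from the post): the whole
    // timestamp is captured in one group so that TimeSpan.Parse can handle it.
    static readonly Regex Pattern =
        new Regex(@"Message: \[(?<time>\d\d:\d\d:\d\d\.\d{1,7})\] (?<message>\w+)");

    // Hypothetical result type, named so it can be returned from a method.
    public sealed class Stat
    {
        public string Message;
        public int CountOf;
        public TimeSpan SumTime;
    }

    public static List<Stat> Summarize(IEnumerable<string> lines)
    {
        return lines
            .AsParallel() // matching and parsing may now run on several cores
            .Select(line => Pattern.Match(line))
            .Where(match => match.Success)
            .Select(match => new
            {
                Message = match.Groups["message"].Value,
                Time = TimeSpan.Parse(match.Groups["time"].Value, CultureInfo.InvariantCulture)
            })
            .GroupBy(x => x.Message)
            .Select(g => new Stat
            {
                Message = g.Key,
                CountOf = g.Count(),
                SumTime = new TimeSpan(g.Sum(x => x.Time.Ticks))
            })
            .OrderByDescending(s => s.SumTime)
            .ToList();
    }

    public static void Main()
    {
        // In-memory stand-in for the log file, so the sketch runs on its own.
        var lines = new[]
        {
            "Message: [00:00:01.0000000] Initialize ",
            "Message: [00:00:02.5000000] Solve ",
            "some unrelated log line",
            "Message: [00:00:03.0000000] Initialize "
        };

        foreach (var s in Summarize(lines))
            Console.WriteLine("{0,12} {1,5} {2,18}", s.Message, s.CountOf, s.SumTime);
    }
}
```

The query shape is the same as the sequential one; the only structural change is the AsParallel() call, which is what I meant by "declarative enough".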


Development Catharsis :: Copyright 2006 Mário Romano