Just sharing some of my inconsequential lunch conversations with you...

Friday, April 04, 2008

Yet another LINQ success story

I tend to overuse new technology just for fun, but here's a genuinely practical reason to use LINQ: I have a huge log file (from the EL logger) containing profiling data among other entries. LINQ is a great way to solve this class of problem:

// sample: "Message: [00:00:34.2389312] Initialize "
// note: the dot is escaped, and only the first three fractional digits are
// captured, so the value maps directly to TimeSpan's milliseconds parameter
var regex = new Regex(@"Message: \[(?<hour>\d\d):(?<minute>\d\d):(?<second>\d\d)\.(?<millisecond>\d{3})\d+\] (?<message>\w+)");

var groupedMessages =
    from message in
        from line in ReadLinesFromFile(@"D:\Logs\Link.MetaHeuristics.Business.log")
        let match = regex.Match(line)
        where match.Success
        select new
        {
            Message = match.Groups["message"].Value,
            Time = new TimeSpan(
                0,
                Convert.ToInt32(match.Groups["hour"].Value),
                Convert.ToInt32(match.Groups["minute"].Value),
                Convert.ToInt32(match.Groups["second"].Value),
                Convert.ToInt32(match.Groups["millisecond"].Value))
        }
    group message by message.Message into g
    select new
    {
        Message = g.Key,
        AverageTime = new TimeSpan((long)g.Average(x => x.Time.Ticks)),
        MaxTime = new TimeSpan((long)g.Max(x => x.Time.Ticks)),
        MinTime = new TimeSpan((long)g.Min(x => x.Time.Ticks)),
        SumTime = new TimeSpan((long)g.Sum(x => x.Time.Ticks)),
        CountOf = g.Count()
    };

Console.WriteLine();
Console.WriteLine("{0,38} {1,5} {2,14} {3,14}", "Message", "Count", "SumTime", "AverageTime");
foreach (var item in groupedMessages.OrderByDescending(p => p.SumTime.Ticks))
{
    Console.WriteLine("{0,38} {1,5} {2,14} {3,14}", item.Message, item.CountOf, item.SumTime, item.AverageTime);
}
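The aggregates above all round-trip through Ticks because LINQ's Average/Sum operators have no TimeSpan overloads. A minimal standalone illustration of the trick (the sample durations are made up for the demo):

```csharp
using System;
using System.Linq;

static class TickAverage
{
    static void Main()
    {
        // TimeSpan has no Average() overload, so convert each value to
        // Ticks (a long), aggregate those, and wrap the result back up.
        var times = new[]
        {
            TimeSpan.FromSeconds(2),
            TimeSpan.FromSeconds(4),
        };

        var average = new TimeSpan((long)times.Average(t => t.Ticks));
        Console.WriteLine(average); // prints "00:00:03"
    }
}
```

The cast to long is needed because Average over longs returns a double.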

I'm using Don Box's ReadLinesFromFile helper (yes, the naive C# WideFinder implementation):

// LINQ-compatible streaming I/O helper
// taken from: http://www.pluralsight.com/blogs/dbox/archive/2007/10/09/48719.aspx

public static IEnumerable<string> ReadLinesFromFile(string filename)
{
    using (StreamReader reader = new StreamReader(filename))
    {
        while (true)
        {
            string s = reader.ReadLine();

            if (s == null)
            {
                break;
            }

            yield return s;
        }
    }
}

It took about 11 s to consolidate a 500 MB log file on an old Pentium D. Not impressive, I know, but the query is highly readable and was easy to write. And last but not least: it's declarative enough to allow future PLINQ optimizations :)


PS: The PLINQ tests will need some refactoring, as I/O is probably the current bottleneck (the CPU doesn't even reach 40%...).
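For when those tests happen: the PLINQ entry point is a single AsParallel() call on the source sequence, which spreads the CPU-bound regex matching across cores while leaving the rest of the query untouched. A sketch under assumptions (the log lines here are synthetic stand-ins for the file, and only the counting aggregate is shown):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class PlinqSketch
{
    static void Main()
    {
        // Stand-in for ReadLinesFromFile: a few synthetic log lines.
        var lines = new[]
        {
            "Message: [00:00:34.2389312] Initialize ",
            "Message: [00:00:02.1000000] Initialize ",
            "some unrelated log line",
        };

        var regex = new Regex(
            @"Message: \[(?<hour>\d\d):(?<minute>\d\d):(?<second>\d\d)\.(?<millisecond>\d{3})\d+\] (?<message>\w+)");

        // AsParallel() parallelizes the per-line matching; a shared Regex
        // instance is safe here because Match() is thread-safe.
        var groups =
            from match in lines.AsParallel().Select(line => regex.Match(line))
            where match.Success
            group match by match.Groups["message"].Value into g
            select new { Message = g.Key, Count = g.Count() };

        foreach (var g in groups)
            Console.WriteLine("{0}: {1}", g.Message, g.Count); // prints "Initialize: 2"
    }
}
```

Note that with parallel execution the group order is no longer deterministic, so any ordered output should be sorted after the aggregation.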


Development Catharsis :: Copyright 2006 Mário Romano