Just sharing some of my inconsequential lunch conversations with you...

Saturday, October 20, 2007

More of the same with LINQ

As I mentioned earlier, I loved Don Box's generator approach for reading lines from a file. So I set myself to do a bit of plagiarism and decided to do the same for directory traversal.

After an initial disappointment (DirectoryInfo.GetDirectories and Path.DirectoryInfo were implemented in the class library years ago, so there was no reason to do it with FindFirst/FindNext as I wanted), I had to realign my goal: I decided to do some querying over directory contents instead.

Here's my first approach:



var query =
    from files in RecursiveGetFiles(new DirectoryInfo(@"c:\projects"))
    group files by files.Extension into g
    orderby g.Count() descending
    select new { Extension = g.Key, Count = g.Count() };

foreach (var extension in query)
{
    Console.WriteLine("{0}: {1}", extension.Extension, extension.Count);
}
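
The query expression above is syntactic sugar for ordinary extension-method calls. Here is a rough sketch of the equivalent method syntax. To keep it self-contained it assumes a made-up, in-memory list of file names instead of the real directory walk; ExtensionCounter and CountByExtension are names invented for this illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ExtensionCounter
{
    // Hypothetical helper: the same group / order / project steps as the
    // query expression, written as explicit method calls.
    public static List<KeyValuePair<string, int>> CountByExtension(IEnumerable<string> fileNames)
    {
        return fileNames
            .GroupBy(name => Path.GetExtension(name))          // group files by files.Extension into g
            .OrderByDescending(g => g.Count())                 // orderby g.Count() descending
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample names; no file system access is needed.
        var sample = new[] { "a.cs", "b.cs", "page.aspx", "notes.txt", "c.cs", "x.aspx" };

        foreach (var entry in CountByExtension(sample))
        {
            Console.WriteLine("{0}: {1}", entry.Key, entry.Value);
        }
    }
}
```

Either form should behave the same; the query syntax just reads closer to SQL.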

<update>

Here's a sample over a project of mine:

.cs: 2591
.resx: 991
.aspx: 809
.ascx: 570
.dll: 365
.scc: 189
.txt: 119
.xml: 111

</update>


And there it is: it returns the count of files per extension for my projects directory. Those of you from the Unix community may feel terribly disappointed: a simple shell command of 2 or 3 utilities piped together would do the same. Perl, Ruby or Python would do it with ease. And you're right; in C# I even had to write the helpers:

// Lazily yields every file under directoryInfo, including those in nested subdirectories.
public static IEnumerable<FileInfo> RecursiveGetFiles(DirectoryInfo directoryInfo)
{
    foreach (DirectoryInfo subDirectoryInfo in RecursiveSubDirectories(directoryInfo))
    {
        foreach (FileInfo fileInfo in subDirectoryInfo.GetFiles())
        {
            yield return fileInfo;
        }
    }
}

// Lazily yields directoryInfo itself, then all of its descendants, depth first.
static IEnumerable<DirectoryInfo> RecursiveSubDirectories(DirectoryInfo directoryInfo)
{
    yield return directoryInfo;

    foreach (DirectoryInfo directory in directoryInfo.GetDirectories())
    {
        foreach (DirectoryInfo subDirectory in RecursiveSubDirectories(directory))
        {
            yield return subDirectory;
        }
    }
}
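
For comparison, the class library can also do the recursive walk itself: DirectoryInfo.GetFiles has an overload that takes a SearchOption. The sketch below is only an alternative to the helpers above, not a replacement for them; BuiltInTraversal and AllFiles are names made up for this example.

```csharp
using System;
using System.IO;

public static class BuiltInTraversal
{
    // Hypothetical wrapper: asks the framework for every file under root,
    // descending into all subdirectories.
    public static FileInfo[] AllFiles(DirectoryInfo root)
    {
        return root.GetFiles("*", SearchOption.AllDirectories);
    }

    public static void Main(string[] args)
    {
        // Uses the first command-line argument, or the current directory if none is given.
        string path = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();

        foreach (FileInfo file in AllFiles(new DirectoryInfo(path)))
        {
            Console.WriteLine(file.FullName);
        }
    }
}
```

The trade-off is that this overload builds the whole array before returning, whereas the iterator helpers yield files lazily as the query consumes them. It also throws as soon as it hits a directory it cannot read.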

But there's something kind of extraordinary going on here. The language I used for the query:
  1. Is (kind of) natural, for those who know SQL
  2. Is declarative, so the source is simple to write, read and maintain
  3. Has the power of relational algebra
  4. Leaves the compiler every optimization opportunity, since I wrote the declarative 'what' rather than the imperative 'how to'
  5. Is strongly typed, and with that comes statement completion
These may not be news to others, but some of them are new to us in the .NET world.
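
To make the 'what' versus 'how to' contrast concrete, here is a rough sketch of the same count written imperatively. It assumes a made-up list of file names rather than a real directory, and ImperativeCounter and CountByExtension are names invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ImperativeCounter
{
    // Hypothetical helper: counts files per extension by hand, then sorts
    // the pairs by descending count.
    public static List<KeyValuePair<string, int>> CountByExtension(IEnumerable<string> fileNames)
    {
        var counts = new Dictionary<string, int>();

        foreach (string name in fileNames)
        {
            string extension = Path.GetExtension(name);
            int current;
            counts.TryGetValue(extension, out current); // current stays 0 when the key is missing
            counts[extension] = current + 1;
        }

        var result = new List<KeyValuePair<string, int>>(counts);
        // List.Sort is not stable, so the order of extensions with equal counts is unspecified.
        result.Sort((a, b) => b.Value.CompareTo(a.Value));
        return result;
    }

    public static void Main()
    {
        // Made-up sample names; no file system access is needed.
        var sample = new[] { "a.cs", "b.cs", "page.aspx", "notes.txt", "c.cs", "x.aspx" };

        foreach (var entry in CountByExtension(sample))
        {
            Console.WriteLine("{0}: {1}", entry.Key, entry.Value);
        }
    }
}
```

The bookkeeping (the dictionary, the missing-key case, the comparison delegate) is exactly the part the query expression lets me leave out.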

In a future post I'll try to get rid of the helpers and do it using just LINQ, in one expression. Maybe with some help from here.

2 comments:

André Cardoso said...

Very good and very natural the recursive and enumeration of the directories and files.

It looks simpler and reads better (great qualities for future maintenance), and the performance is best left to compiler optimizations, more cores and only as a last resort to the programmer :)

Maybe just compare it and time it against the imperative way to show the simplicity and view the performance penalty (or not).

Mário Romano said...

Thanks for your suggestion, André, but I'm afraid blogger.com post limit could be reached :P

Development Catharsis :: Copyright 2006 Mário Romano