Just sharing some of my inconsequential lunch conversations with you...

Tuesday, December 04, 2007

Amdahl’s Law

Amdahl’s Law in plain terms: the maximum achievable parallel speedup is limited by the amount of sequential code remaining. This makes sense: at some point, even with an infinite number of processors, the only work remaining is the sequential code, and its performance is limited by serial execution.
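
For reference, the usual formulation: if a fraction P of the work can be parallelized across N processors, the overall speedup is 1 / ((1 - P) + P/N). A quick sketch of the arithmetic (the helper method and the 95% figure are just illustrations):

        // Amdahl's Law: the sequential fraction (1 - P) caps the speedup.
        static double AmdahlSpeedup(double p, int n)
        {
            return 1.0 / ((1.0 - p) + p / n);
        }

        // Example: if 95% of the work is parallelizable,
        //   AmdahlSpeedup(0.95, 8) is roughly 5.9x,
        // and no number of processors can ever exceed 1 / (1 - 0.95) = 20x.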

Taken from the 'Parallel Extensions to the .NET Framework' help, in 'Performance Tips': have realistic expectations!

Here are the tips:

  1. Target areas of your program where algorithms are computationally expensive and/or data sets are large (e.g. > 1000 elements). In such cases, there will likely be benefit to using parallelism.

  2. Parallelize outer loops, not inner loops. So, in this example:

         foreach (var c in customers)
         {
             var q = from s in sales
                     where s.CustomerName == c.Name
                     select s.Units * s.Produce.Price;
             c.Total = q.Sum();
         }

     It is often better to parallelize the outer foreach loop rather than the inner query:

         Parallel.ForEach(customers, c =>
         {
             var q = from s in sales
                     where s.CustomerName == c.Name
                     select s.Units * s.Produce.Price;
             c.Total = q.Sum();
         });

     Of course, there are exceptions, such as the following:

         for (int i = 0; i < 2; i++)
             for (int j = 0; j < 1000000; j++) { ... }

     On a dual core, parallelizing the outer loop may be fine (if the inner loop doesn’t block), but on an 8-way machine, parallelizing only the outer loop can’t scale beyond 2x, because the outer loop has just two iterations. In such a case, it’s better to parallelize the inner loop, or both (see the sketch after this list).

  3. Avoid requiring ordering in PLINQ wherever possible, both by leaving order preservation turned off (off is the default) and by avoiding operators that add ordering expectations, because requiring ordering leads to performance slow-downs. See the Ordering expectations section in the Parallelism Blockers topic for more information. (A sketch of the trade-off follows this list.)

  4. Prefer independent loop iterations and Task bodies over synchronization. Adding synchronization to the bodies of parallel work limits scalability, although in some cases it is unavoidable in order to achieve correctness. (See the sketch after this list.)

  5. Be conscious of the number of Tasks created explicitly. Although Tasks are lightweight, creating one incurs a few object allocations and some synchronization to enqueue the Task into the scheduler’s queue. There are some examples in the documentation that show divide-and-conquer, recursive creation of Tasks, and in those examples we’ve seen that at some point the overhead of Task object allocation may come to dominate the recursive algorithm in question. (See the cutoff sketch after this list.)

  6. And lastly, the least technical advice of all of these: have realistic expectations. There is a phenomenon known as Amdahl’s Law, which states (in plain terms) that the maximum achievable parallel speedup is limited by the amount of sequential code remaining. This makes sense: at some point, even with an infinite number of processors, the only work remaining is the sequential code, and its performance is limited by serial execution.
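
Regarding tip 2, here is roughly what parallelizing the inner loop of that last example could look like. This is only a sketch: DoWork is a hypothetical stand-in for the loop body, and it uses the Parallel.For overload that takes a start index, an exclusive end index, and a per-iteration delegate.

        // requires: using System.Threading.Tasks;
        // Only two outer iterations, so let all available cores share the
        // million-iteration inner range instead (Parallel.For blocks until
        // the whole range has been processed).
        for (int i = 0; i < 2; i++)
        {
            Parallel.For(0, 1000000, j =>
            {
                DoWork(i, j); // hypothetical per-iteration work
            });
        }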
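
Regarding tip 3, a minimal sketch of the ordering trade-off, written against the PLINQ operators as they later shipped (AsParallel, AsOrdered) and reusing the sales example from tip 2:

        // requires: using System.Linq;
        // Unordered (the default): results may arrive in any order, which scales best.
        var unordered = sales.AsParallel()
                             .Select(s => s.Units * s.Produce.Price)
                             .ToList();

        // Asking for ordering (AsOrdered, or operators such as OrderBy) makes
        // PLINQ buffer and reorder results, which costs throughput.
        var ordered = sales.AsParallel()
                           .AsOrdered()
                           .Select(s => s.Units * s.Produce.Price)
                           .ToList();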
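
Regarding tip 4, a sketch of the difference between synchronized and independent iteration bodies (Process is a hypothetical per-customer computation):

        // requires: using System.Linq; using System.Threading.Tasks;
        // Scales poorly: every iteration funnels through the same lock.
        decimal total = 0;
        object gate = new object();
        Parallel.ForEach(customers, c =>
        {
            decimal value = Process(c); // hypothetical work
            lock (gate) { total += value; }
        });

        // Scales better: iterations stay independent and the results are
        // combined once at the end.
        decimal betterTotal = customers.AsParallel()
                                       .Select(c => Process(c))
                                       .Sum();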
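
Regarding tip 5, a sketch of keeping explicit Task creation in check in a divide-and-conquer computation: below a cutoff the work is done sequentially, so Task allocation and scheduling overhead doesn’t swamp the actual work. The Fibonacci example, the threshold value, and the use of Task.Run (from the Task API as it later shipped) are all just illustrative assumptions.

        // requires: using System.Threading.Tasks;
        static long Fib(int n)
        {
            const int Threshold = 20; // arbitrary cutoff for this sketch
            if (n < Threshold)
                return n < 2 ? n : Fib(n - 1) + Fib(n - 2); // too small: stay sequential

            var left = Task.Run(() => Fib(n - 1)); // spawn one child Task...
            long right = Fib(n - 2);               // ...and do the other half inline
            return left.Result + right;
        }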


Development Catharsis :: Copyright 2006 Mário Romano