Monday, June 27, 2011

Gestalt-Driven Development: Memes, Alleles and Architectural Fitness

Claiming that Evolution is merely a “theory” is akin to claiming the same about Gravity. Evolution is a beautiful and elegant idea, which also happens to precisely describe and predict how living organisms change and diversify over time in Meatspace. It is also a fabulously useful metaphor outside the realm of biology, and I have appropriated it, along with Memetics, as cornerstones of Gestalt-Driven Development.

Note: I have only provided definitions of new terms I have introduced or terms that I have subtly changed. For formal definitions of common terms follow the links to the appropriate entries in the Encyclopaedia Galactica.

A Software Architect typically has a number of products, technologies, approaches, patterns, practices, processes, methodologies, frameworks, libraries, algorithms and languages in their “toolbox”. All of these have a lifecycle: they are added as new software paradigms emerge or become fashionable, refined and expanded as they mature, used less as they go out of fashion, and finally relegated to the bottom of the toolbox or discarded entirely when they are no longer relevant. The contents of this toolbox can be thought of as Architectural Memes.

Architectural Memes that are repetitively used together form an Architectural Meme Complex or Memeplex. A simple example of a Memeplex would be C#, WF, WCF, OOP, Windows Server AppFabric, SQL Server and Visual Studio. Another would be HTML5, SVG, CSS3, DRY, JavaScript, jQuery, and AJAX.

Architectural Memes and Memeplexes will be more or less appropriate for a system depending on how well they meet the requirements of that system or part thereof. How well a given Architectural Meme or Memeplex meets the requirements of a system, sub-system or component is measured against the Fitness Criteria for that system, sub-system or component. The Fitness Criteria represent the requirements of the software system, expressed in such a way that the relative fitness of any given Architectural Meme or Memeplex can be quantified.

Obviously there exist overlapping or competitive Architectural Memes and Memeplexes, i.e. they can be used to address the same requirements. These are Meme Alleles. Meme Alleles are Memes or Memeplexes that can be used to meet similar requirements, and whose Architectural Fitness will be evaluated using the same set of Fitness Criteria. An over-simplified example of Architectural Alleles would be a Relational Database (RDBMS) implementation and a NoSQL database implementation given the following fitness criterion: Enterprise-scale storage and retrieval of structured data. RDBMS products from different companies would also be Memetic Alleles in this case.
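To make the quantification of Architectural Fitness a little more concrete, here is a minimal, purely illustrative C# sketch (the types, criteria, scores and weights are hypothetical and not part of any formal GDD tooling) that scores two Meme Alleles against a shared set of Fitness Criteria:

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only.
public class FitnessCriterion
{
    public string Name { get; set; }
    public double Weight { get; set; } // relative importance; weights sum to 1.0
}

public class MemeAllele
{
    public string Name { get; set; }
    public Dictionary<string, double> Scores { get; set; } // criterion name -> score in [0, 1]

    // Architectural Fitness as a weighted sum over the shared Fitness Criteria.
    public double Fitness(IEnumerable<FitnessCriterion> criteria)
    {
        return criteria.Sum(c => Scores.ContainsKey(c.Name) ? c.Weight * Scores[c.Name] : 0.0);
    }
}

class FitnessExample
{
    static void Main()
    {
        var criteria = new[]
        {
            new FitnessCriterion { Name = "Enterprise-scale structured storage", Weight = 0.7 },
            new FitnessCriterion { Name = "Schema flexibility", Weight = 0.3 }
        };

        var rdbms = new MemeAllele
        {
            Name = "RDBMS",
            Scores = new Dictionary<string, double>
                { { "Enterprise-scale structured storage", 0.9 }, { "Schema flexibility", 0.4 } }
        };
        var noSql = new MemeAllele
        {
            Name = "NoSQL",
            Scores = new Dictionary<string, double>
                { { "Enterprise-scale structured storage", 0.7 }, { "Schema flexibility", 0.9 } }
        };

        // The allele with the higher weighted score is the "fitter" choice for these criteria.
        foreach (var allele in new[] { rdbms, noSql }.OrderByDescending(a => a.Fitness(criteria)))
            Console.WriteLine("{0}: {1:F2}", allele.Name, allele.Fitness(criteria));
    }
}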

Note: A Gene and a Meme are not entirely analogous. A Meme is technically analogous to an instance of a Gene, an Allele, rather than the Gene itself. Hopefully scholars of Genetics and Memetics will forgive the minor poetic licence that I am taking with my definition of Meme Allele.

As I mentioned before, Architectural Memes are not limited to technologies; they include processes and practices, e.g. Waterfall and Agile, which will have better or worse fitness depending on the nature of the software being developed, management's or the client's ability and willingness to embrace the uncertainty innate in software development, the geographic and cultural distribution of the team, etc. Architectural Memes also include approaches, patterns and frameworks, e.g. OOP, SOLID, GRASP, ESB, SOA, MVC, MVP, MVVM, BDD, FDD, DDD, TDD, TOGAF, ITIL, and many, many more wonderfully hermetic acronyms.

In Gestalt-Driven Development it is the task of the Software Architect (and the whole software engineering team) to identify the Memes and Meme Complexes that might be applicable to the system, ascertain the fitness of those Memes to derive the set of appropriate building blocks, and then compose those building blocks into a System Architecture, or Phenotype. Obviously this requires that the requirements have been transformed and refactored into one or multiple sets of Fitness Criteria. This is a shared task that starts with requirements analysis and ends with the engineering team.  

Using Gestalt-Driven Development and the metaphor outlined above, a Software Architect is able to test and verify the suitability of an architecture to the requirements of the system early in the development process, and to continuously evolve that architecture throughout the process as requirements mutate and emerge.

Note: Obviously random mutation of code and data is not a good thing in all but the rarest cases. GDD proposes a high-level approach to software development and architecture, not the use of genetic algorithms (though it does not preclude their use).

I am still in the process of formalizing GDD so I will definitely post a lot more about it in the future.

Thursday, June 23, 2011

Adventures in Over-Engineering: Deterministic COM Object Lifetime Management in VSTO Office Add-Ins

Ah, how wonderfully easy Microsoft makes it to interoperate with “legacy” COM components from managed code (.Net). So easy, in fact, that one might completely overlook that an apparently managed API is actually just a thin shim over a COM API. Unfortunately, one cannot safely ignore the COM-ness of these APIs; some idiosyncrasies “bleed” into your code, particularly those related to memory allocation and lifetime management. The Visual Studio Tools for Office (VSTO) add-in APIs are a good example of this.

Earlier this year I wrote a fairly simple status reporting application using InfoPath, SharePoint and Excel. Users fill out weekly InfoPath status report forms, which they then submit to SharePoint. A manager can then open one of the InfoPath forms from the Windows Shell (using SharePoint’s WebDAV capabilities) and use InfoPath’s built-in “Export to Excel” capabilities to do a bulk import of the status report XML data into an Excel workbook. I wrote approximately 800 lines of VBA (yes, I did just say “VBA”) code to sort, filter, format and analyze the imported Excel data so that a manager can visually identify areas of risk.

VBA is not the most modern or elegant language, but the tools are mature and it gets the job done for simple applications. Here is an example of a VBA function that modifies the color of the “Priority” field of an Activity record:

Sub ColorPriority(SheetName As String)
    'Critical: Red
    'High: Orange
    Worksheets(SheetName).Select
    nCol = Col("Priority")
    For nRow = 2 To ActiveSheet.UsedRange.Rows.Count
        Select Case Cells(nRow, nCol).Value
            Case "Critical":
                Cells(nRow, nCol).Interior.Color = RGB(255, 0, 0)
                Cells(nRow, nCol).AddComment ("Priority is Critical. Typically this indicates a status item that needs follow up.")
            Case "High":
                Cells(nRow, nCol).Interior.Color = RGB(255, 102, 0)
                Cells(nRow, nCol).AddComment ("Priority is High. Typically this indicates that the status of this item should be tracked.")
        End Select
    Next nRow
End Sub

Note: The Col() function is a custom function I wrote that simply looks up the column number for a given column name in an Excel Table. I still find it strange that a built-in function for this purpose is not provided.

The above code is pretty simple; it expresses the intent with no need for any “plumbing” code.

I then decided that I wanted to make this into a fully-fledged system that enables Excel to connect directly to a SharePoint Forms Library, get a list of submitted status reports, do a custom import of the form XML data, and then process, analyze and format the generated Excel Workbook. Since I also still routinely fall for the “Newer, Sexier Technology MUST Be Better” trap, I decided I would write the whole thing in .Net 4.0. I used the following for the add-in:

  • C# for the main add-in code and worksheet processor
  • C# and WPF for the add-in controls library
  • F# and the SharePoint Foundation Client Object Model for the SharePoint Client library
  • F# for the InfoPath XML parser library
  • IronPython for a test harness

I started by porting all of the Excel formatting code from VBA to C#. It was dead easy; I simply cut and pasted the VBA into Visual Studio as C# comments and then transformed the code in place. As I was doing the port a little voice kept saying, “It cannot be this easy; these are really COM APIs after all, and where there is COM there is trouble.” I chose to placate that little voice in the name of expediency by promising that I would come back later to harden and optimize the code, which would include looking into any COM-related issues.

So the C# for the VBA function above became:

private void ColorPriority(string sheetName)
{
     //Critical: Red
     //High: Orange
     _excelWorkBook.Sheets[sheetName].Select();
     var activeSheet = _excelWorkBook.ActiveSheet;
     var nCol = Col("Priority");
     for(var nRow = 2; nRow <= activeSheet.UsedRange.Rows.Count; nRow++)
     {
         var cell = activeSheet.Cells[nRow, nCol];
         switch((string)cell.Value)
         {
             case "Critical":
                 cell.Interior.Color = Red;
                 AddFormattedComment(cell, "Priority is Critical. Typically this indicates a status item that needs follow up.");
                 break;
             case "High":
                 cell.Interior.Color = Orange;
                 AddFormattedComment(cell, "Priority is High. Typically this indicates that the status of this item should be tracked.");
                 break;
         }
     }
 }

Other than the absence of some syntactic sugar provided by VBA, e.g. automatically bringing obvious, appropriately-named variables into scope, the C# code above looks pretty similar to the VBA.
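The Col helper used above is not shown in this post; a minimal sketch of what such a lookup might look like (purely illustrative, and assuming the column headers live in row 1 of the active sheet) is:

// Illustrative sketch only; assumes the column headers are in row 1 of the active sheet.
// Note that this helper creates RCWs of its own, which is exactly the kind of thing
// discussed later in this post.
private int Col(string columnName)
{
    var activeSheet = _excelWorkBook.ActiveSheet;
    int columnCount = activeSheet.UsedRange.Columns.Count;

    for (var nCol = 1; nCol <= columnCount; nCol++)
    {
        if ((string)activeSheet.Cells[1, nCol].Value == columnName)
            return nCol;
    }

    throw new ArgumentException("No column named: " + columnName);
}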

After porting all of the VBA code and developing the rest of the add-in, it came time to keep my promise to the little voice by hardening and optimizing the code. I added Asserts that I had missed (Code Contracts would probably have been overkill), added error and exception handling, found a number of opportunities to improve performance by doing operations concurrently, e.g. file retrieval and XML parsing, and made a number of other performance-related optimizations. I then moved on to looking for COM Interop related issues.

Note: In an effort to improve the overall performance of the add-in I did find a best-practice gem. In my original code I was adding individual cells and rows to Excel. I found some guidance suggesting that I should instead build up [multidimensional] arrays that represent the cells and then set the value of an entire range to the array. It significantly improved the overall performance of the add-in; for example, it improved the performance of the InfoPath forms import by a whopping 600%!
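As a rough illustration of that technique (the helper below is hypothetical rather than code from the add-in), instead of writing values cell by cell one builds a multidimensional array in managed memory and assigns it to an Excel Range in a single interop call:

// Illustrative sketch only: write a block of values in one COM call
// instead of one call per cell.
private void WriteRows(dynamic worksheet, string[][] rows)
{
    int rowCount = rows.Length;
    int colCount = rows[0].Length;

    // Build the values in managed memory first.
    var values = new object[rowCount, colCount];
    for (var r = 0; r < rowCount; r++)
        for (var c = 0; c < colCount; c++)
            values[r, c] = rows[r][c];

    // Target block: rows 2..(rowCount + 1), starting at column A.
    var startCell = worksheet.Cells[2, 1];
    var endCell = worksheet.Cells[1 + rowCount, colCount];
    var range = worksheet.Range[startCell, endCell];

    // One interop call sets the whole block.
    range.Value2 = values;
}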

Before I go into any details on the solution, let’s look at where COM-related issues might show up in the above code. Each of the red rectangles in the following diagram represents a Runtime Callable Wrapper (RCW), a .Net object that acts as a proxy for an underlying COM object; the lifetime of the underlying COM object is tied to the lifetime of the RCW. Note that the RCWs in the body of the for loop are allocated on every iteration of the loop. It is also very important to be aware that simply using the ‘.’ operator inadvertently creates additional COM objects, COM objects whose lifetimes are not, by default, under the developer’s explicit control.

[Diagram not reproduced: the ColorPriority code above, with each expression that creates an RCW outlined in red.]

Given that I live in the Internet era of software development, the first thing I did was search online for common COM-related issues in VSTO add-ins. I found a few, but the most common was the apparent need for deterministic clean-up of the COM objects that are instantiated by the managed Office add-in APIs.

The .Net COM Interop implementation is designed in such a way that it will ultimately take care of releasing the underlying COM object associated with a Runtime Callable Wrapper (RCW) when the RCW is no longer “reachable”, i.e. it has gone out of scope or has been assigned null (in C#). I say “ultimately” because the COM object is released during a lazy process called “finalization”, which may require multiple GCs before it occurs for a given object. Finalization happens on a separate thread and there are no guarantees about the order in which objects will be finalized. When an RCW is finalized the runtime calls Release on the underlying COM object, which decrements its reference count and, once that count reaches zero, frees the resources associated with it. If a developer wants to release the COM object associated with an RCW deterministically they have to call System.Runtime.InteropServices.Marshal.ReleaseComObject explicitly, and of course they need a reference to the RCW to call it on. In the code above there are multiple RCWs for which there are no explicit references, i.e. local variables, on which ReleaseComObject can be called. One might think that one could just call it on the previously used expression, e.g. Marshal.ReleaseComObject(activeSheet.UsedRange.Rows), but there is no guarantee that a new RCW won’t be created just so that one can call ReleaseComObject on it. In the above example, if I call ReleaseComObject on only those RCWs that I have assigned to local variables, some COM objects will not be cleaned up until their RCWs are finalized.

So, if the .NET Common Language Runtime (CLR) takes care of this for me, why on earth would I ever want to do it myself? Other than simply having more control over the lifetime of the unmanaged resources being used by your application, there are also potential bugs that can be introduced if COM objects are not deterministically cleaned up. One example that I have seen is that Excel will not quit until all COM objects have been cleaned up, which can result in Excel hanging if it is programmatically shut down but the RCWs are not finalized in a specific order. The common wisdom seems to be that one should always call ReleaseComObject on all RCWs that do not “escape” the scope in which they are declared, in the opposite order to that in which they were instantiated (despite MSDN suggesting that one should only call ReleaseComObject “if it is absolutely required”). This is typically done in a finally block and requires that local variables are explicitly declared for all RCWs.

   
So, following this guidance, I would have to rewrite the code above as follows:

private void ColorPriority(string sheetName)
{
    Sheets sheets = null;
    Worksheet sheet = null;
    Worksheet activeSheet = null;
    Range usedRange = null;
    Range rows = null;

    try
    {
        sheets = _excelWorkBook.Sheets;
        sheet = sheets[sheetName];
        sheet.Select();
        activeSheet = _excelWorkBook.ActiveSheet;

        var nCol = Col("Priority");
        usedRange = activeSheet.UsedRange;
        rows = usedRange.Rows;

        for (var nRow = 2; nRow <= rows.Count; nRow++)
        {
            Range cells = null;
            Range cell = null;
            Interior interior = null;

            try
            {
                cells = activeSheet.Cells;
                cell = cells[nRow, nCol];
                interior = cell.Interior;
                switch ((string)cell.Value)
                {
                    case "Critical":
                        interior.Color = Red;
                        //...
                        break;
                    case "High":
                        interior.Color = Orange;
                        //...
                        break;
                }
            }
            finally
            {
                if (interior != null) Marshal.ReleaseComObject(interior);
                if (cell != null) Marshal.ReleaseComObject(cell);
                if (cells != null) Marshal.ReleaseComObject(cells);
            }
        }
    }
    finally
    {
        if (rows != null) Marshal.ReleaseComObject(rows);
        if (usedRange != null) Marshal.ReleaseComObject(usedRange);
        if (activeSheet != null) Marshal.ReleaseComObject(activeSheet);
        if (sheet != null) Marshal.ReleaseComObject(sheet);
        if (sheets != null) Marshal.ReleaseComObject(sheets);
    }
}

Wow, that is far uglier and more verbose than the original VBA! You may also note that this code introduces a potential bug: are the COM objects referenced by activeSheet and sheet actually the same COM object? If they are, a System.Runtime.InteropServices.InvalidComObjectException may be thrown when I try to release it a second time. What would be ideal is if I could still deterministically manage the lifetime of the COM objects while maintaining the simplicity of the original VBA code.

The solution I came up with was to create a COM Object Manager class that takes care of all this for me. My solution makes use of the Dispose Pattern, the “using” statement, and C# operator overloading. Using my ComObjectManager my code looks like the following:

private void ColorPriority(string sheetName)
{
     //Critical: Red
     //High: Orange
     using(var cm = new ExcelComObjectManager())
     {
         (cm < (cm < _excelWorkBook.Sheets)[sheetName]).Select();
         var activeSheet = cm < _excelWorkBook.ActiveSheet;
         var nCol = ColumnIndex("Priority");
         for(var nRow = 2; nRow <= (cm < (cm < activeSheet.UsedRange).Rows).Count; nRow++)
         {
             var cell = cm < (cm < activeSheet.Cells)[nRow, nCol];
             switch((string)cell.Value)
             {
                 case "Critical":
                     (cm < cell.Interior).Color = Red;
                     AddFormattedComment(cell, "Priority is Critical. Typically this indicates a status item that needs follow up.");
                     break;
                 case "High":
                     (cm < cell.Interior).Color = Orange;
                     AddFormattedComment(cell, "Priority is High. Typically this indicates that the status of this item should be tracked.");
                     break;
             }
         }
     }
}

Yes, this does introduce some new syntax which requires the use of the ‘<’ operator and parentheses, but the code mostly maintains its original simplicity (in my opinion anyway).

Note: The above example demonstrates a potential issue; it uses one ComObjectManager for the entire function, so on every iteration of the loop objects are added to the manager, which effectively “roots” them outside of their original scope. These objects might have been cleaned up earlier had they not been added to the manager. The solution is to instantiate a ComObjectManager inside the body of the loop, as sketched below. This is not a rule of thumb though, since there is a significant cost associated with creating the manager and then disposing of it; it is something that needs to be tuned for the scenario at hand.
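For illustration, the per-iteration variant would look something like the following sketch (it assumes that rowCount has been captured once before the loop, and that activeSheet, nCol, Red and Orange are in scope as in the example above):

// Sketch: scope a second manager to each iteration so the per-cell COM objects
// are released as soon as the iteration completes, rather than at the end of the method.
for (var nRow = 2; nRow <= rowCount; nRow++)
{
    using (var cmRow = new ExcelComObjectManager())
    {
        var cell = cmRow < (cmRow < activeSheet.Cells)[nRow, nCol];
        switch ((string)cell.Value)
        {
            case "Critical":
                (cmRow < cell.Interior).Color = Red;
                break;
            case "High":
                (cmRow < cell.Interior).Color = Orange;
                break;
        }
    } // cmRow.Dispose() releases this iteration's RCWs here.
}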

 
The following is a simplified version of the ComObjectManager class. Note: ExcelComObjectManager inherits from ComObjectManager and only adds Excel-specific debugging capabilities.

public class ComObjectManager : IDisposable
{
    private Stack<dynamic> _objects;

    public ComObjectManager()
    {
        _objects = new Stack<dynamic>();
    }

    private bool _disposed = false;

    ~ComObjectManager()
    {
        DoDispose();
    }

    public void Dispose()
    {
        DoDispose();
        GC.SuppressFinalize(this);
    }

    protected virtual void DoDispose()
    {
        if (!this._disposed)
        {
            // Release the tracked COM objects in the reverse of the order in which they were added.
            while (_objects.Count > 0)
            {
                var obj = _objects.Pop();
                if (obj == null) continue;
                Marshal.ReleaseComObject(obj);
            }

            this._disposed = true;
        }
    }

    // Track the RCW (if it is not already tracked) and return it, so that the
    // operator can be used inline within an expression.
    public static dynamic operator <(ComObjectManager manager, dynamic comObject)
    {
        if (comObject == null) return null;

        if (manager._objects.Contains(comObject))
        {
            return comObject;
        }
        manager._objects.Push(comObject);
        return comObject;
    }

    // C# requires the '<' and '>' operators to be overloaded as a pair;
    // the '>' operator is not intended to be used.
    public static dynamic operator >(ComObjectManager manager, dynamic comObject)
    {
        throw new NotImplementedException();
    }
}

The use of the ComObjectManager as implemented above carries a significant performance penalty. As an optimization, my full implementation also performs the clean-up asynchronously using the new System.Threading.Tasks API. That does improve the performance of the clean-up, but the overhead of the mechanism is still high. One can further improve the robustness of the code by testing that each object being added is indeed a COM object, since a non-COM object may inadvertently be added. This, however, requires a call to Marshal.IsComObject each time an object is added, which is super-expensive and, in my opinion, not worth the performance hit.
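For what it is worth, here is a sketch of what that check might look like (illustrative only, not the code from the full implementation); the operator overload guards the push with Marshal.IsComObject, perhaps only in debug builds to avoid paying the cost elsewhere:

// Sketch only: reject non-COM objects before tracking them.
public static dynamic operator <(ComObjectManager manager, dynamic comObject)
{
    if (comObject == null) return null;

#if DEBUG
    // Marshal.IsComObject is expensive, so only pay for it in debug builds.
    if (!Marshal.IsComObject((object)comObject))
        throw new ArgumentException("Object is not a COM object", "comObject");
#endif

    if (!manager._objects.Contains(comObject))
        manager._objects.Push(comObject);
    return comObject;
}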

So why the title of this post? After doing all of this work I came to the conclusion that there is no real utility in using this mechanism in my add-in. The typical use case is that I open Excel, invoke the add-in’s main function, save the resulting worksheet, and then close Excel. How long it takes to import and format the data is far more important than how long it might take to clean up, which is ultimately irrelevant given that I close Excel when I am done. There may be cases where I would need it, e.g. if Excel remained open with the add-in running continuously over very large worksheets, or in other scenarios where COM Interop is used heavily. In this case, however, it was ultimately an exercise in over-engineering.

It was fun though!

Monday, June 20, 2011

A Taoist Definition of Software Architecture

The question that I am asked most often in job interviews is “What is a Software Architect?” I am never sure whether I am being asked this question so that the interviewer can compare my answer with their own understanding of the role, or because they don’t really have a clue and would like someone to explain it to them. I assume the former, though I have sometimes suspected the latter. Since I have had to answer this question so often, one would imagine that I have cogitated a suitable job-winning answer. One would imagine rightly (though it has not always won me the job). 

So what is a Software Architect then? Here are some possible definitions:

  • A Senior Senior Software Engineer.
  • A Software Engineer who can reconcile the business and technical requirements of a complex software system.
  • A Software Engineer who manages a team and owns a budget i.e. knows how to drive Microsoft Project.
  • An accomplished Software Engineer who has grown weary of writing code (Oh, say it isn’t so!).
  • A Software Engineer with an IASA, TOGAF, FEAC or similar certification.
  • A Senior Software Engineer who has a SOLID GRASP of leading-edge TLAs including OOP, SOA, AOP, UML, ALM, BDD, DDD, TDD and RUP.

Though there is some truth, and obviously a little humour, in the above definitions, I don’t think any of them hit the mark. Note that all of the above definitions contain the words “Software Engineer”; I believe strongly that Software Architects should be accomplished Software Engineers.

In my humble opinion the key attribute that differentiates the Software Architect from the Senior Software Engineer is the number of contexts that they take into consideration when designing software solutions and the processes to deliver and operate that software.

Or if you prefer a more Taoist definition:

Software Architecture is the process of designing and developing software in such a way that it will remain in harmony with the significant contexts within which it is created and runs over its entire lifetime.

Typically, when software is designed the contexts that are considered are the functional and non-functional requirements, and the financial and schedule constraints. The requirements are captured either upfront or over the course of the development process in a list of feature narratives, use cases, user stories, epics, diagrams or formal specification language artefacts; and typically only reflect the Technical and Business [Process] Contexts of the system.

The Financial Context is often expressed in the contract or project plan and is limited to an estimate of the software development cost and duration, an estimate of the infrastructure cost, and an, often limited, estimate of the operational cost of the system.

I say “limited” because the financial model often does not include how the operational, maintenance and support costs will change over the lifetime of the software. This includes how skills, and the demand for resources with those skills, will change over time; e.g. organizations that built Microsoft solutions with C++ and COM half a decade ago, in the belief that .Net was a passing fad or could not deliver the required performance, are now having a hard time finding skilled developers with domain expertise to maintain and extend their code, and the developers who do have C++ and COM skills are getting very expensive.

Outside of operating system and game development C++ is fast becoming the “new” COBOL, or more precisely what COBOL was at the turn of the century. That said, in light of some recent speculation about Windows 8 development, C# might be about to become the new C++, but I digress.

As an[other] aside, many requirements that typically make it into the non-functional category, including performance, scalability, security, privacy, maintainability, etc., clearly belong in the functional category because they all require architecture in the small; i.e. you have to code for them at the lowest level; you can’t successfully bolt them on later. I am not sure I even believe in non-functional requirements anymore; if it is non-functional then it is not a requirement at all.

The context outlined above is already a lot of context. Surely, given all this, one can design and develop usable, high-quality software that meets the client’s requirements? Perhaps, with a little bit of luck and a team of stars; but given the number of software development projects that still fail despite taking all this context into consideration (and all the advances that have been made in software development practices, processes, tools and technology), the odds are stacked against a successful outcome. So what contexts are not being considered that could turn failure into success?

I posit that it is the Human Context; or more accurately, the Social, Political and Cultural contexts in which the software has to exist. It is these often-ignored contexts that I believe are the most important; they should subsume all other thinking about how the software should be designed and developed.

Let me summarise and describe the contexts that I believe should be considered when designing software (in priority order).

The Social Context

Software is ultimately about improving people’s lives. One should always know whose lives are going to be affected by the software, and how. This also includes how the development of the software is going to affect people’s lives. This aspect is often overlooked, even by the most enlightened software architects; the software industry is infamous for eating its own young in an effort to deliver on time, on budget, to spec.

This also goes way beyond usability, user acceptance, and project governance. This is a “people first” approach to software design and development, from the people who develop the software and the various stakeholders, to the end users, and everyone in between.

The Political Context 

How often is it that a project fails because its executive sponsor is either promoted, moves to another division, or leaves the company? Was the project initiated primarily to further the political aims of the sponsor or the corporate cabal of which she is a member? What happens if and when the executive sponsor achieves her actual goal, or loses her influence or position to a rival? Are there any stakeholders that have a vested political interest in a project failing? It may turn out that the client/customer is optimizing for a political outcome rather than the successful delivery of the solution. If this is the case one should modify one’s architecture and execution plan to support their goal (assuming of course that it is not morally reprehensible!).

I have observed this enough that I now endeavour to ascertain the often subtle and hidden political driving forces on a project, and factor those into the design, development, and operation of the software. It always helps to optimize for the same thing that the [actual] client/customer is optimizing for.

Note that this is technically a sub-context of the Social Context above, but is important enough that it should be considered as a context in its own right.

The Cultural Context

This is best illustrated by an anecdote. In the late 90s I was a Development Consultant on a massive web portal project for a European telecommunications company. Despite the system’s complexity and the available budget, the project was poorly managed and there was no significant architecture for the system, and as a result the project was haemorrhaging cash. I could not understand why the client was allowing this to happen, until the executive sponsor took me aside and explained it to me: what they really required was the appearance that they were going to be first to market with a new technology. The quality of the solution, and its longevity, were mostly irrelevant. In the culture in which the solution was being developed and deployed, it was the perception that mattered most, not the actual product itself. Once I understood this I could optimize appropriately.

I have worked on software projects in Africa, Europe, North and South America, Australia and Asia and I have come to recognize the significant role that culture plays in the success or failure of a project. Ignore it at your own peril!

Note again that this is technically a sub-context of the Social Context above, but is important enough that it should be considered as a context in its own right.

The Financial Context

This is often a context that gets a lot of consideration, but as I have already alluded to above, it should cover the entire lifetime of the software.

The Business Context

This context represents the superset of Business Requirements. Nothing new here, but I continue to be amazed at how hard it is to extract the Business Requirements from stakeholders, regardless of whether waterfall or agile practices are being used.

The Technical Context

The usual suspects: Applications, Data and Technical Architectures (TOGAF), and the requirements they address.

Do note my comment above about functional versus non-functional requirements. 

A good software architecture will take all of the above into account. Obviously there is the risk of information overload; the number of contexts, and the contents of each, that need to be considered can quickly become overwhelming, and it can be very hard to filter the noise from the data. It is also easy to run out of time by fixating on a single sub-context, e.g. the minutiae of the technical requirements (my favourite). That is not an argument for ignoring any particular context; it just means that one needs a formal process for comparing and prioritizing elements from different contexts. I will describe the process that I use in a later post.

To keep in line with the current trend in naming software development methodologies and practices, I have named the approach described above “Gestalt-Driven Development”. I plan to elaborate on GDD in future posts.

Tuesday, June 14, 2011

“Must Have” .Net Tools

Every time I install Visual Studio 2010, which I seem to do fairly often, I find myself installing the same set of tools and add-ins. I have come to rely heavily on these tools and would be very grumpy if I had to do without them. Here is the first part of my list of “must have” tools and add-ins (in how-grumpy-I-would-be-if-I-had-to-do-without-them order):



  1. JetBrains ReSharper – This one is a total no-brainer. ReSharper is a Visual Studio Developer’s best friend. It does real-time analysis of your code looking for potential errors, optimizations and coding guideline violations. It gives you inline warnings and recommendations and will modify the code for you with a single click. It adds 40 refactoring commands, enhanced IntelliSense, and a bunch of other way-cool stuff. It is also an awesome tool for teaching developers how to write high-quality C# code. Well worth the $199 to $349 (depending on the licence). Note: Even Microsoft uses ReSharper internally; I have seen ReSharper artefacts in code I have received from them.
  2. DevExpress Productivity Tools – In the unlikely case that you can’t convince your manager to pay for ReSharper, DevExpress offers a free Visual Studio add-in that provides some of the features of their paid-for products, which offer similar functionality to ReSharper.
  3. LINQPad – This free tool is definitely a “must have” for all .Net developers. It is primarily a tool for querying data and services using LINQ but it is also the best .Net prototyping and discovery tool that I have found. It will even show you the IL that is generated for various F# language constructs, which is a great way to get an understanding of how all that functional coolness is implemented.
  4. IronPython Tools for Visual Studio – I have been hooked on IronPython for a few years now. I particularly like to use it for API discovery and for quickly coding up test harnesses. It is also easy to host the IronPython runtime in your own application. I have done this in the past to provide a command console and scripting interface for an application I was developing. This add-in makes IronPython almost a first class language in Visual Studio, and includes an integrated IronPython REPL.
  5. Sysinternals Process Explorer – Replace the default Windows Task Manager with the Sysinternals Process Explorer for significantly deeper insight into, and control over, the processes that are running on your workstation. You can suspend or kill a single thread within a process (at your own peril of course), or see which process is using that DLL that you want to replace.
  6. Microsoft Expression Blend – If you are doing any non-boring WPF or Silverlight UI design and development then Blend is a must have. Yes, you can do WPF development with Visual Studio, but if you want to unleash your inner Dieter Rams then Blend is the tool for you. The “Ultimate” version also includes an awesome UI prototyping tool called SketchFlow. Unfortunately there is no way to buy Expression Blend by itself, so if you don’t have an MSDN subscription then you are going to have to fork out the $599 for Expression Studio.
  7. F# PowerPack – A bunch of cool and useful extra goodies for F#, written by the F# Team.
  8. Reactive Extensions (Rx) – Everything Erik Meijer and his team touch turns to developer gold, and Reactive Extensions are no exception. From the people who brought you a little thingy called “LINQ” comes Rx; push-based, observable collections and a LINQ-based programming model to go with them.
  9. Spell Checker – Because typos in code comments are just not on! There are a few spell checkers available but I use the free one written by Roman Golovin and Michael Lehenl that is available through the VS Extension Manager. Note: If you are working on very large files you may want to disable it temporarily.



Friday, June 10, 2011

Some things don’t change… much

I recently did an impromptu C# code review on a large project on which I am playing Development Manager. Though the code was fairly well written, I did notice that the developer had exclusively used the foreach statement to iterate over arrays and collections, regardless of the contained type or the number of contained items. When I asked him whether he understood the performance impact of always using foreach he was not able to tell me.

In June 2003, when I worked on the .Net Common Language Runtime (CLR) Team at Microsoft, I wrote a paper in which I recommended that one use the for statement instead of foreach in performance-sensitive code paths. In early versions of C#, foreach was not optimized for simple cases where it was being used to iterate over arrays of built-in types. This was primarily due to unnecessary type instantiations, virtual function calls, and boxing and unboxing. This was substantially improved in later versions of C#, to the degree that iterating over an array of integers using for and foreach now has about the same performance characteristics, though the IL that is generated is subtly different; the foreach IL has a couple of extra instructions.

Note: There are a number of good tools that will show the IL that is generated by the C# compiler for a given function, but the tool I like the most is LINQPad. Though it is marketed as a tool for querying relational databases and web data services using LINQ, it is also an awesome .Net prototyping tool. It also supports F#, which I currently have a major crush on! I consider LINQPad a “must have” tool for every serious .Net developer.

As the use of foreach deviates from the scenarios that have been optimised for, the performance characteristics diverge significantly from those of for. That is not to say that foreach’s performance is always worse; there are cases where using foreach results in better performance; but the bottom line is that the IL that will be generated in each case will potentially be significantly different, which will more than likely result in significantly different performance characteristics.

So what is my point exactly?

I want to make it clear that I am not picking on foreach; I use it and other potentially expensive language constructs all the time in my own code, primarily because they result in more aesthetically pleasing, elegant code (hence my current love affair with F#). The point that I am remaking is that, if you care about the quality of your code, you need to have some idea of what your code is doing under the covers.

I will admit that as the language constructs have become more sophisticated, the addition of LINQ being a good example, it has become more and more time consuming to grok those constructs all the way down to the hardware, but it remains super easy to surround a number of implementations of a function with a high-resolution timer and measure the difference in performance.
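For example, a quick-and-dirty comparison along those lines (illustrative only; a serious benchmark would also account for GC pauses, multiple runs, and so on) might look like this:

using System;
using System.Diagnostics;

class IterationTimings
{
    static void Main()
    {
        var data = new int[10000000];
        for (var i = 0; i < data.Length; i++) data[i] = i;

        long total = 0;

        // Warm both code paths up once so that JIT compilation is not measured.
        foreach (var n in data) total += n;
        for (var i = 0; i < data.Length; i++) total += data[i];

        var sw = Stopwatch.StartNew();
        foreach (var n in data) total += n;
        sw.Stop();
        Console.WriteLine("foreach: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (var i = 0; i < data.Length; i++) total += data[i];
        sw.Stop();
        Console.WriteLine("for:     {0} ms", sw.ElapsedMilliseconds);

        // Use 'total' so the loops cannot be optimised away entirely.
        Console.WriteLine(total);
    }
}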

Some things don’t change… much.