I've been quietly lobbying my programmer friends on the utility of the old "goto" keyword. I've gotten a little pushback from them, mostly dismissive scorn, but hey, not everyone can be good. :) I should start a series on this, and this will serve as my first post on the glories of GOTO.
The other day I posted an image on Twitter showing a good use case for a goto statement.
This is a utility method I use when parsing XML documents into objects. I received some pushback on that tweet, mainly from my friend @gbattle. The requirements I gave to Greg were: "no nested ifs, no multiple exits, no unnecessary object assignments and better error explanation".
He posted his own code for doing the same thing without using gotos. There were a couple of points where he was incorrect, and I wanted to walk through his post. I'll start with the trivial points and work up to the ones about performance and readability. Most of his points stand regardless of the library being used. There was only one point, property access, that behaves differently in .NET than in, say, C++.
I will accept from Greg that breaking each error condition down into exactly what caused the specific failure might be better, but on a utility method like this, if I can wrap up the failure condition in a single statement that is accurate, I'm fine with it. "Node should have one child CDATA section with data." Sure, telling the user specifically which three words of the all-inclusive error applied might be better, but any software dev who can't follow that error message and look at the node that caused the failure prolly should not be programming anyway.
// For style reasons, I always put the constant on the lhs within an if statement
// for the compiler to automatically discover nasty undesired assignment errors
if (1 != node.ChildNodes.Count)
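I understand the sentiment behind putting constants on the lhs, and in my C++ days I was an advocate of this pattern; however, in C# an accidental assignment inside an if condition is a compile error, because the condition must be a bool and an int will not implicitly convert to one. The only case that slips through is assigning to a bool. (Which was extremely frustrating starting out, because I wanted to do it sometimes.)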
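Here is a minimal sketch of that compiler behavior. The class and method names are made up for illustration; the commented-out line is the one the C# compiler rejects.

```csharp
using System;

static class AssignmentInIfDemo
{
    // Hypothetical helper, only here to show what the compiler accepts.
    public static bool IsSingle(int count)
    {
        // if (count = 1) { }
        // The line above does not compile: error CS0029,
        // cannot implicitly convert type 'int' to 'bool'.

        bool isSingle;

        // Assignment inside the condition is legal only because the
        // assigned expression is itself a bool.
        if (isSingle = (count == 1))
        {
            Console.WriteLine("exactly one child");
        }

        return isSingle;
    }
}
```

So the constant-on-the-left habit buys you very little in C#; the type system already catches the classic `=` versus `==` slip for everything except bools.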
// There is no reason for you to create a new XmlNode return the value.
// You waste time creating a new object and the assignment. Not needed.
XmlNode dataNode = node.FirstChild;
The assignment operator doesn't create a new object. It copies a reference (a pointer, in C++ terms) to the object returned by the FirstChild property. Holding a reference to a property's result is the fastest way to use that result when you need it more than once. To read a property, the runtime first must resolve the memory location of the containing object, then look into its method table and call the accessor (get) method, which walks more memory offsets to return an object. Rather than repeatedly calling the property's get method, I store the result in a stack variable that refers to the actual data returned by the property call. (I'm ignoring the code inlining that might occur at run time, but even that doesn't match the speed of storing the reference locally.)
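A quick way to convince yourself that the assignment only copies a reference is to compare identities. This is a sketch with a made-up class name and a made-up sample document:

```csharp
using System;
using System.Xml;

static class ReferenceCopyDemo
{
    // Hypothetical demo method: returns true when the local variable and the
    // property refer to the very same XmlNode instance.
    public static bool LocalIsSameObject()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<item><![CDATA[payload]]></item>");
        XmlNode node = doc.DocumentElement;

        // One call to the get accessor; the local holds the returned reference.
        XmlNode dataNode = node.FirstChild;

        // No new XmlNode was allocated by the assignment above.
        return object.ReferenceEquals(dataNode, node.FirstChild);
    }
}
```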
Now the money shot for why gotos are good in this situation is something his code did not account for. He checks the number of ChildNodes on the parent node, but if it comes out to zero, he has opened his code up to a NullReferenceException. If ChildNodes.Count == 0, then FirstChild will be null. Checking FirstChild.Value when FirstChild is null will result in an exception. And therein lie the nested if statements that I'm trying to avoid.
If ChildNodes.Count is 0 (or, more generally, anything other than 1, the node count we are looking for), then this utility method must branch: either a quick return, or a nested if statement that continues checking the validity of the XmlNode only when it has the correct number of ChildNodes.
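For readers who never saw the tweet, here is roughly the shape I'm arguing for. Treat it as a sketch and not the exact method from the image: the class name, method name, and the choice of XmlException are assumptions made for this example.

```csharp
using System;
using System.Xml;

static class CDataReader
{
    // Hypothetical name and signature; a sketch of the goto style,
    // not the original utility method.
    public static string ReadSingleCData(XmlNode node)
    {
        // Guard first: with zero children FirstChild is null,
        // so nothing below may touch it.
        if (node.ChildNodes.Count != 1)
            goto Invalid;

        // Cache the property result; this copies a reference, not the node.
        XmlNode dataNode = node.FirstChild;

        if (dataNode.NodeType != XmlNodeType.CDATA)
            goto Invalid;

        if (string.IsNullOrEmpty(dataNode.Value))
            goto Invalid;

        return dataNode.Value;

    Invalid:
        // One shared failure path with one accurate message.
        throw new XmlException("Node should have one child CDATA section with data.");
    }
}
```

Every check is flat, nothing is nested, and all failures funnel into a single labeled failure path. Calling it on an empty element such as `<empty/>` hits the first guard and throws the XmlException instead of a NullReferenceException.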
I presented my Garbage Collection/Memory Management talk this last weekend at the NYC Code Camp at the Microsoft Office. I had a great time and it was enjoyable catching up with old friends and meeting new people.
To the gentleman who was asking about the Large Object Heap and DataSets: a DataSet will not be placed on the LOH. A value of a DataColumn instance might be placed there, but the DataSet itself has a footprint much smaller than the 85,000 bytes required to be placed there.
The Large Object Heap is a heap structure used by .NET to allocate large objects, those of 85,000 bytes or more. It is not the size of the object graph that places an object on the Large Object Heap. Translated to this example, the memory allocation of the DataSet is NOT the size of all the rows and all the columns. The size of the object is determined immediately before the constructor executes (before the DataSet has any rows). This is determined purely by the size of all the fields in the object, plus some additional object overhead.
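You can see the difference between an object and its object graph with GC.GetGeneration, which reports LOH objects as generation 2. The sketch below uses a made-up Table class as a stand-in for a container like a DataSet; the 200,000-byte payload size is arbitrary.

```csharp
using System;

// Hypothetical stand-in for a container such as a DataSet:
// a small object whose field references a large array.
sealed class Table
{
    public readonly byte[] Payload = new byte[200000];
}

static class LohDemo
{
    // The array is far above the threshold, so it is allocated on the LOH
    // and is reported as the highest generation.
    public static int GenerationOfPayload()
    {
        var table = new Table();
        return GC.GetGeneration(table.Payload);
    }

    // The container itself is only a header plus one reference field, so it
    // lives on the small object heap (typically generation 0 when new).
    public static int GenerationOfContainer()
    {
        var table = new Table();
        return GC.GetGeneration(table);
    }
}
```

The container stays small no matter how much data hangs off it; only the individual allocation that crosses the threshold goes to the LOH.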
When the memory size is determined and is at or above the (current) threshold of 85,000 bytes, the object is placed on the Large Object Heap. The LOH is very similar in behavior to the Small Object Heap (the one covered in the session), with the ptrNextObj and serial allocation. However, it does not have generational support and, most importantly, in the 2.0 runtime the LOH is never compacted.
The lack of compaction is what leads to out of memory exceptions: the LOH becomes fragmented, and with new allocations always appearing at the end of the heap, you can run out of memory even though free space remains in the gaps.
Using the new 2.0 MemoryFailPoint object, you can test for available memory before an allocation and get an InsufficientMemoryException up front rather than the more fatal OutOfMemoryException partway through.
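A minimal sketch of that pattern follows. MemoryFailPoint and InsufficientMemoryException are the real types from System.Runtime and System; the wrapper class, method name, and the idea of returning a bool are assumptions made for this example.

```csharp
using System;
using System.Runtime;

static class FailPointDemo
{
    // Hypothetical helper: runs the work only if the runtime believes
    // roughly sizeInMegabytes of memory can be made available.
    public static bool TryRun(int sizeInMegabytes, Action work)
    {
        try
        {
            // The constructor throws InsufficientMemoryException when
            // the requested amount does not appear to be available.
            using (new MemoryFailPoint(sizeInMegabytes))
            {
                work();
                return true;
            }
        }
        catch (InsufficientMemoryException)
        {
            // Recoverable: nothing was allocated yet, so back off or retry later.
            return false;
        }
    }
}
```

The gate is only an estimate made before the work starts; it does not reserve the memory, and it does nothing about fragmentation that already exists.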
This is only a tease, because you still can't force a compaction of the LOH. It is valid, in some scenarios, to force a GC.Collect to help with the SOH and free up some memory. However, the SOH is usually handled quite well by the CLR's collector, and calling GC.Collect is nothing more than Jazz Hands to make you feel important. Don't call it, ever...unless you have a good reason...but never call it.