Vasu Balakrishnan’s Blog

Multithreading

leave a comment »

Here is an excellent series of posts about Multithreading. I’ve just started reading the first post. This is a must read.

As I’m reading these posts, I’m going to log my observations.

  • There are four main aspects to a multi threaded application – state, atomicity, serializability, linearizability
  • In Unix, process is the fundamental unit of concurrency, whereas in Windows its the threads. The post says Windows chose this approach because its “lighter” to create processes in Unix than in Windows. I’m not sure what “lighter” means. How Unix Process is architecturally different from a Windows Process?
  • A thread could be either a foreground / background thread and that a background thread doesn’t prevent a process from exiting.
  • ThreadPool provides the application with a pool of worker threads. They are background threads, use default stack size, runs at default priority and is in multithreaded apartment. There is only one thread pool per application.
  • A stack is a contiguous region of memory that grows downward and is used for several operations. The default stack size is 1MB. In C#, you can allocate a block of memory in stack using stackalloc. They are not subject to garbage collection and lifetime is limited to the method in which it is defined.
  • In using Stack as storage post, the author says if you’ve a static field marked as ThreadStatic, you should always check for lazy initialization. The initialization code will run only once, and if we didn’t have lazy initialization the second thread would throw an error. That’s why he has the field wrapped in a static property. Wow!!! I wouldn’t have guessed it.
  • In a multithreaded world, problems happen when you’ve several threads reading and writing or several threads writing to a shared memory location. The author says if we build immutable types, then we don’t need to synchronize access to these items. There are two types of immutable – shallow and deep. A type is shallow immutable if each field of an object never changes during its lifetime. The object is deeply immutable if each field references other objects which don’t change over time too. CLR supports limited immutability via read only fields. There is no CLR support for ensuring deep immutability.
  • Synchronization can be classified into two broad sub areas : data / control. The Data is more commonly used where we ensure that only one thread accesses data at each time. Control is used when one thread depends on other having executed some piece of code.
  • Data Synchronization relies on serializing access to shared state. This is done by creating Critical regions, which delimits several instructions that are executed by a specific thread in a serialized fashion. The usage pattern is simple – thread enters critical region, executes several instructions and leaves the critical region (assuming that there are no exceptions and thread isn’t aborted. I’ve to find out what happens in this scenario)
  • Control Synchronization is appropriate when you need to ensure collaboration between several threads. In this case, we will have threads that will need to wait while some condition isn’t true. There are several ways to implement wait logic
  • busy waiting – you’ll have an infinite cycle which will run until specific condition is true. This might be useful if the thread can do some useful work while waiting. Otherwise, its not a good approach to just keep spinning without any work.
  • real wait – In this case, OS puts the thread in a real wait state so that it wont consume processor time, but will incur thread context switch costs.
  • continuation passing – This relies on passing a closure for being executed when other specific action ends. This might be a good option where waiting is too expensive. Shared state is encapsulated in messages which are passed between threads. This topic is something that I’ve been trying to understand. I didn’t expect it to be used in this context.
  • Kernel objects – We can use kernel objects to achieve Data / Control Synchronization. There are several kernel objects. In traditional Win32, each kernel object is represented by a HANDLE, and can be in one of two states – signaled and non-signaled. Transition between the states depends on the type of kernel object we’re using. We are interested only in waiting on Synchronization kernel objects – mutexes, semaphore, event (auto/manual) and timers. The .NET framework introduces a base class (WaitHandle) on which we can wait, that is extended by mutexes, semaphore and events. AutoResetEvent / ManualResetEvent extends the EventWaitHandle. A thread gets blocked waiting for a kernel object to transition from non-signaled to signaled state, it might wake up without a corresponding transition. This happens if the thread performs so called “alertable wait”. He says in managed code, thread always performs alertable wait. I’m not really sure what they are and to me it seems to defeat the purpose of waiting on a kernel object.
  • Advertisements

    Written by nbvasu

    June 12, 2009 at 2:36 pm

    Posted in C#

    Checked Exceptions in C#

    leave a comment »

    I was wondering why C# doesn’t have Checked Exceptions like Java. I’ve heard arguments about how Checked Exceptions makes it clear as to what exceptions you can expect from calling a code.

    I did a quick search over in Stackoverflow and people do have asked the same question. I then followed this link here, and after reading it makes sense why C# language designers opted out. Also there was article from Bruce Eckel about this too. Both are worth a read.

    Written by nbvasu

    June 12, 2009 at 11:25 am

    Posted in C#

    Learning F#

    leave a comment »

    Since I’ve started to use LINQ, I was interested to learn about Functional Programming. Its a programming paradigm that embraces functions as first class citizens, enables higher order functions, avoids mutable state, etc. The more I started to use in my projects, I began to appreciate the functional style of programming, and  that’s how I got me learn F#.

    We have built a statistical module in C# to calculate mean, standard deviation, etc. I wanted to try implementing them in F#.

    module stats =
    
        let sqr x = x * x
        let mean (data : vector)  = data |> Seq.average
        let max  (data : vector)  = data |> Seq.max
        let min  (data : vector)  = data |> Seq.min
        let range (data : vector) = max data - min data
    
        let avggain (data : vector) =
            data |> Seq.filter (fun x -> x > 0.0) |> Seq.average
    
        let avgloss (data : vector) =
            data |> Seq.filter (fun x -> x < 0.0) |> Seq.average
            
        let variance (data : vector) = 
            let mean = mean(data)
            (data |> Seq.sum_by(fun x -> sqr(x - mean))) / float (data.Length - 1)
    
        let stddev (data : vector) = sqrt(variance(data)) 
        let stddev3 (data : vector) = stddev(data) * sqrt(3.0)
        let stddev12 (data : vector) = stddev(data) * sqrt(12.0)
    
        let skew (data : vector) = 
            let mean = mean(data)
            let stddev = stddev(data)
            let sum = data |> Seq.sum_by(fun x -> ((x - mean) / stddev) ** 3.0)
            let fact = (float data.Length / float ((data.Length - 1) * (data.Length - 2)))
            sum * fact
    
        let kurtosis (data : vector) =
            let mean = data |> mean
            let stddev = data |> stddev
            let len = float data.Length
            let fact1 = len * (len + 1.0) / ((len - 1.0) * (len - 2.0) * (len - 3.0))
            let fact2 = 3.0 * Math.Pow((len - 1.0), 2.0) / ((len - 2.0) * (len - 3.0))
            let sum = data |> Seq.sum_by(fun x -> Math.Pow(((x - mean) / stddev), 4.0))
            sum * fact1 - fact2
    

     

    Here is how you use them.

        let data = vector [0.20; -2.70; 1.59; 0.74; 0.31; 1.87]
        printfn "Input Data\n %A"  (data)
        printfn "Mean - %A"  (stats.mean data)
        printfn "Max - %A" (stats.max data)
        printfn "Min - %A" (stats.min data)
        printfn "Range - %A" (stats.range data)
        printfn "AvgGain - %A" (stats.avggain data)
        printfn "AvgLoss - %A" (stats.avgloss data)
        printfn "Variance - %A" (stats.variance data)
        printfn "StdDev - %A" (stats.stddev data)
        printfn "StdDev3 - %A" (stats.stddev3 data)
        printfn "StdDev12 - %A" (stats.stddev12 data)
        printfn "Skew - %A" (stats.skew data)
        printfn "Kurtosis - %A" (stats.kurtosis data)
    
        
    

    Its pretty straight forward code. I could’ve done the same in C# and with LINQ it would be pretty much be the same number of Lines. The one thing I like in this implementation is pipe forwarding operator (|>) in F#. To me it makes it so much easier to read the code. I wish I could do that in C#.

     

    Written by nbvasu

    June 10, 2009 at 4:35 pm

    Posted in F#

    Stackoverflow

    leave a comment »

    I’ve recently started to post answers at Stackoverflow.

    I think its a good way to contribute to the developer community and is a great medium to exchange ideas / solutions. I like their interface and is well done.

    Its been a week since I started to post, I’ve got a score of 145. Check out my profile here

    Written by nbvasu

    June 10, 2009 at 12:21 pm

    Posted in General

    Differences Between WPF and Silverlight

    leave a comment »

    Check this out here

    Written by nbvasu

    June 10, 2009 at 11:57 am

    Posted in General

    C# Yield Keyword

    leave a comment »

    According to MSDN “Used in an iterator block to provide value to the enumerator object or to signal the end of iteration”. But there is much more to this keyword, which I found is not explained well (at least to me) in the documentation. You’ll see that yield keyword is sort of foundation to LINQ concepts and introduces the notion of lazy evaluation (which people familiar with Functional Programming know).

    Yield allows one enumerable function to be implemented in terms of another. It avoids lot of code that we had to write in order to expose a custom enumerator. This was introduced in C# 2.0 and back then I was able to write the code below

    public class MyCollection<T>
    {
        private List<T> Items;
    
        ...
    
        public IEnumerable<T> FindAll(Predicate<T> pPredicate)
        {
            foreach (T lItem in Items)
            {
                if (pPredicate.Invoke(lItem))
                    yield return lItem;
            }
        }
    }

    I didn’t pay attention to this until I started to use LINQ / Functional Programming concepts. The reason is yield return exhibits a lazy evaluation / deferred execution behaviors. I didn’t get it at first. For this I had to understand the behavior of the IEnumerator.

    Enumerators are used to read data in a collection. Initially, the enumerator is positioned before the first element. You must call MoveNext to advance the enumerator and retrieve the value using Current property. MoveNext tells you whether you’ve hit the end of collection by returning false.

    When you use yield keyword, the compiler internally creates an Enumerator class that keeps the state of current iteration. When you are iterating a collection that is implemented using yield return, you’re moving from item to item in the Enumerator using MovingNext method. That’s the key right here for the lazy evaluation.

    When the code encounters a yield return statement, it gets the item from the enumerator and returns it to the caller. Any other code following the yield statement is not executed and the next item is not processed, until the caller requests for the next item. This is where lazy evaluation comes into picture. You don’t have to enumerate all the items in the collection, and you do so only if the caller asks for it.  Also, when a method has a yield statement, the execution of method doesn’t begin until the calling method first iterates over the resulting IEnumerable. This is known as Deferred Execution. With these, you can chain iterators without actually enumerating anything until you actually need them.

    This is the foundation block for LINQ. LINQ Operators like Select, Where, etc are implemented as iterators that are created with yield statement.

    For further reading, I would recommend these articles

    Written by nbvasu

    May 20, 2009 at 3:35 pm

    Posted in C#

    Project Euler #30

    leave a comment »

    Problem: Find the sum of all the numbers that can be written as the sum of fifth powers of their digits.

    Answer: 443839

    Solution:

    My first pass was to check all numbers between 2 and 1 million. Then I realized that I could reduce the range. The max value for a digit is 9^5 (59049) and the max number could be 59049 times number of digits. This is more of a brute force. I’m sure there is a better way to get down the max range. I’ve reused my ToDigits extension method.

    var x =  Enumerable.Range(2, 295245)
              .Where(pArg => pArg.ToDigits().Sum(pArg1 => Math.Pow(pArg1, 5)) == pArg)
              .Select(pArg => pArg);
    var lAnswer = x.Sum();

     

    Time: 930 ms

    Written by nbvasu

    May 19, 2009 at 10:47 pm

    Posted in C#, Euler