GIENO MIAO'S BLOG: January 2009

Friday, January 30, 2009

Tips for Memoization

Introduction
We’ve been talking about functional programming quite a bit already. One of the things used frequently in functional programming is recursion, instead of imperative loop constructs. Both have their advantages, but often recursive techniques can cause significant degradations in performance. The prototypical sample is the computation of the Fibonacci sequence (a typical interview question, too). In mathematical terms, Fibonacci is expressed as:
fib : N -> N
fib 1 = 1
fib 2 = 1
fib n = fib (n - 1) + fib (n - 2), n > 2
Translating this directly into functional style of code yields the following (C#):
Func<uint, uint> fib = null;
fib = n => n <= 2 ? 1 : fib(n - 1) + fib(n - 2);

The reason we need to spread this across two lines is interesting by itself. If we don’t do this, the following error appears:

memoize.cs(17,48): error CS0165: Use of unassigned local variable 'fib'

referring to the use of fib inside the lambda body in this code:

Func<uint, uint> fib = n => n <= 2 ? 1 : fib(n - 1) + fib(n - 2);

The error pops up because we’re defining a function in terms of itself, something that’s invalid in a variable declaration/assignment in C#, just like the following is invalid:

int a = a + b;

F# addresses this through the use of the rec keyword, but that’s a separate discussion. But what are we really doing when declaring the following?

Func<uint, uint> fib = null;
fib = n => n <= 2 ? 1 : fib(n - 1) + fib(n - 2);

Here’s the answer:

Notice the <>c__DisplayClass1, a closure. When assigning the lambda on the second line to fib, we’re capturing the fib variable itself as it appears in the lambda’s body. In more detail, this happens:

On lines IL_0007 to IL_0009 we store null as the value for fib, immediately replacing it on lines IL_000e to IL_001b with a new function retrieved from <Main>b__0. This is where the code becomes self-referential:

As you can see on lines IL_0005 and IL_0013, we’re loading the variable that Main assigned to (but this code by itself doesn’t know that) in order to call it 4 lines further on, through the delegate. The rest of this code is a trivial translation of the ternary operator. Why is this interesting at all? It turns out this will be fairly important further on in this article, as we’ll want to tweak this function.

What’s memoization?
Looking at our Fibonacci sequence sample again, try to imagine the call tree that results from a call like fib(10). Or let’s simplify it, consider Fib(5). Here’s the call tree:
Fib(5)
    Fib(4)
        Fib(3)
            Fib(2)
            Fib(1)
        Fib(2)
    Fib(3)
        Fib(2)
        Fib(1)
We’re calculating things over and over again. So how can we solve this? First of all, by embracing an imperative style, at the cost of the more declarative natural mapping of the original recursive definition:
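uint Fib(uint n)
{
    if (n <= 2)
        return 1;
    else
    {
        uint prev = 1;
        uint curr = 1;
        // Each iteration advances the pair (prev, curr) one step along the sequence.
        for (uint i = 0; i < n - 2; i++)
        {
            uint t = curr;
            curr += prev;
            prev = t;
        }
        return curr;
    }
}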

Measuring success
Before we claim things like “10 times better”, we should establish a baseline for comparison and create a mechanism to measure our success. As usual, we’ll rely on the System.Diagnostics.Stopwatch class to do so:
static void Test(Func<uint, uint> fib)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();

    var res = from x in Range<uint>(1, 40, i => i + 1)
              select new { N = x, Value = fib(x) };
    foreach (var i in res)
        Console.WriteLine("fib(" + i.N + ") = " + i.Value);

    sw.Stop();
    Console.WriteLine(sw.Elapsed);
}
In here I’m using a generalization of Enumerable.Range that I find useful (although there’s no real need for a range of uint as the input here; our function could just as well be Func<int, uint>):

static IEnumerable<T> Range<T>(T from, T to, Func<T, T> inc) where T : IComparable<T>
{
    for (T t = from; t.CompareTo(to) <= 0; t = inc(t))
    {
        yield return t;
    }
}

Call Range “For” instead and it becomes very apparent what it’s all about, doesn’t it? Let’s take a look at how our current implementation does:

Func<uint, uint> fib = null;
fib = n => n <= 2 ? 1 : fib(n - 1) + fib(n - 2);
Test(fib);

Yes, it’s fine to say it: in one word, terrible…

Injecting the memoizer
As mentioned before, our strategy to tackle this inefficiency will be to trade instructions for memory, essentially keeping calculated values in some kind of cache. The built-in collection type that’s ideal for this purpose is obviously the generic Dictionary<TKey, TValue> in System.Collections.Generic. But how do we get it into our function definition seamlessly? In other words, given any function of arity 1 (meaning it takes in one argument; we’ll look at extending that further on), how can we sprinkle a little bit of memoization on top without changing the outer contract of the function? Here’s the code that allows us to preserve the signature but slice the memoizer in between the original function and the memoized one:

static Func<T, R> Memoize<T, R>(this Func<T, R> f)
{
    Dictionary<T, R> cache = new Dictionary<T, R>();
    return t => {
        if (cache.ContainsKey(t))
            return cache[t];
        else
            return (cache[t] = f(t));
    };
}

Actually this code can be optimized a little further using the TryGetValue method on the Dictionary class, and if you have more taste than me the else-block statement can be written in a nicer way (if I were in a real evil mood, I’d have put it in a ternary operator conditional). I’ll leave such rewrites to the reader as an additional opportunity to express personal style :-).
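For readers who want to see the TryGetValue variant spelled out, here is a minimal sketch. The class name MemoizeExtensions and the method name MemoizeTryGet are made up for this illustration and are not part of the original code; the only change in behavior is that a cache hit costs one dictionary lookup instead of two.

```csharp
using System;
using System.Collections.Generic;

static class MemoizeExtensions
{
    // Hypothetical name; same contract as Memoize, but a hit needs a single lookup.
    public static Func<T, R> MemoizeTryGet<T, R>(this Func<T, R> f)
    {
        var cache = new Dictionary<T, R>();
        return t =>
        {
            R result;
            if (!cache.TryGetValue(t, out result))
            {
                // Cache miss: compute once and remember the value.
                result = f(t);
                cache[t] = result;
            }
            return result;
        };
    }
}
```

Calling the returned delegate twice with the same argument should invoke the wrapped function only once.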

Notice how the signature of the returned function is the same as the original one: that’s what makes our implementation seamless and transparent. I’m writing this as an extension method on Func<T, R>, but there’s no need to do it that way. What’s more important though is how it works internally. Again you can see closures at work, because what we’ve really created here is something that looks like this:

class Memoizer<T, R>
{
    private Dictionary<T, R> _cache = new Dictionary<T, R>();
    private Func<T, R> _f;

    internal Memoizer(Func<T, R> f)
    {
        _f = f;
    }

    internal R Invoke(T t)
    {
        if (_cache.ContainsKey(t))
            return _cache[t];
        else
            return (_cache[t] = _f(t));
    }
}

You can look at it as lifting an existing function into the memoizer (one per function, as we need a unique cache on a function-per-function basis). Obviously you’ll need similar implementations for other function arities (including the zero-argument function, typically used for delayed computation scenarios). Here another issue pops up: the lack of Tuple types (with proper implementations for Equals and GetHashCode) that would be useful in such a case to express the dictionary’s key type. Even more, the debate on how many generic overloads to provide (Action, Func, Tuple, etc.) enters the picture again. Unfortunately the type system isn’t rich enough to express a single “Tuple” type with a variable number of type parameters. At runtime there are ways to get around this, but then we enter dynamic meta-grounds again, so let’s not deviate from our path this time and keep that discussion for another time.
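As an aside, later versions of C# (7.0 and up) added value tuples, which compare and hash structurally and can therefore serve as the dictionary key. The sketch below assumes such a compiler, which postdates this post; the names Memoize2Extensions and Memoize2 are hypothetical and chosen only for this example.

```csharp
using System;
using System.Collections.Generic;

static class Memoize2Extensions
{
    // Hypothetical two-argument variant: the (T1, T2) value tuple is the cache key.
    public static Func<T1, T2, R> Memoize2<T1, T2, R>(this Func<T1, T2, R> f)
    {
        var cache = new Dictionary<(T1, T2), R>();
        return (a, b) =>
        {
            var key = (a, b);
            if (!cache.TryGetValue(key, out R result))
            {
                // First time this argument pair is seen: compute and store.
                result = f(a, b);
                cache[key] = result;
            }
            return result;
        };
    }
}
```

One such overload is still needed per arity, so the debate about how many overloads to ship does not go away.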

Putting it to the test
Back to our original code:

Func<uint, uint> fib = null;
fib = n => n <= 2 ? 1 : fib(n - 1) + fib(n - 2);
Test(fib);

Easy as it seems, you might think the following will do the trick:

Func<uint, uint> fib = null;
fib = n => n <= 2 ? 1 : fib(n - 1) + fib(n - 2);
Test(fib.Memoize());

but unfortunately you won’t see any noticeable effect by doing so. Why? Take a closer look at what’s happening. The code above is equivalent to:

Func<uint, uint> fib = null;
fib = n => n <= 2 ? 1 : fib(n - 1) + fib(n - 2);
Memoizer<uint, uint> m = new Memoizer<uint, uint>(fib);
Func<uint, uint> fibm = new Func<uint, uint>(m.Invoke);
Test(fibm);
Now we’re calling through fibm, which results in invoking the Invoke method on the (simplified) memoizer’s closure class. But look what we’re passing in to the constructor: the original fib instance, which really is a public field on another closure as explained in the introduction paragraph. So ultimately we’re just memoizing the “outer” fib function, not the “inner” recursive calls. How can we make the thing work as we expect it to? Remember from the introduction paragraph why we needed the following trick?

Func<uint, uint> fib = null;
fib = n => n <= 2 ? 1 : fib(n - 1) + fib(n - 2);

The generated code stores fib in a closure class field <>c__DisplayClass1::fib. In fact, there’s no such thing as a local variable fib in the IL code; instead all occurrences of fib have been replaced by ldfld and stfld operations on the closure class’s field. What’s more, the closure class’s <Main>b__0 method uses the same field for the recursive calls to fib (see the last figure in the introduction paragraph). That’s precisely what we need to know in order to make the memoizer work: if we assign the result of fib.Memoize() to fib again, we’re replacing the field value that’s also used in the recursive calls:

Func<uint, uint> fib = null;
fib = n => n <= 2 ? 1 : fib(n - 1) + fib(n - 2);
fib = fib.Memoize();
Test(fib);

As a little quiz question: why can’t we write the following instead?

Func<uint, uint> fib = null;
fib = (n => n <= 2 ? 1 : fib(n - 1) + fib(n - 2)).Memoize();
Test(fib);

And here’s the result:

Much better, actually much more than “10 times better”.
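To make the difference between the two approaches measurable without a stopwatch, the following sketch counts how often the lambda body runs. FibDemo and CountCalls are hypothetical names introduced only for this illustration, and the local Memoize helper is a plain (non-extension) copy of the idea shown earlier.

```csharp
using System;
using System.Collections.Generic;

static class FibDemo
{
    static Func<T, R> Memoize<T, R>(Func<T, R> f)
    {
        var cache = new Dictionary<T, R>();
        return t =>
        {
            R r;
            if (!cache.TryGetValue(t, out r))
                cache[t] = r = f(t);
            return r;
        };
    }

    // Returns how many times the lambda body executes while computing fib(n).
    public static int CountCalls(uint n, bool reassign)
    {
        int calls = 0;
        Func<uint, uint> fib = null;
        fib = k => { calls++; return k <= 2 ? 1 : fib(k - 1) + fib(k - 2); };

        Func<uint, uint> memoized = Memoize(fib);
        if (reassign)
            fib = memoized;   // recursive calls inside the lambda now hit the cache
        memoized(n);          // without the reassignment only this outer call is cached
        return calls;
    }
}
```

For n = 20, CountCalls returns 13529 without the reassignment (one run per node of the naive call tree) and 20 with it (one run per distinct argument).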

Additional quiz question

Would you be able to do all of this with expression trees, i.e. with the following hypothetical piece of code:

Expression<Func<uint, uint>> fibEx = null;
fibEx = n => n <= 2 ? 1 : fibEx(n - 1) + fibEx(n - 2);
// what now?
Test(fib);

Why (not)?

Thursday, January 22, 2009

5 Thought Experiments That Might Change Your Life

The Trouble with Freedom

Having significant amounts of unstructured time in your schedule provides three benefits…

Time affluence which generates happiness.

The ability to master the small amount of structured things you leave in your schedule — the only route to becoming famous.

Freedom to expose yourself to positive randomness, the key to stumbling into cool opportunities.

The argument is clear. Putting it into practice, however, can become problematic. I know this because I’ve received several e-mails from students reporting that they’ve given underscheduling a try, but didn’t know what to do with all that free time.

The result: lots of doing nothing, which made them unhappy, which, ironically, made them procrastinate more than ever before on their work, which made them even more unhappy, and so on.

In this post I want to help rectify this problem. Below I’ve listed 3 simple rules to help you get the most out of your experiments with an underscheduled lifestyle:

(1) Once a week do each of the following:

Attend a talk on campus.

Go to your nearest Barnes & Noble, grab a stack of books that seem interesting, spend at least one hour with a coffee reading through them.

Identify one person who has done something you find cool then send them an e-mail asking them a concise and specific question about how they got started down that path.

These three things are simple. I don’t care when you do them, just make sure you do all three before the week is done. They’ll act like rocket fuel for your curiosity.

(2) Start a Saturday Morning Project
As you might remember, a Saturday Morning Project is a big, ambitious project (when described it makes people say “wow”) that has no external deadlines or outside pressure for you to complete. For example: trying to get a short story published, or grow a blog, or launch a microbusiness.

The key is to constrain your work on this project to the same time every week. I like Saturday morning because it’s unlikely you’ll ever have anything else scheduled then. But any time can work.

There’s something energizing about making consistent progress on something cool. It energizes you for the rest of the week, not to mention that you’ll begin to shake loose interesting opportunities as you make progress.

(3) Plan Adventures
Free time that you leave free drains energy. There’s nothing more demotivating than sitting on your dorm room couch, lethargic with laziness, with absolutely nothing to do. Avoid this situation at all costs.

Instead, plan adventures for your free time. In the short-term, gather a crowd to see a movie at the art house theater. Or, check out trivia night at the local bar. Or, figure out how to make good martinis. Whatever. Just organize something with a goal and that involves other people.
In the long-term, seek out, register, and plan to attend as many interesting events in your area as possible. Some of the most interesting students I’ve met are those who are constantly leaving campus on Saturday morning to go attend some conference in the next city over that they somehow finagled a student pass for. There’s no better way to bathe yourself in positive randomness.

Conclusion
Being underscheduled is significantly better than being overscheduled. But if you can’t take advantage of your newfound time affluence, things can still get grim. Take your free time seriously. If you don’t use it for something, it just becomes wasted. The key is finding the right something that keeps you happy, interesting, and impressive.
How do you make optimal use of your unstructured time?

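[Repost] From a friend at Alibaba

Dear Alibaba people,
Yesterday a veteran Alibaba employee came to see me. He had heard that his 2009 salary would go up considerably and that his 2008 year-end bonus was higher than he expected, and he insisted that his salary not be raised. A few days earlier, some of Alibaba's senior executives had also asked for their own pay to be cut while employees' pay was raised. Every time Alibaba Group raises salaries and pays year-end bonuses, there are people who insist on not getting a raise. Moved as I am, I want to share my views on raises and bonuses with you.
2008 was an extraordinary year. Through the tireless efforts of all Alibaba people, we overcame one difficulty and challenge after another and achieved what I consider the most hard-won progress and results in Alibaba's nine years! Although the broader economy faces unprecedented difficulties, the company has still planned raises for 2009 and generous year-end bonuses for 2008. Under the 2-7-1 principle, the vast majority of employees will receive a raise and a good year-end bonus. This is not only because we have ample cash, but even more because Alibaba people who worked hard and delivered outstanding results deserve to be rewarded.
This salary adjustment differs from previous years in only one respect: all senior managers, vice presidents included, will get no raise. We believe that the harder the times, the more the company's resources should tilt toward ordinary employees, and that the sense of urgency and crisis must come first from the company's senior management.
Even though the economic environment is poor, as long as the company achieves its strategic goals we will still reward outstanding employees, unaffected by the outside world or by what other companies do. This equally means that if the economy improves but our results get worse, then even if every other company is raising pay and handing out bonuses, we will choose to do the opposite!
Salary is paid for the position. A raise means we are setting higher and newer requirements for that position. (You may decline the raise, but you cannot decline our demand that you improve in your position, haha.) In 2009 the group will have a huge training budget, in the hope of substantially raising the responsibilities and requirements of every position.
Bonuses, by contrast, are based on the company's overall results and exist to recognize and motivate those who performed outstandingly in their roles. A bonus is not a benefit. It is not something everyone receives as a matter of course; it has to be earned through your own effort!! In distributing bonuses we firmly reject egalitarianism, because egalitarianism is unfair to colleagues who worked hard and performed excellently! Without this, Alibaba could not make "today's best performance is tomorrow's minimum requirement" a reality, nor could it take on higher goals!
Dear Alibaba people, 2009 is Alibaba's tenth anniversary, and this global economic crisis is Alibaba's coming-of-age ceremony. Everything happening today is a cycle that any company hoping to last must go through, and everything we are going through will surely become the pride of our lives!
You have worked hard for a year. Please take your family out and spend money!! Go shopping!!! Go enjoy the fruits of our year of hard work!
2009 is still waiting for us to face it. Tens of thousands of users are counting on our efforts...
Have a good New Year!!
And be sure to give my regards to your parents, children, family and friends! Without their sacrifices there would be no Alibaba today!
Thank you, Alibaba people!
Jack Ma (马云)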

Monday, January 19, 2009

Overnight Success: It May Take Years

Paul Buchheit, the original lead developer of GMail, notes that the success of GMail was a long time in coming:

We starting working on Gmail in August 2001. For a long time, almost everyone disliked it. Some people used it anyway because of the search, but they had endless complaints. Quite a few people thought that we should kill the project, or perhaps "reboot" it as an enterprise product with native client software, not this crazy Javascript stuff. Even when we got to the point of launching it on April 1, 2004 -- two and a half years after starting work on it -- many people inside of Google were predicting doom. The product was too weird, and nobody wants to change email services. I was told that we would never get a million users.
Once we launched, the response was surprisingly positive, except from the people who hated it for a variety of reasons. Nevertheless, it was frequently described as "niche", and "not used by real people outside of silicon valley".

Now, almost 7 1/2 years after we started working on Gmail, I see [an article describing how Gmail grew 40% last year, compared to 2% for Yahoo and -7% for Hotmail].

Paul has since left Google and now works at his own startup, FriendFeed. Many industry insiders have not been kind to FriendFeed. Stowe Boyd even went so far as to call FriendFeed a failure. Paul takes this criticism in stride:

Creating an important new product generally takes time. FriendFeed needs to continue changing and improving, just as Gmail did six years ago. FriendFeed shows a lot of promise, but it's still a "work in progress".

My expectation is that big success takes years, and there aren't many counter-examples (other than YouTube, and they didn't actually get to the point of making piles of money just yet). Facebook grew very fast, but it's almost 5 years old at this point. Larry and Sergey started working on Google in 1996 -- when I started there in 1999, few people had heard of it yet.
This notion of overnight success is very misleading, and rather harmful. If you're starting something new, expect a long journey. That's no excuse to move slow though. To the contrary, you must move very fast, otherwise you will never arrive, because it's a long journey! This is also why it's important to be frugal -- you don't want to starve to death halfway up the mountain.

That's perfectly fine by me. I never said I was clever. But if I keep doing it long enough, who knows? I might very well wake up one day and find out I'm an overnight success.

Tuesday, January 13, 2009

Five Questions With Sara Ford

Although Sara Ford is currently the Program Manager for the Visual Studio Power Toys, she started life at Microsoft as a tester. Back then she drove accessibility into the Microsoft Visual Studio shell (i.e., all that UI you see as you use Visual Studio), most parts of which she owned testing for at one point or another. Nowadays she spends her time convincing teams across Microsoft to share code with y'all via shared and open source.

Sara spends her free time doing sports of all kinds, including riding her custom-built bicycle. Sara says her goal in life is to be a ninety-seven-year-old weightlifter, so that she is featured on the local news. Here is what Sara has to say:

Q: What was your first introduction to testing? What did that leave you thinking about the act and/or concept of testing?
A: My first real introduction to software testing was actually during my interview at Microsoft. During the Exchange interview in the morning, they asked me to test a simple weighted network (think traveling salesman style question). The interviewer explained to me that my answer was actually trying to implement the network, and not test it. She gave me a few examples of what she was looking for, and I thought, “whoa, I never thought that way before.” It was similar to only having a pencil, then getting a box of 8 colored crayons to work with. I started to get the feeling to this “software testing” thing throughout the rest of the loop, and by the time I got to Visual Studio that afternoon, I was honestly having fun answering the questions. Originally, SDET was the last thing I listed on my list of positions I wanted, but I left the interview loop jazzed about being a SDET. Equally important, the last interviewer said something that has stuck with me ever since about software testing. He said, “software testing is all about doing the optimal amount of testing in the minimum amount of work.” So this “fun” problem of “how many ways can you break something” became a challenge of “how many ways can you break something as fast as possible.”

Q: What has most surprised you as you have learned about testing/in your experiences with testing?
A: I was most surprised the day I realized the paradox of, “so, um, how am I going to write tests for the tests that I’m writing?”

Q: What is the most interesting bug you have seen?
A: I love this question =) One of the coolest bugs (yes, a die-hard tester will call the bugs they’ve found “cool”) I’ve ever found was in Visual Studio 2005 Tool Window Docking (sorry Chris, my former dev, for sharing this story). In a pre-beta release, if you re-docked a tool window to the same place over and over again, about maybe 100+ times, the window size would start to shrink (as if the user resized the window) until the size actually became negative, eating into the title bar. And no, I didn’t find this manually. I actually stumbled upon this while experimenting with model-based testing, when one of my models got away from me.
My other all-time favorite bug had to do with a screen reader. If Visual Studio had more than 500,000 characters opened in a file, and a screen reader was running, Visual Studio would crash. Took me 10 days working with a customer who was blind to figure out the root cause of his crash. But persistence prevailed in the end.

Q: How would you describe your testing philosophy?
A: Every tester has their own techniques and styles. But, if I had an actual philosophy, I would have to say it is letting Murphy’s Law work for me. I seem to have really dumb luck at times with software. Maybe things don’t come as intuitively to me, or maybe I just see UI slightly differently from the rest of the population. But whatever it is, the bugs always seemed to find me, instead of the other way around. Software testing was such a great job, until Murphy’s Law found a way to one-up me. The bugs would find me only once, such that I could never get a repro (i.e. getting the reproduction steps for the developer to investigate the bug). Sigh, Murphy’s Law.

Q: Is there something which is typically emphasized as important regarding testing that you think can be ignored, is unimportant?
A: From time to time, I hear testing philosophies of “100% automated testing – no manual tests.” I couldn’t disagree more. First, automated testing can never truly replicate the end-user experience. It’s critical for a tester to know their customers, to use the features as they would, and to share their pain. Automation can never catch end-user pain points, even at 100% code coverage. Secondly, manual testing is where a tester’s creativity comes into play. As I said above, the bugs find me, since I can never seem to use UI the way it was designed.

5 Tips for surviving as a Tester - advice from my previous manager in Microsoft

These tips apply for all testers, not just Microsofties. What other tips would you include in this list?

Bend over backwards to help your dev
You’re going to be breaking his/her code for the rest of your careers together, so you want to have a great working relationship between each other. First off, establish trust with your dev. If you say you’re going to buddy test something by such-and-such time, get it done. Actively seek feedback from your dev on what features are lacking testing, how you can do more testing, etc. And whatever your dev needs from you, like ad-hoc machines in the lab, status of nightly runs, get that info to him/her asap. It won’t take long for you to see the benefits of going out of your way to help your dev. And it’s not just about using common sense, it’s about making sure you’re doing your part in helping the team succeed and how you would want your team to help you out when you need it.

Leave appropriate comments in every bug you verify and close
1 year, 6 months, or even just 3 months down the road you won’t remember how you verified that bug fix. If you can no longer follow the original bug repro instructions (sometimes bugs morph, other times the fix doesn’t allow you to get to the end of the repro), leave a comment explaining exactly what you did to verify the fix. And once again, as common sense would dictate, make sure the build number that you verified the fix on is written somewhere in your bug tracking database.

Never Assume Anything
Never assume that

Someone else is covering that scenario – Test everything you think of. A little overlap never hurts.
Triage (or whoever is reading your bug) will “just get” the bug. Always be explicit with repros, even when it’s completely obvious (to you) in the attached repro picture.
A simple scenario could never be broken. Always, always, always test – even if it is the simplest feature or part of the feature in the world.

Don’t just get it in writing
Email just isn’t enough. If you’re not going to cover an explicit scenario, get it in writing in your test plan. If you’ve discovered an issue and the issue isn’t going to be fixed as stated in an email thread, make sure it is in your bug database. Email is like leaves on a tree. You get a lot of new email in, you save a lot of email off to the side (like leaves in a large trash bag which makes for excellent tree house furniture), but after a while, saved old email just gets older and unneeded, just like your furniture that is now collecting spiders, so you throw it out. Make sure any major decisions involving your test strategy or test bed are written in the appropriate document / database / forum, etc. And yes, that was my worst analogy ever.

Learn from the bugs you missed
Ask yourself why someone filed a bug report in your feature area that was later fixed. What could you learn from this bug? Are you missing other similar tests? This process is actually called “Root Cause Analysis”, but you don’t have to do the formal RCA work to learn from bugs that got away from you. So many times, I find myself staring at my monitor just wondering what I’m missing. The answer to that question is just doing a query of fixed bugs not opened by me. Surprise yourself by seeing just how much you can learn by doing this.

C# 4.0, Dynamic Programming and JSON

C# 4.0 features a new dynamic keyword that allows you to mix in a bit of late-bound code in the midst of your otherwise statically typed code. This helps clean up the string-based programming mess that is a characteristic of late-bound code. In fact, in my opinion a number of scenarios would benefit from dynamic typing, in addition to interop with other dynamic code (such as Silverlight talking to the DOM or a .NET app talking to Office automation APIs). For example:

Accessing JSON data
Accessing XML nodes and attributes
Experimenting with and calling into REST services without an explicitly coded-up proxy
Access to settings (eg. Isolated storage settings in Silverlight, or configuration settings such as appSettings)
... and more

Traditional dynamic languages are striving to introduce some degree of static typing (e.g. type information in ActionScript 3, and the unfortunately failed EcmaScript 4 attempt) for perf and more appeal. On the flip side, C# is evolving nicely by introducing dynamic typing through a static type called dynamic (nice oxymoron) and offering not just a nicer syntax, but a much more intuitive model for these scenarios. I can only wish C# 4.0 were here now!
Anyway, I am still excited about this feature, and given the VS 2010 and C# 4.0 CTP are available, I decided to play with a few of the above listed scenarios. In this post, I'll describe the JSON scenario, and in a subsequent post, I'll write about the REST services scenario, so stay tuned and come back for more.
First a quick look at how JSON data might be used in the framework today, specifically using System.Json in Silverlight.

string jsonText = "{ xyz: 123, items: [ 10, 100, 1000 ] }";
JsonObject jsonObject = (JsonObject)JsonValue.Parse(jsonText);

JsonArray items = (JsonArray)jsonObject["items"];
items[2] = 1001;

JsonObject bar = new JsonObject();
bar["name"] = "c#";
jsonObject["bar"] = bar;

StringWriter sw = new StringWriter();
jsonObject.Save(sw);

string newJsonText = sw.ToString();

Contrast this with using dynamic typing. I basically took my very light-weight JSON reader and writer and added support for dynamic typing in my equivalent JsonObject and JsonArray types. With that in place, I can now write code as follows:

string jsonText = "{ xyz: 123, items: [ 10, 100, 1000 ] }";

JsonReader jsonReader = new JsonReader(jsonText);
dynamic jsonObject = jsonReader.ReadValue();

dynamic items = jsonObject.items;
items.Item(2, 1001);

dynamic bar = new JsonObject();
bar.name = "c#";
jsonObject.bar = bar;

JsonWriter writer = new JsonWriter();
writer.WriteValue((object)jsonObject);


string newJsonText = writer.Json;

Note that in the CTP, there isn't any support for indexers used against dynamic types, which gets in the way of normal array syntax. Hence the workaround above using Item(). However, I've been told, that support for indexing into dynamic types already exists in later builds.
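The post does not show how its JsonObject plugs into dynamic dispatch. One common way to get member-style access in released versions of C# 4.0 is to derive from System.Dynamic.DynamicObject; the sketch below assumes that approach, and the DynamicBag class is a hypothetical stand-in, not the author's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical stand-in for a dynamic JSON object: members live in a dictionary
// and are resolved at run time through DynamicObject.
class DynamicBag : DynamicObject
{
    private readonly Dictionary<string, object> _members = new Dictionary<string, object>();

    // Invoked for reads such as bag.name; returning false makes the binder throw.
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return _members.TryGetValue(binder.Name, out result);
    }

    // Invoked for writes such as bag.name = "c#".
    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        _members[binder.Name] = value;
        return true;
    }
}
```

With this in place, code like dynamic bar = new DynamicBag(); bar.name = "c#"; compiles, and the member name is looked up only when the statement runs.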

Personally, I think the resulting code using dynamic typing is nicer than the current System.Json APIs, since you don't have to use strings for member names, and that distinguishes them from actual string values. Thoughts?

Memory - Episode 1

An Iconic Microsoft Company Photo – 30 Years Later

Top row, L-R: Steve Wood, Bob Wallace and Jim Lane;
Middle row, L-R: Bob O'Rear, Bob Greenberg, Marc McDonald and Gordon Letwin;
Front row, L-R: Bill Gates, Andrea Lewis, Marla Wood and Paul Allen.
(Missing: Miriam Lubow, who missed the photo shoot because of a snowstorm)

Thirty years ago, on December 7th, 1978, the entire Microsoft staff gathered at Royal Frontier Studios in Albuquerque for a company portrait. It is the only image of its kind from that era and is one of the most iconic images in Microsoft’s history. For those of us who work here (or have ever worked here), it serves as a historical touch point illustrating how a small, yet passionate and dedicated, group could quite literally change the world.

It was during this time in Albuquerque that the name “Microsoft” was trademarked and where Bill Gates laid out his famous mission statement for the company: A computer on every desk and in every home running Microsoft software.

In April this year, 11 of Microsoft’s 12 original employees gathered again to re-create the famous photo that was taken shortly before the company relocated in 1978 from its Albuquerque birthplace to Washington State.

Back row, L-R: Bob O'Rear, Steve Wood, Bob Greenberg, Marc McDonald, Gordon Letwin, and Jim Lane;
Front row, L-R: Bill Gates, Andrea Lewis, Miriam Lubow, Marla Wood and Paul Allen.
(Missing: Bob Wallace, who passed away in 2002)

Sunday, January 11, 2009

Links - New York Times: The End of the Financial World as We Know It

New York Times: The End of the Financial World as We Know It - “The tyranny of the short term has extended itself with frightening ease into the entities that were meant to, one way or another, discipline Wall Street, and force it to consider its enlightened self-interest.”

The Art of Speaking

How to Speak
Every January, during MIT’s Independent Activities Period, Computer Science Professor Patrick Henry Winston gives a famed lecture titled: How to Speak. During this perennially popular event, Professor Winston walks his audience through a series of tips and strategies, developed and honed over decades, for mastering the art of speaking. I attended his lecture (over the internet) for the first time this year, and was not disappointed.
In this post, I draw from these notes to present to you, in detail, the secrets behind the Patrick Winston Method.
The Formula
I = f(K,P,T)
Your Impact is a function of your Knowledge about speaking, Practice, and Talent — in decreasing order of importance. Winston’s advice focuses on your knowledge about speaking. This is the easiest way to gain the biggest increases in your impact.

How to Start
Some advice for starting your talk.
Don’t start with a joke. The audience is not accustomed to you or your speaking style yet. Humor will be difficult at this point.
Do start with a menu. Tell them exactly what you’ll be speaking about and in what order.
Do provide an empowerment promise. Explain why your audience will come away from the talk better than when they entered.

The Big Four
A collection of four heuristics that make a talk work.
Cycling. Deliver ideas first in brief, then in detail, then in summary. To use the lingo of artificial intelligence: let your audience load the schema, then fill in the details, then let them know what’s worth indexing for future reference.
Verbal Punctuation. Provide a mechanism to help people who “fogged out” to easily rejoin the talk. For example: “We have just finished talking about the first heuristic, cycling, I am now going to talk about the second heuristic for helping to make your talks more interesting…”
Near Miss. When explaining an idea, also describe other ideas that are close but not quite the same. This will help people understand what the important points are that define your idea.
Ask Rhetorical Questions. Don’t make them too easy. Don’t make them too hard. Wait 6 seconds for an answer.

The Tools
Four tools that can make or break your presentation.
Time and Place. If it’s in your control: mid-morning is the best time. Choose a location that will look full with your expected audience size. Make sure it is well-lit. Don’t let them turn down the lights. (“It’s easier to see slides in a light room than to see them through closed eyelids.”)
The Board. A blackboard lets you draw natural graphics that highlight your points. It also paces you. The speed of writing matches the speed with which people process information. Use a logo that captures the main point and that you can return to. (“I once saw a Sloan professor lecture for a whole hour about a triangle; it was amazing!”) It also provides a target. The best thing to do with your hands? Point at things on the board.
Slides. Don’t use anything less than 24-point type. If you can’t fit the information at this font size then you have too much. Follow these four rules:
1. Don’t read the slides! “A special circle in hell for those who…”
2. Don’t stand far away from the screen. This requires divided attention from your audience.
3. Have one meaningful picture per slide. If it’s found in Microsoft’s clip art gallery, it’s not meaningful.
4. No pointers. Laser or otherwise. These are distractions. You’ll play with them. They’re annoying. Stand by the screen and point with your hand or refer to visual anchors on the slide.
Props. When possible, use a prop to illustrate an idea.

Special Cases
Three specific types of talks. (Note that the first two are specific to academia, but the advice nonetheless generalizes to other arenas.)
Oral Exams. Some strategies:
1. Show your hand early on. Within five minutes have explained what you did and why it’s important.
2. Situate your results in time, space, and field. That is, explain the trajectory over time of your area of concentration, where else people are working on the same problem, and the consequence of your result for the field.
3. Practice. Ask your friends to listen to your talk. Tell them to try to make you cry.
Job Talk. Here is what they want to see in a candidate:
1. Has a vision.
2. Has done something about that vision.
3. Don’t finish with a conclusion slide. Instead have a contributions slide: something that spells out clearly what you did.
Getting Famous. If you want to become a world class speaker, try to deploy Winston’s Star. A five-point checklist of things to make your talk extra memorable:
1. Symbol. Some icon that makes your ideas easy to hold on to.
2. Slogan. A simple linguistic handle for your ideas.
3. Surprise. Make people say: “did you see that talk…”
4. Salient. Have an idea that really sticks out.
5. Story. Tell stories that engage the audience.

How to Stop
Some things to keep in mind about concluding a talk:
1. Deliver on your promise made at the beginning. Remind them what it was and summarize how you satisfied it.
2. Tell a joke. They know you now. And if they leave happy they will assume the entire talk made them happy.
3. Call for questions.
4. Don’t thank the audience. It makes it seem like they did you a favor by listening to your boring babble.
5. End with a salute. Compliment without thanking. (e.g., “You’ve been a great audience, I hope you learned a lot about how to give a great talk.”)

Wednesday, January 7, 2009

Expand your reach instead of trying to repeat the same tests over and over

Use automation to expand your reach and extend your senses, allowing you to see more and do more.

You just can't run some tests without automation. You can run others on a much larger scale. Here are some examples:
Load Test. What happens when 200 people try to use your software at the same time? What about 2000? What about 20000? You'll need automation to simulate these scenarios.
Performance Benchmark. Is system performance getting better or worse? You can instrument automated tests to capture time measurements each time you run them. By collecting these measurements and reviewing them as a time series, you can detect performance degradations. Use the same approach to benchmark the use of resources, such as memory or storage.
Configuration Test. Software often must work on different platforms, in different configurations, attached to different peripherals. How do you cover them all? Automation helps you increase your coverage. To make this work, you must ensure that your tests are portable across platforms.
Endurance Test. What will happen when your product has been in use for weeks or months? Memory leaks, stack corruption, wild pointers, and similar errors may not be apparent when they occur but will eventually cause trouble. One strategy is to run a series of test cases over a long period (days or weeks) without resetting the system. This requires automation.
Race Conditions. Some problems occur only in specific timing situations. The coincidental timing of two threads or processes contending for the same resource results in an error known as a race condition. These are often hard to find and hard to reproduce. Automation can be a big help because you can repeat tests with many slightly different timing characteristics.
Combination Errors. Some errors involve the interaction of several features. Use automation to test huge numbers of complex tests, each of which uses several features in varying ways.
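As a rough illustration of the Performance Benchmark idea above, here is a minimal sketch that captures one time measurement per run so the samples can later be reviewed as a series. The names Measure and OperationUnderTest are my own placeholders, not part of any test framework, and the loop inside OperationUnderTest is just a stand-in for real product code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class BenchmarkSketch
{
    // Runs the action `runs` times and returns one elapsed-time sample per run.
    public static List<TimeSpan> Measure(Action action, int runs)
    {
        var samples = new List<TimeSpan>();
        for (int i = 0; i < runs; i++)
        {
            var watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            samples.Add(watch.Elapsed);
        }
        return samples;
    }

    // Hypothetical stand-in for the operation being benchmarked.
    public static void OperationUnderTest()
    {
        long total = 0;
        for (int i = 0; i < 100000; i++)
        {
            total += i;
        }
        if (total < 0)
        {
            throw new InvalidOperationException("unexpected overflow");
        }
    }

    public static void Main()
    {
        var samples = Measure(OperationUnderTest, 5);
        for (int i = 0; i < samples.Count; i++)
        {
            Console.WriteLine("Run {0}: {1} ms", i + 1, samples[i].TotalMilliseconds);
        }
    }
}
```

In a real setup you would probably store each sample together with a build number or date, so that a slowdown shows up as a trend rather than as a single noisy number.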

Those approaches focus on using automation to create new tests or to repeat product usage in ways designed to uncover new bugs. None of these tests is simple to implement. You may have to work up to them by automating different parts of the testing and developing tools to assist you. Nonetheless, I think this is often a better goal for the automation efforts than repeating the same feature tests again and again.

Automation Tips - Foreword

Robots that make you breakfast. Flying cars for commuters. That's science fiction. But software can do anything. So, why not make software test software? The reasoning goes, if one computer can do the work of three million mathematicians using sticks and sand, then surely one computer is worth an army of human testers. Indeed, test automation is an exciting idea that holds great promise. But beware. Automating some of your testing might or might not be helpful. Automation can save time, speed development, extend your reach, and make your testing more effective. Or it can distract you and waste resources.

Your investment in test automation is valuable to the extent that it helps you achieve your mission. The role of testing is to gain information. What information is your automation providing?

Automation efforts have been spectacularly successful for some groups but have left others unhappy and frustrated. Some of the failing groups have deluded themselves and their management into thinking that the work they put into automation yielded something that was helpful.

Use automation when it advances the mission of testing. Evaluate your success at automation in terms of the extent to which it has helped you achieve your mission.

Design your tests first, before deciding which to automate. This prevents you from falling into the trap of automating tests that are easy to automate but weak at finding defects.

Design automated tests differently from manual tests. Much of the power of automated testing comes from using a computer to do things a person cannot do. Look for opportunities, such as being able to repeat the same tests over thousands of different data files. This prevents you from falling into the trap of only automating tests from the existing (manual) test plans and missing the big opportunities for test automation. When designing manual tests, you aren't likely to consider tests that apply repetitive operations over thousands of files; it would simply be too much work.
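To make the "thousands of inputs" idea concrete, here is a small sketch that applies one check to many generated inputs instead of a handful of hand-written ones. Everything in it is an assumption for illustration: ReverseText stands in for whatever function you are testing, and the check (reversing twice returns the original text) is just one example of a property that is cheap to verify automatically.

```csharp
using System;
using System.Linq;

public static class DataDrivenSketch
{
    // Hypothetical function under test.
    public static string ReverseText(string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    // Applies the same check to `count` generated inputs and returns the number of failures.
    public static int RunChecks(int count, int seed)
    {
        var random = new Random(seed); // fixed seed so a failing run can be repeated
        int failures = 0;
        for (int i = 0; i < count; i++)
        {
            int length = random.Next(0, 50);
            string input = new string(Enumerable.Range(0, length)
                .Select(_ => (char)random.Next('a', 'z' + 1))
                .ToArray());

            // The check: reversing twice must give back the original text.
            if (ReverseText(ReverseText(input)) != input)
            {
                failures++;
            }
        }
        return failures;
    }

    public static void Main()
    {
        Console.WriteLine("Failures: {0}", RunChecks(5000, 42));
    }
}
```

Nobody would design 5000 manual test cases like this, which is exactly the point of the paragraph above.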

These conflicting messages derive from two important things:
1) Automating without good test design may result in a lot of activity, but little value.
2) Designing tests without a good understanding of automation possibilities may overlook some of the most valuable opportunities for automation.

To reliably succeed, you must have good test designers and good automators contributing to the selection and design of your automated tests. That's easy for us to say. It's harder to make it happen.

Tuesday, January 6, 2009

How to Write a Bug Report: A dozen helpful tips

1. Be very specific when describing the bug. Don’t let there be any room for interpretation. More concise means less ambiguous, so less clarification will be needed later on.

2. Calling windows by their correct names (by the name displayed on the title bar) will eliminate some ambiguity.

3. Don’t be repetitive. Don’t repeat yourself. Also, don’t say things twice or three times.

4. Try to limit the number of steps to recreate the problem. A bug that is written with 7 or more steps can usually become hard to read. It is usually possible to shorten that list.

5. Start describing with where the bug begins, not before. For example, you don't have to describe how to load and launch the application if the application crashes on exit.

6. Proofreading the bug report is very important. Send it through a spell checker before submitting it.

7. Make sure that all step numbers are sequenced. (No missing step numbers and no duplicates.)

8. Please make sure that you use sentences. This is a sentence. This not sentence.

9. Don’t use a condescending or negative tone in your bug reports. Don’t say things like "It's still broken", or “It is completely wrong”.

10. Don’t use vague terms like “It doesn’t work” or “not working properly”.

11. If there is an error message involved, be sure to include the exact wording of the text in the bug report. If there is a GPF (General Protection Fault) be sure to include the name of the module and address of the crash.

12. Once the text of the report is entered, you don’t know whose eyes will see it. It could show up in other documents that you are not aware of, such as reports to senior management or clients, to the company intranet, to future test scripts or test plans. The point is that the bug report is your work product, and you should take pride in your work.

Monday, January 5, 2009

LINQ - Set Operators

There are four LINQ set operators: Union, Intersect, Distinct and Except. Like the other 49 LINQ operators, these methods are designed to allow you to query data which supports the IEnumerable<T> interface. Since all LINQ query expressions, and most LINQ queries, return IEnumerable<T>, these operators are designed to allow you to perform set operations on the results of a LINQ query.

In this post I give four highly simplified examples of how to use each of the operators, and then end with a more complex example that shows how the operators might be used in a real world setting.

Union
The Union operator shows the unique items from two lists, as shown in listing 1.
Listing 1: The ShowUnion method displays the numbers 1, 2, 3, 4, 5 and 6.

public void ShowUnion()
{
    var listA = Enumerable.Range(1, 3);
    var listB = new List<int> { 3, 4, 5, 6 };
    var listC = listA.Union(listB);

    foreach (var item in listC)
    {
        Console.WriteLine(item);
    }
}

Here two collections are joined together, but only the unique members of each list are retained.
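A near miss helps here: Concat also joins two sequences, but it keeps the duplicates. The sketch below compares the two on the same data as listing 1; the class and helper names (UnionVersusConcat, UnionOf, ConcatOf) are mine, chosen only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionVersusConcat
{
    // Union drops duplicates across both sequences.
    public static List<int> UnionOf(IEnumerable<int> a, IEnumerable<int> b)
    {
        return a.Union(b).ToList();
    }

    // Concat keeps every element, duplicates included.
    public static List<int> ConcatOf(IEnumerable<int> a, IEnumerable<int> b)
    {
        return a.Concat(b).ToList();
    }

    public static void Main()
    {
        var listA = Enumerable.Range(1, 3);
        var listB = new List<int> { 3, 4, 5, 6 };

        Console.WriteLine(string.Join(", ", UnionOf(listA, listB)));  // 1, 2, 3, 4, 5, 6
        Console.WriteLine(string.Join(", ", ConcatOf(listA, listB))); // 1, 2, 3, 3, 4, 5, 6
    }
}
```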

Intersect
The Intersect operator shows the items that two lists have in common.
Listing 2: The ShowIntersect method displays the numbers 3 and 4

public void ShowIntersect()
{
    var listA = Enumerable.Range(1, 4);
    var listB = new List<int> { 3, 4, 5, 6 };
    var listC = listA.Intersect(listB);
    foreach (var item in listC)
    {
        Console.WriteLine(item);
    }
}   

Here two collections are compared, and only the unique, shared members of each list are retained.

Distinct
The Distinct operator finds all the unique items in a list.
Listing 3: The ShowDistinct method displays the numbers 1, 2 and 3.

public void ShowDistinct()
{
    var listA = new List<int> { 1, 2, 3, 3, 2, 1 };
    var listB = listA.Distinct();

    foreach (var item in listB)
    {
        Console.WriteLine(item);
    }
}

Except

The Except operator shows all the items in one list minus the items in a second list.
Listing 4: The ShowExcept method prints out the numbers 1, 2, 5, and 6

public void ShowExcept()
{
    var listA = Enumerable.Range(1, 6);
    var listB = new List<int> { 3, 4 };
    var listC = listA.Except(listB);

    foreach (var item in listC)
    {
        Console.WriteLine(item);
    }
}
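One thing that is easy to overlook is that Except is not symmetric: the order of the two lists matters. The following sketch shows both directions. The data and the helper name Difference are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptDirection
{
    // Returns the items of `first` that do not appear in `second`.
    public static List<int> Difference(IEnumerable<int> first, IEnumerable<int> second)
    {
        return first.Except(second).ToList();
    }

    public static void Main()
    {
        var listA = Enumerable.Range(1, 6);        // 1, 2, 3, 4, 5, 6
        var listB = new List<int> { 3, 4, 7 };

        Console.WriteLine(string.Join(", ", Difference(listA, listB))); // 1, 2, 5, 6
        Console.WriteLine(string.Join(", ", Difference(listB, listA))); // 7
    }
}
```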

In the Context of LINQ

The type of code listed above is useful, but it might be helpful to see these same operators used in the context of a LINQ query expression. You can then see how they can be used to analyze the results of queries to better understand the data that is returned.

You probably know that there are two similar collections used to create lists. One is the generic List collection and the other is the old-style collection called ArrayList. We can use set operators to help us better understand the difference between these two classes.

Here are two queries retrieving the methods from the List class and the ArrayList class:
var queryList = from m in typeof(List<int>).GetMethods()
                where m.DeclaringType == typeof(List<int>)
                group m by m.Name into g
                select g.Key;

var queryArray = from m in typeof(ArrayList).GetMethods()
                 where m.DeclaringType == typeof(ArrayList)
                 group m by m.Name into g
                 select g.Key;


Here is code to retrieve the intersection of these two lists:
var listIntersect = queryList.Intersect(queryArray);
The same intersection can also be written as a single expression:
var listIntersect = (from m in typeof(List<int>).GetMethods()
                     where m.DeclaringType == typeof(List<int>)
                     group m by m.Name into g
                     select g.Key).Intersect(from m in typeof(ArrayList).GetMethods()
                                             where m.DeclaringType == typeof(ArrayList)
                                             group m by m.Name into g
                                             select g.Key);

In either case, the following list would be displayed:

get_Capacity
set_Capacity
get_Count
get_Item
set_Item
Add
AddRange
BinarySearch
Clear
Contains
CopyTo
GetEnumerator
GetRange
IndexOf
Insert
InsertRange
LastIndexOf
Remove
RemoveAt
RemoveRange
Reverse
Sort
ToArray

And here is how to see the items that the generic list supports that are not part of the old-style collection:
var listDifference = queryList.Except(listIntersect);
And here is the result of this query:

ConvertAll
AsReadOnly
Exists
Find
FindAll
FindIndex
FindLast
FindLastIndex
ForEach
RemoveAll
TrimExcess
TrueForAll

Now you have a list of the methods the two classes share in common, and a list showing what the new generic class has that is not part of the older collection. The LINQ set operators made it easy for you to discover this information.

LINQ - Understanding IEnumerable

The IEnumerable<T> interface is a key part of LINQ to Objects and binds many of its different features together into a whole. This series of posts explains IEnumerable<T> and the role it plays in LINQ to Objects. If you hear people talking about IEnumerable<T>, and have sometimes wished you better understood its significance, then you should find this text helpful.

Collections and IEnumerable

Though LINQ to Objects can be used to query several C# types, it cannot be used against all your in-process data sources. Those that can be queried all support the IEnumerable<T> interface. These include the generic collections found in the System.Collections.Generic namespace. The commonly used types found in this namespace include List<T>, Stack<T>, LinkedList<T>, Queue<T>, Dictionary<TKey, TValue> and HashSet<T>.

All of the collections in the System.Collections.Generic namespace support the IEnumerable<T> interface. Here, for instance, is the declaration for List<T>:

public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>, IList, ICollection, IEnumerable

You will find IEnumerable listed for all the other generic collections. It is no coincidence that these collections support IEnumerable. Their implementation of this interface makes it possible to query them using LINQ to Objects.

LINQ to Objects and IEnumerable

Consider the following simple LINQ query:

List<int> list = new List<int> { 1, 3, 2 };

// The LINQ Query expression
var query = from num in list
            where num < 3
            select num;

foreach (var item in query)
{
    Console.WriteLine(item);
}

The type IEnumerable<T> plays two key roles in this code.
* The query expression has a data source called list which implements IEnumerable<int>.
* The query expression returns an instance of IEnumerable<int>.
Every LINQ to Objects query expression, including the one shown above, will begin with a line of this type:

from x in y

In each case, the data source represented by the variable y must support the IEnumerable interface. As you have already seen, the list of integers shown in this example supports that interface.
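To illustrate that the data source does not have to be a List, here is a small sketch that runs the same query against an array and against an iterator method. Countdown and SmallNumbers are names I made up for this example; the only thing they have in common with the list above is that they hand the query an IEnumerable<int>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DataSources
{
    // Hypothetical iterator method: the compiler turns the yield statements
    // into an object that implements IEnumerable<int>.
    public static IEnumerable<int> Countdown(int start)
    {
        for (int i = start; i >= 1; i--)
        {
            yield return i;
        }
    }

    // The same query expression works for any IEnumerable<int> source.
    public static List<int> SmallNumbers(IEnumerable<int> source)
    {
        var query = from num in source
                    where num < 3
                    select num;
        return query.ToList();
    }

    public static void Main()
    {
        int[] array = { 1, 3, 2 };
        Console.WriteLine(string.Join(", ", SmallNumbers(array)));        // 1, 2
        Console.WriteLine(string.Join(", ", SmallNumbers(Countdown(4)))); // 2, 1
    }
}
```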

The same query shown here could also be written as follows:

IEnumerable<int> query = from num in list
                         where num < 3
                         select num;

This code makes explicit the type of the variable returned by this query. As you can see, it is of type IEnumerable<int>. In practice, you will find that most LINQ to Objects queries return IEnumerable<T>, for some type T. The only exceptions are those that call a LINQ query operator that returns a simple type, such as Count():

int number = (from num in list
              where num < 3
              select num).Count();

In this case the query returns an integer specifying the number of items in the sequence produced by this query. LINQ queries that return a simple type like this are an exception to the rule that LINQ to Objects queries operate on classes that implement IEnumerable<T> and return an instance that supports IEnumerable<T>.

Composable

The fact that LINQ to Objects queries both take and return IEnumerable enables a key feature of LINQ called composability. Because LINQ queries are composable you can usually pass the result of one LINQ query to another LINQ query. This allows you to compose a series of queries that work together to achieve a single end:

List<int> list = new List<int> { 1, 3, 2 };

var query1 = from num in list
             where num < 3
             select num;


var query2 = from num in query1
             where num > 1
             select num;


var query3 = from num1 in query1
             from num2 in query2
             select num1 + num2;

Here the results of the first query are used as the data source for the second query, and the results of the first two queries are both used as data sources for the third query. If you print out the results of query3 with a foreach loop you get the numbers 3 and 4. Though it is not important to the current subject matter, you might have fun playing with the code to understand why these values are returned.
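If you want to check the numbers 3 and 4 for yourself, the following sketch prints each pair that query3 combines. It uses the same three queries, except that the last one formats the two operands along with their sum; the class name ComposedQueries and the Trace method are just scaffolding for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ComposedQueries
{
    public static List<string> Trace()
    {
        var list = new List<int> { 1, 3, 2 };

        var query1 = from num in list
                     where num < 3
                     select num;               // 1, 2

        var query2 = from num in query1
                     where num > 1
                     select num;               // 2

        // Every element of query1 is paired with every element of query2.
        var query3 = from num1 in query1
                     from num2 in query2
                     select string.Format("{0} + {1} = {2}", num1, num2, num1 + num2);

        return query3.ToList();
    }

    public static void Main()
    {
        foreach (var line in Trace())
        {
            Console.WriteLine(line);
        }
        // 1 + 2 = 3
        // 2 + 2 = 4
    }
}
```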

Summary

By now it should be clear to you that IEnumerable plays a central role in LINQ to Objects. A typical LINQ to Objects query expression not only takes a class that implements IEnumerable as its data source, but it also returns an instance of this same type. The fact that it takes and returns the same type enables a feature called composability.

The next logical question would be to ask why this type plays such a key role in LINQ to Objects. One simple answer would be that the creators of LINQ decided that it should be so, and hence it is so. But one can still ask why they picked this particular type. What is it about IEnumerable that makes it a useful data source and return type for LINQ to Objects queries? The answer to that question will be in your head.

Thoughts on becoming a professional tester

"If a man is called to be a streetsweeper, he should sweep streets even as Michelangelo painted, or Beethoven composed music or Shakespeare wrote poetry. He should sweep streets so well that the hosts of heaven and earth will pause to say, here lived a great streetsweeper who did his job well."- -Rev. Martin Luther King Jr.

Many newcomers are confused by the changes in their daily work, and some even suffer from them. Throughout life our environment changes. That's just the way it is. New 'doors' open, and some 'doors' close. I often recall my first manager at Microsoft (Lu Jin ^_^) telling me that it is up to me to control my own destiny at the company. But I had been around long enough to realize that life also sometimes throws us curves or unexpectedly changes direction, and Microsoft is certainly a dynamic company. It is often during times of change that new and exciting opportunities present themselves. Even during times of change I believe we still control our own destiny (at least somewhat). When our environment changes (as it always does) we generally have many choices. For example, often we can embrace the change and dynamically adapt and/or rise up to new challenges that life presents. Or, we can choose to wallow in self-pity, pretend we are a victim of some evil plot, hypercriticize the change with dogmatic arrogance, and incessantly bemoan the dubiously negative aspects of the change from an often overly emotional, narrow-minded perspective. This latter choice is usually not an especially productive one (personally or professionally), and generally only demonstrates one's extremely biased and limited view and an incapacity to grasp the "big picture."

Let's face it; we have chosen to work in one of the most dynamic industries in the world. Change is all around us, although some things remain relatively the same. For example, the techniques medical doctors use in the initial screening of certain maladies have remained relatively constant for decades, but at the same time medical imaging has made tremendous technological leaps forward. Likewise, the practice of exploratory testing has been used extensively in software testing for decades, but the effective application of combinatorial analysis (pair-wise, triplet, n-wise) of interdependent or semi-coupled parameters has only recently become a mainstream best practice.

The testing discipline is undergoing tremendous change these days. There are many people around the world who are serious about maturing and advancing our profession. Some ideas are great, some still need to be refined, but at least they are seriously investigating ways to advance the profession. As testers working in a rapidly changing industry we must constantly re-evaluate our skills, and increase our professional knowledge of software testing and computer systems in general in order to provide the best possible service to our employer.

I think if someone chooses testing as a profession, then they should strive to become a professional in the discipline, and develop and refine the skills and knowledge that entails.

Sunday, January 4, 2009

More on the LINQ Aggregate Operators

The LINQ aggregate operators allow you to perform simple math operations over the elements in a sequence. This post is designed to walk you through those operators, and give you an overview of how to use them. Table 1 shows a list of the 7 aggregate operators.

Except for the Aggregate operator itself, all of these operators have a simple, obvious default use. Several of these operators do, however, have overloads that need a few sentences of explanation. I will show you one simple example of using the default behavior for the operators, and then dive a bit deeper with a second example that shows how to use at least one of the overloads.

The Count and LongCount Operators

The Count and LongCount operators return the number of elements in a sequence. The Count operator can find this number quickly by simply asking objects such as List<T> that support the ICollection<T> interface for the count. If that service is not available, then LINQ iterates over the items in the sequence to get the count. The LongCount operator provides the same basic functionality, but returns an Int64. A simple example of using the Count operator is shown below:

public void ShowCount()
{
    var list = Enumerable.Range(5, 12);
    Console.WriteLine(list.Count());
}

The overloads for Count and LongCount allow you to pass in a lambda expression that performs custom calculations from which LINQ can derive the count for a sequence. For instance, you can write code that returns the number of even numbers in a collection:

var list = Enumerable.Range(1, 25);

Console.WriteLine("Total Count: {0}, Count the even numbers: {1}",
    list.Count(), list.Count(n => n % 2 == 0));

Our list consists of the numbers between 1 and 25. We call count once with the first version of the Count operator and get back the number 25.
The second overload of the Count operator takes a simple predicate. The declaration looks like this:

public static int Count<TSource>(this IEnumerable<TSource> source,
   Func<TSource, bool> predicate);

The predicate takes an integer and returns a bool specifying whether or not a particular value from the list passes a test. In our case, we simply ask whether or not the number is even. The predicate matches the values 2, 4, 6 and so on up to 24, for a total of 12 elements.
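For comparison, here is a short sketch showing three ways to arrive at the same count: the predicate overload of Count, a Where filter followed by the plain Count, and the LongCount overload, which returns a long. The helper method names are mine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountVariants
{
    // Count with a predicate.
    public static int CountEvens(IEnumerable<int> source)
    {
        return source.Count(n => n % 2 == 0);
    }

    // The same result by filtering first and counting afterwards.
    public static int CountEvensViaWhere(IEnumerable<int> source)
    {
        return source.Where(n => n % 2 == 0).Count();
    }

    // LongCount returns a 64-bit result.
    public static long LongCountEvens(IEnumerable<int> source)
    {
        return source.LongCount(n => n % 2 == 0);
    }

    public static void Main()
    {
        var list = Enumerable.Range(1, 25);
        Console.WriteLine("{0} {1} {2}",
            CountEvens(list), CountEvensViaWhere(list), LongCountEvens(list)); // 12 12 12
    }
}
```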

The Min and Max Operators

The Min and Max operators are equally simple. The code below shows how they work. The first example shows the behavior of the first overload of Min and Max; the second shows how to use one of the other overloads to pose slightly more complex questions.

public void ShowMinMax()
{
    var list = Enumerable.Range(6, 10);
    Console.WriteLine("Min: {0}, Max: {1}", list.Min(), list.Max());
}

Our list consists of the numbers 6 through 15, so the code writes out the values 6 and 15 to the console. The C# source that implements Min and Max uses the IComparable<T> or IComparable interfaces to perform the calculations. If you pass in a null argument you will get an ArgumentNullException.
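Here is a small sketch of the failure cases for Min on a sequence of int. A null source produces the ArgumentNullException mentioned above, and an empty sequence of int produces an InvalidOperationException, because there is no smallest element to return. Describe is a helper name I invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MinEdgeCases
{
    // Returns the minimum as text, or the name of the exception that Min threw.
    public static string Describe(IEnumerable<int> source)
    {
        try
        {
            return "Min: " + source.Min();
        }
        catch (ArgumentNullException)
        {
            return "ArgumentNullException";
        }
        catch (InvalidOperationException)
        {
            return "InvalidOperationException";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Describe(Enumerable.Range(6, 10)));   // Min: 6
        Console.WriteLine(Describe(null));                      // ArgumentNullException
        Console.WriteLine(Describe(Enumerable.Empty<int>()));   // InvalidOperationException
    }
}
```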
For the more complex examples, I'm going to need a few rows of simple data, which I provide in the code below.

class Item
{
    public int Width { get; set; }
    public int Length { get; set; }

    public override string ToString()
    {
        return string.Format("Width: {0}, Length: {1}", Width, Length);
    }
}


private List<Item> GetItems()
{
    return new List<Item>
    {
       new Item { Length = 0, Width = 5 },
       new Item { Length = 1, Width = 6 },
       new Item { Length = 2, Width = 7 },
       new Item { Length = 3, Width = 8 },
       new Item { Length = 4, Width = 9 }
    };
}

There is no simple way to know maximum or minimum values from a list of Items. To find the largest Item do you choose the element with the greatest Length, the greatest Width, or some other value? To solve this problem the C# team provided us with overloads of the Min and Max operators that take a delegate that we can use to select the proper value for the comparison:

public static int Max<TSource>(this IEnumerable<TSource> source, Func<TSource, int> selector);

Like nearly all the LINQ to Objects operators, Max is implemented as an extension method for the interface IEnumerable<T>. It takes an extremely simple lambda that is passed an element from the enumeration and returns an integer. To see how this works, take a look at this code:

List<Item> items = GetItems();
ShowList(items);
Console.WriteLine("MinLength: {0}, MaxLength: {1}",
   items.Min(l => l.Length), items.Max(l => l.Length));

As you can see, Min and Max both take a very simple delegate, which is implemented here as a lambda.

The lambda that is passed to Min looks like this: l => l.Length. This lambda is so simple that it can be a bit confusing to people who are new to LINQ. Let's take one moment to be sure we understand what is happening.

We know that this LINQ operator must iterate over the sequence passed in to it, and we can assume that it passes each item it finds to the selector delegate. It then tests the result returned from selector, to see if it is the largest value returned. Without peeking at the real source code, it seems that Max might do something like the code below.

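using System;
using System.Collections.Generic;

public static class MyExtensions
{
    // A guess at how Max might work, not the real implementation.
    public static int Max<TSource>(this IEnumerable<TSource> source,
      Func<TSource, int> selector)
    {
        // Unlike the real operator, this sketch returns int.MinValue
        // for an empty sequence instead of throwing.
        int largest = int.MinValue;
        foreach (var item in source)
        {
            // Ask the selector which value to compare for this item.
            int nextItem = selector(item);
            if (nextItem > largest)
            {
                largest = nextItem;
            }
        }
        return largest;
    }
}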

Assuming that we are working with a collection of Items, then selector, were it implemented as a standard method, would have to look something like this:

public int selector(Item item)
{
    return item.Length;
}

This method is semantically identical to the delegate we used in the earlier example: l => l.Length. It is very simple code that tells us which part of the Item class we are going to use to determine our max value.
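To convince yourself that the named method and the lambda really are interchangeable, you can pass both to Max and compare the results. In the sketch below I assign the method to a Func<Item, int> variable first, which keeps the overload choice explicit. SelectLength and the sample data are my own names and values, trimmed down from the Item class above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int Width { get; set; }
    public int Length { get; set; }
}

public static class SelectorForms
{
    // The selector written as an ordinary method.
    public static int SelectLength(Item item)
    {
        return item.Length;
    }

    public static List<Item> GetItems()
    {
        return new List<Item>
        {
            new Item { Length = 0, Width = 5 },
            new Item { Length = 2, Width = 7 },
            new Item { Length = 4, Width = 9 }
        };
    }

    public static void Main()
    {
        var items = GetItems();

        // Wrap the named method in a delegate of the type Max expects.
        Func<Item, int> selector = SelectLength;

        int viaMethod = items.Max(selector);
        int viaLambda = items.Max(l => l.Length);

        Console.WriteLine("{0} {1}", viaMethod, viaLambda); // 4 4
    }
}
```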

It's all so simple that one feels a little like a character in Edgar Allan Poe's "The Purloined Letter": the answer was hidden in plain sight. Once again we see that the biggest impediment to learning LINQ is the fear that it might be complicated. In practice, it is almost startlingly simple.

The Average Operator

Once one understands the pattern shown in our examination of the Min and Max operators, we find that it can be easily applied to most of the other Aggregate operators. Let’s look at the Average operator, which returns the average value from an enumeration.

For instance, one can find the average for a range of numbers like this:

var list = Enumerable.Range(0, 5);
Console.WriteLine("Average: {0}", list.Average());

When run, this code tells us that the average of the numbers 0, 1, 2, 3, 4 is the value 2.
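One detail that the tidy result 2 hides is that Average on a sequence of int returns a double, so you do not lose the fractional part when the numbers do not divide evenly. A quick sketch, with a range I picked to make that visible:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AverageResultType
{
    // Average over ints returns a double.
    public static double AverageOf(IEnumerable<int> source)
    {
        return source.Average();
    }

    public static void Main()
    {
        double whole = AverageOf(Enumerable.Range(0, 5));      // (0+1+2+3+4) / 5 = 2
        double fractional = AverageOf(Enumerable.Range(1, 4)); // (1+2+3+4) / 4 = 2.5

        Console.WriteLine(whole);
        Console.WriteLine(fractional);
    }
}
```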

When working with a collection of Items, we face the same problem we had with Min and Max: How does one discover the average value for a list of Items that define two properties called Length and Width? The answer, of course, is that we proceed just as we did with the Min and Max operators:

List<Item> items = GetItems();
double averageLength = items.Average(l => l.Length);
double averageWidth = items.Average(w => w.Width);
double averageValue = items.Average(v => v.Length + v.Width);
Console.WriteLine("AverageLength: {0}, AverageWidth: {1} AverageValue: {2}", averageLength, averageWidth, averageValue);

Again, we pass in very simple lambdas such as v => v.Length + v.Width or w => w.Width. Somewhere in the background runs code similar to the custom implementation of the Max operator shown earlier. That code must iterate over the list, passing each item to our lambda, which defines the value we want the Average operator to use in its calculations. The output looks like this:

AverageLength: 2, AverageWidth: 7 AverageValue: 9

The Sum Operator

The Sum operator tallies the values in an enumeration. Consider the following simple example:

var list = Enumerable.Range(5, 3);
Console.WriteLine("List sum = {0}", list.Sum());

Our list consists of the numbers 5, 6 and 7. The Sum operator adds them together, producing the value 18.

When working with a list of Items, the Sum operator faces the same problem we saw with the Min, Max and Average operators. It should come as no surprise that the solution is nearly identical:

var items = GetItems();
Console.WriteLine("Sum the lengths of the items: {0}", items.Sum(l => l.Length));

Here is the same pattern you saw with the Average, Min and Max operators: we pass in a simple lambda to help the Sum method know which part of an Item it should use as the operand when performing its simple addition. The result printed to the console is the value 10. If only the rest of our lives were quite this simple!

The Aggregate Operator

The Aggregate operator follows in the footsteps of the Sum operator, but it provides us with a few more options. Rather than taking a simple delegate like the other operators in this series, it asks for one similar to the lambda we worked with in a previous post.

public static T Aggregate<T>(this IEnumerable<T> source, Func<T, T, T> func);

We know what to do with delegates that look like this. We could, for instance, create one that adds up a range of numbers:

var list = Enumerable.Range(5, 3);
Console.WriteLine("Aggregation: {0}", list.Aggregate((a, b) => (a + b)));

The Aggregate operator gets passed the numbers 5, 6 and 7. The first time the lambda is called it gets passed 5 and 6, and adds them together to produce 11. The next time it is called it is passed the accumulated result of the previous calculation and the next number in the series: (11 + 7), which yields 18. This is the same result we saw for the Sum operator in the previous section. This overload of the Aggregate operator is indeed very similar to the Sum operator, though it is more flexible, in that you can easily perform multiplication, division, subtraction and other operations instead of simple addition. For instance, this code performs multiplication, yielding the value 210:

list.Aggregate((a, b) => (a * b))
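If you want to see the individual calls, a small sketch like the following records what the lambda receives each time. The steps list is my own addition for illustration; it is not part of the Aggregate API. Written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = Enumerable.Range(5, 3);   // 5, 6, 7
var steps = new List<string>();      // illustrative log of each lambda call

int product = list.Aggregate((a, b) =>
{
    int result = a * b;
    steps.Add($"{a} * {b} = {result}");
    return result;
});

foreach (string step in steps)
{
    Console.WriteLine(step);
}
Console.WriteLine("Aggregation: {0}", product);
```

With three numbers the lambda runs twice: once for 5 and 6, and once for the running result 30 and the final number 7.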

Before pushing on, I should backtrack a little and discuss two simple points that are often brought up when people talk about this first version of the Aggregate operator. If it is passed a list with one item, it returns that item. If it is passed a list with 0 items, it throws an InvalidOperationException.
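Both edge cases are easy to check for yourself. The following sketch uses a one-element array and Enumerable.Empty<int>() as assumed test inputs, with top-level statements (C# 9 or later):

```csharp
using System;
using System.Linq;

// One item: the lambda is never called and the item itself comes back.
int single = new[] { 5 }.Aggregate((a, b) => a + b);
Console.WriteLine("Single item: {0}", single);

// No items: there is nothing to return, so the operator throws.
bool threw = false;
try
{
    Enumerable.Empty<int>().Aggregate((a, b) => a + b);
}
catch (InvalidOperationException)
{
    threw = true;
}
Console.WriteLine("Empty list threw: {0}", threw);
```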

A second overload of the Aggregate operator allows you to seed the process with an accumulator:

public static TAccumulate Aggregate<TSource, TAccumulate>(
    this IEnumerable<TSource> source, TAccumulate seed,
    Func<TAccumulate, TSource, TAccumulate> func);

This is essentially the same operator as shown in the previous example, but now you can decide the starting point for the value that will be accumulated:

Console.WriteLine("Aggregation: {0}", list.Aggregate(0, (a, b) => (a + b)));

If we pass in a list with one item in it, say the number five, then the first time the lambda is called it would be passed the seed and the sole item in the list:

(0 + 5)

The result, of course, is the number 5.

Suppose we pass in an accumulator of 0 along with the numbers 5, 6 and 7.

var list = Enumerable.Range(5, 3);

Console.WriteLine("Aggregation: {0}", list.Aggregate(0, (a, b) => (a + b)));

In this case we would step through the following sequence:

0 + 5 = 5
5 + 6 = 11
11 + 7 = 18

Again, we are doing essentially what we did with the Sum operator.

If you pass in a different seed, then you get a different result:

Console.WriteLine("Aggregation: {0}", list.Aggregate(3, (a, b) => (a + b)));

With a seed of 3, we get:

3 + 5 = 8
8 + 6 = 14
14 + 7 = 21

As mentioned earlier, the Aggregate operator allows us to perform not just addition, but multiplication, division or various other binary mathematical operations:

Console.WriteLine("Aggregation: {0}", list.Aggregate(1, (a, b) => (a * b)));

In this case the series looks like this:

1 * 5 = 5
5 * 6 = 30
30 * 7 = 210

Note that I passed in an accumulator equal to 1, so that we did not end up with the following series of operations:

0 * 5 = 0
0 * 6 = 0
0 * 7 = 0
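The seeded overload is simple enough to imitate by hand, which may help show why the seed matters. Fold below is a hypothetical helper written for this illustration, not the LINQ source. The last line also shows one practical difference from the unseeded overload: with a seed, an empty list simply returns the seed. Written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not the LINQ source: start from the seed and feed
// the running result plus each item to the function, one item at a time.
static TAccumulate Fold<TSource, TAccumulate>(
    IEnumerable<TSource> source, TAccumulate seed,
    Func<TAccumulate, TSource, TAccumulate> func)
{
    TAccumulate result = seed;
    foreach (TSource item in source)
    {
        result = func(result, item);
    }
    return result;
}

var list = Enumerable.Range(5, 3);   // 5, 6, 7

int sum = Fold(list, 3, (a, b) => a + b);       // seed 3, addition
int product = Fold(list, 1, (a, b) => a * b);   // seed 1, multiplication
int zeroed = Fold(list, 0, (a, b) => a * b);    // seed 0 wipes out every product

// The real seeded overload returns the seed when the list is empty.
int emptyCase = Enumerable.Empty<int>().Aggregate(1, (a, b) => a * b);

Console.WriteLine("{0} {1} {2} {3}", sum, product, zeroed, emptyCase);
```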

In what I sometimes suspect might have been an excess of good spirits, the team added one final overload to the Aggregate operator:

public static TResult Aggregate<TSource, TAccumulate, TResult>(
                this IEnumerable<TSource> source, TAccumulate seed,
                Func<TAccumulate, TSource, TAccumulate> func,
                Func<TAccumulate, TResult> resultSelector);

This overload is nearly identical to the previous overload, but you are given one more, very simple, delegate that you can use to transform the result of your aggregation. For instance, consider this use of the Aggregate operator:

Console.WriteLine("Aggregation: {0}", list.Aggregate(0, (a, b) => (a + b),
    (a) => (string.Format("{0:C}", a))));

Please notice that the first two-thirds of this call mirror what we did earlier, and only the third parameter is new.

Suppose we pass in a sequence with the values 5, 6 and 7. As we've already seen, the process will begin by performing the following series of operations:

0 + 5 = 5
5 + 6 = 11
11 + 7 = 18

Once we have our result of 18, this number is passed to the last lambda in our call. It uses string.Format to transform it into a string in currency format. The currency format follows the current culture, so on a machine set to US English the output is:

$18.00
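The result selector becomes more useful when the accumulator is not the value you ultimately want. As a sketch of that idea (my own example, not one from the original series), the code below accumulates a running sum and count in a tuple and only divides at the very end. The tuple element names Sum and Count are arbitrary; the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;

var list = Enumerable.Range(5, 3);   // 5, 6, 7

double average = list.Aggregate(
    (Sum: 0, Count: 0),                          // seed: nothing seen yet
    (acc, n) => (acc.Sum + n, acc.Count + 1),    // fold each number into the pair
    acc => (double)acc.Sum / acc.Count);         // result selector: pair -> average

Console.WriteLine("Average: {0}", average);
```

Here the first two arguments do the accumulating and the third turns the accumulated pair into the final answer, just as the currency lambda turned 18 into a string.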

Like nearly everything in LINQ, this seems terribly complicated at first only to end up being reasonably simple. It is these kinds of simple operations, however, which provide us with the building blocks out of which we can safely create complex programs. This is what we mean when we apply the word elegant to a technology.

LINQ - More on Aggregate Operator

In this post we'll work with a list of words.

The code we will look at reverses the order of the words in the following string:

* The end is the beginning, the beginning the end

If we pass this string in to our method, it will yield this result:

* end the beginning the beginning, the is end The

Here is the method that we will use:

private string ReverseSentence(string sentence)
{
  string[] words = sentence.Split(' ');          

  return words.Aggregate((accumulator, b) => b + " " + accumulator);
}

The method first breaks the sentence down into a list of words:

string[] words = sentence.Split(' ');

It passes this list to the LINQ to Objects Aggregate operator. This overload of that operator takes a single lambda expression as a parameter. The lambda expression takes two parameters (accumulator, b) and returns them in reverse order (b accumulator). If it were a standard method rather than a lambda expression, it would look like this:

private static string Reverse(string accumulator, string b)
{
   return b + " " + accumulator;
}

The method would be called like this:

words.Aggregate(Reverse);

In our code, however, we use lambda expressions rather than standard methods.
As you recall from the previous post, the Aggregate operator passes two values to the lambda on each call: the running result and the next item from the list. The first parameter is the accumulator for the aggregation. This is why I chose to give the first parameter in the lambda expression the name accumulator:

return words.Aggregate((accumulator, b) => b + " " + accumulator);

You can watch the aggregation process take place in the debugger. Set a breakpoint on the call to Aggregate, and add the variables accumulator and b to your watch list. Step through the method with the F11 key. You can watch the result "accumulate" in the accumulator variable.

The process begins by passing the first two words into the lambda expression, where the first one is the accumulator:

* (The) (end)

The method reverses their order and places a space between them:

* end The

The next iteration sends in the result of the previous expression, plus the next word from the list:

* (end The) (is)

The lambda expression reverses them:

* is end The

The Aggregate operator now processes the result of the previous iteration in the accumulator plus the next word:

* (is end The) (the)

Again our lambda expression swaps the parameters and places a space between them:

* the is end The

Next time we pass in the following:

* (the is end The) (beginning,)

This yields:

* beginning, the is end The

The rest of the cycle looks like this:

* (beginning, the is end The) (the) >===> the beginning, the is end The
* (the beginning, the is end The) (beginning) >===> beginning the beginning, the is end The
* (beginning the beginning, the is end The) (the) >===> the beginning the beginning, the is end The
* (the beginning the beginning, the is end The) (end) >===> end the beginning the beginning, the is end The
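If you would rather not step through the debugger, the same trace can be produced in code. This is a sketch under my own assumptions: the trace list and the output format (borrowed from the list above) are additions for illustration only. Written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string sentence = "The end is the beginning, the beginning the end";
string[] words = sentence.Split(' ');
var trace = new List<string>();   // illustrative log, one entry per lambda call

string reversed = words.Aggregate((accumulator, b) =>
{
    string next = b + " " + accumulator;
    trace.Add($"({accumulator}) ({b}) >===> {next}");
    return next;
});

foreach (string line in trace)
{
    Console.WriteLine(line);
}
Console.WriteLine(reversed);
```

Nine words produce eight calls to the lambda, because the first call consumes two words at once.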

I find this an immensely satisfying process to contemplate simply because it so very logically runs counter to my intuitions. I keep expecting it to do something different, but when I think it through I see exactly why it does what it does. How many real world uses for this I will discover I don't know, but I will treasure each one as I encounter it.

Listing 1 shows the complete code for this simple program. In this version of the method (named ReverseString in the listing), you see how to convert the sentence into a List<string> rather than an array of strings. I use the ToList() LINQ operator to perform this conversion, so you are seeing a sneak peek of an operator I haven't yet covered in this series.

using System;
using System.Collections.Generic;
using System.Linq;

namespace LinqFarmSeed02
{
    class Program
    {
        private static string ReverseString(string sentence)
        {
            List<string> words = sentence.Split(' ').ToList();

            return words.Aggregate((a, b) => b + " " + a);           
        }

        static void Main(string[] args)
        {
            string sentence = "The end is the beginning, the beginning the end";
            Console.WriteLine(sentence);

            string result = ReverseString(sentence);
            Console.WriteLine(result);
        }       
    }
}

In this post you have seen how to use the Aggregate operator to reverse the order of a list of words. Thinking through the way this operator works on a list of strings can help you understand exactly how it is implemented and exactly what it does.
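As a quick sanity check on the result, the Aggregate version can be compared with a more conventional approach built from Enumerable.Reverse and string.Join. This comparison is my own addition rather than part of the original listing. It calls Enumerable.Reverse explicitly so that the LINQ operator is the one being used, and it relies on top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

string sentence = "The end is the beginning, the beginning the end";
string[] words = sentence.Split(' ');

// The approach from this post: build the result one word at a time.
string viaAggregate = words.Aggregate((a, b) => b + " " + a);

// A conventional alternative: reverse the words, then join them with spaces.
string viaJoin = string.Join(" ", Enumerable.Reverse(words));

Console.WriteLine(viaAggregate);
Console.WriteLine("Both approaches agree: {0}", viaAggregate == viaJoin);
```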
