Thursday, September 24, 2009

Dr. Dro

Finally. As of about 9:00am on Tuesday, September 22nd, my doctoral thesis was deposited at the University of Wisconsin - Madison. I am now officially Dr. DeRose.

I look forward to having a life again.

I won't bother with the details of the thesis, but I did want to share my Acknowledgments. They are below.

This dissertation is like the final exam of a long class. And, like many finals, it is cumulative; paraphrasing Prof. Jeff Erickson, "the final will cover everything you have ever learned in your entire life, with a focus on what you learn here". In this context, I have innumerable people to thank, so the ones below are but a few highlights.

First, to my parents: thanks for staying in the United States, despite the personal sacrifice. Thanks for teaching me that C's are unacceptable because I can do better. And thanks for always supporting me, and wanting what's best for me, especially when times got tough.

To my father, special thanks for sparking my interest in programming. I don't remember how young I was when he gave me the Commodore 64. I know I was 11 when, as I started to play computer games, he handed me "The C Programming Language" and told me to stop playing and start making. To him I owe the epiphany of "so that's why pointers are useful!" If not for these things, I don't know where I'd be now.

Thanks also to Joseph Smarr and Steve Severinghaus. They turned a coding hobby into a love of computers and computer science. They taught me, through practice, example, and critique ("line after line of ugly, ugly code"), that software can solve real problems.

Many thanks to Phil Bohannon for teaching me that people live in different worlds. He showed me that so much of collaboration is discovering the benefits of other people's worlds, and sometimes leaving yours to live in theirs.

Finally, I'd like to thank my advisors, AnHai Doan and Raghu Ramakrishnan. They imparted to me their passion for data management. They also put up with me long enough to teach me immeasurably about how to think and how to communicate. These lessons will be priceless anywhere I go from here.

Tuesday, August 25, 2009

Five-Minute Ph.D.

Okay, I lied, both about how often I'd start posting, and about the content of the next post. Briefly put, finishing a thesis and preparing for a defense while working a full-time job is both painful and time-consuming. My defense is September 18th, after which there will be much rejoicing, and possibly more consistent posting.

Until then, here's a cross-post from my other blog. I'm behind on that one, too, but since it's a work commitment, I write there before I write here.


As my Ph.D. defense nears, I'm thinking a lot about the most important lessons:
  • Don't look for reasons to fail; find ways to succeed. If something should or must be done, find a way to do it.
  • First figure out the right thing to do. Only then think about implementation, and see how close you can come. Even if you can't reach the ideal, at least you'll be pushing in the right direction.
  • Any good problem solver can hack a good solution quickly. What's more valuable is identifying the underlying problem, and how it relates to other problems. This tells you if something is a true solution, and helps discover other opportunities.
  • Think in a structured, disciplined way. First, separate out orthogonal issues. Then, solve them incrementally and iteratively. Don't try to attack the whole mess at once.
  • Finally, when communicating with others, try to tell a story. Start with something familiar, then make sure your ideas flow.

Those are the big ones. The gems of a Ph.D. education, in five easy minutes. Interestingly, none of these are particularly technical. But deeply technical things are limited in application. I think that's the real secret: the work you do in a Ph.D. is technical, but a good Ph.D. is about becoming a better thinker and communicator.

Tuesday, June 23, 2009

Program Manager Blog

I should also mention that I've started a new, more work-related blog called Learning Program Management. It's primarily about what I've been picking up in my new role at Microsoft.

The division between the two blogs is simple: if it deals with Program Management, Microsoft, or SQL Server, it'll go to Learning PM. Anything else I think worth sharing will go here.

Back on Earth Again

It's been almost a year since my last post, which is very poor of me. I don't know how it works for others, but when I make major life changes, I cocoon for a while.

Since my last post, I have gotten married, honeymooned in New Zealand, visited Brazil, moved to Seattle, started my new job, and bought a house (as you can see, Sarah has been much better than I about staying in touch). One important item not on the list is "defended my thesis", so there are more big changes to come. Nevertheless, I've mostly gotten my feet under me again, so it's time to come out of hibernation. Also, to mix as many metaphors as I can in one post, apparently.

So, if there's anyone still following this out there, apologies, and hello again. If you'd like to get in touch, my old email, IM, and phone number are all the same. All that's changed is my address:

Pedro DeRose
6704 E Crest View Loop SE
Snoqualmie, WA 98065

Now that I've poked my head back out, I'm going to try and be better about posting periodically. Next time: what I learned (or at least remember from what I learned) about buying a house.

Monday, August 18, 2008

New Phone Number

My phone bricked. I'll expand this post with details later, but for now, what's important is that my new number is 217-369-6980 until further notice.

Update: My number is back to 217-898-9662. Been that way for a while, actually. However, what with major life events (marriage, honeymoon, visiting family in Brazil, and moving to Seattle), I haven't had time to post updates. I'll try to be better about it in the upcoming weeks.

Friday, July 18, 2008

Restricting method access in Perl objects

This is a geeks-only entry in the "Perl: Handy, but Ugly" series...

I often want to restrict access to certain methods in a class. One classic example is public and private methods. As another, I've written a class for data storage with both read and write methods, and sometimes I want an instance to be read-only, and other times write-only. I could implement this with an internal read/write flag. However, while I want that flag to be flippable, I don't want just anyone flipping it. That sort of thing is hard to do in Perl because it doesn't believe in enforced privacy.

Fortunately, Perl does believe in being powerful and flexible. So I've found a neat way of wrapping object instances in what I call adapters, which expose only a subset of the object's methods.

The basic desiderata are as follows:

  1. The adapter should be an object wrapping another object.
  2. It should only define the methods it exposes, so that the wrapped object's unexposed methods aren't even there.
  3. There should be no way of getting to the wrapped object through the adapter (otherwise, you can get to the unexposed methods).
  4. Finally, I don't want to write a new adapter for every class I want to wrap, or every subset of methods I want to expose.

Wait a second, you say. I want adapters to be classes defining a custom set of methods, but I don't want to write a new adapter each time? Yes. And because Perl is "Handy, but Ugly", I can do it.

The trick is that Perl gives you direct access to the symbol table: that magical hash that knows what reference you mean when you use a variable or subroutine name in your code. And since a class is just a set of symbols, it's possible to create a class entirely on the fly just by inserting the proper subroutine references into the symbol table.
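To make that concrete, here is a minimal sketch of building a class on the fly. The class name OnTheFly::Greeter and its methods are made up for illustration; they are not part of the module. The only assumption is that assigning a code reference to a glob installs it as a named subroutine in that package.

```perl
use strict;
use warnings;

# Hypothetical class name and methods, chosen only for illustration.
my $class   = 'OnTheFly::Greeter';
my %methods = (
    hello   => sub { my ($self, $who) = @_; return "hello, $who" },
    goodbye => sub { my ($self, $who) = @_; return "goodbye, $who" },
);

{
    # Building a glob name from a string is a symbolic reference, which
    # strict refs forbids, so relax that one check in a small scope.
    no strict 'refs';
    for my $name (keys %methods) {
        # Assigning a code ref to the glob installs it as a named sub
        *{"${class}::$name"} = $methods{$name};
    }
    *{"${class}::new"} = sub { my ($cls) = @_; return bless {}, $cls };
}

my $greeter = $class->new;
print $greeter->hello('world'), "\n";                  # prints "hello, world"
print $greeter->can('goodbye') ? "yes" : "no", "\n";   # prints "yes"
print $greeter->can('secret')  ? "yes" : "no", "\n";   # prints "no"
```

Nothing named OnTheFly::Greeter appears anywhere in the source as a package declaration; the class exists only because its symbol table entries were filled in at runtime.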

With that, I present my AdapterFactory Perl module. It's fairly well commented, so I'll leave grokking it as an exercise for the reader. A couple of hints:

  • With no strict, a string can be dereferenced as if it were a reference to the variable whose name is the string's value. This works only for non-lexical variables (i.e., those not defined with "my"). For instance, $h = "hash"; %$h is equivalent to $h = \%hash; %$h, or %hash
  • For some reason, even with use strict, strings on either side of the arrow operator can be dereferenced to the package or method whose name is the string value. For instance, $p = "Package"; $m = "new"; $p->$m() is equivalent to Package->new()
  • The symbols for a package are kept in a hash with the name of the package plus "::". Thus, symbols for package "foo" are kept in hash %foo::
  • The * sigil is used to set values in the symbol table
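The hints above can be tried out in a few lines. This is only a sketch: the %hash variable, the Package class, and its greet and shout methods are hypothetical names picked to mirror the examples in the hints.

```perl
use strict;

# A package variable (declared with "our"), because symbolic references
# cannot see lexical "my" variables.
our %hash = (answer => 42);

package Package;   # hypothetical class for the arrow-operator hint
sub new   { my ($class) = @_; return bless {}, $class }
sub greet { return "hi" }

package main;

# Hint 1: with strict refs off, a string dereferences by name
my $viaName;
{
    no strict 'refs';
    my $h = "hash";
    $viaName = $$h{answer};      # looks up %main::hash by name
}
my $r      = \%hash;
my $viaRef = $$r{answer};        # same element through a real reference

# Hint 2: strings on either side of the arrow work even under strict
my ($p, $m) = ("Package", "new");
my $obj     = $p->$m();          # same as Package->new()

# Hint 3: the symbols for Package live in the hash %Package::
my $hasGreet   = exists $Package::{greet}   ? 1 : 0;
my $hasMissing = exists $Package::{missing} ? 1 : 0;

# Hint 4: assigning to a glob sets a symbol table entry
*Package::shout = sub { return "HI" };

print "$viaName $viaRef\n";                # prints "42 42"
print ref($obj), "\n";                     # prints "Package"
print "$hasGreet $hasMissing\n";           # prints "1 0"
print $obj->greet, " ", $obj->shout, "\n"; # prints "hi HI"
```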
AdapterFactory.pm
###
# Author: Pedro DeRose
# Creates adapters, or objects that wrap another object, but expose only a
# subset of its methods. Useful for separating public/private methods, or
# restricting functionality. Does not provide any handle to the object itself.
#
# Usage example:
#
#     use AdapterFactory qw(defineAdapter adapt);
#
#     defineAdapter('Foo::Public', [ qw(get set print) ]);
#     my $fooAdapter = AdapterFactory::Foo::Public->new($fooObj);
#     my $barAdapter = adapt('Foo::Public', $barObj);
#     
#     defineAdapter('Foo::Private', { secret => [ 'default' ] });
#   
#   Defines the AdapterFactory::Foo::Public adapter exposing the get(), set(),
#   and print() methods, then creates adapters wrapping $fooObj and $barObj.
#   Finally, defines the AdapterFactory::Foo::Private adapter exposing the
#   secret() method, and specifies that "default" should always be passed to it.
###   
package AdapterFactory;
use Exporter 'import';
@EXPORT_OK = qw(defineAdapter adapterDefined adapt);

use strict;

# Keep map of adapter to object as a lexical variable so that adapter objects
# don't store the object themselves, where other code can get to it.
my %adapterToObj;

###
# Defines a new adapter class whose name is the name of this class, plus "::"
# then the given name appended (e.g., given name "Foo::Bar", the name is
# "AdapterFactory::Foo::Bar"). It wraps the object passed to its new()
# constructor, exposing the specified methods. Methods can be specified in two
# ways. When an array reference of method names, they are called directly. When
# a map from method name to an array reference of arguments, the adapter's
# methods call the wrapped object's methods with the given arguments always
# appended. See the usage example above for how to use the adapter.
#   name: the name of the adapter class
#   methods_r: reference to methods to expose
#   returns true if the definition was successful, false otherwise
###
sub defineAdapter {
    my ($name, $methods_r) = @_;
    $name or die "Missing name";
    ref($methods_r) eq 'HASH' or ref($methods_r) eq 'ARRAY' or die "Bad methods";

    if(adapterDefined($name)) {
        warn "Adapter $name already exists.";
        return undef;
    }

    # Lots of symbol table manipulation, so stop yer whining
    no strict;

    # Compose the adapter class name
    my $class = __PACKAGE__."\::$name";

    # Turn method array ref into method hash ref with no method arguments
    if(ref($methods_r) eq 'ARRAY') { $methods_r = { map { ($_ => []) } @$methods_r }; }

    # Directly create symbol table entry for each exposed method.
    foreach my $method (keys %$methods_r) {
        my @args = defined($methods_r->{$method})? @{$methods_r->{$method}} : ();
        *{"$class\::$method"} = sub {
            # Look up object using adapter's reference, then call the method
            my $self = shift;
            return $adapterToObj{$self}->$method(@_, @args)
        };
    }

    # Create the constructor last, so it clobbers any "new" method in methods_r
    *{"$class\::new"} = sub {
        my ($class, $obj_r) = @_;

        # Map the given obj to this adapter
        my $self = {};
        bless($self, $class);
        $adapterToObj{$self} = $obj_r;

        return $self;
    };

    return 1;
}

###
# Returns whether an adapter with the given name is already defined
#   name: the name of the adapter class
#   returns true if an adapter with the name is defined, false otherwise
###
sub adapterDefined {
    my ($name) = @_;
    no strict;
    return scalar(%{__PACKAGE__."\::$name\::"});
}

###
# Creates and returns an adapter for a given object. Equivalent to calling the
# new() constructor on the adapter created with the given name, and passing the
# given object.
#   name: the name of the adapter class
#   obj_r: reference to the object being wrapped
###
sub adapt {
    my ($name, $obj_r) = @_;
    $name or die "Missing name";
    UNIVERSAL::isa($obj_r, 'UNIVERSAL') or die "Object must be a blessed reference";

    # Create and return the adapter
    my $class = __PACKAGE__."\::$name";
    return $class->new($obj_r);
}


1;

Thursday, July 3, 2008

Perl: Handy, but Ugly

In what will probably be a many-part series, here's an oddity of Perl that had me tearing out my hair for a couple of hours...

If you know Perl well, feel free to skip this paragraph. Perl has a handy but ugly notion of context. Specifically, code executes in either a scalar or a list context: if a single value is expected, the code executes in scalar context; if a list of values is expected, it executes in list context (that's vague, but good enough for now). Then, code behaves differently depending on the context.

One example of context is getting the length of an array. Given an array @foo = ('a', 'b', 'c'), @foo in scalar context evaluates to its length. Thus, $x = @foo sets the single value $x to 3 (the code executes in scalar context because $x is a single value, so Perl expects a single value assigned to it).

Now for a pop quiz. If @foo = ('a', 'b', 'c'); $x = @foo sets $x to 3, what does $x = ('a', 'b', 'c') do? Turns out it sets $x to c. Fascinating, isn't it?

The reason is that the comma does different things in list and scalar contexts. In a list context, comma is the list building operator. Thus, ('a', 'b', 'c') in list context (such as when assigned to the list variable @foo) returns a list with three items. However, in scalar context, comma is like C's comma: it executes both its left and right operands, then returns the result of the right. For instance, 'a', 'b' returns b, and 'a', 'b', 'c' returns c. Thus, when we assign ('a', 'b', 'c') to a single value, the code executes in scalar context, returning c.
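Here is a small sketch of the three cases side by side. The variable names are mine, and I've left warnings off on purpose: with use warnings, Perl flags the second assignment with a "Useless use of a constant in void context" message, which is a decent hint that something is off.

```perl
use strict;

my @foo = ('a', 'b', 'c');

my $count   = @foo;              # array in scalar context: its length
my $last    = ('a', 'b', 'c');   # comma operator in scalar context: last operand
my ($first) = ('a', 'b', 'c');   # parentheses on the left force list context

print "$count $last $first\n";   # prints "3 c a"
```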

Of course, I wasn't lucky enough to have this bite me in such a simple form. Instead, consider this (still heavily simplified) example:

sub foo {
    $a = "hello";
    $b = "world";
    return ($a, $b);
}

print join(" ", foo()) . "\n";
print scalar(foo()) . "\n";
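I naively thought this would print hello world then 2. Instead, we get hello world then world: return ($a, $b) hands back a list expression, not an array, so in scalar context the comma operator kicks in again. Today's lesson, then: when returning lists from functions, assign them to an array variable first, and return that.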
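A sketch of the fix, with hypothetical function names. The first function repeats the problem, the second returns an array instead, and the last line shows a caller-side alternative: assigning to an empty list and reading that assignment in scalar context counts the values on the right-hand side.

```perl
use strict;

# Returns a list expression: in scalar context the comma operator wins
sub pair_list {
    my ($x, $y) = ("hello", "world");
    return ($x, $y);
}

# Returns an array: in scalar context an array yields its length
sub pair_array {
    my @result = ("hello", "world");
    return @result;
}

my $from_list  = pair_list();        # last element of the list expression
my $from_array = pair_array();       # number of elements in @result
my $counted    = () = pair_list();   # count of values returned in list context

print "$from_list $from_array $counted\n";   # prints "world 2 2"
```

Returning an array fixes every caller at once, while the empty-list assignment only helps when you can't change the function itself.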

Thursday, June 26, 2008

Project Note Taking System

A while ago, I went looking for a good note taking system. Notes, as in on paper. I work on a lot of projects, and since I grok things better when I write them down, I needed a way to organize ideas, meeting minutes, tasks, and progress.

I found several hacks to turn my preferred notebook, a Moleskine, into a full-fledged PDA replacement using GTD. However, I didn't want a PDA replacement. I wanted a simple way to organize project ideas.

I also found a lot of good note-taking systems. Of these, the Cornell system was closest to what I wanted. I liked the idea of taking notes, then adding higher-level comments off to one side. Unfortunately, page division doesn't work well in a small notebook, and the system isn't very project-oriented.

Thus, after some trial and error, I've mostly settled on something that works well for me. I begin with a large, graph paper Moleskine, though any notebook should work. Next, I take notes on the right-hand page, then write higher-level comments on the left page. That's the gist. The fun part is the details.

On the right-hand page, I always first write the date in the upper-right-hand corner. This makes finding old notes a lot easier. After that, I take notes however I like — outlines, drawings, mindmaps, whatever.

Then, both while writing notes and when reviewing them, I write higher-level comments on the left page. I find it useful to vertically align them with the part of the notes they comment on. Each comment is labeled, in the form Label: comment, so that I can immediately tell what kind of comment it is. I use five labels:

Topic
Every left-hand page has a single Topic comment first thing on the page. It's short phrases or keywords to remind me what the notes are about. By using only one per page and putting it at the top, it's easy to flip through the notebook and find notes about particular topics.
Thought
These are interesting thoughts about the notes, such as summarizations, ideas, etc.
Tip
Often my notes include good lessons, so Tips are things I want to do differently in the future.
Task
These are things that I need to do based on the notes. As I do them (or move them to a better task management system), I check them off.
Tack/Tank
This pair of labels keeps track of tacks we've taken in the project, and why we've decided to tank them. I use them because I found that projects often cycle back to old ideas without remembering the very good reasons they were killed in the first place. To illustrate their use, suppose we have a project meeting on Monday and decide to use MySQL. My notes on the right-hand page contain our reasoning, and I add "Tack: use MySQL" to the left-hand page, leaving some space underneath. On Tuesday, we change our minds, and decide to use SQLite instead. So now I add "Tack: use SQLite" to Tuesday's left-hand page. Then, I go back to Monday's page, and under the "Tack: use MySQL" comment, I add a Tank comment explaining why we're no longer using MySQL.

That's it. Fairly easy to use and well organized, and relatively easy to find information later. Of course, it's not ideal. What I really want is a lightweight tablet PC, about the size of my Moleskine but all screen, with a swivel keyboard, and nice note software with tags, tree-structure organization, and handwriting search. But before you say such things exist, I also want it to be affordable. Good luck to me. Until then, I'll keep buying Moleskines.

Thursday, May 29, 2008

Industry vs. Research

As a graduate student looking for jobs, a common question I heard was "Industry or research?" Industry jobs include developers, technical managers, and even applied researchers to a large degree. Fundamental research jobs are professors and research scientists at industrial labs, such as Microsoft Research and Yahoo! Research. The definitions are somewhat fuzzy (applied research is industry? industrial labs are research?), but a generally distinguishing characteristic is whether publishing papers is a primary aspect of the job.

It was this characteristic that made me realize the fundamental difference between industry and research. It's one I wish I had known when I started graduate school. In short,

Industry is primarily about selling products, while research is primarily about selling stories

This is why publishing papers is telling: papers are a medium for selling stories. Of course, a good story helps sell a product, and a working product helps sell a story. So there is definite overlap. However, it's telling how well the pros and cons of industry and research derive from this basic difference.

To illustrate, consider some classic pros and cons. In industry, since you sell products, your work has direct impact on people that use the product. Since people will typically pay for this impact, the product itself is the source of funding. And if your product is sufficiently impactful, it is the source of a lot of funding, and you get rich. However, this means it's critical to quickly and consistently create marketable products. The result is a dampening effect on the problems targeted by industry: they are dictated by the market, and typically have shorter-term visions with fewer (or at least more calculated) risks.

In contrast, research has significantly more freedom in the problems it tackles. They are often longer-term, riskier visions. Research can do this because it only has to sell stories describing core ideas, not fully working products. Thus, it can focus on interesting technical problems. However, "selling" a story does not usually mean for money, but rather convincing people that it describes a good idea (e.g., getting a paper accepted to a conference). Since neither the story nor the idea generates money directly, researchers must seek out external funding such as grants, or, in industrial labs, income from products (which, to be fair, often contain the final fruits of research).

Given such pros and cons, the distinction of product vs. story seems obvious in hindsight. However, what made me first realize it was a more subtle situation. My advisor asked me to devise a data model for the system we're building. I came back with two options: a very common model, and a novel model that was simpler and more expressive. I favored the novel model, but my advisor said we should use the common one. His reason was that the data model was not our primary contribution, and papers with too many innovations can confuse readers. And he was right. Even though the novel model would make for a better system, the common model makes for a better story — and I'm currently in the business of selling stories. At some later date, after we sell our current story, we may sell another story that focuses on a new data model.

To conclude, I want to say that this isn't meant to promote either industry or research. In my particular case, I've found that I lean more towards selling products than stories. However, I've spoken with both developers and researchers, and both agree with the product vs. story differentiation, and each prefers their side. Of course, I'd love to hear from anyone else on the topic. I just think that understanding this difference is vital to making an informed decision about graduate school, and life afterwards.

Saturday, May 24, 2008

Job Decision

It's finally official. After nearly two months of applications, interviews, travel, negotiation, introspection, and extremely hard thinking, I've made a job decision. Come January next year, I'll start as a Program Manager in Microsoft's SQL Server team.

This was a very difficult decision, as I had to choose between five compelling offers. In the end, there were two primary considerations: location, and how I want to contribute to my field.

My offers spanned two locations: Microsoft in Seattle, and the others in Silicon Valley. I characterized my options as better quality of life in Seattle vs. proximity to networking and friends in Silicon Valley. Seattle's quality of life is better due to lower cost of living, much cheaper housing (I can actually afford a nice house my first year), and significantly nearer mountains. It also feels more laid-back. On the other hand, Silicon Valley hosts constant interaction between innumerable tech companies, providing excellent networking opportunities and mobility. Also, several of my friends live there.

For me, Seattle and Silicon Valley were effectively tied. However, this was a two-person decision, so Sarah joined me in visiting both places. She met and loved my Silicon Valley friends, and received a great tour of Seattle courtesy of Microsoft. Sarah sees locations differently than I do. I pick a job, and that decides the location; Sarah picks a location, then finds a job. Location is part of how she defines where she wants her life headed. As it happens, before we were engaged, she was already looking to move to the Pacific Northwest. Thus, though she liked California, and especially my friends, Washington is closer to where she wants to be. This was one consideration.

The other strong consideration became clear after many conversations with mentors. The key question is how I want to contribute to my field. One path is as a technical luminary, with primarily technical contributions. This path includes god-like developers, researchers, and other deeply technical people. My offers at IBM, Oracle, and Yahoo! followed this path. Another path is as a technical manager, with primarily leadership and strategic contributions. This path includes general managers, CEOs, and other big-picture people. My offers at Google and Microsoft followed this path. I've spent most of my life as a deep techie. However, due to some eye-opening experiences and a lot of introspection, I've decided that, at least currently, my calling is management and leadership.

Neither of these considerations alone decided it for me. But due to both together, plus several secondary ones, I've accepted the Microsoft offer. A couple of things in particular really impressed me about the position. First, I got to meet several team members, including my future boss, and they're all amazing. Second, Microsoft is very serious about investing in people and building careers, so the opportunities for mentorship and advancement are fantastic. I'm extremely excited, and really looking forward to starting. All that's left is to finish my doctorate!

Finally, to wrap up, I want to very sincerely thank everyone who helped me throughout this process. All of my mentors for their advice; all of my friends for their time, love, connections, and support; and all of my family for putting up with weeks of waffling (individuals may fall into more than one category). I know not everyone will be happy with my decision, but I hope you will all be happy for me. Of course, feel free to send along any particularly strong variations on "You fool!". I promise no hard feelings.