Friday
Jul 30, 2010

You should blog even if you have no readers

Spencer Fry wrote a great post on "Why entrepreneurs should write." I would further add that the benefits of writing are so extraordinary that you should write a blog even if you have no readers (and regardless of whether you're an entrepreneur).

I have over 50 unfinished drafts. Some of them are just a few scribbled-down ideas where I argue with myself. Most of them will never be published, yet I got value out of writing all of them.

Writing makes you a better reader

Blogging has changed how I read other people's writing.

In struggling to find the right ways to structure and present my posts, I am much more attuned to what makes a good argument and what makes a bad argument. I am better at seeing holes in other people's reasoning.

At the same time, when reading I am less likely to fall into the trap of discrediting a post with weak counterclaims. In almost any post, there are likely to be counterclaims that are based on exceptional cases. Internet commenters love to point these out. However, these exceptional cases miss the main thrust of the post, and by understanding the implicit backdrop behind a post's argument, I get a lot more value out of reading.

I'm also more aware of the style of good writers. I mentally take note of the ways good writers phrase their ideas. I'd always enjoyed Paul Graham's writing, but now I really appreciate how he organizes his posts. He has an awesome ability to suck you into his world and show you what it looks like from his perspective. I've learned a lot about good writing from reading Bradford Cross's blog; his posts have a clear arc and make excellent use of short paragraphs to keep the posts flowing.

Writing makes you smarter

Writing reveals holes in your thinking. When your ideas are written and looking back at you, they're a lot less convincing than when they're just in your head. Writing forces you to mature your ideas by thinking through counterarguments.

Writing helps you organize your thoughts in a coherent way. This makes you a much better conversationalist when these topics come up. I can't count the number of times I've had deeper conversations with people because I had matured my ideas offline.

Consider anything else a side benefit

Everything else writing gives you -- personal branding, networking, inbound opportunities -- is just a side benefit. These can be very large side benefits, but they are not the main reason you should write.

You should write because writing makes you a better person.

You should follow me on Twitter here.

Monday
Jul 12, 2010

My experience as the first employee of a Y Combinator startup

I'm the first employee of BackType, a Summer '08 YC company. My joining the company increased the company size by 50%. The experience has been awesome, but I will say up front that being the first employee of a startup is not for everyone.

The best part of being the first employee of a startup is the total exposure to all parts of the company. I've learned a ton about product development, customer development, recruiting, and entrepreneurship. Additionally, I've met and connected with lots of other awesome people through the YC network. I've gotten all these benefits at relatively low risk for myself, as I still have a salary and a solid chunk of equity.

No Rules

There are a lot of rules when you work at most companies. You don't even realize that some of the rules are rules until you work at a company with no rules. I'm talking about the most basic things like what hours you work, what days of the week you work, what tools you use, and whether you come into the office or not.

Because there are no rules, you have to be able to set your own direction and make decisions on your own.


Wednesday
Jun 16, 2010

Your company has a knowledge debt problem

When your company lacks experience in tools and techniques that can make it more productive, your company has knowledge debt.

Companies tend to operate in ways that exacerbate their knowledge debt problem. Consider this fairly typical job ad:

Initech is seeking an experienced Software Engineer to join
the engineering team.

Responsibilities

* Design core, back-end software components
* Analyze and improve efficiency, scalability, and stability
of various system resources

Requirements

* M.S. Computer Science or related field preferred
* 2+ years of Java experience
* Expert in relational data modeling and query
optimization using MySQL

I would guess that this company uses Java for the majority of its work and uses MySQL on the back-end. Naturally, the company wants to recruit people who share that skill set and can "jump right in" and contribute.

This mindset is fundamentally flawed. A company should be hiring for problem solving skills, programming ability, and cultural fit, not for any specific skill set.

If anything, a company should prefer candidates who are experienced in a different set of technologies. Any worthwhile programmer will be able to pick up the technologies necessary to work with the existing code base. Hiring people with different skill sets gives the team instant experience in a new set of tools for solving problems.


Wednesday
May 26, 2010

Why your company should have a very permissive open source policy

Having a permissive open source policy is important if a company wants to recruit truly stellar programmers. Or put another way: great programmers will be less inclined to work for you if you have a restrictive open source policy because being involved in open source projects is one of the best ways for a programmer to increase his market value.

Traditional methods for measuring programming ability are ineffective

The job market for programmers, especially the top programmers, is notoriously inefficient. This inefficiency is due to employers lacking good methods for evaluating programmers. The standard techniques used to evaluate programmers -- resumes, on-the-spot coding questions, take-home projects -- are at best crude approximations of a programmer's ability, and none of them reliably identifies the truly visionary people. Sure, there are other indicators like being involved in successful companies or having impressive past titles, but those are still indirect indicators of programming ability.

If you're a programmer, this difficulty in measuring your skill means it's really difficult to make a potential employer's perceived value of you match your actual value. Top programmers aren't differentiated from the next tier of programmers and get badly mispriced in the market. Top programmers need better mechanisms to communicate their value so that they can be priced more fairly in the market.


Saturday
May 08, 2010

News Feed in 38 lines of code using Cascalog

In this tutorial for Cascalog, we are going to create part of the back-end for a simplified version of a Facebook-like news feed. In doing so we are going to walk through an end-to-end example of running Cascalog on a production cluster. If you're new to Cascalog, you should first look at the introductory tutorials here and here.

The code and sample data for the example presented in this tutorial can be found on Github.


Friday
May 07, 2010

Cascalog Presentation at Bay Area Clojure User Group

Here are the slides from my presentation about Cascalog at the Bay Area Clojure User Group last night:




Tuesday
Apr 27, 2010

New Cascalog features: outer joins, combiners, sorting, and more

In the first tutorial for Cascalog, I showed off many of Cascalog's powerful features: joins, aggregates, subqueries, custom operations, and more. Since Cascalog's release a couple weeks ago, I've added a number of new features to Cascalog that seriously increase the expressiveness and performance of the language without compromising its simplicity or flexibility.


Wednesday
Apr 14, 2010

Introducing Cascalog: a Clojure-based query language for Hadoop

I'm very excited to be releasing Cascalog as open-source today. Cascalog is a Clojure-based query language for Hadoop inspired by Datalog.

Highlights

  • Simple - Functions, filters, and aggregators all use the same syntax. Joins are implicit and natural.
  • Expressive - Logical composition is very powerful, and you can run arbitrary Clojure code in your query with little effort.
  • Interactive - Run queries from the Clojure REPL.
  • Scalable - Cascalog queries run as a series of MapReduce jobs.
  • Query anything - Query HDFS data, database data, and/or local data by making use of Cascading's "Tap" abstraction.
  • Careful handling of null values - Null values can make life difficult. Cascalog has a feature called "non-nullable variables" that makes dealing with nulls painless.
  • First class interoperability with Cascading - Operations defined for Cascalog can be used in a Cascading flow and vice-versa.
  • First class interoperability with Clojure - You can use regular Clojure functions as operations or filters, and since Cascalog is a Clojure DSL, you can use it in other Clojure code.


Saturday
Apr 10, 2010

Fun with equality in Clojure

I ran into some very non-intuitive behavior from Clojure recently. See if you can guess what "foo" is in the following examples:

Example 1:

user=> foo
1
user=> (= foo 1)
true
user=> (= [foo 2] [1 2])
true
user=> (= {foo 2} {1 2})
false

Example 2:

user=> foo
false
user=> (= foo false)
true
user=> (when foo (println "shouldn't print?"))
shouldn't print?
nil

Yikes, huh? Here are the answers:

Example 1: (def foo (Long. "1"))

Example 2: (def foo (Boolean. false))

For example 1, the map equality breaks down because Long and Integer keys holding the same numeric value are not equal under Java's equals method (and, for some values, have different hashcodes as well), so the map lookup misses even though = compares the numbers by value. In example 2, Clojure considers anything besides false or nil to be true in a conditional, so that means a false Boolean object will be true in a conditional even though it's equal to "false".

I would definitely consider #1 a bug, as part of the contract of equality is that two equal objects have the same hashcode. #2 is more debatable, but it seems more intuitive that the Boolean object false be considered false in conditionals as well.
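
Both surprises come from the Java boxed types underneath the REPL, so they can be poked at from plain Java as well. What follows is a minimal sketch, not Clojure's actual source: the class name BoxedEquality and its helpers are made up for illustration, and truthy only imitates the "anything besides false or nil" rule with an identity check against Boolean.FALSE, which is an assumption about how such a rule behaves. One thing it suggests is that for the value 1, Long and Integer happen to produce the same hash code, yet equals still returns false, so a hash map lookup misses either way.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the JVM behavior behind the two REPL examples.
// All names here are hypothetical; this is not how Clojure is implemented.
class BoxedEquality {

    // Java's equals() on boxed numbers compares the type as well as the value.
    static boolean javaEquals(Object a, Object b) {
        return a.equals(b);
    }

    // Stores a value under one key, then looks it up with another key.
    static boolean mapFindsKey(Object stored, Object lookup) {
        Map<Object, Integer> m = new HashMap<>();
        m.put(stored, 2);
        return m.containsKey(lookup);
    }

    // Assumption: imitates a rule where everything except null and the
    // canonical Boolean.FALSE object counts as true (identity comparison).
    static boolean truthy(Object o) {
        return o != null && o != Boolean.FALSE;
    }

    // Deliberately allocates a new Boolean instead of reusing Boolean.FALSE.
    // The constructor is deprecated in recent JDKs, hence the suppression.
    @SuppressWarnings({"deprecation", "removal"})
    static Boolean freshFalse() {
        return new Boolean(false);
    }

    public static void main(String[] args) {
        Long boxedLong = Long.valueOf(1L);
        Integer boxedInt = Integer.valueOf(1);

        // Same numeric value, different boxed types.
        System.out.println(javaEquals(boxedLong, boxedInt));             // false
        System.out.println(boxedLong.hashCode() == boxedInt.hashCode()); // true
        System.out.println(mapFindsKey(boxedInt, boxedLong));            // false

        // For -1 the hash codes differ as well.
        System.out.println(
            Long.valueOf(-1L).hashCode() == Integer.valueOf(-1).hashCode()); // false

        // A freshly allocated false equals Boolean.FALSE but is a different object.
        Boolean f = freshFalse();
        System.out.println(f.equals(Boolean.FALSE)); // true
        System.out.println(truthy(f));               // true
    }
}
```

Using Boolean.valueOf(false) instead of the constructor returns the shared Boolean.FALSE object, and truthy then reports false for it.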


Tuesday
Mar 23, 2010

Migrating data from a SQL database to Hadoop

On the BackType tech blog, I wrote about the various options available for migrating data from a SQL database to Hadoop, the problems with existing solutions, and a new solution that we open-sourced. The tool we open-sourced is on GitHub here.