Author Archive

How Not to Crowdsource (or really, how not to build an open-submission website)

@ink_slinger linked to a City of Edmonton website today called “Idea Zone”. I was intrigued, so I tried to find out what it was.

Go ahead, visit the site. See if you can find out what it’s all about. I can wait…

Did you visit it? What did you learn? Probably nothing, because in order to find out what this site is, you have to register. I don’t know about you, but I like to know what a website is before giving it my vitals. And its registration form is quite detailed. Although the more intimate fields (address and phone number) are not required, they are still asked for, which is quite intimidating.

So step one in how not to crowdsource is: Require me to register to find out what your site is.
And while we’re at it, step two is: Ask me for this much detail without buying me dinner first.

Thankfully, Edmonton blogstar Mack Male has a blog post that explains it all. Not a terrible idea, overall. I think the low signups at the ICLEI Congress should have taught them that something was terribly wrong, but that’s a whole other matter.

The purpose of the site (for people who don’t want to click) is to get ideas for how to make Edmonton a better city. I sometimes think too much effort is put into finding ways to make this city better, and far too little into actually doing anything about it, but more input from a broader spectrum probably isn’t a bad thing.

So I register, leaving out the gory details of my location, age, and phone number, as I see no reason for them to have them. They do one thing right here and skip the activation step, so kudos on that. But they waste the benefit of skipping that step by not just logging me in immediately.

So rule three: Make me log in right after signing up, even though I just gave you my username and password and so can’t possibly be faking who I am.

So I log in. Before I can do anything, I have to agree to this obscure set of rules about how my submission may or may not be used. I don’t really care. By now, I have gone through several forms, been frustrated and limited in what I can do at every turn, and am not really interested in submitting anything at all.

So a final rule before I sum up: Make me agree to rules I don’t care about even though this could have been a simple checkbox in the signup page.

But I go on, because now I’m on a roll and writing a blog post about it. Somehow I just can’t stop myself. I find that, since Mack’s post, there have been 10 new users and *no* new ideas posted. This comes as absolutely no surprise to me. I’m also too tired of signing up to enter an idea now.

I guess the city is using this software because they’re using a version of it internally with a stronger workflow. This is the public-facing version of that software. Well, I’m just going to come out and say it. The public-facing side of it is crap.

When you’re crowdsourcing, your goal should absolutely not be to try to filter users out early. Filtering early is a super important thing for most sites to do, because they’re looking to filter out all of the rough and can accept losing some diamonds along the way. Unfortunately, in crowdsourcing, you can’t afford to do this. The entire purpose of this process is for *you* to find the diamonds. That means a bit more work on your part, but it also means a less frustrating user experience.

Not only should users be able to see what the hell this site is without logging in, they should be able to see submitted ideas and even submit their own ideas with either a minimal (username/password) account creation or no account creation at all. It should be moderated and filtered on the level of word-triggers (no one will suggest improving the city with anything to do with penises, for example), but it should be *easy* to submit ideas.
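To make the word-trigger idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the trigger list, the function names, and the two in-memory lists standing in for the published ideas and the moderator's queue. It is not how Idea Zone works.

```python
import re

# Hypothetical trigger list; a real site would maintain its own.
TRIGGER_WORDS = {"penis", "viagra"}

def needs_review(idea_text, triggers=TRIGGER_WORDS):
    """Return True if the idea contains a trigger word (exact, case-insensitive match)."""
    words = re.findall(r"[a-z']+", idea_text.lower())
    return any(word in triggers for word in words)

def submit_idea(idea_text, published, held):
    """Accept every idea with no account; hold flagged ones for a moderator."""
    if needs_review(idea_text):
        held.append(idea_text)
    else:
        published.append(idea_text)

published, held = [], []
submit_idea("More bike lanes downtown", published, held)
submit_idea("Build a giant PENIS statue", published, held)
```

The point of the design is that nothing is rejected at the door: a flagged idea just waits for a human, and everything else goes straight up.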

Championing those ideas, commenting on ideas, there you can increase the barriers. But if your goal is to find new ideas, you must make this process much easier. It’s important to realize that internal and external tools rarely work well from the same package (see for example the dreadful WebCT — great for teachers, terrible for students).

To sum up all the rules in one sentence: Make it hard for users to submit ideas!

Drizzle: The Future of MySQL

Brian Aker, one of the main engineers on MySQL at Sun, has posted a presentation he gave on the project he’s been working on for the last year and a half: Drizzle. I highly recommend that anyone who’s interested in the state of the art of database technology watch it.

To summarize:

  • Scalability: A large part of the effort so far is along the lines of making it so that the system can scale to massive numbers of threads (and processes). They’re removing locks wherever possible and aiming for systems with 100+ cores.
  • UTF-8: This is a hugely important move. Drizzle talks exclusively in UTF-8 and bytestreams. This pushes all the character set insanity out to the client, which is really where it belongs. Unfortunately, this will probably be a stumbling block for some apps that have data that can’t be easily converted.
  • Protobuffer replication streams: Using Google’s Protocol Buffers format to put out replication information makes it really easy to write applications that do things based on the replication stream. With mysql binlogs, this was a fairly tedious thing to do and resulted in fragile code.
  • Async protocol: This is really useful. A page load should be able to spam the server with a bunch of queries and then fetch results as needed rather than doing them one at a time. This is a big part of taking advantage of higher concurrency and reducing pageload times.
  • Built in sharding: This is also really useful. I’m not entirely sure what their plan is, because this is the first I heard of it being part of the project, but if done right this will be so valuable. Sites that need to shard often wind up implementing this from scratch. I’ve been involved in doing so myself. It certainly isn’t as scary as a lot of people think it should be, but the fear is palpable among other devs and a solid baseline implementation would raise the state of the art a good deal.
  • Plugins: Plugins are a big part of drizzle’s re-architecting. The goal seems to be to rebuild the core from the ground up to be as simple as possible (the slides say 350k LOC, as opposed to more than 1M LOC in 6.0) and to push all extra functionality out to plugins. Areas subject to becoming plugins include:
    • Pluggable client protocols: Making it so that the client can talk in an HTTP/REST protocol for simplicity, or any protocol desired.
    • Pluggable logging: Have it log out to syslog, for example. Or to an analysis app that does custom slow query logging, etc.
    • Pluggable authentication: Turn off auth altogether, use the system’s user accounts through PAM (yes, please!), LDAP, HTTP AUTH, or just something custom. This also helps remove locks for scalability apparently.
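The async protocol point is easier to see in code. The sketch below uses Python's asyncio with a made-up `run_query` function that only simulates a non-blocking database call; it is not Drizzle's client API, and the queries and delays are invented for illustration.

```python
import asyncio

async def run_query(sql, delay):
    """Stand-in for a non-blocking database call (hypothetical, not Drizzle's API)."""
    await asyncio.sleep(delay)  # simulates time spent waiting on the server
    return f"rows for: {sql}"

async def load_page():
    # Send every query up front instead of waiting on each one in turn.
    user = asyncio.create_task(run_query("SELECT * FROM users WHERE id = 1", 0.02))
    posts = asyncio.create_task(run_query("SELECT * FROM posts WHERE user_id = 1", 0.02))
    friends = asyncio.create_task(run_query("SELECT * FROM friends WHERE user_id = 1", 0.02))
    # Fetch each result only at the point where the page needs it.
    return [await user, await posts, await friends]

results = asyncio.run(load_page())
```

Because all three queries are in flight at once, the page waits roughly as long as the slowest query rather than the sum of all of them.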



I can’t stress enough how this is the real future of MySQL, far more so than any future version of mainline mysql is. As the world of the web moves towards simpler databases like couchdb, drizzle is the only way that mysql will manage to be competitive on the web. Mainline mysql just keeps getting bigger and heavier, growing towards enterprise use (towards being an Oracle replacement, really), leaving those of us who don’t need or can’t afford those features (not in $, but in response time) out in the cold.

Personally, I think that object/document store databases are the future of databases for the web as a whole, but Drizzle is the future of the particular subset where schemas are still important. And for the time being, it will come out of the gates as the most mature product among next-gen web databases by the simple fact of its inheritance of mysql architecture.

I’m going to be keeping my eye on Drizzle, and I think other people should too. Brian Aker has a blog and a twitter, and Drizzle itself is on LaunchPad.

Why you shouldn’t really be using qmail anymore (or how I found a license I hate more than the GPL)

I’ve long been a fan of djb’s method of writing software. Over the years, three of his tools have served me very well: djbdns, daemontools, and qmail.

But djb has a dark side. He has some strange views on filesystem layout (I’m no booster of the linux standard layout, but his views on layout are just plain strange) that I can get over if not work around. More important, though, are his views on licensing. The GPL makes me cringe (another blog post for another time), but djb goes a whole other direction: You can’t modify his code and redistribute it. You can distribute patches, you can distribute his pristine copy, but you can not and must not distribute an altered version wholesale, in source or in binary.

Which would be fine, if it ever got updated. But it’s been years now, and the world of internet mail (and spam) has changed drastically since then. Namely, backscatter. To a lot of people familiar with email tech, this is nothing at all new. But to qmail, it’s like it’s still 1999.

For the uninitiated, backscatter is when spammers send to known-bad addresses with a forged sender address that points at their real target, a known-good (or plausibly-good) address. The mail to the known-bad address bounces back to the known-good one, so someone gets spam from a mail server that didn’t actually mean to do anything bad. This results in a very bad reputation for the previously innocent mail server.

Think of it like sending a letter with no stamp and with the address of the person you’re sending it to as the return address, to get around paying for postage (note: I have no idea how this works and am not endorsing any form of mail fraud).

The right thing to do, nowadays, is for a mail server to immediately reject an undeliverable email with an error (like a 404 code from a web server). Because of a quirk in how qmail is designed, though, it can’t do that. It will accept all mail for all domains it knows about and then, if it can’t deliver a message, reject it later by sending a bounce. Which makes it a prime target for backscatter.

The process for solving this with qmail involves a fairly tedious and possibly risky set of steps. You have to patch your copy of qmail to add scriptable hooking to the e-mail accept phase. You then have to add a script to this system, probably written in bash or perl, that will go through and do all the processing qmail intends to do later on to figure out if the message is deliverable. If it’s not, the script returns an error code. If it is, the script lets it go through the normal qmail process.
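The decision such a hook has to make is small. Here is a rough sketch of it in Python, just to show the logic; it is not the qmail patch or its hook interface, and the mailbox list and function name are made up. The only real-world pieces are the SMTP reply codes: 250 for accepted and 550 for a mailbox that doesn't exist.

```python
# Hypothetical mailbox list; a real hook would consult the server's own user database.
VALID_RECIPIENTS = {"alice@example.com", "bob@example.com"}

def rcpt_reply(address, valid=VALID_RECIPIENTS):
    """Return the SMTP reply for a recipient, decided before the message is accepted."""
    if address.lower() in valid:
        return 250, "OK"
    # Rejecting during the SMTP conversation leaves the failure with the
    # sending server, so this server never generates a bounce of its own.
    return 550, "No such user here"

accepted = rcpt_reply("Alice@Example.com")
rejected = rcpt_reply("nobody@example.com")
```

Since the rejection happens while the sending server is still connected, there is no later bounce for a spammer to aim at somebody else.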

This completely breaks the rather beautiful design of qmail. At this point, you may as well be using postfix, which is less beautiful but actually designed for plugins and has more modern notions about what to do about backscatter anyways.

So from being a djb booster, there’s now only one product of his I recommend: djbdns. Still the simplest, cleanest little dns server you can run. Daemontools has lost favour to runit, which has a similar design with some modernizations and a less restrictive license.

So postfix seems to be where it’s at for email servers these days. I always felt I had a better understanding of how qmail worked, though. Maybe someday someone will ground-up rewrite it like they did with daemontools/runit.

Participatory Democracy and Twitter

I was up late last night (not unusual for me lately) watching something kind of amazing unfold. The Alberta Legislature was debating a new version of their human rights legislation that included two major changes:
- on one end of the spectrum, the addition of sexual orientation to the protected statuses. Technically, this was just a gesture, as sexual orientation has been a protected status in common law for years due to being read in by the courts.
- on the other end, they added the rights of parents to remove their children from classes with content relating to religion, sexuality, and sexual orientation. A right that, by my understanding, was already present in the school act.

Now, Alberta’s legislature is a perpetual majority government by the Progressive Conservative party, and that party is capable of essentially controlling all legislation that passes through the house. They really wanted the latter and seemed to think the former would be a good olive branch to prevent argument.

But that’s all politics as usual. Simple background information. What happened last night was that somehow, people on twitter were rallied to watch the debate unfold over the streaming video the Alberta government provides of legislature proceedings. If it had just been people on there bitching and moaning, it also wouldn’t have been very special.

But last night, there weren’t just average citizens participating. There were actually MLAs on twitter discussing and debating with the twitter users. Granted, most of the MLAs participating were backbenchers or otherwise not taking direct part in the debate on the floor, but they were there and they were talking to the very people they represent while making law.

I know that there are people concerned about adding distractions for people on the floor of governing bodies, but I honestly think this should be encouraged. I couldn’t help but think I was seeing some element of the future here, where people are moved closer to their representatives in government and able to influence them more directly. And anyone who’s ever watched CPAC knows there’s plenty of zoning out, reading magazines (one MLA last night was reading a magazine while the MLA next to him was debating), chattering, etc. If their time is going to be wasted, I’d rather it be wasted on us.

I do think the MLAs could stand to learn some of the twitter conventions a bit better. If they’d used hashtags, it would have been easier to follow their discussions. It also would have been nice if more of the left side of the house (Libs and NDP) had been on. I’m a little shocked to see the Conservatives at the forefront of this trend.

If you want to read the discussion on twitter about bill 44, you can go here.

New Facebook Layout: Why It’s a Bad Move

Here’s the thing: Facebook should not try to be a better twitter. When we were lagging behind on features at Nexopia for two years, we constantly ran into the trap of thinking about how we needed to do these things our competition did and do them better. When we finally came out with something that made us, in a couple of ways, a better facebook than facebook, it burned us hard. I can’t go into any detail on how, but suffice to say it did hurt us.

Facebook should try to be a better facebook. It is, at its core, a social utility for your REAL life. For the people you know in person and want to keep in touch with. This change, to be like twitter, where it’s about networking with people you may not know but do know OF, takes them away from that core advantage.

The last update, though I bitched about it, at least made the site more organized and streamlined. This is exactly what I needed from a ‘social utility’. Information organized into neat little buckets where I could see immediately where things are. Now... well, now my friend feed gets horribly spammed by whoever posts the most updates. It eliminates my ability to get an overview of what’s going on in my real-world world.

This is, incidentally, why I really dislike the twitter facebook app. And it’s only going to get worse now that the feed is spammable.

Goodbye organization, hello feedspam. Please, Facebook, leave twitter to twitter and get back to being a social utility.

Demoing Bittablog at #democampyeg

So tonight, for the first time in about 4 years, I did public speaking. Back in college there was a fair amount of public speaking and presentation giving to do, and I enjoyed it then. I always come away from doing public speaking feeling like I’ve done a good job, and I hope that perception is accurate. I find myself able to get my thoughts across more clearly in presentation form (even though I always wing it) than I often can in just direct discussion.

What I presented was, of course, bittablog, which has been my pet project for the last little while. I knew from experience that last minute additions to a presentation are a bad idea, so the feature where you can post a bitta directly from your twitter account didn’t make the cut as something I could present, but the bonus of cutting it out was that nothing in my presentation failed. And I think that’s pretty important.

I won’t get too much into what bittablog is, because I already covered it in my first post here, and the front page manages to do it concisely in 140 character bitts. Suffice to say that I really enjoyed presenting bittablog tonight, and I think a lot of the audience got a kick out of it too.

One interesting thing that I experienced, which was reminiscent of RailsConf last year, was the way in which people were twittering as I presented. After I was done, I pulled out my iPhone to read what people had been tweeting as I was talking. I have to say, it was really gratifying to come back and see the really nice things people said about it while I was up there talking.

At RailsConf I got to see this from the audience side through IRC (there was also twittering, but I was unconverted at the time), and one thing that I said I wanted to do then at some presentation was have a cohort twittering/ircing at the same time, answering people’s questions and talking back.

One difference, though, was that there was no silent heckling going on, and I think that’s good. A lot of the railsconf presentations had some really vicious talkback going on on irc. Some deserved, some not, and it was actually kind of intoxicating and hard to resist falling into the same trap. None of that at #democampyeg, which is great. This is a very supportive community, and I hope it stays that way.

OpenSocial

Google just announced something called OpenSocial, which is a facebook apps-like mechanism running on an open platform of essentially embedded js and html. At least, that’s the gist I’m getting.

But where’s the security? Letting untrusted apps run js on my social network site (and that’s not just a hypothetical: 1 million actual users, more like 3 million the way facebook and myspace count them) means giving them access to cookies (we do httponly, but that doesn’t cover all browsers by any means) and the ability to do a lot of really nasty things to our users.
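For anyone who hasn't run into httponly: it's a flag on the cookie that asks the browser to keep the cookie away from scripts. A minimal sketch using Python's standard `http.cookies` module follows; the cookie name and session value are made up, and this is not our actual code.

```python
from http.cookies import SimpleCookie

def session_cookie_header(session_id):
    """Build a Set-Cookie value for a session cookie that scripts should not be able to read."""
    cookie = SimpleCookie()
    cookie["session"] = session_id          # hypothetical cookie name
    cookie["session"]["httponly"] = True    # adds the HttpOnly attribute
    return cookie["session"].OutputString()

header = session_cookie_header("abc123")
```

A browser that honours the flag won't expose the cookie through `document.cookie`, but a browser that ignores it gives embedded js the cookie anyway, which is exactly the gap I'm worried about.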

Seems to me facebook didn’t do a closed platform so much for lock-in as for a desire to avoid security issues just like this. The hoops you have to go through to get any serious information on a facebook app, including several levels of user confirmation, are a serious hindrance to overt abusive use.

Either google has failed to make this useful to me, or they have failed to market it to me. Both of these possibilities seem very surprising.

On why I am odd

I suspect that most people, when visiting a non-chain restaurant, don’t wonder about whether they could manage to buy the restaurant and if it’d be a good investment… Honestly, there’s a place downtown that I think has huge potential that it’s not living up to and If I Were A Rich Man, badle-beedle-badle-beedle-beedle-bum, *ahem*, I would probably buy and try to make into what it could be. I think it’s a business crush.

And, like many of my crushes, it’s probably doomed. After all, restaurants are the most likely type of business to fail. And boy howdy do they ever fail.

Also, who’s the ghost who left a comment anonymously on my last post? I’ve ruled out all the likely suspects, I think.

Stupid Banks Stupid About Stupid Security Questions

So my bank has recently decided to start DEMANDING security question/answer pairs for their web page login system. In order to log in, you MUST answer one of your security questions in addition to your password.

They give you 5 sets of questions to choose from and a freeform field to put the answer into. Am I the only one who sees the gaping stupidity of this? If they could allow you your own questions, maybe that’d be ok. But all their questions are either easily discoverable (stuff like maiden names, high school mascots, pets, best friend names, etc. In fact, the very things that all these years password security policy has advised you to KEEP OUT OF YOUR PASSWORD, and for very good reason), change really often (favorite magazine, favorite chocolate bar, favorite restaurant), or are very gender selective (favorite fashion designer, and a big wtf to that one in general too).

If someone can take the time to find out your actual password, they can take the time to find these things out. There are only 5 questions, so at a minimum it’ll take 5 random attempts at login from different computers over a couple of months to find out what they are and do some research without setting off alarm bells with the bank.

When are companies going to realize that security questions are a serious regression in security?

Random nerdy fact of the day

qmail has a limit of 900 characters on email addresses it’ll relay.