SXSWi Day Four: Considerations for scalable web ventures (How to scale)

Panelists: Scalable Web Ventures

Kevin Rose | Digg
Cal Henderson | Flickr
Joe Stump | Digg
Chris Lea | Media Temple
Garrett Camp | StumbleUpon
Matt Mullenweg | WordPress

{Discussion: Kevin seems to be moderating}

  • Consensus is that you think about scaling when you get there
  • Joe Stump says that it (not worrying about scaling initially) helped them concentrate on building cool features
  • Software load balancers suck … Squid is highly recommended
  • Pound is good for http load balancing
  • Joe Stump says that at 15 million pageviews you should start thinking of specializing servers (images, db, static files … that sort of thing)
  • WordPress uses mainly rented server boxes … 1000 of them (didn’t know that)
  • Cal from Flickr says that engineering time is expensive and that if you can just solve the problem by throwing money at it … then you should.
  • They recommend that when your development staff grows past 2 people, someone should be appointed to standardize code style (underscores vs. camelCase)
  • Trac is brought up by Stump … use it if you …
  • Lea says to document your code. It’s actually a monetary issue: time spent figuring out code is time not spent coding, so reducing that saves money.
  • Cal cracks a really great joke “What is this documentation thing you speak of?” … “seriously … just hire people that agree with you”
  • Question comes up about remote workers … Kevin says that at one point they didn’t care … they hired a guy from the East to help Digg scale.
  • Matt says his people don’t see each other for months and they get together usually for social purposes more than anything
  • Says they are trying to get to a stage where they have users clustered in certain cities.
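Since Pound comes up for HTTP load balancing: the core idea is just spreading requests across a pool of backends. Here's a minimal round-robin sketch (the backend names are made up, and real balancers like Pound add health checks, session persistence, and so on):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Toy round-robin balancer: each request goes to the next backend in turn."""
    def __init__(self, backends):
        self._cycle = cycle(backends)

    def pick(self):
        return next(self._cycle)

# Hypothetical backend pool; a real balancer would also track which are healthy.
balancer = RoundRobinBalancer(["web1:80", "web2:80", "web3:80"])
picks = [balancer.pick() for _ in range(4)]  # wraps around after web3
```

The wrap-around is the whole trick: the fourth request lands back on the first backend.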

{Floor opened for questions}
Q: What bottlenecks do you guys usually encounter?

  • If it’s not your db then it’s your file storage (NFS etc.) … Cal concurs … says it’s almost always IO … “With a teeeny bit of foresight … you can avoid that” Love this guy!
  • Talking about Digg architecture … Joe Stump says the Digg db has 200 tables but only about 2 are problem children
  • Says one of them has about 200 million records … and those two get the highest read and write traffic
  • Stump says that your language never matters … Cal chimes in “Unless it’s Ruby” … the crowd roars in appreciation!
  • PS: not sure where this fits, but there is a discussion about how their admin tools are usually not very well done. Joe says that whenever something goes wrong with Digg, they usually check with the admins first to see if they’ve screwed with anything.

Q: Recommended software

  • Cal says use Ganglia for graphing, Puppet for admin. Lea suggests Munin for graphing. Cal says Ganglia and Munin are almost the same. LVS comes up.

Q: How do you keep the community from becoming obnoxious?

  • Kevin talks about giving the community the tools to moderate themselves, talks about the success of the “bury” option.
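Digg hasn't published how the bury option actually works, but the self-moderation idea can be sketched as a simple threshold rule — the ratio and minimum below are invented for illustration, not Digg's real numbers:

```python
def is_buried(diggs, buries, bury_ratio=0.3, min_buries=5):
    """Hide a story once enough of the community has buried it.

    bury_ratio and min_buries are made-up illustrative thresholds:
    require a minimum number of buries (so one grumpy user can't
    hide a story), then hide once buries are a big enough share of votes.
    """
    if buries < min_buries:
        return False
    return buries / (diggs + buries) >= bury_ratio
```

The minimum-buries guard is the important design choice: without it, a brand-new story with 1 digg and 1 bury would be hidden instantly.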

{Photo: The panel}

Q: About source control

  • They do pushes at Digg from once a day to 45 times a day, but Rose says officially they’ll be pushing twice a month.
  • Joe suggests that when you’re ready to push live, you should freeze your code and create a new branch
  • This way when something breaks, you can fix the code in the branch and push it live from there.
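The freeze-and-branch flow Joe describes might look like this in practice — git is used purely for illustration (the panel doesn't say which version control system Digg runs), and the file and branch names are made up:

```shell
# Sketch of the freeze-and-branch release flow; git and all names are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com   # throwaway identity for the demo
git config user.name dev
trunk=$(git symbolic-ref --short HEAD)  # trunk branch name (master or main)
echo 'v1' > app.txt
git add app.txt
git commit -qm 'trunk work'
git branch release-1.0                  # freeze: cut the release branch at push time
git checkout -q release-1.0
echo 'v1 hotfix' > app.txt              # a bug surfaces in production:
git commit -qam 'hotfix on release'     # fix it on the release branch, push live from there
git checkout -q "$trunk"                # trunk development continues untouched
```

The point is just that production fixes land on the frozen branch while new work keeps flowing on trunk.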

Q: What do they do when they can’t have a local development environment?

  • Cal says they can’t support a local dev environment because it’s just too complex, too many moving parts. They just use dev servers. Everyone on the panel concurs.
  • Cal: Talks about how Flickr lives on memcached and Squid, suggests taking a look at Varnish if all your stuff is in memory. Says that they have 32,000 requests per second for images.
  • Joe: Says they cache things like user objects containing user data (because users barely alter their info after they first get set up)
  • Matt: Talks about the use of output caching … says that if you’re getting 20 million page views then caching pages for, say one second, can take you down to about 8 million or was it 800,000 …
  • Joe: Talks about queuing. Says that when a user diggs something, they just cache it for that user so it shows up dugg, but they queue it. So a digg might not get into the databases for a few minutes.
  • Digg uses Gearman for queuing. Cal talks about a “ghetto queue” of cron and MySQL to implement queuing. Joe says that’s “housing projects” bad.
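The cron-plus-MySQL "ghetto queue" Cal mentions can be sketched roughly like this, with SQLite standing in for MySQL and a plain function call standing in for the cron run (Digg's real setup uses Gearman, and the payload format here is invented):

```python
import sqlite3

# A table-backed queue: writers INSERT jobs, a cron'd worker drains pending rows.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY, payload TEXT, done INTEGER DEFAULT 0)"
)

def enqueue(payload):
    """The web tier records the job and returns immediately."""
    db.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
    db.commit()

def drain(batch=10):
    """What the cron job would do each run: grab pending rows, process, mark done."""
    rows = db.execute(
        "SELECT id, payload FROM queue WHERE done = 0 ORDER BY id LIMIT ?", (batch,)
    ).fetchall()
    for job_id, payload in rows:
        # ... do the real work here (e.g. actually record the digg) ...
        db.execute("UPDATE queue SET done = 1 WHERE id = ?", (job_id,))
    db.commit()
    return [p for _, p in rows]

enqueue("digg:story=42:user=7")   # hypothetical payload
enqueue("digg:story=43:user=7")
processed = drain()
```

This matches the behavior described above: the user's digg shows up immediately from cache, but may not hit the database until the next worker run.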

Q: About APIs

  • I love Cal! He says that there’s something about APIs that brings out the stupidity in people. They just try to suck down all your data as fast as possible … where they wouldn’t try to do that with a regular web page. He says that implementing a throttling system is a good idea, to avoid getting creamed by idiots. Did I mention that I love this guy!
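A throttling system like Cal suggests can be as simple as a fixed-window counter per API key — the limits below are invented for illustration, and production systems often prefer token buckets or sliding windows to avoid bursts at window boundaries:

```python
import time

class Throttle:
    """Minimal fixed-window API throttle: allow at most `limit` calls
    per `window` seconds per API key. Parameters are illustrative."""

    def __init__(self, limit=100, window=3600.0):
        self.limit = limit
        self.window = window
        self.counts = {}  # api_key -> (window_start, count)

    def allow(self, api_key, now=None):
        now = time.time() if now is None else now
        start, count = self.counts.get(api_key, (now, 0))
        if now - start >= self.window:   # window expired: start a fresh one
            start, count = now, 0
        if count >= self.limit:
            return False                 # over the limit: reject the call
        self.counts[api_key] = (start, count + 1)
        return True

# Demo: 2 calls per 60 seconds for a hypothetical key; the third is refused.
t = Throttle(limit=2, window=60.0)
results = [t.allow("key1", now=0.0), t.allow("key1", now=1.0), t.allow("key1", now=2.0)]
```

Once the window rolls over, the same key is allowed through again.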

Q: Any good documentation for using the tools they recommended?

  • Answer: A resounding NO. Joe says 90% of what he’s learned with open source is by trial and error.

Definitely Worth the price of admission!