Seth Davis uses StatSheet for article...but no link
October 04, 2009

Glad to see Seth Davis used StatSheet for his article on overworked refs, but this is yet another example of a traditional media outlet referencing StatSheet.com but not linking to it. In the age of the Internet, just citing a source in an online article should not be enough to maintain journalistic integrity. If you quote data throughout a whole paragraph in your article and mention "StatSheet.com", you should at least link back to the site. Even if you use "rel=nofollow" so SI doesn't give StatSheet any Google juice, they should still link back.

I've never once had this issue with a blogger. Just traditional media. Perhaps one of the reasons they are doomed...

Posted by Robbie | Permalink | Comments

The Hosting Provider Time Machine: Paying 2006 prices in 2009
August 24, 2009

UPDATE: Good discussion over at Hacker News

This post is a rant about my hosting provider, Slicehost, but it applies to many (most?) of the hosting providers out there. But before I get started, let me say that I've been a very happy Slicehost customer. When I initially launched StatSheet in Nov 2007, I was a Joyent customer, but after months of inexplicable disk I/O problems that impacted the performance of the website, I switched to Slicehost. And I've had ZERO problems since then (knock on wood). Very easy to set up, upgrade, etc. No mysterious performance problems.

Back to my rant...one of the issues with a provider like Slicehost is that you must fit your hosting needs into one of their pre-canned "slice" configurations. You can see the current list of slice configurations. That was perfectly fine when I started out because my needs were fairly small. With Slicehost's easy upgrade process I could move up to larger RAM/storage as needed. However, this became a bigger issue after I reached seven 2GB slices. The price increase to the next level is no longer insignificant.
And I no longer need more RAM AND more storage AND more bandwidth. I just need more storage. 3 of my 7 slices could benefit from moving to the 4GB slice plan solely for the increase in storage, but that would mean an additional $360/month to go from 80GB per slice to 160GB per slice. Effectively I'd be paying $4,320 per year just to go from 240GB to 480GB.

So I sent a query in to Slicehost support to see if I could upgrade JUST the storage. I knew they didn't do this kind of thing, but I wanted to ask anyway. The response I got back was what I expected...no, they don't do a la carte upgrades. That's understandable. They want to make everything as cookie-cutter as possible. I get that.

However, as I started looking at the price math a little more, it struck me that paying $130/month for 80GB of storage seems so...well...2006. 80GB isn't what it used to be after all. I did a little more digging and found that the price/performance curve for disk storage is still on its torrid pace of doubling density every 12 months.

That really got me thinking. I couldn't recall Slicehost ever increasing the slice storage allotments or decreasing their fees. Are they really charging the same thing as they did years ago for the same type of hardware? Thanks to the Wayback Machine, I could easily find the answer. Their Sept 2006 and Sept 2007 pages show that yes, Slicehost is indeed charging the exact same price for the exact same hardware configurations.

What does this mean? Well, for hosting providers that do not a) increase their hardware configurations or b) decrease their price at least every two years, their profit margins are actually increasing over time. Why 2 years? Let's assume hosting providers purchase excess hardware capacity to handle future demand.
Two years of extra capacity is probably a little much, but I'm being generous ;-) So at least after year 2, the hosting provider can start putting in the lower cost hardware and yet still charge the same old price. At some point you'd think they'd have to either add more hardware to the current slice configuration or drop the price. I doubt they are going to drop the price, so when will they increase the hardware??

What's a lean startup to do? I looked around and Slicehost is still mostly competitive with the other providers in their class (if you know of something better, please let me know). And if you look at Joyent, they offer only 25GB of storage for their 2GB plan at twice the price!

Slicehost's answer to my query was to use something like S3 to offload unused/unneeded files and that's what I'm doing. But I don't want to. If I were talking terabytes of data, I could understand, but my home external storage is 1TB and I bought it 2 years ago. 80GB seems so minuscule in comparison (because it is). I know, I know, an external disk drive isn't the same as NAS or whatever the hosting provider uses, but does that justify no change in 3 years?

Posted by Robbie | Permalink | Comments

2009 High School Basketball Player Rankings updated
July 19, 2009

Thanks again to the continued effort of Jeff Crume to compile his RSCI rankings, I've updated the High School Basketball section on StatSheet.com to include the 2009 Top 100 and McDonald's All-Americans. Soon I plan on adding new charts and analysis on recruiting trends. Let me know if you have any suggestions.

Posted by Robbie | Permalink | Comments

Passenger/Nginx much better than Mongrel (at least for StatSheet)
July 15, 2009

Summary: I recently switched from Mongrel to Passenger/Nginx and have noticed a significant IMPROVEMENT in memory usage and performance.

Recently I upgraded the StatSheet Network to Rails 2.3.2 from 2.1 and it was pretty painful.
I've learned that I need to upgrade with every point release so each upgrade event isn't so dramatic.

Why is StatSheet difficult to upgrade? Mainly because of the size of the codebase: 1100 views, 400 models, 60 controllers, 325 database tables, 30 million rows. That's quite a bit of code and data for 2 years of work. More code means more potential corner cases.

Because I have so many models and controllers, I use a nested folder structure. All the NBA models are under models/nba, all the College Football models are under models/cfb, etc. This has caused quite a few problems as Rails still doesn't do a good job of managing the namespace (e.g., "X is not missing constant Y" errors). Each upgrade (1.x -> 2.1 -> 2.3.2) has raised new issues related to this problem.

With the last upgrade, the memory usage on my two Mongrel servers (4 instances per server) went UP. I was already seeing an initial startup memory of about 150MB per instance that would grow to over 300MB within a couple of hours. It got so bad I had to make Monit restart Mongrels once they consumed more than 300MB. I had to shut off the Monit email notifications because I was getting so many emails about restarts. I modified Monit to write to a file every time it restarted Mongrel so I could keep track.

Below is the number of times Monit restarted Mongrel each day over the past 30 days (this is just one server; the other server saw similar numbers of restarts):

Restarts  Date
146       2009-06-15
138       2009-06-16
114       2009-06-17
120       2009-06-18
122       2009-06-19
109       2009-06-20
117       2009-06-21
113       2009-06-22
86        2009-06-23
109       2009-06-24
113       2009-06-25
107       2009-06-26
92        2009-06-27
115       2009-06-28
102       2009-06-29
115       2009-06-30
112       2009-07-01
108       2009-07-02
104       2009-07-03
112       2009-07-04
93        2009-07-05
112       2009-07-06
98        2009-07-07
109       2009-07-08
116       2009-07-09
169       2009-07-10
154       2009-07-11

Why so many restarts? It has been very hard to pin down.
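For anyone wanting to keep the same kind of tally, here is a minimal Ruby sketch of how per-day counts like the ones above could be produced from a restart log. The log format is an assumption (one line per restart, starting with a `YYYY-MM-DD` date); the method name `restarts_per_day` and the sample lines are hypothetical and are not taken from the actual StatSheet setup.

```ruby
# Count restarts per day from log lines.
# ASSUMPTION: each restart is logged as one line that begins with an
# ISO-style date, e.g. "2009-06-15 03:12:45 mongrel_8001 restarted".
def restarts_per_day(lines)
  counts = Hash.new(0)
  lines.each do |line|
    date = line[/\A\d{4}-\d{2}-\d{2}/] # leading date, or nil if absent
    next unless date                   # skip lines that don't match
    counts[date] += 1
  end
  counts.sort.to_h                     # ordered by date
end

# Hypothetical sample data, for illustration only.
sample = [
  "2009-06-15 03:12:45 mongrel_8001 restarted",
  "2009-06-15 04:40:10 mongrel_8002 restarted",
  "2009-06-16 01:05:33 mongrel_8001 restarted"
]

puts "Restarts  Date"
restarts_per_day(sample).each do |date, count|
  puts format("%-9d %s", count, date)
end
```

In practice the lines would come from something like `File.foreach` on whatever file Monit writes to.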
As you might expect with a sports statistics website, StatSheet is very database intensive, which means lots of complex database queries. Lots of tables means lots of joins (:include) and even nested includes (:include => {:table1 => [:table2, :table3]}). I think this is the source of much of the leakage.

I took a look at a few of the Ruby/Rails memory troubleshooting tools, but they all really sucked for my situation. Again, I have a bunch of code (and over 1.5 million unique pages), so it is very difficult to reproduce the memory consumption patterns in development.

If Rails is a ghetto, Mongrel is a slumdog. Mongrel has surprisingly few features or diagnostic tools and the documentation is sparse at best. Heck, the code hasn't even been updated in over a year.

Back to the upgrade...because it looked like Mongrel was consuming even MORE memory than before and because I had already started dabbling with Passenger for Nginx, I decided to step up my testing, even though I had heard that Passenger sucked memory too. I got it working on a production server pretty quickly and painlessly. I also installed Ruby Enterprise Edition just to get the max advantages with Passenger. I let it run for a bit and was amazed to see no marked memory leakage. I've been able to let it run all day without a single restart!

The other thing that sucked with Mongrel is that instances would stop responding. I was using Nginx on the front-end with the fair module. Periodically I'd check and only 2 or 3 of the instances on a server would actually be getting any traffic. Again, this was painful to troubleshoot. Nothing in the Mongrel logs and limited info in the Nginx error logs.

This is another problem Passenger solves. Nginx/Fair -> Mongrel is really attempting to create a global queue. But since Nginx and Mongrel aren't tightly integrated, they don't work all that well together.
Passenger has a global queue and manages it for you, so I don't have to worry about periodically reloading Nginx in order to contact inactive/restarted Mongrels. Granted I've only had Passenger in production for a couple of days, but any major issues would have likely popped up by now. It will be interesting to see how it does under heavy load. The Phusion folks will be getting a donation from me shortly!
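To make the global queue idea a bit more concrete, here is a toy Ruby simulation. It is only a sketch under simplifying assumptions: all requests are already waiting at time zero, service times are known up front, and the comparison is against plain round-robin dispatch, which is not what the Nginx fair module or Passenger actually implement. The method names and the numbers are made up for illustration.

```ruby
# Finish time of the last request when requests are dealt out round-robin
# and each back end works through its own private queue.
def round_robin_finish(service_times, workers)
  busy_until = Array.new(workers, 0)
  service_times.each_with_index do |t, i|
    busy_until[i % workers] += t
  end
  busy_until.max
end

# Finish time of the last request with one shared (global) queue:
# each request goes to whichever back end becomes free first.
def global_queue_finish(service_times, workers)
  busy_until = Array.new(workers, 0)
  service_times.each do |t|
    idx = busy_until.index(busy_until.min) # earliest-free back end
    busy_until[idx] += t
  end
  busy_until.max
end

# Hypothetical workload: two slow requests mixed in with fast ones.
times = [10, 1, 1, 1, 10, 1, 1, 1]

puts round_robin_finish(times, 2)  # => 22
puts global_queue_finish(times, 2) # => 13
```

With round-robin, both slow requests land on the same back end while the other sits idle; with a shared queue, fast requests keep flowing to whichever back end is free.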
Posted by Robbie | Permalink | Comments
What's New?
This blog is the place to learn about all the new stuff I'm working on for StatSheet.