This is why the FSM should excise cheat entries; apparently nobody else easily will.
I seem to have done a very good job at making this look easy. Mission Accomplished and all that.
“Should”… “easily”… I’ll take “easily” first ’cause it’s the hardest.
The FSM is almost completely automated. On Wednesday night, while I was sleeping, the FSM noticed that a new Farktography theme was posted and started doing its thing – it scraped the main page to figure out the theme name (it is always surrounded by quotes – thank you very much for that Mr and Mrs Headline Submitter) and then it scraped the comment page to figure out the boobies time and the main page listing time. Then it scraped the voting page to get the votes and the timestamps and the URLs and the Farktographer names. And then, just to be sure, it went back to the preceding four contests and scraped those voting results to pick up any late entries/votes. Fifteen minutes after that, it generated the static reports (the Recents, the Evars, the Operations). It does that every hour on Thursday – it starts at 00:00 and ends at 23:00. Contests hit the main page at essentially random times but the schedule ensures that the FSM will notice within an hour. On Friday and Saturday nights, it wakes up once to scrape the past four contests, just to keep the data somewhat lively. It repeats the process every week. It works. It only fails if I’ve introduced a memory leak or if certain base assumptions are violated (e.g. the Fark.com HTML changes drastically or something like that). It works and it works while I am sleeping.
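A rough sketch of that weekly schedule in Python, purely for illustration and not the FSM's actual code. The job names and the `jobs_due` function are made up, and the hour of the single Friday/Saturday night run is an assumption, since only "once" is stated above.

```python
from datetime import datetime

THURSDAY, FRIDAY, SATURDAY = 3, 4, 5  # datetime.weekday(): Monday is 0

# Assumed hour for the single Friday/Saturday night run (not given in the post).
WEEKEND_RUN_HOUR = 23


def jobs_due(now):
    """Return the hypothetical job names that should run at this minute."""
    day = now.weekday()
    if day == THURSDAY:
        if now.minute == 0:
            # Every hour from 00:00 to 23:00: look for the new contest,
            # then re-scrape the preceding contests for late entries/votes.
            return ["scrape_new_contest", "scrape_recent_contests"]
        if now.minute == 15:
            # Static reports are generated fifteen minutes after the scrape.
            return ["generate_reports"]
    if day in (FRIDAY, SATURDAY) and now.hour == WEEKEND_RUN_HOUR and now.minute == 0:
        # One wake-up per night to keep the data somewhat lively.
        return ["scrape_recent_contests"]
    return []
```

Calling `jobs_due(datetime.now())` once a minute from a cron-style loop would be enough to drive it; because the Thursday scrape runs hourly, a contest that hits the main page at a random time is noticed within the hour.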
Scraping those pages – extracting the votes, the time stamps, the Farktographer names, &c – was kind of hard. XPath with Regular Expressions? That’s cutting edge right there. Cleaning up HTML to make XPath work on it? Sort of hard (thanks Fark coders for not making it too hard). Maintaining the database of the entries and the contests? Attention needed to be paid. In general, getting the FSM almost completely automated was hard. But I was motivated to do it so I could free up time to do other things (like, be a husband or a dad or even spend time working on new FSM features).
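To give a flavor of the XPath-plus-regex idea, here is a minimal Python sketch. The markup is invented (the real Fark pages surely look different) and is assumed to be cleaned up already into well-formed XML. Python's standard-library ElementTree only supports a subset of XPath with no regex functions, so in this sketch the regex is applied to the text that the path expressions select rather than inside the XPath itself.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, already cleaned-up voting-page markup (not Fark's real HTML).
SAMPLE = """
<div>
  <div class="entry">
    <a class="user" href="/users/alice">alice</a>
    <span class="votes">(12 votes)</span>
    <a class="photo" href="http://example.com/a.jpg">photo</a>
  </div>
  <div class="entry">
    <a class="user" href="/users/bob">bob</a>
    <span class="votes">(1 vote)</span>
    <a class="photo" href="http://example.com/b.jpg">photo</a>
  </div>
</div>
"""

# Pulls the number out of text such as "(12 votes)" or "(1 vote)".
VOTES_RE = re.compile(r"\((\d+) votes?\)")


def parse_entries(markup):
    """Extract Farktographer name, image URL and vote count for each entry."""
    root = ET.fromstring(markup.strip())
    entries = []
    for node in root.findall(".//div[@class='entry']"):
        user = node.find("a[@class='user']").text
        url = node.find("a[@class='photo']").get("href")
        match = VOTES_RE.search(node.find("span[@class='votes']").text)
        votes = int(match.group(1)) if match else 0
        entries.append({"user": user, "url": url, "votes": votes})
    return entries
```

The path expressions find the right nodes and the regex digs the number out of the text, which is also why the HTML clean-up step matters: the parser here refuses anything that is not well-formed.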
On-behalf-of posting is the only part that is not automated right now. I have to manually enter the OBO values to let the FSM do its thing. That manual step is one of the reasons why I’m not a big fan of OBO, but OBO posting is a nice thing for people to do for each other and it doesn’t happen too often. The primary reason why I’m not a big fan is that OBO introduces a 2nd version of the Truth. The data on the FSM no longer matches the data on Fark. I really wanted the FSM to simply be a value-add to Fark. Now it is actually creating data, which makes reconciling Fark and the FSM much harder (esp. since there’s no way for an outside observer to know when OBOs have been applied).
Moderating individual entries does not lend itself to automation. The Scarlet AAARRRRrrrrr report shows some potential repeats, but it does not catch them all, and false positives could exist. I’ve considered using that data to auto-excise entries, but I don’t want to automatically make a mistake. Things like the same image under different URLs are even harder to catch – I’d like to write some image comparison algorithm some day to do that, but that is really hard. It would be very satisfying, but hard. People get paid to do that stuff. HDR? Geez, I don’t know. Maybe something that could measure dynamic range and kick out images that exceed some threshold, but that is going to take some research – not easy. And the FSM would have to touch each image as an image (right now it just tracks URLs).
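For illustration only, here is one way a URL-based repeat check could look in Python. This is an assumption about the general idea, not how the Scarlet AAARRRRrrrrr report actually works; the `potential_repeats` function and the sample data are hypothetical.

```python
from collections import defaultdict


def potential_repeats(entries):
    """Flag (farktographer, url) pairs that appear in more than one contest.

    `entries` is an iterable of (contest, farktographer, url) tuples.
    URLs are compared as exact strings, so the same image posted under
    a different URL is NOT caught.
    """
    seen = defaultdict(set)
    for contest, user, url in entries:
        seen[(user, url.strip())].add(contest)
    # Only report; excising is left to a human so mistakes are not automated.
    return {key: sorted(contests) for key, contests in seen.items() if len(contests) > 1}


sample = [
    ("Bridges", "alice", "http://example.com/a.jpg"),
    ("Doors", "alice", "http://example.com/a.jpg"),
    ("Doors", "bob", "http://example.com/b.jpg"),
]
flagged = potential_repeats(sample)
```

The function only produces a list for someone to look at. Anything smarter, such as comparing pixels or measuring dynamic range, would mean downloading and decoding every image rather than just tracking URLs.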
So what if I don’t try for automation and just do what I’m doing for OBO? I don’t want to. I’d have to moderate every contest and that takes my time. I could expose some sort of moderator UI and divvy up the work, but to do that, I’d have to define moderators. Right now, the FSM has “viewers” – people can only view the data. If I had moderators, I’d need to incorporate some authentication and authorization mechanisms. That is kind of hard. Certainly not easy. And who gets to be a moderator? How are moderators moderated (“meta-moderation” à la Slashdot)? I would want to publicly expose moderation events to ensure that some moderator does not get all power mad. Allowing for meta-moderation is another hard thing.
And let’s not forget that deleting posts on the FSM would introduce yet another version of the Truth.
On to “should”. Words like “should” are part of the lexicon my business partners use at my place of employment. The FSM is a vanity site; a hobby. It attempts to add value to something (Farktography) that enriches my life and it allows me to explore technologies and techniques that I can’t really explore at work (e.g. XPaths with regular expressions). I think that the FSM succeeds quite well at doing both of those things. If you are a “7 Habits” person, the FSM and Farktography are both Quadrant II activities for me (Important, Not Urgent). I really don’t want them to be Quadrant I (Important, Urgent). “Should” is a Quadrant I word. The FSM is best guided by requests, not requirements. I moved from rankings to standings, for example, based on a request.
I need to preface this next part with “I’m very happy and flattered that people have been moved to sponsor me to TF based on either my photography or on my FSM work. I really appreciate those gestures (anonymous or onymous). Seriously. Five bucks means different things to different people, but the gesture of sponsorship means a lot to me”. I’ve stated in the past that I really don’t want to be sponsored based on the FSM. The reason is that I don’t want there to be any perception of quid-pro-quo. The FSM is not some sponsorship-whoring vehicle. It may be a (very inefficient) vote-whoring vehicle. But I’m not doing it for pay.
I’m happy that the FSM gives an “easy” impression. I worked hard to make it easy to use (seriously, that Portfolio URL thing took way more thought and work than I expected). I’m very happy to receive feedback and requests and I will continue to roll out new features, but I need to balance those features against some guiding principles: “value-add”, “one version of the Truth”, “automation”, “ease of maintenance” are a few of those.