Erick Rudiak. Songwriter. Singer. Human. - From Erick's brain to the Internet's prying little ears.

Jumping the InfoSec reporting F.U.D. shark

Posted by erickru on June 06, 2010

One of the traps information security practitioners often fall into is becoming purveyors of F.U.D.: fear, uncertainty, and doubt. It's an easy sell: organized crime and state-sponsored groups have truly scary/amazing capabilities nowadays, and there's no lack of news about data breaches to stir up concern. People are getting hacked, our systems do have exploitable weaknesses, and there is reason to worry. Complex interconnections between systems often come about through ingenuity and/or necessity rather than by design -- often in spite of it -- making the securing of an environment difficult even for the most technically astute organizations. It's no surprise that, after the Aurora incident came to public light in January, I began to receive plenty of "Google was hacked, are you next?" spam from InfoSec vendors, F.U.D. purveyors extraordinaires.

The difficult thing for InfoSec folks is to remember that our key audience members (i.e. our CIOs and CTOs) are easily exhausted by the constant stream of F.U.D. that is so tempting for us to forward their way. What was interesting about the Aurora story is that it crossed over into the mainstream so quickly that I had my key stakeholders approaching me with questions -- while the story was still fresh and developing -- for the first time in 14 months, i.e. not since Conficker hit the public consciousness thanks to 60 Minutes. Our problem in InfoSec agenda-setting is twofold. First, we don't want to wait for the next haphazardly-selected CBS exposé to start the where-to-invest-in-InfoSec conversation with our stakeholders. Second, too often we sound like this guy -- or worse, we write like this guy and get smacked down thusly -- when we raise our heartfelt concerns.

Different companies will have different cadences for this sort of thing. What's been working relatively well at mine is a monthly conversation starter. The template I use is this: 1200 words or less, split into three sections. First: the industry trend; what's happening out in the InfoSec universe that is particularly interesting? Second: the connection; how does that external event have any similarity to or bearing on what we do as a business? Third: the stakeholder takeaways; what new action do I want my counterparts to take? Since I annotate my sources well, it is no secret that I've used Brian Krebs' columns, first at WaPo and then on his own site, for inspiration... almost as much as I've used Wired's Threat Level blog and, more recently, the Info Law Group's blog. This is where I get to jumping the shark.

Krebs typically did interesting investigative journalism, getting inside scoops on data breaches and on the doings of organized crime, especially botnet operators. Lately, he's been on a mission to bring attention to the plight of small business owners who've lost money to organized crime, usually after using a malware-infested PC for their online business banking -- writing prolifically on the subject, to the tune of 32 posts in the last 5 months. Because I'm not responsible for InfoSec at a small business, I haven't been using Krebs' column for inspiration as much recently. However, I couldn't help but shake my head at the article titled Using Windows for a Day Cost Mac User $100,000, which leads off with this sad tale:
David Green normally only accessed his company’s online bank account from his trusty Mac laptop. Then one day this April while he was home sick, Green found himself needing to authorize a transfer of money out of his firm’s account. Trouble was, he’d left his Mac at work. So he decided to log in to the company’s bank account using his wife’s Windows PC.

Unfortunately for Green, that PC was the same computer his kids used to browse the Web, chat, and play games online. It was also the same computer that organized thieves had already compromised with a password-stealing Trojan horse program.

We all know how this movie ends. My disappointment is with the over-the-top headline. Why suggest a cause-and-effect? Krebs doesn't need to be sensationalistic - he has great readership; if Alexa is to be believed, Krebs has as many readers as crypto pioneer and prolific pundit Bruce Schneier (the Chuck Norris of InfoSec). He also doesn't need to pile on the Windows FUD-fest; he has blogged regularly about his recommendation that small business users should use a dedicated PC, booting from a Live CD, for their online banking. Why the yellow journalism for this particular piece? Certainly, "Employee's family costs company $100K" isn't a fair headline. How about "Company worker violates policy by leaving authorized banking laptop unattended at office"?

Most importantly, how much did "using Windows for a day" contribute as the root cause of this incident? Perhaps I'd feel better if Krebs had gotten professional forensics done on the PC, or at least conducted some interviewing that established that Windows was at fault as opposed to, say, one of the more common entry points for malware: expired AV subscriptions, disabled firewalls or auto-updates, etc. The ironic thing is that, generally speaking, home users of Windows should have an easier time than corporate ones as far as endpoint security goes. The most commonly exploited products (the OS, browsers, PDF readers, Flash, and Java Runtime Environments) all offer auto-updating today, so you have to go out of your way not to be patched; Windows will warn you if your AV subscriptions lapse; and you have to try extra hard to keep Microsoft's Malicious Software Removal Tool from its monthly refresh. It's corporate IT change management rules that -- usually for the best of possible reasons -- tend to inject cautionary delays into these self-preservation mechanisms. As it stands, I now have to read Krebs' latest releases with a little more hesitation before I use them to seed my regular stakeholder conversations with my IT VPs. The last thing I want to do is resell F.U.D. to my trusted partners. Credibility: damaged; shark: jumped.

Some minor corrections.

Posted by erickru on May 01, 2010

It's turning out to be a slow day, so I've found time to correct a couple of minor errors on the Internet. They took a while to find. First, SQL Server Magazine encourages readers to google Massachusets 201 cmr 17, and then writes:

Storing the name of a customer in SQL Server without the data being encrypted? No way, Jose. You’ll get a fine of $5,000 per breach or lost record. If you have a database that contains 1,000 names of Massachusetts residents and lose it without the data being encrypted that’s $5,000,000.
[...]
If I didn’t know better, I’d think the security czar of Massachusetts (or whatever the title is of the person who wrote this law) was a SQL Server sales executive because the law could sell a heck of a lot of SQL Server 2008 Enterprise Edition upgrades to get Transparent Data Encryption and other useful Enterprise Edition–only features in the OS and database stack.


If I didn't know better, I'd think SQL Server Magazine was a SQL Server sales executive, because what 201 CMR 17 actually calls for is:

Encryption of all transmitted records and files containing personal information that will travel across public networks, and encryption of all data containing personal information to be transmitted wirelessly [... and ...] Encryption of all personal information stored on laptops or other portable devices;


Here's the thing. Encrypting your database files "at rest" is not a bad idea. It's helpful for all those times you lose your server's un-purged hard drive. But the Massachusetts law defines a breach as (paraphrasing): the unauthorized acquisition or unauthorized use of data that creates a substantial risk of identity theft or fraud against a resident of the commonwealth. It also defines personal information as a name combined with a state or government-issued identifier (SSN, etc.) or a financial account number. The thing to watch out for, SQL Server Magazine, isn't storing a name unencrypted in your database; it's mishandling your customers' data. Encrypting your SQL Server instance will give you relief if that mishandling comes in the shape of a lost hard drive. It won't if it comes in the shape of a SQL injection attack, network sniffing, or a lapse in judgement by a privileged employee, as the MSDN article on the subject is wise to point out. Before I would recommend a DBMS encryption investment to my CIO, I would take a long look at other methods of preventing the lost-drive scenario (e.g. an onsite data destruction program) and evaluate the ROI on each approach.

The other error I spotted wasn't so much a factual error, as in the case above, but an error of attribution. Andy, IT Guy, posted a passionate response to news coverage of the decision by the City of Los Angeles to replace their GroupWise collaboration system with Google's GovCloud offering. His article is titled, "Doesn't Anyone Care About Potential Consequences?" What I've found fascinating about this particular case (and others like it) is the insight that we're able to glean about the risk management decisions taking place behind closed doors. The City of Los Angeles memo (warning, it's loooong) gives us a rare glimpse. Did COLA care about potential consequences? What were their risk acceptance criteria? Well, to start, it tells us:

Have the security issues raised in the prior CAO report and discussed in the Committee meeting been resolved? Since the prior Committee meeting, Google has announced a new proposal for protecting sensitive government data that is consistent with the approach preferred by the Police Department and the California Department of Justice. The Police Department is satisfied that these measures will adequately address its security concerns. Formal approval from the Department of Justice, however, can only be gained through its review of the actual functioning of the new system during the pilot period


Andy speculates on the validity of the potential cost savings, and the city's memo spells it out in great detail:

The total budgetary impact of implementing the Google system in 2009-10 would be $5,976,205. Of this amount, $1,951,260 is for additional expenditures not included in the 2009-10 Budget, including $1,754,760 for Google implementation and e-mail subscriptions, and $196,500 for infrastructure upgrades. These unbudgeted expenditures will be a General Fund obligation. ITA has identified $1,687,209 that could be used for this project, comprised of savings totaling $180,000 from its 2009-10 Budget and additional funding of $1,507,209 from a 2006 class action antitrust settlement agreement between the City and Microsoft. CSC agreed to advance the City $250,000 in future rebates to cover the majority of the remaining 2009-10 balance of $264,051. The recommendations in this report are in compliance with the City's financial policies.


The city memo also goes on to say that the current GroupWise system needed to be upgraded in order to track product end-of-life cycles (legacy systems: the gift that keeps on giving), and that the city would have had to spend $2.34M over five years to provide disaster recovery service levels comparable to Google's.

Andy had especially strong words for the proposed productivity gains:

They they say that they expect to get another $15,000,000 dollars in increased productivity. ARE YOU KIDDING ME! Do they honestly think that the ability to work on documents at the same time will provide that kind of added value.


Here's what the City of LA spelled out in its recommendation:

The initial CAO report identified potential productivity gains from the collaboration tools, and noted that the service availability of Google was likely to be superior to our current system. We also identified short-term productivity losses from transitioning to a new system and from incompatibility issues between Microsoft Office and Google's applications. It is not possible to accurately predict the magnitude of productivity changes. ITA, however, has estimated that the average productivity gain per City employee would be 10 minutes per week with the transition to Google's system. Using an average annual salary of $71,200 for City employees, ITA has valued that time at $44,509,500 over five years. While increased productivity is a benefit to the City, 10 minutes per week per employee would not lead to hard dollar budgetary savings.
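The memo doesn't show its arithmetic, but the figure can be sanity-checked. Here is a minimal sketch in Python; the 52-week, 2,080-hour work year is my assumption, not something the memo states, and ITA may well have calculated it differently.

```python
# Figures quoted in the City memo.
AVERAGE_SALARY = 71_200            # dollars per employee per year
MINUTES_SAVED_PER_WEEK = 10
TOTAL_VALUE_5_YEARS = 44_509_500   # dollars, per ITA's estimate

# Assumptions of mine, not stated in the memo.
WEEKS_PER_YEAR = 52
PAID_HOURS_PER_YEAR = 40 * WEEKS_PER_YEAR   # a 2,080-hour work year
YEARS = 5

hourly_rate = AVERAGE_SALARY / PAID_HOURS_PER_YEAR
hours_saved_per_year = MINUTES_SAVED_PER_WEEK * WEEKS_PER_YEAR / 60
value_per_employee = hourly_rate * hours_saved_per_year * YEARS

# Work backwards from ITA's total to the headcount it implies.
implied_headcount = TOTAL_VALUE_5_YEARS / value_per_employee

print(f"Value per employee over {YEARS} years: ${value_per_employee:,.2f}")
print(f"Implied number of employees: {implied_headcount:,.0f}")
```

Under those assumptions, ten minutes a week is worth a bit under $1,500 per employee over five years, and ITA's total implies a workforce of roughly 30,000. The big number is simply a small number multiplied by a lot of people, which is consistent with the memo's own caveat that it would not lead to hard-dollar savings.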


Andy concludes by saying "government documents are are [sic] increased risk of being breached." That, unfortunately, is not something that any of us can prove or disprove, since risk is such a squishy calculation in information security. It's true that the COLA memos nowhere detail the current state of security within their IT systems, or whether their security posture or capabilities exceeded Google's. We do have a couple of items, however, that might clue us in to the answer of whether COLA cares about the potential consequences:


  • per section 11 of the attached SOW, COLA will be getting an annual review of Google's implementation of GovCloud
  • Kevin Crawford, assistant general manager of IT, is quoted as saying "We're going to have a more secure system then we have today"
  • COLA got Google to agree to unlimited damages in the event of a data breach


The last bullet is perhaps the most enlightening: unlimited liability is rare in this day and age, and what it says about this particular deal is that COLA was able to transfer a lot of risk to Google. Only time will tell if other municipalities or private enterprises are able to do the same with their outsourced IT providers. You and I certainly don't get that from AWS today.

I have no direct insight into what exactly happened in the board rooms of the City of Los Angeles when their CIO and CISO had a heart-to-heart discussion on whether this deal should go through. Andy, IT Guy, if you're reading, I think it's fair to guess that several people in that room cared about potential consequences.

There. The Internet is correct now. I feel better.

Time to retire ROT-13

Posted by erickru on April 01, 2010

4/1/2010. With yet another piece of critical infrastructure made vulnerable to authentication bypass due to the use of ROT-13, I do believe it's time for the information security community to band together to stamp out this plague. My proposal is simple and borrows from the time-tested tradition of 3DES: we need to deprecate ROT-13 in favor of 3-ROT-13. Software vendors of the world: if you're using ROT-13 today, please heed this call. 3-ROT-13 is a simple, backward-compatible replacement for ROT-13. Just like 3DES applied DES in three consecutive rounds to ciphertext, extending the lifespan of DES beyond its imminent demise in 1999, applying three rounds of ROT-13 can do the same for this venerable cipher. And, unlike DES, there is considerably less performance impact to carrying out additional rounds, as there is no pesky keying necessary in between steps. Let's pledge to make 2010 the year that we clean up this ROT-13 mess and make the world safer for computing. Cisco, you go first. Messrs. Schneier, Rivest, et al., in the words of Craig Ferguson, I await your letters.
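For anyone evaluating the migration path, here is a minimal Python sketch of the proposal. It leans on the standard library's `rot_13` codec; the function names `rot13` and `triple_rot13` are mine, invented purely for illustration.

```python
import codecs


def rot13(text: str) -> str:
    """Apply one round of ROT-13 using the standard library codec."""
    return codecs.encode(text, "rot_13")


def triple_rot13(text: str) -> str:
    """Apply three consecutive rounds of ROT-13 (the hypothetical '3-ROT-13')."""
    for _ in range(3):
        text = rot13(text)
    return text


message = "Attack at dawn"
print(rot13(message))          # one round
print(triple_rot13(message))   # three rounds
print(triple_rot13(message) == rot13(message))   # backward-compatibility check
```

Because two rounds of ROT-13 cancel each other out, the third round lands exactly where the first one did. Backward compatibility is, in other words, total.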

Ray Davies: early HIPS user?

Posted by erickru on July 16, 2009

The Kinks: Tired of Waiting For You

I continue to have a love/hate relationship with Host-based Intrusion Prevention System (HIPS) technology. On the love side, I have a bias towards any security system that remediates new risks without installing new code. In order of preference, optimal vulnerability management ought to start with doing nothing because you're already invulnerable, then move to reconfiguring what you already have so that you're protected, and finally -- as a last resort -- installing some sort of software patch. HIPS, in theory, allows you to hit the "do nothing" sweet spot for vulnerability management by gracefully handling the outcome of an attack (overflowing a buffer, for example) rather than trying to identify all possible variations of the attack itself. The latter has proven incredibly difficult over the years.

On the hate side, there's the waiting: while HIPS makes grand promises of zero-effort protection against zero-day threats, there is an unfortunate and unpleasant waiting game involved. To illustrate, let's look at the latest out-of-cycle (7/13/09) warning from Microsoft: "code execution is remote and may not require any user intervention." Visit the wrong web page with the wrong browser, presto, you're a statistic. Sounds pretty bad. If I'm going down my decision tree of (1) do nothing, (2) reconfigure, or (3) patch, I'd love to know right then, on 7/13/09, if I can stop at step (1) because my HIPS suite is protecting me. Is it? Let's check....

As of this writing, it has been a little over 72 hours since this particular vulnerability became public. Depending on my organization's vulnerability management policy, I may have had to make a decision by now on whether or not I need to take invasive action to protect my enterprise. It may turn out, after a few more days, that my HIPS suite had me covered all along. The link above gets updated periodically with the latest protection data. To my chagrin, I'm finding lately that the decision-enabling information that allows me to choose not to take invasive action to counteract a vulnerability is either absent or confirmed too late to help. HIPS: love it in principle, still waiting for the big payoff in practice.
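The decision tree and the waiting game can be sketched in a few lines of Python. Everything here is hypothetical: the names, the idea that HIPS coverage is reported as yes/no/unknown, and the 72-hour policy deadline used in the example are my assumptions for illustration, not anyone's actual policy.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    DO_NOTHING = "do nothing"
    RECONFIGURE = "reconfigure"
    PATCH = "patch"
    WAIT = "wait for vendor confirmation"


def choose_action(
    hips_covers: Optional[bool],      # None means the vendor has not said yet
    workaround_available: bool,       # is a configuration change possible?
    hours_since_disclosure: float,
    policy_deadline_hours: float,     # hypothetical decision deadline
) -> Action:
    """Walk the (1) do nothing, (2) reconfigure, (3) patch decision tree."""
    if hips_covers is True:
        return Action.DO_NOTHING
    if hips_covers is None and hours_since_disclosure < policy_deadline_hours:
        return Action.WAIT
    # Coverage is denied, or still unconfirmed at the deadline: act anyway.
    if workaround_available:
        return Action.RECONFIGURE
    return Action.PATCH


# 73 hours in, no word from the HIPS vendor, 72-hour deadline already passed.
print(choose_action(None, True, 73, 72))
```

The uncomfortable branch is the one in the middle: once the deadline passes with coverage still unconfirmed, the sketch falls through to invasive action even if the HIPS suite later turns out to have had it covered all along.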

Web application security étude.

Posted by erickru on July 13, 2009

A few weeks back, jullrich posted "My Top 6 Honeytokens" to his web application security blog. I began thinking about how his suggestions could be implemented in mod_security -- not just because I have some weird caveman-ish wiring that leads me to think about technical solutions to other people's problems, but because governance is often a lot more effective if you can speak the language of the governed. In my case, the governed include our intrepid information technology group, the folks who run our web servers and whom I task with defending our systems against attack.

After enabling mod_security and mod_unique_id on my test server, and reading up on prior art (especially this blog post by Ryan Barnett), here is the Apache configuration (httpd.conf, etc.) I wound up creating:


# honeytoken suggestions from https://blogs.sans.org/appsecstreetfighter/2009/06/04/my-top-6-honeytokens/
# the following rules should inherit SecDefaultAction from above

# 2. Add fake admin pages to robots.txt
SecRule REQUEST_FILENAME "^/secretadmin" \
"phase:3,id:10000002,t:none,setenv:modsec_honey=true,pass,log,auditlog,msg:'Honeypot: set cookie for fake robots.txt entry',setsid:%{ENV.UNIQUE_ID},setvar:session.score=+10"

# Fortunately, since I'm writing this for a blog post, I can put blog comments inside httpd.conf comments. More after the jump...
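For readers who don't speak mod_security, the logic of the rule above can be paraphrased in Python. This is only a sketch of the idea, not a translation of what mod_security does internally: the in-memory dictionary stands in for the session collection, and the names `inspect_request` and `session_scores` are hypothetical.

```python
from collections import defaultdict

HONEYTOKEN_PREFIX = "/secretadmin"   # fake admin path advertised in robots.txt
SCORE_INCREMENT = 10                 # mirrors setvar:session.score=+10

# In-memory stand-in for a per-session score store (an assumption of this sketch).
session_scores = defaultdict(int)


def inspect_request(session_id: str, path: str) -> bool:
    """Return True and raise the session's score if the path hits the honeytoken."""
    # The rule's regex "^/secretadmin" is anchored at the start only,
    # so a prefix match is equivalent.
    if path.startswith(HONEYTOKEN_PREFIX):
        session_scores[session_id] += SCORE_INCREMENT
        return True
    return False


print(inspect_request("abc123", "/secretadmin/login"))
print(session_scores["abc123"])
```

No legitimate user has a reason to request a path that exists only as a robots.txt entry, so any hit is worth scoring; what to do once a session's score climbs is a separate decision.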

On lightning strikes.

Posted by erickru on September 13, 2008

Last week, Microsoft announced some changes to its monthly patch notification mechanism. What caught my eye the most was this:

In addition, as part of the company's ongoing effort to improve its guidance for customers, Microsoft announced its new Exploitability Index. Developed based on customer feedback, the Exploitability Index will provide customers with guidance on the likelihood of functional exploits being developed for vulnerabilities addressed by Microsoft security updates. This additional information helps customers better assess their unique risks and better prioritize deployment of the monthly security update.

I certainly commend Microsoft (and Oracle) for taking the bull by the horns with their approach to vulnerability patching. It's a difficult problem to tackle and the element of predictability they have brought to the table helps their customers. Delivering an "exploitability index" is another story.

It's attempting to answer an interesting and difficult question: "will this happen to me?" To make the example more real, let's ask "will lightning strike me?" To answer this question, we can look at some data that can shed real light on the question. The National Weather Service tracks lightning strikes. You can look at a map of the US and immediately tell what the likelihood of being struck by lightning is for your location; if you dig deeper, you can figure it out for time of year, trends in strike density, etc.


map from lightningsafety.noaa.gov


This data is considered reliable because it uses verifiable historical and geographical information about the threat (lightning). Information security, unfortunately, does not conform to the same rules. While Microsoft certainly can make predictions about exploitability of an issue based on their undisputed expert knowledge on the subject, we simply don't have the luxury of looking back on years (or centuries) of data to support those assertions. This is not to knock Microsoft; it's simply that the threat changes so rapidly:


  • It's not the same threat every time: buffer overflows in Windows Metafile Format this month, insufficient randomness in DNS query sequence numbers the next, etc.;
  • "Unknown unknowns": the customer wants to know "will lightning strike me, here and now?" and to answer that, Microsoft has to understand a staggering number of combinations of software (always), hardware (sometimes), and other mitigating controls (occasionally firewalls, device configurations, etc.); just recently, MS08-037 was revised several times after its initial publication in early July as more details about the interaction of the exploit and the fix itself became known; (updated 12 Sep 2008) the September 2008 bulletin was updated four days after release "to add Microsoft Office Project 2002 Service Pack 2, all Office Viewer software for Microsoft Office 2003, and all Office Viewer software for 2007 Microsoft Office System as Affected Software."
  • Exploitability inevitably changes over time. It only takes one lucky attacker to take a vulnerability from theoretical exploit to weaponized worm. ASLR protection in Windows Vista has been considered a great mitigating control against a variety of overflow attacks, yet at Blackhat 2008, a paper was presented showing how ASLR can be bypassed -- this fundamentally changes the "likelihood of exploitation" equation for an entire class of vulnerabilities, both past and future;
  • For most vulnerabilities, the fact that there's a patch pretty much guarantees that someone has (or is working on) an exploit. Microsoft says it themselves:

    Along with the predictability of Microsoft's monthly security update process is the emergence of an undesirable cycle: the release of exploit code, related to those updates, sometimes within hours of release.



As a customer trying to decide whether or not to patch every month, I continue to applaud Microsoft's efforts to give me decision-enabling information. Alas, the Exploitability Index is a piece of trivia that isn't tipping the scales on the decision for me. If I'm at the ol' swimming hole and I see a thunderstorm approaching, I'm going to get out of the water -- not so much because I've been reading the lightning strike density maps for my neck of the woods, but because I know that -if- I get struck, it's going to hurt. Similarly, as I decide whether or not to patch a given vulnerability immediately, I'm going to make that decision on the same factor: if and when an exploit for that vulnerability is released, I want to know which data it's going to hit and how hard. I want to know how much it's going to hurt.

Stay tuned next week when I wax rhapsodic on "The Firestone tire recall of 2000 and why I don't care how long that SQL injection issue has been around."

Slow news day?

Posted by erickru on December 12, 2007

This was the best that my daily vigil of keeping-up-with-the-wacky-world-of-security generated today:



One thing I've learned over the years is that it helps to know your audience (which is why this blog is 90% songwriting, but I digress...). Observation #1 about the audiences of the three publications above: odds are, they've been online long enough to know about the perils of patch management, unencrypted data and botnets. That's been drilled into us from all angles, including the aforementioned trade press. Observation #2: odds are, that audience is largely corporate in nature (i.e. not a lot of weekend computer enthusiasts working their ranches from sun-up to sun-down are glued to computerworld.com in their leisure time). Observation #3: the security story that's really going to scare that particular audience straight is that there's an unpatched SAP vulnerability out there. Cheers to infoworld.com for reporting it; jeers to all three for offering as news things we already knew back in 1998.

I don't think that word means what you think it means.

Posted by erickru on June 12, 2006

eWeek has an enticingly-titled article this week warning us all that "Zero-Day Exploits Abound at Legitimate Web Sites." This caught my eye as I've always been impressed with the folks who are able to stay so far ahead of the curve as to notice when the bad guys are exploiting something new - I figured this might be an interesting article about someone who spoke various foreign languages and trolled through IRC sites galore and bulletin boards a-plenty looking for interesting hints of bugs-to-come (hunting these types of bugs invariably seems to involve knowledge of some non-English dialect). I was initially impressed as I read,

Exploit Prevention Labs said that the zero-day exploits are specifically being used by international cyber-crime rings targeting the operating system and Web browser flaws.

Let's recap. A zero-day exploit is something that hits "the wild" at or before the time the remediation for the exploit hits. Expecting to find out more about just how they uncovered this trove of elusive zero-day exploits, I read on. And then, right there in the next paragraph, they lost me:

In the month of May, the company said that the widely publicized WMF (Windows Metafile) attack, launched in December 2005, remained the top threat zero-day threat on the Web, accounting for roughly 33 percent of all the exploits it detected.

The WMF vulnerability first hit the mainstream around Christmas 2005 with a patch made public on 5 January 2006. When did the WMF bug go from being a zero-day exploit to being a "users-shoulda-patched-by-now" vulnerability? It's all about how you define the meaning of the word "go":

Exploits are a new tool being used by international cyber criminal organizations that take advantage of security vulnerabilities in common software applications such as Windows operating systems and browsers.

That's right. An "exploit" is a new tool being used by mysterious and hard-to-track international men of crime and mystery. How, oh how, will I sleep tonight? Wait, never mind.
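To put dates on the definition above, here is a minimal Python sketch. The patch date comes from the timeline in this post; the two observation dates are hypothetical stand-ins I picked for "around Christmas 2005" and "the month of May", and `is_zero_day` is my own name for the test.

```python
from datetime import date


def is_zero_day(seen_in_wild: date, patch_released: date) -> bool:
    """An exploit is zero-day if it is in the wild at or before the fix ships."""
    return seen_in_wild <= patch_released


WMF_PATCH_RELEASED = date(2006, 1, 5)

december_attack = date(2005, 12, 27)   # hypothetical date, "around Christmas 2005"
may_detection = date(2006, 5, 1)       # hypothetical date within the May report

print(is_zero_day(december_attack, WMF_PATCH_RELEASED))
print(is_zero_day(may_detection, WMF_PATCH_RELEASED))
print((may_detection - WMF_PATCH_RELEASED).days, "days since the patch")
```

By this definition the December attacks qualify and anything detected in May does not: by then the fix had been available for well over a hundred days.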


What's most frustrating about this is that the zero-day problem certainly is real. There are people who, despite being diligent about their patching, have their PCs exploited using attacks to which no countermeasure is generally available. eWeek and XPL certainly aren't the industry's version of the Wall Street Journal or Johnson & Johnson, but the average Joe out there who reads,


Microsoft and other applications vendors require an average of two months, and sometimes up to six months, to develop patches to fix newly discovered vulnerabilities. During this time period, known as "the risk window," Internet users are unprotected against exploits. In December of 2005, for example, the Windows Metafile (WMF) vulnerability was discovered and, within days, cyber-criminals such as the CoolWebSearch gang were distributing drive-by downloads to victims' computers. There even emerged an underground exchange where exploit authors were offering to sell their crimeware code to the highest bidders.

may be drawing the conclusion that Microsoft left them exposed to the WMF issue for months... and that their safety depends on purchasing a product. Granted, Microsoft is not above reproach here, but what's really behind the press releases and the media blitz? How does XPL's SocketShield product protect users against previously-unknown bugs, particularly given the patching challenges listed above? The pitch is supported by additional assertions about where protection cannot take place:

SocketShield provides a critical layer of security that complements the defenses provided by traditional security solutions. Firewalls cannot stop exploits because exploits enter through the trusted communications stream of the user’s browser connection. Anti-virus and anti-spyware applications can’t protect against exploits because they must wait for the malware code to hit the hard disk in order to detect it, and by that time most exploits have already executed their payload. Patch management systems can’t distribute a patch until the application vendor releases it. And patching as a general practice, while critical, often fails because it relies on users taking action of their own volition.

Firewalls? No good. Patching? Not enough. AntiVirus? Nice try, but the exploit's already on your hard drive (but has it executed?!). How will my precious browsing remain protected? For that, let's go to the literature once again:

With SocketShield, Thompson and his team have developed the industry's first zero-day exploit blocker. It does this by monitoring the browser's communications stream and stopping known exploits from getting past the browser. The technology is powered by Exploit Prevention Labs' patent-pending Intelligence Network, which brings together a unique combination of research techniques:

  • Exploit Intelligence is an extended network of human researchers and automated probes, honeypots and search bots focused on discovering new vulnerabilities and exploit examples
  • The Reputation Filter creates an intelligent filter for known and suspected exploit distribution sites.
  • Community Intelligence is a community of SocketShield users who allow information about attempted exploitation of their computers to be transferred to Exploit Prevention Labs


The SocketShield Correlation Engine aggregates intelligence gained through this research, assembles it in real time, and distributes it transparently to SocketShield users, providing exploit-specific protection in minutes.

Now we have the rub: it's an application that monitors all of your browsing activity, sends some information (hopefully anonymized and with your consent) about what attacks your browser is encountering to a central repository, where data analysis is performed to quickly identify emergent threats and send updates in reaction faster than AV vendors might. Don't get me wrong - as a method, this certainly has merit (distributed computing applied to zero-day vulnerability research) and is an interesting approach. But I'm a luddite - I'll stick with my personal firewall, my WindowsUpdate, and my Firefox for now, thanks.

The computer knows.... OR DOES IT?!!?

Posted by erickru on May 27, 2006

Back when I had a German car, it had one immensely useless indicator: the "check engine" light. When lit, it could indicate all sorts of critical problems, such as a loose gas cap, a snapped timing belt, a busted brake light... basically anything except a failure of something associated with the car's engine. Adjacent to that light was an indicator that would tell me whether the car wanted an oil change or, occasionally, a more expensive, full-serve trip to the dealer. It always bugged me that the car had ideas about its service needs that might surpass or supersede my own on the subject. At the end of the day, of course, I had the last word: it could beg and plead, but if I was going to go 8000 miles without changing the oil, that was that. (Private note to the guy who bought the car: I'm kidding of course - 8000 is just a random, foolish number I selected to illustrate a point. Wink wink.)

I've been playing around with pandora.com lately, basically using my idol worship page as the seed for my personal station. One of the neat features of Pandora is that, not only will it select songs for you based on its database of song characteristics, but it allows the user to query its AI to find out -why- it gave me Hootie and the Blowfish alongside Warren Zevon and Cake. Here's what it said:

Having used this feature a couple more times and come up with a few more reasons like

I began to wonder: do I really love the major key tonality that much? My conclusion: naaah. I'm not so predictable. I mean, I'm sure that if Goooooogle had been generating recommendations, they would have realized that I associate minor keys with important songs by a mere cross-reference of their vast document index. Note to Sergey and Larry: it's my idea, but do call me (you have the number) if you want to use it. We'll do lunch.

I also decided, upon further consideration, that manually adding Art Brut's "My Little Brother" (which seriously kicked my ass after hearing it on NPR a couple weeks ago) to my "station" may not have been the best strategy. I was given this nifty explanation:

Note to the pandora.com folks: it might have made more sense if you'd just said "because you added it yourself, fool." As it stands, I'm clicking through my playlist desperately trying to find a song with folk influences AND extensive vamping. Another note to the pandora folks: just plain vamping will not do. Twice through the chorus just isn't extensive enough. Perhaps twenty-seven (see: I Never by the aforementioned Rilo Kiley) will qualify.

Lies, damned lies, and statistics again.

Posted by erickru on April 26, 2006  •  Leave comment (0)

I'll confess up front: I didn't do very well in my stats class in college. I was alright on the basics, but at the point where we were picking multi-colored marbles out of a bag full of pennies, marbles, and puppies, I was struggling. Why the preface? Because I'm feeling compelled to reply to and/or otherwise dispel bad advice coming from supposedly reputable sources out there. In this case, it's this post on trimMail's weblog, which I would have ignored had it not come from an otherwise reputable vendor and been picked up by reddit.


The basic premise of the post is this:


What does this mean to the average sender? Many network managers set their email serving software to re-try sending four times, at increasing intervals. For example, after the first attempt, you may try again in 10 minutes. If that attempt fails, try again an hour later... again in 4 hours ... and last attempt, 10 hours after that.

If your chances of delivery are less than 60% each time you try, you may fail to get messages delivered at all. In the meantime, as ever more undelivered mail piles up in its delivery queue, your server will waste more and more storage, bandwidth and cpu cycles trying to ship the messages. Bad karma.

This is where my remedial statistics background (and a gut feel for how often Yahoo actually drops delivery of an email message) comes in. If the probability of success for an endeavor is P, then the probability of failure is 1-P. If you try the same thing twice, and the attempts are independent, then the probability of failing both times is (1-P)*(1-P). In other words, if the success rate is 60% every time you try, then the failure rate is 40% every time you try. If you try once, you'll fail 40% of the time; if you try a second time, you'll fail 40% of THOSE times, giving you a compound 16% likelihood of failure. By the fourth attempt (assuming you'd configured your sendmail the way the author of the article had described), your likelihood of failure would be 2.56%.
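For anyone who wants to check my arithmetic, here is a minimal Python sketch of that compounding. The function name is my own invention, and it assumes every retry is independent and has the same 60% success rate, which is the article's premise rather than anything I've measured.

```python
def failure_after_retries(p_success: float, attempts: int) -> float:
    """Chance that every one of `attempts` independent tries fails.

    Assumes each try succeeds with the same probability `p_success`
    and that the tries do not influence one another.
    """
    return (1.0 - p_success) ** attempts


# Walk through the four delivery attempts described in the article.
for n in range(1, 5):
    chance = failure_after_retries(0.60, n)
    print(f"{n} attempt(s): {chance:.2%} chance that nothing got through")
```

The last line printed is the 2.56% figure above: four failures in a row at 40% apiece.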

Of course, this is where understanding SMTP comes in. If we take even the strictest interpretation of RFC2821, this chapter specifically:


Once an SMTP client lexically identifies a domain to which mail will
be delivered for processing (as described in sections 3.6 and 3.7), a
DNS lookup MUST be performed to resolve the domain name [22]. The
names are expected to be fully-qualified domain names (FQDNs):
mechanisms for inferring FQDNs from partial names or local aliases
are outside of this specification and, due to a history of problems,
are generally discouraged. The lookup first attempts to locate an MX
record associated with the name. If a CNAME record is found instead,
the resulting name is processed as if it were the initial name. If
no MX records are found, but an A RR is found, the A RR is treated as
if it was associated with an implicit MX RR, with a preference of 0,
pointing to that host. If one or more MX RRs are found for a given
name, SMTP systems MUST NOT utilize any A RRs associated with that
name unless they are located using the MX RRs; the "implicit MX" rule
above applies only if there are no MX records present. If MX records
are present, but none of them are usable, this situation MUST be
reported as an error.

When the lookup succeeds, the mapping can result in a list of
alternative delivery addresses rather than a single address, because
of multiple MX records, multihoming, or both. To provide reliable
mail transmission, the SMTP client MUST be able to try (and retry)
each of the relevant addresses in this list in order, until a
delivery attempt succeeds. However, there MAY also be a configurable
limit on the number of alternate addresses that can be tried. In any
case, the SMTP client SHOULD try at least two addresses.

we see the basic premise that multiple MX records are the Internet's way of providing failover for SMTP without resorting to fancy load balancers, etc. So the likelihood of failure for any one SMTP delivery "pass" isn't really 40% (based on the average availability of each MX) as the article implies. If we trust their availability statistics for mx1.mail.yahoo.com, mx2.mail.yahoo.com, and mx3.mail.yahoo.com, then we see that the likelihood of failing to find an available mail exchanger is actually closer to (1-.63)*(1-.75)*(1-.41), or about 0.05 (let's throw out the priority-5 MX, which would make the odds of failure lower still). With a 5% failure rate on each pass, the odds of failing on four consecutive attempts are now predicted to be 0.000625%.
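The same back-of-the-envelope math, as a hedged Python sketch. The availability figures are the three quoted above; the variable and function names are mine, and the whole thing assumes the mail exchangers fail independently of one another and that each retry pass is independent of the last, which real outages may not respect.

```python
from math import prod

# Availability figures quoted above for the three Yahoo mail exchangers
# (the priority-5 MX is left out, as in the text).
mx_availability = [0.63, 0.75, 0.41]


def pass_failure(availabilities):
    """Chance that a single delivery pass finds no reachable MX.

    Assumes each host is up or down independently of the others.
    """
    return prod(1.0 - a for a in availabilities)


per_pass = pass_failure(mx_availability)  # unrounded, a bit over 5%
four_passes = per_pass ** 4               # all four retry passes fail

print(f"failure per pass:    {per_pass:.4f}")
print(f"failure in 4 passes: {four_passes:.8f}")
```

The unrounded per-pass figure is a touch above the 5% I used, so the four-pass number comes out slightly higher than 0.000625%, but it stays in the same "not worth losing sleep over" territory.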

I'm no huge fan of Yahoo's mail service, but something about trimMail's post triggered some sort of Righteous Indignation reflex that compelled me to post a defense/rebuttal. I know, I know, I'll stick to songwriting - something at which I'm marginally better than statistics.