Multiple testing

One of the potential pitfalls that arises now that it's easier and easier to test hundreds of variables to try to find correlations is the problem of multiple comparisons, or multiple testing.

The term "comparisons" in multiple comparisons typically refers to comparisons of two groups, such as a treatment group and a control group. "Multiple comparisons" arise when a statistical analysis encompasses a number of formal comparisons, with the presumption that attention will focus on the strongest differences among all comparisons that are made. Failure to compensate for multiple comparisons can have important real-world consequences, as illustrated by the following examples.

  • Suppose the treatment is a new way of teaching writing to students, and the control is the standard way of teaching writing. Students in the two groups can be compared in terms of grammar, spelling, organization, content, and so on. As more attributes are compared, it becomes more likely that the treatment and control groups will appear to differ on at least one attribute by random chance alone.
  • Suppose we consider the efficacy of a drug in terms of the reduction of any one of a number of disease symptoms. As more symptoms are considered, it becomes more likely that the drug will appear to be an improvement over existing drugs in terms of at least one symptom.
  • Suppose we consider the safety of a drug in terms of the occurrences of different types of side effects. As more types of side effects are considered, it becomes more likely that the new drug will appear to be less safe than existing drugs in terms of at least one side effect.

In all three examples, as the number of comparisons increases, it becomes more likely that the groups being compared will appear to differ in terms of at least one attribute. Our confidence that a result will generalize to independent data should generally be weaker if it is observed as part of an analysis that involves multiple comparisons, rather than an analysis that involves only a single comparison.

For example, if one test is performed at the 5% level, there is only a 5% chance of incorrectly rejecting the null hypothesis if the null hypothesis is true. However, for 100 tests where all null hypotheses are true, the expected number of incorrect rejections is 5. If the tests are independent, the probability of at least one incorrect rejection is 99.4%. These errors are called false positives or Type I errors.
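The arithmetic in that paragraph can be checked in a few lines. This is only a minimal sketch: the function names are my own, and the second function assumes the tests are independent, as the paragraph does.

```python
def expected_false_positives(num_tests: int, alpha: float) -> float:
    """Expected number of incorrect rejections when every null hypothesis is true."""
    return num_tests * alpha


def prob_at_least_one_false_positive(num_tests: int, alpha: float) -> float:
    """Chance of one or more incorrect rejections, assuming independent tests.

    Each test avoids a false positive with probability (1 - alpha), so all
    of them avoid one with probability (1 - alpha) ** num_tests.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    m, alpha = 100, 0.05
    print(f"Expected false positives: {expected_false_positives(m, alpha):.1f}")
    print(f"P(at least one false positive): {prob_at_least_one_false_positive(m, alpha):.1%}")
```

With 100 tests at the 5% level this gives the figures quoted above: 5 expected false positives and a 99.4% chance of at least one.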

A recent NBER paper argues that this problem invalidates most finance papers claiming to have found some formula for investing success. The abstract:

Hundreds of papers and hundreds of factors attempt to explain the cross-section of expected returns. Given this extensive data mining, it does not make any economic or statistical sense to use the usual significance criteria for a newly discovered factor, e.g., a t-ratio greater than 2.0. However, what hurdle should be used for current research? Our paper introduces a multiple testing framework and provides a time series of historical significance cutoffs from the first empirical tests in 1967 to today. Our new method allows for correlation among the tests as well as missing data. We also project forward 20 years assuming the rate of factor production remains similar to the experience of the last few years. The estimation of our model suggests that a newly discovered factor needs to clear a much higher hurdle, with a t-ratio greater than 3.0. Echoing a recent disturbing conclusion in the medical literature, we argue that most claimed research findings in financial economics are likely false.
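To get a feel for how much stricter a t-ratio hurdle of 3.0 is than 2.0, here is a rough sketch that converts a t-ratio into a two-sided p-value. It assumes a large sample, so the t-ratio is treated as approximately standard normal. That approximation and the function name are mine, not the paper's.

```python
import math


def two_sided_p_value(t_ratio: float) -> float:
    """Two-sided p-value for a t-ratio under a standard normal approximation.

    For a standard normal Z, P(|Z| > t) equals erfc(t / sqrt(2)).
    """
    return math.erfc(abs(t_ratio) / math.sqrt(2.0))


if __name__ == "__main__":
    for t in (2.0, 3.0):
        print(f"t-ratio {t:.1f} -> two-sided p-value of about {two_sided_p_value(t):.4f}")
```

Under that approximation, a t-ratio of 2.0 corresponds to a p-value of roughly 0.046, while 3.0 corresponds to roughly 0.003.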

Gaze deeply enough into the noise and you'll see some pattern.

[via Vox]

RELATED: Spurious correlations

Robots take all the jobs (composer edition)

Xhail is a new service that offers a unique, custom score for your movie.

Here's the rub: the score is written by software, using real instrument stems. Instead of talking to a composer about what you want, you simply type in keywords like “fantasy” or “melancholy” and the software returns a score which you can customize using the interface provided. Add instruments, take out sections, add percussive emphasis at key timecode to match action on screen. The demo video gives a good sense of how it works.

Lots of details are still missing, like how much does it cost? Still, it's an impressive demo. The track composed for the fantasy short at the end of the demo video and the interface for modifying the video both were much better than I expected. You'd expect nothing less from a scripted demo video, and we'll have to wait for a public release to see if it's all that, but I'm intrigued.

I suspect many will rush to dismiss this service, especially my friends in the filmmaking world, just as people tend to do with any computer-generated art, but some of that, as always, comes from either a general technophobia or reverence for human creation.

If you can afford a real composer, this isn't a service targeted at you. Facetious title of my post aside, I suspect this is less a case of replacing our existing composer supply than adding supply at the low end of the market.

That Lebron ad

Nike just released a black and white spot titled Together, starring Lebron James. Many people forwarded it to me, and it was posted a lot in my Facebook and Twitter streams, to almost universal adoration. In it, thousands of citizens of Cleveland join Lebron and his teammates in the pre-game huddle, infusing their team's upcoming season with an almost spiritual civic calling.

Maybe I'm a cold-hearted cynic, but it struck me as simply the slickest of propaganda, a bit like the recent Derek Jeter Gatorade ad. The Jeter ad was also grandiose, black and white footage showing him mixing with the people of New York on his way to the stadium, but at least it was a retrospective, and Yankees fans mythologize him in a way that probably makes it feel as if he's a man of the people, someone who belongs to them, even if he isn't, not any more than most celebrity stars (the most honest part of the ad was when some bar owner says to Jeter “We've been waiting for you to come into here since 98 at least” and Jeter retorts, “You never invited me.”).

It's understandable; Nike and Lebron have been trying to couch his entire return to Cleveland as motivated purely by his loyalty to his home city. It began with the letter he wrote announcing his re-signing with the Cavaliers (emphasis mine):

Before anyone ever cared where I would play basketball, I was a kid from Northeast Ohio. It’s where I walked. It’s where I ran. It’s where I cried. It’s where I bled. It holds a special place in my heart. People there have seen me grow up. I sometimes feel like I’m their son. Their passion can be overwhelming. But it drives me. I want to give them hope when I can. I want to inspire them when I can. My relationship with Northeast Ohio is bigger than basketball. I didn’t realize that four years ago. I do now.

...

When I left Cleveland, I was on a mission. I was seeking championships, and we won two. But Miami already knew that feeling. Our city hasn’t had that feeling in a long, long, long time. My goal is still to win as many titles as possible, no question. But what’s most important for me is bringing one trophy back to Northeast Ohio.

I always believed that I’d return to Cleveland and finish my career there. I just didn’t know when. After the season, free agency wasn’t even a thought. But I have two boys and my wife, Savannah, is pregnant with a girl. I started thinking about what it would be like to raise my family in my hometown. I looked at other teams, but I wasn’t going to leave Miami for anywhere except Cleveland. The more time passed, the more it felt right. This is what makes me happy.

...

But this is not about the roster or the organization. I feel my calling here goes above basketball. I have a responsibility to lead, in more ways than one, and I take that very seriously. My presence can make a difference in Miami, but I think it can mean more where I’m from. I want kids in Northeast Ohio, like the hundreds of Akron third-graders I sponsor through my foundation, to realize that there’s no better place to grow up. Maybe some of them will come home after college and start a family or open a business. That would make me smile. Our community, which has struggled so much, needs all the talent it can get.

Really? Lebron always knew he was going to go back to Cleveland? He wants to lift up the local economy?

Would Lebron have gone back to Cleveland if they didn't have Kyrie Irving and back-to-back first overall draft picks in Anthony Bennett and Andrew Wiggins, who they parlayed into Kevin Love? Let's look back at the two previous times Lebron has been on rosters in decline and see what he did. He left both times.

I'm sure going back home was one checkmark in the plus column for him, and given fans were burning his jersey the last time he left, I don't blame Nike and Lebron for trying to flip the narrative on his return to try to win back the Cleveland fans. I didn't watch his first home game last night against the Knicks, but I saw a highlight of him in the tunnel with his teammates, gathering them in a huddle, telling them this was one of the most important games in history.

Why would it be one of the most important games in history? There's only one reason, because he was playing back in Cleveland. Either he buys into this vision of him as an economic messiah for Ohio, or he's playing the part as scripted in the Nike ad, but either way it's a comical level of self-importance.

No argument here, he's been the best player in the NBA for several years, he's one of the all-time greats. I also don't think players owe their teams a lifetime commitment of employment, especially since they don't have much choice in who drafts them. I don't hold it against players when they sign where they can get the most money, it's the same thing most people in any profession would do. Frankly, a player like Lebron is grossly underpaid, as are star young athletes in most sports given artificial salary restrictions in the years after they're drafted up until they become free agents.

This is also Nike's brand, they're famous for trying to transform a mundane pair of sneakers into a golden ticket into the community of elite go-getters. Growing up in Chicago a Bulls fan, I can recall many of Michael Jordan's Nike ads by heart. But almost all of them were centered around his mastery of the craft of basketball, and the transference of that skill to his shoes in the typical halo of excellence that all brands dream of for their products. They didn't show Jordan hanging out with people on the streets of Chicago, they didn't pretend he was accessible or normal in any way. From all accounts, he's a competitive sociopath, and the greatest basketball player of all time, and lots of people idolized him. His Nike spots never tried to deny that, they worked with it.

An honest reading of Lebron's return to Cleveland is that it ended up meshing with his own self-interest in being with a roster on the upswing, with two young All-Star talents and room to maneuver at the trade deadline. That wouldn't make for a great commercial, though, and so we post his new spot to social media with captions like “Chills!” I can't even blame the fans of Cleveland if they've bought into this new narrative. It's more fun than holding a grudge.

And hey, cool kicks.

Human and computer curation in the age of information abundance

I don't usually talk shop here, but I wanted to spend a few words discussing some of the key strengths of the new version of Flipboard we just shipped, v3.0 on the odometer. Though I've only been at the company just over a year and a half, Flipboard has been around since just after the first iPad shipped. In many ways, it's the first iPad app I can remember because it was the first native iPad app, the first app that didn't seem like a port of an iPhone app. It launched at first just for the iPad, and it wasn't until later that a version of Flipboard for the phone shipped.

Because of its long history, many of my readers may have played with Flipboard at some point in the past and forgotten about it. Many of my readers or Twitter followers are what I consider to be among the top 1% of voracious information consumers, and for them, high-density information presentation, typically an RSS reader with hundreds of RSS feeds, or Twitter, with its never-ending supply of links and headlines and thoughts, is the weapon of choice. I can understand that, and I use both as well.

But both of those options offer a certain inherent level of noise because of how they work. On Twitter it's because people you follow have lots of interests, some of which match yours, some of which don't. Sometimes they point you at things outside your sphere of interests, but once you're following several hundred people, as I am, the multitude of noise can pile up.

The RSS feed junkie is sort of the predecessor to the Twitter information junkie, and I know people who subscribe to literally hundreds of RSS feeds in a feed reader and try to keep up with the flood of headlines each day. The problem is the same there; it's almost inevitable that a list of that many information sources will inject a lot of perceived junk into your mental diet.

Though it doesn't feel like it to the top 1% of information addicts, cultivating a list of hundreds of people to follow or hundreds of RSS feeds is exhausting and tedious for the 99%. Keeping up is its own burden; I long ago gave up trying to read every tweet in my Timeline, or every headline from every RSS feed I follow.

At the same time, FOMO is a very real phenomenon not just in the real world but in the information space, and it's a natural outgrowth of the internet, the most efficient delivery mechanism of data that has ever been invented in the history of mankind. Effectively, there is a near infinite amount of information out there, and more is being generated each day than one person could read in a lifetime. Supply is no longer the problem; oversupply is.

Given that we all have a finite amount of attention to give each day, how do we allocate it most efficiently? Flipboard 3.0 is a different strategy for answering that problem, and it's different not only from the other apps/services outlined above but even from past versions of itself. For that reason, it has something to offer both the power information consumer and the casual information grazer.

The new Flipboard, designed specifically for phones and built in part off of the great technology acquired with our purchase of Zite, centers around a person's interests. The combination of any person's interests forms a sort of intellectual fingerprint, and that is the North star for personalization in the app. You might find it challenging to curate a list of all the experts in a field across their blogs, websites, research papers, Twitter and Facebook accounts, etc. Multiply that across all of your interests, and the task of following the right sources and people grows by leaps and bounds. I follow nearly 900 people on Twitter and I'm still adding and deleting people all the time.

The new Flipboard simply asks for your interests, a very finite and manageable list, and then uses that to find the best articles for you on those topics, wherever those might reside, and regardless of whether you know or follow the authors or blogs. We offer over 30,000 such topics to choose from, and if you have no idea how to know if that's a good number, I can assure you it gets deep into the long tail. And it's growing all the time; when we started work on this release, there was no topic in our database called “Peter Piot”. We hadn't seen enough recent articles of note about Piot. And then a two-year-old boy died in West Africa, and the world changed. Peter Piot helped discover Ebola. New topics arise every day; a few weeks ago, we added “Apple Pay”, and soon we may have to add CurrentC, though maybe that will disappear before it has a chance to become a thing.

It's in the discovery of content on these tens of thousands of topics where computers and algorithms have a huge advantage over humans. You might think of Flipboard as an intelligent agent, an AI that can read and screen millions of new articles written each day, something no human can come close to doing. This intelligent agent knows what you're interested in and understands when it finds an article on that topic, and it can do that for millions of users on tens of thousands of topics across millions of articles every day.

However, to say humans have been rendered obsolete in yet another field couldn't be further from the truth. It's of little use if your digital reader brings back every article on every topic you care about. That's still an information deluge. Remember, the goal here is efficient allocation of your limited attention. Your intelligent agent also needs to separate the signal from the noise.

This is where humans remain in play. It turns out computers are good at determining the topic of an article but not the quality. To do that, we turn to the aggregate reading and curation activity of millions of readers. The advantage of some scale in this space is that there are enough eyeballs staring at enough items of interest that you need only follow the collective gaze to find what's interesting. This is especially true with content that isn't based on text, like photographs, where computer vision is still fairly crude in comparison to human vision when it comes to understanding both subject matter and aesthetic value. Ask any social service of scale if they'd trust a computer to moderate content for pornography and other disturbing imagery; well over 100,000 workers worldwide filtering the filthy and the horrific from our social streams are proof they don't.

[By the way, even with textual content, computers aren't perfect. A topic like “magic” is a good example. Computers will bring back content on the Orlando Magic, card tricks, and articles with headlines referring to Roger Federer as a magician with a tennis racket.]

Just as the best chess players in the world are not grandmasters or top computer programs but some combination of human player with a chess program, the best curation is still some combination of human judgment and a variety of computer algorithms. Someday perhaps there will be some digital general intelligence of such power that human taste is superfluous, but today is not that day.

Even all of this is not enough, though. It turns out that if you give every person exactly all these interesting articles on all the topics they tell you they want to see, they get bored. People are always chasing after the unexpected delight, it gives them a mental rush to stumble across a fascinating article they wouldn't have expected us to realize they'd like. This is the serendipity that is habit-forming.

Some argue that the serendipity can consist of just noise. That presumes people enjoy sifting through the irrelevant to find the gold dust. I think some people enjoy the hunt, but many are not so persistent, so our goal is for even the serendipity in Flipboard to be of interest. There are many approaches to achieving this, and I won't delve into the technology behind it all, it's a lot of math. All that matters for readers is that it works, because when it does it feels like magic.

One last thing: a person's interests evolve over time. You may choose a bunch of topics in your initial setup, but even if you don't alter that list again, it's possible to tell, based on your reading behavior, when your tastes, or what we call your affinities, change. The more you read and like and curate on Flipboard, the more your Flipboard will start to feel like a pair of raw denim jeans that break in and mold to your legs, or like a sportcoat or dress that's being tailored for you a bit each day.

My recommendation for both folks who may have used Flipboard before but haven't used it in a while and for loyalists: update to the new app, version 3.0 for those who still care about such designations, and follow a bunch of topics, and then follow even more, the more specific the better. At last count, I was following 90 topics on Flipboard, and that number has been growing by a few each week I've been playing and testing the app.

Here are just a few topics I'm following to give you a sense of the breadth of our database:

  • Joe Maddon - soon to be announced as the new Cubs manager, I hope.
  • Nikon D750 - I haven't upgraded my Nikon digital SLR for several generations, but I changed my mind for this baby. Finally, Nikon adds wi-fi so I can get a photo from my SLR to my phone quickly. Why they didn't add this years ago still boggles my mind.
  • Peter Thiel and Marc Andreessen - interesting thinkers in the technology world
  • Paul Thomas Anderson
  • Driverless Cars
  • GIF Animations - we should probably just rename this to Animated GIFs, but maybe this is how they're referred to in high company.
  • Augmented Reality
  • E-learning
  • David Foster Wallace
  • Information Design
  • Cinematography
  • Roger Deakins - speaking of cinematography...Deakins should've won an Oscar in 2008 for The Assassination of Jesse James by the Coward Robert Ford. Got a double nomination in that category that year! The other was for No Country for Old Men
  • Neuroscience
  • Cycling
  • Sports Analytics
  • Technological Singularity
  • Optical Illusions
  • Memes
  • Sneakers
  • Tesla
  • Elon Musk - the entrepreneur whose courage and bravado are the envy of the Valley.
  • Chicago Cubs - finally, a light at the end of the tunnel. Fellow Cubs fans, it's about to get good.
  • Interior Design - I finally bought a condo in June, I'm still in the midst of renovation and decoration hell, someone tell me it gets better.

One thing I've often felt in trying lots of services like Nuzzel and Quibb on top of Twitter and RSS Readers is seeing the same set of articles referred to me several times. I'll see it first appear on Twitter, then a short while later on services like Nuzzel or Quibb, and then a day later in emails of top tweets from Twitter, and then even later on some blogs I follow.

What was always refreshing about throwing Zite and now Flipboard into the mix was the sense of discovering things I hadn't already seen many times. It has taken a lot of tuning and testing to dial that in, and it's an ongoing project, but when it works it feels like magic, and it gives me the feeling of being ahead of the herd in finding things to link to on my blog or to post on Twitter.

Okay, I've stepped off my work soapbox. I'm biased, of course, so don't take my word for it, listen to Farhad Manjoo at the NYTimes. Or if you don't want to listen to Farhad, take the word of Jennifer Garner, who revealed her favorite app is Zite, whose core technology now powers key pieces of the new Flipboard.

While Garner may not have a big social media presence, she did reveal her favorite app. "Zite," she said. "It's like a magazine and it sends you all your favorite things. So mine are West Virginia, world news, kids, parenting, relationships, healthy living, food and cooking…It just curates exactly what I want to read."

She is definitely way more beautiful than I am (I'm not sure about Farhad, who I've never met), and with kids to raise and a Hollywood career, she's also busier and has a higher opportunity cost for her time, so if we can earn her trust in allocating her minimal free time, maybe there's something there.