Data mining algorithms in plain English

Maybe not interesting if you're a data mining guru, but this explanation of the top 10 most influential data mining algorithms in plain English is a good read for the rest of us, though “plain English” is perhaps debatable.

Here's a good one, on k-means:

You might be wondering:
 
Given this set of vectors, how do we cluster together patients that have similar age, pulse, blood pressure, etc?
 
Want to know the best part?
 
You tell k-means how many clusters you want. K-means takes care of the rest.
 
How does k-means take care of the rest? k-means has lots of variations to optimize for certain types of data.
 
At a high level, they all do something like this:
  1. k-means picks points in multi-dimensional space to represent each of the k clusters. These are called centroids.
  2. Every patient will be closest to 1 of these k centroids. They hopefully won’t all be closest to the same one, so they’ll form a cluster around their nearest centroid.
  3. What we have are k clusters, and each patient is now a member of a cluster.
  4. k-means then finds the center for each of the k clusters based on its cluster members (yep, using the patient vectors!).
  5. This center becomes the new centroid for the cluster.
  6. Since the centroid is in a different place now, patients might now be closer to other centroids. In other words, they may change cluster membership.
  7. Steps 2-6 are repeated until the centroids no longer change, and the cluster memberships stabilize. This is called convergence.
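
For the curious, the seven steps above can be sketched in a few lines of Python. This is only a rough illustration and not the quoted article's code. The function names (kmeans, sq_dist), the patient numbers, the choice to start from k randomly sampled patients, and the squared Euclidean distance are all my assumptions, and real implementations pick starting centroids more carefully.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster points (tuples of numbers) into k groups.

    Returns (centroids, labels), where labels[i] is the cluster index
    assigned to points[i].
    """
    rng = random.Random(seed)
    # Step 1: pick k starting centroids (assumption: k of the input points).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Steps 2-3: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Step 7: stop once cluster memberships no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Steps 4-5: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
        # Step 6 happens on the next pass: points are re-assigned
        # against the moved centroids.
    return centroids, labels


# Made-up patient vectors: (age, pulse, systolic blood pressure).
patients = [
    (25, 60, 110), (27, 62, 112), (26, 61, 111),
    (70, 85, 150), (72, 88, 155), (71, 86, 152),
]
centroids, labels = kmeans(patients, k=2)
```

On this toy data the three younger patients should end up sharing one label and the three older patients the other. Which group gets called cluster 0 depends on the random starting centroids.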
     

This seems like a great idea for a book: the central data algorithms of the third industrial revolution, this networked, online age. One chapter per algorithm, with a discussion of how it manifests itself on the key websites, applications, hardware, and other services we use all the time now. If you are a data mining expert in need of someone to be the “plain English” side of a writing team, call me maybe.

The new coastal culture wars

At Amazon.com, all the irritation and wasted time of a shopping expedition are gone—the search for a parking place, the surly floor clerk, the sold-out items, the perversely slow person ahead of you at checkout. You don’t have to think about how much the cashier, with her wrist in a splint, makes per hour. The Internet’s invisibility shields Amazon from some of the criticism directed at its archrival Walmart, with its all-too-human superstores. Online commerce allows even conscientious consumers to forget that other people are involved.
 

Emphasis mine, in this passage from George Packer's article on Amazon vs the book publishing industry, from the February 17, 2014 issue of The New Yorker. The piece was titled “Cheap Words” and the subhead read “Amazon is good for customers. But is it good for books?”

Granted, it's tough to represent the pro-Amazon position when so few people will speak on the record or comment on the piece, but I will say I've read enough pieces on the tech industry from what you might call East Coast institutions to detect some coastal cultural bias in each direction. As software eats the world and cultural influence shifts towards the West Coast, it's not surprising for the liberal elite of, say, Manhattan to turn a nose up at the hoodie-wearing, ping-pong-playing nouveau riche of Silicon Valley.

Packer has written a lot about Amazon in its ongoing battle with publishers, but he's not the only writer I've detected some of this tone in. His example stood out to me, however, because of the New Yorker's typically neutral tone. Here's a straw man of a cashier, or straw woman, as it were, and her wrist is in a splint. Why not just end her arm at the elbow, in a stub, like Charlize Theron's Imperator Furiosa?

Silicon Valley (Amazon included) has enough real problems that need addressing, and they shouldn't be obscured by conjuring false bogeymen. That so many have cast book publishers as some sympathetic white hat in this story is one of the more absurd developments in recent media history.

Slow innovation

But this system leaves a category of innovation stranded: new ideas based on new science. Self-fertilizing plants. Bacteria that can synthesize biofuels. Safe nuclear energy technology. Affordable desalination at scale. It takes time for new-science technologies to make the journey from lab to market, often including time to invent new manufacturing processes. It may take 10 years, which is longer than most venture capitalists can wait. The result? As a nation, we leave a lot of innovation ketchup in the bottle.
 
This is a relatively new situation. From the 1960s through the early 1990s, society’s investments in education and research produced smart people and brilliant ideas, and then big companies with big internal R&D operations would hire those people, develop those ideas and deliver them to the marketplace. When I joined MIT’s electrical engineering faculty in 1980, that model was working extremely well, translating discoveries from university labs across the country into innovations such as silicon-on-insulator technology (IBM) and strained silicon (Intel) — two advances indispensable to delivering on the promise of Moore’s Law, which since the ’60s has enabled the rapid advance of computing power.
 
In the past two decades, and especially the past five years, the United States has undergone a profound shift in how it develops, adopts and capitalizes on innovation. Today, our highly optimized, venture-capital-driven innovation system is simply not structured to support complex, slower-growing concepts that could end up being hugely significant — the kind that might lead to disruptive solutions to existential challenges in sustainable energy, water and food security, and health.
 

From a call for the U.S. to come up with more ways to incubate and invest in slower forms of innovation, the types based on new science, which typically have longer gestation periods from conception to payoff.

A lot of green tech seems to fall into this category, requiring tons of capital and decade-long time horizons, something most VCs aren't set up to handle.

Bitcoin feels a bit like something one could lump into this slow innovation category, minus the massive capital requirements of most green tech, though perhaps its slowness has less to do with scientific uncertainty and more to do with cultural and social inertia.

Chiraq

Spike Lee is working on a movie about the violence on the South Side of Chicago, and many in my hometown aren't happy with the working title Chiraq, a mash-up of Chicago and Iraq. Supposedly Rahm Emanuel met with Lee to express his disapproval, but so far Lee is standing by it.

No details about the film, which may be a musical comedy based off of the Greek comedy “Lysistrata” but will not feature Kanye, were disclosed Thursday, but Lee reiterated the importance of it given a recent spate of shootings across the Chicago area, notably in the Englewood area.
 

Maybe a musical comedy based off Lysistrata? Hmm.

Wisdom of the Kickstarter crowd?

I didn't realize this, but “Kickstarter now raises more money for artistic projects each year than the National Endowment for the Arts (NEA).” In light of that, an HBS professor decided to study whether the NEA and people on Kickstarter differ in how they select which projects to fund.

"First, it's important to consider that there's a bit of an art to raising money from the crowd," Nanda says. "Sometimes the judges liked projects for which the artists hadn't quite figured that part out. That said, most of the disagreements were on projects that the crowd liked but that the judges would potentially have given less money to or not have funded at all. Those particular crowd favorites showed more variance. They were more likely to be breakout hits, but also included one flop that judges might potentially have been able to stop." 
 
The crowd aggregation allowed the funding of many projects that were slightly outside the purview of what judges focused on, suggesting that Kickstarter's democratization enables a greater breadth of artistic production, says Nanda. At the same time, the study recognized that Kickstarter supporters weren't always applying the same kind of discipline and rigor in their analysis of projects. They simply liked a project and supported it, or didn't. 
 
"Overall, the general sense is that the projects that found success on Kickstarter were by no means crazy," Nanda says. "Quite the opposite. The average size of the project in our sample was similar to the average size of a project funded by the NEA. And yet, you can imagine that the kinds of projects people put on Kickstarter and the kind they submit to the NEA are quite different in composition and style, which is why we can't definitively say whether crowdfunding is a substitute to grant-making bodies such as the NEA."
 

The one advantage of Kickstarter over a grant from the NEA is that your supporters on Kickstarter effectively become your first audience. That is, given a fixed amount of funding, I'd hypothesize that getting that amount in small doses from lots of people is better than getting all of it from one entity or person. It's a healthier, lower-risk distribution of funds.

Longer term, the rise of crowdfunding is part of what I consider a healthy trend towards disintermediation in the arts, putting more of the tools of fundraising, production, distribution, marketing, etc., directly in artists' hands. Kickstarter doesn't just enable artists to raise money; it gives them a direct line to many of their fans, one they can turn to even after the project is complete.