Drawing invisible boundaries in conversational interfaces

Anyone who has worked on textual conversation interfaces, like chatbots, will tell you that the challenge is dealing with the long tail of crazy things people will type. People love to abuse chatbots. Something about text-based conversation UI's invites Turing tests. Every game player remembers the moment they first abandoned their assigned mission in Grand Theft Auto to start driving around the city, crashing into cars and running over pedestrians, just to exercise their freedom and see what happens when they escape the scripted mission tree.

However, this type of user roaming or trolling happens much less with voice interfaces. Sure, the first time a user tries Siri or Alexa or whatever Google's voice assistant is called (it really needs a name, IMO, to avoid inheriting everything the word "Google" stands for), they may ask something ridiculous or snarky. However, that type of rogue input tends to trail off quickly, whereas it doesn't in textual conversation UI's.

I suspect some form of the uncanny valley is at work, and I blame the affordances of text interfaces. Most text conversation UI's are visually indistinguishable from the messaging UI's used to communicate primarily with other human beings, which invites the user to probe the bot's intelligence boundaries. Unfortunately, the seamless polish of the UI isn't matched by the capabilities of chatbots today, most of which are just dumb decision trees.

On the other hand, none of the voice assistants to date comes close to replicating the natural way a human speaks. These voice assistants may have a more human timbre, but the stiff elocution, the mispronunciations, and the frequent mistakes in comprehension all quickly inform the user that what they are dealing with is something of quite limited intelligence. The affordances draw palpable, if invisible, boundaries in the user's mind, and they quickly realize the low ROI on trying anything other than what is likely to be in the hard-coded response tree. In fact, I'd argue that the small jokes these UI's insert, like canned answers to questions such as "what is the meaning of life?", may actually set these assistants up to disappoint people even more by encouraging more such questions the assistant isn't ready to answer. (I found it amusing when Alexa answered my question "Is Jon Snow dead?" two seasons ago, but I was disappointed when it still gave the same stale answer a season later, months after the show itself had answered the question.)

The same invisible boundaries go up immediately when speaking to one of those automated voice customer service menus. You know at once to speak to these as if you're addressing an idiot who is also hard of hearing, and that the goal is to complete the interaction as quickly as possible, or to divert to a human customer service rep at the earliest possible moment.

[I read on Twitter that one shortcut to reach a human when speaking to an automated voice response system is to curse; the use of profanity is often a built-in trigger to turn you over to an operator. This is an amusing and clever design, but it also feels like an odd admission of guilt on the part of the system designer.]
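The escalation trigger described above can be sketched in a few lines. This is a hypothetical illustration, not any real IVR vendor's API; the word lists and the `route` function are invented for the example.

```python
# Hypothetical sketch of an IVR escalation trigger: if the transcribed
# caller input contains profanity (or an explicit request for a person),
# hand the call to a human operator instead of replaying the menu.
PROFANITY = {"damn", "hell"}  # stand-in list; a real system would use a larger lexicon
OPERATOR_REQUESTS = {"operator", "agent", "representative"}

def route(transcript: str) -> str:
    """Return 'operator' if the caller should be escalated, else 'menu'."""
    words = set(transcript.lower().split())
    if words & (PROFANITY | OPERATOR_REQUESTS):
        return "operator"
    return "menu"
```

The guilty-admission reading follows directly from the code: the designer anticipated the failure mode and priced it in.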

It is not easy, given the simplicity of textual UIs, to lower the user's expectations. However, given where the technology is for now, it may be necessary to erect such guardrails. Perhaps the font for the assistant should be some fixed-width typeface, to distinguish it from a human. Maybe some mechanical sound effects could convey the robotic nature of the machine writing the words, and perhaps the syntax should be less human in some ways, to lower expectations.

One of the huge problems with voice assistants, after all, is that the failures, when they occur, feel catastrophic from the user's perspective. I may try a search on Google that doesn't return the results I want, but at least something comes back, and I'm usually sympathetic to the idea that what I want may not exist in an easily queryable form on the internet. Voice assistant errors occur much less frequently than before, but when they do, it feels as if you're speaking to a careless design, and I mean careless in every sense of the word, from poorly crafted (why didn't the developer account for this obvious query?) to uncaring (emotionally cold).

Couples go to counseling over feeling as if they aren't heard by each other. Some technologies can get away with promising more than they deliver, but when it comes to tech built around conversation, with all the expectations that very human mode of communication has accrued over the years, it's a dangerous game. In a map of the human brain, the neighborhoods of "you don't understand" and "you don't care" share a few exit ramps.

Evaluating mobile map designs

I saw a few links to this recent comparison by Justin O'Beirne of the designs of Apple Maps vs. Google Maps. In it was a link to previous comparisons he made about a year ago. If you're into maps and design, it's a fairly quick read with a lot of useful time series screenshots from both applications to serve as reference points for those who don't open both apps regularly.

However, the entire evaluation seems to come from a perspective at odds with how the apps are actually used. O'Beirne's focus is on evaluating these applications from a cartographic standpoint, almost as if they're successors to old wall-hanging maps or giant road atlases like the ones my dad used to plot out our family road trips when we weren't wealthy enough to fly around the U.S. 

The entire analysis is of how the maps look when the user hasn't entered any destination to navigate to (what I'll just refer to as the default map mode). Since most people use these apps as real-time navigation aids, especially while driving, the views O'Beirne dissects feel like edge cases. (That's my hypothesis, of course; if anyone out there has actual data on the percentage of time these apps are used for navigation versus not, I'd love to hear it, even if it's just directional, to help frame the magnitude.)

For example, much of O'Beirne's ink is spent on each application's road labels, often at very zoomed-out levels of the map. I can't remember the last time I looked at any mobile mapping app at the eighth level of zoom; I've probably spent only a few minutes of my life, in total, in all of these apps at that level of the geographic hierarchy, and only to answer a trivia question or while visiting some region of the world on vacation.

What would be of greater utility to me, and what I've yet to find, is a design comparison of all the major mapping apps as navigation aids, a dissection of the UX in what I'll call their navigation modes. Such an analysis would be even more useful if it included Waze, which doesn't have the market share of Apple or Google Maps but which is popular among a certain set of drivers for its unique approach to evaluating traffic, among other things.

Such a comparison should analyze the visual comprehensibility of each app in navigation mode, which is very different from their default map views. How are roads depicted, what landmarks are shown, and how clear is the selected path when seen only in the occasional sidelong glance while driving, which is about as much visual engagement as a user can offer while operating a 3,500-pound vehicle? How does the app balance textual information with the visualization of the roads ahead, and what other POI's or real-world objects are shown? Waze, for example, shows me other Waze users in different forms depending on how many miles they've driven in the app and which visual avatars they've chosen.

Of course, the quality of the actual route would be paramount. It's difficult for a single driver to do A/B comparisons, but I still hope that someday someone will start running regular tests in which different cars, equipped with multiple phones, each logged into different apps, try to navigate to the same destination simultaneously. Over time, at some level of scale, such comparison data would be more instructive than the small sample size of the occasional self-reported anecdote.
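To make the shape of such a test concrete, here's a minimal sketch of how those paired simultaneous runs could be aggregated. Everything here is invented for illustration: the trip times, and the placeholder names `app_a` and `app_b`.

```python
from statistics import mean

# Sketch of aggregating paired navigation runs: two cars leave for the
# same destination at the same time, each following a different app.
# All trip times (in minutes) below are invented for illustration.
trips = [
    {"app_a": 24.0, "app_b": 26.5},
    {"app_a": 31.0, "app_b": 30.0},
    {"app_a": 18.5, "app_b": 21.0},
]

def mean_advantage(runs, a="app_a", b="app_b"):
    """Average minutes saved by app A relative to app B across paired runs."""
    return mean(t[b] - t[a] for t in runs)
```

With enough paired runs, an average like this would start to drown out the noise that a single self-reported anecdote carries.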

[In the future, when we have large fleets of self-driving cars, they may produce insights that only large sample sizes can validate, like UPS's "our drivers save time by never turning left." I'd love it if Google Maps, Apple Maps, or Waze published some of what they've learned about driving from their massive data sets, a la OKCupid, but most of what they've published publicly leans toward marketing drivel.]

Any analysis of navigation apps should also consider the voice prompts: how often does the map speak to you, how far in advance of the next turn are you notified, how clear are the instructions? What's the signal to noise? What are the default wording choices? Syntax? What voice options are offered? Both male and female voices? What accents?

Ultimately, what matters is getting to your destination in the safest, most efficient manner, but understanding how the applications' interfaces, underlying data, and algorithms influence that would be of value to the many people who now rely on these apps every single day to get from point A to point B. I'm looking for a Wirecutter-style battle of the navigation apps; may the best system win.

The other explicit choice O'Beirne makes is noted in a footnote:

    We’re only looking at the default maps. (No personalization.)

It is, of course, difficult to evaluate the personalization of a mapping app since you can generally only see how each map is personalized for yourself. However, much of the value of Google Maps lies in its personalization, or what I suspect is personalization. Given where we are in the evolution of many products and services, analyzing them in their non-personalized states is to disregard their chief modality.

When I use Google Maps in Manhattan, for example, I notice that the only points of interest (POI's) the map shows me at various levels of zoom seem to be places I've searched for most frequently (this is in the logged-in state, which is how I always use the app). Given Google's reputation for being a world leader in crunching large data sets, it would be surprising if they weren't selecting POI labels, even for non-personalized versions of their maps, based on what people tend to search for most frequently.
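If POI labels really are driven by personal search frequency, the core selection logic could be as simple as this sketch. The place names and the `top_poi_labels` helper are invented for the example; I have no visibility into how Google actually ranks labels.

```python
from collections import Counter

# Hypothetical sketch: choose which POI labels to render on a personalized
# map by ranking the places this user has searched for most often.
search_history = [
    "Blue Bottle Coffee", "MoMA", "Blue Bottle Coffee",
    "Katz's Delicatessen", "Blue Bottle Coffee", "MoMA",
]

def top_poi_labels(history, k=2):
    """Return the k most frequently searched places, most frequent first."""
    return [place for place, _ in Counter(history).most_common(k)]
```

The same counting could run over all users' searches in a region to pick labels for the non-personalized map.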

In the old days, if you were making a map to be hung on the wall, or for a paper map or road atlas, what you chose as POI's would be fixed until the next edition of that map. You'd probably choose what felt like the most significant POI's based on reputation, ones that likely wouldn't be gone before the next update. Eiffel Tower? Sure. Some local coffee shop? Might be a Starbucks in three months, best leave that label off.

Now, maps can be updated dynamically. There will always be those who find any level of personalization creepy, and some of it is, but I also find the lack of personalization immensely frustrating in some services. That I search for reservations in SF on OpenTable and receive several hundred hits every time, sorted in who knows what order, instead of results that cluster my favorite or most frequently booked restaurants at the top, drives me batty.

When driving, personalization is even more valuable because it's often inconvenient or impossible to type or interact with the device for safety reasons. It's a great time saver to have Waze guess where I'm headed automatically ("Are you driving to work?" it asks me every weekday morning), and someday I just want to be able to say "give me directions to my sister's" and have it know where I'm headed.

My quick first-person assessment, despite the small sample size caveats noted earlier:

• I know that Apple Maps, as the default on iOS, has the market share lead on iPhone by a healthy margin. Still, I'll never get past the time the app took me to a dead end while I was on the way to a wedding, and I've not used it since except to glance at the design. It may have the most visually pleasing navigation mode aesthetic, but I don't trust its directions at the tails. Some products are judged not on their mean outcome but on their handling of the tails. For me, navigation is one of those.
• It's not clear if Apple Maps has a data edge over Google Maps and Waze (Google bought Waze but has kept the app separate). Most drivers use Apple Maps on the iPhone because it's the default, but Google got a head start in this space and also has a fleet of vehicles on the road taking Google Street View photos. Eventually, Google may augment that fleet with self-driving cars.
• I trust Google Maps directions more than those of Apple Maps. However, I miss the usability of the first version of Google Maps, which came out on iOS with the first iPhone. I'd heard rumors that Apple built that app for Google, but I'm not sure if that's true. The current flat design of Google Maps often strands me in a state in which I have no idea how to initiate navigation. I'd like to believe I'm a fairly sophisticated user, and yet I sometimes sit there swiping and tapping in Google Maps like an idiot, trying to get it to start reading turn-by-turn directions. Drives me batty.

I use Waze the most when driving in the Bay Area, or wherever I trust that there are enough other drivers using Waze that it will offer the quickest route to my destination. That seems true in most major metropolitan areas. I can tell a lot of users in San Francisco use Waze because sometimes, when I have to drive home to the city from the Peninsula, I find myself in a line of cars exiting the highway and navigating through some random neighborhood side street, one that no one would visit unless guided by an algorithmic deity.

I use Waze with my phone mounted in one of those clamps that holds it at eye level above my dashboard, because the default Tesla navigation still runs on Google Maps and is notoriously oblivious to traffic when selecting a route and estimating an arrival time. Since I use Waze more than any other navigation app, I have more specific critiques.

• One reason I use Waze is that it seems the quickest to respond to temporary buildups of traffic. I suspect that's because the UI has a dedicated, always-visible button for reporting such traffic. Since I'm almost always the driver, I have no idea how people manage to do such reporting, but either a lot of passengers are doing the work or lots of drivers are able to do so while their cars are stuck in gridlock. The other alternative, that drivers are filing such reports while their cars are in motion, is frightening.
• I don't understand the other social networking aspects of Waze. They're an utter distraction. I'm not immune to the intrinsic rewards of gamification, but in the driving context, where I can't really do much more than glance at my phone, it's all just noise. I don't feel a connection to the other random Waze drivers I see from time to time in the app, all of whom are depicted as various pastel-hued cartoon sperm. In wider views of the map, all the various car avatars just add a lot of visual noise.
• I wish I could turn off some of the extraneous voice alerts, like "Car stopped on the side of the road ahead." I'm almost always listening to a podcast in the background when driving, and the constant interruptions annoy me. There's nothing I can do about a car on the side of the road; I wish I could customize which alerts I have to hear.
• The ads that drop down and cover almost half the screen are not just annoying but dangerous, as I have to glance over and then swipe them off the screen. That, in and of itself, is disqualifying. But beyond that, even while respecting the need for companies to make money, I can't imagine these ads generate a lot of revenue. I've never looked at one. If the ads are annoying, the occasional survey asking me which ads/brands I've seen on Waze is doubly so. With Google's deep pockets behind Waze, there must be a way to limit ads to those moments when they're safe or clearly requested, for example when a user is researching where to get gas or a bite to eat. When a driver has hands on the wheel and is guiding a giant mass of metal at high velocity, no cognitive resources should be diverted to remembering which brands you recall seeing in the app.
• Waze still doesn't understand how to penalize unprotected left turns, which are almost completely unusable in Los Angeles at any volume of traffic. At rush hour it's a fatal failure, like being ambushed by a video game foe that can kill you with one shot and no advance warning. As long as it remains unfixed, I use Google Maps when in LA. I can understand why knowledge sharing between the two companies may be limited by geographic separation despite their being part of the same umbrella company, but that the apps don't borrow more basic lessons from each other seems a shame.
• I use Bluetooth to listen to podcasts on Overcast when driving, and since I downloaded iOS 11, that connection has been very flaky. Also, if I don't have a podcast playing and Waze gives me a voice cue, the podcast starts playing. I've tried quitting Overcast, and the podcast still starts playing every time Waze speaks to me. I had reached a good place in which Overcast would pause while Waze spoke so they wouldn't overlap, but since iOS 11 even that works inconsistently. This is just one of the bugs that iOS 11 has unleashed upon my phone; I really regret upgrading.
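The unprotected-left complaint comes down to a cost model: a router only avoids a maneuver if it prices it. Here's a minimal sketch of the idea; the 90-second penalty and the trip times are numbers I invented for illustration, not anything from Waze or Google.

```python
# Sketch of pricing unprotected left turns into route selection: add a
# fixed time penalty per unprotected left, then pick the cheaper route.
# The 90-second penalty and the trip times are invented for illustration.
def route_cost(base_seconds, unprotected_lefts, penalty_seconds=90):
    """Total cost = base drive time plus a penalty per unprotected left."""
    return base_seconds + unprotected_lefts * penalty_seconds

# A direct 10-minute route with two unprotected lefts loses to an
# 11.5-minute detour with none once the penalty is priced in:
direct = route_cost(600, unprotected_lefts=2)   # 600 + 180 = 780
detour = route_cost(690, unprotected_lefts=0)   # 690
best = min([("direct", direct), ("detour", detour)], key=lambda p: p[1])
```

A real router would apply the penalty per turn inside its shortest-path search, and would presumably scale it by time of day, but the principle is the same.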

IKEA's Billy bookcase

    Now there are 60-odd million in the world, nearly one for every 100 people - not bad for a humble bookcase. 
    In fact, so ubiquitous are they, Bloomberg uses them to compare purchasing power across the world. 
    According to the Bloomberg Billy Bookcase Index - yes, that's a thing - they cost most in Egypt, just over $100 (£79), whereas in Slovenia you can get them for less than $40 (£31).

A few of the interesting stats on IKEA's Billy bookcase series.

    To get as rich as Mr Kamprad has, you have to make stuff that is both cheap and acceptably good.

And to get even richer, you make stuff that is both cheap and the best in its class, though that's not as easy with furniture as it is with software.

IKEA is an interesting example of disruption that I haven't read as many think pieces on as the usual suspects in tech.

I miss first-gen Google Maps for iOS

This is an oldie, but still relevant: an informative deep dive into the design choices of Google Maps and Apple Maps on iOS.

I wish I had screens from the first version of Google Maps that shipped on the iPhone, a version that was rumored to have been built by Apple for Google. To me, that's still the most usable mapping app ever for iOS, and all subsequent versions, including both of the latest versions of Google Maps and Apple Maps, are more complex. The new maps may do more and offer more functionality, but if you just wanted to quickly get directions to a particular place, nothing beat the first-gen Google Maps for iOS.

Part of this is the result of the new flat design aesthetic, which is sleek but often opaque. In many ways, touchscreen user interfaces seem to have approached a local maximum in which the only innovation is coming up with new icons that users must learn. At some point, we're just substituting new abstractions and not making significant leaps forward in usability. Apps today are better, on average, than the first generation of mobile apps, but the best-designed apps today don't feel much better than the best apps from the dawn of the iOS App Store.

These days, the great leap forward in interface design feels like it's the complete removal of the abstraction of traditional software design. The interface that feels closest to achieving that in the near future is text, most often found in some sort of messaging interface. Following on its heels, with even greater potential as a democratic UI medium, is voice.

Asym spacing

I'd never heard of this typography concept: asym spacing.

    But one tech company believes something as simple as increasing the size  of spacing between certain words could improve people’s reading comprehension. Research going back decades has found that “chunking,” a technique that separates text into meaningful units, provides visual cues that help readers better process information.
    The image below shows  the before and after of Asym’s spacing on a paragraph  of text. Quartz  is also experimenting by manually adding Asym’s spaces  to this article. The effect  is subtle, but likely will irk keen-eyed copy editors (sorry!), especially those from the print world who are accustomed  to deleting extraneous spaces.

No idea if the science behind this is solid, but I have heard of chunking. When I took a speed-reading class in grade school, they taught us two key principles. One was not to read aloud "inside your head," and the other was not to read linearly, one word at a time, but to look at chunks of words (which also makes it hard to read linearly).

Maybe because I already chunk groups of words in regularly spaced text, or maybe because the asym spacing bunched odd groups of words together, I found the regularly spaced text (on the left) easier to read.
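The mechanical half of the technique is easy to sketch: widen the gaps at chunk boundaries. The chunking below is a naive fixed-size grouping, purely for illustration; the real product presumably detects linguistic phrase boundaries, which is the hard part (and, per my complaint above, the part that can go wrong).

```python
# Naive sketch of "asym"-style spacing: widen the gaps between chunks of
# words by inserting an extra space at chunk boundaries. A fixed chunk
# size stands in for real phrase detection, which this sketch does not do.
def asym_space(text, chunk_words=3):
    words = text.split()
    chunks = [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]
    return "  ".join(chunks)  # two spaces between chunks, one within
```

Run on a sentence, it produces exactly the kind of doubled spaces visible in the quoted paragraph above.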

Decoding restaurants

    Last year, on the fiftieth anniversary of restaurant desegregation, we celebrated a signifying moment in the long march toward full and equal citizenship for black Americans. But we delude ourselves if we don’t acknowledge that there is a difference between being admitted and being welcomed.
    The court order that ended desegregation stipulated that every cafe, tavern, Waffle House, and roadside joint must open its doors to all. It did not, could not, stipulate that whites in the South must also open their hearts and minds to all. Welcome was, and is, the final barrier to racial parity.
    We have witnessed remarkable progress over the past five decades, yes, and we should acknowledge this, too. What seemed fanciful, even utopian, a generation ago is now so commonplace as to not bear any comment at all. We have come to expect and accept black and white in the workplace, on the playing field, in politics, in the military, and we congratulate ourselves on our steady march to racial harmony. But our neighborhoods and our restaurants do not look much different today than they did fifty years ago. That Kingly vision of sitting down at the same table together and breaking bread is as smudgy as it’s ever been.

Todd Kliman set out to try to understand why, decades after desegregation, so few restaurants host a mixed clientele of black and white. Of course, the issue is about more than just restaurants. The questions he asks and the theories he uncovers can be pointed at bars, clubs, neighborhoods, and schools.

    It was a man named Andy Shallal who helped me to understand the possibilities for a better, more integrated future while also reinforcing the manifold problems of the present. Shallal made me understand that no one ever need say, “keep out.” That a message is embedded in the room, in the menu, in the plates and silverware, in the music, in the color scheme. That a restaurant is a network of codes. It’s a phrase that, yes, has all sorts of overtones and undertones, still, in the South. I’m using it, here, in the semiotic sense—the communication by signs and symbols and patterns.
    I don’t see coding as inherently malicious. But we need to remember that restaurants have long existed to perpetuate a class of insiders and a class of outsiders, the better to cultivate an air of desirability. Tablecloths, waiters in jackets and ties, soft music—these are all forms of code. They all send a very specific, clear message. That is, they communicate without words (and so without incurring a legal risk or inviting criticism or censure from the public) the policy, the philosophy, the aim of the establishment.
    Today, there are many more forms of code than the old codes of the aristocracy. Bass-thumping music. Cement floors and lights dangling from the ceiling. Tattooed cooks. But these are still forms of code. They simultaneously send an unmistakable signal to the target audience and repel all those who fall outside that desired group.

The same codes are at work in websites and applications, though they often act subconsciously. Color, typography, imagery, layout, and so many other aspects of the user experience make different users feel more welcome than others.

Is your service more welcoming to the old or the young? Women or men? One ethnicity or another? The rich or the poor? The tech savvy or those less so? Those with fast internet access or those without? The visually inclined or the more textually focused? New users or longtime users? The famous or the not-so-famous? Content creators or consumers?

Rare is the service that is perfectly neutral.

Universal sign language

    “Decide” is what is known as a telic verb—that is, it represents an action with a definite end. By contrast, atelic verbs such as “negotiate” or “think” denote actions of indefinite duration. The distinction is an important one for philosophers and linguists. The divide between event and process, between the actual and the potential, harks back to the kinesis and energeia of Aristotle’s metaphysics.
    One question is whether the ability to distinguish them is hard-wired into the human brain. Academics such as Noam Chomsky, a linguist at the Massachusetts Institute of Technology, believe that humans are born with a linguistic framework onto which a mother tongue is built. Elizabeth Spelke, a psychologist up the road at Harvard, has gone further, arguing that humans inherently have a broader “core knowledge” made up of various cognitive and computational capabilities. 
    In 2003 Ronnie Wilbur, of Purdue University, in Indiana, noticed that the signs for telic verbs in American Sign Language tended to employ sharp decelerations or changes in hand shape at some invisible boundary, while signs for atelic words often involved repetitive motions and an absence of such a boundary. Dr Wilbur believes that sign languages make grammatical that which is available from the physics and geometry of the world. “Those are your resources to make a language,” she says. As such, she went on to suggest that the pattern could probably be found in other sign languages as well.
    Work by Brent Strickland, of the Jean Nicod Institute, in France, and his colleagues, just published in the Proceedings of the National Academy of Sciences, now suggests that it is. Dr Strickland has gone some way to showing that signs arise from a kind of universal visual grammar that signers are working to.

Fascinating. Humans associate language with intelligence to such a strong degree that I predict the critical moment in animal rights will come when a chimp or other primate takes the stand in an animal testing court case and uses sign language to give testimony on its own behalf.

Reading the test methodology employed in the piece, I wonder if any designers out there have done similar studies with gestures or icons. I'm not arguing a Chomskyist position here; I doubt humans are born with some basic touchscreen gestures or a base icon key in their brain's config file. This is more about second-order or learned intuition.

Or perhaps we'll achieve great voice or 3D gesture interfaces (e.g., Microsoft Kinect) before we ever settle on any standards around gestures on flat touchscreens. If you believe, like Chomsky, that humans have some language skills (both verbal and gestural) hard-wired in the brain at birth, the most human (humane? humanist?) of interfaces would be one that doesn't involve any abstractions on touchscreens but instead relies on the software we're born with.

Supposedly irrelevant factors

    There is a version of this magic market argument that I call the invisible hand wave. It goes something like this. “Yes, it is true that my spouse and my students and members of Congress don’t understand anything about economics, but when they have to interact with markets. ...” It is at this point that the hand waving comes in. Words and phrases such as high stakes, learning and arbitrage are thrown around to suggest some of the ways that markets can do their magic, but it is my claim that no one has ever finished making the argument with both hands remaining still. 
    Hand waving is required because there is nothing in the workings of markets that turns otherwise normal human beings into Econs. For example, if you choose the wrong career, select the wrong mortgage or fail to save for retirement, markets do not correct those failings. In fact, quite the opposite often happens. It is much easier to make money by catering to consumers’ biases than by trying to correct them. 
    Perhaps because of undue acceptance of invisible-hand-wave arguments, economists have been ignoring supposedly irrelevant factors, comforted by the knowledge that in markets these factors just wouldn’t matter. Alas, both the field of economics and society are much worse for it. Supposedly irrelevant factors, or SIFs, matter a lot, and if we economists recognize their importance, we can do our jobs better. Behavioral economics is, to a large extent, standard economics that has been modified to incorporate SIFs.

Richard Thaler on behavioral economics. Again and again, studies have put cracks in the edifice of rational homo economicus.

SIFs exist in product design, too. The myth of the rational, utility-maximizing user can be just as pernicious and misleading an assumption. If it weren't, we wouldn't need concepts like smart defaults in apps, the design equivalent of nudges like retirement savings programs that are opt-out instead of opt-in.