Web Browsers, Black Swans and Scientific Progress

A month ago, I changed web browsers from Firefox to Chrome (which recently became the most popular browser). Firefox crashed too often (about once per day). Chrome crashes much less often (once per week?), presumably because it confines trouble caused by a bad tab to that tab. “Separate processes for each tab is EXACTLY what makes Chrome superior” to Firefox, says a user. This localization was part of Chrome’s original design (2008).

After a few weeks, I saw that crash rate was the only difference between the two browsers that mattered. After a crash, it takes a few minutes to recover. With both browsers, the “waiting time” distribution — the distribution of the time between when I try to reach a page (e.g., click on a link) and when I see it — is very long-tailed (very high kurtosis). Almost all pages load quickly (< 2 seconds). A few load slowly (2-10 seconds). A tiny fraction (0.1%?) cause a crash (minutes). The Firefox and Chrome waiting-time distributions are essentially the same except that the Chrome distribution has a thinner tail. As Nassim Taleb says about situations that produce Black Swans, very rare events (in this case, the very long waiting times caused by crashes) matter more (in this case, contribute more to total annoyance) than all other events combined.
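Why a 0.1% event can outweigh everything else is easiest to see with numbers. The sketch below is a hypothetical illustration, not a measurement. The probabilities and waiting times are assumed values loosely based on the rough figures above (about 97% of pages at 1 s, 2.9% at 5 s, 0.1% crashes costing 3 minutes). The annoyance rule, in which only seconds beyond a 2-second grace period count, is likewise an assumption. Chrome is modelled as identical except that crashes are one-seventh as frequent, mirroring "once per day" versus "once per week".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    name: str
    prob: float    # fraction of page requests (assumed, not measured)
    wait_s: float  # typical waiting time in seconds (assumed, not measured)


# Hypothetical numbers loosely based on the text: most pages load in under
# 2 s, a few take 2-10 s, and about 0.1% of requests end in a crash that
# costs a few minutes.
FIREFOX = [
    Outcome("fast", 0.970, 1.0),
    Outcome("slow", 0.029, 5.0),
    Outcome("crash", 0.001, 180.0),
]

# Same body, thinner tail: crashes 1/7 as often (once a week vs once a day).
# The difference is moved into "fast" so the probabilities still sum to 1.
CHROME = [
    Outcome("fast", 0.970 + 0.001 * 6 / 7, 1.0),
    Outcome("slow", 0.029, 5.0),
    Outcome("crash", 0.001 / 7, 180.0),
]

GRACE_S = 2.0  # assumption: waits shorter than this are not annoying


def annoyance(wait_s: float) -> float:
    """Hypothetical annoyance score: seconds waited beyond the grace period."""
    return max(0.0, wait_s - GRACE_S)


def contributions(outcomes: list[Outcome]) -> dict[str, float]:
    """Expected annoyance per page request, broken down by outcome."""
    return {o.name: o.prob * annoyance(o.wait_s) for o in outcomes}


def excess_kurtosis(outcomes: list[Outcome]) -> float:
    """Excess kurtosis of the waiting-time distribution (0 for a Gaussian)."""
    mean = sum(o.prob * o.wait_s for o in outcomes)
    var = sum(o.prob * (o.wait_s - mean) ** 2 for o in outcomes)
    m4 = sum(o.prob * (o.wait_s - mean) ** 4 for o in outcomes)
    return m4 / var**2 - 3.0


if __name__ == "__main__":
    for label, dist in [("Firefox", FIREFOX), ("Chrome", CHROME)]:
        parts = {k: round(v, 3) for k, v in contributions(dist).items()}
        print(label, parts, "excess kurtosis:", round(excess_kurtosis(dist)))
```

Under these assumptions a crash contributes about 0.178 annoyance-seconds per Firefox page request, roughly twice the 0.087 from slow loads, while fast loads contribute nothing. Excess kurtosis comes out far above the Gaussian value of 0 for both browsers, and is actually higher for the Chrome model. Kurtosis measures how much of the variance comes from rare extremes, not how much the tail costs, so the thinner Chrome tail shows up in the expected-annoyance breakdown rather than in a lower kurtosis.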

Curious about Chrome/Firefox differences, I read a recent review (“Chrome 24 versus Firefox 18 — head to head”). Both browsers were updated shortly before the review. The comparison began like this:

Which browser got the biggest upgrade? Who’s the fastest? The safest? The easiest to use? We took a look at Chrome 24 and Firefox 18 to try and find out.

Not quite. The review compared the press releases about the upgrades. It said nothing about crash rate.

Was the review superficial because the reviewer wasn’t paid enough? If so, Walt Mossberg, the best-paid tech reviewer in the world, might do a good review. The latest browser review by Mossberg I could find (2011) says this about “speed”:

I found the new Firefox to be snappy. . . . The new browser didn’t noticeably slow down for me, even when many tabs were opened. But, in my comparative speed tests, which involve opening groups of tabs simultaneously, or opening single, popular sites, like Facebook, Firefox was often beaten by Chrome and Safari, and even, in some cases, by the new version 9 of IE . . . These tests, which I conducted on a Hewlett-Packard desktop PC running Windows 7, generally showed very slight differences among the browsers.

No mention of crash rate, the main determinant of how long things take. Mossberg ignores the one difference between Chrome and Firefox that really matters. He’s not the only one. As far as I can tell, all tech reviewers have failed to measure browser crash rate. A recent review of the latest Firefox, for example, ignores it too. “I’m still a big Firefox fan,” says the reviewer.

Browser reviews are a small example of a big rule: People with jobs handle long-tailed distributions poorly. In the case of browser reviews, the people with jobs are the reviewers; the long-tailed distribution is the distribution of waiting times/annoyance. Reviewers handle this distribution badly in the sense that they ignore tail differences, which matter enormously.

Another browser-related example of the rule is the failure of the Mozilla Foundation (people with jobs) to solve Firefox’s crashing problem. My version of Firefox (18.0.1) crashed daily. Year after year, upgrade after upgrade, people at Mozilla failed to add localization. Their design is “crashy”, and they haven’t fixed it. Users notice and change browsers. Firefox may become irrelevant for this reason alone. This isn’t Clayton Christensen’s “innovator’s dilemma”, where industry-leading companies become complacent and lose their lead. People at Mozilla have had no reason to be complacent.

Examples of the rule are all around us. Some are easy to see:

1. Taleb’s (negative) Black Swans. Tail events in long-tailed distributions often have huge consequences (making them Black Swans) because their possibility has been ignored or their probability underestimated. The system is not designed to handle them. All of Taleb’s Black Swans involve man-made systems: the financial system, hedge funds, New Orleans’s levees, and so on. These systems were built by people with jobs and react poorly to rare events (e.g., the collapse of Long-Term Capital Management). Taleb’s anti-fragility is what others have called hormesis. Hormesis protects against bad rare events. It increases your tolerance, the dose (e.g., the amount of poison) needed to kill you. As Taleb and others have said, many complex systems (e.g., cells) have hormesis. All of these systems were fashioned by nature, none by people with jobs. No word means anti-fragile, as Taleb has said, because there exist no products or services with such a property. (Almost all adjectives and nouns were originally created to describe products and services, I believe. They helped people trade.) No one wanted to say, “Buy this, it’s anti-fragile.” Designers didn’t (and still don’t) know how to add hormesis. They may even be unaware the possibility exists. Products are designed by people with jobs. Taleb doesn’t have a job. Grasping the possibility of anti-fragility (which includes recognizing that tail events are underestimated) does not threaten his job or make it more difficult. A designer is in a different position: if she tells her boss about hormesis, her boss might ask her to include it.

2. The Boeing 787 (Dreamliner) has had battery problems. The danger of using a lithium-ion battery has a long-tailed distribution: almost all uses are safe; a very tiny fraction are dangerous. Despite the enormous amount of money at stake, Boeing engineers (people with jobs) failed to devise adequate battery testing and management. The FAA (people with jobs) also missed the problem.

3. The designers of the Fukushima nuclear power plant (people with jobs) were perfectly aware of the possibility of a tsunami. They responded badly (did little or nothing) when their assumptions about tsunami likelihood were criticized. The power of the rule is suggested by the fact that this happened in Japan, where most things are well-made.

4. Drug companies (people with jobs) routinely hide or ignore rare side effects, judging by the steady stream of examples that come to light. An example is the tendency of SSRIs to produce violence, including suicide. The whole drug regulatory system (people with jobs) seems to do a poor job with rare side effects.

Why is the rule true? Because jobs require steady output. Tech reviewers want to write a steady stream of reviews. The Mozilla Foundation wants a steady stream of updates. Companies that build nuclear power plants want to build them at a steady rate. Boeing wants to introduce new planes at a steady rate. Harvard professors (criticized by Taleb) want to publish regularly. At Berkeley, when professors come up for promotion, they are judged by how many papers they’ve written. Long-tailed distributions interfere with steady output. To deal with them seriously you have to measure the tails, and that’s hard. Testing a new feature to learn its effect on tail events is hard. Adding hormesis (Nature’s protection against tail events) to your product is even harder.
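How hard is it to measure a tail? A back-of-the-envelope sketch, assuming independent trials that each carry the same event probability p (real page loads, flights, or prescriptions need not behave this way). The number of trials needed for a 95% chance of seeing even one tail event grows roughly as 3/p. The function names are illustrative, not taken from any existing tool.

```python
import math


def trials_to_see_one(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one event in n trials) >= confidence.

    Assumes independent trials, each with the same event probability p,
    and solves 1 - (1 - p)**n >= confidence for n.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p))


def rule_of_three_bound(n: int) -> float:
    """Approximate 95% upper bound on p after observing zero events in n trials."""
    return 3.0 / n


if __name__ == "__main__":
    for p in (0.5, 0.05, 0.001):
        print(f"p={p}: about {trials_to_see_one(p)} trials to see one event")
    # After 1,000 uneventful trials, a 0.1%-per-trial risk is still not ruled out.
    print("95% bound after 1000 clean trials:", rule_of_three_bound(1000))
```

With p = 0.001 (the crash fraction guessed earlier for page loads) this gives 2,995 trials, versus 5 for a coin-flip event. After 1,000 uneventful trials, the rule-of-three bound (3/n = 0.3%) still cannot exclude a 0.1% risk.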

This makes it enormously tempting to ignore tail events: pretend they don’t exist, or that your tests actually deal with them. At Standard & Poor’s, which rated all sorts of financial instruments, people in charge grasped that they were doing a bad job of modelling long-tailed distributions and introduced new testing software that did a better job. S&P employees rebelled: “We’ll lose business.” Too many products failed the new tests. So S&P bosses watered down the test: “If the transaction failed E3.0, then use E3Low [which assumes less variance].” Which test (E3.0 or E3Low) was more realistic? The employees didn’t care. They just wanted more business.

It’s easy to rationalize ignoring tail events. Everyone ignores them. By the next tsunami, I’ll be dead. The real reason they are ignored is that if your audience is other people with jobs (e.g., a regulatory agency, reviewers for a scholarly journal, doctors), it will be easy to get away with ignoring them or making unrealistic assumptions about them. Tail events from long-tailed distributions make a regulator’s job much harder. They make a doctor’s job much harder. If doctors stopped ignoring the long tails, they would have to tell patients, “That drug I just prescribed: I don’t know how safe it is.” The hot potato (unrealistic risk assumptions) is handed from one person to another within a job-to-job system (e.g., drug companies market new drugs to the FDA and to doctors), but eventually the hot potato (or ticking time bomb) must be handed outside the job-to-job system to an ordinary Person X (e.g., a doctor prescribes a drug to a patient). It is just one of many things that Person X buys. He doesn’t have the time or expertise to figure out whether what he was told about risk (the probability of very bad, very rare events) is accurate. Eventually, however, inaccurate assumptions about tail events may be exposed when people without jobs related to the risk (e.g., parents whose son killed himself after taking Prozac, everyone in Japan, airplane passengers who die in a crash) are harmed. Such people, unlike people with related jobs, are perfectly free to complain, and willful ignorance may come to light. In other words, doctors cannot easily complain about poor treatment of rare side effects (and don’t), but patients and their parents can (and do).

There are positive Black Swans too. In some situations, benefit has a very long-tailed distribution: almost all events in Category X produce little or no benefit; a tiny fraction produce great benefit. One example is scientific observations. Almost all of them have little or no benefit, a very tiny fraction are called discoveries (moderate benefit), and a very, very tiny fraction are called great discoveries (great benefit). Another example is meeting people. Almost everyone you meet: little or no benefit. A tiny fraction of the people you meet: great benefit. A third example is reading. In my life, almost everything I’ve read has had little or no benefit. A very tiny fraction of what I’ve read has had great benefits.

I came to believe that people with jobs handle long-tailed distributions badly because I noticed that jobs and science are a poor mix. My self-experimentation was science, but it was absurdly successful compared to my professional science (animal learning research). I figured out several reasons for this, but in a sense they all came down to one: my self-experimentation was a hobby; my professional science was a job. My self-experimentation gave me total freedom, infinite time, and commitment to finding the truth and nothing else. My job, like any job, did not. And, as I said, I saw that scientific progress per observation had a power-law-like distribution: almost all observations produce almost no progress; a tiny fraction produce great progress.

It is easy enough for scientists to recognize the shape of the distribution of progress per observation but, if you don’t actually study the distribution, you’re not going to have much of an understanding. Professional scientists ignore it. Thinking about it would not help them get grants and churn out papers. (Grants are given by people with jobs, who also ignore the distribution.) Because they don’t think about it, they have no idea how to change the “slope” of the power-law distribution (such distributions are linear on log-log coordinates). In other words, they have no idea how to make rare events more likely. Because it is almost impossible to notice the absence of very rare events (the great discoveries that don’t get made), no one notices. I seem to be the only one who points out that year after year, the Nobel Prize in Physiology/Medicine indicates lack of progress on major diseases. When I was a young scientist, I wanted to learn how to make discoveries. I was surprised to find that everything written on the topic — which seemed pretty important — was awful. Now I know why. Everything on the topic was written by a person with a job.
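The remark about log-log coordinates can be made concrete. A minimal sketch, assuming progress per observation follows a Pareto distribution; the text only says "power-law-like", so the Pareto model, the tail index alpha, and the function names are illustrative choices. The probability that an observation yields at least x units of progress, P(X > x) = (x / x_min)^(-alpha), plots as a straight line of slope -alpha on log-log axes. A shallower slope (smaller alpha) means a thicker tail, so great discoveries become more likely.

```python
import math


def pareto_tail(x: float, x_min: float = 1.0, alpha: float = 2.0) -> float:
    """P(X > x) for a Pareto distribution with scale x_min and tail index alpha."""
    return 1.0 if x < x_min else (x / x_min) ** (-alpha)


def loglog_slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


if __name__ == "__main__":
    xs = [1.0, 10.0, 100.0, 1000.0]  # "size" of progress, arbitrary units
    for alpha in (2.0, 1.5):
        ys = [pareto_tail(x, alpha=alpha) for x in xs]
        print(f"alpha={alpha}: log-log slope {loglog_slope(xs, ys):.2f}, "
              f"P(X > 1000) = {pareto_tail(1000.0, alpha=alpha):.1e}")
```

Flattening the slope from -2 to -1.5 makes an observation at least 1,000 times the minimum roughly 30 times more likely (about 3.2e-5 instead of 1e-6), which is the sense in which "changing the slope" means making rare events more likely.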

With long-tailed distributions of benefit, there is nothing like hormesis. If any organism has evolved something to improve long-tailed distributions of benefit, I don’t know what it is. Our scientific system handles the long-tailed distribution of progress poorly in two ways:

1. The people inside it, such as professional scientists, do a poor job of increasing the rate of progress, i.e., making the tails thicker. I think you can make the tails thicker via subject-matter knowledge (Pasteur’s “chance favors the prepared mind”), methodological knowledge (better measurements, better experiments, better data analysis), and novelty. Professional scientists understand the value of the first two factors, but they ignore the third. They like to do the same thing over and over because it is safer. Great for their careers, terrible for the rest of us.

2. When an unlikely observation comes along, the system is not set up to develop it. An example is Galvani’s discovery of galvanism, which led to batteries, which led to widespread electricity. This one discovery, from one observation, arguably produced more progress than all scientific observations in the last 100 years. Yet Galvani’s job (surgery research) left him unable to go further with one of the greatest discoveries of all time. (“Galvani had certain commitments. His main one was to present at least one research paper every year at the Academy.”) In contrast, Darwin (no job) was able to develop the observations that led to his theory of evolution. It took him 18 years to write one book, longer than any job would have allowed. He wouldn’t have gotten tenure at Berkeley.

After a discovery has been made, the shape of the benefit distribution changes. It becomes more Gaussian, less long-tailed. As our understanding increases, science becomes engineering, which becomes design, which becomes manufacturing. Engineering, design, and manufacturing fit well with having a job. Take my chair. Every time I use it, I get a modest benefit, always about the same size. Every time I use my pencil, I get a modest benefit, always about the same size. No long-tailed distribution.

Modern science works well as a way of developing discoveries, not making them. An older system was better for encouraging discovery. Professors mainly taught. Their output was classes taught. They did a little research on the side. If they found something, fine: they had enough expertise to publish it, but nothing depended on their rate of publication. Mendel was expert enough to write up his discoveries, but his job in no way required him to do so. Just as Taleb recommends that most of your investments be low-risk, with a small fraction high-risk, this older arrangement is a “job portfolio” in which most of the job is low benefit with high certainty and a small fraction is high benefit with low certainty.

In the debate over climate change (is the case that humans are dangerously warming the planet as strong as we’re told?), it is striking that everyone with any power on the mainstream side of the debate (scientists, journalists, professional activists) has a job involving the subject. Everyone with any power on the other side (Stephen McIntyre, Bishop Hill, etc.) does not. People without jobs are much freer to speak the truth as they see it.

We need personal science (using science to help yourself) to better handle long-tailed distributions, but not just for that reason. Jobs disable people in other ways, too. Personal science matters, I’ve come to believe, for three reasons.

1. Personal scientists can make discoveries that professional scientists cannot, for all the reasons I’ve given. The Shangri-La Diet is one example. Tara Grant’s discovery of the effect of changing the time of day she took vitamin D is another.

2. Personal scientists can develop discoveries that professional scientists cannot. Will there be a clinical trial of the Shangri-La Diet (by a professional weight-control researcher) in my lifetime? Who knows. It is so different from what they now believe. (When I applied to the UC Berkeley Animal Care and Use Committee for permission to do animal tests of SLD, I was turned down. It couldn’t possibly be true, said the committee.) Long before that, the rest of us can try it for ourselves and tell others what happened.

3. By collecting data, personal scientists can help tailor any discovery, even a well-developed one, to their own situation. For example, they can make sure a drug or a diet works. (That’s how my personal science started — testing an acne medicine.) They can test home remedies. By tracking their health with sensitive tests, they can make sure a prescribed drug has no bad side effects. Individualizing treatments takes time, which gets in the way of steady output. You have all the time in the world to gather data that will help you be healthy. Your doctor doesn’t. People who have less contact with you than your doctor, such as drug companies, insurance companies, medical school professors and regulatory agencies, are even less interested in your special case.

22 thoughts on “Web Browsers, Black Swans and Scientific Progress”

  1. Whenever Chrome is slow for me, I open a new tab, press Shift-Esc, and find the process that is running slowly. (Mysteriously, it is usually in control of _several_ tabs, so when I End Process it kills all of them. I thought the point of Chrome was to fully separate these.) Anyway, it means I never crash Chrome, and I never wait minutes — if something doesn’t load in 5 sec or so, I End its Process and load the pages one by one, pretty instantly.

  2. First of all, thank you Seth, for that article. “All of these systems were fashioned by nature, none by people with jobs.” – quote of the day! As an MD, I would rather give people useful advice on what changes to make and what to measure than prescribe stuff.

    Elizabeth, Roger: Please read the introduction to Taleb’s Anti-Fragile. Fragile means losing from disorder (entropy). Robust, durable, strong means NOT losing from disorder. Anti-fragile means gaining from disorder. That’s the main point.

  3. Whoops, just read the last comment. Okay. Still think resilient is a candidate.

    Seth: derp is right. Anti-fragile means gains from disorder. Resilient, hardy, stable mean doesn’t lose from disorder. I suppose Taleb was slightly wrong in the sense that hormetic means something pretty close to “gains from disorder”. Hormetic things (= things that show hormesis) gain from disorder — up to a point. But they are the only anti-fragile things that exist. I read the book before publication. Blame me for not pointing this out at the time.

  4. One of the most fascinating examples of this to me is incubators and who they choose to fund.

    What’s interesting is that incubators can exist only because of the few black swans that account for their returns. Most companies suck, and a few fall in the “succeed, but don’t make us money” category. Only a fraction do well enough to make the incubator money.

    Here’s Paul Graham, who runs one of the most successful incubators in the world, talking about how they specifically DON’T pick an investment strategy that would make them the most money, because they want a steady stream of “pretty good” companies to look good at their job, rather than picking a strategy that would be more successful, but would cause them to look bad.

    https://paulgraham.com/swan.html

    Seth: That’s an excellent example.

  5. I’ve read your complaints about the professionalization of science many times, but it just occurred to me that they make a prediction: Why don’t scientists at liberal arts colleges make all the breakthrough discoveries? They’re like the olde-tyme professors you describe — they have the training of professional scientists, but their advancement is determined by teaching success, with discovery viewed as a nice bonus.

    I can think of a few reasons:

    (1) They actually are making the breakthroughs (unlikely, but possible)
    (2) They actually have similar publication incentives as scientists at research universities (possible, especially at higher-tier colleges)
    (3) Only bad scientists end up at liberal arts colleges (doubtful — faculty hiring is extremely competitive)
    (4) The teaching load at most liberal arts colleges is incompatible with any time-consuming hobby, including science (possible; 3-4 courses per semester is a lot, and they don’t have grad students to do the grunt work)
    (5) They are trained by professional scientists and have imported that worldview — they see a teaching career as hamstringing, not liberating, their ability to do what they think of as science (my guess for Seth’s preferred answer)

    Seth: One thing to keep in mind: liberal-arts colleges don’t cover many subjects. For example, much of my research is about the intersection of nutrition and psychology. Liberal-arts schools don’t have nutrition departments.

  6. I ask from ignorance: do such colleges have the quality and quantity of research lab space, equipment and technical support that one would need? I don’t mean that they must equal the ever-growing empires of the top research universities, but there may be some ill-defined minimum that’s necessary and which these colleges don’t provide.

    On the other hand: in Britain the funding bodies used to chunter about “critical mass” in research, an evident ploy to shut down any small research groups (or individuals) that might flourish. I say “ploy” because I was once at a meeting where a bigshot bureaucrat was explaining this business and when I asked him about the evidence for his critical mass premise he could produce none. That’s “none” as in zero – not even a little bit. I refrained from pointing out that his reasoning was therefore without weight, because the joke would have gone right over his head. Scientists are happy to conspire with such oafs because it’s the oafs who direct the funding for the scientists’ dreams of glory.

    Seth: At Reed College, in psychology, it seemed to me that the professors had enough resources to do most research. But psychology might be the lowest-tech laboratory science.

  7. A related issue is ‘educated incapacity.’

    I also often use the phrase to describe the limitations of the expert—or even of just the “well educated.” The more expert—or at least the more educated—a person is, the less likely that person is to see a solution when it is not within the framework in which he or she was taught to think. When a possibility comes up that is ruled out by the accepted framework, an expert—or well-educated individual—is often less likely to see it than an amateur without the confining framework. For example, one naturally prefers to consult a trained doctor than an untrained person about matters of health. But if a new cure happens to be developed that is at variance with accepted concepts, the medical profession is often the last to accept it. This problem has always existed in all professions, but it tends to be accentuated under modern conditions.

    https://www.hudson.org/index.cfm?fuseaction=publication_details&id=2219

    Seth: I agree, that is another important way that jobs (or job training) are disabling.

  8. Outstanding post, Seth.

    As others have alluded to, Taleb makes a big deal out of the fact that “robust” is not the opposite of “fragile”. That would be like saying that “bland” is the opposite of “delicious”, where in fact “foul-tasting” is really the opposite of “delicious”.

  9. A lot of interesting points.

    News media (especially if on paper) suffer from the job problem on the production side, too. They need to produce a certain amount of “news” per time period, regardless of how much is actually happening.

    I haven’t been able to find a useful review of back-up software. It’s easy to find reviews of how easy it is to get the software to make backups (the job), but I haven’t seen any reviews of how well the software restores (the black swan).

    Have you read Root-Bernstein? (_Discovering_, _Sparks of Genius_) If I remember correctly, he says that great scientists pay a lot of attention to their tools so they can be sure they’re measuring what they think they’re measuring, and great discoveries tend to come from trying to solve a practical problem which is somewhat outside their specialty. He probably has more that I don’t remember.

    Seth: News media is a very good example. They are so clearly disabled in other ways by their job, not just by the need for steady output. There are many things they can’t say no matter what the truth is. I have read Discovering by Root-Bernstein but not Sparks of Genius. Discovering is one of the books I was thinking of when I said the advice about how to make discoveries was poor. Discovering has about 10 or 20 pieces of advice that don’t fit together well, none of which struck me as useful or explained anything I had noticed myself. Sparks of Genius might be better; I’ll look for it.

  10. Excellent post. Have you written anywhere a complete article summarizing your arguments for how the institutionalization of science harms its progress?

    Have you read Bruce Charlton’s “The Story of Real Science”? He gives reasons, some similar to yours, some different, for why he thinks science is dying in western civilization.
    https://thestoryofscience.blogspot.com/

    Seth: A complete article, no. That’s a good suggestion. I didn’t know about Bruce Charlton’s “Story of Real Science”, thanks for pointing it out.

  11. Firefox crashing daily? Either your hardware is defective or you’re doing something way too extreme (50+ tabs open at the same time, 20+ youtube videos playing at the same time or something like that).

    As far as I can see, my home machine has had 19 crashes since July 2010; the last one was on 1 Dec 2012. Previously most of the crashes were caused by buggy plug-ins, but since plug-ins were moved into a separate process they no longer crash the browser.

    Seth: I doubt it’s my hardware since crashes became much rarer when I switched to Chrome. I never have 50 tabs open; I often get to 10. I rarely play more than 1 youtube video at a time. I think there must be a third explanation. Maybe the explanation is that lots of websites have buggy software, but I’m not sure. Maybe you have more memory than me. Maybe it has something to do with being in China. In any case, localization was a feature of Chrome — a difficult feature to add — from the beginning for this very reason: because crashes happen pretty often.

    How do you explain the increasing popularity of Chrome at the expense of Firefox? Crash rate is the only big difference between them that I’ve noticed.

  12. Seth, any thoughts about how more people could have the free time to do independent work?

    Seth: I think this is already happening. As people become more productive, they need to work less to produce enough money. Less work, more free time.

  13. Seth, I’d still run a memtest. It does not take much time (overnight) and it can eliminate or confirm the diagnosis. Firefox and Chrome do have a different memory footprint and memory allocation, so it is very possible that one hits a defective memory cell more often than another, causing more crashes.

    Seth: I did a memory test. No problems were found.

  14. I agree with the previous posters: this crash rate in either browser is definitely not normal. I would recommend running a stress test on your hardware (OCCT – https://www.ocbase.com – is free, popular, and offers several mixes of CPU/memory tests). Also, the AdBlock Plus plugin could help a lot, since most buggy code is some kind of intrusive advertising.

    Seth: I’m curious: how do you know what crash rate is normal? If crash rate is usually very low — as you seem to imply — why did the designers of Chrome include localization from the beginning?

  15. Taking advantage of multiple processor cores, security, and of course minimizing impact of any crashes – this does not imply that they are expected to be frequent, even rare crashes could be annoying if they take down the whole application. Normal crash rate = what I observe across multiple computers and users. I personally use Chrome all day, every day on two different computers, visit a lot of “suspicious” sites and experience several crashes a year at most. Maybe you just have some atypical browsing habits, maybe there is a problem in some of your plugins (Flash), but frequent crashes could indicate a hardware problem – I had the same problem when I built a new computer; one of my memory sticks was faulty (as diagnosed by Memtest86+), the problem went away when I replaced it.

  16. Chrome is actually going down in market share by some counts. Firefox 18 and 19 have been noticeably more crashy than 17 IME, but there’s also a long-term supported release of 17 if you prefer stability. They did have to be berated to produce it. Chrome also wants a steady rate of updates, so I’m not sure why you hold that against Firefox.

    Mozilla did investigate using separate processes to display the browser UI, web content, and plugins, but it was put on hold because it was a giant undertaking and they decided to deliver some smaller gains first instead, including lower memory usage. (How much memory does your Chrome use?) They are starting work on it again now, beginning with the mobile version.

    Your use of localization conflicts with its standard use in computing, which is adapting an interface to another language, primarily by string translation.
