Stoplights, Experimental Design, Evidence-Based Medicine, and the Downside of Correctness

The Freakonomics blog posted a letter from reader Jeffrey Mindich about an interesting traffic experiment in Taiwan. Timers were installed alongside red and green traffic lights:

At 187 intersections which had the timers installed, those that counted down the remaining time on green lights saw a doubling in the number of reported accidents . . . while those that counted down until a red light turned green saw a halving in . . . the number of reported accidents.

Great research! Unexpected results. Simple, easy-to-understand design. Large effects — to change something we care about (such as traffic accidents) by a factor of two in a new way is a great accomplishment. This reveals something important — I don’t know what — about what causes accidents. I expect it can be used to reduce accidents in other situations.

It’s another example (in addition to obstetrics) of what I was talking about in my twisted skepticism post: the downside of “correctness”. There’s no control group, no randomization (apparently), yet the results are very convincing (that adding the timers caused the changes in accidents). The evidence-based medicine movement says treatment decisions should be guided by results from controlled randomized trials, nothing less. This evidence would fail their test. Following their rules, you would say: “This is low-quality evidence. Controlled experiment needed.” The Taiwan evidence is obviously very useful (it could lead to a vast worldwide decrease in traffic accidents), so there must be something wrong with their rules, which would delay or prevent taking this evidence as seriously as it deserves.

14 thoughts on “Stoplights, Experimental Design, Evidence-Based Medicine, and the Downside of Correctness”

  1. I had not heard that someone (anyone?) was claiming that “nothing” should be put into practice without RCTs. The traffic light example seems like a straw man. We routinely *stop* RCTs when a large effect is seen. We routinely make bad medical decisions when small effects that are unvetted by RCTs are the goal.

    Use of EPO to treat chemo-induced anemia was approved, in part, because no large enough negative effect on cancer outcome was noticed. Only RCTs could show that there is indeed good reason to suspect that EPO worsens the outcome for some cancer patients that receive it.

    Likewise, the effect of HRT on heart disease and cancer risk seemed small but compelling (due to the N of the uncontrolled studies done). Only an RCT was able to show that the heart benefit was overestimated and the cancer risk underestimated. That RCT, of course, did not answer all questions about HRT, but it did answer the “first, do no harm” question.

    It’s hard to forget that one of the first applications of statistics to medical treatments was to inspect the efficacy of “bleeding” patients (i.e., cutting, or leeches). When the results showed that “bleeding” was not at all correlated with outcome, what did the study author do? He concluded that the study must be wrong, and that we must bleed patients earlier and more aggressively. Such is the power of subjective experience and expectation over objective, statistical correctness.

    Like Feynman said, the first order of business is not to fool ourselves. When medical effects or benefits are small or subjective, it’s hard to point to any tool more successful at preventing self-fooling than the RCT. Large, non-subjective effects are obviously less likely to be proved false by an RCT. However, it appears that at this point in history, a large percentage of new medical practice is focussed more in the area of the smaller, more subjective effect — hence there is no shortage of places where RCTs continue to be required to avoid fooling ourselves and hurting patients. If you want to see what medical practice looks like without RCTs, look no further than the snake pit that is back surgery to relieve pain today.

  2. I’d also argue that the study in Taiwan very well could have elements of a control group and randomization. There are many more than 187 intersections in Taiwan and one could easily consider deciding which intersections to place the countdown timers on randomly and comparing them to a control group of similar intersections.

    Great research in that it was a roll-out to test something (timers) that was believed to be helpful, and the analysis turned up unexpected results, but I think the RCT argument in this case is a bit of a stretch.

  3. No control group? Where did the baseline from the “doubling in the number” come from? Presumably the control is the long period of time for which data exists and there was no countdown. This assumes that the base rate hasn’t changed because drivers are confused as to which type of countdown might be at any particular traffic light.

    I think this is a neat result, and it would be nice to see the total cost of the study. Especially given the uncompelling data cited for red light cameras, it would be a breath of fresh air to hear motions for wider tests, if the accident rate changes are pretty uniform.

    I really don’t know where you’re coming from here. To the extent you and I find the study “convincing”, I’m sure it can be traced to intuition leached from proper statistical theory (fairly large N, control data, and no stated bias in randomization). To the extent that it is not iron-clad, I’m sure there are traditional objections.

  4. NE1 writes “No control group? Where did the baseline from the “doubling in the number” come from? Presumably the control is the long period of time for which data exists and there was no countdown.”

    I agree, and furthermore, though I’ve never been to Taiwan, I bet they have a hell of a lot more than 187 intersections with traffic lights, so the remaining intersections that didn’t receive timers also serve as controls (assuming there is a record of the accident rate at those intersections as well).

  5. NE1 & Aaron, yes, the baseline came from earlier measurements of the same intersections.

    “Like Feynman said, the first order of business is not to fool ourselves.” I think this gets to the heart of the matter. I would prefer that medical research concentrate on maximizing the benefits of treatment rather than on minimizing the extent to which doctors fool themselves.
    I agree that RCTs are often helpful. It’s when non-RCT evidence is ignored or dismissed or not taken seriously that the trouble begins.

    “I had not heard that someone (anyone?) was claiming that “nothing” should be put into practice without RCTs.” A high-level panel making nutrition recommendations ignored non-RCT evidence. For details, see
    https://sethroberts.org/2007/08/08/something-is-better-than-nothing-part-2/

  6. A broader political point is the question whether the NHS/medicare/whatever it’s called in one’s country should be expected to pay only for those treatments which have been shown to work by the most rigorous scientific standards. I think it’s not hard to make the argument that the answer is “yes”.

    Of course, this is distinct from the question whether scientists should disregard evidence that doesn’t live up to those standards.

  7. Taking back science – “Ideas are tested by experiment. That is the core of science. By teaching people to hold their beliefs up to experiment, Mythbusters is doing more to drag humanity out of the unscientific darkness than a thousand lessons in rigor.” – Zombie Feynman

  8. I’m not saying don’t test ideas. I’m talking about the nature of the test. Different purposes suggest different types of test. If your goal is maximizing patient benefit, you do Test X; if it is “not fooling yourself” you do Test Y.

  9. If traffic in Taiwan is anything like that of mainland China, then these results don’t surprise me at all. People drive a lot differently over there with a great deal less observance of “traffic laws” and consideration of safety.

    I would guess that it’s common to treat a red light more like a stop sign when you don’t know how long it’ll last. With the timer, knowing that it’ll turn green in X seconds, you may be more likely to wait. Similarly, I can imagine that drivers would speed up to get through the intersection on seeing that only 2 seconds of green are left: higher speed -> more accidents.

  10. Ummm…. is there something wrong with not wanting to fool yourself? I mean, the whole point of experimental design is to guard against fooling yourself into believing that your treatment is responsible for the supposed patient benefit observed.

  11. I am actually ok with being labelled conservative if that means I am hesitant to update my beliefs based on a poorly designed experiment or an observational study. I do not deny that such studies have the potential to find interesting, and real, results, but the quality of the data, study design, and analysis, should all affect how and to what extent I update my beliefs.

    With regard to the traffic study, I am particularly concerned with your statement, “There’s no control group, no randomization (apparently), yet the results are very convincing (that adding the timers caused the changes in accidents).” Aside from the fact that you made a causal inference about the result seen, I don’t know how you can seemingly care so little about the design of the experiment…. you seem to not know anything about the design, and the scary part is that you don’t seem to care!

    I know of a great way to “reduce” the number of traffic accidents. Place the timers at intersections which had the highest accident rates the previous year. You are almost guaranteed to observe lower accident rates the following year. This is a simple example of regression to the mean.

  12. “You don’t seem to care”. Please see my posts on experimental design and scientific method — e.g., interviews with Brian Wansink and Saul Sternberg.

    The broad point I am making is that following or advocating rules without understanding them, including their weaknesses, causes trouble. Randomization is good, but other schemes may be good enough.
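
The regression-to-the-mean objection raised in comment 11 can be illustrated with a small simulation. This is only a sketch under invented assumptions, not a model of the Taiwan data: the number of sites, the range of underlying accident rates, and the noise level are all hypothetical, and the function name `simulate_regression_to_mean` is made up for illustration. No timers or any other treatment are applied; the "worst" intersections are simply selected on one noisy year and observed again the next.

```python
import random


def simulate_regression_to_mean(n_sites=2000, n_selected=187, seed=0):
    """Select the worst sites in year 1 and return their mean count in both years.

    Nothing changes between the two years, so any drop is pure selection effect.
    All numbers here are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)

    # Each site has a fixed underlying yearly accident rate (assumed range).
    true_rates = [rng.uniform(2.0, 10.0) for _ in range(n_sites)]

    def observe(rate):
        # Crude noisy yearly count: true rate plus Gaussian noise, floored at zero.
        return max(0.0, rng.gauss(rate, 3.0))

    year1 = [observe(r) for r in true_rates]
    year2 = [observe(r) for r in true_rates]  # same rates, no intervention

    # "Install timers" at the sites with the most accidents in year 1.
    worst = sorted(range(n_sites), key=lambda i: year1[i], reverse=True)[:n_selected]

    mean_year1 = sum(year1[i] for i in worst) / n_selected
    mean_year2 = sum(year2[i] for i in worst) / n_selected
    return mean_year1, mean_year2


if __name__ == "__main__":
    before, after = simulate_regression_to_mean()
    print(f"selected sites, year 1 mean: {before:.2f}")
    print(f"selected sites, year 2 mean: {after:.2f}")
```

In this sketch the selected sites show fewer accidents in the second year even though nothing was done to them. That is why a before/after comparison is more persuasive when the intersections were not chosen for their recent accident counts, or when untreated intersections are tracked alongside them. Note that regression to the mean alone would not explain accidents doubling at one type of timer and halving at the other.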
