IRBs

Scott Alexander reviews a book about institutional review boards (IRBs), the panels that review the ethics of medical trials: From Oversight to Overkill, by Dr. Simon Whitney. From the title alone, you can see where this is going.

IRBs are supposed to (among other things) make sure patients are fully informed of the risks of a trial, so that they can give informed consent. They were created in the wake of some true ethical disasters, such as trials that injected patients with cancer cells (“to see what would happen”) or gave hepatitis to mentally defective children.

Around 1974, IRBs were instituted, and according to Whitney, for almost 25 years they worked well. The boards might be overprotective or annoying, but for the most part they were thoughtful and reasonable.

Then in 1998, during an asthma study at Johns Hopkins, a patient died. Congress put pressure on the head of the Office for Protection from Research Risks, who overreacted and shut down every study at Johns Hopkins, along with studies at “a dozen or so other leading research centers, often for trivial infractions.” Some thousands of studies were ruined, costing millions of dollars:

The surviving institutions were traumatized. They resolved to never again do anything even slightly wrong, not commit any offense that even the most hostile bureaucrat could find reason to fault them for. They didn’t trust IRB members - the eminent doctors and clergymen doing this as a part time job - to follow all of the regulations, sub-regulations, implications of regulations, and pieces of case law that suddenly seemed relevant. So they hired a new staff of administrators to wield the real power. These administrators had never done research themselves, had no particular interest in research, and their entire career track had been created ex nihilo to make sure nobody got sued.

Today IRB oversight has become, well, overkill. For one study testing the transfer of skin bacteria, the IRB thought that the consent form should warn patients of risks from AIDS (which you can’t get by skin contact) and smallpox (which has been eradicated). For a study on heart attacks, the IRB wanted patients—who are in the middle of a heart attack—to read and consent to a four-page form of “incomprehensible medicalese” listing all possible risks, even the most trivial. Scott’s review gives more examples, including his own personal experience.

In many cases, it’s not even as if a new treatment was being introduced: sometimes an existing practice (giving aspirin for a heart attack, giving questionnaires to psychology patients) was being evaluated for effectiveness. There was no requirement that patients consent to “risks” when treatment was given arbitrarily; but if outcomes were being systematically observed and recorded, the IRBs could intervene.

Scott summarizes the pros and cons of IRBs, including the cost of delayed treatments or procedure improvements:

So the cost-benefit calculation looks like – save a tiny handful of people per year, while killing 10,000 to 100,000 more, for a price tag of $1.6 billion. If this were a medication, I would not prescribe it.

FDA

The IRB story illustrates a common pattern:

  • A very bad thing is happening.
  • A review and approval process is created to prevent these bad things. This is OK at first, and fewer bad things happen.
  • Then, another very bad thing happens, despite the approval process.
  • Everyone decides that the review was not strict enough. They make the review process stricter.
  • Repeat this enough times (maybe only once, in the case of IRBs!) and you get regulatory overreach.
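The pattern above can be sketched as a toy simulation. Everything in it is an illustrative assumption rather than anything measured: the function name `simulate_ratchet`, the `base_disaster_prob` and `tighten_factor` parameters, and the rule that the chance of a disaster falls in proportion to strictness are all hypothetical. The only feature taken from the text is that strictness goes up after a bad event and never comes back down.

```python
import random


def simulate_ratchet(years, base_disaster_prob=0.2, tighten_factor=2.0, seed=0):
    """Toy model of a one-way regulatory ratchet (all parameters hypothetical).

    Each year a disaster occurs with probability base_disaster_prob / strictness.
    After every disaster the review process is made stricter; nothing in the
    model ever loosens it.
    """
    rng = random.Random(seed)
    strictness = 1.0
    history = []
    for _ in range(years):
        disaster = rng.random() < base_disaster_prob / strictness
        if disaster:
            # "Everyone decides that the review was not strict enough."
            strictness *= tighten_factor
        history.append(strictness)
    return history


history = simulate_ratchet(years=50)
# Strictness is non-decreasing by construction: the ratchet only turns one way.
print(history[0], history[-1])
```

Note that in this sketch disasters become rarer as strictness rises, so the process can look like a success even while the cost of the stricter review, which the model does not track at all, keeps growing.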

The history of the FDA provides another example.

At the beginning of the 20th century, the drug industry was rife with shams and fraud. Drug ads made ridiculously exaggerated or completely fabricated claims: some claimed to cure consumption (that is, tuberculosis); another claimed to cure “dropsy and all diseases of the kidneys, bladder, and urinary organs”; another literally claimed to cure “every known ailment”. Many of these “drugs” contained no active ingredients, and turned out to be, for example, just cod-liver oil, or a weak solution of acid. Others contained alcohol, some in concentrations at the level of hard liquor, making patients drunk. Still others contained dangerous substances such as chloroform, opiates, or cocaine. Some of these drugs were marketed for use on children.

(Image credit: National Library of Medicine)

In 1906, in response to these and other problems, Congress passed the Pure Food & Drug Act, giving regulatory powers to what was then the USDA Bureau of Chemistry, which would later become the FDA.

This did not look much like the modern FDA. It had no power to review new drugs or to approve them before they went on the market. It was more of a police agency, with the power to enforce the law after it had been violated. And the relevant law was mostly concerned with truth in advertising and labeling.

Then in 1937, the pharmaceutical company Massengill put a drug on the market called Elixir Sulfanilamide, one of the first antibiotics. The antibiotic itself was good, but in order to produce the drug in liquid form (as opposed to a tablet or powder), the “elixir” was prepared in a solution of diethylene glycol—which is a variant of antifreeze, and is toxic. Patients started dying. Massengill had not tested the preparation for toxicity before selling it, and when reports of deaths started to come in, they issued a vague recall without explaining the danger. When the FDA heard about the disaster, they forced Massengill to issue a clear warning, and then sent hundreds of field agents to talk to every pharmacy, doctor, and patient and track down every last vial of the poisonous drug, ultimately retrieving about 95% of what had been manufactured. Over 100 people died; if all of the manufactured drug had been consumed, it might have been over 4,000.

In the wake of this disaster, Congress passed the 1938 Food, Drug, and Cosmetic Act. This transformed the FDA from a police agency into a regulatory agency, giving them the power to review and approve all new drugs before they were sold. But the review process only required that drugs be shown safe; efficacy was not part of the review. Further, the law gave the FDA 60 days to reply to any drug application; if they failed to meet this deadline, then the drug was automatically approved.

I don’t know exactly how strict the FDA was after 1938, but the next fifteen years or so were the golden age of antibiotics, and during that period the mortality rate in the US decreased faster than at any other time in the 20th century. So if there was any overreach, it seems like it couldn’t have been too bad.

The modern FDA is the product of a different disaster. Thalidomide was a tranquilizer marketed to alleviate anxiety, trouble sleeping, and morning sickness. During toxicity testing, it seemed to be almost impossible to die from an overdose of thalidomide, which made it seem much safer than barbiturates, which were the main alternative at the time. But it was also promoted as being safe for pregnant mothers and their developing babies, even though no testing had been done to prove this. It turned out that when taken in the first several weeks of pregnancy, thalidomide caused horrible birth defects that resulted in deformed limbs and other organs, and often death. The drug was sold in Europe, where some 10,000 infants fell victim to it, but not in the US, where it was blocked by the FDA. Still, Americans felt they had had a close call, too close for comfort, and conditions were ripe for an overhaul of the law.

The 1962 Kefauver–Harris Amendment required, among other reforms, that new drugs be shown to be both safe and effective. It also lengthened the review period from 60 to 180 days, and if the FDA failed to respond in that time, drugs would no longer be automatically approved (in fact, it’s unclear to me what the review period even means anymore).

You might be wondering: why did a safety problem create an efficacy requirement in the law? The answer is a peek into how the sausage gets made. Senator Kefauver had been investigating drug pricing as early as 1959, and in the course of hearings, a former pharma exec remarked that some drugs on the market are not only overpriced, they don’t even work. This caught Kefauver’s attention, and in 1961 he introduced a bill that proposed enhanced controls over drug trials in order to ensure effectiveness. But the bill faced opposition, even from his own party and from the White House. When Kefauver heard about the thalidomide story in 1962, he gave it to the Washington Post, which ran it on the front page. By October, he was able to get his bill passed. So the law that was passed wasn’t even initially intended to address the crisis that got it passed.

I don’t know much about what happened in the ~60 years since Kefauver–Harris. But today, I think there is good evidence, both quantitative and anecdotal, that the FDA has become too strict and conservative in its approvals, adding needless delay that holds back treatments from patients. Scott Alexander tells the story of Omegaven, a nutritional fluid given to patients with digestive problems (often infants) that helped prevent liver disease: Omegaven took fourteen years to clear FDA’s hurdles, despite dramatic evidence of efficacy early on, and in that time “hundreds to thousands of babies … died preventable deaths.” Alex Tabarrok quotes a former FDA regulator saying:

In the early 1980s, when I headed the team at the FDA that was reviewing the NDA for recombinant human insulin, … we were ready to recommend approval a mere four months after the application was submitted (at a time when the average time for NDA review was more than two and a half years). With quintessential bureaucratic reasoning, my supervisor refused to sign off on the approval—even though he agreed that the data provided compelling evidence of the drug’s safety and effectiveness. “If anything goes wrong,” he argued, “think how bad it will look that we approved the drug so quickly.”

Tabarrok also reports on a study that models the optimal tradeoff between approving bad drugs and failing to approve good drugs, and finds that “the FDA is far too conservative especially for severe diseases. FDA regulations may appear to be creating safe and effective drugs but they are also creating a deadly caution.” And Jack Scannell et al., in a well-known paper that coined the term “Eroom’s Law”, cite over-cautious regulation as one factor (out of four) contributing to ever-increasing R&D costs of drugs:

Progressive lowering of the risk tolerance of drug regulatory agencies obviously raises the bar for the introduction of new drugs, and could substantially increase the associated costs of R&D. Each real or perceived sin by the industry, or genuine drug misfortune, leads to a tightening of the regulatory ratchet, and the ratchet is rarely loosened, even if it seems as though this could be achieved without causing significant risk to drug safety. For example, the Ames test for mutagenicity may be a vestigial regulatory requirement; it probably adds little to drug safety but kills some drug candidates.

FDA delay was particularly costly during the covid pandemic. To quote Tabarrok again:

The FDA prevented private firms from offering SARS-Cov2 tests in the crucial early weeks of the pandemic, delayed the approval of vaccines, took weeks to arrange meetings to approve vaccines even as thousands died daily, failed to approve the AstraZeneca vaccine, failed to quickly approve rapid antigen tests, and failed to perform inspections necessary to keep pharmaceutical supply lines open.

In short, an agency that began in order to fight outright fraud in a corrupt pharmaceutical industry, and once sent field agents on a heroic investigation to track down dangerous poisons, now displays an overly conservative, bureaucratic mindset that delays lifesaving tests and treatments.

NEPA

One element in common to all stories of regulatory overreach is the ratchet: once regulations are put in place, they are very hard to undo, even if they turn out to be mistakes, because undoing them looks like not caring about safety. Sometimes regulations ratchet up after disasters, as in the case of IRBs and the FDA. But they can also ratchet up through litigation. This was the case with NEPA, the National Environmental Policy Act.

Eli Dourado has a good history of NEPA. The key paragraph of the law requires that all federal agencies, in any “major action” that will significantly affect “the human environment,” must produce a “detailed statement” on those effects, now known as an Environmental Impact Statement (EIS). In the early days, those statements were “less than ten typewritten pages,” but since then, “EISs have ballooned.”

In brief, NEPA allowed anyone who wanted to obstruct a federal action to sue the agency for creating an insufficiently detailed EIS. Each time an agency lost a case, it set a new precedent and raised the standard that all future EISs had to meet. Eli recounts how the word “major” was read out of the law, such that even minor actions required an EIS; the word “human” was read out of the law, interpreting it to apply to the entire environment; etc.

Eli summarizes:

… the incentive is for agencies and those seeking agency approval to go overboard in preparing the environmental document. Of the 136 EISs finalized in 2020, the mean preparation time was 1,763 days, over 4.8 years. For EISs finalized between 2013 and 2017, page count averaged 586 pages, and appendices for final EISs averaged 1,037 pages. There is nothing in the statute that requires an EIS to be this long and time-consuming, and no indication that Congress intended them to be.

Alec Stapp documents how NEPA has now become a barrier to affordable housing, transmission lines, semiconductor manufacturing, congestion pricing, and even offshore wind.

The EIS for NY state congestion pricing ran 4,007 pages and took 3 years to produce. @AidenRMackenzie

NRC

The problem with regulatory agencies is not that the people working there are evil—they are not. The problem is the incentive structure:

  • Regulators are blamed for anything that goes wrong.
  • They are not blamed for slowing down or preventing growth and progress.
  • They are not credited when they approve things that lead to growth and progress.

All of the incentives point in a single direction: towards more stringent regulations. No one regulates the regulators. This is the reason for the ratchet.
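A minimal sketch of why these incentives produce a ratchet, under loudly stated assumptions: the functions `harm` and `forgone_benefit`, the candidate stringency levels, and every number below are hypothetical and exist only to illustrate the argument. A regulator who is blamed for harms, but neither blamed for delay nor credited for benefits, picks the strictest option available, while a chooser who weighs both sides does not.

```python
def harm(stringency):
    """Hypothetical expected harm from approved projects; falls as review tightens."""
    return 100.0 / (1.0 + stringency)


def forgone_benefit(stringency):
    """Hypothetical value lost to delay and blocked projects; rises with stringency."""
    return 10.0 * stringency


def regulator_cost(stringency):
    # One-sided incentive: only visible harms count against the regulator.
    return harm(stringency)


def social_cost(stringency):
    # Two-sided accounting: harms plus the benefits that never materialized.
    return harm(stringency) + forgone_benefit(stringency)


levels = list(range(11))  # hypothetical stringency settings, 0 (lax) to 10 (strict)
regulator_choice = min(levels, key=regulator_cost)
social_choice = min(levels, key=social_cost)
print(regulator_choice, social_choice)
```

With these made-up curves the regulator's preferred setting is the maximum allowed, and it would stay at the maximum however far the range were extended, because nothing in `regulator_cost` ever pushes back.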

I think the Nuclear Regulatory Commission (NRC) furnishes a clear case of this. In the 1960s, nuclear power was on a growth trajectory to provide roughly 100% of today’s world electricity usage. Instead, it plateaued at about 10%. The proximal cause is that nuclear power plant construction became slow and expensive, which made nuclear energy expensive, which mostly priced it out of the market. The cause of those cost increases is controversial, but in my opinion, and that of many other commenters, it was primarily driven by a turbulent and rapidly escalating regulatory environment around the late ’60s and early ’70s.

At a certain point, the NRC formally adopted a policy that reflects the one-sided incentives: ALARA, under which exposure to radiation needs to be kept, not below some defined threshold of safety, but “As Low As Reasonably Achievable.” As I wrote in my review of Why Nuclear Power Has Been a Flop:

What defines “reasonable”? It is an ever-tightening standard. As long as the costs of nuclear plant construction and operation are in the ballpark of other modes of power, then they are reasonable.

This might seem like a sensible approach, until you realize that it eliminates, by definition, any chance for nuclear power to be cheaper than its competition. Nuclear can’t even innovate its way out of this predicament: under ALARA, any technology, any operational improvement, anything that reduces costs, simply gives the regulator more room and more excuse to push for more stringent safety requirements, until the cost once again rises to make nuclear just a bit more expensive than everything else. Actually, it’s worse than that: it essentially says that if nuclear becomes cheap, then the regulators have not done their job.

ALARA isn’t the singular root cause of nuclear’s problems (as Brian Potter points out, other countries and even the US Navy have formally adopted ALARA, and some of them manage to interpret “reasonable” more, well, reasonably). But it perfectly illustrates the problem. The one-sided incentives mean that regulators do not have to make any serious cost-benefit tradeoffs. IRBs and the FDA don’t pay a price for the lives lost while trials or treatments are waiting on approval. The EPA (which now reviews environmental impact statements) doesn’t pay a price for delaying critical infrastructure. And the NRC doesn’t pay a price for preventing the development of abundant, cheap, reliable, clean energy.

Google

All of these examples are government regulations, but a similar process happens inside most corporations as they grow. Small startups, hungry and having nothing to lose, move rapidly with little formal process. As they grow, they tend to add process, typically including one or more layers of review before products are launched or other decisions are made. It’s almost as if there is some law of organizational thermodynamics decreeing that bureaucratic complexity can only ever increase.

Praveen Seshadri was the co-founder of a startup that was acquired by Google. When he left three years later, he wrote an essay on “how a once-great company has slowly ceased to function”:

Google has 175,000+ capable and well-compensated employees who get very little done quarter over quarter, year over year. Like mice, they are trapped in a maze of approvals, launch processes, legal reviews, performance reviews, exec reviews, documents, meetings, bug reports, triage, OKRs, H1 plans followed by H2 plans, all-hands summits, and inevitable reorgs. The mice are regularly fed their “cheese” (promotions, bonuses, fancy food, fancier perks) and despite many wanting to experience personal satisfaction and impact from their work, the system trains them to quell these inappropriate desires and learn what it actually means to be “Googley” — just don’t rock the boat.

What Google has in common with a regulatory agency is that (according to Seshadri at least) its employees are driven by risk aversion:

While two of Google’s core values are “respect the user” and “respect the opportunity”, in practice the systems and processes are intentionally designed to “respect risk”. Risk mitigation trumps everything else. This makes sense if everything is going wonderfully and the most important thing is to avoid rocking the boat and keep sailing on the rising tide of ads revenue. In such a world, potential risk lies everywhere you look.

A “minor change to a minor product” requires “literally 15+ approvals in a ‘launch’ process that mirrors the complexity of a NASA space launch,” any non-obvious decision is avoided because it “isn’t group think and conventional wisdom,” and everyone tries to placate everyone else up and down the management chain to avoid conflict.

A startup that operated this way would simply go out of business; Google can get away with this bureaucratic bloat because their core ads business is a cash cow that they can continue to milk, at least for now. But in general, this kind of corporate sclerosis leaves a company vulnerable to changes in technology and markets (as indeed Google seems to be falling behind startup competitors in AI).

The difference with regulation is that there is no requirement for agencies to serve customers in order to stay in existence, and no competition to disrupt their complacency, except at the international level. If you want to build a nuclear plant, you obey the NRC or you build outside the US.

Against the review-and-approval model

In the wake of disaster, or even in the face of risk, a common reaction is to add a review-and-approval process. But based on examples such as these, I now believe that the review-and-approval model is broken, and we should find better ways to manage risk and create safety.

Unfortunately, review-and-approval is so natural, and has become so common, that people often assume it is the only way to control or safeguard anything, as if the alternative is anarchy or chaos. But there are other approaches.

One example I have discussed is factory safety in the early 20th century, which was driven by a change to liability law. The new law made it easier for workers and their families to receive compensation for injury or death, and harder for companies to avoid that liability. This gave factories the legal and financial incentive to invest in safety engineering and to address the root causes of accidents in the work environment, which ultimately reduced injury rates by around 90%.

Jack Devanney has also discussed liability as part of a better scheme for nuclear power regulation. I have commented on liability in the context of AI risk, and Robin Hanson wrote an essay with a proposal (see however Tyler Cowen’s pushback on the idea). And Alex Tabarrok mentioned to me that liability appears to have driven remarkable improvements in anesthesiology.

I’m not suggesting that liability law is the solution to everything. I just want to point out that other models exist, and sometimes they have even worked.

Open questions

Some things I’d like to learn more about:

  • What areas of regulation have not fallen into these traps, or at least not as badly? For instance, building codes and restaurant health inspections seem to have helped create safety without killing their respective industries. Driver’s licenses seem to enforce minimal competence without preventing anyone who wants to from driving or imposing undue burden on them. Are there positive lessons we can learn from some of these boring examples of safety regulation that don’t get discussed as much?
  • What other alternative models to review-and-approval exist, and what do we know about them, either empirically or theoretically?
  • How does the Consumer Product Safety Commission work? From what I have gathered so far, they develop voluntary standards with industry, enforce some mandatory standards, ban a few extremely dangerous products, and manage recalls. They don’t review products before they are sold, but they do in at least some cases require testing. However, any lab can do the testing, which I imagine creates competition that keeps costs reasonable. (Labs testing children’s products have to be accredited by CPSC, but other labs don’t even need that.)
  • Why is there so much bloat in the contract research organizations (CROs) that run clinical trials for pharma? Shouldn’t there be competition in that industry too?
  • What lessons can we learn from other countries? All my research so far is about the US, and I want to get the proper scope.

Thanks to Tyler Cowen, Alex Tabarrok, Eli Dourado, and Heike Larson for commenting on a draft of this essay.

3 comments

Nice post! Seems like a really important but unsexy thing to understand / improve regulation. It's a shame to be limited not by technical problems but by misaligned orgs. It's like stunted growth due to malnourishment

typo: products or launched or other --> launches?

Was supposed to be “before products are launched”. Fixed, thanks