September 2023 – EconForEverybody

The Paperclip on Dev Anand and Guru Dutt

Today is the 100th birth anniversary of #DevAnand and we take a look back at how a launderer’s mistake helped shape a friendship between two of Hindi cinema’s most unique and innovative minds – a thread (1/21) pic.twitter.com/PVsWXvpRzC
— The Paperclip (@Paperclip_In) September 26, 2023

On Learning Languages

The Economist has a nice little write-up on learning languages, and which ones take the longest to learn:

The difficulty in learning a foreign language lies not only in its inherent complexity. Languages are complex in different ways (though all are learnable by infants). The main reason a language is hard is that it is different from your own.
https://www.economist.com/graphic-detail/2023/09/18/which-languages-take-the-longest-to-learn

What was most fascinating to me was this chart (why should this chart be titled “My Aunt’s Pen” in French? The Economist has some truly enjoyable puns, but this one is a mystery to me):

https://www.economist.com/graphic-detail/2023/09/18/which-languages-take-the-longest-to-learn

I speak three languages fluently: Hindi, Marathi and English. I can now read French, and can write it reasonably well. But my attempts at speaking French have so far been consistently disastrous. I have spent some time on Duolingo learning German and Italian as well, and this bit resonated:

“If you want to learn a language just for fun, start with Swedish. If you want to rack up an impressive number, stay in Europe.”

Quite a few words were common in these three languages, and as the Economist article points out, there are some pleasant surprises that await given the existence of the proto-Indo-European language. My favorite example is the word “himbeer” which means raspberry in German. Literally the berry that is available in winter/snow (“him”). Once you make the connection with the Himalayas, many other delights await.

And that takes us to the excellent Duolingo blog, which comes up with some very informative posts. One of these in the recent past, for example, dealt with the question of which words (if any) are the same across many (if not all) languages.

Before you read on, spend some time in trying to figure out if there are any such words.

So there’s pineapple, or ananas. The same blogpost also tells us that at one point in time, the word apple was a generic word for any fruit. Which, I suppose, makes the fact that potatoes in France are called the “apple of the land” slightly less puzzling. Oranges almost make the list, as do “taxi” and “tomato”.

But the two words that were absolute champions?

Coffee, and chocolate. There’s a fun follow-up post based on community feedback, if you’re interested. But a fun way to learn more about the world we live in is to learn as many languages as possible. It expands your vocabulary, allows you to make connections across languages (and therefore cultures), and provides the kind of exercise for your brain that probably won’t come from any other endeavor.

Bonus: an immensely enjoyable blog post about languages, colors, and specifically the color blue.

Showernomics

One of my favorite blog posts about behavioral economics was from the year 2017. Maya Bar-Hillel and Cass Sunstein co-wrote it, based on their experiences of having traveled to Stockholm in that year. They were there to celebrate the fact that their colleague, Richard Thaler, had won the Nobel Prize in Economics.

And so naturally, they wrote a blog post about light switches, bathtubs, guardrails and showers. Of course.

The post was about design choices and nudges, and makes for fascinating reading.

The Nobel Award ceremonies in Stockholm in December are a grand affair. Lodging at the Grand Hotel is part and parcel of the grandeur. We enjoyed this privilege in December of 2017, when Richard H. Thaler won the Nobel Prize for his “Contributions to Behavioral Economics”. But this was not an unqualifiedly happy hotel experience. Through a critique of the hotel’s bathroom design, we address a pervasive and even fundamental challenge in everyday life: navigability.
One of Thaler’s best-known and most influential contributions was developed with one of the current authors, and presented in their book Nudge (2008). That book elaborates two central ideas. The first involves nudges: small interventions that gently steer choosers towards, or away from, this option or that without imposing mandates or economic incentives, and without limiting the choice set. The second involves choice architecture, understood as the particulars of the setting within which choices are made, or the framing of the choices themselves. Consider the arrangement of food options in a cafeteria or the listing of items on a menu. Nudges often operate via changes in choice architecture. Automatic (but not binding) enrollment in a pension plan, and automatic payment of credit card bills and mortgages, are nudges. So is a text message reminding people that a bill is due or that a doctor’s appointment is nigh; so are the default settings on computers and cell phones.
https://bppblog.com/2017/12/20/ablutions-at-the-grand-hotel-on-navigability-and-choice-architecture/

As always, please read the whole post. (My word for the day is “finjans”). But the point of interest as regards our blog post today came later on in the post, and to give you context, you’ll have to take a look at this picture, and then the accompanying text:

https://bppblog.com/2017/12/20/ablutions-at-the-grand-hotel-on-navigability-and-choice-architecture/

It is not news that water already in the pipes when one first turns water on is at room temperature at best. In December in Stockholm that means: cold. The water has to run a bit before the hot water arrives. As our bathrooms were designed, someone wishing to shower under the ceiling showerhead could not avoid a startlingly cold dousing. Alas, even after figuring out what knobs and levers to manipulate, there was no alternative to standing directly underneath that showerhead when turning it on! The knobs were simply too far away to be reached with an outstretched arm from a suitable distance. Each shower from that source thus inevitably began with a gulp, a yelp, and a backwards hop, landing one directly on the tub’s drainage hole – placed unusually in the middle of the bathtub.
The design solution is easy enough, since plumbing does not constrain the obvious: the tinkering area – the knobs and levers – should not be beneath a fixed showerhead. This would benefit not only hotel guests, but also maintenance personnel. This is best done at the plumbing installment stage, but can be fixed even at this late stage by extending, even by only a foot, the arm of the water pipe that runs parallel to the ceiling (of course, the protective glass partition would also have to be extended). Lessons: don’t make your design more complex than necessary, and try out your design before adopting it widely.
https://bppblog.com/2017/12/20/ablutions-at-the-grand-hotel-on-navigability-and-choice-architecture/

We’ve all experienced this, of course. And the reason I was inspired to write this post is a tweet I came across this morning:

This one gets the Oscar for the best hotel shower for 2 reasons:
1. The temperature knob is separate from on/off. So once you get the right temp you’re good.
2. The knobs are on the side, not below the shower. So no athletic jumps required to avoid getting frozen or scalded!!! pic.twitter.com/FdfNZXjjkt
— Parminder Singh (@parrysingh) September 28, 2023

I’m fairly sure Cass Sunstein and Maya Bar-Hillel would wholeheartedly approve of the design choices in this tweet, and I’m equally sure that Parminder Singh would appreciate the difficulty that both academicians faced in Sweden. Maybe the hotel that Parminder Singh is staying at could share some notes with the Grand Hotel in Stockholm?

But for us students of economics, three lessons:

Learn to see like an economist, and once you do, never stop looking. No matter where you are, including your bathrooms!
Learn to make connections across domains. It takes rare old skill to talk meaningfully about finjans, shower heads (four of ’em, that too) and exits towards airports in the same post. Once you have the underlying theory down pat in your own head, you’ll be surprised at how many connections you are able to make.
Write! Write about all of your observations, always and every day. Don’t worry about who reads the stuff that you write, and feel free to not share it with anybody if you prefer. But write. It is excellent exercise for the brain. You can’t possibly be asking “but what can I write about?” after reading this, surely. Why, academicians talk about bathtubs, even!

And finally, one small correction to Parminder’s excellent tweet, if I may be so bold. That hotel shower gets the Nobel, of course, not the Oscar.

Peace or Economics, you ask?

I say both.

A Tweet, A Reply, And So A Blogpost

A genetically modified pig heart has been transplanted into a man with heart disease – he is breathing on his own and the heart is functioning without support. https://t.co/WExRuvtmUr
— New Scientist (@newscientist) September 25, 2023

It goes without saying that I do not know enough about the details, but I certainly treat this tweet as good news. And in case you missed reading about it, there’s also this from last month.

It is remarkable how much progress we’re making in the medical field, and based on what little I understand of the developments over the last two to three years, we’re only getting started.
But it was a reply to this tweet that had my EFE antennae really and truly perk up:

We need to move fast

Autonomous cars are coming and this will decrease the number of organs available for transplant by 33%
— Stefan Bodnarescu (@stefan_bod) September 26, 2023

There’s so much to analyse in that short little tweet!

Autonomous cars are coming – they’ve “been coming” for a long time, it is true. But whenever they do, y’know, actually come, will they make the world a better place or not?
We can (and do) worry about what impact this will have on employment, car ownership patterns, parking lots within cities and lots of other things. But what about fatalities?
Do I mean fatalities caused by having autonomous cars, or fatalities avoided because we have autonomous cars? Well, the net effect, of course.
This tweet makes the claim that fatalities will, on net, go down because of autonomous cars. Maybe you agree, maybe you don’t. But especially if you do not, I would argue that you should focus on not just newspaper reports about deaths caused by autonomous cars, but also check to see if fatality statistics drop as autonomous cars become more prevalent. This is where a carefully designed econometric analysis can be truly useful. Counterfactuals really and truly matter!
But let’s assume, for the moment, that fatality statistics will actually come down. If they do, surely that’s a good and wonderful thing?
But ah, TANSTAAFL! What this tweet is really getting at is the opportunity cost of a reduction in fatalities as a consequence of greater deployment of autonomous cars. That is, the author of the tweet assumes that fatalities will come down with autonomous cars… but then asks about some of the second order effects.
And one second order effect, he says, is that we simply will not have as many organs up for donation as we used to earlier. Fewer fatalities by definition means fewer deaths (which is awesome), but it also means fewer organs up for donation (which is not so awesome).
And so we need to get a move on in biomedical sciences, and make sure we figure out how to grow organs suitable for human transplants.
Have fun going further out on this limb if you are a student of economics. Imagine, for example, what a world with abundant organs for transplants might look like. Will people end up being less careful about their health? Is that a good thing or a bad thing?
You might be tempted to say it is a bad thing. But consider this: will not this cavalier attitude towards health lead to greater demand for better quality of transplants and at lower prices?
Note that I have no clue what the “correct” answer is! I’m simply trying to point out that simple applications of simple economic concepts can help you frame better and more thought-provoking questions.

Cricket and the Dunning Kruger Effect

I was all of twelve years old when Sachin decided to go mad in New Zealand. It was the first time he had been asked to open the batting for India, and as with all things Indian cricket back then, it wasn’t a well planned, well thought out thing. Navjot Singh Sidhu, if memory serves me right, had a stiff neck, and so the greatest ODI opener ever became an opener. So it goes.

But that was the day I really and truly became a cricket fan. I have memories of watching the ’92 World Cup, and even fonder memories of the Hero Cup – but Sachin’s batting as an opener is what turned me into a cricket devotee.

As with many people these days, though, so also with me. There is so much cricket being played these days that it is hard to maintain the same level of passion. There’s all these leagues, plus the never ending parade of bilateral one-dayers and T20’s, and Test matches to boot. It is simply too much to keep up with, so I don’t.

Which is why I maintain that this really ought to be the last ODI World Cup. Announce it as such, celebrate the grand old tournament and the grand old format one last time, and then do what we’ve all pretty much done in any case, and move on to a world of T20’s and (some) Test matches.

It’s never going to happen, of course. So long as there is a single rupee to be flogged out of it, the format will continue to be tormented and tortured, and we will keep watching, zombie-like, for years to come.

So we might as well analyse it, and ask how we might think of the ODI format using principles of economics. Should one think of it as a slightly more aggressive version of Test cricket, or should one think of it as a slightly less aggressive version of T20 cricket?

That, at any rate, is the question that Nathan Leamon asks in a nice little write-up for ESPNCricinfo. It’s a question that has been asked for as long as the latest format of the sport has been around, of course. This article is interesting because Leamon claims that this is the first ODI World Cup where most players will approach it after having been steeped in not Test Cricket, but T20 cricket.

When the ODI format was first introduced, players played it as a shortened Test match. Test matches were the format they were used to, so their way of playing ODI’s was conditioned by the style they had been trained in and for. Which, of course, is why ODI’s from the ’70’s and the ’80’s were rather more slow and steady in their outlook. But the madness and mayhem of the ’90s and the ’00’s was because youngsters had grown up playing ODI’s, and were as a consequence more agile in the field, faster with the bat, and more imaginative with the ball. Indian fans of a suitable age, please note that I am talking about global trends, not about the Indian team of the 1990’s in particular.

But over the last decade, as Leamon puts it:

“The growth of T20 franchise leagues around the world, in particular the IPL, which overnight became the richest game in town, meant that the next generation of pro cricketers played T20 cricket from day one. The format became its own world. The shots played in T20 cricket started to look designed for that format, not for defending your wicket in a Test match a hundred years ago.

As the years went by, T20 cricket overcame the Anxiety of Influence and, slowly but surely, the direction of the flow of ideas reversed. It became the main source of cricketing innovation. T20 shots and tactics started to diffuse into 50-over cricket and even, to a much lesser extent, Test matches.”

And especially over the last three years or so, partly because of the pandemic, and partly as a consequence of commercial considerations, T20 has been the format of choice, regardless of whether it is club or country. To the extent that Joe Root of England has played all of 12 ODI matches since the 2019 World Cup.

And so this World Cup, in 2023, will be the first World Cup where the format (ODI’s) will be driven by “levels of batting aggression and bowling defensiveness” that come from the T20 culture.

It’s all well and good to say this, but what does this mean in practice?

Consider these three points from Leamon’s write-up:

In T20 cricket, a single is a “win” for the bowling team. In Tests, it is a “win” for the batting side. What about the 2023 World Cup?
When a wicket falls in a T20 match, it often has no effect on the scoring rate. In a Test match, it usually slows the rate at which runs are going to be scored. What about the 2023 World Cup?
And finally, a quote from the article worth reproducing in full:
“Most teams are going to arrive at this World Cup with a lot less knowledge of where ODI cricket currently is, than they have had at every recent tournament. The winning team is likely to be the one that quickly and successfully overcomes this lack of understanding and finds the right balance of techniques and tactics for the situation.”

As always, the real fun is when you take this lesson, and apply it to other walks of life. How long before blog posts are attempted by people who have grown up composing tweets? How long before television series are directed by people who have grown up making TikToks (and I’m sure this has happened already)? How might each of these formats benefit (or otherwise) as a consequence?

Note: To understand the reference to Dunning-Kruger, you will have to read the original Cricinfo piece. Worth it, I assure you.

Theory is one thing, implementation is a whole other story

In a paper written earlier this year, my co-author, Murali Neelakantan, and I argued for unifying India’s (many) healthcare markets. Part of our proposal touched upon the need to unify a crucial aspect of these markets: procurement.

Furthermore, to ensure cost efficiency and streamlined operations, the government will act as a monopsony buyer, procuring medical goods and services for all empanelled hospitals. This centralised approach will enable economies of scale, reduce costs, and guarantee steady demand for healthcare providers. By taking on the responsibility of procuring medical supplies, the government can negotiate better prices and allocation of resources within the healthcare system. We recognise that the healthcare requirements will vary across the country and even across states but we argue that there can be central procurement nevertheless. The local hospital or health authority will purchase based on the price notified by the central procurement agency.
https://ippr.in/index.php/ippr/article/view/213/92

It is one thing to say this as a theorist. As any public policy analyst will tell you, it is quite another to actually implement a scheme such as this. There will be teething troubles, there will be glitches. There will be leakages and pilferages. There will be stumbling blocks and unforeseen issues. Why, where will you start even, leave alone the question of actually making the whole thing work!

All good questions, of course. Entirely valid points. But we did point out that at least one state in India has already taken steps in this direction. Tamil Nadu already does centralized procurement of medicines, among other things worth emulating:

Another instance of successful healthcare reform at the state level can be found in Tamil Nadu, where the state government has implemented a range of innovative measures to improve the accessibility and affordability of healthcare services. These initiatives include the Tamil Nadu Medical Services Corporation (TNMSC), which centralises the procurement and distribution of drugs and medical equipment, resulting in more efficient and cost-effective processes (Parthasarathi and Sinha 2016).
https://ippr.in/index.php/ippr/article/view/213/92

But then, about two weeks ago, came news of a most excellent paper, written by CS Pramesh et al. Allow me to quote the abstract in its entirety:

In health systems with little public funding and decentralized procurement processes, the pricing and quality of anti-cancer medicines directly affects access to effective anti-cancer therapy. Factors such as differential pricing, volume-dependent negotiation and reliance on low-priced generics without any evaluation of their quality can lead to supply and demand lags, high out-of-pocket expenditures for patients and poor treatment outcomes. While pooled procurement of medicines can help address some of these challenges, monitoring of the procurement process requires considerable administrative investment. Group negotiation to fix prices, issuing of uniform contracts with standardized terms and conditions, and procurement by individual hospitals also reduce costs and improve quality without significant investment. The National Cancer Grid, a network of more than 250 cancer centres in India, piloted pooled procurement to improve negotiability of high-value oncology and supportive care medicines. A total of 40 drugs were included in this pilot. The pooled demand for the drugs from 23 centres was equivalent to 15.6 billion Indian rupees (197 million United States dollars (US$)) based on maximum retail prices. The process included technical and financial evaluation followed by contracts between individual centres and the selected vendors. Savings of 13.2 billion Indian rupees (US$ 166.7 million) were made compared to the maximum retail prices. The savings ranged from 23% to 99% (median: 82%) and were more with generics than innovator and newly patented medicines. This study reveals the advantages of group negotiation in pooled procurement for high-value medicines, an approach that can be applied to other health systems.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10452934/ (Emphasis added)
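A quick back-of-the-envelope check on those headline numbers may help. The sketch below uses only the two rupee figures quoted in the abstract; the variable and function names are my own, and the "implied amount paid" is simply the difference between the two figures, not a number the paper reports.

```python
# Figures quoted in the abstract above, in billions of Indian rupees.
pooled_demand_at_mrp = 15.6  # pooled demand valued at maximum retail prices (MRP)
savings = 13.2               # reported savings compared with MRP


def overall_saving_share(demand: float, saved: float) -> float:
    """Return savings as a fraction of demand valued at MRP."""
    return saved / demand


# Assumption: what was actually paid is demand at MRP minus the reported savings.
amount_paid = pooled_demand_at_mrp - savings
share = overall_saving_share(pooled_demand_at_mrp, savings)

print(f"Implied amount paid: {amount_paid:.1f} billion rupees")
print(f"Overall saving: {share:.1%} of MRP")
```

The overall figure works out to roughly 85% of MRP. That is a value-weighted number across all 40 drugs, so it need not match the 82% median, which is taken drug by drug.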

There is an important difference between what was attempted here and what we are suggesting in our paper. Our paper talks of centralized procurement, while this paper speaks of implementing a pooled procurement approach. As they go on to say in their paper, “…centralized procurement systems require considerable administrative and managerial resources. A pooled procurement approach that is less resource-intensive and sustainable without significant investment is the WHO-suggested group contracting approach”.

But note that they did not give up on centralized procurement – they thought it easier to begin with pooled procurement, before tackling the much bigger beast that is centralized procurement. (Also note that there is academic research on how centralized procurement can be of benefit, especially in developing nations.)

And they’re quite right, of course. Beginning at a relatively smaller scale and then attempting more ambitious targets is unglamorous, perhaps – but it is also a much more sensible way of doing things. These four paragraphs in particular make for fascinating reading in terms of actually working through the nitty-gritty of implementing pooled procurement. And if you are going to spend time reading those four paragraphs later, please also do spend time on Fig.2.

What were the key takeaways?

Considerable savings, both on generic drugs, as well as on innovator drugs.
“This outcome suggests that the concentration of demand significantly strengthened our negotiating power, while the centralized negotiation approach, combined with larger purchase quantities, allowed us to secure substantial price discounts.”
Opportunity costs matter!
“The potential impact of cost savings is huge, in not only improving the affordability of care and decreasing out-of-pocket costs for patients, but allowing for the re-allocation of drug procurement funds towards other initiatives to deliver high-quality care”
Enforcement of quality standards became easier, because of pooled procurement.
“These savings are notable because they were achieved without compromising on quality, due to strict standards imposed on both the drugs and the companies.”
Pooled procurement helps individual patients across India, regardless of region-wise differences.
Lower treatment abandonment rates (yay!), and therefore higher survival rates (double yay!).
A lighter financial burden on the patients!

And to end, the paragraph that I hope will launch a thousand studies, and eventually, the implementation of centralized procurement of drugs and consumables in India:

Based on the success of our piloting of pooled procurement in the network, conducting such negotiations may be relevant at a larger scale for oncology drugs, such as through the national health authority, as that will enhance the bargaining power as well as have far-reaching impact on access and affordability across the entire national network. Negotiation on a national level could also address the challenges of vendor monopoly or patented drugs supplied by a single vendor. Furthermore, to determine the final price for innovator and single vendor drugs, a comprehensive evaluation of the available literature on efficacy and safety data is crucial. If a drug meets the threshold for significant clinical benefits, cost-effectiveness assessment using adaptive health technology can provide guidance for negotiating prices.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10452934/

Bloomberg on China’s Property Crisis

On Zero Sum Thinking

Zero-sum thinking is a key mindset that shapes how we view the world. Excited to share a new paper on the roots and consequences of Zero-sum thinking with @sahilchinoy, @DrNathanNunn @SMGSequeira. A summary thread🧵1/23 https://t.co/wqk3BQ5lGZ pic.twitter.com/7a0bRxy1J2
— Stefanie Stantcheva (@S_Stantcheva) September 18, 2023

On The Economics of Time Wasting in Football

Or soccer, if you so prefer.

So what is the issue here? If you are a football fan, you know it all too well. But if you are not, here is some background:

The International Football Association Board (Ifab) has ordered referees around the world to clamp down on time-wasting and add on the exact time taken for goal celebrations, substitutions, injuries, red cards, penalties and VAR checks. Referees would previously add 30 seconds for each goal or substitution.
An average of 16 minutes and 34 seconds was added to matches on the opening weekend of the EFL season, with 29 bookings for time-wasting compared with two on the final day of last season. Any player who stands in front of a free-kick to prevent it being taken quickly or kicks the ball away to delay a restart, for example, will be cautioned.
https://www.theguardian.com/football/2023/aug/10/added-time-howard-webb-no-backing-down-referees-ifab

… and you’d think that’s a good thing, right? Even the most casual fan of football is all too aware of the fact that there is a lot of time-wasting that goes on. So anything that helps eradicate it is a good thing, surely?

Well, turns out there’s two ways to “deal” with time-wasting. One, do it hockey style. Field hockey uses a stop-clock, and the clock is, well, stopped when play is stopped. The countdown timer is reactivated when play begins, and we “count down” the amount of time that needs to be played.

The second method, which is the one that football currently favors, is to not use a stopwatch, but rather to count the number of minutes that play has been stopped for, and add those minutes as “extra” time.

So why the second method, and not the first?

Well, economics:

Why is this happening? As ever, follow the money. The drive to increase active “game time” (itself a vapid, ill-defined concept) comes directly from Fifa. And Fifa is essentially a TV rights distributions agency, its entire model based around increasing screen revenues. What we have here is the laws of the game being employed as a tool to doctor the perceived TV entertainment value of the product; as expressed via a massively overengineered notion of what the referee’s role should be, clumsily grasped value judgments of what entertainment looks like, and how this sport, our own shared treasure, should feel and look.
https://www.theguardian.com/football/blog/2023/aug/17/time-wasting-in-football-is-ugly-maddening-and-absolutely-vital

If you found it difficult to untangle the simple message in that paragraph, here is the quick takeaway: longer games mean more advertising opportunities. But they also mean more goal-scoring opportunities, which means more entertainment… which means, well, more advertising opportunities.

Is that a cynical hypothesis, or is this backed up by data?

In the 49 games played so far, the average match duration is now 101 minutes and 40 seconds, an increase of three minutes and 36 seconds on last season. This means Premier League games are lasting even longer than matches at last year’s men’s World Cup, where world governing body FIFA pressed the need to give fans better value for money in terms of action.
Even more dramatic is the uplift in effective playing time — the amount of time the ball is in play — with that increasing by four minutes and 25 seconds to 59 minutes and 30 seconds.
These extra minutes have produced more goals, with 151 being scored already this season, 3.1 per game. And 22 of those goals have come in added time, compared to only five at this point of the season last year.
https://theathletic.com/4886737/2023/09/21/premier-league-time-wasting/
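It is worth working through the arithmetic in that excerpt. The sketch below takes the durations and the goal count quoted by The Athletic and backs out last season's figures by subtracting the stated increases. The helper names are my own, and "dead time" is my shorthand for match duration minus effective playing time, not a term used in the article.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes-and-seconds duration to seconds."""
    return minutes * 60 + seconds


def fmt(total_seconds: int) -> str:
    """Format seconds as m:ss."""
    m, s = divmod(total_seconds, 60)
    return f"{m}:{s:02d}"


# Figures quoted in the excerpt above.
match_now = to_seconds(101, 40)        # average match duration this season
match_increase = to_seconds(3, 36)     # increase on last season
in_play_now = to_seconds(59, 30)       # effective playing time this season
in_play_increase = to_seconds(4, 25)   # increase on last season
goals, games = 151, 49

# Last season's values, implied by subtracting the stated increases.
match_before = match_now - match_increase
in_play_before = in_play_now - in_play_increase

# "Dead time" (my label): the part of the match when the ball is not in play.
dead_now = match_now - in_play_now
dead_before = match_before - in_play_before

print("Match duration:", fmt(match_before), "->", fmt(match_now))
print("Ball in play:  ", fmt(in_play_before), "->", fmt(in_play_now))
print("Dead time:     ", fmt(dead_before), "->", fmt(dead_now))
print(f"In-play share:  {in_play_before / match_before:.1%} -> {in_play_now / match_now:.1%}")
print(f"Goals per game: {goals / games:.2f}")
```

On these numbers the ball-in-play time rose by more than the match itself got longer, so dead time per match actually fell by a little under a minute. The longer matches are not just more of the same mix of play and stoppages.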

So who’s complaining? We have more people watching more goals being scored, and therefore advertising revenue has gone up. Seems like a good thing all round. No?

Ah, but TANSTAAFL.

The move has not been greeted by everyone, though, with several prominent managers and players pointing out this would lead to more fatigue, injuries and potential burnout. European football’s governing body UEFA has refused to implement the new guidelines, opting instead to tell their match officials to keep the game moving.
https://theathletic.com/4886737/2023/09/21/premier-league-time-wasting/

So what should we be optimizing for?

Should players be optimizing for not doing time wasting? What if they see their opponents doing it? How does the game theoretic solution then work out? How does it work out under the stopwatch rule? How does it work out under the added-time rule?
Should the organizers be optimizing for players’ well-being? Or for advertising revenue?
Should player recruitment change to account for the fact that matches will last longer? Should the number of substitutions be increased to account for longer matches? But will that not affect smaller clubs disproportionately?
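To make the game-theoretic question a little more concrete, here is a toy model. Everything in it is hypothetical: the payoff numbers, the `recovered` parameter (the fraction of wasted time the rule gives back, with 1.0 standing in for a stop-clock), and the assumption that both teams choose simultaneously between "play" and "waste". It is a sketch of how one might frame the question, not an estimate of anything.

```python
from itertools import product

ACTIONS = ("play", "waste")


def payoff(own: str, other: str, recovered: float,
           gain: float = 3.0, booking_cost: float = 1.0) -> float:
    """Hypothetical payoff to a team given both teams' choices.

    recovered: fraction of wasted time the rule adds back (1.0 = stop-clock).
    gain and booking_cost are made-up illustrative numbers.
    """
    kept = gain * (1.0 - recovered)  # advantage that survives the rule
    own_term = (kept - booking_cost) if own == "waste" else 0.0
    other_term = kept if other == "waste" else 0.0  # harm from the opponent's wasting
    return own_term - other_term


def pure_nash(recovered: float) -> list[tuple[str, str]]:
    """Return action pairs where neither team gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        a_best = all(payoff(a, b, recovered) >= payoff(alt, b, recovered) for alt in ACTIONS)
        b_best = all(payoff(b, a, recovered) >= payoff(alt, a, recovered) for alt in ACTIONS)
        if a_best and b_best:
            equilibria.append((a, b))
    return equilibria


print("Loose added time (30% recovered):", pure_nash(0.3))
print("Stop-clock (100% recovered):     ", pure_nash(1.0))
```

With these made-up numbers, loose added time produces a prisoner's dilemma: both teams waste time even though both would be better off if neither did. Once all wasted time is recovered, wasting only earns a booking, so both teams play. Change the numbers and the answer changes, which is rather the point of the exercise.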

If you are a fan of the sport, thinking through all this (and more) gives you a way to understand how one decision can impact so many others. Learn to transfer this type of interconnected thinking into other domains.

If you are a fan of economics or public policy, use the tools of your trade to think through the “best” solution. But realize that no solution is perfect, and that somebody, somewhere, will end up complaining. Maybe the players will play for too long, maybe you’ll stymie revenue growth, maybe coaches will come up with strategies to “work around” this problem.

And if you are a fan of both football and economics/public policy, tell me what I’m missing!

Ethan Mollick et al on AI’s Jagged Frontiers

The public release of Large Language Models (LLMs) has sparked tremendous interest in how humans will use Artificial Intelligence (AI) to accomplish a variety of tasks. In our study conducted with Boston Consulting Group, a global management consulting firm, we examine the performance implications of AI on realistic, complex, and knowledge-intensive tasks. The pre-registered experiment involved 758 consultants comprising about 7% of the individual contributor-level consultants at the company. After establishing a performance baseline on a similar task, subjects were randomly assigned to one of three conditions: no AI access, GPT-4 AI access, or GPT-4 AI access with a prompt engineering overview. We suggest that the capabilities of AI create a “jagged technological frontier” where some tasks are easily done by AI, while others, though seemingly similar in difficulty level, are outside the current capability of AI. For each one of a set of 18 realistic consulting tasks within the frontier of AI capabilities, consultants using AI were significantly more productive (they completed 12.2% more tasks on average, and completed tasks 25.1% more quickly), and produced significantly higher quality results (more than 40% higher quality compared to a control group). Consultants across the skills distribution benefited significantly from having AI augmentation, with those below the average performance threshold increasing by 43% and those above increasing by 17% compared to their own scores. For a task selected to be outside the frontier, however, consultants using AI were 19 percentage points less likely to produce correct solutions compared to those without AI. Further, our analysis shows the emergence of two distinctive patterns of successful AI use by humans along a spectrum of human-AI integration.
One set of consultants acted as “Centaurs,” like the mythical half- horse/half-human creature, dividing and delegating their solution-creation activities to the AI or to themselves. Another set of consultants acted more like “Cyborgs,” completely integrating their task flow with the AI and continually interacting with the technology.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321

That’s the abstract of a paper written by a team of academicians based in the United States, of whom Prof. Ethan Mollick is one. The idea behind the paper is very simple: can we quantify just how much of an improvement in productivity is made possible because of using AI?

And the TL;DR is that productivity is way up. From the abstract: “consultants using AI were significantly more productive (they completed 12.2% more tasks on average, and completed tasks 25.1% more quickly), and produced significantly higher quality results (more than 40% higher quality compared to a control group)”

Some points of especial interest from my perspective:

The advantages of AI are substantial, but unclear. We don’t know which tasks will be completed more efficiently (and better) by using AI, and which won’t. Worse, nobody knows for sure. It is very much a trial-and-error thing. (pg 3)
This is a dynamic problem. How our interaction with AI changes, how the nature of our tasks changes, and how AI gets better – all of these will vary with time. This paper will be outdated within a matter of weeks, not days – but that is a feature, not a bug. (pg 4)
What was the task itself? Note that there were two different experiments, and within each experiment, there were two tasks. The first experiment was “within the frontier”, which means an experiment that was thought to be well within GPT-4’s capabilities. For each experiment, participants were “benchmarked” using an assessment task, and were then asked to work on an “experimental” task. I will always be referring to the “experimental” task:
“In this experimental task, participants were tasked with conceptualizing a footwear idea for niche markets and delineating every step involved, from prototype description to market segmentation to entering the market. An executive from a leading global footwear company verified that the task design covered the entire process their company typically goes through, from ideation to product launch. Participants responded to a total of 18 tasks (or as many as they could within the given time frame). These tasks spanned various domains. Specifically, they can be categorized into four types: creativity (e.g., “Propose at least 10 ideas for a new shoe targeting an underserved market or sport.”), analytical thinking (e.g., “Segment the footwear industry market based on users.”), writing proficiency (e.g., “Draft a press release marketing copy for your product.”), and persuasiveness (e.g., “Pen an inspirational memo to employees detailing why your product would outshine competitors.”). This allowed us to collect comprehensive assessments of quality.” (pg 8 and pg 9)
This was especially impressive:
“Our results reveal significant effects, underscoring the prowess of AI even in tasks traditionally executed by highly skilled and well-compensated professionals. Not only did the use of AI lead to an increase in the number of subtasks completed by an average of 12.5%, but it also enhanced the quality of the responses by an average of more than 40%. These effects support the view that for tasks that are clearly within its frontier of capabilities, even those that historically demanded intensive human interaction, AI support provides huge performance benefits.” (pg 12)
The “outside the frontier” task:
“Participants used insights from interviews and financial data to provide recommendations for the CEO. Their recommendations were to pinpoint which brand held the most potential for growth. Additionally, participants were also expected to suggest actions to improve the chosen brand, regardless of the exact brand they had chosen” (pg 13)
Even in the case of these tasks, there was improvement across the board, both in terms of less time spent and in terms of the quality of the output. (pg 14 and 15)
The authors found that there were two dominant approaches:
“The first is Centaur behavior. Named after the mythical creature that is half-human and half-horse, this approach involves a similar strategic division of labor between humans and machines closely fused together. Users with this strategy switch between AI and human tasks, allocating responsibilities based on the strengths and capabilities of each entity. They discern which tasks are best suited for human intervention and which can be efficiently managed by AI.
The second model we observed is Cyborg behavior. Named after hybrid human-machine beings as envisioned in science fiction literature, this approach is about intricate integration. Cyborg users don’t just delegate tasks; they intertwine their efforts with AI at the very frontier of capabilities. This strategy might manifest as alternating responsibilities at the subtask level, such as initiating a sentence for the AI to complete or working in tandem with the AI.” (pg 16)
And finally, their concluding paragraph:
“Finally, we note that our findings offer multiple avenues for interpretation when considering the future implications of human/AI collaboration. Firstly, our results lend support to the optimism about AI capabilities for important high-end knowledge work tasks such as fast idea generation, writing, persuasion, strategic analysis, and creative product innovation. In our study, since AI proved surprisingly capable, it was difficult to design a task in this experiment outside the AI’s frontier where humans with high human capital doing their job would consistently outperform AI. However, navigating AI’s jagged capabilities frontier remains challenging. Even for experienced professionals engaged in tasks akin to some of their daily responsibilities, this demarcation is not always evident. As the boundaries of AI capabilities continue to expand, often exponentially, it becomes incumbent upon human professionals to recalibrate their understanding of the frontier and for organizations to prepare for a new world of work combining humans and AI. Overall, AI seems poised to significantly impact human cognition and problem-solving ability. Similarly to how the internet and web browsers dramatically reduced the marginal cost of information sharing, AI may also be lowering the costs associated with human thinking and reasoning, with potentially broad and transformative effects”

This chart tells quite the story:

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321 (pg 28)

The appendix (pg 44 onwards) details the tasks, if you would like to go through them.

Finally, a part of the abstract that I’m still thinking about:

“Consultants across the skills distribution benefited significantly from having AI augmentation, with those below the average performance threshold increasing by 43% and those above increasing by 17% compared to their own scores. For a task selected to be outside the frontier, however, consultants using AI were 19 percentage points less likely to produce correct solutions compared to those without AI”
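One reason this bit takes some chewing on is that it mixes two kinds of numbers: the 43% and 17% are percent changes relative to a consultant’s own earlier score, while the 19 is a drop in percentage points. Here is a minimal Python sketch of the difference. Please note that the baseline figures in it (a no-AI success rate of 80% and a starting score of 4.0) are made up by me purely for illustration; they are not from the paper, and neither are the function names.

```python
def apply_percent_increase(score: float, pct: float) -> float:
    """Raise a score by pct percent of its own starting value."""
    return score * (1 + pct / 100)


def relative_change(before: float, after: float) -> float:
    """Percent change of `after` relative to `before`."""
    return (after - before) / before * 100


# ASSUMPTION: hypothetical starting score for a below-average consultant.
own_score = 4.0
score_with_ai = apply_percent_increase(own_score, 43)  # 43% figure from the abstract

# ASSUMPTION: hypothetical share of no-AI consultants who got the
# outside-the-frontier task right. The abstract only gives the 19 point gap.
no_ai_correct_rate = 80.0
ai_correct_rate = no_ai_correct_rate - 19.0  # a drop of 19 percentage points

# The same drop, expressed as a percent of the (assumed) baseline, is larger.
drop_as_percent = relative_change(no_ai_correct_rate, ai_correct_rate)

print(f"Score with AI: {score_with_ai:.2f}")
print(f"Correct rate with AI: {ai_correct_rate:.1f}%")
print(f"Relative change in correct rate: {drop_as_percent:.2f}%")
```

The point of the sketch is only this: a 19 percentage point drop is a fixed gap between two rates, and how large it is in relative terms depends on where the no-AI group started, which the abstract does not tell us.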

A lovely, thought-provoking paper. Whatever your own opinions about the impact of AI upon productivity, employment and output, a carefully designed academic study such as this is worth reading, and critiquing.

And if you are currently in college (any college), learn how to get better at working with AI!
