The public release of Large Language Models (LLMs) has sparked tremendous interest in how humans will use Artificial Intelligence (AI) to accomplish a variety of tasks. In our study conducted with Boston Consulting Group, a global management consulting firm, we examine the performance implications of AI on realistic, complex, and knowledge-intensive tasks. The pre-registered experiment involved 758 consultants comprising about 7% of the individual contributor-level consultants at the company. After establishing a performance baseline on a similar task, subjects were randomly assigned to one of three conditions: no AI access, GPT-4 AI access, or GPT-4 AI access with a prompt engineering overview. We suggest that the capabilities of AI create a “jagged technological frontier” where some tasks are easily done by AI, while others, though seemingly similar in difficulty level, are outside the current capability of AI. For each one of a set of 18 realistic consulting tasks within the frontier of AI capabilities, consultants using AI were significantly more productive (they completed 12.2% more tasks on average, and completed tasks 25.1% more quickly), and produced significantly higher quality results (more than 40% higher quality compared to a control group). Consultants across the skills distribution benefited significantly from having AI augmentation, with those below the average performance threshold increasing by 43% and those above increasing by 17% compared to their own scores. For a task selected to be outside the frontier, however, consultants using AI were 19 percentage points less likely to produce correct solutions compared to those without AI. Further, our analysis shows the emergence of two distinctive patterns of successful AI use by humans along a spectrum of human-AI integration. One set of consultants acted as “Centaurs,” like the mythical half-horse/half-human creature, dividing and delegating their solution-creation activities to the AI or to themselves. Another set of consultants acted more like “Cyborgs,” completely integrating their task flow with the AI and continually interacting with the technology.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321
That’s the abstract of a paper written by a team of academicians based in the United States, of whom Prof. Ethan Mollick is one. The idea behind the paper is very simple: can we quantify just how much of an improvement in productivity is made possible because of using AI?
And the TL;DR is that productivity is way up. From the abstract: “consultants using AI were significantly more productive (they completed 12.2% more tasks on average, and completed tasks 25.1% more quickly), and produced significantly higher quality results (more than 40% higher quality compared to a control group)”
Some points of especial interest from my perspective:
- The advantages of AI are substantial, but unclear. We don’t know which tasks will be completed more efficiently (and better) by using AI, and which won’t. Worse, nobody knows for sure. It is very much a trial-and-error thing. (pg 3)
- This is a dynamic problem. How our interaction with AI changes, how the nature of our tasks changes, and how AI gets better – all of these will vary with time. This paper will be outdated within a matter of weeks, not days – but that is a feature, not a bug. (pg 4)
- What was the task itself? Note that there were two different experiments, and within each experiment, there were two tasks. The first experiment was “within the frontier”, which means an experiment that was thought to be well within GPT-4’s capabilities. For each experiment, participants were “benchmarked” using an assessment task, and were then asked to work on an “experimental” task. I will always be referring to the “experimental” task:
“In this experimental task, participants were tasked with conceptualizing a footwear idea for niche markets and delineating every step involved, from prototype description to market segmentation to entering the market. An executive from a leading global footwear company verified that the task design covered the entire process their company typically goes through, from ideation to product launch. Participants responded to a total of 18 tasks (or as many as they could within the given time frame). These tasks spanned various domains. Specifically, they can be categorized into four types: creativity (e.g., “Propose at least 10 ideas for a new shoe targeting an underserved market or sport.”), analytical thinking (e.g., “Segment the footwear industry market based on users.”), writing proficiency (e.g., “Draft a press release marketing copy for your product.”), and persuasiveness (e.g., “Pen an inspirational memo to employees detailing why your product would outshine competitors.”). This allowed us to collect comprehensive assessments of quality.” (pg 8 and pg 9)
- This was especially impressive:
“Our results reveal significant effects, underscoring the prowess of AI even in tasks traditionally executed by highly skilled and well-compensated professionals. Not only did the use of AI lead to an increase in the number of subtasks completed by an average of 12.5%, but it also enhanced the quality of the responses by an average of more than 40%. These effects support the view that for tasks that are clearly within its frontier of capabilities, even those that historically demanded intensive human interaction, AI support provides huge performance benefits.” (pg 12)
- The “outside the frontier” task:
“Participants used insights from interviews and financial data to provide recommendations for the CEO. Their recommendations were to pinpoint which brand held the most potential for growth. Additionally, participants were also expected to suggest actions to improve the chosen brand, regardless of the exact brand they had chosen” (pg 13)
- Even on this task, participants using AI spent less time and produced higher-quality output (pg 14 and 15), although, as the abstract notes, they were 19 percentage points less likely to produce correct solutions.
- The authors found that there were two dominant approaches:
“The first is Centaur behavior. Named after the mythical creature that is half-human and half-horse, this approach involves a similar strategic division of labor between humans and machines closely fused together. Users with this strategy switch between AI and human tasks, allocating responsibilities based on the strengths and capabilities of each entity. They discern which tasks are best suited for human intervention and which can be efficiently managed by AI.
The second model we observed is Cyborg behavior. Named after hybrid human-machine beings as envisioned in science fiction literature, this approach is about intricate integration. Cyborg users don’t just delegate tasks; they intertwine their efforts with AI at the very frontier of capabilities. This strategy might manifest as alternating responsibilities at the subtask level, such as initiating a sentence for the AI to complete or working in tandem with the AI.” (pg 16)
- And finally, their concluding paragraph:
“Finally, we note that our findings offer multiple avenues for interpretation when considering the future implications of human/AI collaboration. Firstly, our results lend support to the optimism about AI capabilities for important high-end knowledge work tasks such as fast idea generation, writing, persuasion, strategic analysis, and creative product innovation. In our study, since AI proved surprisingly capable, it was difficult to design a task in this experiment outside the AI’s frontier where humans with high human capital doing their job would consistently outperform AI. However, navigating AI’s jagged capabilities frontier remains challenging. Even for experienced professionals engaged in tasks akin to some of their daily responsibilities, this demarcation is not always evident. As the boundaries of AI capabilities continue to expand, often exponentially, it becomes incumbent upon human professionals to recalibrate their understanding of the frontier and for organizations to prepare for a new world of work combining humans and AI. Overall, AI seems poised to significantly impact human cognition and problem-solving ability. Similarly to how the internet and web browsers dramatically reduced the marginal cost of information sharing, AI may also be lowering the costs associated with human thinking and reasoning, with potentially broad and transformative effects”
This chart tells quite the story:
The appendix (pg 44 onwards) details the tasks, if you would like to go through them.
Finally, a part of the abstract that I’m still thinking about:
“Consultants across the skills distribution benefited significantly from having AI augmentation, with those below the average performance threshold increasing by 43% and those above increasing by 17% compared to their own scores. For a task selected to be outside the frontier, however, consultants using AI were 19 percentage points less likely to produce correct solutions compared to those without AI”
A lovely, thought-provoking paper. Whatever your own opinions about the impact of AI upon productivity, employment and output, a carefully designed academic study such as this is worth reading, and critiquing.
And if you are currently in college (any college), learn how to get better at working with AI!