


Using Chain-of-Density can produce more bang for your buck when leveraging generative AI.
Trying to put five pounds of rocks into a three-pound bag.
That old adage about overfilling a bag or sack captures the difficult chore of squeezing something larger into something smaller. Turns out that we do this all the time, particularly when attempting to summarize materials such as a lengthy article or a voluminous blog posting. You have to figure out how to convey the essence of the original content and yet do so in less available space.
Welcome to the world of summarization and the at times agonizing tradeoffs in deriving sufficient and suitable summaries. It can be challenging and exasperating to devise a summary. You want to make sure that crucial bits and pieces make their way into the summary. At the same time, you don’t want the summary to become overly unwieldy and perhaps begin to approach the same size as the original content being summarized.
I bring up this topic because a common use of generative AI consists of getting the AI app to produce a summary for you. You feed an article or some narrative into the generative AI and ask for a handy dandy summary. The AI app complies. But you have to ask yourself, is the summary any good? Does it do a proper job of summarizing? Has anything vital been left out? Could the summary be more tightly conceived? Etc.
A new method of devising summaries involves a clever prompting strategy that aims to bolster generative AI toward attaining especially superb, or at least better than usual, summaries. The technique is known as Chain-of-Density (CoD). Anybody versed in prompt engineering ought to become familiar with this insightful technique. Consider Chain-of-Density not only helpful for producing summaries; understanding how the technique works yields a lot of other benefits and can power up your overall prompting prowess all-told.
In today’s column, I am continuing my ongoing special series that closely explores the newest advances in prompt engineering for generative AI and will carefully reveal to you the keystone ins and outs of the Chain-of-Density prompting technique. The underlying research that has developed the technique will be examined. Furthermore, I will highlight several examples of how to leverage the CoD capacities in a practical, day-to-day manner.
Explaining The Chain And The Density
You might be wondering why the technique refers to a Chain-of-Density.
I’m glad you asked.
Allow me a moment to explain.
When you are trying to craft a summary, you often might do so in a series of successive attempts. Your first shot might be to craft a summary that has only a few of the biggest points that need to be included. After considering the initial draft, the odds are that you might further refine the summary by adding more elements to it. This can go on and on. Depending on how thorough you are, you might do a handful or more of these refining iterations. The series of iterations can be construed as a chain of summaries, one leading to the next, for a given instance of trying to write a summary.
That’s the “chain” part of this process.
Let’s add some further terminology to describe the summary-making effort.
A summary typically starts as somewhat sparse when you first toss it together. There isn’t much of any substance in the summary. You are usually seeking to further pack substance into the summary and do so while fighting the length of the summary. The more substance that you can jam into the summary, the higher the density of the summary.
We can give a name to the substance by saying that we are trying to identify important “entities” within the original content. Those entities might be facts or figures. The entities are said to be anything especially instrumental to the overall meaning of the original content. A hope is to carry over as many of the demonstrative entities as feasible into the summary.
Your summary-making process then is to iteratively devise a summary by starting with a sparse version and then adding more and more entities or substance to increase the density until you reach some desired or suitable end-state. The series of iterations acts as a chain. Each is used to connect to the next. You usually will retain the entities from one version to the next version, steadily adding more of the entities available in the original as you seek to jampack the summary accordingly.
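To make the chain mechanics concrete, here is a minimal sketch in Python of that iterative loop. This is a sketch only; the generate_denser_summary function is a hypothetical stand-in for whatever generative AI call you prefer to wire in.

```python
# A minimal sketch of the chaining loop. Each pass folds a few more
# entities into the summary while the allowed length stays fixed.
# generate_denser_summary() is a hypothetical stand-in for a call to
# your generative AI app of choice.

def generate_denser_summary(source: str, prior_summary: str, word_limit: int) -> str:
    """Ask the AI to add a few missing entities without exceeding word_limit."""
    raise NotImplementedError("Wire this to your generative AI app")

def chain_of_summaries(source: str, iterations: int = 5, word_limit: int = 80) -> list[str]:
    summaries = []
    current = ""  # start sparse: no entities carried over yet
    for _ in range(iterations):
        current = generate_denser_summary(source, current, word_limit)
        summaries.append(current)  # keep the whole chain for later comparison
    return summaries
```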
Reflect on the adage of putting five pounds of rocks into a three-pound bag.
Maybe you put one pound of rocks into the three-pound bag at the initial attempt. The bag is considered sparsely populated. There is still room to spare. The density is low. You then put a second pound of rocks into the bag. The density is increasing. The sparseness is lessening. Finally, you put in a third pound of rocks. You have hit the maximum density and the sparseness has presumably dropped to near zero.
Suppose that the bag can be elongated.
Wonderful, you exclaim, being overjoyed at having more available space. Imagine though that you are going to hand the bag over to someone else. The larger and heavier the bag, the less useful it becomes. The same applies to summaries.
A rule of thumb is that you want to minimize the length or size of the summary while maximizing the summarization content. The two factors are often in contention with each other. You are tempted to increase the length to get more substance included. Increasing the length, though, potentially undercuts the very point of the summary being a summary.
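To put a rough number on that tension, you can think in terms of entity density, i.e., how much substance is packed per word of summary. Here is a toy Python sketch of the notion. The entity detector is a deliberately crude proxy of my own devising; real use would warrant a proper named-entity recognizer.

```python
import re

def crude_entities(text: str) -> set[str]:
    # Deliberately crude proxy: capitalized words and numbers often
    # mark the facts-and-figures "entities" worth carrying over.
    return set(re.findall(r"[A-Z][a-z]+|\d+", text))

def entity_density(summary: str) -> float:
    # Entities packed per word of summary; higher means denser.
    words = summary.split()
    return len(crude_entities(summary)) / max(len(words), 1)

# Hypothetical snippets: the denser wording scores higher per word.
sparse = "The company announced strong results for the quarter."
dense = "Acme Corp reported revenue of 4.2 billion dollars, up 12 percent, for the third quarter of 2023."
print(entity_density(sparse), entity_density(dense))
```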
A person might seemingly just go ahead and read the original content if the summary approaches the size of the original material being summarized. The summary isn’t especially a summary anymore at that juncture. Indeed, sometimes a summary turns out to be longer than the original content that is supposedly being summarized.
How can this be, you might be thinking?
The answer has to do with being extractive versus being abstractive.
During the summarization process, you are looking at two possibilities of the content being carried over into the summary. First, you aim to be extractive, primarily extracting key aspects and shoveling those into the summary. Second, you might at times be abstractive, whereby you go beyond the words themselves of the original content and begin to reinterpret or perhaps elaborate beyond what the source per se has to say.
A purely extractive summary is more likely to be construed as a fair and balanced reflection of the original content. You are not changing things up. You are only carrying the essentials (entities) over into the summary. The problem with an abstractive summary is that you are potentially changing up things and will be biasing or in some manner altering the meaning found within the original content being summarized. The danger is that this kind of summary is no longer seen as fair and balanced, and instead is based on the perceptions and opinions of the summarizer.
In a sense, if you want an unadorned straightforward summary, you are better off with an extractive summary. If you want an adorned or embellished summary, that goes beyond the words presented in the original source, you might seek an abstractive summary. The thing is, the abstractive summary might no longer be an apt reflection of the original source. That is also how the summary might become longer than the original since the embellishments can possibly increase the size of things and you could find yourself looking at a summary that is much longer than the source used for the summary.
A quick lighthearted recap of the aforementioned characteristics of summaries might be useful here.
Here it is. I am reminded of the somewhat funny anecdote about a student in school who is trying to write an essay that summarizes a topic such as the life and times of Abraham Lincoln. Envision that the student hasn’t read the article assigned about the history of Lincoln. The student is in a panic because they are supposed to write a summary based on the reading and they haven’t read the piece at all.
What does the student do?
They wing it.
Their essay starts by saying that Abraham Lincoln was an important person in history. Lincoln did great things, the essay says. People looked up to Lincoln. The essay raves about Lincoln being a super-duper person. The student looks at their essay so far and realizes that the teacher is bound to detect that something is amiss. The essay doesn’t have a whit of substance or entities that are particularly notable regarding Lincoln, such as failing to mention that Lincoln was a U.S. president or anything about the Civil War, and so on.
The student will get nailed on the sparseness of the essay. It is abundantly sparse. They had better increase the density or they risk getting an F grade on the essay. So, the student adds the date of birth of Lincoln, the date of Lincoln’s assassination, and a few other facts and figures that are readily found in the original assigned article. The student is refining the summary. The first version is being chained across to a more elaborated version. The essay is increasing in density.
I suppose we might grumble that the student is doing this in the worst of ways. The presumed purpose was for the student to study the article and learn something about Lincoln. After doing so, the essay or summary was only a means of showcasing what the student learned. Instead, the student is doing a somewhat rote method of merely aiming to produce an essay to get the assignment done.
Well, I won’t delve any further into the plight of this stressed-out student and whether they were right or wrong in their endeavors. Some might be sympathetic to the plight of the student, perhaps having been in similar late-night homework-neglected (overworked?) dire circumstances when in school. Others might be upset that the student presumably is either lazy or not taking seriously the valued nature of the assignment.
We shall move on.
A summary of the key ideas introduced by my elaboration about summary-making is this:

- Summaries are usually crafted via a chain of successive refinements, one version leading to the next.
- A summary starts out sparse and gains density as more substance is packed in.
- The substance consists of important “entities” such as facts and figures carried over from the original content.
- Summaries can be extractive (faithfully carrying over the essentials) or abstractive (reinterpreting or embellishing beyond the source).
- Length and substance are perpetually in tension; you want to minimize the former while maximizing the latter.
Here’s what we’ll cover next.
I’ve ably prepared you for being able to leverage these summary-making precepts when using generative AI, especially by invoking a Chain-of-Density prompt engineering approach. I will explain what Chain-of-Density consists of. Examples will be shown.
Before I dive into my in-depth exploration of this vital topic, let’s make sure we are all on the same page when it comes to the foundations of prompt engineering and generative AI. Doing so will put us all on an even keel.
Prompt Engineering Is A Cornerstone For Generative AI
As a quick backgrounder, prompt engineering, also referred to as prompt design, is a rapidly evolving realm and is vital to effectively and efficiently using generative AI, i.e., the use of large language models (LLMs). Anyone using generative AI such as the widely and wildly popular ChatGPT by AI maker OpenAI, or akin AI such as GPT-4 (OpenAI), Bard (Google), Claude 2 (Anthropic), etc. ought to be paying close attention to the latest innovations for crafting viable and pragmatic prompts.
For those of you interested in prompt engineering or prompt design, I’ve been doing an ongoing series of insightful explorations on the latest in this expanding and evolving realm, including this coverage:
Anyone stridently interested in prompt engineering and improving their results when using generative AI ought to be familiar with those notable techniques.
Moving on, here’s a bold statement that pretty much has become a veritable golden rule these days: the prompt you enter can make or break what the generative AI produces.
If you provide a prompt that is poorly composed, the odds are that the generative AI will wander all over the map and you won’t get anything demonstrative related to your inquiry. Being demonstrably specific can be advantageous, but even that can confound or otherwise fail to get you the results you are seeking. A wide variety of cheat sheets and training courses for suitable ways to compose and utilize prompts has been rapidly entering the marketplace to try and help people leverage generative AI soundly. In addition, add-ons to generative AI have been devised to aid you when trying to come up with prudent prompts, see my coverage at the link here.
AI Ethics and AI Law also stridently enter into the prompt engineering domain. For example, whatever prompt you opt to compose can directly or inadvertently elicit or foster the potential of generative AI to produce essays and interactions that imbue untoward biases, errors, falsehoods, glitches, and even so-called AI hallucinations (I do not favor the catchphrase of AI hallucinations, though it has admittedly tremendous stickiness in the media; here’s my take on AI hallucinations at the link here).
There is also a marked chance that we will ultimately see lawmakers come to the fore on these matters, possibly devising and putting in place new laws or regulations to try and scope and curtail misuses of generative AI. Regarding prompt engineering, there are likely going to be heated debates over putting boundaries around the kinds of prompts you can use. This might include requiring AI makers to filter and prevent certain presumed inappropriate or unsuitable prompts, a cringe-worthy issue for some that borders on free speech considerations. For my ongoing coverage of these types of AI Ethics and AI Law issues, see the link here and the link here, just to name a few.
With the above as an overarching perspective, we are ready to jump into today’s discussion.
Using Generative AI Prompting To Get Summaries Generated
Making summaries in generative AI is easy-peasy.
You can use a prompt as simple as this to do so (the exact wording is merely illustrative; any similar phrasing will do):
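“Please provide a summary of the following article.”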
At that juncture, you would either directly include the article in the same prompt, or you could hit a return and the generative AI would likely say something like it is ready to summarize the article and please go ahead and provide the article in your next prompt.
Voila, shortly thereafter you will have a nice new gleaming summary that has been generated by the AI app.
I must caution you though that as I have repeatedly noted in my training classes about generative AI and prompt engineering, the results coming out of generative AI are like a box of chocolates. You never know what you might get.
A summary generated by the AI could be amazing and spot-on. That is the happy face scenario. The summary might be atrocious and barely a summary of any value. That is the sad face scenario. The good news is that most of the time the odds are that the summary will be relatively well done. Summarizing is an intrinsic capability of most generative AI apps and exploits the impressive pattern-matching computational facilities therein.
If you don’t like the summary or believe it could use some additional punching up, you can merely say so in your subsequent prompts. You tell the AI that perhaps the summary is not long enough. Or maybe the summary is overly long. The summary might be bereft of substance from the source of the summary. And so on.
The AI app won’t complain. No whining will usually occur. The generative AI will comply and redo the summary. This can occur as much as you like. Unlike when dealing with a human who might have written a summary, you can endlessly prod and poke about revising the summary when using generative AI.
How can you judge a summary?
A common and obvious approach is to read the source material and compare it to the summary. You would want to see that whatever you consider to be significant was carried over into the summary. Another encompassed facet would be whether the carryover was faithful or opted to embellish or change up the meaning of the source.
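If you want something a bit more systematic than eyeballing, one simple heuristic is entity recall, namely the fraction of the source’s entities that survived into the summary. Here is a toy sketch using the same crude entity proxy as earlier; this is only a heuristic and says nothing about faithfulness versus embellishment.

```python
import re

def crude_entities(text: str) -> set[str]:
    # Same rough proxy as in the earlier density sketch.
    return set(re.findall(r"[A-Z][a-z]+|\d+", text))

def entity_recall(source: str, summary: str) -> float:
    # Fraction of the source's (crudely detected) entities that made
    # it into the summary; 1.0 would mean a full carryover.
    src = crude_entities(source)
    return len(src & crude_entities(summary)) / max(len(src), 1)
```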
One confusion that sometimes gets in the way of assessing a summary is the matter of summarization versus simplification. Do not unduly equate those two. A summary doesn’t necessarily have to be a simplification. It could be that whatever complexity existed in the source is going to also come across in the summary. Simplification is a type of transformation involving simplifying one thing to be more readily accessible or understandable. A summary doesn’t have to be a simplification.
If you want the summary to be simplified, you will usually need to ask for that to be undertaken. Remember though that I said that the generative AI is like a box of chocolates, such that the AI might do a simplification as part of the summarization. You might not have asked for a simplification outright. Nonetheless, the AI opted to go that path.
All right, you probably already realized that generative AI by default has the capability to generate summaries and usually does a reasonably sound job in doing so. There is a chance that you might need to finesse things and do a series of prompts to guide the AI toward a summary that meets your needs.
Seems like that is the end of the story.
But you would be mistaken in believing so.
We can try to ramp up the summary capabilities of generative AI. Let’s take the usual ad hoc means of doing so and turn it into something systematic and reusable. A devoted technique would be greatly advantageous for your prompt engineering skillset and can improve the odds of getting consistently buffo summaries.
In a recent research paper entitled “From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting” by Griffin Adams, Alexander R. Fabbri, Faisal Ladhak, Eric Lehman, and Noémie Elhadad, posted online on September 8, 2023, the researchers present a new technique they have coined as Chain-of-Density (CoD).
Here are some salient excerpts from the research paper:
The researchers opted to use GPT-4. That being said, just about any generative AI app can similarly be used. In a moment, I will be showing you examples based on using ChatGPT. The fundamentals of the technique remain about the same.
In the research paper, the research structure consisted of summarizing articles of a general nature that were culled from a news database:
A few points are worth noting about the above.
First, they reviewed the generated summaries to try and assess whether the technique derived better summaries than a conventional vanilla prompt (similar to the prompt I showed you in the prior subsection herein), and whether they were as good as human-derived summaries. They conclude that indeed the CoD technique performed well.
As an aside, make sure to read the details of the study if you want to see how they did those reviews. Any experimental setting can impact how outcomes arise and if you are thinking of doing akin experiments you might find it useful to explore what approach this research opted to undertake.
Second, they ran into the usual conundrum about summarization whereby there are tradeoffs between informational compactness and readability. In short, you might get yourself a wicked summary but it is so jampacked that humans reading the summary are left with a foul taste in their mouths. A summary can lose steam if it is at the extremes of density.
Third, for those of you who want to do similar research, the researchers kindly have put together a set of annotated CoD summaries and unannotated summaries for you to freely make use of. One of the biggest hurdles for doing generative AI research involves collecting data for your experimentation. Having a ready-made dataset can speed up the research effort, reduce costs, and allow for replicated studies.
How does the Chain-of-Density technique work?
They used a chain or series of iterative summaries that are launched by a prompt and the generative AI is told to incrementally or iteratively improve or make denser each summary based on stipulations given by the prompt. I trust that sounds familiar as per my earlier discussion on such matters.
I will show the prompt in a second.
I know you are eager to see it.
Just first a quick overview by the researchers in their paper about the density aspects:
And, as stated in the paper, they went the route of starting with a sparse summary and having it iteratively infused with more and more entities:
Note that they opted to keep the length of the summary as a static size. This in a sense forces the AI app to stay within those stipulated bounds. If you want to use a three-pound bag, you make it so, and the AI app is not to try and be tricky by sneakily increasing the size. I will revisit this assumption later on.
We shall now take a look at the prompt they used.
I applaud researchers who show their prompts. If a study doesn’t reveal the prompts used, we are left in the dark. We have no means to judiciously weigh the results that the experimenters produced. Furthermore, the lack of showing prompts leaves practitioners in the lurch since they have nothing tangible to try and incorporate into their prompt engineering repertoire.
Their utterly generic prompt that was used as a basis for comparison was this:
As is typical, the prompt asks that a summary be produced, emphasizing that it should be very short. Be forewarned that saying something vague such as being a very short summary is going to get you all kinds of wild variations in size. In this case, they immediately stated that the size of the summary should not exceed 70 words. That’s what they wanted in this particular setting.
Next, their Chain-of-Density prompt consisted of two major parts. One part describes the iterative chaining process. The second part describes the guidelines that they want the generative AI to abide by.
Here is the first part of their CoD prompt:
I hope you can discern how this prompt tells the AI app to do a series of iterations when producing a summary. In this instance, they have said that they want two steps to be undertaken, doing so each time for a total of five iterations. The two steps consist of identifying some entities within the article that are not yet in the summary. They asked to find one to three such entities each time. The second step involves putting those “missing entities” into the summary.
To clarify what kinds of entities they want the AI to find, they refer to them as missing entities, meaning entities that are not yet in the iteratively produced summary. Those entities must meet the requirements of being relevant, specific, novel, and faithful, and they can appear anywhere in the source being summarized.
They also provide these guidelines as part of the prompt given to the AI app:
Note that the guidelines stipulate that the initial summary should be about 80 words in size and consist of four to five sentences. This is how they did their experiment. You can of course make use of other parameters as suitable for a given summary situation at hand.
The guidelines also direct the AI app to be careful and make every word count. The AI is told to aim for a high density. This is a prudent indication.
An especially restrictive guideline is that entities cannot be dropped out of subsequent summaries during the iterative process. You can say that this is good because it makes sure that things don’t disappear throughout the iterations. You can also say that this might be a bit undermining if there is a chance that better entities could be fit into the summary that might now have less preferable entities as a carryover from a prior iteration (you might be willing to drop out lesser entities for greater entities, rather than being forced into a legacy carryover based on earlier guesses).
This is a key tradeoff of how to conduct summarization.
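To tie the two parts together in runnable form, here is a minimal sketch of packaging a CoD-style prompt in Python. Be aware that the prompt text is my paraphrase of the structure just described, not the paper’s verbatim wording, and the OpenAI client call is merely one wiring example; any comparable generative AI API would do.

```python
# A sketch of a CoD-style prompt based on the structure described above
# (a paraphrase, not the paper's verbatim wording). Assumes the OpenAI
# Python client (pip install openai) with OPENAI_API_KEY set.
from openai import OpenAI

COD_PROMPT = """Article: {article}

You will write increasingly dense summaries of the above article.
Repeat the following two steps {iterations} times:
Step 1: Identify 1-3 informative entities from the article that are
missing from the previously generated summary. A missing entity is
relevant, specific, novel, faithful to the article, and may appear
anywhere in the article.
Step 2: Write a new, denser summary of identical length that covers
every entity from the previous summary plus the missing entities.

Guidelines:
- The first summary should be about {word_limit} words (4-5 sentences).
- Make every word count and aim for high density.
- Never drop entities that appeared in an earlier summary.
- Show each of the {iterations} summaries as you go."""

def chain_of_density(article: str, iterations: int = 5, word_limit: int = 80) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = COD_PROMPT.format(article=article,
                               iterations=iterations,
                               word_limit=word_limit)
    response = client.chat.completions.create(
        model="gpt-4",  # the paper used GPT-4; swap in your preferred model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```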
Devising Your Prompting Template For Summarization
I made use of the CoD-style prompt via a series of ad hoc experiments using ChatGPT for doing akin summarizations. They worked out pretty well. I’ll show you some in the next section.
By the way, I didn’t use precisely the same wording and decided to play with things to see what impact different wording might have. There is a lot of flexibility in how you might word such instructions. Also, keep in mind that every generative AI app might react differently to a given prompt. Even the same generative AI app can react differently since there is a probabilistic and statistical variation embedded into the computational pattern-matching mechanisms.
Here are ten crucial parameters that I came up with; you can choose among them as desired when undertaking this style of prompting strategy:
Those last three of my above-listed parameters have to do with telling the generative AI to showcase each iterated summary, explain the basis, and do a self-rating of each iterated summary.
I explicitly asked for those details so that I could gauge the impact of playing with the various other parameters. On a daily basis, I doubt you would want all of that added verbiage; it would seem overly verbose. I did find the added indications quite telling and valuable when first determining how to best use this approach.
You might want to try the same.
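As a sketch of how a handful of those knobs might be parameterized in code (the parameter names here are my own invention, purely illustrative):

```python
# Illustrative only: a prompt builder exposing a few of the knobs
# discussed above. Parameter names are my own, not a standard.
def build_cod_prompt(article: str,
                     iterations: int = 5,
                     entities_per_pass: str = "1-3",
                     word_limit: int = 80,
                     show_each_summary: bool = True,
                     explain_basis: bool = False,
                     self_rate: bool = False) -> str:
    instructions = [
        f"Article: {article}",
        "",
        f"Write increasingly dense summaries of the above article. "
        f"Repeat {iterations} times these two steps: (1) identify "
        f"{entities_per_pass} informative entities from the article "
        f"missing from the prior summary; (2) rewrite the summary at "
        f"an identical length of about {word_limit} words to include "
        f"them, never dropping earlier entities.",
    ]
    if show_each_summary:
        instructions.append("Show each iterated summary as you go.")
    if explain_basis:
        instructions.append("Briefly explain the basis for each revision.")
    if self_rate:
        instructions.append("Rate each iterated summary and explain the rating.")
    return "\n".join(instructions)
```

You might flip the last three flags on while first tuning your approach, then turn them off for daily use once you have the verbosity you want.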
Setting The Stage For A Close Look At Chain-of-Density
I thought you might like to see some examples associated with using the Chain-of-Density approach.
Am I right?
I hope so.
Furthermore, I decided that some plainly simple legal-domain examples might be interesting, informative, and viable to use when exploring the Chain-of-Density prompting technique.
Before we get into the weeds, please realize that the research study that I’ve been discussing involved the summarization of general news articles. That is a suitable selection since generative AI is principally data-trained across the board and not especially honed to a specific domain. You would be safest to stick with general topics and not try to use conventional generative AI for domain-specific topics unless you’ve done something of a customized nature to try and get the AI up-to-speed in that desired domain.
I have previously shown my honing of generative AI to the legal domain, see the link here and the link here, just to name a few such analyses regarding AI applied to the legal realm. Let me say this in the loudest and clearest of terms — you should be extremely cautious when trying to apply generative AI to specific domains for which the generative AI has not had additional dedicated data training accordingly. Two lawyers found out about this the hard way when using conventional ChatGPT for legal tasks and they got into quite hot water for doing so (see my analysis at the link here).
Anyway, here’s what I did for this CoD exploration.
I wanted to find some data that was already readily available and that had a legalese element to it. Right away, I thought of the now classic paper from long ago (the year 2019, which in AI years is a near lifetime!), entitled “Plain English Summarization of Contracts” by Laura Manor and Junyi Jessy Li, Proceedings of the Natural Legal Language Processing Workshop 2019, Association for Computational Linguistics.
They examined licensing agreements that you sign up for or that you automatically accept whenever you visit various websites or play online games. I would venture that almost no one actually reads those licensing agreements. You ought to. The problem is that you are agreeing to things without even knowing what you’ve agreed to do, or not do. You are spinning the roulette wheel, hoping that there isn’t something in the licensing that is going to get you into trouble. Sheepishly, shamefully, we all do it. We are all at risk.
Maybe there is a light at the end of that tunnel.
Suppose that the legalese could be summarized in a manner that would be easier for you to comprehend. The idea is that people might pay attention to licensing agreements and be more circumspect if the often voluminous and legally imposing narratives were summarized and perhaps translated into plain language.
The study by these authors sought to craft a dataset of licensing agreements along with human-derived summaries. Researchers who wanted to subsequently test out generative AI or any kind of AI that might do summaries could readily make use of the dataset.
That’s me!
Here is what the authors indicated they did (my selected excerpts):
I tried out several snippets of licensing agreements or terms of service, along with the human-derived best summary included. I aimed to use ChatGPT to do a Chain-of-Density summarization, playing with variants of the prompting technique, and do so by summarizing the licensing agreement snippets.
A basis for comparison to what ChatGPT had to say could be made to the human-derived best summary in the dataset. Plus, I used my own noggin to do the comparisons too.
I only have space here in today’s column to cover one such example. I am working on a potential follow-up encompassing a more detailed exposition, so keep your eyes out for that later coverage. Let’s focus here on one notably intriguing and useful example.
This is an original snippet of a licensing agreement as made available by the authors:
Mull that over.
Here is the human-derived summary that was obtained:
The human-derived summary is certainly short and seemingly in plain language. But, it is also rather wanting, if you give it a close look.
I’ll explain some key problems with it.
You might be tempted to proclaim that the summary is admirably short, coming in at around 20 words in size versus the 4x larger sized 80+ words of the source, and thus there is only so much room to squeeze in things if you want to be succinct. However, a summary is going to be problematic if it omits crucial elements (entities) or potentially misstates or misinterprets what is indicated in the original (extractive versus abstractive). This is especially so if we could get those points included or straightened out, and if doing so could still be done rather succinctly (either in the same 20 words or nearly in that same range).
Here are some of the particularly worrying concerns about this human-derived summary:
I picked this example because it has some prominent lessons to be learned.
First, just because a human does a summary doesn’t mean that the summary will be any good or perhaps not the best that the summary could potentially be. I mention this due to the likely retort by some that you should always use a human to devise a summary rather than AI, believing that the human will always do a better job. That is not necessarily the case.
Second, it might be prudent to consider using generative AI to do a summary and then have a human refine the summary. The advantage is that the human is potentially going to expend much less effort than having to do a summary from scratch. That won’t always be the case because it could be that the AI-devised summary is totally off-base and the human will be doing more work than if they had started with a blank slate. I would dare say that a reasonably good generative AI is likely to produce a reasonably good summary and thus not require a human refiner to overwork the result.
Third, in a domain such as the law, trying to summarize legalese is fraught with dangers. You can readily omit something of a legally important effect. You can misstate something. The person relying on the summary is taking a leap of faith that the summary is complete and correct. The famous line about consulting with an attorney is indeed the sensible thing to do whenever a layperson is trying to figure out a legal matter, even in the case of licensing agreements.
You might find of interest that OpenAI squarely warns you to not use ChatGPT or GPT-4 for seeking legal advice and that you should consult with a human attorney, see my coverage on this aspect at the link here. A rather zany fad that has somewhat appeared regarding ChatGPT and other AI apps consists of people who use generative AI to produce legal-looking documents to try and intimidate others into thinking that an attorney has been consulted, see my discussion at the link here. Ugh.
Returning to the CoD prompt technique, I used the above licensing passage as a means of data exploring the Chain-of-Density prompting approach. Let’s see what we can get generative AI to do on this. Can we get the AI to do a better job? Or, will the AI fall down on this summarization task and do worse than the human-derived summary?
Place your bets and get yourself ready for a fun time at the roulette table.
First, I asked ChatGPT to summarize the licensing passage. Keep in mind that I did so with a purely vanilla prompt that had no specific instructions or guidelines, and here’s what I got:
I’d say this was a dud or at least a letdown of a summary.
The size in words is nearly the same as the source material. To some degree, you could also argue that the summary proffers somewhat of a simplification of the source, though I did not explicitly ask for a simplification. All in all, we can applaud the AI app for having complied with the request, though the summary is not especially fruitful.
I should bring up an allied factor. In this case, the source passage is only about 80 words in size. When using much larger narratives, the summary of a straight-ahead nature might be more useful. For example, I tried a different licensing snippet of about 500 words and got a summary of about 100 words. I found that summary useful since it covered the material in one-fifth of the space.
The size of the source matters heavily when considering summarizations. I will also add that you might not be able to summarize rather lengthy source material. Most of the generative AI apps have size limitations known as context length constraints. You cannot just feed an entire encyclopedia into generative AI and ask for a summary. The length goes over what is currently permitted. There are tricks to cope with this, and you can expect that ongoing advancements to generative AI will increase the size limits, see my coverage at the link here.
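One common workaround for context length constraints (a widely used pattern sometimes dubbed map-reduce summarization; this is not from the CoD paper) is to chunk the source, summarize each chunk, and then summarize the stitched-together partial summaries, possibly with a CoD-style final pass. A rough sketch, where summarize() is a hypothetical stand-in for your AI call:

```python
# Rough sketch of chunking to cope with context length constraints.
# summarize() is a hypothetical stand-in wired to your generative AI app.
def summarize(text: str) -> str:
    raise NotImplementedError("Wire this to your generative AI app")

def summarize_long_source(source: str, chunk_words: int = 2000) -> str:
    words = source.split()
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(0, len(words), chunk_words)]
    partials = [summarize(chunk) for chunk in chunks]
    # Final pass (vanilla or CoD-style) over the stitched partial summaries.
    return summarize(" ".join(partials))
```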
Getting back to the matter at hand, I told ChatGPT to do a tighter job on the summary of the licensing agreement passage. I didn’t fuss about the length. I merely asked to have the summary made tighter and shorter (no specification of the size that I wanted).
Here’s what I got:
I kind of liked this summary that ChatGPT derived. The length is about 40 words, so roughly half the size of the source. The summary includes the things that I mentioned earlier that were missing in the human-derived summary. I would rate this summary better than the human-derived one.
Can we do better?
Let’s try the Chain-of-Density technique.
I decided to make things “fair” by putting the size limit of the summaries at 40 words, ergo matching the above version that was derived by ChatGPT without any indication by me regarding the size. This will allow an apples-to-apples comparison. I also started a new conversation so that the prior effort to do the summary of the passage would not get mingled into the CoD directives.
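In code terms, that amounts to rerunning the earlier chain_of_density() sketch with tighter settings (illustrative only; license_snippet is a hypothetical variable holding the passage):

```python
# Tighter, apples-to-apples settings for the licensing passage.
iterated = chain_of_density(license_snippet, iterations=5, word_limit=40)
print(iterated)
```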
My prompt also asked that ChatGPT show each of the iterated summaries. Here then are the five iterated summaries:
The first summary is labeled as Summary 1 and would be the summary that is the first shot at doing a CoD series of summaries. You’ll notice that this initial summary contains the lingo of “this article discusses” which was part of the templated instructions for the CoD prompt.
Further, note that the first summary has omitted the aspects about the Trainer Guidelines and the Privacy Policy. This is what the human-derived summary did too. In the case of the generative AI, it was essentially following orders and had been limited to just one to three entities for the first round.
The second summary, labeled as Summary 2, does include those entities. All in all, this second summary seems pretty good.
The third summary almost seemed to reach for a bit of tomfoolery. The flavorful question about using the services was not part of the source and seems to be an attempt to craft a more engaging summary. The fourth summary goes back to the roots and seems akin to the second summary, though the wording doesn’t flow as readily. Finally, the fifth summary is okay, but I still personally prefer the second summary.
You might find of interest that I had ChatGPT do a self-rating of the summaries that the AI app produced, and here’s what the response was:
I agree that the first summary deserved a 7 out of 10 (this was a rating scale concocted by ChatGPT, which I had left open-ended for ChatGPT to ascertain). The second summary got an 8, though I would suggest it is a 9 if the others that follow are also 9s. Maybe we can agree to disagree on this, me and ChatGPT.
I do give ChatGPT credit for not declaring that the final summary was a 10. This could happen in the sense that just as humans might overinflate their work, we can expect that ChatGPT might do the same. This is not due to sentience. It is due to pattern-matching on vast amounts of online text for which humans do that kind of puffery all the time.
A few quick lessons from this are that sometimes a summary can be in the eye of the beholder. There is a point at which a summary rearranges items but does not particularly enhance the summary itself. I believe this might have occurred with my request in this case.
Another lesson is that since the source had so few entities to play with, there is only so much that can be done to derive a summary. In my longer experiments that consisted of hundreds and even thousands of words, there was much more to be dealt with. This, in turn, radically impacts the nature and quality of the summary produced.
My example with the brief licensing agreement passage is quite short in size and sparse in the number of entities contained within. I would like to show you much longer examples, but today’s column is already at its size limit. As mentioned earlier, if readers express interest, I can do a follow-up showcasing larger examples that are more robust.
I am particularly pursuing the CoD as an instrumental approach in the legal domain. This appears to have especially worthwhile benefits for legal professionals. Others have noticed this too.
Esteemed industry and scholarly colleague, Dazza Greenwood, founder of law.MIT.edu (research) and CIVICS.com (consultancy), recently posted online this notable insight:
I wholeheartedly concur.
Conclusion
Time now to do a recap and provide final comments. I assuredly recommend that you include Chain-of-Density in your prompt engineering skillset. That’s the bottom line regarding the value of the technique.
I give it two thumbs up.
Play around with the capability. Be ready to use it when the situation seems suitable to do so. Don’t wait until the last minute and struggle with this latest prompting technique at the time of need. Do your homework beforehand (no last-minute essay summaries about Abraham Lincoln).
In one sense, the CoD is an elephant gun. You should use this technique on larger-sized source materials and when the volume of entities is relatively high. For smaller-sized source materials, you can just use the plain vanilla summary, followed by a few additional clarification or refinement prompts. No need to shoot an ant with an elephant gun. The same goes for a larger source that has very few entities.
A clever way to use Chain-of-Density would be to improve a prior summary that someone handed to you. There is a chance that the iterative process will make the summary tighter and better. A downside is that if you don’t have the source that was used for that summary, the iterations have no place from which to pull missing entities, and you will quickly reach a point of diminishing returns. Also, and perhaps obviously, if you do have the source, you probably don’t need the other summary anyway, unless you want to do a comparison and possibly have that aid the summarization process.
One notable concern that you should always have at the top of your mind entails whether the generative AI might encounter an AI hallucination, error, falsehood, bias, glitch, or other malady when trying to generate a summary. The chances of this multi-step process possibly treading into an AI landmine are heightened due to the length of the process, though this is tempered by the seeming fact that when you get generative AI to do chains, such as Chain-of-Thought (see my analysis at the link here), this often seems to keep away the ghosts and goblins. A tradeoff might be occurring internally as the pattern-matching in a mathematical manner becomes more methodical yet also is undertaking more steps than usual.
A final remark for now, to give you some reasoned reflective thought.
Here’s the deal. I have started using a similar directed iterative approach in other situations of prompting settings. My claim is that you can use the overarching conception of doing self-improvement iterations in circumstances beyond those of a summarization task. We already know that Chain-of-Thought is handy, along with Skeleton-of-Thought, and other stepwise directives. The Chain-of-Density is similar, though adds some niceties about how to incrementally make improvements.
I’m a mixer and matcher when it comes to prompt engineering.
Be on the lookout for an upcoming column that brings together a slew of other well-known prompting strategies. You either will admire their synergy and beauty in unison, or some might recoil at the Frankenstein appearance (well, don’t let the looks fool you, there are a lot of combinatorial benefits to be had).
That’s about the end of today’s story.
So, in a summary of about 25 words, the gist is this: Chain-of-Density is great for getting generative AI to do impressive summarization, possibly applicable in other settings too, so use the technique wisely and with aplomb.
Enough said.