


From uninvited summaries at the top of your search results to offers to write your emails and help students with their homework, generative A.I. is quickly becoming part of daily life as tech giants race to develop the most advanced models and attract users.
All those prompts come with an environmental cost: A report last year from the Energy Department found A.I. could help increase the portion of the nation’s electricity supply consumed by data centers from 4.4 percent to 12 percent by 2028. To meet this demand, some power plants are expected to burn more coal and natural gas.
And some chatbots are linked to more greenhouse gas emissions than others. A study published Thursday in the journal Frontiers in Communication analyzed different generative A.I. chatbots’ capabilities and the planet-warming emissions generated by running them. Researchers found that chatbots with bigger “brains” used exponentially more energy and answered questions more accurately, but only up to a point.
“We don’t always need the biggest, most heavily trained model to answer simple questions. Smaller models are also capable of doing specific things well,” said Maximilian Dauner, a Ph.D. student at the Munich University of Applied Sciences and lead author of the paper. “The goal should be to pick the right model for the right task.”
The study evaluated 14 large language models, a common form of generative A.I. often referred to by the acronym LLMs, by asking each a set of 500 multiple-choice and 500 free-response questions across five subjects. Mr. Dauner then measured the energy used to run each model and converted the results into carbon dioxide equivalents based on global averages.

In most of the models tested, questions in logic-based subjects, like abstract algebra, produced the longest answers, which likely means they used more energy to generate than questions in fact-based subjects, like history, Mr. Dauner said.
A.I. chatbots that show their step-by-step reasoning while responding tend to use far more energy per question than chatbots that don’t. The five reasoning models tested in the study did not answer questions much more accurately than the nine other models studied. The model that emitted the most, DeepSeek-R1, offered answers of comparable accuracy to models that generated a quarter of its emissions.