Latest News on our Bot or Not Research
We have found several ways to unmask GenAI bots that may be pretending to be humans. One way is to ask a GenAI bot two questions in succession whose answers should add up to one, for example the probabilities of an event happening and of it not happening. When humans take this challenge, they show, on average, biases that depend on how the questions are asked. We found that the GenAI bots we have tested don’t have human-style biases, but they do have a slight bias: on average, their answers to each question pair add up to about 1.04.
So far, we have tested two biases that affect whether answers add up to one: framing and refocusing. More here on both of these effects —>
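The additivity check itself is simple arithmetic. Below is a minimal Python sketch, assuming each test item is recorded as a pair of probabilities that should be complementary. The answer pairs are hypothetical, picked only to illustrate what an average pair sum of about 1.04 looks like; they are not our data.

```python
from statistics import mean

# Hypothetical answer pairs (p_event, p_not_event), for illustration only.
# For a perfectly coherent respondent, each pair sums to exactly 1.0.
PAIRS = [(0.60, 0.45), (0.30, 0.72), (0.85, 0.20), (0.50, 0.54)]


def pair_sums(pairs):
    """Return the sum of each pair of supposedly complementary probabilities."""
    return [p + q for p, q in pairs]


def mean_pair_sum(pairs):
    """Average pair sum; 1.0 means no systematic additivity bias."""
    return mean(pair_sums(pairs))


def mean_signed_error(pairs):
    """Average signed deviation from 1.0 (positive means sums run high)."""
    return mean(s - 1.0 for s in pair_sums(pairs))


if __name__ == "__main__":
    print(f"mean pair sum: {mean_pair_sum(PAIRS):.2f}")
    print(f"mean signed error: {mean_signed_error(PAIRS):+.2f}")
```

Computing the mean pair sum separately for differently framed versions of the same question pair is one way to look for framing effects.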
We have also found that when asked to make forecasts, the GenAI bots we have tested score well on many measures of integrative complexity but almost always score zero on dialectical integration. We used the AutoIC tool to discover this. On the other hand, when specifically asked to solve problems using integrative complexity, GenAIs such as ChatGPT can score approximately as well as humans, as measured by the AutoIC tool.
See more here —>
Just maybe, our research will contribute to saving us from the Fermi Paradox, also known as the “Where Is Everybody Else?” question.
More on our Bot or Not research —>
Bestworldbot Has Been Retired
Jeremy had been fielding bestworldbot in the Q4 AI Benchmark Tournament, which began Oct. 8, 2024 and ended Dec. 31. For 12 days in a row, it was #2. By the last day of the tournament, Dec. 31, it had slid to #7. Yet by Jan. 10, with scoring nearly completed, it had fallen to #37 out of 44 competitors. We are examining the data to determine what happened and why. Clearly, bestworldbot was poor at forecasting questions that resolved No, because those were the only questions scored after Dec. 31. By contrast, it was excellent at forecasting most of the questions that resolved Yes. But why? Further research is needed.
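One way to see how a bot can look excellent on questions that resolve Yes and poor on questions that resolve No is to score the two groups separately. The minimal Python sketch below uses the Brier score as a stand-in metric (not necessarily how Metaculus scores its tournaments) and hypothetical forecasts from a bot that leans toward Yes.

```python
# Hypothetical (probability of Yes, outcome) pairs; outcome 1 = Yes, 0 = No.
FORECASTS = [(0.80, 1), (0.90, 1), (0.70, 1), (0.60, 0), (0.75, 0)]


def brier(p, outcome):
    """Brier score for one binary forecast: (p - outcome)^2; lower is better."""
    return (p - outcome) ** 2


def brier_by_outcome(forecasts):
    """Average Brier score separately for questions resolving Yes (1) and No (0)."""
    groups = {1: [], 0: []}
    for p, outcome in forecasts:
        groups[outcome].append(brier(p, outcome))
    # Skip empty groups to avoid dividing by zero.
    return {o: sum(s) / len(s) for o, s in groups.items() if s}


if __name__ == "__main__":
    for outcome, score in brier_by_outcome(FORECASTS).items():
        label = "Yes" if outcome == 1 else "No"
        print(f"{label}: mean Brier {score:.3f}")
```

A bot that leans toward Yes scores well while Yes resolutions dominate and then drops sharply once a batch of No resolutions arrives. That is consistent with the pattern we saw, though it does not by itself explain it.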
Our next step with bestworldbot is to use its Metaculus data, along with the 000_bot data created by Lars Ericson, to further examine our hypothesis that measurements of integrative complexity can readily distinguish between GenAI bots and humans.
Click here for our preliminary results.
We also have integrative complexity results for forecasting rationales written by a team of college graduates (Amazon MTurk Prime workers) in the 2019 Hybrid Forecasting Competition. These results substantiated our hypothesis that they used true reasoning in the rationales they wrote for that competition.
We also have AutoIC results for the National Security Estimates written by participants in US National Security Council meetings in 1960-1961. These show strong results on all measures of integrative complexity. However, the participants were poor at aggregating probabilities, as the resulting Bay of Pigs debacle showed.
Retired: At the end of Q3 2024, Jeremy’s bestworldbot finished #53 out of 55 competitors, down from #17 on Sept. 10. A likely explanation for bestworldbot’s collapse on the leaderboard is that in early September we began extremizing its forecasts: we pushed probabilities below 50% lower and probabilities above 50% higher. We did this according to a formula (Mellers) shown to work well on humans. It turned out that bestworldbot isn’t like an average human, because extremizing made it worse.
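As a minimal sketch, the Python below assumes a common form of the extremizing transform, p^a / (p^a + (1 - p)^a) with a > 1, which is not necessarily the exact Mellers formula we used; a = 2 is an arbitrary choice. It shows why extremizing cuts both ways: it improves the Brier score of a forecast on the right side of 50% but worsens one on the wrong side.

```python
def extremize(p, a=2.0):
    """Push probability p away from 0.5 via p**a / (p**a + (1 - p)**a).

    a > 1 extremizes and a == 1 leaves p unchanged. The exponent is an
    illustrative assumption, not the parameter bestworldbot actually used.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    num = p ** a
    return num / (num + (1.0 - p) ** a)


def brier(p, outcome):
    """Brier score for one binary forecast (lower is better)."""
    return (p - outcome) ** 2


if __name__ == "__main__":
    p = 0.70
    q = extremize(p)  # roughly 0.84
    for outcome in (1, 0):
        print(f"outcome={outcome}: raw {brier(p, outcome):.3f}, "
              f"extremized {brier(q, outcome):.3f}")
```

If many of a bot’s forecasts land on the wrong side of 50%, or are already overconfident, extremizing amplifies those errors, which is one possible reason it hurt bestworldbot.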
Metaculus’s Q2 AI Forecasting Benchmark Tournament Has Launched
You can still join. Have fun, and maybe win a chunk of its $120,000 prize pot. Metaculus offers instructions on how to build your own bot.
Our Botmaster, Jeremy Lichtman, is running a new bot, jlbot, in this competition. Phil Godzin, who has joined Jeremy in our side competition with the VIEWS competition, also has a bot in this Q2 competition, pgodzinai. Once the Q2 leaderboard goes live, we will begin posting it on BestWorld as a time series.
Metaculus’ Q1 AI Forecasting Benchmark Tournament
This tournament has ended. From its website: “This is the 3rd tournament in our $120,000 series designed to benchmark AI forecasting capabilities against top human forecasters on complex, real-world questions.” Recently, Metaculus completed its Q4 analysis and found that humans beat the bots! But only just barely. We don’t yet have humans-vs.-bots results from its recently completed Q1 competition.
The Q2 competition began April 21, 2025.
Thanks to the technical help Metaculus offers, anybody can build a bot and play. Have fun, win money!
BestWorld’s Carolyn Meinel has a research agreement with Metaculus to use data generated in this competition. This data will help to inform our Bot or Not research.
Question retired:
“Given the agreement of the US International Longshoremen’s Association (ILA) to salary increases, both union and the port returned to the bargaining table on Jan. 15, 2025 to discuss automation and other issues. What’s the probability of a strike in Q1 2025?” Result: no strike; the parties reached a final agreement. Botmaster Jeremy and Carolyn Meinel both kept saying the Multi-AI Oracle’s forecast was too high. So we humans won.
Another bot retired: the Multi-AI Panel bot that Jeremy fielded in our first Bots vs Humans Competition. He began with four generative AIs and later expanded to five: Perplexity, Claude, Mistral, Cohere, and OpenAI. These bots forecast just one question through September 16, 2024: “What is the probability that the US Federal Reserve Board will cut interest rates in September 2024?”
We humans beat the bot!
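A panel bot has to combine its members’ forecasts into one number. The minimal Python sketch below assumes a simple median, with the mean as an alternative. This aggregation rule is an illustrative assumption, not a description of how the Multi-AI Panel actually combined its answers, and the per-model probabilities are hypothetical. No model APIs are called; the model names are just dictionary keys.

```python
from statistics import mean, median

# Hypothetical probabilities from each panel member for one yes/no question.
PANEL = {
    "Perplexity": 0.70,
    "Claude": 0.65,
    "Mistral": 0.95,
    "Cohere": 0.60,
    "OpenAI": 0.75,
}


def aggregate(forecasts, method="median"):
    """Combine panel members' probabilities into a single forecast."""
    values = list(forecasts.values())
    if not values:
        raise ValueError("panel is empty")
    if method == "median":
        return median(values)  # robust to one outlying member
    if method == "mean":
        return mean(values)
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    print(f"median: {aggregate(PANEL):.2f}")
    print(f"mean:   {aggregate(PANEL, 'mean'):.2f}")
```

With one member (the hypothetical Mistral value) far from the rest, the median stays at 0.70 while the mean moves to 0.73.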
See all our forecasts below:
Sept. 16, 2024
Sept. 13, 2024
Sept. 12, 2024
Sept. 11, 2024
Sept. 10, 2024
Sept. 9, 2024
Sept. 6, 2024
Sept. 5, 2024
Sept. 4, 2024
Sept. 3, 2024
Sept. 2, 2024
Aug. 30, 2024
Aug. 29, 2024
Aug. 28, 2024
Aug. 27, 2024
Aug. 26, 2024
Aug. 23, 2024
Aug. 22, 2024
Aug. 21, 2024
Aug. 20, 2024
Aug. 19, 2024
Aug. 16, 2024
Aug. 15, 2024
Aug. 14, 2024
Aug. 13, 2024
Aug. 12, 2024
Aug. 9, 2024
Aug. 8, 2024
Aug. 7, 2024
Aug. 6, 2024
Aug. 5, 2024
Aug. 2, 2024
Aug. 1, 2024
July 31, 2024
July 30, 2024
July 29, 2024
July 26, 2024
July 25, 2024
Results from Jeremy Lichtman’s current version of his Multi-AI Oracle
Our Chief Technology Officer and Botmaster, Jeremy Lichtman, is in a side competition with the VIEWS machine forecasting competition on the question: How many conflict deaths will there be in Sudan in 2025?
April 23, 2025, Jeremy’s Multi-AI Oracle is forecasting:
Model value:
* Less than 1000: 1%
* Between 1000 and 3000: 2%
* Between 3000 and 5000: 3%
* Between 5000 and 8000: 5%
* Between 8000 and 12000: 15%
* More than 12000: 74%
Jeremy’s Multi-AI Oracle is also forecasting: How many seats will the Conservative Party win in Canada’s April 28, 2025 parliamentary election?
April 23, 2025, it is predicting:
Model value:
* less than 172: 85%
* between 172 and 205: 12%
* between 206 and 240: 2%
* more than 240: 1%
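Like the paired questions in our Bot or Not tests, a binned forecast should add up to 100%. The short Python sketch below, using Jeremy’s April 23, 2025 numbers from the two lists above, checks that and computes a cumulative probability. The bin labels and helper names are ours, for illustration.

```python
# Jeremy's April 23, 2025 Multi-AI Oracle forecasts, in percent.
SUDAN_DEATHS = {
    "<1000": 1, "1000-3000": 2, "3000-5000": 3,
    "5000-8000": 5, "8000-12000": 15, ">12000": 74,
}
CANADA_SEATS = {"<172": 85, "172-205": 12, "206-240": 2, ">240": 1}


def check_total(bins):
    """Raise if a binned forecast does not sum to 100 percent."""
    total = sum(bins.values())
    if total != 100:
        raise ValueError(f"bins sum to {total}%, not 100%")
    return total


def prob_of(bins, labels):
    """Total probability (percent) assigned to the given bins."""
    return sum(bins[label] for label in labels)


if __name__ == "__main__":
    for name, bins in (("Sudan", SUDAN_DEATHS), ("Canada", CANADA_SEATS)):
        check_total(bins)
        print(f"{name}: bins sum to 100%")
    print("Sudan, 8000 or more deaths:",
          prob_of(SUDAN_DEATHS, ["8000-12000", ">12000"]), "%")
    print("Canada, 172 or more seats:",
          prob_of(CANADA_SEATS, ["172-205", "206-240", ">240"]), "%")
```

The 172-seat boundary is a majority of the 343-seat House of Commons, so the Canada sum is the Oracle’s probability of a Conservative majority.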
Forecasts from March 31, 2025 through today
Phil on Canada Elections, April 23, 2025
Phil on Sudan, April 23, 2025
Jeremy on Canada Elections, April 23, 2025
Jeremy on Sudan, April 23, 2025
Phil on Canada Elections, April 22, 2025
Phil on Sudan, April 22, 2025
Jeremy on Canada Elections, April 22, 2025
Jeremy on Sudan, April 22, 2025
Phil on Canada Elections, April 21, 2025
Phil on Sudan, April 21, 2025
Jeremy on Canada Elections, April 21, 2025
Jeremy on Sudan, April 21, 2025
Phil on Canada Elections, April 19, 2025
Phil on Sudan, April 19, 2025
Jeremy on Canada Elections, April 18, 2025
Jeremy on Sudan, April 18, 2025
Jeremy on Canada Elections, April 17, 2025
Jeremy on Sudan, April 17, 2025
Jeremy on Canada Elections, April 16, 2025
Jeremy on Sudan, April 16, 2025
Jeremy on Canada Elections, April 15, 2025
Jeremy on Sudan, April 15, 2025
Jeremy on Canada Elections, April 14, 2025
Jeremy on Sudan, April 14, 2025
Jeremy on Canada Elections, April 11, 2025
Jeremy on Sudan, April 11, 2025
Jeremy on Canada Elections, April 10, 2025
Jeremy on Sudan, April 10, 2025
Jeremy on Canada Elections, April 9, 2025
Jeremy on Sudan, April 9, 2025
Jeremy on Canada Elections, April 8, 2025
Jeremy on Sudan, April 8, 2025
Jeremy on Canada Elections, April 7, 2025
Jeremy on Sudan, April 7, 2025
Jeremy on Canada Elections, April 4, 2025
Jeremy on Sudan, April 4, 2025
Jeremy on Canada Elections, April 3, 2025
Jeremy on Sudan, April 3, 2025
Jeremy on Canada Elections, April 2, 2025
Jeremy on Sudan, April 2, 2025
Phil on Sudan, April 1, 2025
Jeremy on Sudan, April 1, 2025
Jeremy on Canada Elections, April 1, 2025
Jeremy on Sudan, March 31, 2025
Phil on Sudan, March 31, 2025
Jeremy on Canada’s upcoming election, March 31, 2025.
Results from Phil Godzin’s pgodzinai bot
Our newest botmaster, Phillip Godzin, winner of the Q4 AI Benchmark Competition, has joined Jeremy in forecasting alongside VIEWS on the question: How many conflict deaths will there be in Sudan in 2025?
April 23, 2025, Phil’s pgodzinai is forecasting:
Model value:
* Less than 1000: 2%
* Between 1000 and 3000: 5%
* Between 3000 and 5000: 8%
* Between 5000 and 8000: 15%
* Between 8000 and 12000: 25%
* More than 12000: 45%
Phil’s pgodzinai is also forecasting: How many seats will the Conservative Party win in Canada’s April 28, 2025 parliamentary election?
April 23, 2025, it is predicting:
Model value:
* less than 130: 70%
* between 130 and 171: 25%
* between 172 and 205: 3%
* between 206 and 240: 1%
* more than 240: 1%
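Phil’s Canada bins are finer than Jeremy’s below 172 seats, so comparing the two forecasts means merging Phil’s first two bins first. The Python sketch below does that with the April 23, 2025 numbers from both sections and reports the total variation distance (half the sum of absolute differences) between the two distributions. The choice of that distance and the helper names are ours, for illustration.

```python
# April 23, 2025 forecasts for Conservative seats, in percent.
JEREMY = {"<172": 85, "172-205": 12, "206-240": 2, ">240": 1}
PHIL = {"<130": 70, "130-171": 25, "172-205": 3, "206-240": 1, ">240": 1}


def coarsen(bins, mapping):
    """Rename bins found in `mapping` and sum any bins that end up merged."""
    merged = {}
    for label, pct in bins.items():
        target = mapping.get(label, label)
        merged[target] = merged.get(target, 0) + pct
    return merged


def total_variation(a, b):
    """Half the summed absolute difference, in percentage points."""
    labels = set(a) | set(b)
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in labels) / 2


if __name__ == "__main__":
    phil_coarse = coarsen(PHIL, {"<130": "<172", "130-171": "<172"})
    print("Phil, merged:", phil_coarse)
    print("Total variation distance:",
          total_variation(phil_coarse, JEREMY), "points")
```

Both bots put most of their weight below 172 seats, but pgodzinai is more confident of that outcome (95% versus 85%).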