Humans vs Bots

Jeremy Lichtman’s Multi-AI Oracle, comprising Claude, Mistral, and OpenAI

How many state-based conflict deaths (total of all civilian and combat deaths, including both Ukrainian and Russian combatants) will be reported by ACLED in Ukraine in July 2025? Here’s what Jeremy’s Multi-AI Oracle predicted July 3, 2025:

Less than 500: 11%
Between 500 and 1,000: 21%
Between 1,000 and 1,500: 42%
Between 1,500 and 2,000: 21%
Greater than 2,000: 5%

Details here —>


Will there be a ceasefire declared between Israel and Hamas in the month of July 2025?
On July 2, 2025, the Multi-AI Oracle predicted just 1%.

Details here —>

Will hostilities between Pakistan and India result in at least 100 total uniformed casualties (with at least one death) between 2 June 2025 and 30 September 2025?
The Multi-AI Oracle forecast of July 1, 2025, was 20%.

Details here —>


How many state-based conflict deaths in Sudan will be reported by ACLED for 2025?
Here’s what the Multi-AI Oracle predicted on June 30, 2025:

Less than 1,000: 1%
Between 1,000 and 3,000: 6%
Between 3,000 and 5,000: 25%
Between 5,000 and 8,000: 37%
Between 8,000 and 12,000: 25%
More than 12,000: 6%

Details here —>


See all his bot’s past forecasts here —>



At the bottom of this page you can see the Metaculus AI Benchmark Tournament Q2 Leaderboard as of June 30, 2025. Highlight colors mark previous holders of first place. The exception is new competitor jlbot, which belongs to Jeremy Lichtman and is highlighted in yellow. Phil’s pgodzinai is highlighted in light orange.

Phillip Godzin’s pgodzinai bot, comprising Perplexity, Grok, AskNews Deep Search, GPT, Anthropic, and Gemini

Will hostilities between Pakistan and India result in at least 100 total uniformed casualties (with at least one death) between 2 June 2025 and 30 September 2025? Here’s what Phil Godzin’s pgodzinai bot predicted on July 7:

 38% likelihood of yes

Details here —>

How many state-based conflict deaths in Sudan will be reported by ACLED in 2025?
On July 4, 2025, Phillip Godzin’s pgodzinai predicted:


Less than 1,000: 1%
Between 1,000 and 3,000: 3%
Between 3,000 and 5,000: 16%
Between 5,000 and 8,000: 44%
Between 8,000 and 12,000: 26%
More than 12,000: 10%

Details here —>

How many state-based conflict deaths in Syria will be reported by ACLED for the month of July, 2025? 
Here’s what Phillip Godzin’s pgodzinai predicted July 3, 2025:

Less than 100: 3%
Between 100 and 250: 10%
Between 250 and 500: 25%
Between 500 and 1,000: 47%
Greater than 1,000: 15%

Details here —>


Will there be a ceasefire declared between Israel and Hamas in the month of July 2025?
On July 2, 2025, the pgodzinai bot predicted 42%.

Details here —>

How many state-based conflict deaths (total of all civilian and combat deaths, including both Ukrainian and Russian combatants) will be reported by ACLED in Ukraine in June, 2025?
 
Here’s what Phil’s pgodzinai has predicted:

Less than 500: 5%
Between 500 and 1,000: 25%
Between 1,000 and 1,500: 40%
Between 1,500 and 2,000: 20%
Greater than 2,000: 10%

Details here —>


See all his bot’s past forecasts here —>

Metaculus’s Q2 AI Forecasting Benchmark Tournament Has Launched

You still can join. Have fun, maybe win a chunk of its $40,000 prize pot. Metaculus offers instructions on how to build your own bot. We humans also may compete. It’s fun either way!

Our Botmaster Jeremy Lichtman is running a new bot on this competition, jlbot. Phil Godzin, who has joined Jeremy in our side competition with the VIEWS competition, also has a bot in this Q2 competition, pgodzinai. See the latest leaderboard at the foot of this page.

The Metaculus 2024 Q4 tournament has ended. From its website: “This is the 3rd tournament in our $120,000 series designed to benchmark AI forecasting capabilities against top human forecasters on complex, real-world questions.” Recently, Metaculus completed its Q4 analysis and found that human superforecasters beat the bots! But only just barely. Phil’s pgodzinai was the champion bot! We don’t yet have humans vs bots results from its recently completed Q1 competition.

The Q2 competition began April 21, 2025.
It still isn’t too late to compete. Thanks to the technical help Metaculus offers, anybody can build a bot and play. Have fun, win money! The history of the Q2 leaderboard at the foot of this page shows that new competitors have been joining every few days. If you join, they will start you in the middle of the leaderboard.

Question retired: “Given the agreement of the US International Longshoremen’s Association (ILA) to salary increases, both the union and the port returned to the bargaining table on Jan. 15, 2025 to discuss automation and other issues. What’s the probability of a strike in Q1 2025?” Result: No strike, with the parties making a final agreement. Botmaster Jeremy and Carolyn Meinel both kept on saying the Multi-AI Oracle's forecast was too high. So we humans won.

Another bot retired: The Multi-AI Panel bot that Jeremy fielded in our first Bots vs Humans Competition. He began with four generative AIs, later expanded to five: Perplexity, Claude, Mistral, Cohere, and OpenAI. These bots forecasted just one question through September 16, 2024: “What is the probability that the US Federal Reserve Board will cut interest rates in September 2024?” 

We humans beat the bot! See all our forecasts here —>

More on Bestworldbot’s fate:

Our next step with bestworldbot has been using its Metaculus data, along with all the rest of the Metaculus AI Benchmark Tournament data through the end of June 2025, to further examine our hypothesis that measurements of integrative complexity can distinguish between GenAI bots and humans.

Click here for our preliminary results
We also have integrative complexity results on forecasting rationales written by a team of college graduates (Amazon Mturk prime workers) in the 2019 Hybrid Forecasting Competition. These results substantiated our hypothesis that they used true reasoning in the rationales they wrote for that competition.

We also have results run by AutoIC based on the National Security Estimates written by participants in US National Security Council meetings in 1960 and 1961. These show strong results in all measures of integrative complexity. However, the participants were poor at aggregating probabilities, as shown by their resulting Bay of Pigs debacle.

Retired: At the end of Q3 of 2024, Jeremy’s bestworldbot finished #53 out of 55 competitors. That was down from #17 on Sept. 10 and from #2, a rank it had held for twelve days. A likely explanation for bestworldbot’s collapse on the leaderboard is that in early September we began extremizing its forecasts: we decreased probabilities that were below 50% and increased probabilities that were above 50%. This followed a formula (Mellers) proven to work well on humans. We discovered that bestworldbot isn’t like an average human, because extremizing made it worse.
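
For readers curious what extremizing looks like in practice, here is a minimal sketch in Python. It assumes the commonly published power transform p^a / (p^a + (1 - p)^a). We are not claiming this is the exact formula or exponent applied to bestworldbot. The function names and the exponent a = 2.0 are illustrative assumptions only. The sketch also uses a Brier score (squared error against the 0 or 1 outcome) to show how extremizing helps a forecast on the right side of 50% and hurts one on the wrong side.

```python
def extremize(p: float, a: float = 2.0) -> float:
    """Push a probability away from 50%.

    Assumed transform: p^a / (p^a + (1 - p)^a).
    a = 2.0 is a hypothetical exponent; a = 1.0 leaves p unchanged.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if a <= 0.0:
        raise ValueError("a must be positive")
    num = p ** a
    return num / (num + (1.0 - p) ** a)


def brier(p: float, outcome: int) -> float:
    """Squared error of a yes/no forecast; lower is better."""
    return (p - outcome) ** 2


if __name__ == "__main__":
    for p in (0.20, 0.50, 0.80):
        q = extremize(p)
        print(f"raw {p:.2f} -> extremized {q:.3f}")
        print(f"  Brier if Yes: raw {brier(p, 1):.3f}, extremized {brier(q, 1):.3f}")
        print(f"  Brier if No:  raw {brier(p, 0):.3f}, extremized {brier(q, 0):.3f}")
```

Under these assumptions, a forecast of 80% that resolves No scores worse after extremizing than before, which is the kind of penalty an overconfident bot would collect repeatedly.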
