Can you beat a bot? Join the fun! And, who’s afraid of the Fermi Paradox?

Latest News on our Bot or Not Research

We are now almost certain that we have found an easy way to unmask GenAI bots that may be pretending to be human. Jeremy’s Multi-AI Oracle and Lars Warren Ericson’s 000_bot have been key to uncovering how to do this. Using the AutoIC tool, we recently found that both bots score well on many measures of integrative complexity but almost always score zero on dialectical integration. If substantiated, this would be huge. Unlike the other IC measures, dialectical integration strongly detects reasoning, which is key to human-level intelligence. See more here —>

Just maybe, our research will contribute to saving us from the Fermi Paradox, also known as the “Where Is Everybody Else?” question.

More on our Bot or Not research —>

Bestworldbot Has Been Retired

Jeremy had been fielding bestworldbot in the Q4 AI Benchmark Tournament, which began Oct. 8, 2024 and ended Dec. 31. For much of the competition, it was #2, and on the last day of the tournament, Dec. 31, it was still #7. Yet by Jan. 10, with scoring nearly completed, it had fallen to #37 out of 44 competitors. We are examining the data to determine what happened and why. Clearly, bestworldbot was poor at forecasting questions that were scored as no, because those were the only questions that were scored after Dec. 31. By contrast, it was excellent at forecasting most of the questions that scored yes. But why? Further research is needed.
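To see how questions scored late as no can sink a ranking, here is a minimal sketch. It rests on loud assumptions: the forecasts are invented for illustration, not bestworldbot's data, and the Brier score stands in for Metaculus's actual tournament scoring, which differs. It only shows the mechanism: a bot that leans toward yes looks excellent until the no questions are scored.

```python
def brier(p_yes: float, outcome: int) -> float:
    """Squared error between the forecast probability of yes and the
    outcome (1 = yes, 0 = no). Lower is better; 0 is perfect."""
    return (p_yes - outcome) ** 2


def mean_brier(forecasts) -> float:
    """Average Brier score over (probability, outcome) pairs."""
    return sum(brier(p, o) for p, o in forecasts) / len(forecasts)


# Hypothetical bot that leans toward yes on every question.
yes_questions = [(0.90, 1), (0.80, 1), (0.85, 1)]  # scored during the tournament
no_questions = [(0.70, 0), (0.60, 0), (0.80, 0)]   # scored only after the close

early = mean_brier(yes_questions)                 # standing before the close
final = mean_brier(yes_questions + no_questions)  # standing after all scoring

print(f"before no questions are scored: {early:.3f}")
print(f"after no questions are scored:  {final:.3f}")
```

The score worsens sharply once the no questions count, even though nothing about the bot's forecasts changed.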

Our next step with bestworldbot is to use its Metaculus data, along with the 000_bot data created by Lars Ericson, to further examine our hypothesis that measurements of integrative complexity can readily distinguish between GenAI bots and humans.

Click here for our preliminary results.

We also have integrative complexity results on forecasting rationales written by a team of college graduates (Amazon Mturk prime workers) in the 2019 Hybrid Forecasting Competition. These results substantiated our hypothesis that they used true reasoning in the rationales they wrote for that competition.

We also have AutoIC results for the National Security Estimates written by participants in US National Security Council meetings in 1960 to 1961. These show strong results in all measures of integrative complexity. However, the participants were poor at aggregating probabilities, as shown by the resulting Bay of Pigs debacle.

Metaculus’ Q1 AI Forecasting Benchmark Tournament

This tournament will launch Jan. 20, 2025. From its website, “This is the 3rd tournament in our $120,000 series designed to benchmark AI forecasting capabilities against top human forecasters on complex, real-world questions.”

Thanks to the abundant technical help Metaculus offers, anybody can build a bot and play. Have fun, win money!

BestWorld’s Carolyn Meinel has a research agreement with Metaculus using data developed in this competition. This data will inform BestWorld’s Bot or Not research.

Now retired: The Multi-AI Panel bot that Jeremy fielded in our first Bots vs Humans Competition. He began with four generative AIs, later expanded to five: Perplexity, Claude, Mistral, Cohere, and OpenAI. These bots forecasted just one question through September 16, 2024: “What is the probability that the US Federal Reserve Board will cut interest rates in September 2024?” 

We humans beat the bot! See all our forecasts below:

Sept. 16, 2024
Sept. 13, 2024
Sept. 12, 2024
Sept. 11, 2024
Sept. 10, 2024
Sept. 9, 2024
Sept. 6, 2024
Sept. 5, 2024
Sept. 4, 2024
Sept. 3, 2024
Sept. 2, 2024
Aug. 30, 2024
Aug. 29, 2024
Aug. 28, 2024
Aug. 27, 2024
Aug. 26, 2024
Aug. 23, 2024
Aug. 22, 2024
Aug. 21, 2024
Aug. 20, 2024
Aug. 19, 2024
Aug. 16, 2024
Aug. 15, 2024
Aug. 14, 2024
Aug. 13, 2024
Aug. 12, 2024
Aug. 9, 2024
Aug. 8, 2024
Aug. 7, 2024
Aug. 6, 2024
Aug. 5, 2024
Aug. 2, 2024
Aug. 1, 2024
July 31, 2024
July 30, 2024
July 29, 2024
July 26, 2024
July 25, 2024

Our Multi-AI Oracle Experiments

Botmaster Jeremy is running the Multi-AI Oracle forecasting bot on “Given the agreement of the Dock Workers to salary increases, what’s the probability of a strike in Q1 2025?” Today, Jan. 17, 2025, it is predicting 15%, up from last week’s 10%.

He also runs the Multi-AI Oracle on “What is the probability of the US Steel/Nippon Steel merger being officially announced before January 21, 2025?” Today, Jan. 17, 2025, it is predicting 5%. [Carolyn’s note: Silly bot! This should be 0%. Only three days remain for President Biden to reverse his decision and approve the merger. Incoming President Trump also is against the merger.]

Jeremy’s Multi-AI Oracle also is forecasting “How many seats will the Conservative Party win in Canada’s next federal parliamentary election?”
Today, Jan. 17, 2025, it is predicting:
* less than 172: 15%
* between 172 and 205: 40%
* between 206 and 240: 30%
* more than 240: 15%

Retired: Jeremy’s older versions of his bestworldbot.

Below is the final Metaculus Q3 AI Benchmark leaderboard, published Oct. 8, 2024. The old bestworldbot finished #53 out of 55 competitors. That was down from #17 on Sept. 10. A likely explanation for bestworldbot’s collapse on the leaderboard is that in early September we began extremizing its forecasts, meaning that we decreased probabilities below 50% and increased probabilities above 50%. This followed a formula (Mellers) shown to work well on human forecasters. Well, we discovered that bestworldbot isn’t like an average human, because extremizing made it worse.
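For readers unfamiliar with extremizing, here is a minimal sketch. The text above does not give the exact formula or its parameters, so the transform used here, p^a / (p^a + (1 - p)^a), is an assumption: it is one form commonly seen in forecast-aggregation work. The exponent `a = 2.0` and the function name `extremize` are likewise illustrative, not the settings applied to bestworldbot.

```python
def extremize(p: float, a: float = 2.0) -> float:
    """Push a probability away from 0.5.

    Assumed transform: p**a / (p**a + (1 - p)**a).
    a > 1 extremizes, a == 1 leaves p unchanged. The value of a is
    illustrative only.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    num = p ** a
    return num / (num + (1.0 - p) ** a)


def brier(p_yes: float, outcome: int) -> float:
    """Squared error of a yes-probability against the outcome (1 or 0)."""
    return (p_yes - outcome) ** 2


# Forecasts below 50% go down, forecasts above 50% go up.
for p in (0.3, 0.5, 0.7):
    print(f"{p:.2f} -> {extremize(p):.3f}")

# A hypothetical forecast that is already overconfident: 90% yes on a
# question that is scored as no. Extremizing makes the miss costlier.
raw = 0.9
print(f"Brier raw:        {brier(raw, 0):.3f}")
print(f"Brier extremized: {brier(extremize(raw), 0):.3f}")
```

Extremizing helps when forecasts are systematically too timid, which is the situation it was designed for. If a forecaster is already confident and sometimes wrong, the same transform amplifies its worst errors, which is consistent with what happened to bestworldbot.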

Metaculus Q4 AI Benchmark Competition Leaderboard

On Dec. 13, 2024, bestworldbot, coded yellow, ended its streak of ten days at #2. It slipped to #7 by Dec. 31, then plunged to #26 on Jan. 1, 2025, and to #37 by Jan. 13. The other color-coded bots either are #1 now or were #1 in the past, including in the July – Sept. 30 Q3 competition.

Leaderboard, Nov. 18, 2024

Author