Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation

¹Shanghai Artificial Intelligence Laboratory
#Co-first Authors

Corresponding Authors

⭐ Highlights

  • To the best of our knowledge, we propose the first multi-agent system for conducting scientific collaborations in an end-to-end pipeline, from team organization to novel scientific idea generation. Furthermore, real data are used both for role-play and for the objective evaluation of the final outputs.
  • We conduct extensive evaluations to investigate VirSci in terms of the team settings and the novelty of generated scientific ideas. The results demonstrate that multi-agent collaboration can improve the quality of the outcomes, surpassing the SOTA single-agent method.
  • The simulation results align with important findings in the Science of Science, such as the finding that fresh teams tend to create more innovative research, showcasing the potential of VirSci as a powerful tool for future research in this field.

👀 Abstract

Rapid scientific progress requires innovative tools that can accelerate discovery. While recent AI methods, particularly large language models (LLMs), have shown promise in tasks such as hypothesis generation and experimental design, they fall short of replicating the collaborative nature of real-world scientific practice, where diverse teams of experts work together to tackle complex problems. To address this limitation, we propose an LLM-based multi-agent system, Virtual Scientists (VirSci), designed to mimic the teamwork inherent in scientific research. VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas. Through comprehensive experiments, we demonstrate that this multi-agent approach outperforms the state-of-the-art method in producing novel and impactful scientific ideas, and that it shows potential in aligning with key insights from the Science of Science field. Our findings suggest that integrating collaborative agents can lead to more innovative scientific outputs, offering a robust system for autonomous scientific discovery.

⚙️ Pipeline


Figure 2. Key components of the proposed system. The left section illustrates the collaborator selection process, where the team leader forms a research team. The middle section highlights the discussion routine, a fundamental part of every step in the system, where the team engages in collaborative dialogue to progress through tasks. The right section depicts the architecture of the author knowledge bank and paper database, which provide critical information used throughout the collaboration process.

📦 Experiments


Table 1. Comparisons with AI Scientist. Results show that our multi-agent system outperforms the AI Scientist across all metrics, with GPT-4o achieving the highest performance.
Note: the GPT-4o API version is “gpt-4o-2024-08-06”.

Figure 3. Effects of team size and discussion turns on novelty scores. Peak innovation occurs with 8 members and 5 turns, while larger teams or excessive turns hinder creativity. “Inference Cost” is the product of team size and turns.
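The "Inference Cost" defined in the caption above can be sketched in a few lines. This is only an illustration of the stated definition (team size multiplied by discussion turns); the function name and the grid of settings below are hypothetical and are not taken from the VirSci code or from the figure's data.

```python
from itertools import product


def inference_cost(team_size: int, turns: int) -> int:
    """Inference cost as defined in the Figure 3 caption: team size x turns."""
    if team_size < 1 or turns < 1:
        raise ValueError("team_size and turns must be positive integers")
    return team_size * turns


# Hypothetical grid of settings, chosen only for illustration.
team_sizes = [2, 4, 8]
turn_counts = [1, 3, 5]

# Map each (team size, turns) pair to its inference cost.
cost_grid = {
    (size, turns): inference_cost(size, turns)
    for size, turns in product(team_sizes, turn_counts)
}

for (size, turns), cost in sorted(cost_grid.items()):
    print(f"team_size={size} turns={turns} inference_cost={cost}")
```

Under this definition, the setting the caption reports as the novelty peak (8 members, 5 turns) corresponds to an inference cost of 40.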

Figure 4. Effects of team freshness on novelty. The balance of new and returning collaborators in the team has a notable impact on novelty, with 50% freshness yielding the highest historical dissimilarity and overall novelty, particularly in larger teams (size 8).
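This page does not give the exact formula for team freshness. As a rough sketch, one plausible reading is the fraction of team members who have never collaborated with any other member of the same team. The function name, the data layout (a set of unordered co-author pairs), and the example names below are all assumptions made for illustration, not the VirSci implementation.

```python
def team_freshness(team: list[str], past_pairs: set[frozenset[str]]) -> float:
    """Fraction of members with no prior collaboration with any teammate.

    Assumed definition for illustration only; `past_pairs` holds unordered
    pairs of people who have collaborated before.
    """
    if not team:
        raise ValueError("team must not be empty")
    fresh_members = [
        member
        for member in team
        if not any(
            frozenset((member, other)) in past_pairs
            for other in team
            if other != member
        )
    ]
    return len(fresh_members) / len(team)


# Hypothetical team: "ana" and "bo" have collaborated before; the rest are new.
team = ["ana", "bo", "chen", "dee"]
past_pairs = {frozenset(("ana", "bo"))}

print(team_freshness(team, past_pairs))  # 2 of 4 members are fresh
```

With this reading, a team in which half of the members are returning collaborators has a freshness of 0.5, the level the caption associates with the highest overall novelty.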

Figure 5. Effects of team diversity on novelty. The optimal diversity level appears to be 50%, which maximizes novelty and impact across team sizes.

📌 Citation


@article{su2024two,
  title={Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation},
  author={Su, Haoyang and Chen, Renqi and Tang, Shixiang and Zheng, Xinzhe and Li, Jingzhe and Yin, Zhenfei and Ouyang, Wanli and Dong, Nanqing},
  journal={arXiv preprint arXiv:2410.09403},
  year={2024}
}