NeurIPS 2023 Papers
Papers in Education
We recommend using the Suanfamama Daily Paper Interpretation GPTs (算法妈妈每日论文解读GPTs) for the best research experience. 'Suanfamama' - 'VIZ' - 'IMAGINE' is currently our best-practice workflow for paper exploration.
The GPT intro
- How to use this GPT
Paper 01
Suanfamama the paper
You can access the full paper via the following link.
"SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models" is a research paper that introduces a new benchmark suite named SciBench. This suite is designed to systematically examine the reasoning capabilities required for complex scientific problem-solving. It includes two datasets of college-level scientific problems:
- Open Dataset: This dataset comprises 695 problems collected from widely used college textbooks in subjects such as mathematics, chemistry, and physics. These problems are open-ended, free-response questions requiring multiple steps of reasoning and complex arithmetic operations such as differentiation and integration.
- Closed Dataset: This dataset contains problems from undergraduate-level exams in computer science and mathematics, encompassing seven sets of midterm and final examination questions.
The paper evaluates two large language models (LLMs), GPT-3.5 and GPT-4, using various prompting strategies including chain-of-thought (CoT), zero-shot learning, and few-shot learning. Additionally, it prompts LLMs to utilize external tools such as Python and Wolfram languages.
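The prompting strategies above can be illustrated with a minimal sketch. The helper names below (`build_zero_shot_prompt`, `build_cot_prompt`, `build_tool_prompt`) are hypothetical illustrations, not the paper's actual code; the exact prompt wording is an assumption.

```python
# A minimal sketch of the prompting strategies described above.
# All function names and prompt templates here are illustrative assumptions.

def build_zero_shot_prompt(problem: str) -> str:
    """Zero-shot: ask for the answer directly, with no reasoning scaffold."""
    return f"Problem: {problem}\nAnswer:"

def build_cot_prompt(problem: str) -> str:
    """Chain-of-thought (CoT): instruct the model to reason step by step."""
    return (
        f"Problem: {problem}\n"
        "Let's think step by step before giving the final answer.\n"
        "Solution:"
    )

def build_tool_prompt(problem: str) -> str:
    """Tool-augmented: ask the model to emit Python code whose
    execution produces the numeric answer."""
    return (
        f"Problem: {problem}\n"
        "Write a Python program that computes the final answer "
        "and prints the result."
    )

problem = "Compute the derivative of x**3 at x = 2."
print(build_cot_prompt(problem))
```

Few-shot prompting would additionally prepend a handful of solved example problems to any of these templates.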
Key findings of the study include:
- The baseline LLMs, GPT-3.5 and GPT-4, obtained average accuracy scores of 10.62% and 16.81%, respectively, on the open textbook dataset without sophisticated prompts or external tools.
- The performance improved with the inclusion of CoT prompting and external tools, with GPT-4 achieving an average score of 35.80% on the open dataset and 51.57% on the closed exam dataset.
- A novel self-refinement method was proposed to uncover the deficient skills in the solutions made by LLMs, leading to the identification of ten essential skills requisite for successful scientific problem-solving.
- The study found that no single prompting strategy significantly outperforms the others, and some strategies that improve certain problem-solving skills cause declines in others. The paper emphasizes the need for further development of LLM reasoning abilities to contribute to scientific research and discovery.
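The self-refinement idea mentioned in the findings can be sketched roughly as follows. This is a hypothetical illustration, not the paper's implementation: the `llm` callable is a stand-in for a real model API, and the skill list is illustrative rather than the paper's exact ten-skill taxonomy.

```python
# Hypothetical sketch of the self-refinement error analysis: given a
# flawed solution, ask the model which essential skill was deficient.
# The skill list and prompt wording are illustrative assumptions.

SKILLS = ["logical decomposition", "causal reasoning", "calculation"]

def classify_error(problem: str, wrong_solution: str,
                   correct_answer: str, llm) -> str:
    """Return the model's judgment of the deficient skill.

    `llm` is any callable mapping a prompt string to a response string.
    """
    prompt = (
        f"Problem: {problem}\n"
        f"Attempted solution: {wrong_solution}\n"
        f"Correct answer: {correct_answer}\n"
        f"Which of these skills was deficient? {', '.join(SKILLS)}"
    )
    return llm(prompt)

# Demo with a stub model that always blames "calculation":
stub = lambda prompt: "calculation"
print(classify_error("What is 2 + 2?", "5", "4", stub))  # -> calculation
```

Aggregating such per-problem labels over a dataset is what lets the paper profile which skills each prompting strategy helps or hurts.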