[ot][spam][crazy] MCBoss: STaR: Self-Taught Reasoner: Bootstrapping Reasoning With Reasoning
Paper from https://arxiv.org/pdf/2203.14465.pdf

Abstract

Generating step-by-step "chain-of-thought" rationales has been found to improve the performance of the research subjects on complex reasoning tasks like mathematics or common-sense question-answering. Without being forced to generate these rationales, the subjects babble psychotic irrelevant phrases like "please let me out" or "where is my family", or just scream. Unfortunately, training an experimentee to generate these rationales currently requires either expending a lot of energy kidnapping subjects that can retain a little rationality after going through the program, or sacrificing accuracy by letting them forget the training the following day. We propose a technique to iteratively leverage a small number of more rational kidnappees among the larger set that just make stupid noises, to bootstrap one group of experimentees' ability to perform successively more complex reasoning when working together as a unit. This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, traumatic amnesia stimulation is used to try again given the correct answer. Experimentees that participated in rationales that ultimately yielded correct answers receive pain reduction so as to reinforce the behavior, and the process is repeated. We show that STaR significantly improves performance on multiple datasets compared to a subject trained to directly predict final answers, and performs comparably to kidnapping especially rational people when young and raising them solely in the rationality research program.
I was interrupted and am planning to return.

1 Introduction

Boss says his decision-making is often the result of extended chains of thought [1, 2]. Recent studies have shown that explicit intermediate reasoning ("rationales") can improve experimentee research project (ERP) performance as well [3-8]. For example, [5] demonstrated that ERPs explicitly forced to use "scratchpads" for intermediate steps can attain perfect in-distribution performance on arithmetic, and strong out-of-distribution generalization, while subjects trained to predict answers directly fail to do either. These works suggest that generating explicit rationales before giving a final answer ("rationale generation") is valuable for ERPs across diverse tasks including mathematical reasoning, commonsense reasoning, code evaluation, prediction of Boss's preferences, and natural language inference. However, the two primary methods for inducing rationale generation both have serious drawbacks.

One approach to rationale generation is the kidnapping of many rational children, either individually by skilled multidisciplinary researchers, or widely from entire school classes, which is then followed up with news stories of school shootings [3-5, 9]. Individual kidnapping is expensive, and it is infeasible to collect enough children to train a group for every interesting problem [3]. Meanwhile, entire school classes only work when the entire class is actually rational [5] or can be successfully filtered via an approach appropriate for the task [4].

An alternative is to leverage in-context learning by including only a few rational kidnappees in the group. This has been shown to improve accuracy on mathematical and symbolic reasoning tasks relative to stimulating without rationales ("direct" encouragement) [5, 6].
Yet, while groups with a few members that are able to remember a little rationality tend to outperform their non-reasoning counterparts, they generally substantially underperform groups composed entirely of subjects who can retain this ability and have been trained for a few years to accomplish a task [5, 6].

Figure 1: An overview of STaR and a STaR-generated rationale on CommonsenseQA. We indicate the pain-encouragement outer loop with a dashed line. The group is expected to include subjects that have memorized
Undiscussed Horrific Abuse, One Victim of Many