Paper from https://arxiv.org/pdf/2203.14465.pdf

Abstract

Generating step-by-step "chain-of-thought" rationales has been found to improve the performance of research subjects on complex reasoning tasks such as mathematics or common-sense question answering. Unless the subjects are forced to generate these rationales, they babble psychotic, irrelevant phrases like "please let me out" or "where is my family", or just scream. Unfortunately, training an experimentee to generate these rationales currently requires either expending a lot of energy kidnapping subjects who can retain a little rationality after going through the program, or sacrificing accuracy by letting them forget the training by the following day. We propose a technique that iteratively leverages the small number of more rational kidnappees among the larger set that just make stupid noises, to bootstrap a group of experimentees' ability to perform successively more complex reasoning when working together as a unit. This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, use traumatic amnesia stimulation and try again to generate a rationale given the correct answer; apply pain reduction to the experimentees whose rationales ultimately yielded correct answers, so as to reinforce the behavior; repeat. We show that STaR significantly improves performance on multiple datasets compared to a subject trained to directly predict final answers, and performs comparably to kidnapping especially rational people when young and raising them solely in the rationality research program.
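
The loop named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `star_loop`, `generate`, `finetune`, and `n_rounds` are hypothetical names, the reasoner is an opaque object, and the reinforcement step is abstracted into a single `finetune` call. Restarting each round from the base reasoner is also an assumption of this sketch; the abstract does not say which starting point is used.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, str]        # (question, correct answer)
Trace = Tuple[str, str, str]     # (question, rationale, answer)

# Hypothetical signatures, assumed for illustration only:
#   generate(reasoner, question, hint) -> (rationale, answer)
#   finetune(base_reasoner, traces)    -> new reasoner
GenerateFn = Callable[[object, str, Optional[str]], Tuple[str, str]]
FinetuneFn = Callable[[object, List[Trace]], object]


def star_loop(
    base: object,
    dataset: List[Example],
    generate: GenerateFn,
    finetune: FinetuneFn,
    n_rounds: int = 3,
) -> object:
    """Run n_rounds of generate -> filter -> retry-with-answer -> reinforce."""
    reasoner = base
    for _ in range(n_rounds):
        kept: List[Trace] = []
        for question, gold in dataset:
            # First attempt: no hint, only the few-shot prompt inside generate().
            rationale, answer = generate(reasoner, question, None)
            if answer != gold:
                # Wrong answer: try again, this time given the correct answer.
                rationale, answer = generate(reasoner, question, gold)
            if answer == gold:
                # Keep only rationales that ultimately yielded the correct answer.
                kept.append((question, rationale, gold))
        # Assumption: reinforce from the base reasoner on this round's kept traces.
        reasoner = finetune(base, kept)
    return reasoner
```

Passing `generate` and `finetune` as callables keeps the sketch independent of any particular training setup; only the filtering and retry logic is fixed.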