Adding reasoning to your AI? Take these resources; they may help you on your way.

| AGI/causality/formal grammar | ||
|---|---|---|
| DeepMind Chomsky Hierarchy | Problems crafted for FSM/PDA/TM | [1] |
| automata | a neurallambda tool to gen from grammars | [1] |
| im a strange dataset | Tough for LLMs because of self-references. | [1] |
| DiagGSM8k | NL Reasoning Benchmark | [1] |
| CLadder | Causal reasoning | [1] |
| Cause-Effect Pairs | 108 datasets of 2 var dynamics (not NL) | [1] |
| MNLI Entailment | sentence parsing + entailment | [1] |

| AGENT/TOOL | ||
|---|---|---|
| THUDM AgentInstruct | long form dialogs | [1] |
| WANG AgentInstruct | gpt3 synthesized instructions | [1] |
| KnowLM Tool | prompt + tool call + answer | [1] |
| Glaive Tool Usage | sys prompt says tools + prompt + answer | [1] |
| opentoolformer retrieval | prompt + tool call | [1] |

| CODE | ||
|---|---|---|
| rosetta | same program, many diff languages | [1] |
| EvoEval Tool Use | 100 prompt + code + tests | [1] |

| MATH/LOGIC | ||
|---|---|---|
| gsm8k | Grade School Math 8k | [1] |
| MetaMath | one-shot math | [1] |
| MetaMathFewShot | few-shot math | [1] |
| MathPile | 9B tok from filtered internet | [1] |
| LogiQA | NL multi choice, requires abstraction | [1] |
| Logic-LM | a model combining auto theorem provers and LLMs | [1] |
| Coq Facts | 270k Coq theorem prover programs | [1] |

| NATURAL LANGUAGE | ||
|---|---|---|
| UltraInteract_sft | GPT generated iterated reasoning dialogs | [1] |
| MUD videogames | (various could be training data) | |
| Winogrande | ambiguous sentences, fill in 1 word | [1] |
| Winograd_wsc | ambiguous sentences, choose the right word | [1] |
| Contradiction | 2 phrases, do they contradict | [1] |
| Recognizing Textual Entailment | 2 phrases, do they entail each other | [1] |
| Textual Entailment Pool | more entailment | [1] |
| Answer Validation | 2 phrases, does the answer solve question | [1] |
| Monotonicity Entailment | x is true, does y follow | [1] |
| entailment | passage, question -> T/F | [1] |
| Commonsense QA | multi choice QA | [1] |
| GLUE | several datasets | [1] |
| custom multi-hop | use Wikipedia's graph of articles | |
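
One way the custom multi-hop idea could look, as a minimal sketch: walk a few links through an article graph and keep the chain as the reasoning path between a start and an end article. The `LINKS` graph and `sample_hop_path` are hypothetical stand-ins; loading Wikipedia's actual link graph is not shown here.

```python
import random

# Hypothetical toy link graph: article title -> titles it links to.
# A real version would be built from Wikipedia's article links (assumption).
LINKS = {
    "Alan Turing": ["Turing machine", "Enigma machine"],
    "Turing machine": ["Automata theory", "Alan Turing"],
    "Automata theory": ["Formal grammar", "Turing machine"],
    "Formal grammar": ["Noam Chomsky", "Automata theory"],
    "Enigma machine": ["Alan Turing"],
    "Noam Chomsky": ["Formal grammar"],
}


def sample_hop_path(rng, links, start, hops):
    """Follow `hops` links from `start` without revisiting an article.

    Returns the list of visited titles, or None if the walk gets stuck.
    """
    path = [start]
    for _ in range(hops):
        options = [t for t in links.get(path[-1], []) if t not in path]
        if not options:
            return None
        path.append(rng.choice(options))
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    path = sample_hop_path(rng, LINKS, "Alan Turing", hops=3)
    if path is not None:
        print(" -> ".join(path))
```

Each sampled path gives a start article, an end article, and the intermediate hops a model would need to connect them. Walks that hit a dead end return `None` and can simply be resampled.
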

| TOY PROBLEMS | ||
|---|---|---|
| Big Bench Hard | 23 challenges (only 6k datapoints) | [1] |
| logical entailment dataset | logic strings by DeepMind | [1] |
| logical entailment dataset code | (generate it yourself) | [1] |
| FSM Game | generate strings according to grammar | |
| Adaptive Grammar | grammar rule might change | |
| String/Graph Rewriting | string_rewriting.py | |
| LibraryOfLogic | generate NL from multiple games | [1] |
| AB-XY Game | ||
| word ladder | ||
| parser | ||
| longest common subsequence | ||
| string reversal | ||
| wisconsin card sorting | ||
| anagram | ||
| palindrome | ||
| permutation composition | ||
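
Several of the toy problems above are easy to generate yourself. Here is a minimal sketch of a few of them (FSM strings, string reversal, permutation composition, longest common subsequence). The `FSM` table, the function names, and the record layout are all made up for illustration and are not taken from any of the linked tools.

```python
import random

# Hypothetical toy FSM: state -> list of (emitted symbol, next state).
# "END" is the accepting state; the accepted language is a+b+.
FSM = {
    "A": [("a", "A"), ("a", "B")],
    "B": [("b", "B"), ("b", "END")],
}


def sample_fsm_string(rng, fsm=FSM, start="A", max_len=20):
    """Random-walk the FSM; retry until a walk reaches END within max_len."""
    while True:
        state, out = start, []
        while state != "END" and len(out) < max_len:
            symbol, state = rng.choice(fsm[state])
            out.append(symbol)
        if state == "END":
            return "".join(out)


def make_reversal(rng, alphabet="abcd", length=6):
    """One string-reversal example as an input/target record."""
    s = "".join(rng.choice(alphabet) for _ in range(length))
    return {"task": "string reversal", "input": s, "target": s[::-1]}


def compose(p, q):
    """Compose permutations given as index tuples: apply q first, then p."""
    return tuple(p[i] for i in q)


def make_permutation_composition(rng, n=4):
    """One permutation-composition example over n elements."""
    p, q = list(range(n)), list(range(n))
    rng.shuffle(p)
    rng.shuffle(q)
    return {
        "task": "permutation composition",
        "input": (tuple(p), tuple(q)),
        "target": compose(p, q),
    }


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_fsm_string(rng))
    print(make_reversal(rng))
    print(make_permutation_composition(rng))
    print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```

Passing a seeded `random.Random` keeps generated datasets reproducible, and every generator returns its own ground-truth target, so no labeling step is needed.
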

| TOKEN AUGMENTED REASONING | ||
|---|---|---|
| Reasoning tokens | Self-Reasoning Tokens, teaching models to think ahead | [1] |
| Quiet-STaR | LLMs Can Teach Themselves to Think Before Speaking | [1] |
| Multi-token Prediction | Multi-token prediction is favorable for the development of induction heads and algorithmic reasoning capabilities | https://arxiv.org/abs/2404.19737 |

| INDIRECT REASONING (IR) | ||
|---|---|---|
| Contrapositive and Contradiction for Automated Reasoning | use logic of contrapositives and contradictions for factual reasoning and mathematical proofs | https://arxiv.org/pdf/2402.03667 |

| DIRECT REASONING (DR) | ||
|---|---|---|
| Graph of Thoughts (GoT) | Model the information generated by an LLM as an arbitrary graph | https://arxiv.org/abs/2308.09687 |
| Self-Consistency | Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer | https://arxiv.org/abs/2203.11171 |
| Chain of Thoughts | chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning | https://arxiv.org/abs/2201.11903 |
| Chain of thoughts without prompting | CoT reasoning paths can be elicited from pre-trained LLMs by simply altering the decoding process | https://arxiv.org/abs/2402.10200 |
| Iterative Reasoning Preference Optimization | Iterated DPO, but for CoT, repeated until performance saturates on reasoning tasks | https://arxiv.org/pdf/2404.19733 |
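
To make the Self-Consistency row concrete, here is a minimal sketch of its voting step: sample several reasoning paths, keep only each final answer, and return the most common one. `sample_fn` is a hypothetical stand-in for "sample one chain of thought from your model and extract its final answer"; no real LLM API is called, and the noisy sampler below only simulates one.

```python
import random
from collections import Counter


def self_consistent_answer(sample_fn, question, n_samples=10):
    """Majority vote over the final answers of n_samples sampled reasoning paths.

    Returns (answer, fraction of samples that agreed with it).
    Ties go to the answer that was seen first.
    """
    answers = [sample_fn(question) for _ in range(n_samples)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n_samples


def make_noisy_sampler(rng, correct="18", wrong=("17", "20"), p_correct=0.6):
    """Hypothetical model stand-in: right with probability p_correct, else a wrong answer."""
    def sample(question):
        return correct if rng.random() < p_correct else rng.choice(wrong)
    return sample


if __name__ == "__main__":
    sampler = make_noisy_sampler(random.Random(0))
    print(self_consistent_answer(sampler, "toy question", n_samples=25))
```

The vote only helps when wrong answers are spread across different values while correct reasoning paths converge on one, which is the intuition quoted in the table row.
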