Merge branch 'main' into update-documentation
lapp0 authored Aug 12, 2024
2 parents a801cde + 5e8f770 commit 245c10d
Showing 8 changed files with 450 additions and 9 deletions.
120 changes: 120 additions & 0 deletions docs/cookbook/chain_of_thought.md
@@ -0,0 +1,120 @@
# Chain of thought


Chain of thought is a prompting technique introduced in the paper ["Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"](https://arxiv.org/abs/2201.11903), in which the authors use prompting to generate a series of intermediate reasoning steps, improving the ability of LLMs to perform complex reasoning.

In this guide, we use [outlines](https://outlines-dev.github.io/outlines/) to apply chain of thought through structured output.

We use [llama.cpp](https://github.com/ggerganov/llama.cpp) via the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) library. Outlines supports llama-cpp-python, but we need to install it ourselves:

```shell
pip install llama-cpp-python
```

We download a quantized GGUF model; in this guide we use [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/):

```shell
wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
```

We initialize the model:

```python
import llama_cpp
from llama_cpp import Llama
from outlines import generate, models

llm = Llama(
    "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
        "NousResearch/Hermes-2-Pro-Llama-3-8B"
    ),
    n_gpu_layers=-1,
    flash_attn=True,
    n_ctx=8192,
    verbose=False
)
model = models.LlamaCpp(llm)
```

## Chain of thought

We first define our Pydantic class for a reasoning step:

```python
from pydantic import BaseModel, Field

class Reasoning_Step(BaseModel):
    reasoning_step: str = Field(..., description="Reasoning step")
```

We then define the Pydantic class for the reasoning, which consists of a list of reasoning steps and a conclusion, and we get its JSON schema:

```python
from typing import List

class Reasoning(BaseModel):
    reasoning: List[Reasoning_Step] = Field(..., description="List of reasoning steps")
    conclusion: str = Field(..., description="Conclusion")

json_schema = Reasoning.model_json_schema()
```
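
To see what an output conforming to this schema looks like, we can build a small `Reasoning` instance by hand and dump it to JSON; the values below are purely illustrative and not produced by the model:

```python
example = Reasoning(
    reasoning=[
        Reasoning_Step(reasoning_step="17 is not divisible by 2 or 3."),
        Reasoning_Step(reasoning_step="The square root of 17 is less than 5, so no larger divisor needs to be checked."),
    ],
    conclusion="17 is a prime number.",
)
# Pretty-print the JSON the model will be constrained to produce
print(example.model_dump_json(indent=2))
```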

We could generate a response directly from the JSON schema, but for a change we will build a regex from the schema and use that instead:

```python
from outlines.integrations.utils import convert_json_schema_to_str
from outlines.fsm.json_schema import build_regex_from_schema

schema_str = convert_json_schema_to_str(json_schema=json_schema)
regex_str = build_regex_from_schema(schema_str)
```
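
As a quick sanity check, we can inspect the beginning of the regex built from the schema (the exact pattern depends on the outlines version):

```python
# Print only the first characters; the full pattern is long
print(regex_str[:80])
```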

We then need to adapt our prompt to the [Hermes prompt format for JSON schema](https://github.com/NousResearch/Hermes-Function-Calling?tab=readme-ov-file#prompt-format-for-json-mode--structured-outputs):

```python
def generate_hermes_prompt(user_prompt, schema=""):
    return (
        "<|im_start|>system\n"
        "You are a world class AI model who answers questions in JSON. "
        f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema><|im_end|>\n"
        "<|im_start|>user\n"
        + user_prompt
        + "<|im_end|>"
        + "\n<|im_start|>assistant\n"
        "<schema>"
    )
```

For a given user prompt:

```python
user_prompt = "9.11 and 9.9 -- which is bigger?"
```

we can build the generator with `generate.regex`, passing the regex string derived from the Pydantic class we previously defined, and call it with the Hermes prompt:

```python
generator = generate.regex(model, regex_str)
prompt = generate_hermes_prompt(user_prompt, json_schema)
response = generator(prompt, max_tokens=1024, temperature=0, seed=42)
```

We obtain a series of intermediate reasoning steps as well as the conclusion:

```python
import json

json_response = json.loads(response)

print(json_response["reasoning"])
print(json_response["conclusion"])
# [{'reasoning_step': 'Both 9.11 and 9.9 are decimal numbers.'},
# {'reasoning_step': 'When comparing decimal numbers, we look at the numbers after the decimal point.'},
# {'reasoning_step': 'In this case, 9.11 has the number 1 after the decimal point, while 9.9 has the number 9.'},
# {'reasoning_step': 'Since 1 is greater than 9, 9.11 is greater than 9.9.'}]
# '9.11 is bigger.'
```

We notice that the fourth reasoning step is wrong ("Since 1 is greater than 9, 9.11 is greater than 9.9."), so we should probably give the model a few examples for this particular task.
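
One way to do that is to prepend a couple of worked examples to the system message before the user question. The sketch below is only an illustration of the idea: the example text and the helper name `generate_few_shot_hermes_prompt` are hypothetical, and the output is not shown here.

```python
few_shot_examples = (
    "Example:\n"
    "Question: 2.5 and 2.35 -- which is bigger?\n"
    '{"reasoning": [{"reasoning_step": "Both numbers have the same integer part, 2."}, '
    '{"reasoning_step": "Comparing the fractional parts, 0.5 is greater than 0.35."}], '
    '"conclusion": "2.5 is bigger."}\n'
)

def generate_few_shot_hermes_prompt(user_prompt, schema=""):
    # Same Hermes format as above, with worked examples added to the system message
    return (
        "<|im_start|>system\n"
        "You are a world class AI model who answers questions in JSON. "
        f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema>\n"
        + few_shot_examples
        + "<|im_end|>\n"
        "<|im_start|>user\n" + user_prompt + "<|im_end|>"
        "\n<|im_start|>assistant\n"
    )

prompt = generate_few_shot_hermes_prompt(user_prompt, json_schema)
response = generator(prompt, max_tokens=1024, temperature=0, seed=42)
```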

This example was originally contributed by [Alonso Silva](https://github.com/alonsosilvaallende).
2 changes: 2 additions & 0 deletions docs/cookbook/index.md
@@ -8,3 +8,5 @@
- [SimToM](simtom.md): Improve LLMs' Theory of Mind capabilities with perspective-taking prompting and JSON-structured generation.
- [Q&A with Citations](qa-with-citations.md): Answer questions and provide citations using JSON-structured generation.
- [Knowledge Graph Generation](knowledge_graph_extraction.md): Generate a Knowledge Graph from unstructured text using JSON-structured generation.
- [Chain Of Thought (CoT)](chain_of_thought.md): Generate a series of intermediate reasoning steps using regex-structured generation.
- [ReAct Agent](react_agent.md): Build an agent with open weights models using regex-structured generation.
257 changes: 257 additions & 0 deletions docs/cookbook/react_agent.md
@@ -0,0 +1,257 @@
# ReAct Agent

This example shows how to use [outlines](https://outlines-dev.github.io/outlines/) to build your own agent with open weights local models and structured outputs. It is inspired by the blog post [A simple Python implementation of the ReAct pattern for LLMs](https://til.simonwillison.net/llms/python-react-pattern) by [Simon Willison](https://simonwillison.net/).

The ReAct pattern (for Reason+Act) is described in the paper [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629). It's a pattern where you implement additional actions that an LLM can take - searching Wikipedia or running calculations for example - and then teach it how to request the execution of those actions, and then feed their results back into the LLM.

Additionally, we give the LLM the possibility of using a scratchpad described in the paper [Show Your Work: Scratchpads for Intermediate Computation with Language Models](https://arxiv.org/abs/2112.00114) which improves the ability of LLMs to perform multi-step computations.

We use [llama.cpp](https://github.com/ggerganov/llama.cpp) via the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) library. Outlines supports llama-cpp-python, but we need to install it ourselves:

```shell
pip install llama-cpp-python
```

We download a quantized GGUF model; in this guide we use [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/):

```shell
wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
```

We initialize the model:

```python
import llama_cpp
from llama_cpp import Llama
from outlines import generate, models

llm = Llama(
    "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
        "NousResearch/Hermes-2-Pro-Llama-3-8B"
    ),
    n_gpu_layers=-1,
    flash_attn=True,
    n_ctx=8192,
    verbose=False
)
model = models.LlamaCpp(llm)
```

## Build a ReAct agent

In this example, we use two tools:

- wikipedia: \<search term\> - searches Wikipedia and returns the snippet of the first result
- calculate: \<expression\> - evaluates an expression using Python's `eval()` function

```python
import httpx

def wikipedia(q):
    return httpx.get("https://en.wikipedia.org/w/api.php", params={
        "action": "query",
        "list": "search",
        "srsearch": q,
        "format": "json"
    }).json()["query"]["search"][0]["snippet"]
```

```python
def calculate(numexp):
    return eval(numexp)
```
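
Both tools can be tried on their own before wiring them into the agent; note that the Wikipedia call requires network access and its snippet will change over time:

```python
print(calculate("2**10"))    # 1024
print(wikipedia("England"))  # snippet of the first search result, possibly with HTML highlight markup
```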

We define the logic of the agent through a Pydantic class. First, we want the LLM to decide only between the two previously defined tools:

```python
from enum import Enum

class Action(str, Enum):
    wikipedia = "wikipedia"
    calculate = "calculate"
```

Our agent will loop through Thought and Action steps. We explicitly add an Action Input field so the model doesn't forget to provide the arguments of the Action. We also add an optional scratchpad.

```python
from pydantic import BaseModel, Field

class Reason_and_Act(BaseModel):
    Scratchpad: str = Field(..., description="Information from the Observation useful to answer the question")
    Thought: str = Field(..., description="It describes your thoughts about the question you have been asked")
    Action: Action
    Action_Input: str = Field(..., description="The arguments of the Action.")
```

Our agent will eventually reach a Final Answer. Here too we add an optional scratchpad.

```python
class Final_Answer(BaseModel):
    Scratchpad: str = Field(..., description="Information from the Observation useful to answer the question")
    Final_Answer: str = Field(..., description="Answer to the question grounded on the Observation")
```

Our agent will decide when it has reached a Final Answer, and therefore when to stop the loop of Thought and Action.

```python
from typing import Union

class Decision(BaseModel):
    Decision: Union[Reason_and_Act, Final_Answer]
```
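
To make the two branches of the union concrete, here is a hand-written `Decision` for each case (the values are purely illustrative):

```python
step = Decision(
    Decision=Reason_and_Act(
        Scratchpad="",
        Thought="I need to compute 2 to the power of 10.",
        Action=Action.calculate,
        Action_Input="2**10",
    )
)
done = Decision(
    Decision=Final_Answer(Scratchpad="2**10 is 1024.", Final_Answer="1024")
)
# Serialize both variants to see the JSON shapes the agent will produce
print(step.model_dump_json())
print(done.model_dump_json())
```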

We could generate a response directly from the JSON schema, but we will build a regex from it instead and check that everything works as expected:

```python
from outlines.integrations.utils import convert_json_schema_to_str
from outlines.fsm.json_schema import build_regex_from_schema

json_schema = Decision.model_json_schema()
schema_str = convert_json_schema_to_str(json_schema=json_schema)
regex_str = build_regex_from_schema(schema_str)
print(regex_str)
# '\\{[ ]?"Decision"[ ]?:[ ]?(\\{[ ]?"Scratchpad"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?,[ ]?"Thought"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?,[ ]?"Action"[ ]?:[ ]?("wikipedia"|"calculate")[ ]?,[ ]?"Action_Input"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?\\}|\\{[ ]?"Scratchpad"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?,[ ]?"Final_Answer"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?\\})[ ]?\\}'
```

We then need to adapt our prompt to the [Hermes prompt format for JSON schema](https://github.com/NousResearch/Hermes-Function-Calling?tab=readme-ov-file#prompt-format-for-json-mode--structured-outputs) and explain the agent logic:

```python
import datetime

def generate_hermes_prompt(question, schema=""):
return (
"<|im_start|>system\n"
"You are a world class AI model who answers questions in JSON with correct Pydantic schema. "
f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema>\n"
"Today is " + datetime.datetime.today().strftime('%Y-%m-%d') + ".\n" +
"You run in a loop of Scratchpad, Thought, Action, Action Input, PAUSE, Observation. "
"At the end of the loop you output a Final Answer. "
"Use Scratchpad to store the information from the Observation useful to answer the question "
"Use Thought to describe your thoughts about the question you have been asked "
"and reflect carefully about the Observation if it exists. "
"Use Action to run one of the actions available to you. "
"Use Action Input to input the arguments of the selected action - then return PAUSE. "
"Observation will be the result of running those actions. "
"Your available actions are:\n"
"calculate:\n"
"e.g. calulate: 4**2 / 3\n"
"Runs a calculation and returns the number - uses Python so be sure to use floating point syntax if necessary\n"
"wikipedia:\n"
"e.g. wikipedia: Django\n"
"Returns a summary from searching Wikipedia\n"
"DO NOT TRY TO GUESS THE ANSWER. Begin! <|im_end|>"
"\n<|im_start|>user\n" + question + "<|im_end|>"
"\n<|im_start|>assistant\n"
)
```

We define a ChatBot class that accumulates the conversation in its prompt:

```python
class ChatBot:
    def __init__(self, prompt=""):
        self.prompt = prompt

    def __call__(self, user_prompt):
        self.prompt += user_prompt
        result = self.execute()
        return result

    def execute(self):
        generator = generate.regex(model, regex_str)
        result = generator(self.prompt, max_tokens=1024, temperature=0, seed=42)
        return result

We define a query function:

```python
import json

def query(question, max_turns=5):
    i = 0
    next_prompt = (
        "\n<|im_start|>user\n" + question + "<|im_end|>"
        "\n<|im_start|>assistant\n"
    )
    previous_actions = []
    while i < max_turns:
        i += 1
        prompt = generate_hermes_prompt(question=question, schema=Decision.model_json_schema())
        bot = ChatBot(prompt=prompt)
        result = bot(next_prompt)
        json_result = json.loads(result)['Decision']
        if "Final_Answer" not in list(json_result.keys()):
            scratchpad = json_result['Scratchpad'] if i == 0 else ""
            thought = json_result['Thought']
            action = json_result['Action']
            action_input = json_result['Action_Input']
            print(f"\x1b[34m Scratchpad: {scratchpad} \x1b[0m")
            print(f"\x1b[34m Thought: {thought} \x1b[0m")
            print(f"\x1b[36m -- running {action}: {str(action_input)}\x1b[0m")
            if action + ": " + str(action_input) in previous_actions:
                observation = "You already ran that action. **TRY A DIFFERENT ACTION INPUT.**"
            else:
                if action == "calculate":
                    try:
                        observation = eval(str(action_input))
                    except Exception as e:
                        observation = f"{e}"
                elif action == "wikipedia":
                    try:
                        observation = wikipedia(str(action_input))
                    except Exception as e:
                        observation = f"{e}"
            print()
            print(f"\x1b[33m Observation: {observation} \x1b[0m")
            print()
            previous_actions.append(action + ": " + str(action_input))
            next_prompt += (
                "\nScratchpad: " + scratchpad +
                "\nThought: " + thought +
                "\nAction: " + action +
                "\nAction Input: " + action_input +
                "\nObservation: " + str(observation)
            )
        else:
            scratchpad = json_result["Scratchpad"]
            final_answer = json_result["Final_Answer"]
            print(f"\x1b[34m Scratchpad: {scratchpad} \x1b[0m")
            print(f"\x1b[34m Final Answer: {final_answer} \x1b[0m")
            return final_answer
    print("\nFinal Answer: I am sorry, but I am unable to answer your question. Please provide more information or a different question.")
    return "No answer found"
```

We can now test our ReAct agent:

```python
print(query("What's 2 to the power of 10?"))
# Scratchpad:
# Thought: I need to perform a mathematical calculation to find the result of 2 to the power of 10.
# -- running calculate: 2**10
#
# Observation: 1024
#
# Scratchpad: 2 to the power of 10 is 1024.
# Final Answer: 2 to the power of 10 is 1024.
# 2 to the power of 10 is 1024.
```

```python
print(query("What does England share borders with?"))
# Scratchpad:
# Thought: To answer this question, I will use the 'wikipedia' action to gather information about England's geographical location and its borders.
# -- running wikipedia: England borders
#
# Observation: Anglo-Scottish <span class="searchmatch">border</span> (Scottish Gaelic: Crìochan Anglo-Albannach) is an internal <span class="searchmatch">border</span> of the United Kingdom separating Scotland and <span class="searchmatch">England</span> which runs for
#
# Scratchpad: Anglo-Scottish border (Scottish Gaelic: Crìochan Anglo-Albannach) is an internal border of the United Kingdom separating Scotland and England which runs for
# Final Answer: England shares a border with Scotland.
# England shares a border with Scotland.
```

As mentioned in Simon's blog post, this is not a very robust implementation and there is plenty of room for improvement. But it is lovely how a few lines of Python make these extra capabilities available to the LLM. And now you can run it locally with an open weights LLM.

This example was originally contributed by [Alonso Silva](https://github.com/alonsosilvaallende).