Merge branch 'main' into update-documentation
lapp0 authored Aug 12, 2024
2 parents a801cde + 5e8f770 commit 245c10d
Showing 8 changed files with 450 additions and 9 deletions.
120 changes: 120 additions & 0 deletions docs/cookbook/chain_of_thought.md
@@ -0,0 +1,120 @@
# Chain of thought


Chain of thought is a prompting technique introduced in the paper ["Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"](https://arxiv.org/abs/2201.11903), in which the authors use prompting to generate a series of intermediate reasoning steps, improving the ability of LLMs to perform complex reasoning.

In this guide, we use [outlines](https://outlines-dev.github.io/outlines/) to apply chain of thought through structured output.

We use [llama.cpp](https://github.com/ggerganov/llama.cpp) via the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) library. Outlines supports llama-cpp-python, but we need to install it ourselves:

```shell
pip install llama-cpp-python
```

We download a quantized GGUF model; in this guide we use [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/):

```shell
wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
```

We initialize the model:

```python
import llama_cpp
from llama_cpp import Llama
from outlines import generate, models

llm = Llama(
    "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
        "NousResearch/Hermes-2-Pro-Llama-3-8B"
    ),
    n_gpu_layers=-1,
    flash_attn=True,
    n_ctx=8192,
    verbose=False
)
model = models.LlamaCpp(llm)
```

## Chain of thought

We first define our Pydantic class for a reasoning step:

```python
from pydantic import BaseModel, Field

class Reasoning_Step(BaseModel):
    reasoning_step: str = Field(..., description="Reasoning step")
```

We then define the Pydantic class for the reasoning, which consists of a list of reasoning steps and a conclusion, and we get its JSON schema:

```python
from typing import List

class Reasoning(BaseModel):
    reasoning: List[Reasoning_Step] = Field(..., description="List of reasoning steps")
    conclusion: str = Field(..., description="Conclusion")

json_schema = Reasoning.model_json_schema()
```
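
To see what an output conforming to this schema looks like, we can build a small `Reasoning` instance by hand and dump it to JSON; the values below are purely illustrative and not produced by the model:

```python
example = Reasoning(
    reasoning=[
        Reasoning_Step(reasoning_step="17 is not divisible by 2 or 3."),
        Reasoning_Step(reasoning_step="The square root of 17 is less than 5, so no larger divisor needs to be checked."),
    ],
    conclusion="17 is a prime number.",
)
# Pretty-print the JSON the model will be constrained to produce
print(example.model_dump_json(indent=2))
```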

We could generate a response directly from the JSON schema, but for a change we will build a regex from the schema and use that instead:

```python
from outlines.integrations.utils import convert_json_schema_to_str
from outlines.fsm.json_schema import build_regex_from_schema

schema_str = convert_json_schema_to_str(json_schema=json_schema)
regex_str = build_regex_from_schema(schema_str)
```
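
As a quick sanity check, we can inspect the beginning of the regex built from the schema (the exact pattern depends on the outlines version):

```python
# Print only the first characters; the full pattern is long
print(regex_str[:80])
```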

We then need to adapt our prompt to the [Hermes prompt format for JSON schema](https://github.com/NousResearch/Hermes-Function-Calling?tab=readme-ov-file#prompt-format-for-json-mode--structured-outputs):

```python
def generate_hermes_prompt(user_prompt, schema=""):
    return (
        "<|im_start|>system\n"
        "You are a world class AI model who answers questions in JSON. "
        f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema><|im_end|>\n"
        "<|im_start|>user\n"
        + user_prompt
        + "<|im_end|>"
        + "\n<|im_start|>assistant\n"
        "<schema>"
    )
```

For a given user prompt:

```python
user_prompt = "9.11 and 9.9 -- which is bigger?"
```

we can build the generator with `generate.regex`, passing the regex string derived from the Pydantic class we previously defined, and call it with the Hermes prompt:

```python
generator = generate.regex(model, regex_str)
prompt = generate_hermes_prompt(user_prompt, json_schema)
response = generator(prompt, max_tokens=1024, temperature=0, seed=42)
```

We obtain a series of intermediate reasoning steps as well as the conclusion:

```python
import json

json_response = json.loads(response)

print(json_response["reasoning"])
print(json_response["conclusion"])
# [{'reasoning_step': 'Both 9.11 and 9.9 are decimal numbers.'},
# {'reasoning_step': 'When comparing decimal numbers, we look at the numbers after the decimal point.'},
# {'reasoning_step': 'In this case, 9.11 has the number 1 after the decimal point, while 9.9 has the number 9.'},
# {'reasoning_step': 'Since 1 is greater than 9, 9.11 is greater than 9.9.'}]
# '9.11 is bigger.'
```

We notice that the fourth reasoning step is wrong ("Since 1 is greater than 9, 9.11 is greater than 9.9."), so we should probably give the model a few examples for this particular task.
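
One way to do that is to prepend a couple of worked examples to the system message before the user question. The sketch below is only an illustration of the idea: the example text and the helper name `generate_few_shot_hermes_prompt` are hypothetical, and the output is not shown here.

```python
few_shot_examples = (
    "Example:\n"
    "Question: 2.5 and 2.35 -- which is bigger?\n"
    '{"reasoning": [{"reasoning_step": "Both numbers have the same integer part, 2."}, '
    '{"reasoning_step": "Comparing the fractional parts, 0.5 is greater than 0.35."}], '
    '"conclusion": "2.5 is bigger."}\n'
)

def generate_few_shot_hermes_prompt(user_prompt, schema=""):
    # Same Hermes format as above, with worked examples added to the system message
    return (
        "<|im_start|>system\n"
        "You are a world class AI model who answers questions in JSON. "
        f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema>\n"
        + few_shot_examples
        + "<|im_end|>\n"
        "<|im_start|>user\n" + user_prompt + "<|im_end|>"
        "\n<|im_start|>assistant\n"
    )

prompt = generate_few_shot_hermes_prompt(user_prompt, json_schema)
response = generator(prompt, max_tokens=1024, temperature=0, seed=42)
```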

This example was originally contributed by [Alonso Silva](https://github.com/alonsosilvaallende).
2 changes: 2 additions & 0 deletions docs/cookbook/index.md
@@ -8,3 +8,5 @@
- [SimToM](simtom.md): Improve LLMs' Theory of Mind capabilities with perspective-taking prompting and JSON-structured generation.
- [Q&A with Citations](qa-with-citations.md): Answer questions and provide citations using JSON-structured generation.
- [Knowledge Graph Generation](knowledge_graph_extraction.md): Generate a Knowledge Graph from unstructured text using JSON-structured generation.
- [Chain Of Thought (CoT)](chain_of_thought.md): Generate a series of intermediate reasoning steps using regex-structured generation.
- [ReAct Agent](react_agent.md): Build an agent with open weights models using regex-structured generation.
257 changes: 257 additions & 0 deletions docs/cookbook/react_agent.md
@@ -0,0 +1,257 @@
# ReAct Agent

This example shows how to use [outlines](https://outlines-dev.github.io/outlines/) to build your own agent with open weights local models and structured outputs. It is inspired by the blog post [A simple Python implementation of the ReAct pattern for LLMs](https://til.simonwillison.net/llms/python-react-pattern) by [Simon Willison](https://simonwillison.net/).

The ReAct pattern (for Reason+Act) is described in the paper [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629). It's a pattern where you implement additional actions that an LLM can take - searching Wikipedia or running calculations for example - and then teach it how to request the execution of those actions, and then feed their results back into the LLM.

Additionally, we give the LLM the possibility of using a scratchpad described in the paper [Show Your Work: Scratchpads for Intermediate Computation with Language Models](https://arxiv.org/abs/2112.00114) which improves the ability of LLMs to perform multi-step computations.

We use [llama.cpp](https://github.com/ggerganov/llama.cpp) via the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) library. Outlines supports llama-cpp-python, but we need to install it ourselves:

```shell
pip install llama-cpp-python
```

We download a quantized GGUF model; in this guide we use [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/):

```shell
wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
```

We initialize the model:

```python
import llama_cpp
from llama_cpp import Llama
from outlines import generate, models

llm = Llama(
    "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
    tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
        "NousResearch/Hermes-2-Pro-Llama-3-8B"
    ),
    n_gpu_layers=-1,
    flash_attn=True,
    n_ctx=8192,
    verbose=False
)
model = models.LlamaCpp(llm)
```

## Build a ReAct agent

In this example, we use two tools:

- wikipedia: \<search term\> - searches Wikipedia and returns the snippet of the first result
- calculate: \<expression\> - evaluates an expression using Python's `eval()` function

```python
import httpx

def wikipedia(q):
    return httpx.get("https://en.wikipedia.org/w/api.php", params={
        "action": "query",
        "list": "search",
        "srsearch": q,
        "format": "json"
    }).json()["query"]["search"][0]["snippet"]
```

```python
def calculate(numexp):
    return eval(numexp)
```
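
Both tools can be tried on their own before wiring them into the agent; note that the Wikipedia call requires network access and its snippet will change over time:

```python
print(calculate("2**10"))    # 1024
print(wikipedia("England"))  # snippet of the first search result, possibly with HTML highlight markup
```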

We define the logic of the agent through a Pydantic class. First, we want the LLM to decide only between the two previously defined tools:

```python
from enum import Enum

class Action(str, Enum):
    wikipedia = "wikipedia"
    calculate = "calculate"
```

Our agent will loop through Thought and Action steps. We explicitly add an Action Input field so the model doesn't forget to provide the arguments of the Action. We also add an optional scratchpad.

```python
from pydantic import BaseModel, Field

class Reason_and_Act(BaseModel):
    Scratchpad: str = Field(..., description="Information from the Observation useful to answer the question")
    Thought: str = Field(..., description="It describes your thoughts about the question you have been asked")
    Action: Action
    Action_Input: str = Field(..., description="The arguments of the Action.")
```

Our agent will eventually reach a Final Answer. Here too we add an optional scratchpad.

```python
class Final_Answer(BaseModel):
    Scratchpad: str = Field(..., description="Information from the Observation useful to answer the question")
    Final_Answer: str = Field(..., description="Answer to the question grounded on the Observation")
```

Our agent will decide when it has reached a Final Answer, and therefore when to stop the loop of Thought and Action.

```python
from typing import Union

class Decision(BaseModel):
    Decision: Union[Reason_and_Act, Final_Answer]
```
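
To make the two branches of the union concrete, here is a hand-written `Decision` for each case (the values are purely illustrative):

```python
step = Decision(
    Decision=Reason_and_Act(
        Scratchpad="",
        Thought="I need to compute 2 to the power of 10.",
        Action=Action.calculate,
        Action_Input="2**10",
    )
)
done = Decision(
    Decision=Final_Answer(Scratchpad="2**10 is 1024.", Final_Answer="1024")
)
# Serialize both variants to see the JSON shapes the agent will produce
print(step.model_dump_json())
print(done.model_dump_json())
```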

We could generate a response directly from the JSON schema, but we will build a regex from it instead and check that everything works as expected:

```python
from outlines.integrations.utils import convert_json_schema_to_str
from outlines.fsm.json_schema import build_regex_from_schema

json_schema = Decision.model_json_schema()
schema_str = convert_json_schema_to_str(json_schema=json_schema)
regex_str = build_regex_from_schema(schema_str)
print(regex_str)
# '\\{[ ]?"Decision"[ ]?:[ ]?(\\{[ ]?"Scratchpad"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?,[ ]?"Thought"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?,[ ]?"Action"[ ]?:[ ]?("wikipedia"|"calculate")[ ]?,[ ]?"Action_Input"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?\\}|\\{[ ]?"Scratchpad"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?,[ ]?"Final_Answer"[ ]?:[ ]?"([^"\\\\\\x00-\\x1F\\x7F-\\x9F]|\\\\["\\\\])*"[ ]?\\})[ ]?\\}'
```

We then need to adapt our prompt to the [Hermes prompt format for JSON schema](https://github.com/NousResearch/Hermes-Function-Calling?tab=readme-ov-file#prompt-format-for-json-mode--structured-outputs) and explain the agent logic:

```python
import datetime

def generate_hermes_prompt(question, schema=""):
return (
"<|im_start|>system\n"
"You are a world class AI model who answers questions in JSON with correct Pydantic schema. "
f"Here's the json schema you must adhere to:\n<schema>\n{schema}\n</schema>\n"
"Today is " + datetime.datetime.today().strftime('%Y-%m-%d') + ".\n" +
"You run in a loop of Scratchpad, Thought, Action, Action Input, PAUSE, Observation. "
"At the end of the loop you output a Final Answer. "
"Use Scratchpad to store the information from the Observation useful to answer the question "
"Use Thought to describe your thoughts about the question you have been asked "
"and reflect carefully about the Observation if it exists. "
"Use Action to run one of the actions available to you. "
"Use Action Input to input the arguments of the selected action - then return PAUSE. "
"Observation will be the result of running those actions. "
"Your available actions are:\n"
"calculate:\n"
"e.g. calulate: 4**2 / 3\n"
"Runs a calculation and returns the number - uses Python so be sure to use floating point syntax if necessary\n"
"wikipedia:\n"
"e.g. wikipedia: Django\n"
"Returns a summary from searching Wikipedia\n"
"DO NOT TRY TO GUESS THE ANSWER. Begin! <|im_end|>"
"\n<|im_start|>user\n" + question + "<|im_end|>"
"\n<|im_start|>assistant\n"
)
```

We define a ChatBot class that accumulates the conversation in its prompt:

```python
class ChatBot:
    def __init__(self, prompt=""):
        self.prompt = prompt

    def __call__(self, user_prompt):
        self.prompt += user_prompt
        result = self.execute()
        return result

    def execute(self):
        generator = generate.regex(model, regex_str)
        result = generator(self.prompt, max_tokens=1024, temperature=0, seed=42)
        return result

We define a query function:

```python
import json

def query(question, max_turns=5):
    i = 0
    next_prompt = (
        "\n<|im_start|>user\n" + question + "<|im_end|>"
        "\n<|im_start|>assistant\n"
    )
    previous_actions = []
    while i < max_turns:
        i += 1
        prompt = generate_hermes_prompt(question=question, schema=Decision.model_json_schema())
        bot = ChatBot(prompt=prompt)
        result = bot(next_prompt)
        json_result = json.loads(result)['Decision']
        if "Final_Answer" not in list(json_result.keys()):
            scratchpad = json_result['Scratchpad'] if i == 0 else ""
            thought = json_result['Thought']
            action = json_result['Action']
            action_input = json_result['Action_Input']
            print(f"\x1b[34m Scratchpad: {scratchpad} \x1b[0m")
            print(f"\x1b[34m Thought: {thought} \x1b[0m")
            print(f"\x1b[36m -- running {action}: {str(action_input)}\x1b[0m")
            if action + ": " + str(action_input) in previous_actions:
                observation = "You already ran that action. **TRY A DIFFERENT ACTION INPUT.**"
            else:
                if action == "calculate":
                    try:
                        observation = eval(str(action_input))
                    except Exception as e:
                        observation = f"{e}"
                elif action == "wikipedia":
                    try:
                        observation = wikipedia(str(action_input))
                    except Exception as e:
                        observation = f"{e}"
            print()
            print(f"\x1b[33m Observation: {observation} \x1b[0m")
            print()
            previous_actions.append(action + ": " + str(action_input))
            next_prompt += (
                "\nScratchpad: " + scratchpad +
                "\nThought: " + thought +
                "\nAction: " + action +
                "\nAction Input: " + action_input +
                "\nObservation: " + str(observation)
            )
        else:
            scratchpad = json_result["Scratchpad"]
            final_answer = json_result["Final_Answer"]
            print(f"\x1b[34m Scratchpad: {scratchpad} \x1b[0m")
            print(f"\x1b[34m Final Answer: {final_answer} \x1b[0m")
            return final_answer
    print("\nFinal Answer: I am sorry, but I am unable to answer your question. Please provide more information or a different question.")
    return "No answer found"
```

We can now test our ReAct agent:

```python
print(query("What's 2 to the power of 10?"))
# Scratchpad:
# Thought: I need to perform a mathematical calculation to find the result of 2 to the power of 10.
# -- running calculate: 2**10
#
# Observation: 1024
#
# Scratchpad: 2 to the power of 10 is 1024.
# Final Answer: 2 to the power of 10 is 1024.
# 2 to the power of 10 is 1024.
```

```python
print(query("What does England share borders with?"))
# Scratchpad:
# Thought: To answer this question, I will use the 'wikipedia' action to gather information about England's geographical location and its borders.
# -- running wikipedia: England borders
#
# Observation: Anglo-Scottish <span class="searchmatch">border</span> (Scottish Gaelic: Crìochan Anglo-Albannach) is an internal <span class="searchmatch">border</span> of the United Kingdom separating Scotland and <span class="searchmatch">England</span> which runs for
#
# Scratchpad: Anglo-Scottish border (Scottish Gaelic: Crìochan Anglo-Albannach) is an internal border of the United Kingdom separating Scotland and England which runs for
# Final Answer: England shares a border with Scotland.
# England shares a border with Scotland.
```

As mentioned in Simon's blog post, this is not a very robust implementation and there is plenty of room for improvement. But it is lovely how a few lines of Python make these extra capabilities available to the LLM. And now you can run it locally with an open weights LLM.

This example was originally contributed by [Alonso Silva](https://github.com/alonsosilvaallende).