{ "info": { "author": "anubrag", "author_email": "b.anurag@aol.com", "bugtrack_url": null, "classifiers": [ "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Programming Language :: Python :: 3.12", "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3.9" ], "description": "
\n*guidance logo*\n
\n\n> *Note that v0.1 is a dramatically new version developed while releases had to be paused over the summer. If you are looking for the old version based on handlebars, you can use v0.0.64, but you should instead try porting over to the much better new version :)*\n\n**`guidance`** is a programming paradigm that offers superior control and efficiency compared to conventional prompting and chaining. It allows users to constrain generation (e.g. with regex and CFGs) as well as to interleave control (conditionals, loops) and generation seamlessly. Here are some important features: \n\n1. **Pure, beautiful python** with additional LM functionality. E.g. here is [basic generation](#basic-generation):\n```python\nfrom guidance import models, gen\n\n# load a model (could be Transformers, LlamaCpp, VertexAI, OpenAI...)\nllama2 = models.LlamaCpp(path) \n\n# append text or generations to the model\nllama2 + f'Do you want a joke or a poem? ' + gen(stop='.')\n```\n> Do you want a joke or a poem? I'll give you a poem\n2. [**Constrained generation**](#constrained-generation) with [selects](#select-basic), [regular expressions](#regular-expressions), and [context-free grammars](#context-free-grammars).\n```python\nfrom guidance import select\n\n# a simple select between two options\nllama2 + f'Do you want a joke or a poem? A ' + select(['joke', 'poem'])\n```\n> Do you want a joke or a poem? A poem\n3. **Rich templates with f-strings**:\n```python\nllama2 + f'''\\\nDo you want a joke or a poem? A {select(['joke', 'poem'])}.\nOkay, here is a one-liner: \"{gen(stop='\"')}\"\n'''\n```\n> Do you want a joke or a poem? A poem. \n> Okay, here is a one-liner: \"I'm a poet, and I know it.\"\n\n4. [**Stateful control + generation**](#stateful-control--generation) makes it easy to interleave prompting / logic / generation, no need for intermediate parsers:\n```python\n# capture our selection under the name 'answer'\nlm = llama2 + f\"Do you want a joke or a poem? A {select(['joke', 'poem'], name='answer')}.\\n\"\n\n# make a choice based on the model's previous selection\nif lm[\"answer\"] == \"joke\":\n lm += f\"Here is a one-line joke about cats: \" + gen('output', stop='\\n')\nelse:\n lm += f\"Here is a one-line poem about dogs: \" + gen('output', stop='\\n')\n```\n> Do you want a joke or a poem? A poem. \n> Here is a one-line poem about dogs: \u201cDogs are the best.\u201d\n5. **Abstract chat interface** that uses the correct special tokens for any chat model:\n```python\nfrom guidance import user, assistant\n\n# load a chat model\nchat_lm = models.LlamaCppChat(model_path, n_gpu_layers=-1)\n\n# wrap with chat block contexts\nwith user():\n lm = chat_lm + 'Do you want a joke or a poem?'\n\nwith assistant():\n lm += f\"A {select(['joke', 'poem'])}.\"\n```\n6. **Easy to write reusable components**\n```python\n@guidance\ndef one_line_thing(lm, thing, topic):\n\n # update the incoming model\n lm += f'Here is a one-line {thing} about {topic}: ' + gen(stop='\\n')\n\n # return our updated model\n return lm \n\n# pick either a joke or a poem\nlm = llama2 + f\"Do you want a joke or a poem? A {select(['joke', 'poem'], name='thing')}.\\n\"\n\n# call our guidance function\nlm += one_line_thing(lm['thing'], 'cats')\n```\n> Do you want a joke or a poem? A poem. \n> Here is a one-line poem about cats: \u201cCats are the best.\u201d\n7. **A library of pre-built components**, e.g. substring:\n```python\nfrom guidance import substring\n\n# define a set of possible statements\ntext = 'guidance is awesome. guidance is so great. 
guidance is the best thing since sliced bread.'\n\n# force the model to make an exact quote\nllama2 + f'Here is a true statement about the guidance library: \"{substring(text)}\"'\n```\n> Here is a true statement about the guidance library: \"the best thing since sliced bread.\"\n\n8. [**Easy tool use**](#automatic-interleaving-of-control-and-generation-tool-use), where the model stops generation when a tool is called, calls the tool, then resumes generation. For example, here is a simple version of a calculator, via four separate 'tools':\n```python\n@guidance\ndef add(lm, input1, input2):\n lm += f' = {int(input1) + int(input2)}'\n return lm\n@guidance\ndef subtract(lm, input1, input2):\n lm += f' = {int(input1) - int(input2)}'\n return lm\n@guidance\ndef multiply(lm, input1, input2):\n lm += f' = {float(input1) * float(input2)}'\n return lm\n@guidance\ndef divide(lm, input1, input2):\n lm += f' = {float(input1) / float(input2)}'\n return lm\n```\nNow we call `gen` with these tools as options. Notice how generation is stopped and restarted automatically:\n```python\nlm = llama2 + '''\\\n1 + 1 = add(1, 1) = 2\n2 - 3 = subtract(2, 3) = -1\n'''\nlm + gen(max_tokens=15, tools=[add, subtract, multiply, divide])\n```\n> 1 + 1 = add(1, 1) = 2 \n> 2 - 3 = subtract(2, 3) = -1 \n> 3 * 4 = multiply(3, 4) = 12.0 \n> 4 / 5 = divide(4, 5) = 0.8\n\n\n9. **Speed**: In contrast to chaining, `guidance` programs are the equivalent of a single LLM call. Moreover, any non-generated text that gets appended is batched, so that `guidance` programs are **faster** than having the LM generate intermediate text when you have a set structure.\n10. **Token healing**: Users deal with text (or bytes) rather than tokens, and thus don't have to worry about [perverse token boundary issues](https://towardsdatascience.com/the-art-of-prompt-design-prompt-boundaries-and-token-healing-3b2448b0be38) such as 'prompt ending in whitespace'.\n11. **Streaming support**, also integrated with jupyter notebooks:\n\nTODO: change this image to new version with the example above.\n\n12. **High compatibility:** works with Transformers, llamacpp, VertexAI, OpenAI. 
Users can write one guidance program and execute it on many backends (note that the most powerful features require endpoint integration, and for now work best with transformers and llamacpp).\n\n## Table of Contents\n * [Install](#install)\n * [Loading models](#loading-models)\n * [llama-cpp](#llama-cpp)\n * [transformers](#transformers)\n * [Vertex](#vertex)\n * [OpenAI](#openai)\n * [Example notebooks](#example-notebooks)\n * [Basic generation](#basic-generation)\n * [Constrained Generation](#constrained-generation)\n * [Select (basic)](#select-basic)\n * [Regular expressions](#regular-expressions)\n * [Regex to constrain generation](#regex-to-constrain-generation)\n * [Regex as stopping criterion](#regex-as-stopping-criterion)\n * [Context-free grammars](#context-free-grammars)\n * [Stateful control + generation](#stateful-control--generation)\n * [State in immutable objects](#state-in-immutable-objects)\n * [Stateful guidance functions](#stateful-guidance-functions)\n * [Example: ReAct](#example-react)\n * [Example: Changing intermediate step of a Chat session](#example-changing-intermediate-step-of-a-chat-session)\n * [Automatic interleaving of control and generation: tool use](#automatic-interleaving-of-control-and-generation-tool-use)\n * [Gsm8k example](#gsm8k-example)\n * [Automatic call grammar for @guidance functions](#automatic-call-grammar-for-guidance-functions)\n * [Jupyter notebook streaming](#jupyter-notebook-streaming)\n * [Text, not tokens](#text-not-tokens)\n * [Fast](#fast)\n * [Integrated stateful control is faster](#integrated-stateful-control-is-faster)\n * [Guidance acceleration](#guidance-acceleration)\n\n## Install\n```bash\npip install guidance\n```\n## Loading models\n### llama-cpp\nInstall the python bindings:\n```bash\nCMAKE_ARGS=\"-DLLAMA_CUBLAS=on\" pip install llama-cpp-python\n```\nLoading the model:\n```python\nfrom guidance import models\nlm = models.LlamaCpp(path_to_model, n_gpu_layers=-1)\n```\n\n### transformers\nInstall the `transformers` package, then load a model:\n```python\nfrom guidance import models\nlm = models.Transformers(model_name_or_path)\n```\n\n### Vertex\nTodo @Scott: talk about how constrained generation is different for these models\n\n### OpenAI\nTodo @Scott\n\n## Example notebooks\nComing soon\n\n## Basic generation\nAn `lm` object is immutable, so you change it by creating new copies of it. By default, when you append things to `lm`, it creates a copy, e.g.:\n```python\nfrom guidance import models, gen, select\nllama2 = models.LlamaCpp(path_to_model, n_gpu_layers=-1)\n# llama2 is not modified, and `lm` is a copy of it with the prompt appended\nlm = llama2 + 'This is a prompt'\n```\n\nYou can append _generation_ calls to it, e.g.\n```python\nlm = llama2 + 'This is a prompt' + gen(max_tokens=10)\n```\n> This is a prompt for the 2018 NaNoW\n\nYou can also interleave generation calls with plain text, or control flows:\n```python\n# Note how we set stop tokens\nlm = llama2 + 'I like to play with my ' + gen(stop=' ') + ' in' + gen(stop=['\\n', '.', '!'])\n```\n> I like to play with my friends in the park\n\n## Constrained Generation\n### Select (basic)\n`select` constrains generation to a set of options:\n```python\nlm = llama2 + 'I like the color ' + select(['red', 'blue', 'green'])\n```\n> I like the color blue\n\n### Regular expressions\n`gen` has optional arguments `regex` and `stop_regex`, which allow generation (and stopping, respectively) to be controlled by a regex. 
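\n\nFor instance, here is a minimal sketch combining both arguments (reusing the `llama2` model loaded above; the exact output will vary by model):\n```python\n# constrain the generated answer to look like a 24-hour time, then stop a\n# free-form generation at the first period\nlm = llama2 + 'The meeting starts at ' + gen(name='time', regex='[0-2][0-9]:[0-5][0-9]')\nlm += ' and the agenda is: ' + gen(name='agenda', stop_regex='\\.')\n```\nThe subsections below walk through each argument in more detail.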
\n\n#### Regex to constrain generation\nUnconstrained:\n\n```python\nlm = llama2 + 'Question: Luke has ten balls. He gives three to his brother.\\n'\nlm += 'How many balls does he have left?\\n'\nlm += 'Answer: ' + gen(stop='\\n')\n```\n> Answer: Seven.\n\nConstrained by regex:\n\n```python\nlm = llama2 + 'Question: Luke has ten balls. He gives three to his brother.\\n'\nlm += 'How many balls does he have left?\\n'\nlm += 'Answer: ' + gen(regex='\\d+')\n```\n> Answer: 7\n\n\n#### Regex as stopping criterion\nUnconstrained:\n```python\nlm = llama2 + '19, 18,' + gen(max_tokens=50)\n```\n> 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4,\n\nStop with traditional stop text whenever the model generates the number 7:\n```python\nlm = llama2 + '19, 18,' + gen(max_tokens=50, stop='7')\n```\n> 19, 18, 1\n \nStop whenever the model generates the character `7` without any numbers around it: \n```python\nlm = llama2 + '19, 18,' + gen(max_tokens=50, stop_regex='[^\\d]7[^\\d]')\n```\n> 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8,\n\n### Context-free grammars\nWe expose a variety of operators that make it easy to define CFGs, which in turn can be used to constrain generation.\nFor example, we can use the `select` operator (it accepts CFGs as options), `zero_or_more`, and `one_or_more` to define a grammar for mathematical expressions:\n```python\nimport guidance\nfrom guidance import one_or_more, select, zero_or_more\n# stateless=True indicates this function does not depend on LLM generations\n@guidance(stateless=True)\ndef number(lm):\n n = one_or_more(select(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']))\n # Allow for negative or positive numbers\n return lm + select(['-' + n, n])\n\n@guidance(stateless=True)\ndef operator(lm):\n return lm + select(['+', '*', '**', '/', '-'])\n\n@guidance(stateless=True)\ndef expression(lm):\n # Either\n # 1. A number (terminal)\n # 2. two expressions with an operator and optional whitespace\n # 3. An expression with parentheses around it\n return lm + select([\n number(),\n expression() + zero_or_more(' ') + operator() + zero_or_more(' ') + expression(),\n '(' + expression() + ')'\n ])\n```\n\nThe `@guidance(stateless=True)` decorator makes it such that a function (e.g. `expression`) lives as a stateless grammar that does not get 'executed' until we call `lm + expression()` or `lm += expression()`. For example, here is _unconstrained_ generation:\n```python\n# Without constraints\nlm = llama2 + 'Problem: Luke has a hundred and six balls. He then loses thirty six.\\n'\nlm += 'Equivalent arithmetic expression: ' + gen(stop='\\n') + '\\n'\n```\n> Equivalent arithmetic expression: 106 - 36 = 60\n\nNotice how the model wrote the right equation but then solved it (incorrectly). If we wanted to constrain the model such that it only writes valid expressions (without trying to solve them), we can just append our grammar to it:\n```python\ngrammar = expression()\nlm = llama2 + 'Problem: Luke has a hundred and six balls. He then loses thirty six.\\n'\nlm += 'Equivalent arithmetic expression: ' + grammar + '\\n'\n```\n> Equivalent arithmetic expression: 106 - 36\n\nGrammars are very easy to compose. For example, let's say we want a grammar that generates either a mathematical expression or an expression followed by a solution followed by another expression. 
Creating this grammar is easy:\n\n```python\nfrom guidance import regex\ngrammar = select([expression(), expression() + regex(' = \\d+; ') + expression()])\n```\nWe can generate according to it:\n```python\nllama2 + 'Here is a math expression for two plus two: ' + grammar\n```\n> Here is a math expression for two plus two: 2 + 2\n\n```python\nllama2 + '2 + 2 = 4; 3+3\\n' + grammar\n```\n> 2 + 2 = 4; 3+3\n\nEven if you don't like thinking in terms of recursive grammars, this formalism makes it easy to constrain generation. For example, let's say we have the following one-shot prompt:\n```python\n@guidance(stateless=True)\ndef ner_instruction(lm, input):\n lm += f'''\\\n Please tag each word in the input with PER, ORG, LOC, or nothing\n ---\n Input: John worked at Apple.\n Output:\n John: PER\n worked: \n at: \n Apple: ORG\n .: \n ---\n Input: {input}\n Output:\n '''\n return lm\ninput = 'Julia never went to Morocco in her life!!'\nllama2 + ner_instruction(input) + gen(stop='---')\n```\n> Input: Julia never went to Morroco in her life!! \n> Output: \n> Julia: PER \n> never: \n> went: \n> to: \n> Morocc: ORG \n> in: \n> her: \n> life: LOC \n> !!: \n> .: \n\nNotice that the model did not spell the word 'Morocco' correctly. Sometimes the model might also hallucinate a tag that doesn't exist. We can improve this by adding more few-shot examples, etc., but we can also constrain generation to the exact format we want:\n```python\nimport re\n@guidance(stateless=True)\ndef constrained_ner(lm, input):\n # Split into words\n words = [x for x in re.split('([^a-zA-Z0-9])', input) if x and not re.match('\\s', x)]\n ret = ''\n for x in words:\n ret += x + ': ' + select(['PER', 'ORG', 'LOC', '']) + '\\n'\n return lm + ret\nllama2 + ner_instruction(input) + constrained_ner(input)\n```\n\n> Input: Julia never went to Morocco in her life!! \n> Output: \n> Julia: PER \n> never: \n> went: \n> to: \n> Morocco: ORG \n> in: \n> her: \n> life: LOC \n> !: \n> !: \n\nWhile `constrained_ner(input)` **is** a grammar that constrains the model's generation, it _feels_ like you're just writing normal imperative python code with `+=` and `selects`.\n\n## Stateful control + generation\n### State in immutable objects\nWhenever you do `lm + grammar`, `lm + gen`, `lm + select`, etc., you get back an lm object with additional state. For example:\n\n```python\nlm = llama2 + 'This is a prompt' + gen(name='test', max_tokens=10)\nlm += select(['this', 'that'], name='test2')\nlm['test'], lm['test2']\n```\n### Stateful `guidance` functions\nThe guidance decorator is `@guidance(stateless=False)` by default, meaning that a function with this decorator depends on the lm state to execute (either prior state or state generated within the function). For example:\n```python\n@guidance(stateless=False)\ndef test(lm):\n lm += 'Should I say \"Scott\"?\\n' + select(['yes', 'no'], name='answer') + '\\n'\n if lm['answer'] == 'yes':\n lm += 'Scott'\n else:\n lm += 'Not Scott'\n return lm\nllama2 + test()\n```\n> Should I say \"Scott\"?\n> yes\n> Scott\n\n### Example: ReAct\nA big advantage of stateful control is that you don't have to write any intermediate parsers, and adding follow-up 'prompting' is easy, even if the follow-up depends on what the model generates.\nFor example, let's say we want to implement the first example of a ReAct prompt in [this guide](https://www.promptingguide.ai/techniques/react), and let's say the valid acts are only 'Search' or 'Finish'. 
We might write it like this:\n```python\n@guidance\ndef react_prompt_example(lm, question, max_rounds=10):\n lm += f'Question: {question}\\n'\n i = 1\n while True:\n lm += f'Thought {i}: ' + gen(suffix='\\n')\n lm += f'Act {i}: ' + select(['Search', 'Finish'], name='act') \n lm += '[' + gen(name='arg', suffix=']') + '\\n'\n if lm['act'] == 'Finish' or i == max_rounds:\n break\n else:\n lm += f'Observation {i}: ' + search(lm['arg']) + '\\n'\n i += 1\n return lm\n```\nNotice how we don't have to write a parser for Act and argument and hope that the model generates something valid: we enforce it. Notice also that the loop only stops once the model chooses to act with 'Finish' (or once we hit a maximum number of rounds).\n\n### Example: Changing intermediate step of a Chat session\nWe can also hide or change some of what the model generates. For example, below we get a Chat model (notice we use special `role` blocks) to name some experts to answer a question, but we always remove 'Ferriss' from the list if he is mentioned:\n```python\nfrom guidance import user, system, assistant\nlm = llama2\nquery = 'How can I be more productive?'\nwith system():\n lm += 'You are a helpful and terse assistant.'\nwith user():\n lm += f'I want a response to the following question:\\n{query}\\n'\n lm += 'Name 3 world-class experts (past or present) who would be great at answering this.'\nwith assistant():\n temp_lm = lm\n for i in range(1, 4):\n # This regex only allows strings that look like names (where every word is capitalized)\n # list_append appends the result to a list\n temp_lm += f'{i}. ' + gen(regex='([A-Z][a-z]*\\s*)+', suffix='\\n',\n name='experts', list_append=True)\n experts = [x for x in temp_lm['experts'] if 'Ferriss' not in x]\n # Notice that even if the model generates 'Ferriss' above,\n # it doesn't get added to `lm`, only to `temp_lm`\n lm += ', '.join(experts)\nwith user():\n lm += 'Please answer the question as if these experts had collaborated in writing an anonymous answer.'\nwith assistant():\n lm += gen(max_tokens=100)\n```\n> Screenshot here\n\n### Automatic interleaving of control and generation: tool use\nTool use is a common case of stateful control. To make it easy to do so, `gen` calls take `tools` as an optional argument, where each tool is defined by (1) a grammar that triggers its call and captures the arguments (if any), and (2) the actual tool call. 
Then, as generation unrolls, whenever the model generates something that matches the grammar of a tool call, it (1) stops generation, (2) calls the tool (which can append whatever it wants to the LM session), and (3) continues generation.\n\nFor example, here is how we might implement a calculator tool, leveraging our `expression` grammar above:\n```python\nfrom guidance import capture, Tool\n@guidance(stateless=True)\ndef calculator_call(lm):\n # capture just 'names' the expression, to be saved in the LM state\n return lm + 'calculator(' + capture(expression(), 'tool_args') + ')'\n\n@guidance\ndef calculator(lm):\n expression = lm['tool_args']\n # You typically don't want to run eval directly, for safety reasons\n # Here we are guaranteed to only have mathematical expressions\n lm += f' = {eval(expression)}'\n return lm\ncalculator_tool = Tool(calculator_call(), calculator)\nlm = llama2 + 'Here are five expressions:\\ncalculator(3 *3) = 33\\ncalculator(2 + 1 * 3) = 5\\n'\nlm += gen(max_tokens=30, tools=[calculator_tool], stop='\\n\\n')\n```\n> Here are five expressions: \n> calculator(3 *3) = 33 \n> calculator(2 + 1 * 3) = 5 \n> calculator(10 / 2) = 5.0 \n> calculator(10 - 1) = 9 \n> calculator(10 * 2) = 20 \n\n### Gsm8k example\nNotice that the calculator is just called seamlessly during generation. Here is a more realistic example of the model solving a gsm8k question:\n\n```python\n@guidance\ndef math_with_calc(lm, question):\n # One-shot example\n lm += '''Question: John starts with 2 balls. He then quintupled his number of balls. Then he lost half of them. He then gave 3 to his brother. How many does he have left?\nReasoning:\n1. He quintupled his balls, so he has calculator(2 * 5) = 10 balls.\n2. He lost half, he has calculator(10 / 2) = 5 balls.\n3. He gave 3 to his brother, so he has calculator(5 - 3) = 2 balls.\nAnswer: 2\\n\\n'''\n lm += f'Question: {question}\\n'\n lm += 'Reasoning: ' + gen(max_tokens=200, tools=[calculator_tool], stop='Answer')\n # Only numbers or commas\n lm += 'Answer: ' + gen(regex='[-\\d,]+')\n return lm\n\nquestion = '''Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?'''\nllama2 + math_with_calc(question)\n```\n> Question: Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?\n> Reasoning: \n> 1. She lays 16 eggs per day. \n> 2. She eats 3 for breakfast, so she has calculator(16 - 3) = 13 eggs left. \n> 3. She bakes 4 muffins, so she has calculator(13 - 4) = 9 eggs left. \n> 4. She sell the remainder at the farmers' market for $2 per egg, so she makes calculator(9 * 2) = 18 dollars per day. \n> Answer: 18\n\n### Automatic call grammar for @guidance functions\nYou can also initialize a `Tool` with any `@guidance`-decorated function, and the default call grammar will be like a python call. 
Here is an example of using multiple such tools in the same `gen` call:\n```python\n@guidance\ndef say_scott(lm, n):\n lm += '\\n'\n for _ in range(int(n)):\n lm += 'Scott\\n'\n return lm\n\n@guidance\ndef say_marco(lm, n):\n lm += '\\n'\n for _ in range(int(n)):\n lm += 'marco\\n'\n return lm\n\ntools = [Tool(callable=say_scott), Tool(callable=say_marco)]\nllama2 + 'I am going to call say_scott and say_marco a few times:\\n' + 'say_scott(1)\\nScott\\n' + gen(max_tokens=20, tools=tools)\n```\n> I am going to call say_scott and say_marco a few times: \n> say_scott(1) \n> Scott \n> \n> say_marco(1) \n> marco \n> \n> say_scott(2) \n> Scott \n> Scott \n> \n> say_marco(2) \n> marco \n> marco \n\n## Jupyter notebook streaming\nexample here\n\n## Text, not tokens\nThe standard greedy tokenizations used by most language models introduce a variety of subtle and powerful biases, which can have all kinds of unintended consequences for your prompts.\nFor example, take the following prompt, given to gpt-2 (standard greedy tokenization):\n\n```python\nfrom transformers import pipeline\npipe = pipeline(\"text-generation\", model=\"gpt2\")\ndef hf_gen(prompt, max_tokens=100):\n return pipe(prompt, do_sample=False, max_length=max_tokens, return_full_text=False)[0]['generated_text']\n\nprompt = 'http:'\nhf_gen(prompt, max_tokens=10)\n```\n> ' //www.youtube.com/watch'\n\nNotice how the output generated by the LLM does not complete the URL with the obvious next characters (two forward slashes). It instead creates an invalid URL string with a space in the middle. Why? Because the string `://` is its own token, and so once the model sees a colon by itself, it assumes that the next characters cannot be `//`; otherwise, the tokenizer would not have used `:`, and instead would have used `://`. This is why there are warnings about ending prompts in whitespace, but the problem is way more pervasive than that: any boundary that may span multiple tokens will cause problems, e.g. notice how a partial word causes incorrect completion:\n\n```python\nprompt = 'John is a'\nhf_gen(prompt, max_tokens=5)\n```\n> ' former member'\n\n```python\nprompt = 'John is a fo'\nhf_gen(prompt, max_tokens=5)\n```\n> 'etus'\n\nWhile problematic enough for normal prompts, these problems would be a disaster in the kinds of prompts we wrote in this README, where there is interleaving of prompting and generation happening multiple times (and thus multiple opportunities for problems). This is why `guidance` implements [token healing](https://towardsdatascience.com/the-art-of-prompt-design-prompt-boundaries-and-token-healing-3b2448b0be38), a feature that deals with prompt boundaries automatically, allowing users to just think in terms of **text** rather than tokens. For example:\n\n```python\nfrom guidance import models\ngpt = models.Transformers('gpt2')\nprompt = 'http:'\ngpt + prompt + gen(max_tokens=10)\n```\n> http://www.youtube.com/watch?v=\n\n\n```python\nprompt = 'John is a fo'\ngpt + prompt + gen(max_tokens=2)\n```\n> John is a former member,\n\n## Fast\n### Integrated stateful control is faster\nWe have full control of the decoding loop in our integration with `transformers` and `llamacpp`, allowing us to add control and additional prompt text without any extra cost. \nIf instead we're calling a server, we pay the extra cost of making additional requests, which might be OK if the server has caching, but quickly becomes impractical if the server does not have fine-grained caching. 
For example, note again the output from the [gsm8k example with calculator](#gsm8k-example) above:\n\n> Reasoning: \n> 1. She lays 16 eggs per day. \n> 2. She eats 3 for breakfast, so she has calculator(16 - 3) = 13 eggs left. \n> 3. She bakes 4 muffins, so she has calculator(13 - 4) = 9 eggs left. \n> 4. She sell the remainder at the farmers' market for $2 per egg, so she makes calculator(9 * 2) = 18 dollars per day. \n> Answer: 18\n\nEvery time we call `calculator`, we have to stop generation, append the result to the prompt, and resume generation. To avoid slowing down after the first call, a server would need to keep the KV cache up to '3 for breakfast, so she has calculator(16 - 3)', then roll forward generation from that point on. Even servers that _do_ have caching typically have a cache per prompt, and would not be able to do this. Instead, they would consider everything as a new prompt (causing significant slowdowns every time `calculator` is called).\n\n### Guidance acceleration\nIn addition to the benefit above, `guidance` calls are often **faster** than running equivalent prompts the traditional way, because we can batch any additional text that is added by the user as execution unrolls (rather than generating it). Take the example below, where we generate a JSON object with `llama2`:\n```python\nimport time\n\n@guidance\ndef character_maker(lm, id, description, valid_weapons):\n lm += f\"\"\"\\\n The following is a character profile for an RPG game in JSON format.\n ```json\n {{\n \"id\": \"{id}\",\n \"description\": \"{description}\",\n \"name\": \"{gen('name', stop='\"')}\",\n \"age\": {gen('age', regex='[0-9]+', stop=',')},\n \"armor\": \"{select(options=['leather', 'chainmail', 'plate'], name='armor')}\",\n \"weapon\": \"{select(options=valid_weapons, name='weapon')}\",\n \"class\": \"{gen('class', stop='\"')}\",\n \"mantra\": \"{gen('mantra', stop='\"')}\",\n \"strength\": {gen('strength', regex='[0-9]+', stop=',')},\n \"items\": [\"{gen('item', list_append=True, stop='\"')}\", \"{gen('item', list_append=True, stop='\"')}\", \"{gen('item', list_append=True, stop='\"')}\"]\n }}```\"\"\"\n return lm\na = time.time()\nlm = llama2 + character_maker(1, 'A nimble fighter', ['axe', 'sword', 'bow'])\ntime.time() - a\n```\n> Output\n\nEverything that is not green is not actually generated by the model, and is thus batched (much faster). This prompt takes about 1.2 seconds on an A100 GPU. Now, if we let the model generate everything (as in the roughly equivalent prompt below), it takes roughly `2.67` seconds (not only is it slower, we also have less control over generation). \n```python\n@guidance\ndef character_maker2(lm, id, description):\n lm += f\"\"\"\\\n The following is a character profile for an RPG game in JSON format. 
It has fields 'id', 'description', 'name', 'age', 'armor', weapon', 'class', 'mantra', 'strength', and 'items (just the names of 3 items)'\n please set description to '{description}'\n ```json\"\"\" + gen(stop='```')\n return lm\na = time.time()\nlm = llama2 + character_maker2(1, 'A nimble fighter')\ntime.time() - a\n```\n> Output, roughly the same, but much slower.", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "agent-utils", "package_url": "https://pypi.org/project/agent-utils/", "platform": null, "project_url": "https://pypi.org/project/agent-utils/", "project_urls": null, "release_url": "https://pypi.org/project/agent-utils/0.1.0/", "requires_dist": [ "diskcache (>=5.2.1,<6.0.0)", "openai (>=0.27.8,<0.28.0)", "pyparsing (>=3.0.0,<4.0.0)", "pygtrie (>=2.4.0,<3.0.0)", "platformdirs (>=2.4.0,<3.0.0)", "tiktoken (>=0.3.0,<0.4.0)", "nest_asyncio (>=1.5.3,<2.0.0)", "msal (>=1.16.0,<2.0.0)", "requests (>=2.26.0,<3.0.0)", "aiohttp (>=3.8.1,<4.0.0)", "ordered_set (>=4.0.2,<5.0.0)", "pyformlang (>=0.1.23,<0.2.0)" ], "requires_python": ">=3.8,<4.0", "summary": "", "version": "0.1.0", "yanked": false, "yanked_reason": null }, "last_serial": 20619306, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "blake2b_256": "5dd55f3ff25b29d9898973fcf207df5bce10df93838515b305f81aa0a1bb0043", "md5": "c04acda34ca8a9c643aa003b06fd06da", "sha256": "1902c6ecae3e1508c4b001010bbb457e836a32a2e25dc80acccb0f75a2f6475e" }, "downloads": -1, "filename": "agent_utils-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "c04acda34ca8a9c643aa003b06fd06da", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.8,<4.0", "size": 98870, "upload_time": "2023-11-14T04:11:37", "upload_time_iso_8601": "2023-11-14T04:11:37.906618Z", "url": "https://files.pythonhosted.org/packages/5d/d5/5f3ff25b29d9898973fcf207df5bce10df93838515b305f81aa0a1bb0043/agent_utils-0.1.0-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "blake2b_256": "db45d13bbfa32eb17c70107cb21efde14d615f35f3d43ddc6d8e69aff2df1676", "md5": "d8fe6b21dbb9a2812ce467cd0ddc89ac", "sha256": "8aeeb443a0aa32ef094f280744e2faf5dc75dd1d63404ce71770adcc0d277d4c" }, "downloads": -1, "filename": "agent_utils-0.1.0.tar.gz", "has_sig": false, "md5_digest": "d8fe6b21dbb9a2812ce467cd0ddc89ac", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.8,<4.0", "size": 89740, "upload_time": "2023-11-14T04:11:40", "upload_time_iso_8601": "2023-11-14T04:11:40.576002Z", "url": "https://files.pythonhosted.org/packages/db/45/d13bbfa32eb17c70107cb21efde14d615f35f3d43ddc6d8e69aff2df1676/agent_utils-0.1.0.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "blake2b_256": "5dd55f3ff25b29d9898973fcf207df5bce10df93838515b305f81aa0a1bb0043", "md5": "c04acda34ca8a9c643aa003b06fd06da", "sha256": "1902c6ecae3e1508c4b001010bbb457e836a32a2e25dc80acccb0f75a2f6475e" }, "downloads": -1, "filename": "agent_utils-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "c04acda34ca8a9c643aa003b06fd06da", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.8,<4.0", "size": 98870, "upload_time": "2023-11-14T04:11:37", "upload_time_iso_8601": "2023-11-14T04:11:37.906618Z", "url": 
"https://files.pythonhosted.org/packages/5d/d5/5f3ff25b29d9898973fcf207df5bce10df93838515b305f81aa0a1bb0043/agent_utils-0.1.0-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "blake2b_256": "db45d13bbfa32eb17c70107cb21efde14d615f35f3d43ddc6d8e69aff2df1676", "md5": "d8fe6b21dbb9a2812ce467cd0ddc89ac", "sha256": "8aeeb443a0aa32ef094f280744e2faf5dc75dd1d63404ce71770adcc0d277d4c" }, "downloads": -1, "filename": "agent_utils-0.1.0.tar.gz", "has_sig": false, "md5_digest": "d8fe6b21dbb9a2812ce467cd0ddc89ac", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.8,<4.0", "size": 89740, "upload_time": "2023-11-14T04:11:40", "upload_time_iso_8601": "2023-11-14T04:11:40.576002Z", "url": "https://files.pythonhosted.org/packages/db/45/d13bbfa32eb17c70107cb21efde14d615f35f3d43ddc6d8e69aff2df1676/agent_utils-0.1.0.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }