Add module with grammars #543

Closed
rlouf opened this issue Jan 16, 2024 · 7 comments · Fixed by #562
Labels: examples (Linked to usage examples), structured generation (Linked to structured generation)

Comments

rlouf (Member) commented Jan 16, 2024

A user interface to grammar-guided generation was recently added to the repository, but grammars can be hard to craft. I propose that we add a module to Outlines that contains ready-to-use grammars, as has been suggested several times.

Using a grammar would look like:

```python
import outlines

# Proposed interface: a ready-to-use grammar exposed as a module attribute
sql_grammar = outlines.grammars.sql
```
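To make the proposal more concrete, here is a minimal sketch of one way such a module could be organized: each grammar is Lark source text stored in a module-level string, plus a small lookup helper. The names `ARITHMETIC`, `available_grammars` and `get_grammar` are hypothetical, and the arithmetic grammar (adapted from Lark's calculator example) stands in for the SQL grammar discussed in this issue; none of this is Outlines' actual API.

```python
from typing import List

# Hypothetical sketch: a module of ready-to-use grammars stored as Lark source
# strings. ARITHMETIC, available_grammars and get_grammar are illustrative
# names, not Outlines' API.

# Arithmetic expressions, adapted from Lark's calculator example.
ARITHMETIC = r"""
?start: sum

?sum: product
    | sum "+" product   -> add
    | sum "-" product   -> sub

?product: atom
    | product "*" atom  -> mul
    | product "/" atom  -> div

?atom: NUMBER           -> number
    | "-" atom          -> neg
    | "(" sum ")"

%import common.NUMBER
%import common.WS_INLINE
%ignore WS_INLINE
"""

# Registry mapping a short name to the grammar text; other grammars
# (e.g. SQL) would be registered the same way.
_REGISTRY = {"arithmetic": ARITHMETIC}


def available_grammars() -> List[str]:
    """Names of the grammars bundled in this (hypothetical) module."""
    return sorted(_REGISTRY)


def get_grammar(name: str) -> str:
    """Return the grammar text registered under `name`."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise ValueError(
            f"Unknown grammar {name!r}; available: {available_grammars()}"
        ) from None


print(available_grammars())
```

Keeping grammars as plain text means the module itself does not depend on a parser; whatever consumes the grammar (for example Lark) parses it when it is used.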

An alternative, which was the original plan, is to have a separate library that contains grammars. We might follow this route down the line if it turns out there is a larger audience for them, but for now it would be more cumbersome for users.

rlouf added the examples and structured generation labels Jan 16, 2024
viktor-ferenczi commented Jan 17, 2024

There are multiple SQL dialects, but adding a few common ones makes sense. For example, full grammars for the SELECT syntax of SQLite, PostgreSQL and MSSQL would be used very commonly. Real-world problems I have observed include getting the model to quote table and column names properly and to not confuse such identifier quoting with string literals.
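As an illustration of the quoting problem, the sketch below labels each quoted span in a query as either an identifier or a string literal, using Python's `re` module. It assumes ANSI-style double-quoted identifiers (as in PostgreSQL), MSSQL-style bracketed identifiers, and single-quoted string literals with doubled quotes as escapes. Real dialects differ (SQLite, for instance, also accepts double-quoted string literals in some cases), and `classify_quoted` is a hypothetical helper, not part of any library.

```python
import re
from typing import List, Tuple

# Assumed quoting conventions (dialects differ):
#   "name"  identifier (ANSI SQL / PostgreSQL style); "" escapes a double quote
#   [name]  identifier (MSSQL style)
#   'text'  string literal; '' escapes a single quote
QUOTED_IDENTIFIER = r'"(?:[^"]|"")+"'
BRACKET_IDENTIFIER = r"\[[^\]]+\]"
STRING_LITERAL = r"'(?:[^']|'')*'"

# Scanning left to right means a quote character inside one kind of span
# (e.g. a double quote inside a string literal) cannot start the other kind.
QUOTED_SPAN = re.compile(
    rf"(?P<identifier>{QUOTED_IDENTIFIER}|{BRACKET_IDENTIFIER})"
    rf"|(?P<string>{STRING_LITERAL})"
)


def classify_quoted(sql: str) -> List[Tuple[str, str]]:
    """Label each quoted span in `sql` as ("identifier", ...) or ("string", ...)."""
    return [(m.lastgroup, m.group()) for m in QUOTED_SPAN.finditer(sql)]


query = """SELECT "first name", [order] FROM customers WHERE city = 'O''Brien'"""
for kind, text in classify_quoted(query):
    print(kind, text)
```

In a grammar, the same distinction would be expressed as separate terminals for quoted identifiers and string literals, which is exactly what keeps the model from mixing them up.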

Another example would be approximate (non-validating) grammars for the most common programming languages. These could save a lot of retries with small code generation models, especially when they run quantized and are prone to inserting random characters that break the syntax.

lapp0 (Collaborator) commented Jan 17, 2024

That's a great idea and would lower the barrier to entry.

Here's a good repo: https://github.com/ligurio/lark-grammars/tree/master/lark_grammars/grammars

Small note: we would need to source a modified common.lark with a regular definition of ESCAPED_STRING, since Lark's definition is non-regular and cannot be interpreted by interegular:

https://github.com/lark-parser/lark/blob/262ab71d497a8814f0ca42ca468b923fdb47a3c7/lark/grammars/common.lark#L27

```
_STRING_ESC_INNER: _STRING_INNER /(?<!\\)(\\\\)*?/

ESCAPED_STRING : "\"" _STRING_ESC_INNER "\""
```
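The difference can be seen with Python's `re` module. Below, `LARK_STYLE` is an approximate inlining of the definition quoted above (Lark's `_STRING_INNER` is assumed to be the lazy `.*?`), and `REGULAR_STYLE` is one possible lookbehind-free replacement. It is an illustrative assumption, not necessarily the definition Outlines ended up using. On the sample inputs both find the same closing quote, but only the second is built purely from character classes, alternation and repetition.

```python
import re
from typing import Optional

# Approximate inlining of Lark's ESCAPED_STRING, assuming _STRING_INNER is /.*?/.
# The lookbehind (?<!\\) asserts that the lazily consumed text does not end in a
# backslash, so the closing quote is preceded by an even number (possibly zero)
# of backslashes. The lookbehind is the part that goes beyond plain
# regular-expression syntax.
LARK_STYLE = re.compile(r'".*?(?<!\\)(?:\\\\)*?"')

# Lookbehind-free alternative (an illustrative assumption): each character between
# the quotes is either a plain character (not a quote, backslash or newline) or a
# backslash followed by any non-newline character.
REGULAR_STYLE = re.compile(r'"(?:[^"\\\n]|\\.)*"')


def first_string(pattern: "re.Pattern[str]", text: str) -> Optional[str]:
    """Return the quoted string matched at the start of `text`, or None."""
    match = pattern.match(text)
    return match.group() if match else None


for sample in ['"abc" rest', r'"a\"b" rest', r'"a\\" rest', r'"a\"']:
    print(repr(sample), first_string(LARK_STYLE, sample), first_string(REGULAR_STYLE, sample))
```

The last sample, whose only closing quote is escaped, is rejected by both patterns.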

rlouf (Member, Author) commented Jan 18, 2024

This repository contains a collection of Lark grammars under a permissive licence, but it looks unmaintained. We would need to ask the owner of the repository first, and be careful with licensing and attribution.

viktor-ferenczi commented:

> This repository contains a collection of Lark grammars under a permissive licence, but it looks unmaintained. We would need to ask the owner of the repository first, and be careful with licensing and attribution.

That repository has the ISC License.

"It is functionally equivalent to the simplified BSD and MIT licenses, but without language deemed unnecessary"

lapp0 mentioned this issue Jan 19, 2024
lapp0 (Collaborator) commented Jan 19, 2024

Hi @amoffat

I reviewed the SQL SELECT statement Lark grammars you wrote in these three files. They're easily the most comprehensive and well-written ones I've found on GitHub!

The AGPL license of HeimdaLLM is incompatible with Outlines' Apache 2 license. Would you be willing to allow us to use your grammars in Outlines under the Apache 2 license so users of Outlines can benefit from them as well?

Thanks,
Andrew Lapp

amoffat commented Jan 20, 2024

Hi 👋

I can only offer the AGPL or the commercial license for HeimdaLLM IP at this time, so I must decline the request to re-license those files. FWIW, those grammars are very (intentionally) limited and integrate tightly with the HeimdaLLM parser logic, so I don't think they would be as useful as you need.

Also, big fan of outlines. You all are doing very cool stuff 👍

lapp0 (Collaborator) commented Jan 20, 2024

Thanks for your consideration regardless @amoffat!


4 participants