Skip to content
This repository has been archived by the owner on Mar 21, 2019. It is now read-only.

Investgate Piggybacking off the Haskell Compiler Infrastructure (or other libraries) #14

Open
CMCDragonkai opened this issue Apr 21, 2018 · 9 comments
Assignees
Labels
research Requires research

Comments

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Apr 21, 2018

Since we're not writing a full language, we would like to piggy back Haskell's compilation infrastructure as much as possible.

The good thing is that Haskell's GHC is really modular. So we can just reuse much of it.

For example there's the GHC as a library: https://hackage.haskell.org/package/ghc

And there's the hint library that wraps around the GHC library into an easier to consume API: https://hackage.haskell.org/package/hint

More details on this here: http://downloads.haskell.org/~ghc/latest/docs/html/users_guide/extending_ghc.html

Discussed with @olligobber

@CMCDragonkai CMCDragonkai self-assigned this Apr 21, 2018
@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Apr 21, 2018

GHC currently has type checker plugins capability which allows people to extend GHC with their own type checking code to type check new things:

http://christiaanb.github.io/posts/type-checker-plugin/

This may or may not be useful, since it's plugging into the GHC ecosystem to typecheck Haskell code (plus new code that you write).

Whereas we will have a target language we want to type check that is an external DSL. If we are converting our external DSL into Haskell ADTs, plugins like these would type check the Haskell code that we write for the compiler/interpreter instead of the target language.

An example of type checking would be like the units of measure: https://github.com/adamgundry/uom-plugin/

The examples show Haskell source file that has special types added in using the quasiquoter. The quasiquoter seems like it is still necessary to make sure that the syntax is still correctly parsed as this is a type checker plugin, not a syntax plugin.

@CMCDragonkai
Copy link
Member Author

There's a plugin system here: https://github.com/stepcut/plugins described by: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.9.7627&rep=rep1&type=pdf

It appears to show how you can load Haskell code at runtime.

The introduction to the paper shows an interesting idea. What if we did write our type system rules as an extension/plugin to the GHC system, and simply translate our Architect expressions directly into standard Haskell code.

The only problem would be error reporting. How to make sure that makes sense to the end user who is interacting at the Architect level, and not at the Haskell level.

@CMCDragonkai
Copy link
Member Author

@CMCDragonkai CMCDragonkai changed the title Investgate Piggybacking off the Haskell Compiler Infrastructure Investgate Piggybacking off the Haskell Compiler Infrastructure (or other libraries) May 30, 2018
@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented May 30, 2018

I'm working of the hnix compiler. https://github.com/haskell-nix/hnix

Since nix was a DSL for just package building, but also pure lazy and functional. I figured it's a good place to start.

After studying it for a couple weeks here are my comments.

  • I've only looked at its parser and AST types
  • It's using quite advanced Haskell machinery
  • Most importantly it's AST is NExprF and NExpr which makes use of type-level fix data structures
  • The reason to do this is to make recursive data structures explicitly recursive (that is they must take themselves again, rather than just encapsulating the recursion)
  • Doing this has advantages in that it makes creating recursive operations on the AST more generic
  • For more theory on this technique see: http://mainisusuallyafunction.blogspot.com.au/2010/12/type-level-fix-and-generic-folds.html and http://blog.sumtypeofway.com/an-introduction-to-recursion-schemes/ and http://newartisans.com/2018/04/win-for-recursion-schemes/
  • One downside to the approach, is that it adds a lot more syntax noise and raises the abstraction level. The syntax noise is mostly due to having a "functor" NExprF and then the Fix applied to data structures. And when unpacking things, you always need to write out those constructs.
  • Another downside is that type class derivations don't really work anymore (for example show no longer works on the NExpr without some extra work). So you end up needing to derive slightly different classes. But this is also complicated, so we end up using things like deriving-compat packages and template haskell to perform derivation macros like: $(deriveShow1 ''...). Google "deriving typeclass show for Fix types haskell"'. While I found the correct libraries to achieve this, I'm still not sure why this functionality is in deriving-compat.
  • The AST isn't just defined, the hnix library ended up abstracting on top of NExprF in order to create an "annotated" AST. This annotated version carries source locations in the AST. To do this, they make use of the Compose functor. This adds even more syntax overhead to alot of the functionality, and in order to reduce it, they use bidirectional patterns (which does not have a lot of documentation at the moment). I don't like using these things when there's no documentation, so in my adaptation, I've removed all of this magic.
  • One issue I've been wrestling with how to interact with overloaded operators. I was hoping to be able to easily build in overloaded operators into the language, but this ends up being quite difficult with megaparsec. The makeExprParser takes just a static list of operators to create an expression parser. But discussing this on Megaparsec issue list, it may be possible later in the future to turn the operator table into state that can be mutated as the parser is parsing, and then mutate the parser when it encounters new operators. However this is also complicated by the fact that the operator table only contains operators that is binary, unary, prefix... etc. Whereas in general Haskell operators behave just like normal functions. Their infix powers only kick in when they are used without () parantheses. The operators can be unary, binary, trinary.. etc. You can easily define operators with multiple arguments. They just return a function after you apply 2 things to them. Since we are just working at a DSL for orchestration, this is not the most important. So I've left this out, and instead I only have 1 operator, which is whitespace and it is used for function application.
  • It appears that makeExprParser is mainly used to put together the operators and the terms. If we only have function application, then all of our terms are either functions of some sort or some primitive construct.
  • Syntax wise, the most complex syntax appears to be things interacting with attribute sets. I'm not sure whether to port that over.
  • In a lot of ways, I feel that there's alot of noise syntax and complexity wise in the hnix codebase, and in my adaptation, I want to simplify things more so its clearer.
  • Top level expression in Nix refers to what starts in a file. Architect expressions are also in files, but addressability starts at any binding. So our top level is similar to haskell, an implicit "attribute set" with top level bindings. Whereas nix allows functions and other stuff to be the top level expression for a file. This is not important to us, since imports don't work directly based on files, but instead based on any expression.
  • There's a recursion between the top level parser and the makeExprParser term, where the term will refer back to the top level parser in many cases.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented May 30, 2018

Weird syntax I discovered by investigating the hnix source code (nix syntax that I didn't know was legal):

{ a = 3; }.b or 4
# 4
({ a = { a = 3; }; }).a         .a
# 3
{ a = { a = 3;}; }.a or { a = 4; }    .a
# { a = 3; }

@CMCDragonkai
Copy link
Member Author

The book Practical Foundations of Programming Langauges first starts off with simple languages like T or M. Eventually it reaches chapter 19 which it introduces language PCF.

PCF stands for "Programming Computable Functions"

https://en.wikipedia.org/wiki/Programming_Computable_Functions

Every other advanced features like exceptions, continuations... etc are then built as "extensions" to PCF.

Here's a worked through example taking a PCF reified as a haskell data type and building a compiler that targets C. https://jozefg.bitbucket.io/posts/2015-03-24-pcf.html

It uses a cool little library called c-dsl that builds on top of language-c http://hackage.haskell.org/package/c-dsl and http://hackage.haskell.org/package/language-c for generation of the C code.

Architect language isn't generating C any time soon, but it's interesting that even the Nix language ultimately shares its origin with these extensions on top of lambda calculus.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Jun 26, 2018

I'm currently only working on the frontend of the language (parser and AST). Since this is right now being based on Nix, the language is untyped, but I'm going to try to work types into it, while also removing some Nix stuff that isn't relevant.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Jul 7, 2018

Some potential resources for exploring implementing type checkers (with polymorphism):

I want to explore PCF since it's the most basic implementation of a typed lambda calculus that has some extra features. And Nix doesn't have types, so while it's a good way of deriving our requirements for laziness and parsing and functional semantics, there's no type theory involved! So PCF is a good starting point.

Another megaparsec article:

@CMCDragonkai CMCDragonkai added enhancement New feature or request research Requires research and removed enhancement New feature or request labels Aug 14, 2018
@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Nov 8, 2018

The recursive nature of attribute sets or let-rec declarations combined with the defaulting syntax allows you do some interesting things here:

let f = { a ? 1, b ? a }: a + b; in f {}
# 2

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
research Requires research
Development

No branches or pull requests

1 participant