Switch parser library to megaparsec #16
Happy is much closer to the Bison parser that Nix uses, and probably of comparable speed. Does anyone mind if I switch hnix away from trifecta/parsec?
I do. We just need to optimize the present parser. I think many more people are comfortable reading and editing Parsec than they are using Happy, so for a speed gain we'd reduce our pool of candidate contributors. I'm pretty sure we can get there with Trifecta; we probably just have some nasty backtracking happening in a few key places.
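To make the backtracking concern concrete, here is a minimal, base-only sketch (hypothetical names, not hnix's actual parser) of why alternatives that share a prefix re-scan input under `try`-style rewinding, and how left-factoring the grammar avoids it:

```haskell
module Main (main) where

import Data.List (stripPrefix)

-- A parse step: consume a literal, or fail having consumed nothing.
type P = String -> Maybe (String, String)

str :: String -> P
str lit s = (,) lit <$> stripPrefix lit s

-- Alternation that rewinds to the start of the input on failure,
-- which is exactly what makes naive `try p <|> q` chains costly.
orElse :: P -> P -> P
orElse p q s = maybe (q s) Just (p s)

-- Naive: if "inherit" fails, the shared prefix "in" is scanned twice.
naive :: P
naive = str "inherit" `orElse` str "in"

-- Left-factored: commit to the shared prefix once, then branch.
factored :: P
factored s = do
  (pre, rest) <- str "in" s
  case str "herit" rest of
    Just (suf, rest') -> Just (pre ++ suf, rest')
    Nothing           -> Just (pre, rest)

main :: IO ()
main = do
  print (naive    "in x")       -- Just ("in"," x")
  print (factored "in x")       -- Just ("in"," x")
  print (factored "inherit x")  -- Just ("inherit"," x")
```

The usual remedy in Parsec/Trifecta code is exactly this kind of left-factoring, plus dropping `try` wherever the first token already disambiguates the branch.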
Hmm. Well, I for one am not sure how to get Parsec/Trifecta to parse that. Happy has a GLR mode which seems to be working, although it is proving difficult to integrate with the rest of hnix.
Wow, I didn't know that was legal syntax. Crazy.
The performance with respect to the C parser seems to have improved:
Are further improvements expected?
I certainly hope so; an order of magnitude seems a bit excessive to me. I'm not sure a proper review has been done of the back-tracking yet.
@jwiegley sounds interesting. How would I tailor it?
You'll need to switch over to using parsec-free.
Oh, and to get a dump, you'll need to use
Thanks, @jwiegley. Things don't quite compile with parsec-free yet; I'll keep experimenting.
I managed to get something working with parsec-free. The code is a hack: I had to add orphan instances for CharParsing, TokenParsing and Parsing, and the package requires trifecta in addition to parsec-free (due to some seeping of DeltaParsing into the code). I also had to comment out some code where I haven't worked through the type errors. Though you'd never want to merge the code, I thought it might be helpful to publish it; see my parsec-free-hack branch.
The nodes in parentheses represent failed parsing attempts, where the failure did not consume input. For this very simple example there are 1,482 failed attempts of one kind alone. We should see if reorganizing the parser can allow us to reduce the number of failed attempts, of which there are 4,309 total for your example. I can only imagine what a larger file would show.
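A base-only illustration of what those failed-attempt counts measure: each alternative tried and rejected before the matching one costs work, so ordering alternatives (or dispatching on the first character) matters. All names below are hypothetical; none come from hnix or parsec-free.

```haskell
module Main (main) where

-- A toy parser that counts how many alternatives failed before one
-- matched -- the kind of number the parsec-free dump reports.
newtype P a = P { runP :: String -> (Int, Maybe (a, String)) }

str :: String -> P String
str lit = P $ \s ->
  if take (length lit) s == lit
    then (0, Just (lit, drop (length lit) s))
    else (1, Nothing)  -- one failed attempt, input not consumed

orElse :: P a -> P a -> P a
orElse (P p) (P q) = P $ \s -> case p s of
  (n, Just r)  -> (n, Just r)
  (n, Nothing) -> let (m, r) = q s in (n + m, r)

keyword :: P String
keyword = foldr1 orElse (map str ["let", "if", "then", "else", "in"])

main :: IO ()
main =
  -- "in" matches only after "let", "if", "then", "else" all fail,
  -- so this input costs four wasted attempts:
  print (runP keyword "in x")  -- (4,Just ("in"," x"))
```

Reordering the list so `"in"` comes first drops the count for this input to zero, which is the same effect the thread hopes to get from reorganizing the real parser.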
Current status:
Yeah, that kind of slow-down is really not acceptable.
At the moment Trifecta is very slow and memory-hungry; it dominates the profiling output (though maybe @ekmett knows what we're doing wrong). Yet it gives such great error messages that it would be hard to give up. I wonder if we could use both attoparsec and Trifecta: if parsing fails with the former, reparse with the latter for the error message. I mention this because right now parsing the entirety of nixpkgs takes several minutes, which is unacceptable.
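The two-pass idea sketches easily: parse with a fast parser that yields no diagnostics, and only on failure re-run a slower parser that produces a message. The names below (`fastParse`, `richParse`) are illustrative stand-ins for attoparsec and Trifecta, not real APIs.

```haskell
module Main (main) where

import Text.Read (readMaybe)

-- Stand-in for the fast parser (attoparsec's role): cheap, but a
-- failure carries no useful diagnostic.
fastParse :: String -> Maybe Int
fastParse = readMaybe

-- Stand-in for the slow, diagnostic-rich parser (Trifecta's role);
-- only ever run when something is already known to be wrong.
richParse :: String -> Either String Int
richParse s = case readMaybe s of
  Just n  -> Right n
  Nothing -> Left ("expected an integer, got: " ++ show s)

-- The strategy from the comment above: fast path first, pay for
-- good error messages only on failure.
parseWithFallback :: String -> Either String Int
parseWithFallback s = maybe (richParse s) Right (fastParse s)

main :: IO ()
main = do
  print (parseWithFallback "42")    -- Right 42
  print (parseWithFallback "oops")  -- Left "expected an integer, got: \"oops\""
```

One caveat the thread doesn't spell out: the two parsers must accept exactly the same language, or the fallback pass can succeed where the fast pass failed (or report an error for input the fast pass would have accepted).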
Here are the profiling results for all of nixpkgs. It appears that much of the slowness is due to the enormous turnover in temporary allocations:
Here's what these numbers and errors need to be compared with, by running:
which gives:
With profiling disabled and
@ekmett suggests the attoparsec-then-trifecta approach, since we've already abstracted the parser using the parsers library.
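The abstraction that makes such a swap cheap looks roughly like this. The real code uses the parsers package's classes (CharParsing and friends); below is a base-only sketch with a homegrown class, so the shape is visible without the dependency. All names are illustrative.

```haskell
module Main (main) where

import Control.Applicative (Alternative (..))
import Data.Char (isDigit)

-- Homegrown stand-in for the parsers package's CharParsing class.
-- hnix writes its grammar against such an interface, so swapping
-- Parsec/Trifecta/attoparsec means swapping instances, not grammars.
class Alternative p => CharP p where
  satisfy :: (Char -> Bool) -> p Char

-- The grammar is written once, polymorphically over the backend.
number :: CharP p => p Int
number = read <$> some (satisfy isDigit)

-- One concrete backend; a Trifecta- or attoparsec-shaped backend
-- would simply be another instance of CharP running the same grammar.
newtype P a = P { runP :: String -> Maybe (a, String) }

instance Functor P where
  fmap f (P p) = P (fmap (\(a, r) -> (f a, r)) . p)

instance Applicative P where
  pure a = P (\s -> Just (a, s))
  P pf <*> P pa = P $ \s -> do
    (f, s1) <- pf s
    (a, s2) <- pa s1
    pure (f a, s2)

instance Alternative P where
  empty = P (const Nothing)
  P p <|> P q = P (\s -> maybe (q s) Just (p s))

instance CharP P where
  satisfy ok = P $ \s -> case s of
    (c:cs) | ok c -> Just (c, cs)
    _             -> Nothing

main :: IO ()
main = print (runP number "123abc")  -- Just (123,"abc")
```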
I'm actually just going to switch to megaparsec.
While you're at it, you may want to keep the error messages in mind. I just discovered that
This work has started on a separate branch.
@mightybyte I think the main issue with the error message is that the inner string needs to get
I feel like @taktoa did some megaparsec thing that might be useful, but maybe that was just for Ninja.
I've never parsed Nix with it.
@mightybyte Check out how the NixLanguageTests print their error messages. I'll move that to Main.hs if you don't get to it before me.
@jwiegley Let me know if there's stuff that can be done. I've worked with a number of different Parsec-inspired libraries in Haskell, so I feel like I'd be comfortable helping out some with the migration.
We're now switched to megaparsec.
I find it interesting that Nix spends so little time doing any computing; the 2.5 minutes is probably being spent just dumping the pretty-print results. In terms of pure parsing only, I see the following:
And here are the profiling results:
@cbarrett Would you like to do some performance work in hnix?
@jwiegley I'll take a look this week and let you know whether or not I find anything.
@nh2 Nice idea, but it didn't make a difference to the overall speed. The profiling results show that it's due to allocation in megaparsec, rather than decoding in readFile.
@nh2 And strangely enough, using Text.readFile it takes 45s to parse all of nixpkgs, but following Michael's recommended approach takes 85s. |
@jwiegley I can't reproduce it with the following code:

    module Main (main) where

    import Control.Monad (void)
    import qualified Data.ByteString as BS
    import qualified Data.Text as T
    import qualified Data.Text.Encoding as T
    import qualified Data.Text.IO as T
    import System.Environment (getArgs)

    main :: IO ()
    main = do
      args <- getArgs
      case args of
        ["bs", path]   -> print =<< (fmap T.length <$> T.decodeUtf8' <$> BS.readFile path)
        ["text", path] -> print =<< (T.length <$> T.readFile path)
        _ -> error "takes 2 arguments (first is 'bs' or 'text', second is the file)"

and a 600 MB example text file:
So for me
@jwiegley A few more points:
The parser is currently quite slow compared to the C parser:
Compare that to hnix:
I don't know how to improve performance though.
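One step the comparisons above keep skipping is isolating the parse itself, since both the Nix timing and the hnix timing mix in IO and printing. A base-only sketch of such a harness, where `parseLength` is a hypothetical stand-in for the real parser:

```haskell
module Main (main) where

import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)
import Text.Printf (printf)

-- Stand-in for the real parser; only the timing shape matters here.
parseLength :: String -> Int
parseLength = length . words

-- Time just the parse: force the result between two CPU-time samples,
-- so file reading and pretty-printing are excluded from the figure.
timeParse :: String -> IO Double
timeParse input = do
  t0 <- getCPUTime
  _  <- evaluate (parseLength input)
  t1 <- getCPUTime
  return (fromIntegral (t1 - t0) / 1e12)  -- picoseconds -> seconds

main :: IO ()
main = do
  secs <- timeParse (unwords (replicate 100000 "x"))
  printf "parse took %.4f s\n" secs
```

Note that `evaluate` only forces to weak head normal form; that is enough for an `Int`, but timing a parser that returns a lazy AST would also need a deep force (e.g. via `deepseq`).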