Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch some lists to Set from containers #3761

Closed
wants to merge 16 commits into from
Closed

Conversation

NoahCardoso
Copy link
Collaborator

Nub is slow and suggestion 209 pops up a lot. When working with lists that have Ord we can use toList and fromList from Data.Set. Nub has quadratic complexity because it checks each element of the list against every other element of the list. If you convert your list to a Set, you can avoid this quadratic complexity because Set does not allow duplicate elements.
image

@@ -434,7 +435,7 @@ mkTraceabilitySec (TraceabilityProg progs) si@SI{_sys = sys} = TG.traceMGF trace

-- | Helper to get headers of rows and columns
header :: ([UID] -> [UID]) -> SystemInformation -> [Sentence]
header f = TM.traceMHeader (f . nub . Map.keys . (^. refbyTable))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems odd: I don't think Map can have duplicate keys, so it seems to be that this nub is entirely redundant!

@@ -468,7 +469,7 @@ callMapTransClosure = over callMap tClosure
traceCalls :: Map QualifiedName [QualifiedName] -> [QualifiedName] ->
[QualifiedName]
traceCalls _ [] = []
traceCalls cm (c:cs) = nub $ c : traceCalls cm (nub $ cs ++
traceCalls cm (c:cs) = toList . fromList $ c : traceCalls cm (toList . fromList $ cs ++
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have had two calls to nub - may as well do this only once.

@@ -90,7 +91,7 @@ eNamesRI' (UpFrom il) = eNames' (snd il)

-- | Get dependencies from an equation.
eDep :: CodeExpr -> [UID]
eDep = nub . eNames
eDep = toList . fromList . eNames
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should have our own 'utility' function that does exactly this, instead of inlining toList . fromList everywhere (thus making the code obscure).

Copy link
Collaborator

@balacij balacij May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fairly certain that toList . fromList would also sort the list, since Haskell's internal Set is really an ordered binary tree (like we discussed yesterday, @NoahCardoso). That is to say, I'm a bit skeptical that eDep uses care for the order of the UIDs if nothing changed in stable/ (which is a good thing). Have you tried switching the type here from a list to a set instead? Our eDep-like (UID-dependency searching) tools have a lot of nub usage, so switching to a set type might make sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@balacij should I change every list of UIDs to Set of UIDs?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there will be many, but I'm not sure if all should be changed. For example, if we have a list of UIDs in a chunk, we will need to check if order matters/makes sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should chunks should be kept as a list but everything else should be set when possible?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's likely a lot more subtle than that: the question is when does order matter, and when it doesn't. And sometimes order matters only for display purposes.

We do over-use lists, but that's quite typical of most Haskell projects which make lists too easy to use!

@@ -76,4 +76,4 @@ eNamesRI' (UpFrom il) = eNames' (snd il)

-- | Get dependencies from an equation.
eDep :: Expr -> [UID]
eDep = nub . eNames
eDep = toList . fromList . eNames
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NoahCardoso If you're struggling with figuring out which things should be Sets and which should be lists, I think there are two ways you can go about this:

  1. Research first.
  2. Make a hypothesis and test it.

In other words,

  1. Traverse over all things that use eDep, and then all things that use those things, and figure out if order is somehow important and why it is (or why it shouldn't be...). Thankfully, there isn't too much usage of eDep, this might be easier and better, logically.
  2. Semi-confidently assume that this function should return a Set rather than a list. Our hypothesis of sorts -- the comment just says "Get dependencies from an equation," but it doesn't mention any sort of ordering of the dependencies, it just lists what a specific expression depends on (i.e., variables and functions). You can try this, and see what breaks.

Similar to eDep, I would imagine this extends to sdep, shortdep, eNames, but you can figure it out as you go along.

I took a quick peek at `eDep` usage. It doesn't like there's too much to traverse
λ ~/Programming/Drasil/ glassbrIntro rg "[^a-z]eDep" -ths
code/drasil-code/lib/Language/Drasil/Chunk/CodeBase.hs
20:codevars e m = map (varResolve m) $ eDep e
24:codevars' e m = map (varResolve m) $ nub $ eDep' e

code/drasil-lang/lib/Language/Drasil/Expr/Development.hs
9:  , eDep, eNames, eNames', eNamesRI
15:import Language.Drasil.Expr.Extract (eDep, eNames, eNames', eNamesRI)

code/drasil-lang/lib/Language/Drasil/CodeExpr/Extract.hs
2:    eDep, eDep',

code/drasil-lang/lib/Language/Drasil/CodeExpr/Development.hs
11:    eDep, eDep',
25:import Language.Drasil.CodeExpr.Extract (eDep, eDep', eNamesRI, eNamesRI')

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eDep is only used in codevars. I have eDep as a Set currently. This is roughly the order of the other functions that relate to eDep. eDep -> codevars -> fstvars -> fstdecl -> genFunc -> genModDef, genModFuncs ... -> ... generates things. Once things start to get generated in gool I assume that order matters. Codervars, fstvars and fstdecl might be able to be Sets however codevars is a list of CodeChunks which are not ord

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also meDep is used for printing. Should it be? I thought it was only for getting dependencies and its order shouldn't matter.

-- | Get dependencies from an equation.
meDep :: ModelExpr -> [UID]
meDep = nub . meNames

@balacij balacij marked this pull request as draft June 6, 2024 17:24
@balacij
Copy link
Collaborator

balacij commented Jun 6, 2024

Since this is a WIP branch now, I changed the status to a Draft PR.

@NoahCardoso NoahCardoso changed the base branch from master to main June 14, 2024 19:49
@@ -468,7 +469,7 @@ callMapTransClosure = over callMap tClosure
traceCalls :: Map QualifiedName [QualifiedName] -> [QualifiedName] ->
[QualifiedName]
traceCalls _ [] = []
traceCalls cm (c:cs) = nub $ c : traceCalls cm (nub $ cs ++
traceCalls cm (c:cs) = toList . fromList $ c : traceCalls cm ( cs ++
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if you remove the existing toList . fromLists, this PR will be in a mergeable state, no? I think we should do that, since you're now going through the nubd lists noted by stan and figuring out which ones really need a nub and which ones really want to be a different type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this example, you can but for nubSort toList . fromList is used to remove all duplicates from the list
nubSort :: Ord a => [a] -> [a]
nubSort = toList . fromList . sort

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we can just revert that one to nub as it was before this PR. This way, we minimize the changes in the PR and put it in an immediately merge-able state with just the few changes from lists to Sets.

After this PR, we can figure out which nubs involve a list that should be a Set or not, and go from there. However, since toList . fromList sorts lists, I think that is also a clue that we can also change this area to using a Set (since a Set in Haskell is ordered already).

@NoahCardoso
Copy link
Collaborator Author

NoahCardoso commented Jul 2, 2024

The order of eNames does not matter however the order of eNames' (and meDep in drasil-lang/lib/Language/Drasil/ModelExpr/Extract.hs) does in drasil-lang/lib/Language/Drasil/CodeExpr/Extract.hs. I believe the only difference between the two is that the order of eNames' matters for printing. Is this important? I think that unless there is a specific reason that the order is important we should try to replace eNames' with eNames.

@@ -192,4 +192,4 @@ getConstraints cm cs = concat $ mapMaybe (\c -> Map.lookup (c ^. uid) cm) cs
-- | Get a list of 'CodeChunk's from a constraint.
constraintvars :: ConstraintCE -> ChunkDB -> [CodeChunk]
constraintvars (Range _ ri) m =
map (codeChunk . varResolve m) $ nub $ eNamesRI ri
map (codeChunk . varResolve m) $ nub $ toList $ eNamesRI ri
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since eNamesRI now returns a Set, isn't the nub redundant?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it has been removed

@balacij
Copy link
Collaborator

balacij commented Aug 21, 2024

Sorry, why was this closed? Should we keep the branch or delete it? Was there anything we could salvage @NoahCardoso ?

@NoahCardoso
Copy link
Collaborator Author

Yes anything of use was adding in #3930

@JacquesCarette JacquesCarette deleted the fixUsageOfNub branch August 23, 2024 03:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants