-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch some lists to Set from containers #3761
Conversation
@@ -434,7 +435,7 @@ mkTraceabilitySec (TraceabilityProg progs) si@SI{_sys = sys} = TG.traceMGF trace | |||
|
|||
-- | Helper to get headers of rows and columns | |||
header :: ([UID] -> [UID]) -> SystemInformation -> [Sentence] | |||
header f = TM.traceMHeader (f . nub . Map.keys . (^. refbyTable)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems odd: I don't think Map
can have duplicate keys, so it seems to be that this nub
is entirely redundant!
@@ -468,7 +469,7 @@ callMapTransClosure = over callMap tClosure | |||
traceCalls :: Map QualifiedName [QualifiedName] -> [QualifiedName] -> | |||
[QualifiedName] | |||
traceCalls _ [] = [] | |||
traceCalls cm (c:cs) = nub $ c : traceCalls cm (nub $ cs ++ | |||
traceCalls cm (c:cs) = toList . fromList $ c : traceCalls cm (toList . fromList $ cs ++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't have had two calls to nub
- may as well do this only once.
@@ -90,7 +91,7 @@ eNamesRI' (UpFrom il) = eNames' (snd il) | |||
|
|||
-- | Get dependencies from an equation. | |||
eDep :: CodeExpr -> [UID] | |||
eDep = nub . eNames | |||
eDep = toList . fromList . eNames |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should have our own 'utility' function that does exactly this, instead of inlining toList . fromList
everywhere (thus making the code obscure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fairly certain that toList . fromList
would also sort the list, since Haskell's internal Set
is really an ordered binary tree (like we discussed yesterday, @NoahCardoso). That is to say, I'm a bit skeptical that eDep
uses care for the order of the UID
s if nothing changed in stable/
(which is a good thing). Have you tried switching the type here from a list to a set instead? Our eDep
-like (UID-dependency searching) tools have a lot of nub
usage, so switching to a set type might make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@balacij should I change every list of UIDs to Set of UIDs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there will be many, but I'm not sure if all should be changed. For example, if we have a list of UIDs in a chunk, we will need to check if order matters/makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should chunks should be kept as a list but everything else should be set when possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's likely a lot more subtle than that: the question is when does order matter, and when it doesn't. And sometimes order matters only for display purposes.
We do over-use lists, but that's quite typical of most Haskell projects which make lists too easy to use!
@@ -76,4 +76,4 @@ eNamesRI' (UpFrom il) = eNames' (snd il) | |||
|
|||
-- | Get dependencies from an equation. | |||
eDep :: Expr -> [UID] | |||
eDep = nub . eNames | |||
eDep = toList . fromList . eNames |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NoahCardoso If you're struggling with figuring out which things should be Sets and which should be lists, I think there are two ways you can go about this:
- Research first.
- Make a hypothesis and test it.
In other words,
- Traverse over all things that use
eDep
, and then all things that use those things, and figure out if order is somehow important and why it is (or why it shouldn't be...). Thankfully, there isn't too much usage ofeDep
, this might be easier and better, logically. - Semi-confidently assume that this function should return a Set rather than a list. Our hypothesis of sorts -- the comment just says "Get dependencies from an equation," but it doesn't mention any sort of ordering of the dependencies, it just lists what a specific expression depends on (i.e., variables and functions). You can try this, and see what breaks.
Similar to eDep
, I would imagine this extends to sdep
, shortdep
, eNames
, but you can figure it out as you go along.
I took a quick peek at `eDep` usage. It doesn't like there's too much to traverse
λ ~/Programming/Drasil/ glassbrIntro rg "[^a-z]eDep" -ths
code/drasil-code/lib/Language/Drasil/Chunk/CodeBase.hs
20:codevars e m = map (varResolve m) $ eDep e
24:codevars' e m = map (varResolve m) $ nub $ eDep' e
code/drasil-lang/lib/Language/Drasil/Expr/Development.hs
9: , eDep, eNames, eNames', eNamesRI
15:import Language.Drasil.Expr.Extract (eDep, eNames, eNames', eNamesRI)
code/drasil-lang/lib/Language/Drasil/CodeExpr/Extract.hs
2: eDep, eDep',
code/drasil-lang/lib/Language/Drasil/CodeExpr/Development.hs
11: eDep, eDep',
25:import Language.Drasil.CodeExpr.Extract (eDep, eDep', eNamesRI, eNamesRI')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eDep is only used in codevars. I have eDep as a Set currently. This is roughly the order of the other functions that relate to eDep. eDep -> codevars -> fstvars -> fstdecl -> genFunc -> genModDef, genModFuncs ... -> ... generates things. Once things start to get generated in gool I assume that order matters. Codervars, fstvars and fstdecl might be able to be Sets however codevars is a list of CodeChunks which are not ord
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also meDep
is used for printing. Should it be? I thought it was only for getting dependencies and its order shouldn't matter.
Drasil/code/drasil-lang/lib/Language/Drasil/ModelExpr/Extract.hs
Lines 87 to 89 in 864e138
-- | Get dependencies from an equation. | |
meDep :: ModelExpr -> [UID] | |
meDep = nub . meNames |
Since this is a WIP branch now, I changed the status to a Draft PR. |
@@ -468,7 +469,7 @@ callMapTransClosure = over callMap tClosure | |||
traceCalls :: Map QualifiedName [QualifiedName] -> [QualifiedName] -> | |||
[QualifiedName] | |||
traceCalls _ [] = [] | |||
traceCalls cm (c:cs) = nub $ c : traceCalls cm (nub $ cs ++ | |||
traceCalls cm (c:cs) = toList . fromList $ c : traceCalls cm ( cs ++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if you remove the existing toList . fromList
s, this PR will be in a mergeable state, no? I think we should do that, since you're now going through the nub
d lists noted by stan
and figuring out which ones really need a nub
and which ones really want to be a different type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this example, you can but for nubSort toList . fromList
is used to remove all duplicates from the list
nubSort :: Ord a => [a] -> [a]
nubSort = toList . fromList . sort
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, we can just revert that one to nub
as it was before this PR. This way, we minimize the changes in the PR and put it in an immediately merge-able state with just the few changes from lists to Set
s.
After this PR, we can figure out which nub
s involve a list that should be a Set
or not, and go from there. However, since toList . fromList
sorts lists, I think that is also a clue that we can also change this area to using a Set
(since a Set
in Haskell is ordered already).
The order of |
@@ -192,4 +192,4 @@ getConstraints cm cs = concat $ mapMaybe (\c -> Map.lookup (c ^. uid) cm) cs | |||
-- | Get a list of 'CodeChunk's from a constraint. | |||
constraintvars :: ConstraintCE -> ChunkDB -> [CodeChunk] | |||
constraintvars (Range _ ri) m = | |||
map (codeChunk . varResolve m) $ nub $ eNamesRI ri | |||
map (codeChunk . varResolve m) $ nub $ toList $ eNamesRI ri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since eNamesRI
now returns a Set
, isn't the nub
redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it has been removed
440a2dd
to
d7d093b
Compare
d7d093b
to
6353236
Compare
6353236
to
2496a8e
Compare
Sorry, why was this closed? Should we keep the branch or delete it? Was there anything we could salvage @NoahCardoso ? |
Yes anything of use was adding in #3930 |
Nub is slow and suggestion 209 pops up a lot. When working with lists that have
Ord
we can use toList and fromList from Data.Set. Nub has quadratic complexity because it checks each element of the list against every other element of the list. If you convert your list to a Set, you can avoid this quadratic complexity because Set does not allow duplicate elements.