notes.org

Contents

Code

These come in handy while coding.

Profiling

Why aren’t macros like these in some default package? Sure beats having to type (mapcar (lambda (it) (...it...)) list) over and over.

(defmacro it (&rest body)
  `(lambda (it)
     ,@body))
(defmacro mapit (seq &rest body)
  `(mapcar (lambda (it)
             ,@body)
           ,seq))
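
A minimal usage sketch of mapit, with the macro repeated so the snippet runs on its own. For what it's worth, dash.el ships a similar anaphoric macro, --map, for code that already depends on dash.

```elisp
;; Same definition as above, repeated so this snippet is self-contained.
(defmacro mapit (seq &rest body)
  `(mapcar (lambda (it)
             ,@body)
           ,seq))

(mapit '(1 2 3) (* it 2))          ; → (2 4 6)
(mapit '("a" "b") (concat it "!")) ; → ("a!" "b!")
```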

This makes it easy to profile code:

(defmacro profile-rifle (times &rest body)
  `(let (output)
     (dolist (p '("helm-" "org-" "string-" "s-" "buffer-" "append" "delq" "map" "list" "car" "save-" "outline-" "delete-dups" "sort" "line-" "nth" "concat" "char-to-string" "rx-" "goto-" "when" "search-" "re-"))
       (elp-instrument-package p))
     (dotimes (x ,times)
       ,@body)
     (elp-results)
     (elp-restore-all)
     (goto-char (point-min))
     (forward-line 20)
     (delete-region (point) (point-max))
     (setq output (buffer-substring-no-properties (point-min) (point-max)))
     (kill-buffer)
     (delete-window)
     output))

Context-splitting

Prototype code, keeping for future reference.

(let* ((num-context-words 2)
       (needle "needle")
       (haystack "one two three needle four five six")
       (hay (s-split needle haystack))
       (left-hay (s-split-words (car hay)))
       (right-hay (s-split-words (nth 1 hay))))
  (concat "..."
          (s-join " " (cl-subseq left-hay (- num-context-words)))
          " " needle " "
          (s-join " " (cl-subseq right-hay 0 num-context-words))
          "..."))

;; Multiple needles
(let* ( (needles '("needle" "pin"))
        (haystack "one two three \" needle not pin four five six seven eight pin nine ten eleven twelve"))
  (cl-loop for needle in needles
           append (cl-loop for re = (rx-to-string `(and (repeat 1 ,helm-org-rifle-context-words (and (1+ (not space))
                                                                                                     (or (1+ space)
                                                                                                         word-boundary)))
                                                        (group (eval needle))
                                                        (repeat 1 ,helm-org-rifle-context-words (and (or word-boundary
                                                                                                         (1+ space))
                                                                                                     (1+ (not space))))))
                           for m = (string-match re haystack end)
                           for end = (match-end 1)
                           while m
                           collect (concat "..." (match-string-no-properties 0 haystack) "..."))))

Slow code that splits on word boundaries

This code splits on word boundaries, but it’s very slow. Profiling it showed the vast majority of the time was in string-match. I’m guessing the regexp is too complicated or unoptimized.

;; Reduce matching lines to matched word with context
(setq matched-words-with-context
      (cl-loop for line in (map 'list 'car matching-lines-in-node)
               append (cl-loop for token in input
                               for re = (rx-to-string
                                         `(and (repeat 0 ,helm-org-rifle-context-words
                                                       (and (1+ (not space))
                                                            (or (1+ space)
                                                                word-boundary)))
                                               (group (eval token))
                                               (repeat 0 ,helm-org-rifle-context-words
                                                       (and (or word-boundary
                                                                (1+ space))
                                                            (1+ (not space))))))

                               ;;  This one line uses about 95% of the runtime of this function
                               for m = (string-match re line end)

                               for end = (match-end 1)
                               when m
                               collect (match-string-no-properties 0 line))))

Faster version that cuts off mid-word

This version is much, much faster, but instead of matching on word boundaries, it just matches so-many characters before and after the token. It’s not quite as nice, but the speedup is worth it, and it seems good enough.

This is the version currently in-use.

(setq matched-words-with-context
                    (cl-loop for line in (map 'list 'car matching-lines-in-node)
                             append (cl-loop for token in input
                                             for re = (rx-to-string '(and (repeat 0 25 not-newline)
                                                                          (eval token)
                                                                          (repeat 0 25 not-newline)))
                                             for m = (string-match re line end)

                                             for end = (match-end 1)
                                             when m
                                             collect (match-string-no-properties 0 line))))
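
A self-contained sketch of the same character-based idea, assuming a hypothetical helper named my/context-matches (not part of the package). It collects every match of one token in one line, with up to CONTEXT-CHARS characters on either side, and the example shows the mid-word cutoff.

```elisp
(defun my/context-matches (token line context-chars)
  "Return each match of TOKEN in LINE with up to CONTEXT-CHARS chars of context.
Hypothetical helper for illustration only."
  (let ((re (rx-to-string `(and (repeat 0 ,context-chars not-newline)
                                ,token   ; a string here is matched literally
                                (repeat 0 ,context-chars not-newline))))
        (start 0)
        matches)
    ;; Keep searching from the end of the previous match until nothing is left.
    (while (and (< start (length line))
                (string-match re line start))
      (push (match-string-no-properties 0 line) matches)
      (setq start (max (match-end 0) (1+ start))))
    (nreverse matches)))

(my/context-matches "needle" "one two three needle four five six" 3)
;; → ("ee needle fo")
```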

Fix it

  • State “DONE” from “TODO” [2016-04-01 Fri 22:55]
    Okay, it works now. Here’s hoping I don’t break it again.
  • State “TODO” from “TODO” [2016-04-01 Fri 19:03]

[2016-04-01 Fri 19:03] Somehow I broke it. Now to fix it…

I don’t understand why this loop isn’t working like I want it to:

(cl-loop with end
         for line in (mapcar 'car matching-lines-in-node)
         for token in input
         for re = (rx-to-string `(and (repeat 0 ,helm-org-rifle-context-characters not-newline)
                                      (eval token)
                                      (repeat 0 ,helm-org-rifle-context-characters not-newline)))
         for match = (string-match re line end)
         for end = (match-end 0)
         when match
         collect (match-string-no-properties 0 line))

From what I can tell from the manual, it should do what I want. Let’s try this:

(cl-loop for line in '("1" "2" "3")
         for word in '("a" "b" "c")
         collect (list (format "Line:%s Word:%s" line word)))
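
A smaller sketch of what happens there: several for clauses in one cl-loop step in parallel, and the loop stops as soon as the shortest list runs out.

```elisp
(require 'cl-lib)

;; Parallel stepping: pairs are formed position by position.
(cl-loop for line in '("1" "2" "3")
         for word in '("a" "b" "c")
         collect (concat line word))
;; → ("1a" "2b" "3c")

;; The shorter list ends the loop.
(cl-loop for line in '("1" "2" "3")
         for word in '("a")
         collect (concat line word))
;; → ("1a")
```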

Well that does not behave like Python list-comps. So let’s try nested:

(cl-loop for line in '("1" "2" "3")
         collect (cl-loop for word in '("a" "b" "c")
                          collect (format "Line:%s Word:%s" line word)))

There. So this loop should work:

  (cl-loop with end
           for line in (mapcar 'car matching-lines-in-node)
           for end = nil
           collect (cl-loop for token in input
                            for re = (rx-to-string `(and (repeat 0 ,helm-org-rifle-context-characters not-newline)
                                                         (eval token)
                                                         (repeat 0 ,helm-org-rifle-context-characters not-newline)))
                            for match = (string-match re line end)
                            for end = (match-end 0)
                            when match
                            collect (match-string-no-properties 0 line)))
(helm-org-rifle-get-candidates-in-buffer (get-file-buffer "~/org/inbox.org") "emacs :org:")

Hm…not quite. Well, this is the code from just before the commit that broke it:

(setq matched-words-with-context
      (cl-loop for line in (map 'list 'car matching-lines-in-node)
               append (cl-loop with end
                               for token in input
                               for re = (rx-to-string `(and (repeat 0 ,helm-org-rifle-context-characters not-newline)
                                                            (eval token)
                                                            (repeat 0 ,helm-org-rifle-context-characters not-newline)))
                               for match = (string-match re line end)
                               if match
                               do (setq end (match-end 0))
                               and collect (match-string-no-properties 0 line))))

Profile with fix

(profile-rifle 10 (helm-org-rifle-get-candidates-in-buffer (find-file-noselect "~/org/inbox.org") "emacs helm !mail"))

Hm, that seems nearly twice as slow as before, compared to this. Let’s try without negation:

(profile-rifle 10 (helm-org-rifle-get-candidates-in-buffer (find-file-noselect "~/org/inbox.org") "emacs helm"))

Okay, that’s bad. But something is obviously wrong, because it’s calling rx-form and search-forward-regexp way too many times. Let’s see…

The problem is that the positive-re is matching anywhere, not just at word boundaries, so it’s matching way too many nodes. Well, that is a problem; I don’t know if it explains the entire slowdown.
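
A minimal sketch of the difference, using only built-ins. The helper name my/word-re is hypothetical, and the real prep function also has to deal with tags, which this ignores.

```elisp
(defun my/word-re (token)
  "Return a regexp matching TOKEN only as a whole word.
Hypothetical helper for illustration only."
  (concat "\\b" (regexp-quote token) "\\b"))

(list
 ;; Plain substring matching finds "helm" inside another word.
 (and (string-match-p (regexp-quote "helm") "overwhelming") t)
 ;; With word boundaries it does not.
 (and (string-match-p (my/word-re "helm") "overwhelming") t)
 ;; A standalone word still matches.
 (and (string-match-p (my/word-re "helm") "I use helm daily") t))
;; → (t nil t)
```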

For example, this matches “overwhelming” because of the “helm” in the middle:

"\\(\\(?:[ 	]+\\(:[[:alnum:]_@#%%:]+:\\)\\)?\\| \\)emacs\\(\\(?:[ 	]+\\(:[[:alnum:]_@#%%:]+:\\)\\)?\\| \\|$\\)\\|\\(\\(?:[ 	]+\\(:[[:alnum:]_@#%%:]+:\\)\\)?\\| \\)helm\\(\\(?:[ 	]+\\(:[[:alnum:]_@#%%:]+:\\)\\)?\\| \\|$\\)"

Okay, the problem now is that I changed helm-org-rifle-tags-re to fix tag matching, but that same regexp is used in helm-org-rifle-prep-token, and now that function is matching any token as a tag and giving the wrong result.

I do not understand why it’s doing that, because that regexp is only supposed to match tags.

Okay, the other regexp that I kept commented out appears to match actual tags, as in it’s useful for testing whether a string is a tag:

(org-re ":\\([[:alnum:]_@#%:]+\\):[ \t]*$")

While this one appears to match tags in a document, potentially in a list of tags:

(org-re "\\(?:[ \t]+\\(:[[:alnum:]_@#%%:]+:\\)\\)?")

Okay, I fixed it, I had an if match instead of a while match in the matched-words-with-context loop.

Now to profile and compare with the pre-fix-context version:

Pre-context-fixed version: master @ 5c30f38

(profile-rifle 50 (helm-org-rifle-get-candidates-in-buffer (find-file-noselect "~/org/inbox.org") "emacs helm"))

Context-fixed version: 2b5b12a

[2016-04-02 Sat 00:14] Well, that’s definitely worse, although it’s still probably fast enough, because the elp instrumentation makes it a lot slower.

I’m also noticing that when I eval the buffer of the old version, and then the new one, and back and forth, it’s giving different results than when I start a new Emacs session before eval’ing each buffer. The context-fixed version is still slower, but it’s annoying that they are somehow interfering with each other…

Oh, I know what it probably is: defvar not changing already-defined vars. Gah, I wish there were a “developer mode” that would automatically treat defvar as setq! That might also be causing different results to be returned.
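
A small sketch of that behavior with a throwaway variable (my/demo-context-chars is hypothetical). When forms are loaded or evaluated with eval-buffer, a second defvar leaves an existing value alone. Unbinding the variable first is one workaround, and evaluating a single defvar with C-M-x (eval-defun) also resets it.

```elisp
(defvar my/demo-context-chars 25)
(defvar my/demo-context-chars 50)   ; ignored: the variable already has a value
my/demo-context-chars               ; → 25

;; One way to force re-initialization before re-evaluating a file:
(makunbound 'my/demo-context-chars)
(defvar my/demo-context-chars 50)
my/demo-context-chars               ; → 50
```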

And on that note, notice that the old version is running org-heading-components 9350 times and the new one 9750 times (divided by 50 runs, of course). That means the newer one is returning more results. That’s probably a good thing–better than returning fewer results–but it’s still an annoying discrepancy.

Well, anyway, it seems that the new version is working properly, even if it is a bit slower. I can probably optimize it some from here by profiling it some more. And it’s probably still fast enough anyway. I’m going to commit these test results and go from there.

[2016-04-02 Sat 00:24] I just noticed that the new version has search-forward-regexp while the old shows re-search-forward. I guess I accidentally used one instead of the other. And I didn’t have re- in the profile-rifle macro, so it wasn’t being instrumented. But I can’t even find out what the difference between those two functions is. Their docstrings are identical, but re-search-forward says it’s “an interactive built-in function in `C source code’” and search-forward-regexp says it’s an “interactive built-in function”. If one were an alias for the other, wouldn’t it say so, like other functions do? And I just googled it, and I can’t even find any discussions disambiguating them.
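
As far as I can tell, subr.el defines search-forward-regexp with defalias on the function object of re-search-forward rather than on the symbol, which would explain why the help buffer doesn't call it an alias. A quick check, assuming neither function is advised or instrumented at the moment:

```elisp
;; Both names should resolve to the same primitive.
(eq (indirect-function 'search-forward-regexp)
    (indirect-function 're-search-forward))
;; → t (when neither function is advised or instrumented)
```

If that holds, instrumenting re-search-forward presumably only replaces the function cell of that symbol, so calls made through search-forward-regexp would go straight to the primitive and never show up in the elp results.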

Well, I guess I will change all the search-forward-regexp to re-search-forward and profile it again, now with re- instrumented…

Well, that made it a bit slower… and re-search-forward is running 1915 times per run, which seems like a lot. Well, just for fun, let’s see if search-forward-regexp is any different…

Well, seems about the same. Some other functions are calling re-search-forward. I guess I’ll stick to re-search-forward for consistency.

Let’s see if I can optimize this regexp, because it’s the one used for finding the next matching node:

(positive-re (mapconcat 'helm-org-rifle-prep-token input "\\|"))

Wait…I think I can’t do that, because each token has to be handled separately in case it’s a tag. At least, that’s the way I found that works.

I just realized something: because re- wasn’t instrumented when I profiled the pre-context-fix code, that probably made the test runs a lot faster. I should rerun that test now that I’ve instrumented re-:

Uh…that’s a lot slower…even slower than the context-fixed version. And it’s running re-search-forward about 1/3rd fewer times, yet it’s still slower. That means the context-fixed version is faster…yet it doesn’t feel faster… This is getting really confusing.

…Or not! I ran it again, and this time it was back to 0.38 seconds per run, instead of the 0.88 that it showed. So the old version is faster. Argh, I even restarted Emacs between runs, but the results are still not always consistent.
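
One way to smooth out run-to-run noise, sketched with the built-in benchmark-run macro: time several trials without elp instrumentation and keep the fastest. The helper name my/best-time is hypothetical, and this measures wall time only, so it complements the elp breakdown rather than replacing it.

```elisp
(require 'benchmark)

(defun my/best-time (trials reps fn)
  "Call FN REPS times in each of TRIALS trials.
Return the fastest per-call time in seconds.  Hypothetical helper."
  (let (best)
    (dotimes (_ trials best)
      ;; Collect garbage first so earlier trials are less likely to skew this one.
      (garbage-collect)
      (let ((per-call (/ (car (benchmark-run reps (funcall fn)))
                         (float reps))))
        (when (or (null best) (< per-call best))
          (setq best per-call))))))

;; Example with a cheap stand-in workload:
(my/best-time 3 100 (lambda () (make-string 1000 ?x)))
```

The lambda would wrap the real call, e.g. the helm-org-rifle-get-candidates-in-buffer form used in the profile-rifle runs above.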

(Haha, if anyone reads this on GitHub, they’re going to be confused, because GitHub doesn’t display results blocks in their Org renderer.)

Back to testing the context-fixed version:

Maybe the problem is here:

(s-matches? re target)

In the pre-context-fix version, I’m using:

(s-contains? token target t)

I think I changed to the regexp version because the s-contains? version was doing substring matching, which I don’t want. Let’s switch it real quick just to see if that’s the problem:

Eh, it’s only about 20ms faster per run, although s-contains? is more than twice as fast as s-matches?. But it’s still such a short time that it doesn’t make much difference.

This is probably where the next-gen branch would be easier to optimize. Even if all the extra function calls took their toll, at least I could profile each one separately. With this, I see all those re-search-forward calls listed, but it’s hard to figure out why that’s making it slower than the pre-context-fix version.

Okay, I think I see what the problem is, or almost:

                   Function           Calls  Elapsed (s)    Average (s)
Pre-context-fix:   re-search-forward  61250   3.4628969270  5.653...e-05
Post-context-fix:  re-search-forward  78050  10.705968030   0.0001371680

The time per call to this function in the old version is much shorter, so the problem must be the regexp complexity. And that is a bit annoying, because I thought I was being careful to make it simpler, like by wrapping the whole regexp in the word-boundary matcher instead of each token in the or group.
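
Working the per-call numbers out from the elapsed times and call counts above:

```elisp
(let* ((pre  (/ 3.4628969270 61250))    ; ≈ 5.65e-05 seconds per call
       (post (/ 10.705968030 78050)))   ; ≈ 1.37e-04 seconds per call
  (/ post pre))                         ; ≈ 2.43
```

So each call takes roughly 2.4 times as long in the post-context-fix version, while the call count only went up by about 27%.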

It’s almost surely this one: (re-search-forward positive-re node-end t), because the other two are the negation one (which isn’t being called in this test), and the per-node matcher (re-search-forward positive-re nil t), which is only run once per partially-matching node, in the main loop, while the other one runs multiple times per partially-matching node. They both use the same regexp though. Maybe if I can optimize the regexp used in that one…

I’m not sure that I can, though, because IIRC I had to do it this way to avoid substring matching:

(positive-re (mapconcat 'helm-org-rifle-prep-token input "\\|"))

Maybe having each token wrapped with helm-org-rifle-prep-token is the problem, but I think if I change that, I’ll get substring matching, which I don’t want. Also there’s this: while before I thought I wasn’t getting substring matching, it might be that I actually was, but only for tokens after the first.

Sigh. I can see how having a testing framework for this would help a lot…

Well, I’m going to try a quick experiment: the faster version has this:

(setq matching-positions-in-node
      (or (cl-loop for token in all-tokens
                   do (goto-char node-beg)
                   while (re-search-forward (helm-org-rifle-prep-token token) node-end t)
                   when negations
                   when (cl-loop for negation in negations
                                 thereis (s-matches? negation
                                                     (buffer-substring-no-properties (line-beginning-position)
                                                                                     (line-end-position))))
                   return nil
                   collect (line-beginning-position) into result
                   do (end-of-line)
                   finally return (sort (delete-dups result) '<))
          ;; Negation found; skip node
          (throw 'negated (goto-char node-end))))

And the slower version has this:

(when (and negations
           (re-search-forward negations-re node-end t))
  (throw 'negated (goto-char node-end)))

(setq matching-positions-in-node
      (cl-loop initially (goto-char node-beg)
               while (re-search-forward positive-re node-end t)
               collect (line-beginning-position) into result
               do (end-of-line)
               finally return (sort (delete-dups result) '<)))

It’s hard for me to imagine how the first one is faster, even without negations, because it should be running more searches, about one for each token times the number of matching lines, rather than one for the number of matching lines. And helm-org-rifle-prep-token is being called…well it should be a lot of times, once per token per node, at least, so that should be much slower! But maybe the more complex regexp is that much slower, so that running more, simpler searches is faster. Let’s find out… one, ta-hoo-hoo, tha-ree…

(setq matching-positions-in-node
      (cl-loop for token in input
               do (goto-char node-beg)
               while (re-search-forward (helm-org-rifle-prep-token token) node-end t)
               collect (line-beginning-position) into result
               do (end-of-line)
               finally return (sort (delete-dups result) '<)))

Well, that’s basically the same. Even though helm-org-rifle-prep-token is being called 19,400 times now (whereas before it wasn’t even on the chart), the overall run is about the same speed. And re-search-forward is being called 110,600 times instead of 78,050 times, and that’s adding two seconds to the overall time, yet the overall time is only 1 second slower, and each run is only 0.02 seconds slower.

I really don’t know. It’s probably still acceptably fast, but I’m not happy that it’s 240 ms slower per run than it was before.

Wait…is it the context matching that’s slowing it down? That would seem to make sense, but I don’t see string-match or match-string-no-properties on the chart, which are called a lot in the context-getting part. Again, this is where the next-gen branch would be easier to profile, because that part would be in a separate function, which would show up on the benchmark.

Okay, so let’s try disabling the context-matching and see if that helps narrow it down.

Wow…nope. I set the context matches to a hardcoded string, and it actually took longer. That makes noooooo sense. I guess the context matching isn’t the problem.

Ok then, let’s see if avoiding substring matches is really the problem. Let’s change that back so that it does match substrings and see if it’s faster again:

Uh, before I do that… I see a discrepancy in the code:

(setq matching-positions-in-node
      (cl-loop initially (goto-char node-beg)
               while (re-search-forward positive-re node-end t)
               collect (line-beginning-position) into result
               do (end-of-line)
               finally return (sort (delete-dups result) '<)))

;; Get list of line-strings containing any token
;; (setq matching-lines-in-node
;;       (cl-loop for pos in matching-positions-in-node
;;                do (goto-char pos)
;;                ;; Get text of each matching line
;;                for string = (buffer-substring-no-properties (line-beginning-position)
;;                                                             (line-end-position))
;;                unless (org-at-heading-p) ; Leave headings out of list of matched lines
;;                ;; (DISPLAY . REAL) format for Helm
;;                collect `(,string . (,buffer ,pos))))
(setq matching-positions-in-node
      (cl-loop for token in input
               do (goto-char node-beg)
               while (re-search-forward (helm-org-rifle-prep-token token) node-end t)
               collect (line-beginning-position) into result
               do (end-of-line)
               finally return (sort (delete-dups result) '<)))

Somehow I put two of these loops in while commenting out the matching-lines-in-node part. So running that loop twice could explain the slowdown…but then how were any context lines being displayed at all? Wow…how did I manage to do that… Oh I think I see, when I was testing the other matching-positions-in-node loop, I commented out and replaced the wrong one. So…let’s fix that and profile again:

Okay, that is slightly faster, but this matches substrings, which I don’t want. So if I kept this, it would be a slight improvement over the current master in that it would fix the context matching while being a little bit slower.

I wonder if I could compromise and match substrings but only at the beginning of words (or after punctuation). That could be useful anyway, because it would avoid the “did I use a plural” problem. Let’s see if I can try it…

Wait, if I do that, it might mess up the tags matching that took so long to fix.

I wonder if I should separate out the tags matching. I already have it getting a list of tags in a separate string. If I removed tags-matching tokens from the input and matched them separately, maybe it would let me use a simpler regexp for everything else and avoid the prep function. I should probably make another branch to test that idea…sigh. And I don’t even know if that would improve performance. I’d have to first separate out the tags matching, then verify that it works properly, and then simplify the main positive-re regexp, and then see if it is faster.

I think I’m going to stop here. It seems to work properly right now: context-matching, tag-matching, avoids substring matches, and negation works. And it seems fast enough, even if it is slower than before. Maybe there is some combination of these changes that makes everything work at about the same speed as before, but I think trying to figure it out is too complicated with this big candidates-getting function. I think it would be better to settle on this code that works correctly, and then go back to the next-gen branch and try to improve that, which is structured in a simpler way.

[2016-04-02 Sat 02:21] I decided to test in the MELPA sandbox before merging with master and pushing, and it’s a good thing I did, because I discovered another weird bug: if the show-tags setting is off, the results are way off. Probably a simpleish logic error in the code somewhere…but I think at this point I should just remove that setting. As it is it’s off by default, and I wonder how many people have gotten bad results because of it and decided that this package is no good. I doubt anyone would want it off anyway, and it doesn’t seem to hurt performance. So let’s just remove that so it’s consistent…

Occur-like

Useful debugging code

This helps for debugging, in case I need it in the future:

(let ((inhibit-read-only t)
      (helm-org-rifle-show-full-entry t)
      (results-buffer (get-buffer-create helm-org-rifle-occur-results-buffer-name)))
  (with-current-buffer results-buffer
    (unless (eq major-mode 'org-mode)
      (read-only-mode)
      (visual-line-mode)
      (org-mode)
      (hi-lock-mode 1)
      (use-local-map helm-org-rifle-occur-keymap))
    (erase-buffer)
    (pop-to-buffer results-buffer))
  (helm-org-rifle-occur-process-input "today Dodie" (list (find-buffer-visiting "~/org/log.org")) results-buffer)
  (pop-to-buffer results-buffer))

Plans

[#A] Look into Helm updates

See emacs-helm/helm#1806 (comment)

SOMEDAY [#B] Look into using Sallet

  • State “SOMEDAY” from “TODO” [2018-04-16 Mon 13:30]

Not as full-featured as Helm, but might be an interesting alternative approach.

[#A] Use lexical-binding

[2017-10-30 Mon 18:37] As suggested by Matus Goljer:

I looked at your code a bit and it looks quite good. I would try to enable lexical binding, I’ve noticed that you depend on dynamic lookup somewhere: instead pass the data as arguments, it’s going to be much faster still (dynamic lookup can be awful slow)

I thought I had already done this, but apparently I forgot. Might be an easy, nearly free speed boost.
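
A sketch of what that could look like for the context regexp. The function name my/context-re is hypothetical. One thing to watch for: the (eval token) forms inside rx-to-string are evaluated with eval, which as far as I know cannot see lexically bound variables, so they would need to become backquote substitutions like the one below.

```elisp
;;; -*- lexical-binding: t; -*-
;; (The cookie only takes effect on the first line of a file.)

(defun my/context-re (token context-chars)
  "Return a regexp matching TOKEN with up to CONTEXT-CHARS chars either side.
Both values are passed as arguments instead of being looked up dynamically."
  (rx-to-string `(and (repeat 0 ,context-chars not-newline)
                      ,token
                      (repeat 0 ,context-chars not-newline))))

(string-match-p (my/context-re "emacs" 10) "I use emacs every day") ; non-nil
```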

To-do keyword negation

I forgot to add this when I rewrote the input handling.

MAYBE async/deferred/concurrent?

  • State “MAYBE” from [2017-09-11 Mon 10:27]

Matus mentioned that he’s experimenting with emacs-deferred (which also has concurrent.el in it) for his Sallet project, and that it’s working well so far. I wonder if I could use that to improve performance, maybe even use it with Helm (and/or Sallet eventually).

[2017-10-30 Mon 18:35] More discussion here.

Try inserting outline paths into separators/overlays in occur buffers

Using overlays should prevent Org itself from re-fontifying the paths.

[#C] Try org-map-entries

I don’t know why I didn’t realize this sooner, but org-map-entries could likely do much of the logic in helm-org-rifle--get-candidates-in-buffer. I don’t know for certain if it would be faster, but since it has optional caching, it might very well be. And it makes it easy to get inherited tags, properties, etc, and to run on regions, subtrees, etc. Might even completely handle tag matching for me. Very powerful. I should definitely try it, and if the performance is good enough, use it.

e.g. this code from swiper/counsel/ivy:

(org-map-entries
 (lambda ()
   (let* ((components (org-heading-components))
          (level (make-string
                  (if org-odd-levels-only
                      (nth 1 components)
                    (nth 0 components))
                  ?*))
          (todo      (nth 2 components))
          (priority  (nth 3 components))
          (text      (nth 4 components))
          (tags      (nth 5 components)))
     (list (mapconcat 'identity
                      (cl-remove-if 'null
                                    (list level todo
                                          (if priority (format "[#%c]" priority))
                                          text tags))
                      " ")
           (buffer-file-name)
           (point))))
 nil
 'agenda)
(org-map-entries FUNC &optional MATCH SCOPE &rest SKIP)

Call FUNC at each headline selected by MATCH in SCOPE.

FUNC is a function or a lisp form.  The function will be called without
arguments, with the cursor positioned at the beginning of the headline.
The return values of all calls to the function will be collected and
returned as a list.

The call to FUNC will be wrapped into a save-excursion form, so FUNC
does not need to preserve point.  After evaluation, the cursor will be
moved to the end of the line (presumably of the headline of the
processed entry) and search continues from there.  Under some
circumstances, this may not produce the wanted results.  For example,
if you have removed (e.g. archived) the current (sub)tree it could
mean that the next entry will be skipped entirely.  In such cases, you
can specify the position from where search should continue by making
FUNC set the variable ‘org-map-continue-from’ to the desired buffer
position.

MATCH is a tags/property/todo match as it is used in the agenda tags view.
Only headlines that are matched by this query will be considered during
the iteration.  When MATCH is nil or t, all headlines will be
visited by the iteration.

SCOPE determines the scope of this command.  It can be any of:

nil     The current buffer, respecting the restriction if any
tree    The subtree started with the entry at point
region  The entries within the active region, if any
region-start-level
        The entries within the active region, but only those at
        the same level than the first one.
file    The current buffer, without restriction
file-with-archives
        The current buffer, and any archives associated with it
agenda  All agenda files
agenda-with-archives
        All agenda files with any archive files associated with them
(file1 file2 ...)
        If this is a list, all files in the list will be scanned

The remaining args are treated as settings for the skipping facilities of
the scanner.  The following items can be given here:

  archive    skip trees with the archive tag
  comment    skip trees with the COMMENT keyword
  function or Emacs Lisp form:
             will be used as value for ‘org-agenda-skip-function’, so
             whenever the function returns a position, FUNC will not be
             called for that entry and search will continue from the
             position returned

If your function needs to retrieve the tags including inherited tags
at the *current* entry, you can use the value of the variable
‘org-scanner-tags’ which will be much faster than getting the value
with ‘org-get-tags-at’.  If your function gets properties with
‘org-entry-properties’ at the *current* entry, bind ‘org-trust-scanner-tags’
to t around the call to ‘org-entry-properties’ to get the same speedup.
Note that if your function moves around to retrieve tags and properties at
a *different* entry, you cannot use these techniques.
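
A minimal sketch of letting MATCH do the tag matching, run on a throwaway buffer. The headings and the tag are made up for the example, and it assumes the default setting of org-use-tag-inheritance.

```elisp
(require 'org)

(with-temp-buffer
  (insert "* Editors :emacs:\n** Helm\n* Other\n")
  (org-mode)
  ;; Collect the heading text of every entry that has the tag, including
  ;; entries that only inherit it from a parent.
  (org-map-entries (lambda () (nth 4 (org-heading-components)))
                   "emacs"))
;; With default tag inheritance this should return ("Editors" "Helm").
```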

[#B] Clean up persistent-action buffers

Thierry showed me this example which I should be able to use:

(condition-case _err
    (helm :sources my-source <etc...>)
  (quit (delete-my-buffers-or-whatever)))
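
A runnable sketch of the same pattern without Helm. Here (signal 'quit nil) stands in for the user pressing C-g during the session, and my/cleanup-ran is a hypothetical flag that stands in for the real buffer cleanup.

```elisp
(defvar my/cleanup-ran nil
  "Hypothetical flag; set when the quit handler has run.")

(condition-case nil
    (signal 'quit nil)              ; stands in for C-g during (helm ...)
  (quit (setq my/cleanup-ran t)))   ; cleanup would go here

my/cleanup-ran                      ; → t
```

If the cleanup should also run after a normal exit, unwind-protect might be the better fit, since its cleanup forms run either way.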

MAYBE [#B] Use text properties instead of plists for timestamping

Similar to this in Helm, text properties could be used to store timestamps for results in helm-org-rifle-get-candidates-in-buffer, and then it wouldn’t be necessary to transform the candidates list into a plist and back. Also, an arbitrary list of helper functions could be passed in and run on each node as the candidates list is built, making it easy to optionally record extra metadata.
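
A minimal sketch of the idea, assuming a :timestamp property name and a hypothetical sorting helper. The property travels with the candidate string, so no separate plist is needed.

```elisp
(defun my/sort-by-timestamp (candidates)
  "Return CANDIDATES (propertized strings) sorted newest first.
Hypothetical helper; assumes each string has a numeric :timestamp property."
  (sort (copy-sequence candidates)
        (lambda (a b)
          (> (get-text-property 0 :timestamp a)
             (get-text-property 0 :timestamp b)))))

(my/sort-by-timestamp
 (list (propertize "old entry" :timestamp 100)
       (propertize "new entry" :timestamp 200)))
;; "new entry" comes first; each string still carries its :timestamp.
```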

[#B] Remove duplicate/overlapping results from occur buffer

Since entire entry contents are displayed by default in the occur commands, some Org nodes can end up displayed twice in the results buffer. For example, given a subtree like:

* Emacs stuff
** Packages of interest
*** ace-window
*** Helm
**** helm-info-emacs command

A search for emacs would first return the entire Emacs stuff subtree, including all 4 child nodes. But it would also return the helm-info-emacs command node as a separate result since emacs appears in its heading.

Since the second result fits entirely inside the first result, the second should be discarded.
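
A sketch of the discarding step, assuming each result is represented as a (BEG . END) pair of buffer positions. That representation and the helper name are assumptions for illustration, not how the package stores results.

```elisp
(defun my/remove-contained-ranges (ranges)
  "Return RANGES without any (BEG . END) pair that lies inside another.
Hypothetical helper.  Exact duplicates are collapsed to one entry."
  (let ((sorted (sort (copy-sequence ranges)
                      ;; Earlier start first; for equal starts, longer first.
                      (lambda (a b)
                        (or (< (car a) (car b))
                            (and (= (car a) (car b))
                                 (> (cdr a) (cdr b)))))))
        (max-end nil)
        kept)
    (dolist (range sorted (nreverse kept))
      ;; A range that ends at or before the furthest end seen so far is
      ;; already covered by an earlier, larger range.
      (when (or (null max-end) (> (cdr range) max-end))
        (push range kept)
        (setq max-end (cdr range))))))

(my/remove-contained-ranges '((60 . 100) (1 . 100) (120 . 150)))
;; → ((1 . 100) (120 . 150))
```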

Alternatively, the whole command could be changed to only return each entry’s own text, i.e. not child headings. This seems like it might be more “correct,” but it also seems like a matter of preference: in the example above, if the user searches for emacs, should the ace-window node be displayed? It doesn’t mention emacs directly, but it is relevant to Emacs since it’s in that subtree.

This could probably be configurable without too much added complexity…

Search files instead of buffers

e.g. search agenda files, or files in a directory. Maybe write a with-unopened-file macro (or something like that) to find-buffer-visiting or find-file-noselect, and close the buffer afterward if it wasn’t already open.
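
A rough sketch of such a macro, under the hypothetical name my/with-file-buffer. It assumes BODY does not modify the buffer, since killing a modified file buffer would prompt.

```elisp
(defmacro my/with-file-buffer (file &rest body)
  "Run BODY in a buffer visiting FILE.
Kill the buffer afterward unless FILE was already open.  Hypothetical sketch."
  (declare (indent 1))
  (let ((filename (make-symbol "filename"))
        (existing (make-symbol "existing"))
        (buffer (make-symbol "buffer")))
    `(let* ((,filename ,file)
            (,existing (find-buffer-visiting ,filename))
            (,buffer (or ,existing (find-file-noselect ,filename))))
       (unwind-protect
           (with-current-buffer ,buffer
             ,@body)
         ;; Only clean up buffers this macro opened itself.
         (unless ,existing
           (kill-buffer ,buffer))))))
```

Usage would look something like (my/with-file-buffer "~/org/inbox.org" (buffer-size)).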

Match tags separately

This would probably make it simpler and faster. Rather than trying to match a tags token across the entire node, it could just be matched against the tags string. Could probably do away with the complex and confusing tags regexp matching and simplify the prep-token function.
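A small sketch of matching a tags token against just the tags string, assuming both are colon-delimited like :tag1:tag2:. The function names are hypothetical and this is not the existing prep-token logic.

```elisp
(require 'cl-lib)

;; Hypothetical helpers; both arguments are assumed to look like ":a:b:".
(defun my/split-tags (tags-string)
  "Split TAGS-STRING like \":a:b:\" into a list of tag names."
  (split-string tags-string ":" t))

(defun my/tags-match-p (token entry-tags-string)
  "Return non-nil if every tag in TOKEN appears in ENTRY-TAGS-STRING.
Tags are compared case-sensitively, and their order does not matter."
  (let ((entry-tags (my/split-tags entry-tags-string)))
    (cl-every (lambda (tag) (member tag entry-tags))
              (my/split-tags token))))
```

Comparing tag lists rather than matching a regexp across the node also makes the order of tags within a token irrelevant.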

Case-sensitive if caps are present

It would be easy to disable case-folding if caps are present in the search string.
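A sketch of what that could look like, with hypothetical function names. It binds case-fold-search according to whether the input contains an uppercase character.

```elisp
;; Hypothetical helpers illustrating "smart case"; not part of helm-org-rifle.
(defun my/input-has-uppercase-p (input)
  "Return non-nil if INPUT contains an uppercase character."
  ;; `case-fold-search' must be nil here, or [[:upper:]] would also
  ;; match lowercase letters.
  (let ((case-fold-search nil))
    (string-match-p "[[:upper:]]" input)))

(defun my/smart-case-match-p (input text)
  "Return non-nil if INPUT occurs literally in TEXT.
Matching ignores case unless INPUT contains an uppercase character."
  (let ((case-fold-search (not (my/input-has-uppercase-p input))))
    (string-match-p (regexp-quote input) text)))
```

So searching for emacs would still match Emacs, while searching for Emacs would not match emacs.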

Substring matching

Does searching for “solution” match this subheading?

(helm-org-rifle-get-candidates-in-buffer (get-file-buffer "test.org") "solution")

…No, it does not. That will probably need to be an option, customizable and/or with a prefix arg.

Test entry

Solutions

[#A] Weird heading-only, second-word substring matching

  • State “DONE” from “TODO” [2016-04-02 Sat 04:48]
    This seems to be fixed now.

From /u/washy9999:

incidentally, on the matter of searching for substrings… if i enter a single word to search for i get a results list. if i then start entering a second word helm filters the results for each character that i enter. so, i get substring searches for words after the first! (this is for headings…it gets more complicated if i do searches that return topic content.)

Hm, this is strange. I’ll have to check on it.

Broken again

Now it’s doing substring matching again. I specifically tested this earlier and it was working correctly, not matching substrings. What.

MAYBE Use grep to find matching lines

It might be faster, especially for unopened files, to use grep -b to get matching lines in a file, and then backtrack to find the node’s heading, and then search the node.

Look at how Deft searches files

It probably has some good techniques for doing it quickly.

MAYBE flx sorting

This swiper issue may have some good info about caching and such. It might be too slow for rifle, or at least it might be too slow with lots of results. Hmm…

UNDERWAY Match only headings

  • State “UNDERWAY” from “MAYBE” [2017-08-11 Fri 17:07]

It might be nice to only match against headings, but this is not as easy as it might seem. This whole package is made to search both headings and content.

This Org function might make this fairly easy: org-goto-local-search-headings

Underway in the heading-only-searches branch.

MAYBE Testing with Buttercup

Could be good for testing e.g. negation, to make sure I don’t break it.

Support new Helm with input-idle-delay

Thanks to Thierry’s help, this should help prevent flickering. This will be available in Helm 1.9.4 or commits after [2016-04-01 Fri].

Ideas

[#B] Testing

After reading about Emacs testing packages, it looks like the best way to test this package is with some combination of Assess, Buttercup, Ecukes, ERT, and Espuds. Espuds’s steps should help testing interactive things, like Helm (although this will still be difficult), and Buttercup should make unit testing easier, and Assess should help with everything. Buttercup is intended as an alternative to ERT, but ERT might be useful too.

[#B] Popup UI

With all the options we have now, we need a magit-popup-style UI, e.g. to temporarily enable an option like helm-org-rifle-test-against-path.

[#A] Use cache with org-get-outline-path

I’m not sure if we can, but if so, it should help performance.

MAYBE Use recoll for indexing

  • State “MAYBE” from [2018-08-17 Fri 07:53]

It’s in Debian/Ubuntu, and there’s already a helm-recoll package. Maybe support for Org files could even be added to Recoll so it could present results as Org nodes.

[#B] jump-to-next/prev-match command for occur buffers

It would be handy to have a built-in command to jump to the next match instance in the occur buffers, maybe something like M-g n. Suggested by washy99999.

[#B] Phrase matching

Don’t know how I overlooked this for this long. Shouldn’t be too hard to implement searching for phrases in quotes. Should probably match multiple spaces (but probably not newlines or tabs) between words; wouldn’t want an accidental double-spacebar press in the searched file to prevent a match.
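A minimal sketch of building a regexp for a phrase, assuming the phrase has already been extracted from the quotes in the input. The function name is hypothetical.

```elisp
;; Hypothetical helper; PHRASE is the text between the quotes.
(defun my/phrase-regexp (phrase)
  "Return a regexp matching PHRASE with one or more spaces between words.
Each word is matched literally.  Newlines and tabs between words do not
match, only space characters."
  (mapconcat #'regexp-quote
             (split-string phrase " +" t)
             " +"))
```

For example, the regexp for apple pie would still match text where the two words are separated by a double space.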

MAYBE [#C] follow-mode

helm-follow-mode can be activated from within Helm already with C-c C-f, and on an individual-item basis with C-j, and anyone can define a custom command to set it themselves, but it might be worth having an argument to enable it too.

Use prefix arg to toggle full-path mode

Along the lines of:

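(defun my/helm-org-rifle-with-full-paths ()
  "Run `helm-org-rifle' with `helm-org-rifle-show-path' toggled."
  (interactive)
  ;; Bind the toggled value only for the duration of this session.
  (let ((helm-org-rifle-show-path (not helm-org-rifle-show-path)))
    (helm-org-rifle)))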

Make Helm highlight all matches

Helm only seems to highlight the first match in each candidate.

MAYBE Timestamp searching

It would be interesting to be able to search for timestamps, e.g. for nodes timestamped on a certain day, or within a certain date range. Might be a bit slow, because it would require comparing every timestamp in every result, but if it’s what you need, then it would probably be usable and worth it.

MAYBE git grep support

By setting a custom xfuncname for a git repo containing org files (see man 5 gitattributes), git diff will display the org heading as the hunk header in its output. Then running git grep -W shows entire org entries that match. And git grep has boolean operators. And git grep is very fast. Plug these into an async Helm source and boom, lightning-fast searching of org files, even if they aren’t open in an Emacs buffer. Well, as long as the files are in a git repo–but you are storing your org files in a git repo, aren’t you? =)

MAYBE sift support

Sift sounds like it might be a perfect solution here, since it supports multi-line matching, replacements, etc.

MAYBE ripgrep support

ripgrep might also be useful, although I don’t think it supports multi-line yet.

[2020-11-26 Thu 02:20] It supports multiline search now, so it might be suitable now.

MAYBE async matching

It might be interesting to use emacs-async to do matching in files that aren’t already open in the current Emacs process. I’m not sure if it would be worth it, because even if it were faster in some cases for unopened files, it wouldn’t be faster compared to searching already opened files. And even though loading large files can be slow, once they are opened, the price is paid, and searching is faster; doing it in external Emacs processes would be slow every time, not just the first.

But there might be some cases where it would be helpful. It might be possible to do it without loading the files in Org in the other processes, and it might be helpful to do all the searching in one process instead of one for each file. For the case of opening many small files that don’t need to be frequently accessed, that the user doesn’t want to keep open, doing it in another process might actually be good.

But it might also be complicated to keep the search process open while the user is changing the query; and without doing that, a new search process would be started every time the user changed the query, which would mean loading the files all over again. So I’m not sure this idea would be generally useful.

UNDERWAY Non-substring matching

Currently matches are made against substrings, like most other commands in Helm. However, this might not always lead to the best results. For example, if someone were searching for “Sol”, referring to the sun, he probably wouldn’t want to match “solution” or “solvent” or “soliloquy”. But if someone were trying to dig up a note he made a while back about apple pie, did he write about “an apple pie” or “some apple pies”? Dessert hangs in the balance!

To solve this, matches could be made against word, punctuation, or symbol boundaries. However, this is less “Helm-like,” and it might not be what most users expect. So it would be good to make this a configurable default. A prefix could override the default, and/or it could be toggleable from within a Helm session.
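A sketch of how a token could be turned into either kind of regexp, using Emacs’s \b word-boundary construct (symbol boundaries would use \_< and \_> instead). The function name and the whole-word argument are hypothetical stand-ins for whatever option ends up controlling this.

```elisp
;; Hypothetical helper; WHOLE-WORD stands in for a user option or prefix arg.
(defun my/token-regexp (token &optional whole-word)
  "Return a regexp matching TOKEN literally.
If WHOLE-WORD is non-nil, require a word boundary on both sides, so
the token does not match inside a longer word."
  (let ((quoted (regexp-quote token)))
    (if whole-word
        (concat "\\b" quoted "\\b")
      quoted)))
```

With whole-word matching, Sol would no longer match solution, but pie would also stop matching pies, which is the trade-off described above.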

Collapse overlapping context strings

Right now, if more than one term appears in the same range, parts of that range will show up more than once in the context. Not a big deal, but should be fixable.
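One way to fix it would be to merge the context ranges before extracting the strings. A sketch, assuming each context is a (BEG . END) pair of buffer positions; the function name is hypothetical.

```elisp
;; Hypothetical helper: contexts are assumed to be (BEG . END) ranges.
(defun my/merge-ranges (ranges)
  "Merge overlapping or touching (BEG . END) pairs in RANGES.
Return a new list sorted by BEG; RANGES itself is not modified."
  (let ((sorted (sort (mapcar (lambda (r) (cons (car r) (cdr r))) ranges)
                      (lambda (a b) (< (car a) (car b)))))
        merged)
    (dolist (range sorted)
      (if (and merged (<= (car range) (cdr (car merged))))
          ;; Overlaps the previous range: extend it instead of adding a new one.
          (setcdr (car merged) (max (cdr (car merged)) (cdr range)))
        (push range merged)))
    (nreverse merged)))
```

For example, (my/merge-ranges '((10 . 30) (25 . 40) (50 . 60))) returns ((10 . 40) (50 . 60)), so the shared text between positions 25 and 30 would appear only once.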

MAYBE Further profiling

helm-org-rifle-get-candidates-in-buffer might be able to be optimized more with elp. But the “low-hanging fruit” is probably gone, and performance seems good.

MAYBE Regexp matching

It would be nice to have a regexp mode…maybe.

MAYBE Match limit

org-search-goto had a match limit. I removed it to simplify things, but it might still be useful, depending on how big one’s org files are. However, performance seems good now, so this probably isn’t needed.

[#C] always-show-entry-text truncates by 3 too many

s-truncate truncates and adds ..., which means that the chosen length of entry text gets reduced by 3. Could fix this by using a setter for the defcustom that adds 3.

References

[2020-01-04 Sat 09:04]

Bugs

[#A] Ensure inherited tags work

[2018-05-26 Sat 00:38] I’m not sure that it works correctly or is consistent. Need a test for it.

[#B] Helm overrides to-do keyword face in headings

When a search term is a to-do keyword, Helm overrides the face of the keyword in headings with the Helm highlight face. I’m not sure if this can be fixed outside of Helm. I wish we could remove the keyword from a list of terms that Helm would highlight, but there doesn’t seem to be such a list.

[#C] Negation-only error

If only a negation pattern is given, an error happens. Not a big deal, doesn’t interfere with anything, just change the pattern and it goes away.

[#B] Tag order matters

  • State “DONE” from “TODO” [2017-03-13 Mon 16:52]
    Fixed!

When matching multiple tags in a string, the order of the tags matters, e.g. :website:Emacs does not match entries that are tagged :Emacs:website: or :website:something:Emacs:. Not a big deal, but would be nice to fix it. I suppose it could be useful to have this behavior, because the tags can always be specified separately, but it might be unexpected for it to work this way.

Checklists

Stable release 1.6.0

Hmm, that seems like a long list. But I want stable releases to actually be stable.

Try to get someone else to test it

It’s been used for a while now.

Set Version: header

Use x.y.0, not x.y.

helm-org-rifle.el

README.org

Update changelog

Test in clean MELPA sandbox

Update test checklist from changelog for new features

Install

Test functionality:

Positive terms
Negation
TODO keywords
Priorities
Tags
Positive
Negative
Multiple tags
Multiple tags in a single string

e.g. :tag1:tag2:

Positive
Negative
Context
Ellipses customization
Searching with show-path enabled
helm-org-rifle-files
helm-org-rifle-directories

Tag, sign, and push tag

If a new minor version (not new patch version), make new x.x branch. Then tag the new branch, using x.x.0 for the first release in a minor version branch, not x.x.

GitHub release notes

Update version numbers for next pre-release

helm-org-rifle.el

README.org

Stable release 1.5.0

Hmm, that seems like a long list. But I want stable releases to actually be stable.

Try to get someone else to test it

Last MELPA release was on December 2, with a fix from a user. No problems since then, so I think it can be considered tested.

Set Version: header

Use x.y.0, not x.y.

helm-org-rifle.el

README.org

Update changelog

Test in clean MELPA sandbox

Update test checklist from changelog for new features

Nothing to do here AFAIK.

Install

Test functionality:

All Buttercup tests pass.

Tag, sign, and push tag

If a new minor version (not new patch version), make new x.x branch. Then tag the new branch, using x.x.0 for the first release in a minor version branch, not x.x.

GitHub release notes

Update version numbers for next pre-release

helm-org-rifle.el

README.org

Stable release 1.4.0

Hmm, that seems like a long list. But I want stable releases to actually be stable.

Try to get someone else to test it

It’s been 10 days since the last change to the code, and Z has said it’s working well.

Set Version: header

Use x.y.0, not x.y.

helm-org-rifle.el

README.org

Update changelog

Test in clean MELPA sandbox

CANCELED Update test checklist from changelog for new features

Install

CANCELED Test functionality:

The buttercup tests handle the important stuff, and the other stuff hasn’t changed, and I’ve tested it recently.

CANCELED Positive terms
CANCELED Negation
CANCELED TODO keywords
CANCELED Priorities
CANCELED Tags
CANCELED Positive
CANCELED Negative
CANCELED Multiple tags
CANCELED Multiple tags in a single string

e.g. :tag1:tag2:

CANCELED Positive
CANCELED Negative
CANCELED Context
CANCELED Ellipses customization
CANCELED Searching with show-path enabled
CANCELED helm-org-rifle-files
CANCELED helm-org-rifle-directories

Tag, sign, and push tag

If a new minor version (not new patch version), make new x.x branch. Then tag the new branch, using x.x.0 for the first release in a minor version branch, not x.x.

GitHub release notes

Stable release 1.3.0

Try to get someone else to test it

Minimal changes, been sitting in non-stable MELPA for a while, no complaints.

Set Version: header

Use x.y.0, not x.y.

helm-org-rifle.el

README.org

Update changelog

CANCELED Test in clean MELPA sandbox

Nothing’s changed that should affect this; only added two commands and they work.

CANCELED Update test checklist from changelog for new features

CANCELED Install

CANCELED Test functionality:

CANCELED Positive terms
CANCELED Negation
CANCELED TODO keywords
CANCELED Priorities
CANCELED Tags
CANCELED Positive
CANCELED Negative
CANCELED Multiple tags
CANCELED Multiple tags in a single string

e.g. :tag1:tag2:

CANCELED Positive
CANCELED Negative
CANCELED Context
CANCELED Ellipses customization
CANCELED Searching with show-path enabled
CANCELED helm-org-rifle-files
CANCELED helm-org-rifle-directories

Tag, sign, and push tag

If a new minor version (not new patch version), make new x.x branch. Then tag the new branch, using x.x.0 for the first release in a minor version branch, not x.x.

GitHub release notes

Stable release 1.2.0

Try to get someone else to test it

Got some good feedback from Jack and zeltak, seems to be working well.

Set Version: header

Use x.y.0, not x.y.

helm-org-rifle.el

README.org

Update changelog

Test in clean MELPA sandbox

Update test checklist from changelog for new features

Install

Test functionality:

Positive terms
Negation
TODO keywords
Priorities
Tags
Positive
Negative
Multiple tags
Multiple tags in a single string

e.g. :tag1:tag2:

Positive
Negative
Context
CANCELED Ellipses customization

Maybe in 1.3.

Searching with show-path enabled
helm-org-rifle-files
helm-org-rifle-directories

Tag, sign, and push tag

If a new minor version (not new patch version), make new x.x branch. Then tag the new branch, using x.x.0 for the first release in a minor version branch, not x.x.

GitHub release notes

Stable release 1.1

Hmm, that seems like a long list. But I want stable releases to actually be stable.

CANCELED Try to get someone else to test it

I tried.

Set Version: header

Update changelog

Test in clean MELPA sandbox

Update test checklist from changelog for new features

Install

Test functionality:

Positive terms
Negation
TODO keywords
Priorities
Tags
Positive
Negative
Multiple tags
Multiple tags in a single string

(:tag1:tag2:)

Positive
Negative
Context
CANCELED Ellipses customization

Pushing this back to 1.2.

Tag, sign, and push tag

Stable release template

Hmm, that seems like a long list. But I want stable releases to actually be stable.

Try to get someone else to test it

Set Version: header

Use x.y.0, not x.y.

helm-org-rifle.el

README.org

Update changelog

Test in clean MELPA sandbox

Update test checklist from changelog for new features

Install

Test functionality:

Positive terms
Negation
TODO keywords
Priorities
Tags
Positive
Negative
Multiple tags
Multiple tags in a single string

e.g. :tag1:tag2:

Positive
Negative
Context
Ellipses customization
Searching with show-path enabled
helm-org-rifle-files
helm-org-rifle-directories

Tag, sign, and push tag

If a new minor version (not new patch version), make new x.x branch. Then tag the new branch, using x.x.0 for the first release in a minor version branch, not x.x.

GitHub release notes

Update version numbers for next pre-release

helm-org-rifle.el

README.org