Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Indexable#each_slice(&) #14864

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

HertzDevil
Copy link
Contributor

@HertzDevil HertzDevil commented Aug 3, 2024

If the elements of an Enumerable are already stored contiguously in memory and each iteration returns a fresh Array (i.e. reuse is false or unset), it is faster to construct that returned array's contents with a single slice copy than inserting repeatedly. Benchmarks:

require "benchmark"

module Indexable(T)
  def each_slice2(count : Int, &)
    # this PR's implementation
  end
end

struct Slice(T)
  def each_slice3(count : Int, &)
    # this PR's implementation
  end
end

require "benchmark"

N = ENV["N"]?.try(&.split(',').map(&.to_i)) || [10000]
M = ENV["M"]?.try(&.split(',').map(&.to_i)) || [16]

x = [] of Void*

M.each do |m|
  N.each do |n|
    arr = Slice.new(n) { |i| Pointer(Void).new(i) }
    puts "n = #{n}, m = #{m}"
    Benchmark.ips(n) do |b|
      b.report("Enumerable (m = #{m})") do
        arr.each_slice(m) { |v| x = v }
      end

      b.report("Indexable (m = #{m})") do
        arr.each_slice2(m) { |v| x = v }
      end

      b.report("Slice (m = #{m})") do
        arr.each_slice3(m) { |v| x = v }
      end
    end
  end
end

Results: (raw output)

each_slice

Same graph with time divided by n: (the bottom ones have a greater m)

each_slice2

These results were taken on an x86-64 machine, where the Slice implementation starts to outmatch the Indexable one when m is around 5 to 8. On AArch64 this number is slightly higher, so for simplicity the heuristic uses SMALL_SLICE_SIZE = 16.

@straight-shoota
Copy link
Member

This looks great, however there is an issue with calling the optimized version: It only works when the reuse parameter is left out. If the call has an explicit reuse: false, it'll go to the unoptimized version.

Also I'm wondering if the two implementation for Indexable and Slice (including the forwards) should be two separate PRs. They seem to be pretty much independent of each other.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants