Improve coalesced reduction performance for tall and thin matrices (up to 2.6x faster) #2259

Merged: 8 commits, merged on Apr 22, 2024
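Going by the name, a coalesced reduction is assumed here to reduce each row of a row-major matrix along its contiguous dimension. When the matrix is tall and thin, a row has too few columns to keep a whole warp busy, so a "thin" kernel can instead give each row to a small team of lanes. The C++ sketch below simulates that idea on the CPU. `rowSumsThin`, `team` and the three-phase structure are hypothetical illustrations, not code taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWarpSize = 32;

// CPU simulation of a "thin" row-wise reduction (hypothetical sketch): a warp
// of 32 lanes is split into teams of `team` lanes, and each team sums one row
// of a row-major rows x cols matrix.
std::vector<float> rowSumsThin(const std::vector<float>& m, std::size_t rows,
                               std::size_t cols, std::size_t team) {
  assert(team >= 1 && team <= kWarpSize && (team & (team - 1)) == 0);
  assert(m.size() == rows * cols);
  const std::size_t rowsPerWarp = kWarpSize / team;
  std::vector<float> out(rows, 0.0f);
  for (std::size_t base = 0; base < rows; base += rowsPerWarp) {
    float partial[kWarpSize] = {};
    // Phase 1: each lane belongs to team lane / team and strides over the
    // columns of that team's row; lanes of a team read adjacent columns.
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
      const std::size_t row = base + lane / team;
      if (row >= rows) continue;  // idle lanes in the last, partial warp
      for (std::size_t c = lane % team; c < cols; c += team)
        partial[lane] += m[row * cols + c];
    }
    // Phase 2: butterfly reduction inside each team, mimicking
    // __shfl_xor_sync with offsets team/2, ..., 1 (never crosses teams).
    for (std::size_t offset = team / 2; offset > 0; offset /= 2) {
      float next[kWarpSize];
      for (std::size_t lane = 0; lane < kWarpSize; ++lane)
        next[lane] = partial[lane] + partial[lane ^ offset];
      for (std::size_t lane = 0; lane < kWarpSize; ++lane)
        partial[lane] = next[lane];
    }
    // Phase 3: the first lane of each team holds its row's total.
    for (std::size_t t = 0; t < rowsPerWarp && base + t < rows; ++t)
      out[base + t] = partial[t * team];
  }
  return out;
}
```

With `team = 4`, one simulated warp covers 8 rows at once instead of spending 32 lanes on a row that may have only a handful of columns.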

Commits on Apr 8, 2024

  1. 859495a

Commits on Apr 9, 2024

  1. Use optimized vector warp reduce in coalescedReductionThinKernel for lower LSU utilization and coalesced global stores
    Nyrio committed Apr 9, 2024
    51dfdbd
  2. 673f804
  3. Update copyright year
    Nyrio committed Apr 9, 2024
    102dc33
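The first Apr 9 commit switches coalescedReductionThinKernel to a "vector" warp reduce. One way to reduce several values per lane at once is sketched below as a CPU simulation, assuming each lane holds partial sums for `nv` rows (`nv` a power of two up to 32). At every step a lane keeps one half of its vector, exchanges the other half with its partner lane, and adds, so the vector halves each time. Afterwards lane `r` holds the total for row `r`, which lets lanes `0..nv-1` write consecutive outputs as one coalesced store, and each lane issues `nv - 1 + log2(32 / nv)` shuffles instead of `5 * nv` for `nv` independent warp reductions. The function name and data layout are hypothetical; whether the PR's kernel uses exactly this scheme is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

constexpr std::size_t kLanes = 32;

// CPU simulation of a "vector" warp reduction (hypothetical sketch).
// vals[lane][r] is that lane's partial sum for row r (r < nv). Returns
// out[r] = sum over all lanes of vals[lane][r], read from lane r, and stores
// the number of shuffle rounds each lane would issue in *shuffles.
std::vector<float> vectorWarpReduce(std::vector<std::vector<float>> vals,
                                    std::size_t nv, std::size_t* shuffles) {
  assert(nv >= 1 && nv <= kLanes && (nv & (nv - 1)) == 0);
  assert(vals.size() == kLanes);
  std::size_t rounds = 0;
  // Vector phase: with partner offset `half`, a lane whose `half` bit is set
  // keeps the upper half of its vector and sends the lower half; its partner
  // does the opposite. Each kept element is added to the partner's copy.
  for (std::size_t half = nv / 2; half >= 1; half /= 2) {
    std::vector<std::vector<float>> next(kLanes, std::vector<float>(half));
    for (std::size_t k = 0; k < half; ++k, ++rounds) {  // one shuffle per k
      float sent[kLanes];
      for (std::size_t lane = 0; lane < kLanes; ++lane)
        sent[lane] = (lane & half) ? vals[lane][k] : vals[lane][half + k];
      for (std::size_t lane = 0; lane < kLanes; ++lane) {
        const float kept = (lane & half) ? vals[lane][half + k] : vals[lane][k];
        next[lane][k] = kept + sent[lane ^ half];
      }
    }
    vals = std::move(next);
  }
  // Scalar phase: ordinary butterfly over the lane bits not used above.
  for (std::size_t offset = nv; offset < kLanes; offset *= 2, ++rounds) {
    float cur[kLanes];
    for (std::size_t lane = 0; lane < kLanes; ++lane) cur[lane] = vals[lane][0];
    for (std::size_t lane = 0; lane < kLanes; ++lane)
      vals[lane][0] = cur[lane] + cur[lane ^ offset];
  }
  if (shuffles != nullptr) *shuffles = rounds;
  // Lane r now holds row r's total, so lanes 0..nv-1 write out[0..nv-1].
  std::vector<float> out(nv);
  for (std::size_t r = 0; r < nv; ++r) out[r] = vals[r][0];
  return out;
}
```

For `nv = 8` this is 7 + 2 = 9 shuffle rounds per lane, compared with 40 for eight separate 32-lane reductions.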

Commits on Apr 10, 2024

  1. 591a9e3

Commits on Apr 16, 2024

  1. 2b2d8af

Commits on Apr 17, 2024

  1. b553fb0

Commits on Apr 22, 2024

  1. 001c2dd