
sparse dense bsr on cuda #255

Open · wants to merge 233 commits into base: main
Conversation

@vinx13 (Collaborator) commented Feb 3, 2021

No description provided.
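The PR itself carries no description, so as background: block sparse row (BSR) format stores a sparse matrix as small dense blocks plus CSR-style `indptr`/`indices` arrays over block rows and block columns, and the operator this PR targets multiplies such a matrix by a dense one. The sketch below is a hypothetical pure-Python reference for that computation, not code from this PR (the PR implements it as a TVM TensorIR schedule for CUDA); names like `bsr_dense_matmul` are illustrative only.

```python
# Hypothetical reference implementation (not from this PR) of BSR x dense.
# The BSR matrix has shape (num_block_rows * bs_r) x K, stored as dense
# bs_r x bs_c blocks; `dense` is a K x n matrix as nested lists.

def bsr_dense_matmul(data, indices, indptr, bs_r, bs_c, dense, n):
    """data: list of bs_r x bs_c blocks (nested lists)
    indices: block-column index of each stored block
    indptr:  blocks of block-row i are data[indptr[i]:indptr[i+1]]
    """
    num_block_rows = len(indptr) - 1
    out = [[0.0] * n for _ in range(num_block_rows * bs_r)]
    for i in range(num_block_rows):                 # block row
        for b in range(indptr[i], indptr[i + 1]):   # stored blocks in that row
            col0 = indices[b] * bs_c                # first dense row touched
            blk = data[b]
            for bi in range(bs_r):
                for bj in range(bs_c):
                    a = blk[bi][bj]
                    for j in range(n):
                        out[i * bs_r + bi][j] += a * dense[col0 + bj][j]
    return out
```

On CUDA the interesting part, which the schedule primitives in the commit log below provide, is mapping the block-row loop to thread blocks and the inner dense loops to threads; the reference above only fixes the semantics.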

@vinx13 vinx13 requested a review from junrushao February 3, 2021 20:01
Base automatically changed from master to backup February 21, 2021 13:41
@vinx13 vinx13 changed the base branch from backup to main March 3, 2021 03:09
@vinx13 vinx13 changed the base branch from main to backup March 3, 2021 03:10
@vinx13 vinx13 requested a review from yzhliu as a code owner March 3, 2021 03:36
@vinx13 vinx13 changed the base branch from backup to main March 3, 2021 03:36
@vinx13 vinx13 removed the request for review from yzhliu March 3, 2021 03:38
Hzfengsy and others added 20 commits March 26, 2021 11:29
rebased

[TIR][Schedule] fix reorder/buffer_flatten & finish CPU demo (#59)

[CPU DEMO] Update cpu gemm demo and fix bug (#58)

* [TIR][Schedule] introduce parallel and fix bugs for cpu demo

* [TIR][Schedule] update cpu demo

* [TIR][Schedule] fix lint

* [TIR][Schedule] fix

rebased

[TIR][Schedule] introduce reduction block and CPU demo (#53)

* [TIR] reduction : split_reduction

* [TIR] reduction : split_reduction

* [TIR] reduction : fuse_reduction

* [TIR] reduction : cpu demo

* [TIR] reduction : fix

* [TIR] reduction : pattern detect remains

* [TIR] reduction : pattern detect remains

* [TIR] reduction : pattern match done

* [TIR] reduction : fix lint

* [TIR] reduction : fix

* [TIR] reduction : fix

* [TIR] reduction : fix

* [TIR] reduction : fix

* [TIR] reduction : rebased

* [TIR] reduction : rebased

[TIR][Schedule] introduce cache_read cache_write (#54)

* [TIR][Schedule] introduce cache_read cache_write

* [TIR][Schedule] add more comments

* [TIR][Schedule] fix problem and add comments

* [TIR][Schedule] address comments
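The `cache_read`/`cache_write` primitives introduced in the commit above stage a buffer through an intermediate copy, so the compute block reads from or writes to a scratch buffer (e.g. shared memory on GPU) instead of the original. A hypothetical plain-Python sketch of that buffer-level transformation, not code from this PR:

```python
# Conceptual sketch (not from this PR): what cache_read / cache_write do to
# a simple elementwise add at the buffer level.

def compute_direct(a, b):
    n = len(a)
    return [a[i] + b[i] for i in range(n)]

def compute_with_caches(a, b):
    n = len(a)
    a_cache = [a[i] for i in range(n)]      # cache_read: stage A into a local buffer
    c_cache = [0.0] * n                     # cache_write: compute into a local buffer
    for i in range(n):
        c_cache[i] = a_cache[i] + b[i]
    c = [c_cache[i] for i in range(n)]      # write-back from the cache stage
    return c
```

The observable result is unchanged; the point of the primitive is that the inserted copy stages can be bound to faster memory scopes by later scheduling steps.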

[TIR] schedule: introduce vectorize, unroll, loop validation (#47)

* [TIR] vectorize : basically complete

* [TIR] vectorize&unroll : update comments&unroll

* [TIR] vectorize&unroll : rebased

* [TIR] vectorize, unroll, cpu_demo: done

* [TIR] vectorize, unroll, cpu_demo: simplify

* [TIR] vectorize, unroll, cpu_demo: fix

* [TIR] reduction : rebased

* [TIR] reduction : fix
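The `unroll` primitive from commit #47 above replicates a loop body a fixed number of times per iteration, trading code size for fewer branches. A hypothetical source-level illustration in plain Python (the real primitive rewrites TIR loops, and also validates that the trip count permits the transform):

```python
# Conceptual sketch (not from this PR): unrolling a reduction loop by 4.

def sum_rolled(xs):
    total = 0
    for i in range(len(xs)):
        total += xs[i]
    return total

def sum_unrolled4(xs):
    # assumes len(xs) is a multiple of 4, for brevity
    total = 0
    for i in range(0, len(xs), 4):
        total += xs[i]
        total += xs[i + 1]
        total += xs[i + 2]
        total += xs[i + 3]
    return total
```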

[TIR][Schedule] fix sref and scopes problem during replace and compute_at (#50)

* [TIR][Schedule] fix sref and scopes problem during replace and compute_at

* [TIR][Schedule] fix

* [TIR][Schedule] fix

[TIR][Refactor] move function to ScheduleNode

[TIR] Schedule: introduce primitive compute_at (#36)

* [TIR] Schedule: introduce primitive compute_at

* [TIR] Schedule: address comments

* [TIR] Schedule: address comments

* [TIR] Schedule: address comments

* [TIR] Schedule: add check to compute_at

* [TIR] Schedule: address comments

* [TIR] Schedule: address comments

[TIR] Schedule: introduce primitive reorder (#37)

* [Schedule] debug

* [TIR] Schedule: reorder, loop type detect remains

* [TIR] reorder complete

* [TIR] reorder complete

* [TIR] fix

* [TIR] reorder : rebased complete

* [TIR] reorder : fix container.h

* [TIR] reorder : fix

* [TIR] reorder : fix

* [TIR] reorder : fix

* [TIR] reorder : simplify

* [TIR] reorder : simplify

* [TIR] reorder : simplify

* [TIR] reorder : fix

* [TIR] reorder : fix

* [TIR] reorder : rebased

* [TIR] reorder : rebased

rebase

[TIR] Schedule: introduce BlockRealize and Block SRef reuse (#39)

* [TIR] BlockRealize: schedule refactor

* [TIR] BlockRealize: debug

* [TIR] BlockRealize finish

* [TIR] BlockRealize finish

* [TIR] BlockRealize fix

* [TIR] BlockRealize update test

* [TIR] BlockRealize: add loop var reuse

* [TIR] BlockRealize: add loop var reuse

* [TIR] BlockRealize: fix

* [TIR] BlockRealize: fix

* [TIR] BlockRealize: fix

* [TIR] BlockRealize: fix

* [TIR] BlockRealize: fix

* [TIR] BlockRealize: fix

* [TIR] BlockRealize: fix

* [TIR] BlockRealize: fix

* [TIR] BlockRealize: fix

* [TIR] BlockRealize: fix

[TIR] compare for module (#38)

* [TIR] compare for module

* [TIR] fix

* [TIR] fix

* [TIR] fix

* [TIR] fix

* [TIR] fix

* [TIR] fix

[Hybrid] Module init

[Hybrid] Module print

[Hybrid] Module print with meta

[Hybrid] adjust

[Hybrid] finished but without lint and comment check

[Hybrid] fix lint

[Hybrid] comments

[Hybrid] fix script decoration API

[Hybrid] using IRModule

[Hybrid] fix

[Hybrid] adjust API

[Hybrid] fix

[Hybrid] fix

[Hybrid] fix

[Hybrid] fix symbol table, adjust API, introduce meta_mutator and resolve import issue

[Hybrid] fix lint

[TIR] introduce pass BufferFlatten (#32)

* [TIR] introduce pass BufferFlatten

* [Tir] add comments & remove old TeLower

* [TIR] split GatherRegion and BufferFlatten to two Visitor/Mutator

* [TIR] address comments: Only consider stmt scope

* [TIR] BufferFlatten: address comments

* [TIR] BufferFlatten: fold BlockFlattener into BufferFlattener

* [TIR] BufferFlatten: add asserts

* [TIR] BufferFlatten: use Equal in testcase

* [TIR] Equal Pass: Enhanced the pass

* [TIR] Equal Pass: add comments

[Hybrid] refactor using Doc, introduce annotation, enhance parser (#28)

* [Hybrid] refactor printer, enhance parser

* [Hybrid] refactor

* [Hybrid] fix

* [Hybrid] fix

* [Hybrid] fix namespace issue

* [Hybrid] compare using Equal

[TIR] rebased

[TE] fix replace again and add primitive fuse and split (#27)

* [TE] add: schedule primitive fuse

* [TE] add: schedule primitive split

* [TE] address comments: add IRSubstitueInScope and other minor fix

* [TE] address comments: Enhance Equal api and fix split by nparts

* [TE] address comments
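The `fuse` and `split` primitives added in commit #27 above come down to simple index arithmetic on the loop variables. A hypothetical plain-Python sketch of that arithmetic, not code from this PR:

```python
# Conceptual sketch (not from this PR): the index math behind fuse and split
# on a 2-D loop nest over (m, n).

def run_original(m, n, body):
    for i in range(m):
        for j in range(n):
            body(i, j)

def run_fused(m, n, body):
    # fuse: one loop of length m*n; recover (i, j) by div/mod
    for fused in range(m * n):
        body(fused // n, fused % n)

def run_split(m, n, factor, body):
    # split the i loop by `factor` (assumes factor divides m evenly)
    for io in range(m // factor):
        for ii in range(factor):
            for j in range(n):
                body(io * factor + ii, j)
```

All three variants visit the iteration space in the same order; the value of the primitives is that the resulting loops can then be bound, parallelized, or vectorized individually.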

[Hybrid] introduce printer (#25)

* [Hybrid] substitute Block with SeqStmt, change block() syntax

* [Hybrid] add printer, type declare intrin

* [Hybrid] refactor

* [Hybrid] meta

* [Hybrid] refactor

* [Hybrid] macro

[TE] fix replace (#23)

* [TE] fix replace

* [TE] fix replace: add more tests

* [TE] fix replace: add more tests

[TE] rebased

[Hybrid] python syntax parser (#20)

* [Hybrid] python syntax parser

* [Hybrid] add a testcase

* [Hybrid] improve comments and fix bugs

* [Hybrid] improve comments, refactor __internal_assert, add new testcases

* [Hybrid] improve error report message, refactor intrin

* [Hybrid] separate ScopeEmitter from parser

* [Hybrid] refactor type check

* [Hybrid] refactor intrin

* [Hybrid] refactor intrin, allow register external functions with argument type checking, add a testcase

* [Hybrid] address comments, fix a bug in te/ir.h

* [Hybrid] remove type check

* [Hybrid] python syntax parser

* [Hybrid] add a testcase

* [Hybrid] improve comments and fix bugs

* [Hybrid] improve comments, refactor __internal_assert, add new testcases

* [Hybrid] improve error report message, refactor intrin

* [Hybrid] separate ScopeEmitter from parser

* [Hybrid] refactor type check

* [Hybrid] refactor intrin

* [Hybrid] refactor intrin, allow register external functions with argument type checking, add a testcase

* [Hybrid] address comments, fix a bug in te/ir.h

* [Hybrid] remove type check

* [Hybrid] refactor intrin, scope_handler, special_stmt

* [Hybrid] address comments

* [Hybrid] clean code, improve error reporting & testcase

* [Hybrid] clean code

* [Hybrid] clean code

[IR] introduce dependency graph and write map

[TE] refactor and clean codebase

[TE] refactor IR

[TE] introduce schedule, dependency graph and support fuse and split (#17)

* fix lint

* introduce dependency graph

* enable create schedule

* support get axes

* fix lint

* revert Set

* add schedule primitive fuse

* address comment

* support split

[IR] Introduce SeqStmt

add TeLower pass and enable to run Te IR (#15)

* add function data structure
add TeLower pass to transform Te to current IR
enable to run Te IR

* address comments

* unify terminology

TensorIR data structure init (#14)

* init te data structure

* finish printer and enhanced ir_builder

* address the comments

Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com>
* Re-implement: schedule creation, loop validation; Refactor: move creation code to schedule_create.cc, move validation code to schedule_validate.cc

* Update

* Add StmtSRef::GetStmt<SomeNode>() in preparation of removing DowncastPtr
* [AutoTIR] Add LoopTree

* Add ReprPrinter for IteratorNode

* Loops whose body isn't BlockRealize; more testcases

* doc
…(#78)

* [TIR][Hybrid] Extensible Parser & Life of Lowering & Concise Scoping

* [TIR][HYBRID] fix several issues

* [TIR][HYBRID] fix assert and let

* [TIR][HYBRID] fix store

* [TIR][Hybrid] improve hybrid type hierarchy

* [TIR][Hybrid] improve comments

* [TIR][Hybrid] update all the tests

* [TIR][Hybrid] improve typing class comments

* [TIR][Hybrid] improve comments

* [TIR][Hybrid] improve comments

* [TIR][Hybrid] move hybrid from tvm.tir to tvm namespace

* [TIR][Hybrid] support single point TensorRegion sugar

* [TIR][Hybrid] move string key to enum

* [TIR][Hybrid] fix enum
* [TIR][HYBRID] Grid Iteration

* [TIR][HYBRID] Grid Iteration fix

* [TIR][HYBRID] fix bugs in schedule::replace
* [TIR][Schedule] finish blockize

* [TIR][Schedule] finish native tensorize

* [TIR][Schedule] finish tensorize

* address comments

* address comments
* [TIR][Hybrid] concise block declaration

* [TIR][Hybrid] concise block declaration

* [TIR][Hybrid] fix
* [Schedule] GPU support and part of execution scope

* [Schedule] address comments

* [Schedule] add scope validate
MasterJH5574 and others added 14 commits April 2, 2021 12:14
…#351)

* [BugFix][MetaSchedule] Add preserve_unit_loop to ComputeAtAttr & ReverseComputeAtAttr

* [BugFix][MetaSchedule] Fix

* [BugFix][MetaSchedule] Add Serialize/Deserialize
Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>

Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
* [BugFix] Fix `LCACollector` in current BufferFlatten

* [BugFix] Update comment
* [TIR] Add primitive `SetScope` on C++ side

* [TIR] Add primitive `SetScope` on Python side

* [MetaSchedule] Add primitive `SetScope` on C++ side

* [MetaSchedule] Add primitive `SetScope` on Python side

* [MetaSchedule] Fix

* [TIR][Schedule] Fix primitive bug

* [TIR] Replace blocks as few as possible

* [TIR] Add unit test

* [TIR][MetaSchedule] Self-review and update.

* [TIR] Remove `VisitStmt_(const BufferRealize* node)`

* [TIR] Fuse both source&target of MatchBufferRegion
* add feature

* add first version layout rewrite

* add dense

* add extract_tasks api and modify tests

* add layout rewrite for batch matmul and conv2d winograd

* final commit before rebase

* pass compile

* fix compile warning

* fix

* fix

* remove useless fields and add doc

* fix doc

* change name

* update doc for dispatcher.py

* refactor relay_integration.py

* refactor the rest python files

* refactor

* refactor

* refactor

* refactor

* remove debug output

* pass unittest

* fix

* remove unused fields

* fix rebase bugs

* style issues

* apply black

* disable vm compiler

* fix bug

* address comment

Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Co-authored-by: Junru Shao <junrushao1994@gmail.com>
* [MetaSchedule] Add the rule

* [MetaSchedule] compute_at vs. reverse_compute_at

* [MetaSchedule] Update post processor "RewriteReductionBlock"

* [Rebase] Fix ce

* [MetaSchedule] Fix post processor

* [MetaSchedule] Update the rule with NRM 90% align

* [MetaSchedule] Rule bug fix

* [MetaSchedule] Add `set_scope` and `bind blockIdx` in the rule. NRM aligned

* [MetaSchedule] Fix the case when LCA is a loop. Softmax aligned

* [MetaSchedule] Add documents

* [MetaSchedule] Self-review

* [BugFix] Fix print double precision

* [MetaSchedule] Remove redundant line in search_rule.py

* [MetaSchedule] Persist on 32 for threadIdx

* [MetaSchedule] Fix post-processor RewriteUnboundBlocks

Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com>
Labels: None yet
Projects: None yet
8 participants