Commit
Qualification tool: update SQL Df value used and look at jobs in SQL (#5612)

* QualificationTool. Add speedup information to AppSummaryInfo
  Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
* address review comments
  Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
* debug
* check for dataset
* change dataset check for all
  Signed-off-by: Thomas Graves <tgraves@apache.org>
* test unsupported time
* more changes
* fix to string
* fix including wholestage codegen
* put unsupported Dur
* calculate duration of non sql stages
* change to get stage task time
* combine
* hooking up final output
* initial scores changes
* logging
* logging
* update factor
* turn off some logging
* debug
* track execs without stages
* Add in exec info output
* fix output
* add sorting
* fix output
* fix output sizes
* use plan infos without execs removed
* output children node ids
* output stages info
* cleanup
* fix running app
* fix stage header
  Signed-off-by: Thomas Graves <tgraves@apache.org>
* Start removing unneeded fields
* Update summary table
* cleanup
* fix sorting
* update running app
* update test
* fix missing
* fix more reporting to be based on supported execs
* fixes
* fix test
* fix event processor calling base
* fix df duration
* debug
* debug
* fix double and int
* fix double
* fix merge
* fix double
* fix divide 0
* fix double to 2 precision
* fix formatting output
* fix sorting
* fix Running
* move around sorting
* remove logWarnings
* update sorting
* Add appId to execs report
* update stages output
* remove unused imports
* fix running app
* update tests
* fix sorting
* add estimated info to summary info
* fix running
* try using enum
* fix recommendation to string
* update to use recommended/strongly recommended
* fix recommendation
* fix divide 0
* fix opportunity
* fix up df task duration
* debug
* fix bug with estimated in csv
* rearrange code
* cleanup
* cleanup and start handling failures
* handle failures and cleanup
* fix output
* change sorting
* fixes
* handle test having udf
* change speedup of the *InPandas and arrow eval
* fix test
* fix ui after changing field names
  Signed-off-by: Ahmed Hussein (amahussein) <a@ahussein.me>
* make ExecInfo regular class
* fix commented out
* fix more execinfo
* update test schema
* change what goes to DF
* move to outer class
* fix schema
* fix limit option
* remove extra header csv
* match up test with csv
* Fix not supported read formats
* update results
* 2 places for average speedup
* update results
* update results
* comment out tests
* update test
* update operator scores
* cleanup
* test sql df times
* Update more to use spark reported df duration
  Signed-off-by: Thomas Graves <tgraves@apache.org>
* fix patch longest sql duration
* fix task duration
* try to calculate sql overhead
* only use stages used in sql
* remove some decimal checks
* cleanup utils
* calculate task time for stages in jobs in sql but not in execs
* look at execs without stages
* change to account for sql ids without stage mapping
* dedup
* fix compile
* fix set
* update test results
* fix merge
* fix extra code
* update csv expected results
* move logging
* aggregate Wholestagecodegen and children stages
* commonize code
* change type to not be seq of set
* update latest test results
* update dsv2 results
* remove logging
* typo
* Handle case sql duration > app duration
* remove unused variable

Co-authored-by: Ahmed Hussein (amahussein) <a@ahussein.me>