From 904f5e5212dcba604b0bca8c5004da38db4c2fe6 Mon Sep 17 00:00:00 2001
From: Mikhail Bautin
Date: Sat, 15 Jan 2022 08:58:13 -0800
Subject: [PATCH] [#10942, #11033, #11034, #10964] Enable link-time optimization with Clang 12

Summary:
Link-time optimization ( https://llvm.org/docs/LinkTimeOptimization.html ) allows for more aggressive optimizations, including inlining, than the shared-library-based model that we currently use. This diff enables link-time optimization in the Clang 12 Linuxbrew-based release build, for the yb-tserver executable only, producing a binary that statically links all object files needed by yb-tserver, including those from the yb_pgbackend library. Third-party libraries are also linked statically, but they are not LTO-enabled yet.

The linking of the final LTO-enabled binary is currently done outside of the CMake build system, using the dependency_graph.py tool, which has access to the dependency graph of targets and object files and therefore has all the information needed to construct the linker command line. This also gives us more flexibility in customizing the linker command line than trying to do this within the CMake build system. Moving this linking step to CMake may be a future project.

Refactored the dependency_graph.py script into multiple modules: dependency_graph.py, dep_graph_common.py, source_files.py, as well as lto.py (with the new LTO logic).

Also refactored master_main.cc and tablet_server_main.cc and extracted common initialization code to tserver/server_main_util.cc. The new file is in the tserver directory because the master code currently uses the tserver code.

For building LTO-enabled binaries, we need to use LLVM's lld linker, which has issues with our distributed compilation framework ( https://github.com/yugabyte/yugabyte-db/issues/11034 ). We work around this by always running lld-based linking commands locally, not on a remote build worker.

Various static initialization issues were identified and fixed as part of this work. Left unfixed, they would cause the yb-tserver binary to crash immediately with a core dump.

- In consensus_queue.cc, the RpcThrottleThresholdBytesValidator function for validating the rpc_throttle_threshold_bytes flag was trying to access other flags before they were fully initialized. Moved this validation to the main program.
- The webserver_doc_root flag was calling yb::GetDefaultDocumentRoot() to determine its default value. Moved that default value determination to where the flag is being used.
- [ https://github.com/yugabyte/yugabyte-db/issues/11033 ] The INTERNAL_TRACE_EVENT_ADD_SCOPED macro, when invoked during static initialization, led to a crash in std::string construction. Added a new atomic flag, trace_events_enabled, for enabling trace events, and we only turn it on after main() starts executing. INTERNAL_TRACE_EVENT_ADD_SCOPED is a no-op until trace_events_enabled is set to true.
- [ https://github.com/yugabyte/yugabyte-db/issues/10964 ] The kGlobalTransactionTableName global constant of the YBTableName type relied on the statically initialized string constant kGlobalTransactionsTableName, which turned out to be empty during static initialization. As a result, the transaction status table could not be properly located. Changed kGlobalTransactionsTableName to be a `const char*`.
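To illustrate the last item above, here is a minimal sketch (hypothetical names, not the actual YugabyteDB sources) of the static-initialization-order problem that switching to a `const char*` avoids: a namespace-scope `std::string` is constructed at run time, in an unspecified order relative to static initializers in other translation units, while a `const char*` pointing at a string literal is constant-initialized and is safe to read before main() starts.

```cpp
// constants.cc (sketch)
#include <string>
extern const std::string kTableNameString = "transactions";  // dynamic initialization;
                                                             // order across TUs is unspecified
const char* kTableNameCString = "transactions";              // constant-initialized; valid before
                                                             // any dynamic initializer runs

// consumer.cc (sketch)
#include <string>
extern const std::string kTableNameString;
extern const char* kTableNameCString;

// If this initializer runs before kTableNameString is constructed, it reads a
// not-yet-initialized std::string, which in practice often looks like an empty
// string -- matching the symptom described above.
static const std::string kBroken = kTableNameString + "_status";

// Reading the const char* instead is well-defined regardless of TU order.
static const std::string kSafe = std::string(kTableNameCString) + "_status";
```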
In addition, in the LTO-enabled build, it became apparent that some symbols were duplicated between the gperftools library and the gutil part of the YugabyteDB code ( https://github.com/yugabyte/yugabyte-db/issues/10956 ); see the sketch after the test plan below.

- AtomicOps_Internalx86CPUFeatures -- renamed to YbAtomicOps_Internalx86CPUFeatures
- RunningOnValgrind -- renamed to YbRunningOnValgrind
- ValgrindSlowdown -- renamed to YbValgrindSlowdown
- base::internal::SpinLockDelay, base::internal::SpinLockWake -- added a top-level yb namespace

To make it easy to switch between regular and LTO binaries, we are updating yb-ctl to support the YB_CTL_TSERVER_DAEMON_FILE_NAME and YB_CTL_MASTER_DAEMON_FILE_NAME environment variables. For example, by setting YB_CTL_TSERVER_DAEMON_FILE_NAME=yb-tserver-lto, you can tell yb-ctl to launch the tablet server using build/latest/bin/yb-tserver-lto. However, for the release package, the LTO-enabled yb-tserver executable will still be named yb-tserver, replacing the previous shared-library-based executable.

Another tooling change in this diff is how we handle the `--no-tests` flag passed to `yb_build.sh`. That flag results in setting the YB_DO_NOT_BUILD_TESTS environment variable to 1, and our CMake scripts then skip all the test targets. However, it is easy to forget to keep specifying that flag. In this diff, we store the variable BUILD_TESTS in CMake's build cache and reuse it during future CMake runs, without the developer having to keep passing `--no-tests`. It can be reset by setting YB_DO_NOT_BUILD_TESTS=0.

Test Plan: Jenkins

```
# ./yb_build.sh --clang12 release
# build-support/tserver_lto.sh
# ldd build/latest/bin/yb-tserver-lto
        linux-vdso.so.1 (0x00007fff535bf000)
        libm.so.6 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libm.so.6 (0x00007f1b85b7d000)
        libgcc_s.so.1 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libgcc_s.so.1 (0x00007f1b85966000)
        libc.so.6 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libc.so.6 (0x00007f1b855ca000)
        /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/ld.so => /lib64/ld-linux-x86-64.so.2 (0x00007f1b85e80000)
        libdl.so.2 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libdl.so.2 (0x00007f1b853c6000)
        libpthread.so.0 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libpthread.so.0 (0x00007f1b851a9000)
        librt.so.1 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/librt.so.1 (0x00007f1b84fa1000)
```

The yb-tserver-lto binary is ~326 MiB.

Microbenchmark
--------------

The test was done on a dual-socket Xeon E5-2670 machine (16 cores total, 32 hyper-threads) running AlmaLinux 8.5. Details: https://gist.githubusercontent.com/mbautin/7f9784fb2ea4173539d2e2656cfe117f/raw

Results (CassandraKeyValue workload): 78K ops/sec with GCC 5.5, 85K ops/sec with Clang 12 without LTO, 104K ops/sec with Clang 12 with LTO.
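To make the duplicate-symbol issue described above concrete, here is a minimal sketch (hypothetical file name, simplified bodies; not the actual YugabyteDB or gperftools sources) of why the collision only surfaces when everything is linked statically into one binary:

```cpp
// valgrind_check.cc -- simplified stand-in for YugabyteDB's gutil copy.
//
// gperftools ships its own definition of RunningOnValgrind() under the same
// C linkage name. When the two copies live in different shared libraries, the
// dynamic linker binds every reference to the first definition it finds, so
// the duplication goes unnoticed. When all object files are statically linked
// into a single LTO binary, lld sees two strong definitions of the same symbol
// in one link and fails with a "duplicate symbol" error.
extern "C" int RunningOnValgrind() { return 0; }

// The fix is to rename the YugabyteDB copy so the definitions no longer
// collide. (C++ symbols such as base::internal::SpinLockDelay were instead
// wrapped in a top-level yb namespace for the same reason.)
extern "C" int YbRunningOnValgrind() { return 0; }
```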
Reviewers: sergei Reviewed By: sergei Subscribers: sergei, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D14616 --- CMakeLists.txt | 25 +- build-support/common-build-env.sh | 54 +- .../compiler-wrappers/compiler-wrapper.sh | 36 +- build-support/jenkins/build-and-test.sh | 29 +- build-support/tserver_lto.sh | 13 + build-support/yb_release.py | 11 +- cmake_modules/YugabyteFunctions.cmake | 27 + codecheck.ini | 4 + ent/build-support/upload_package.sh | 5 +- python/yb/build_paths.py | 74 ++ python/yb/common_util.py | 37 + python/yb/dep_graph_common.py | 915 ++++++++++++++ python/yb/dependency_graph.py | 1101 ++--------------- python/yb/llvm_urls.py | 3 +- python/yb/lto.py | 386 ++++++ python/yb/release_util.py | 99 +- python/yb/source_files.py | 117 ++ python/yb/thirdparty_tool.py | 10 +- scripts/installation/bin/yb-ctl | 16 +- src/yb/common/transaction.cc | 2 +- src/yb/common/transaction.h | 2 +- src/yb/common/wire_protocol.h | 1 + src/yb/consensus/consensus_queue.cc | 17 +- src/yb/consensus/consensus_queue.h | 2 + src/yb/gutil/atomicops-internals-tsan.h | 2 +- src/yb/gutil/atomicops-internals-x86.cc | 22 +- src/yb/gutil/atomicops-internals-x86.h | 22 +- src/yb/gutil/dynamic_annotations.c | 12 +- src/yb/gutil/dynamic_annotations.h | 38 +- src/yb/gutil/once.cc | 10 +- src/yb/gutil/spinlock.cc | 5 +- src/yb/gutil/spinlock_internal.cc | 30 +- src/yb/gutil/spinlock_internal.h | 3 +- src/yb/gutil/spinlock_linux-inl.h | 4 +- src/yb/gutil/spinlock_posix-inl.h | 9 +- src/yb/gutil/spinlock_win32-inl.h | 6 + src/yb/gutil/sysinfo.cc | 2 +- src/yb/master/master_main.cc | 31 +- src/yb/server/webserver_options.cc | 7 +- src/yb/tserver/CMakeLists.txt | 1 + src/yb/tserver/server_main_util.cc | 79 ++ src/yb/tserver/server_main_util.h | 29 + src/yb/tserver/tablet_server_main.cc | 34 +- src/yb/util/debug/trace_event.h | 3 +- src/yb/util/debug/trace_event_impl.cc | 8 + src/yb/util/debug/trace_event_impl.h | 8 + src/yb/util/test_util.cc | 2 + src/yb/util/trace-test.cc | 13 + 48 files changed, 2193 insertions(+), 1173 deletions(-) create mode 100755 build-support/tserver_lto.sh create mode 100644 python/yb/build_paths.py create mode 100644 python/yb/dep_graph_common.py create mode 100644 python/yb/lto.py create mode 100644 python/yb/source_files.py create mode 100644 src/yb/tserver/server_main_util.cc create mode 100644 src/yb/tserver/server_main_util.h diff --git a/CMakeLists.txt b/CMakeLists.txt index d9c645bfb734..079222ac284e 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -35,7 +35,6 @@ cmake_minimum_required(VERSION 3.7.0) project(YugabyteDB) - if("$CACHE{YB_PCH_ON}" STREQUAL "") if("$ENV{YB_USE_PCH}" STREQUAL "1") set(YB_USE_PCH ON) @@ -102,17 +101,31 @@ set(YB_ALL_DEPS "" CACHE INTERNAL "All dependencies" FORCE) # This is used to let the add_executable wrapper know if we're adding a test. 
set(YB_ADDING_TEST_EXECUTABLE "FALSE" CACHE INTERNAL "" FORCE) -if ("$ENV{YB_DO_NOT_BUILD_TESTS}" STREQUAL "1") - message("Skipping building tests (YB_DO_NOT_BUILD_TESTS environment variable is set to 1)") - set(BUILD_TESTS OFF) -else() +if(NOT "$ENV{YB_DO_NOT_BUILD_TESTS}" STREQUAL "") + if("$ENV{YB_DO_NOT_BUILD_TESTS}" STREQUAL "1") + message("YB_DO_NOT_BUILD_TESTS is set to 1, will not build tests") + set(BUILD_TESTS OFF) + elseif("$ENV{YB_DO_NOT_BUILD_TESTS}" STREQUAL "0") + message("YB_DO_NOT_BUILD_TESTS is set to 0, will build tests") + set(BUILD_TESTS ON) + else() + message(FATAL_ERROR + "Invalid value of the YB_DO_NOT_BUILD_TESTS environment variable, expected 0 or 1 but " + "got '$ENV{YB_DO_NOT_BUILD_TESTS}'.") + endif() + set(BUILD_TESTS ${BUILD_TESTS} CACHE BOOL "Whether to build tests") +elseif("$CACHE{BUILD_TESTS}" STREQUAL "") set(BUILD_TESTS ON) + message("Will build tests by default") +else() + message("BUILD_TESTS from cache: ${BUILD_TESTS} (set YB_DO_NOT_BUILD_TESTS to 0 or 1 to change)") endif() parse_build_root_basename() if("${YB_BUILD_TYPE}" STREQUAL "") message(FATAL_ERROR "YB_BUILD_TYPE still not set after parse_build_root_basename") endif() + message("YB_BUILD_TYPE: ${YB_BUILD_TYPE}") message("CMAKE_MAKE_PROGRAM: ${CMAKE_MAKE_PROGRAM}") @@ -244,6 +257,7 @@ elseif (IS_GCC) else() message(FATAL_ERROR "Unknown compiler family: '${COMPILER_FAMILY}'.") endif() + message("THIRDPARTY_INSTRUMENTATION_TYPE=${THIRDPARTY_INSTRUMENTATION_TYPE}") # Make sure third-party dependency is up-to-date. @@ -301,6 +315,7 @@ endif() VALIDATE_COMPILER_TYPE() DETECT_BREW() +enable_lto_if_needed() message("Using COMPILER_FAMILY=${COMPILER_FAMILY}") diff --git a/build-support/common-build-env.sh b/build-support/common-build-env.sh index 6284afc676c5..9a14143b10fe 100644 --- a/build-support/common-build-env.sh +++ b/build-support/common-build-env.sh @@ -292,6 +292,11 @@ if [[ -n ${YB_LINUXBREW_DIR:-} ]]; then yb_linuxbrew_dir_origin=" (from environment)" fi +yb_llvm_toolchain_url_origin="" +if [[ -n ${YB_LLVM_TOOLCHAIN_URL:-} ]]; then + yb_llvm_toolchain_url_origin=" (from environment)" +fi + yb_llvm_toolchain_dir_origin="" if [[ -n ${YB_LLVM_TOOLCHAIN_DIR:-} ]]; then yb_llvm_toolchain_dir_origin=" (from environment)" @@ -1146,7 +1151,9 @@ save_var_to_file_in_build_dir() { if [[ ! -d $BUILD_ROOT ]]; then mkdir -p "$BUILD_ROOT" fi - echo "$value" >"$BUILD_ROOT/$file_name" + if ! echo "$value" >"$BUILD_ROOT/$file_name"; then + fatal "Could not save value '$value' to file '$BUILD_ROOT/$file_name'" + fi fi } @@ -1217,14 +1224,16 @@ download_thirdparty() { download_toolchain() { local toolchain_urls=() - if [[ -n ${YB_THIRDPARTY_URL:-} && ${YB_THIRDPARTY_URL##*/} == *linuxbrew* ]]; then - # TODO: get rid of this and always include linuxbrew_url.txt in the thirdparty archives that are - # built for Linuxbrew. 
- toolchain_urls+=( "https://github.com/yugabyte/brew-build/releases/download/\ -20181203T161736v9/linuxbrew-20181203T161736v9.tar.gz" ) - if [[ -f $BUILD_ROOT/llvm_url.txt ]]; then - toolchain_urls+=( "$(<"$BUILD_ROOT/llvm_url.txt")" ) - fi + local linuxbrew_url="" + if [[ -n ${YB_THIRDPARTY_DIR:-} && -f "$YB_THIRDPARTY_DIR/linuxbrew_url.txt" ]]; then + local linuxbrew_url_file_path="${YB_THIRDPARTY_DIR}/linuxbrew_url.txt" + linuxbrew_url="$(<"${linuxbrew_url_file_path}")" + elif [[ -n ${YB_THIRDPARTY_URL:-} && ${YB_THIRDPARTY_URL##*/} == *linuxbrew* || + -n ${YB_THIRDPARTY_DIR:-} && ${YB_THIRDPARTY_DIR##*/} == *linuxbrew* ]]; then + # TODO: get rid of the hard-coded URL below and always include linuxbrew_url.txt in the + # thirdparty archives that are built for Linuxbrew. + local linuxbrew_url="https://github.com/yugabyte/brew-build/releases/download/" + linuxbrew_url+="20181203T161736v9/linuxbrew-20181203T161736v9.tar.gz" else for file_name_part in linuxbrew toolchain; do local url_file_path="$YB_THIRDPARTY_DIR/${file_name_part}_url.txt" @@ -1234,6 +1243,14 @@ download_toolchain() { fi done fi + + if [[ -n ${linuxbrew_url:-} ]]; then + toolchain_urls+=( "$linuxbrew_url" ) + fi + if [[ -n ${YB_LLVM_TOOLCHAIN_URL:-} ]]; then + toolchain_urls+=( "${YB_LLVM_TOOLCHAIN_URL}" ) + fi + if [[ ${#toolchain_urls[@]} -eq 0 ]]; then return fi @@ -1250,7 +1267,7 @@ download_toolchain() { is_linuxbrew=true else fatal "Unable to determine the installation parent directory for the toolchain archive" \ - "named '$toolchain_url_basename'. Toolchain URL: '$toolchain_url'." + "named '$toolchain_url_basename'. Toolchain URL: '${toolchain_url}'." fi download_and_extract_archive "$toolchain_url" "$toolchain_dir_parent" @@ -1890,6 +1907,9 @@ log_thirdparty_and_toolchain_details() { if is_linux && [[ -n ${YB_LINUXBREW_DIR:-} ]]; then echo " YB_LINUXBREW_DIR: $YB_LINUXBREW_DIR$yb_linuxbrew_dir_origin" fi + if [[ -n ${YB_LLVM_TOOLCHAIN_URL:-} ]]; then + echo " YB_LLVM_TOOLCHAIN_URL: $YB_LLVM_TOOLCHAIN_URL$yb_llvm_toolchain_url_origin" + fi if [[ -n ${YB_LLVM_TOOLCHAIN_DIR:-} ]]; then echo " YB_LLVM_TOOLCHAIN_DIR: $YB_LLVM_TOOLCHAIN_DIR$yb_llvm_toolchain_dir_origin" fi @@ -2437,8 +2457,16 @@ set_prebuilt_thirdparty_url() { fatal "Could not automatically determine the third-party archive URL to download." fi log "Setting third-party URL to $YB_THIRDPARTY_URL" - save_var_to_file_in_build_dir "$YB_THIRDPARTY_URL" thirdparty_url.txt + + if [[ -f $llvm_url_file_path ]]; then + YB_LLVM_TOOLCHAIN_URL=$(<"$llvm_url_file_path") + export YB_LLVM_TOOLCHAIN_URL + yb_llvm_toolchain_url_origin=" (determined automatically based on the OS and compiler type)" + log "Setting LLVM toolchain URL to $YB_LLVM_TOOLCHAIN_URL" + save_var_to_file_in_build_dir "$YB_LLVM_TOOLCHAIN_URL" llvm_url.txt + fi + else log "YB_THIRDPARTY_URL is already set to '$YB_THIRDPARTY_URL', not trying to set it" \ "automatically." 
@@ -2553,6 +2581,10 @@ is_apple_silicon() { return 1 } +should_use_lto() { + using_linuxbrew && [[ "${YB_COMPILER_TYPE}" == "clang12" && "${build_type}" == "release" ]] +} + # ------------------------------------------------------------------------------------------------- # Initialization # ------------------------------------------------------------------------------------------------- diff --git a/build-support/compiler-wrappers/compiler-wrapper.sh b/build-support/compiler-wrappers/compiler-wrapper.sh index 9b7bd0a4adea..cd08511d7d6d 100755 --- a/build-support/compiler-wrappers/compiler-wrapper.sh +++ b/build-support/compiler-wrappers/compiler-wrapper.sh @@ -287,6 +287,10 @@ has_yb_c_files=false compiler_args_no_output=() analyzer_checkers_specified=false +linking=false +use_lld=false +lld_linking=false + while [[ $# -gt 0 ]]; do is_output_arg=false case "$1" in @@ -335,6 +339,9 @@ while [[ $# -gt 0 ]]; do -analyzer-checker=*) analyzer_checkers_specified=true ;; + -fuse-ld=lld) + use_lld=true + ;; esac if ! "$is_output_arg"; then compiler_args_no_output+=( "$1" ) @@ -342,9 +349,21 @@ while [[ $# -gt 0 ]]; do shift done -if [[ $output_file != *.o && ${#library_files[@]} -gt 0 ]]; then - input_files+=( "${library_files[@]}" ) - library_files=() +if [[ $output_file == *.o ]]; then + # Compiling. + linking=false +else + linking=true +fi + +if [[ $linking == "true" && $use_lld == "true" ]]; then + lld_linking=true +fi + +if [[ $YB_COMPILER_TYPE == clang* && $output_file == "jsonpath_gram.o" ]]; then + # To avoid this error: + # https://gist.github.com/mbautin/b943fb426bfead7388dde17ddb1b0fa7 + compiler_args+=( -Wno-error=implicit-fallthrough ) fi # ------------------------------------------------------------------------------------------------- @@ -362,9 +381,12 @@ if [[ $HOSTNAME == build-worker* ]]; then is_build_worker=true fi +# If linking with LLVM's lld linker, do it locally. +# See https://github.com/yugabyte/yugabyte-db/issues/11034 for more details. if [[ $local_build_only == "false" && ${YB_REMOTE_COMPILATION:-} == "1" && - $is_build_worker == "false" ]]; then + $is_build_worker == "false" && + $lld_linking == "false" ]]; then trap remote_build_exit_handler EXIT @@ -552,6 +574,12 @@ run_compiler_and_save_stderr() { # ------------------------------------------------------------------------------------------------- # Local build +if [[ $output_file =~ libyb_pgbackend* ]]; then + # We record the linker command used for the libyb_pgbackend library so we can use it when + # producing the LTO build for yb-tserver. 
+ echo "${compiler_args[*]}" >"link_cmd_${output_file}.txt" +fi + if [[ ${build_type:-} == "asan" && $PWD == */postgres_build/src/backend/utils/adt && # Turn off UBSAN instrumentation in a number of PostgreSQL source files determined by the diff --git a/build-support/jenkins/build-and-test.sh b/build-support/jenkins/build-and-test.sh index 9cc36f5a48e1..dba57869faf2 100755 --- a/build-support/jenkins/build-and-test.sh +++ b/build-support/jenkins/build-and-test.sh @@ -725,6 +725,22 @@ export YB_GZIP_PER_TEST_METHOD_LOGS=1 export YB_GZIP_TEST_LOGS=1 export YB_DELETE_SUCCESSFUL_PER_TEST_METHOD_LOGS=1 +if should_use_lto; then + log "Using LTO" + ( + set -x + "$YB_SRC_ROOT/python/yb/dependency_graph.py" \ + --build-root "$BUILD_ROOT" \ + --file-regex "^.*/yb-tserver$" \ + --lto-output-suffix="" \ + link-whole-program + ls -l "$BUILD_ROOT/bin/yb-tserver" + ldd "$BUILD_ROOT/bin/yb-tserver" + ) +else + log "Not using LTO: YB_COMPILER_TYPE=${YB_COMPILER_TYPE}, build_type=${build_type}" +fi + if [[ $YB_COMPILE_ONLY != "1" ]]; then if spark_available; then if [[ $YB_BUILD_CPP == "1" || $YB_BUILD_JAVA == "1" ]]; then @@ -740,11 +756,14 @@ if [[ $YB_COMPILE_ONLY != "1" ]]; then test_conf_path="$BUILD_ROOT/test_conf.json" # YB_GIT_COMMIT_FOR_DETECTING_TESTS allows overriding the commit to use to detect the set # of tests to run. Useful when testing this script. - "$YB_SRC_ROOT/python/yb/dependency_graph.py" \ - --build-root "$BUILD_ROOT" \ - --git-commit "${YB_GIT_COMMIT_FOR_DETECTING_TESTS:-$current_git_commit}" \ - --output-test-config "$test_conf_path" \ - affected + ( + set -x + "$YB_SRC_ROOT/python/yb/dependency_graph.py" \ + --build-root "$BUILD_ROOT" \ + --git-commit "${YB_GIT_COMMIT_FOR_DETECTING_TESTS:-$current_git_commit}" \ + --output-test-config "$test_conf_path" \ + affected + ) run_tests_extra_args+=( "--test_conf" "$test_conf_path" ) unset test_conf_path fi diff --git a/build-support/tserver_lto.sh b/build-support/tserver_lto.sh new file mode 100755 index 000000000000..8d0efb14bd22 --- /dev/null +++ b/build-support/tserver_lto.sh @@ -0,0 +1,13 @@ +#!/usr/bin/env bash + +. "${BASH_SOURCE%/*}/common-build-env.sh" + +activate_virtualenv +set_pythonpath + +set -x +"$YB_SRC_ROOT/python/yb/dependency_graph.py" \ + --build-root "$YB_SRC_ROOT/build/release-clang12-linuxbrew-dynamic-ninja" \ + --file-regex "^.*/yb-tserver$" \ + link-whole-program \ + "$@" diff --git a/build-support/yb_release.py b/build-support/yb_release.py index e2ffb75c359f..378f309cd761 100755 --- a/build-support/yb_release.py +++ b/build-support/yb_release.py @@ -47,7 +47,7 @@ def main(): 'false if --build_target is specified, true otherwise.') parser.add_argument('--destination', help='Copy release to Destination folder.') parser.add_argument('--force', help='Skip prompts', action='store_true') - parser.add_argument('--commit', help='Custom specify a git commit.') + parser.add_argument('--commit', help='Specifies a custom git commit to use in archive name.') parser.add_argument('--skip_build', help='Skip building the code', action='store_true') parser.add_argument('--build_target', help='Target directory to put the YugaByte distribution into. This can ' @@ -221,8 +221,13 @@ def main(): # This points to the release manifest within the release_manager, and we are modifying that # directly. 
release_util = ReleaseUtil( - YB_SRC_ROOT, build_type, build_target, args.force, args.commit, build_root, - args.package_name) + repository=YB_SRC_ROOT, + build_type=build_type, + distribution_path=build_target, + force=args.force, + commit=args.commit, + build_root=build_root, + package_name=args.package_name) system = platform.system().lower() library_packager_args = dict( diff --git a/cmake_modules/YugabyteFunctions.cmake b/cmake_modules/YugabyteFunctions.cmake index c13fb38a4f22..c04878811796 100644 --- a/cmake_modules/YugabyteFunctions.cmake +++ b/cmake_modules/YugabyteFunctions.cmake @@ -729,3 +729,30 @@ function(parse_build_root_basename) "but CMAKE_SYSTEM_PROCESSOR is ${CMAKE_SYSTEM_PROCESSOR}") endif() endfunction() + +macro(enable_lto_if_needed) + if(NOT DEFINED COMPILER_FAMILY) + message(FATAL_ERROR "COMPILER_FAMILY not defined") + endif() + if(NOT DEFINED USING_LINUXBREW) + message(FATAL_ERROR "USING_LINUXBREW not defined") + endif() + if(NOT DEFINED YB_BUILD_TYPE) + message(FATAL_ERROR "YB_BUILD_TYPE not defined") + endif() + + if("${YB_BUILD_TYPE}" STREQUAL "release" AND + "${COMPILER_FAMILY}" STREQUAL "clang" AND + "${YB_COMPILER_TYPE}" STREQUAL "clang12" AND + USING_LINUXBREW AND + NOT APPLE) + message("Enabling full LTO and lld linker") + ADD_CXX_FLAGS("-flto=full -fuse-ld=lld") + else() + message("Not enabling LTO: " + "YB_BUILD_TYPE=${YB_BUILD_TYPE}, " + "COMPILER_FAMILY=${COMPILER_FAMILY}, " + "USING_LINUXBREW=${USING_LINUXBREW}, " + "APPLE=${APPLE}") + endif() +endmacro() diff --git a/codecheck.ini b/codecheck.ini index 2e6a7e83dc49..e9436429d4b1 100644 --- a/codecheck.ini +++ b/codecheck.ini @@ -38,11 +38,15 @@ included_regex_list = ^python/yb/command_util[.]py$ ^python/yb/common_util[.]py$ ^python/yb/compile_commands[.]py$ + ^python/yb/dep_graph_common[.]py$ ^python/yb/dependency_graph[.]py$ ^python/yb/library_packager[.]py$ ^python/yb/linuxbrew[.]py$ + ^python/yb/lto[.]py$ ^python/yb/os_detection[.]py$ + ^python/yb/release_util[.]py$ ^python/yb/run_pvs_studio_analyzer[.]py$ + ^python/yb/source_files[.]py$ ^python/yb/thirdparty_tool[.]py$ ^python/yb/tool_base[.]py$ ^python/yb/yb_dist_tests[.]py$ diff --git a/ent/build-support/upload_package.sh b/ent/build-support/upload_package.sh index 2b7188dc5209..b8e84b0efbe2 100644 --- a/ent/build-support/upload_package.sh +++ b/ent/build-support/upload_package.sh @@ -100,8 +100,9 @@ upload_package() { return fi - if is_linux && [[ $YB_COMPILER_TYPE != "gcc" ]]; then - log "Skipping package upload for a non-gcc build on Linux (compiler type: $YB_COMPILER_TYPE)" + if is_linux && [[ $YB_COMPILER_TYPE != "gcc" && + $YB_COMPILER_TYPE != "clang12" ]]; then + log "Skipping package upload on Linux (compiler type: $YB_COMPILER_TYPE)" return fi fi diff --git a/python/yb/build_paths.py b/python/yb/build_paths.py new file mode 100644 index 000000000000..28a82b460601 --- /dev/null +++ b/python/yb/build_paths.py @@ -0,0 +1,74 @@ +# Copyright (c) Yugabyte, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except +# in compliance with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software distributed under the License +# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express +# or implied. See the License for the specific language governing permissions and limitations +# under the License. 
+ +import os + +from yb.common_util import read_file + +from typing import Optional, Set + + +class BuildPaths: + build_root: str + llvm_path: Optional[str] + thirdparty_path: Optional[str] + linuxbrew_path: Optional[str] + + file_existence_checking_cache: Set[str] + + def _read_path_file(self, path_file_name: str) -> Optional[str]: + path_file_path = os.path.join(self.build_root, path_file_name) + if not os.path.exists(path_file_path): + return None + + dir_path = read_file(path_file_path).strip() + if not os.path.exists(dir_path): + raise IOError( + "Path contained in file %s does not exist or is not a directory: '%s'" % ( + path_file_path, dir_path)) + return dir_path + + def __init__(self, build_root: str) -> None: + self.build_root = build_root + + if not os.path.exists(build_root): + raise IOError("Build root directory does not exist: %s" % build_root) + + self.llvm_path = self._read_path_file('llvm_path.txt') + self.thirdparty_path = self._read_path_file('thirdparty_path.txt') + self.linuxbrew_path = self._read_path_file('linuxbrew_path.txt') + + self.file_existence_checking_cache = set() + + def get_llvm_path(self) -> str: + assert self.llvm_path is not None + return self.llvm_path + + def get_linuxbrew_path(self) -> str: + assert self.linuxbrew_path is not None + return self.linuxbrew_path + + def get_thirdparty_path(self) -> str: + assert self.thirdparty_path is not None + return self.thirdparty_path + + def _check_file_existence(self, file_path: str) -> None: + if file_path in self.file_existence_checking_cache: + return + if not os.path.exists(file_path): + raise IOError("File does not exist: %s" % file_path) + self.file_existence_checking_cache.add(file_path) + + def get_llvm_tool_path(self, tool_name: str) -> str: + tool_path = os.path.join(self.get_llvm_path(), 'bin', tool_name) + self._check_file_existence(tool_path) + return tool_path diff --git a/python/yb/common_util.py b/python/yb/common_util.py index a78386457b71..987be774388b 100644 --- a/python/yb/common_util.py +++ b/python/yb/common_util.py @@ -432,3 +432,40 @@ def arg_str_to_bool(v: Any) -> bool: if v.lower() in ('no', 'false', 'f', 'n', '0'): return False raise argparse.ArgumentTypeError('Boolean value expected, got: %s' % v) + + +g_home_dir: Optional[str] = None + + +def get_home_dir() -> str: + global g_home_dir + if g_home_dir is not None: + return g_home_dir + g_home_dir = os.path.realpath(os.path.expanduser('~')) + return g_home_dir + + +def get_relative_path_or_none(abs_path: str, relative_to: str) -> Optional[str]: + """ + If the given path starts with another directory path, return the relative path, or else None. + """ + if not relative_to.endswith('/'): + relative_to += '/' + if abs_path.startswith(relative_to): + return abs_path[len(relative_to):] + return None + + +K = TypeVar('K') +V = TypeVar('V') + + +def append_to_list_in_dict(dest: Dict[K, List[V]], key: K, new_item: V) -> None: + """ + Append the given element to the list that the given key in the given dict points to. If the key + does not point to anything, create a new list at that key. + """ + if key in dest: + dest[key].append(new_item) + else: + dest[key] = [new_item] diff --git a/python/yb/dep_graph_common.py b/python/yb/dep_graph_common.py new file mode 100644 index 000000000000..f668f0d5263d --- /dev/null +++ b/python/yb/dep_graph_common.py @@ -0,0 +1,915 @@ +# Copyright (c) Yugabyte, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except +# in compliance with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software distributed under the License +# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express +# or implied. See the License for the specific language governing permissions and limitations +# under the License. + +import fnmatch +import logging +import os +import re +import json +import platform + +from enum import Enum +from typing import Any, Set, FrozenSet, List, Optional, Dict, Union, Iterable + +from yb.common_util import ( + get_build_type_from_build_root, + is_ninja_build_root, + ensure_yb_src_root_from_build_root, + get_home_dir, + get_relative_path_or_none, + group_by, + append_to_list_in_dict, +) + + +def make_extensions(exts_without_dot: List[str]) -> List[str]: + """ + Make a list of extensions with leading dots from a list of extensions without leading dots. + """ + for ext in exts_without_dot: + assert not ext.startswith('.') + + return ['.' + ext for ext in exts_without_dot] + + +LIBRARY_FILE_EXTENSIONS_NO_DOT = ['so', 'dylib'] +LIBRARY_FILE_EXTENSIONS = make_extensions(LIBRARY_FILE_EXTENSIONS_NO_DOT) + +SOURCE_FILE_EXTENSIONS = make_extensions( + ['c', 'cc', 'cpp', 'cxx', 'h', 'hpp', 'hxx', 'proto', 'l', 'y']) + +LIBRARY_FILE_NAME_RE = re.compile(r'^lib(.*)[.](?:%s)$' % '|'.join(LIBRARY_FILE_EXTENSIONS_NO_DOT)) +PROTO_LIBRARY_FILE_NAME_RE = re.compile(r'^lib(.*)_proto[.](?:%s)$' % '|'.join( + LIBRARY_FILE_EXTENSIONS_NO_DOT)) + +TEST_FILE_SUFFIXES = ['_test', '-test', '_itest', '-itest'] + +EXECUTABLE_FILE_NAME_RE = re.compile(r'^[a-zA-Z0-9_.-]+$') +PROTO_OUTPUT_FILE_NAME_RE = re.compile(r'^([a-zA-Z_0-9-]+)[.]pb[.](h|cc)$') + +# Ignore some special-case CMake targets that do not have a one-to-one match with executables or +# libraries. +IGNORED_CMAKE_TARGETS = ['gen_version_info', 'latest_symlink', 'postgres'] + +DYLIB_SUFFIX = '.dylib' if platform.system() == 'Darwin' else '.so' + + +class NodeType(Enum): + """ + We classify nodes in the dependency graph into multiple types. + """ + + # A special value used to match any node type. + ANY = 'any' + + SOURCE = 'source' + LIBRARY = 'library' + TEST = 'test' + OBJECT = 'object' + EXECUTABLE = 'executable' + OTHER = 'other' + + def __str__(self) -> str: + return self.value + + def __lt__(self, other: Any) -> bool: + assert isinstance(other, NodeType) + return self.value < other.value + + def __eq__(self, other: Any) -> bool: + assert isinstance(other, NodeType) + return self.value == other.value + + def __hash__(self) -> int: + return hash(self.value) + + +def is_object_file(path: str) -> bool: + return path.endswith('.o') + + +def ends_with_any_of(path: str, exts: List[str]) -> bool: + """ + Returns true if the given path ends with any of the given extensions. 
+ """ + for ext in exts: + if path.endswith(ext): + return True + return False + + +def is_shared_library(path: str) -> bool: + return ( + ends_with_any_of(path, LIBRARY_FILE_EXTENSIONS) and + not os.path.basename(path) in LIBRARY_FILE_EXTENSIONS and + not path.startswith('-')) + + +def get_node_type_by_path(path: str) -> NodeType: + """ + >>> get_node_type_by_path('my_source_file.cc').value + 'source' + >>> get_node_type_by_path('my_library.so').value + 'library' + >>> get_node_type_by_path('/bin/bash').value + 'executable' + >>> get_node_type_by_path('my_object_file.o').value + 'object' + >>> get_node_type_by_path('tests-integration/some_file').value + 'test' + >>> get_node_type_by_path('tests-integration/some_file.txt').value + 'other' + >>> get_node_type_by_path('something/my-test').value + 'test' + >>> get_node_type_by_path('something/my_test').value + 'test' + >>> get_node_type_by_path('something/my-itest').value + 'test' + >>> get_node_type_by_path('something/my_itest').value + 'test' + >>> get_node_type_by_path('some-dir/some_file').value + 'other' + """ + if ends_with_any_of(path, SOURCE_FILE_EXTENSIONS): + return NodeType.SOURCE + + if (ends_with_any_of(path, LIBRARY_FILE_EXTENSIONS) or + any([ext + '.' in path for ext in LIBRARY_FILE_EXTENSIONS])): + return NodeType.LIBRARY + + if (ends_with_any_of(path, TEST_FILE_SUFFIXES) or + (os.path.basename(os.path.dirname(path)).startswith('tests-') and + '.' not in os.path.basename(path))): + return NodeType.TEST + + if is_object_file(path): + return NodeType.OBJECT + + if os.path.exists(path) and os.access(path, os.X_OK): + # This will only work if the code has been fully built. + return NodeType.EXECUTABLE + + return NodeType.OTHER + + +class DepGraphConf: + verbose: bool + build_root: str + is_ninja: bool + build_type: str + yb_src_root: str + src_dir_path: str + ent_src_dir_path: str + rel_path_base_dirs: Set[str] + incomplete_build: bool + file_regex: Optional[str] + src_dir_paths: List[str] + + # Extra arguments to pass to yb_build.sh. + build_args: Optional[str] + + def __init__( + self, + verbose: bool, + build_root: str, + incomplete_build: bool, + file_regex: Optional[str], + file_name_glob: Optional[str], + build_args: Optional[str]) -> None: + self.verbose = verbose + self.build_root = os.path.realpath(build_root) + self.is_ninja = is_ninja_build_root(self.build_root) + + self.build_type = get_build_type_from_build_root(self.build_root) + self.yb_src_root = ensure_yb_src_root_from_build_root(self.build_root) + self.src_dir_path = os.path.join(self.yb_src_root, 'src') + self.ent_src_dir_path = os.path.join(self.yb_src_root, 'ent', 'src') + self.rel_path_base_dirs = {self.build_root, os.path.join(self.src_dir_path, 'yb')} + self.incomplete_build = incomplete_build + + self.file_regex = None + if file_regex: + self.file_regex = file_regex + elif file_name_glob: + self.file_regex = fnmatch.translate('*/' + file_name_glob) + + self.src_dir_paths = [self.src_dir_path, self.ent_src_dir_path] + + for dir_path in self.src_dir_paths: + if not os.path.isdir(dir_path): + raise RuntimeError("Directory does not exist, or is not a directory: %s" % dir_path) + self.build_args = build_args + + +class Node: + """ + A node in the dependency graph. Could be a source file, a header file, an object file, a + dynamic library, or an executable. 
+ """ + + path: str + deps: Set['Node'] + reverse_deps: Set['Node'] + node_type: NodeType + dep_graph: 'DependencyGraph' + conf: 'DepGraphConf' + source_str: Optional[str] + is_proto_lib: bool + link_cmd: Optional[List[str]] + _cached_proto_lib_deps: Optional[List['Node']] + _cached_containing_binaries: Optional[List['Node']] + _cached_cmake_target: Optional[str] + + def __init__(self, path: str, dep_graph: 'DependencyGraph', source_str: Optional[str]) -> None: + # Sometimes we might want to make yb-tserver and other binaries in the bin directory + # symlinks to LTO-enabled binaries. That is why we don't want to use realpath here. + path = os.path.abspath(path) + self.path = path + + # Other nodes that this node depends on. + self.deps = set() + + # Nodes that depend on this node. + self.reverse_deps = set() + + self.node_type = get_node_type_by_path(path) + self.dep_graph = dep_graph + self.conf = dep_graph.conf + self.source_str = source_str + + self.is_proto_lib = bool( + self.node_type == NodeType.LIBRARY and + PROTO_LIBRARY_FILE_NAME_RE.match(os.path.basename(self.path))) + + self._cached_proto_lib_deps = None + self._cached_containing_binaries = None + self._cached_cmake_target = None + self._has_no_cmake_target = False + + self.link_cmd = None + + def add_dependency(self, dep: 'Node') -> None: + assert self is not dep + self.deps.add(dep) + dep.reverse_deps.add(self) + + def __eq__(self, other: Any) -> bool: + if not isinstance(other, Node): + return False + + return self.path == other.path + + def __hash__(self) -> int: + return hash(self.path) + + def is_source(self) -> bool: + return self.node_type == NodeType.SOURCE + + def validate_existence(self) -> None: + if not os.path.exists(self.path) and not self.dep_graph.conf.incomplete_build: + raise RuntimeError( + "Path does not exist: '{}'. This node was found in: {}. " + "The build might be incomplete or the dependency graph might be stale.".format( + self.path, + self.source_str)) + + def get_pretty_path(self) -> str: + for prefix, alias in [(self.conf.build_root, '$BUILD_ROOT'), + (self.conf.yb_src_root, '$YB_SRC_ROOT'), + (get_home_dir(), '~')]: + if self.path.startswith(prefix + '/'): + return alias + '/' + self.path[len(prefix) + 1:] + + return self.path + + def path_rel_to_build_root(self) -> Optional[str]: + return get_relative_path_or_none(self.path, self.conf.build_root) + + def path_rel_to_src_root(self) -> Optional[str]: + return get_relative_path_or_none(self.path, self.conf.yb_src_root) + + def __str__(self) -> str: + return "Node(\"{}\", type={}, {} deps, {} rev deps)".format( + self.get_pretty_path(), self.node_type, len(self.deps), len(self.reverse_deps)) + + def __repr__(self) -> str: + return self.__str__() + + def get_cmake_target(self) -> Optional[str]: + """ + @return the CMake target based on the current node's path. E.g. this would be "master" + for the "libmaster.so" library, "yb-master" for the "yb-master" executable. + """ + if self._cached_cmake_target: + return self._cached_cmake_target + if self._has_no_cmake_target: + return None + + if self.path.endswith('.proto'): + path = self.path + names = [] + while path != '/' and path != self.conf.yb_src_root: + names.append(os.path.basename(path)) + path = os.path.dirname(path) + + # This needs to be consistent with the following CMake code snippet: + # + # set(TGT_NAME "gen_${PROTO_REL_TO_YB_SRC_ROOT}") + # string(REPLACE "@" "_" TGT_NAME ${TGT_NAME}) + # string(REPLACE "." 
"_" TGT_NAME ${TGT_NAME}) + # string(REPLACE "-" "_" TGT_NAME ${TGT_NAME}) + # string(REPLACE "/" "_" TGT_NAME ${TGT_NAME}) + # + # (see FindProtobuf.cmake and FindYRPC.cmake). + # + # "/" cannot appear in the resulting string, so no need to replace it with "_". + target = re.sub('[@.-]', '_', '_'.join(['gen'] + names[::-1])) + + if self.conf.verbose: + logging.info("Associating protobuf file '{}' with CMake target '{}'".format( + self.path, target)) + self._cached_cmake_target = target + return target + + basename = os.path.basename(self.path) + m = LIBRARY_FILE_NAME_RE.match(basename) + if m: + target = m.group(1) + self._cached_cmake_target = target + return target + m = EXECUTABLE_FILE_NAME_RE.match(basename) + if m: + self._cached_cmake_target = basename + return basename + self._has_no_cmake_target = True + return None + + def get_containing_binaries(self) -> List['Node']: + """ + Returns nodes (usually one node) corresponding to executables or dynamic libraries that the + given object file is compiled into. + """ + if self.path.endswith('.cc'): + cc_o_rev_deps = [rev_dep for rev_dep in self.reverse_deps + if rev_dep.path.endswith('.cc.o')] + if len(cc_o_rev_deps) != 1: + raise RuntimeError( + "Could not identify exactly one object file reverse dependency of node " + "%s. All set of reverse dependencies: %s" % (self, self.reverse_deps)) + return cc_o_rev_deps[0].get_containing_binaries() + + if self.node_type != NodeType.OBJECT: + raise ValueError( + "get_containing_binaries can only be called on nodes of type 'object'. Node: " + + str(self)) + + if self._cached_containing_binaries: + return self._cached_containing_binaries + + binaries = [] + for rev_dep in self.reverse_deps: + if rev_dep.node_type in [NodeType.LIBRARY, NodeType.TEST, NodeType.EXECUTABLE]: + binaries.append(rev_dep) + if len(binaries) > 1: + logging.warning( + "Node %s is linked into multiple libraries: %s. Might be worth checking.", + self, binaries) + + self._cached_containing_binaries = binaries + return binaries + + def get_recursive_deps( + self, + skip_node_types: Union[Set[NodeType], FrozenSet[NodeType]] = frozenset() + ) -> Set['Node']: + """ + Returns a set of all dependencies that this node depends on. + skip_node_types specifies the set of categories of nodes, except the initial node, to stop + the recursive search at. + """ + + recursive_deps: Set[Node] = set() + visited: Set[Node] = set() + + def walk(node: Node, is_initial: bool) -> None: + if not is_initial: + # Only skip the given subset of node types for nodes other than the initial node. + if node.node_type in skip_node_types: + return + # Also, never add the initial node to the result. + recursive_deps.add(node) + for dep in node.deps: + if dep not in recursive_deps: + walk(dep, is_initial=False) + + walk(self, is_initial=True) + return recursive_deps + + def get_proto_lib_deps(self) -> List['Node']: + if self._cached_proto_lib_deps is None: + self._cached_proto_lib_deps = [ + dep for dep in self.get_recursive_deps() if dep.is_proto_lib + ] + return self._cached_proto_lib_deps + + def get_containing_proto_lib(self) -> Optional['Node']: + """ + For a .pb.cc file node, return the node corresponding to the containing protobuf library. 
+ """ + if not self.path.endswith('.pb.cc.o'): + return None + + containing_binaries: List[Node] = self.get_containing_binaries() + + if len(containing_binaries) != 1: + logging.info("Reverse deps:\n %s" % ("\n ".join( + [str(dep) for dep in self.reverse_deps]))) + raise RuntimeError( + "Invalid number of proto libs for %s. Expected 1 but found %d: %s" % ( + self, + len(containing_binaries), + containing_binaries)) + return containing_binaries[0] + + def get_proto_gen_cmake_target(self) -> Optional[str]: + """ + For .pb.{h,cc} nodes this will return a CMake target of the form + gen_..., e.g. gen_src_yb_common_wire_protocol. + """ + rel_path = self.path_rel_to_build_root() + if not rel_path: + return None + + basename = os.path.basename(rel_path) + match = PROTO_OUTPUT_FILE_NAME_RE.match(basename) + if not match: + return None + return '_'.join( + ['gen'] + + os.path.dirname(rel_path).split('/') + + [match.group(1), 'proto'] + ) + + def set_link_command(self, link_cmd: List[str]) -> None: + assert self.link_cmd is None + self.link_cmd = link_cmd + + def as_json(self) -> Dict[str, Any]: + d: Dict[str, Any] = dict( + path=self.path + ) + if self.link_cmd: + d['link_cmd'] = self.link_cmd + return d + + def update_from_json(self, json_data: Dict[str, Any]) -> None: + if 'link_cmd' in json_data: + self.link_cmd = json_data['link_cmd'] + + +class CMakeDepGraph: + """ + A light-weight class representing the dependency graph of CMake targets imported from the + yb_cmake_deps.txt file that we generate in our top-level CMakeLists.txt. This dependency graph + does not have any nodes for source files and object files. + """ + + build_root: str + cmake_targets: Set[str] + cmake_deps_path: str + cmake_deps: Dict[str, Set[str]] + + def __init__(self, build_root: str) -> None: + self.build_root = build_root + self.cmake_deps_path = os.path.join(self.build_root, 'yb_cmake_deps.txt') + self._load() + + def _load(self) -> None: + logging.info("Loading dependencies between CMake targets from '{}'".format( + self.cmake_deps_path)) + self.cmake_deps = {} + self.cmake_targets = set() + with open(self.cmake_deps_path) as cmake_deps_file: + for line in cmake_deps_file: + line = line.strip() + if not line: + continue + items = [item.strip() for item in line.split(':')] + if len(items) != 2: + raise RuntimeError( + "Expected to find two items when splitting line on ':', found {}:\n{}", + len(items), line) + lhs, rhs = items + if lhs in IGNORED_CMAKE_TARGETS: + continue + cmake_dep_set = self._get_cmake_dep_set_of(lhs) + + for cmake_dep in rhs.split(';'): + if cmake_dep in IGNORED_CMAKE_TARGETS: + continue + cmake_dep_set.add(cmake_dep) + + for cmake_target, cmake_target_deps in self.cmake_deps.items(): + adding_targets = [cmake_target] + list(cmake_target_deps) + self.cmake_targets.update(set(adding_targets)) + logging.info("Found {} CMake targets in '{}'".format( + len(self.cmake_targets), self.cmake_deps_path)) + + def _get_cmake_dep_set_of(self, target: str) -> Set[str]: + """ + Get CMake dependencies of the given target. What is returned is a mutable set. Modifying + this set modifies this CMake dependency graph. 
+ """ + deps = self.cmake_deps.get(target) + if deps is None: + deps = set() + self.cmake_deps[target] = deps + self.cmake_targets.add(target) + return deps + + def add_dependency(self, from_target: str, to_target: str) -> None: + if from_target in IGNORED_CMAKE_TARGETS or to_target in IGNORED_CMAKE_TARGETS: + return + self._get_cmake_dep_set_of(from_target).add(to_target) + self.cmake_targets.add(to_target) + + def get_recursive_cmake_deps(self, cmake_target: str) -> Set[str]: + result: Set[str] = set() + visited: Set[str] = set() + + def walk(cur_target: str, add_this_target: bool = True) -> None: + if cur_target in visited: + return + visited.add(cur_target) + if add_this_target: + result.add(cur_target) + for dep in self.cmake_deps.get(cur_target, []): + walk(dep) + + walk(cmake_target, add_this_target=False) + return result + + +class DependencyGraph: + + conf: DepGraphConf + node_by_path: Dict[str, Node] + nodes_by_basename: Optional[Dict[str, Set[Node]]] + cmake_dep_graph: Optional[CMakeDepGraph] + + canonicalization_cache: Dict[str, str] = {} + + def __init__(self, + conf: DepGraphConf, + json_data: Optional[List[Dict[str, Any]]] = None) -> None: + """ + @param json_data optional results of JSON parsing + """ + self.conf = conf + self.node_by_path = {} + if json_data: + self.init_from_json(json_data) + self.nodes_by_basename = None + self.cmake_dep_graph = None + + def get_cmake_dep_graph(self) -> CMakeDepGraph: + if self.cmake_dep_graph: + return self.cmake_dep_graph + + cmake_dep_graph = CMakeDepGraph(self.conf.build_root) + for node in self.get_nodes(): + proto_lib = node.get_containing_proto_lib() + if proto_lib and self.conf.verbose: + logging.info("node: %s, proto lib: %s", node, proto_lib) + + self.cmake_dep_graph = cmake_dep_graph + return cmake_dep_graph + + def find_node(self, + path: str, + must_exist: bool = True, + source_str: Optional[str] = None) -> Node: + assert source_str is None or not must_exist + path = os.path.abspath(path) + node = self.node_by_path.get(path) + if node: + return node + if must_exist: + raise RuntimeError( + ("Node not found by path: '{}' (expected to already have this node in our " + "graph, not adding).").format(path)) + node = Node(path, self, source_str) + self.node_by_path[path] = node + return node + + def find_or_create_node(self, path: str, source_str: Optional[str] = None) -> Node: + """ + Finds a node with the given path or creates it if it does not exist. 
+ @param source_str a string description of how we came up with this node's path + """ + + canonical_path = self.canonicalization_cache.get(path) + if not canonical_path: + canonical_path = os.path.realpath(path) + if canonical_path.startswith(self.conf.build_root + '/'): + canonical_path = self.conf.build_root + '/' + \ + canonical_path[len(self.conf.build_root) + 1:] + self.canonicalization_cache[path] = canonical_path + + return self.find_node(canonical_path, must_exist=False, source_str=source_str) + + def create_node_from_json(self, node_json_data: Dict[str, Any]) -> Node: + node = self.find_or_create_node(node_json_data['path']) + node.update_from_json(node_json_data) + return node + + def init_from_json(self, json_nodes: List[Dict[str, Any]]) -> None: + id_to_node = {} + id_to_dep_ids = {} + for node_json in json_nodes: + node_id = node_json['id'] + id_to_node[node_id] = self.create_node_from_json(node_json) + id_to_dep_ids[node_id] = node_json.get('deps') or [] + for node_id, dep_ids in id_to_dep_ids.items(): + node = id_to_node[node_id] + for dep_id in dep_ids: + node.add_dependency(id_to_node[dep_id]) + + def find_nodes_by_regex(self, regex_str: str) -> List[Node]: + filter_re = re.compile(regex_str) + return [node for node in self.get_nodes() if filter_re.match(node.path)] + + def find_nodes_by_basename(self, basename: str) -> Optional[Set[Node]]: + if not self.nodes_by_basename: + # We are lazily initializing the basename -> node map, and any changes to the graph + # after this point will not get reflected in it. This is OK as we're only using this + # function after the graph has been built. + self.nodes_by_basename = { + k: set(v) + for k, v in group_by( + self.get_nodes(), + lambda node: os.path.basename(node.path)).items() + } + assert self.nodes_by_basename is not None + return self.nodes_by_basename.get(basename) + + def find_affected_nodes( + self, + start_nodes: Set[Node], + requested_node_type: NodeType = NodeType.ANY) -> Set[Node]: + if self.conf.verbose: + logging.info("Starting with the following initial nodes:") + for node in start_nodes: + logging.info(" {}".format(node)) + + results = set() + visited = set() + + def dfs(node: Node) -> None: + if ((requested_node_type == NodeType.ANY or + node.node_type == requested_node_type) and node not in start_nodes): + results.add(node) + if node in visited: + return + visited.add(node) + for dep in node.reverse_deps: + dfs(dep) + + for node in start_nodes: + dfs(node) + + return results + + def affected_basenames_by_basename_for_test( + self, basename: str, node_type: NodeType = NodeType.ANY) -> Set[str]: + nodes_for_basename = self.find_nodes_by_basename(basename) + if not nodes_for_basename: + self.dump_debug_info() + raise RuntimeError( + "No nodes found for file name '{}' (total number of nodes: {})".format( + basename, len(self.node_by_path))) + return set([os.path.basename(node.path) + for node in self.find_affected_nodes(nodes_for_basename, node_type)]) + + def save_as_json(self, output_path: str) -> None: + """ + Converts the dependency graph into a JSON representation, where every node is given an id, + so that dependencies are represented concisely. + """ + with open(output_path, 'w') as output_file: + next_node_id = [1] # Use a mutable object so we can modify it from closure. 
+ path_to_id: Dict[str, int] = {} + output_file.write("[") + + def get_node_id(node: Node) -> int: + node_id = path_to_id.get(node.path) + if not node_id: + node_id = next_node_id[0] + path_to_id[node.path] = node_id + next_node_id[0] = node_id + 1 + return node_id + + is_first = True + for node_path, node in self.node_by_path.items(): + node_json = node.as_json() + dep_ids = [get_node_id(dep) for dep in node.deps] + node_json.update(id=get_node_id(node)) + if dep_ids: + node_json.update(deps=dep_ids) + if not is_first: + output_file.write(",\n") + is_first = False + output_file.write(json.dumps(node_json)) + output_file.write("\n]\n") + + logging.info("Saved dependency graph to '{}'".format(output_path)) + + def validate_node_existence(self) -> None: + logging.info("Validating existence of build artifacts") + for node in self.get_nodes(): + node.validate_existence() + + def get_nodes(self) -> Iterable[Node]: + return self.node_by_path.values() + + def dump_debug_info(self) -> None: + logging.info("Dumping all graph nodes for debugging ({} nodes):".format( + len(self.node_by_path))) + for node in sorted(self.get_nodes(), key=lambda node: str(node)): + logging.info(node) + + def _add_proto_generation_deps(self) -> None: + """ + Add dependencies of .pb.{h,cc} files on the corresponding .proto file. We do that by + finding .proto and .pb.{h,cc} nodes in the graph independently and matching them + based on their path relative to the source root. + + Additionally, we are inferring dependencies between binaries (protobuf libs or in some + cases other libraries or even tests) that use a .pb.cc file on the CMake target that + generates these files (e.g. gen_src_yb_rocksdb_db_version_edit_proto). We add these + inferred dependencies to the separate CMake dependency graph. + """ + proto_node_by_rel_path: Dict[str, Node] = {} + pb_h_cc_nodes_by_rel_path: Dict[str, List[Node]] = {} + + cmake_dep_graph = self.get_cmake_dep_graph() + assert cmake_dep_graph is not None + + for node in self.get_nodes(): + basename = os.path.basename(node.path) + if node.path.endswith('.proto'): + proto_rel_path = node.path_rel_to_src_root() + if proto_rel_path: + proto_rel_path = proto_rel_path[:-6] + if proto_rel_path.startswith('ent/'): + # Remove the 'ent/' prefix because there is no separate 'ent/' prefix + # in the build directory. 
+ proto_rel_path = proto_rel_path[4:] + if proto_rel_path in proto_node_by_rel_path: + raise RuntimeError( + "Multiple .proto nodes found that share the same " + "relative path to source root: %s and %s" % ( + proto_node_by_rel_path[proto_rel_path], node + )) + + proto_node_by_rel_path[proto_rel_path] = node + else: + match = PROTO_OUTPUT_FILE_NAME_RE.match(basename) + if match: + proto_output_rel_path = node.path_rel_to_build_root() + if proto_output_rel_path: + append_to_list_in_dict( + pb_h_cc_nodes_by_rel_path, + os.path.join( + os.path.dirname(proto_output_rel_path), + match.group(1) + ), + node) + if node.path.endswith('.pb.cc'): + proto_gen_cmake_target = node.get_proto_gen_cmake_target() + for containing_binary in node.get_containing_binaries(): + containing_cmake_target = containing_binary.get_cmake_target() + assert containing_cmake_target is not None + proto_gen_cmake_target = node.get_proto_gen_cmake_target() + assert proto_gen_cmake_target is not None + cmake_dep_graph.add_dependency( + containing_cmake_target, + proto_gen_cmake_target + ) + + for rel_path in proto_node_by_rel_path: + if rel_path not in pb_h_cc_nodes_by_rel_path: + raise ValueError( + "For relative path %s, found a proto file (%s) but no .pb.{h,cc} files" % + (rel_path, proto_node_by_rel_path[rel_path])) + + for rel_path in pb_h_cc_nodes_by_rel_path: + if rel_path not in proto_node_by_rel_path: + raise ValueError( + "For relative path %s, found .pb.{h,cc} files (%s) but no .proto" % + (rel_path, pb_h_cc_nodes_by_rel_path[rel_path])) + + # This is what we've verified above in two directions separately. + assert(set(proto_node_by_rel_path.keys()) == set(pb_h_cc_nodes_by_rel_path.keys())) + + for rel_path in proto_node_by_rel_path.keys(): + proto_node = proto_node_by_rel_path[rel_path] + # .pb.{h,cc} files need to depend on the .proto file they were generated from. + for pb_h_cc_node in pb_h_cc_nodes_by_rel_path[rel_path]: + pb_h_cc_node.add_dependency(proto_node) + + def _check_for_circular_dependencies(self) -> None: + logging.info("Checking for circular dependencies") + visited = set() + stack = [] + + def walk(node: Node) -> None: + if node in visited: + return + try: + stack.append(node) + visited.add(node) + for dep in node.deps: + if dep in visited: + if dep in stack: + raise RuntimeError("Circular dependency loop found: %s", stack) + return + walk(dep) # type: ignore + finally: + stack.pop() + + for node in self.get_nodes(): + walk(node) # type: ignore + + logging.info("No circular dependencies found -- this is good (visited %d nodes)", + len(visited)) + + def validate_proto_deps(self) -> None: + """ + Make sure that code that depends on protobuf-generated files also depends on the + corresponding protobuf library target. + """ + + logging.info("Validating dependencies on protobuf-generated headers") + + # TODO: only do this during graph generation. + self._add_proto_generation_deps() + self._check_for_circular_dependencies() + + # For any .pb.h file, we want to find all .cc.o files that depend on it, meaning that they + # directly or indirectly include that .pb.h file. + # + # Then we want to look at the containing executable or library of the .cc.o file and make + # sure it has a CMake dependency (direct or indirect) on the protobuf generation target of + # the form gen_src_yb_common_redis_protocol_proto. + # + # However, we also need to get the CMake target name corresponding to the binary + # containing the .cc.o file. 
+ + proto_dep_errors = [] + + for node in self.get_nodes(): + if not node.path.endswith('.pb.cc.o'): + continue + source_deps = [dep for dep in node.deps if dep.path.endswith('.cc')] + if len(source_deps) != 1: + raise ValueError( + "Could not identify a single source dependency of node %s. Found: %s. " % + (node, source_deps)) + source_dep = source_deps[0] + pb_h_path = source_dep.path[:-3] + '.h' + pb_h_node = self.node_by_path[pb_h_path] + proto_gen_target = pb_h_node.get_proto_gen_cmake_target() + + for rev_dep in pb_h_node.reverse_deps: + if rev_dep.path.endswith('.cc.o'): + containing_binaries: Optional[List[Node]] = rev_dep.get_containing_binaries() + assert containing_binaries is not None + for binary in containing_binaries: + binary_cmake_target: Optional[str] = binary.get_cmake_target() + assert binary_cmake_target is not None + + recursive_cmake_deps = self.get_cmake_dep_graph().get_recursive_cmake_deps( + binary_cmake_target) + if proto_gen_target not in recursive_cmake_deps: + proto_dep_errors.append( + "CMake target %s does not depend directly or indirectly on target " + "%s but uses the header file %s. Recursive cmake deps of %s: %s" % + (binary_cmake_target, proto_gen_target, pb_h_path, + binary_cmake_target, recursive_cmake_deps) + ) + if proto_dep_errors: + for error_msg in proto_dep_errors: + logging.error("Protobuf dependency error: %s", error_msg) + raise RuntimeError( + "Found targets that use protobuf-generated header files but do not declare the " + "dependency explicitly. See the log messages above.") diff --git a/python/yb/dependency_graph.py b/python/yb/dependency_graph.py index 0c5c087e9ada..f048a0f31c9f 100755 --- a/python/yb/dependency_graph.py +++ b/python/yb/dependency_graph.py @@ -1,5 +1,18 @@ #!/usr/bin/env python3 +# Copyright (c) Yugabyte, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except +# in compliance with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software distributed under the License +# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express +# or implied. See the License for the specific language governing permissions and limitations +# under the License. + + """ Build a dependency graph of sources, object files, libraries, and binaries. Compute the set of tests that might be affected by changes to the given set of source files. @@ -33,585 +46,61 @@ is_ninja_build_root, shlex_join, read_file, + arg_str_to_bool, ) from yb.command_util import mkdir_p +from yb.source_files import ( + SourceFileCategory, + get_file_category, + CATEGORIES_NOT_CAUSING_RERUN_OF_ALL_TESTS, +) +from yb.dep_graph_common import ( + DepGraphConf, + NodeType, + Node, + DependencyGraph, + CMakeDepGraph, + is_object_file, + is_shared_library, + DYLIB_SUFFIX, +) from yugabyte_pycommon import WorkDirContext # type: ignore +from yb.lto import link_whole_program -def make_extensions(exts_without_dot: List[str]) -> List[str]: - for ext in exts_without_dot: - assert not ext.startswith('.') - - return ['.' + ext for ext in exts_without_dot] - - -def ends_with_any_of(path: str, exts: List[str]) -> bool: - for ext in exts: - if path.endswith(ext): - return True - return False - - -def get_relative_path_or_none(abs_path: str, relative_to: str) -> Optional[str]: - """ - If the given path starts with another directory path, return the relative path, or else None. 
- """ - if not relative_to.endswith('/'): - relative_to += '/' - if abs_path.startswith(relative_to): - return abs_path[len(relative_to):] - return None - - -SOURCE_FILE_EXTENSIONS = make_extensions(['c', 'cc', 'cpp', 'cxx', 'h', 'hpp', 'hxx', 'proto', - 'l', 'y']) -LIBRARY_FILE_EXTENSIONS_NO_DOT = ['so', 'dylib'] -LIBRARY_FILE_EXTENSIONS = make_extensions(LIBRARY_FILE_EXTENSIONS_NO_DOT) -TEST_FILE_SUFFIXES = ['_test', '-test', '_itest', '-itest'] -LIBRARY_FILE_NAME_RE = re.compile(r'^lib(.*)[.](?:%s)$' % '|'.join( - LIBRARY_FILE_EXTENSIONS_NO_DOT)) -PROTO_LIBRARY_FILE_NAME_RE = re.compile(r'^lib(.*)_proto[.](?:%s)$' % '|'.join( - LIBRARY_FILE_EXTENSIONS_NO_DOT)) -EXECUTABLE_FILE_NAME_RE = re.compile(r'^[a-zA-Z0-9_.-]+$') -PROTO_OUTPUT_FILE_NAME_RE = re.compile(r'^([a-zA-Z_0-9-]+)[.]pb[.](h|cc)$') - -# Ignore some special-case CMake targets that do not have a one-to-one match with executables or -# libraries. -IGNORED_CMAKE_TARGETS = ['gen_version_info', 'latest_symlink', 'postgres'] - -LIST_DEPS_CMD = 'deps' -LIST_REVERSE_DEPS_CMD = 'rev-deps' -LIST_AFFECTED_CMD = 'affected' -SELF_TEST_CMD = 'self-test' -DEBUG_DUMP_CMD = 'debug-dump' - -COMMANDS = [ - LIST_DEPS_CMD, - LIST_REVERSE_DEPS_CMD, - LIST_AFFECTED_CMD, - SELF_TEST_CMD, - DEBUG_DUMP_CMD, -] - -COMMANDS_NOT_NEEDING_TARGET_SET = [SELF_TEST_CMD, DEBUG_DUMP_CMD] - -HOME_DIR = os.path.realpath(os.path.expanduser('~')) - -# As of August 2019, there is nothing in the "bin", "managed" and "www" directories that -# is being used by tests. -# If that changes, this needs to be updated. Note that the "bin" directory here is the -# yugabyte/bin directory in the source tree, not the "bin" directory under the build -# directory, so it only has scripts and not yb-master / yb-tserver binaries. -DIRECTORIES_THAT_DO_NOT_AFFECT_TESTS = [ - 'architecture', - 'bin', - 'cloud', - 'community', - 'docs', - 'managed', - 'sample', - 'www', -] -CATEGORY_DOES_NOT_AFFECT_TESTS = 'does_not_affect_tests' - -# File changes in any category other than these will cause all tests to be re-run. Even though -# changes to Python code affect the test framework, we can consider this Python code to be -# reasonably tested already, by running doctests, this script, the packaging script, etc. We can -# remove "python" from this whitelist in the future. -CATEGORIES_NOT_CAUSING_RERUN_OF_ALL_TESTS = set( - ['java', 'c++', 'python', CATEGORY_DOES_NOT_AFFECT_TESTS]) - -DYLIB_SUFFIX = '.dylib' if platform.system() == 'Darwin' else '.so' - - -class NodeType(Enum): - # A special value used to match any node type. 
- ANY = 'any' - - SOURCE = 'source' - LIBRARY = 'library' - TEST = 'test' - OBJECT = 'object' - EXECUTABLE = 'executable' - OTHER = 'other' +class Command(Enum): + DEPS = 'deps' + REVERSE_DEPS = 'rev-deps' + AFFECTED = 'affected' + SELF_TEST = 'self-test' + DEBUG_DUMP = 'debug-dump' + LINK_WHOLE_PROGRAM = 'link-whole-program' def __str__(self) -> str: return self.value def __lt__(self, other: Any) -> bool: - assert other.isinstance(NodeType) + assert isinstance(other, Command) return self.value < other.value def __eq__(self, other: Any) -> bool: - assert isinstance(other, NodeType) + assert isinstance(other, Command) return self.value == other.value -def is_object_file(path: str) -> bool: - return path.endswith('.o') - - -def is_shared_library(path: str) -> bool: - return ( - ends_with_any_of(path, LIBRARY_FILE_EXTENSIONS) and - not os.path.basename(path) in LIBRARY_FILE_EXTENSIONS and - not path.startswith('-')) - - -K = TypeVar('K') -V = TypeVar('V') - - -def append_to_list_in_dict(dest: Dict[K, List[V]], key: K, new_item: V) -> None: - if key in dest: - dest[key].append(new_item) - else: - dest[key] = [new_item] - - -def get_node_type_by_path(path: str) -> NodeType: - """ - >>> get_node_type_by_path('my_source_file.cc').value - 'source' - >>> get_node_type_by_path('my_library.so').value - 'library' - >>> get_node_type_by_path('/bin/bash').value - 'executable' - >>> get_node_type_by_path('my_object_file.o').value - 'object' - >>> get_node_type_by_path('tests-integration/some_file').value - 'test' - >>> get_node_type_by_path('tests-integration/some_file.txt').value - 'other' - >>> get_node_type_by_path('something/my-test').value - 'test' - >>> get_node_type_by_path('something/my_test').value - 'test' - >>> get_node_type_by_path('something/my-itest').value - 'test' - >>> get_node_type_by_path('something/my_itest').value - 'test' - >>> get_node_type_by_path('some-dir/some_file').value - 'other' - """ - if ends_with_any_of(path, SOURCE_FILE_EXTENSIONS): - return NodeType.SOURCE - - if (ends_with_any_of(path, LIBRARY_FILE_EXTENSIONS) or - any([ext + '.' in path for ext in LIBRARY_FILE_EXTENSIONS])): - return NodeType.LIBRARY - - if (ends_with_any_of(path, TEST_FILE_SUFFIXES) or - (os.path.basename(os.path.dirname(path)).startswith('tests-') and - '.' not in os.path.basename(path))): - return NodeType.TEST - - if is_object_file(path): - return NodeType.OBJECT - - if os.path.exists(path) and os.access(path, os.X_OK): - # This will only work if the code has been fully built. - return NodeType.EXECUTABLE - - return NodeType.OTHER - - -class Node: - """ - A node in the dependency graph. Could be a source file, a header file, an object file, a - dynamic library, or an executable. - """ - - path: str - deps: Set['Node'] - reverse_deps: Set['Node'] - node_type: NodeType - dep_graph: 'DependencyGraph' - conf: 'Configuration' - source_str: Optional[str] - is_proto_lib: bool - link_cmd: Optional[List[str]] - _cached_proto_lib_deps: Optional[List['Node']] - _cached_containing_binaries: Optional[List['Node']] - _cached_cmake_target: Optional[str] - - def __init__(self, path: str, dep_graph: 'DependencyGraph', source_str: Optional[str]) -> None: - path = os.path.realpath(path) - self.path = path - - # Other nodes that this node depends on. - self.deps = set() - - # Nodes that depend on this node. 
- self.reverse_deps = set() - - self.node_type = get_node_type_by_path(path) - self.dep_graph = dep_graph - self.conf = dep_graph.conf - self.source_str = source_str - - self.is_proto_lib = bool( - self.node_type == NodeType.LIBRARY and - PROTO_LIBRARY_FILE_NAME_RE.match(os.path.basename(self.path))) - - self._cached_proto_lib_deps = None - self._cached_containing_binaries = None - self._cached_cmake_target = None - self._has_no_cmake_target = False - - self.link_cmd = None - - def add_dependency(self, dep: 'Node') -> None: - assert self is not dep - self.deps.add(dep) - dep.reverse_deps.add(self) - - def __eq__(self, other: Any) -> bool: - if not isinstance(other, Node): - return False - - return self.path == other.path - - def __hash__(self) -> int: - return hash(self.path) - - def is_source(self) -> bool: - return self.node_type == NodeType.SOURCE - - def validate_existence(self) -> None: - if not os.path.exists(self.path) and not self.dep_graph.conf.incomplete_build: - raise RuntimeError( - "Path does not exist: '{}'. This node was found in: {}".format( - self.path, self.source_str)) - - def get_pretty_path(self) -> str: - for prefix, alias in [(self.conf.build_root, '$BUILD_ROOT'), - (self.conf.yb_src_root, '$YB_SRC_ROOT'), - (HOME_DIR, '~')]: - if self.path.startswith(prefix + '/'): - return alias + '/' + self.path[len(prefix) + 1:] - - return self.path - - def path_rel_to_build_root(self) -> Optional[str]: - return get_relative_path_or_none(self.path, self.conf.build_root) - - def path_rel_to_src_root(self) -> Optional[str]: - return get_relative_path_or_none(self.path, self.conf.yb_src_root) - - def __str__(self) -> str: - return "Node(\"{}\", type={}, {} deps, {} rev deps)".format( - self.get_pretty_path(), self.node_type, len(self.deps), len(self.reverse_deps)) - - def __repr__(self) -> str: - return self.__str__() - - def get_cmake_target(self) -> Optional[str]: - """ - @return the CMake target based on the current node's path. E.g. this would be "master" - for the "libmaster.so" library, "yb-master" for the "yb-master" executable. - """ - if self._cached_cmake_target: - return self._cached_cmake_target - if self._has_no_cmake_target: - return None - - if self.path.endswith('.proto'): - path = self.path - names = [] - while path != '/' and path != self.conf.yb_src_root: - names.append(os.path.basename(path)) - path = os.path.dirname(path) - - # This needs to be consistent with the following CMake code snippet: - # - # set(TGT_NAME "gen_${PROTO_REL_TO_YB_SRC_ROOT}") - # string(REPLACE "@" "_" TGT_NAME ${TGT_NAME}) - # string(REPLACE "." "_" TGT_NAME ${TGT_NAME}) - # string(REPLACE "-" "_" TGT_NAME ${TGT_NAME}) - # string(REPLACE "/" "_" TGT_NAME ${TGT_NAME}) - # - # (see FindProtobuf.cmake and FindYRPC.cmake). - # - # "/" cannot appear in the resulting string, so no need to replace it with "_". 
- target = re.sub('[@.-]', '_', '_'.join(['gen'] + names[::-1])) - - if self.conf.verbose: - logging.info("Associating protobuf file '{}' with CMake target '{}'".format( - self.path, target)) - self._cached_cmake_target = target - return target - - basename = os.path.basename(self.path) - m = LIBRARY_FILE_NAME_RE.match(basename) - if m: - target = m.group(1) - self._cached_cmake_target = target - return target - m = EXECUTABLE_FILE_NAME_RE.match(basename) - if m: - self._cached_cmake_target = basename - return basename - self._has_no_cmake_target = True - return None - - def get_containing_binaries(self) -> List['Node']: - """ - Returns nodes (usually one node) corresponding to executables or dynamic libraries that the - given object file is compiled into. - """ - if self.path.endswith('.cc'): - cc_o_rev_deps = [rev_dep for rev_dep in self.reverse_deps - if rev_dep.path.endswith('.cc.o')] - if len(cc_o_rev_deps) != 1: - raise RuntimeError( - "Could not identify exactly one object file reverse dependency of node " - "%s. All set of reverse dependencies: %s" % (self, self.reverse_deps)) - return cc_o_rev_deps[0].get_containing_binaries() - - if self.node_type != NodeType.OBJECT: - raise ValueError( - "get_containing_binaries can only be called on nodes of type 'object'. Node: " + - str(self)) - - if self._cached_containing_binaries: - return self._cached_containing_binaries - - binaries = [] - for rev_dep in self.reverse_deps: - if rev_dep.node_type in [NodeType.LIBRARY, NodeType.TEST, NodeType.EXECUTABLE]: - binaries.append(rev_dep) - if len(binaries) > 1: - logging.warning( - "Node %s is linked into multiple libraries: %s. Might be worth checking.", - self, binaries) - - self._cached_containing_binaries = binaries - return binaries - - def get_recursive_deps( - self, - skip_node_types: Set[NodeType] = set()) -> Set['Node']: - """ - Returns a set of all dependencies that this node depends on. - skip_node_types specifies the set of categories of nodes, except the initial node, to stop - the recursive search at. - """ - - recursive_deps: Set[Node] = set() - visited: Set[Node] = set() - - def walk(node: Node, is_initial: bool) -> None: - if not is_initial: - # Only skip the given subset of node types for nodes other than the initial node. - if node.node_type in skip_node_types: - return - # Also, never add the initial node to the result. - recursive_deps.add(node) - for dep in node.deps: - if dep not in recursive_deps: - walk(dep, is_initial=False) - - walk(self, is_initial=True) - return recursive_deps - - def get_proto_lib_deps(self) -> List['Node']: - if self._cached_proto_lib_deps is None: - self._cached_proto_lib_deps = [ - dep for dep in self.get_recursive_deps() if dep.is_proto_lib - ] - return self._cached_proto_lib_deps - - def get_containing_proto_lib(self) -> Optional['Node']: - """ - For a .pb.cc file node, return the node corresponding to the containing protobuf library. - """ - if not self.path.endswith('.pb.cc.o'): - return None - - containing_binaries: List[Node] = self.get_containing_binaries() - - if len(containing_binaries) != 1: - logging.info("Reverse deps:\n %s" % ("\n ".join( - [str(dep) for dep in self.reverse_deps]))) - raise RuntimeError( - "Invalid number of proto libs for %s. Expected 1 but found %d: %s" % ( - self, - len(containing_binaries), - containing_binaries)) - return containing_binaries[0] - - def get_proto_gen_cmake_target(self) -> Optional[str]: - """ - For .pb.{h,cc} nodes this will return a CMake target of the form - gen_..., e.g. 
gen_src_yb_common_wire_protocol. - """ - rel_path = self.path_rel_to_build_root() - if not rel_path: - return None - - basename = os.path.basename(rel_path) - match = PROTO_OUTPUT_FILE_NAME_RE.match(basename) - if not match: - return None - return '_'.join( - ['gen'] + - os.path.dirname(rel_path).split('/') + - [match.group(1), 'proto'] - ) - - def set_link_command(self, link_cmd: List[str]) -> None: - assert self.link_cmd is None - self.link_cmd = link_cmd - - def as_json(self) -> Dict[str, Any]: - d: Dict[str, Any] = dict( - path=self.path - ) - if self.link_cmd: - d['link_cmd'] = self.link_cmd - return d - - def update_from_json(self, json_data: Dict[str, Any]) -> None: - if 'link_cmd' in json_data: - self.link_cmd = json_data['link_cmd'] +COMMANDS_NOT_NEEDING_TARGET_SET = [Command.SELF_TEST, Command.DEBUG_DUMP] def set_to_str(items: Set[Any]) -> str: return ",\n".join(sorted(list(items))) -def is_abs_path(path: str) -> bool: - return path.startswith('/') - - -class Configuration: - args: argparse.Namespace - verbose: bool - build_root: str - is_ninja: bool - build_type: str - yb_src_root: str - src_dir_path: str - ent_src_dir_path: str - rel_path_base_dirs: Set[str] - incomplete_build: bool - file_regex: str - src_dir_paths: List[str] - - def __init__(self, args: argparse.Namespace) -> None: - self.args = args - self.verbose = args.verbose - self.build_root = os.path.realpath(args.build_root) - self.is_ninja = is_ninja_build_root(self.build_root) - - self.build_type = get_build_type_from_build_root(self.build_root) - self.yb_src_root = ensure_yb_src_root_from_build_root(self.build_root) - self.src_dir_path = os.path.join(self.yb_src_root, 'src') - self.ent_src_dir_path = os.path.join(self.yb_src_root, 'ent', 'src') - self.rel_path_base_dirs = set([self.build_root, os.path.join(self.src_dir_path, 'yb')]) - self.incomplete_build = args.incomplete_build - - self.file_regex = args.file_regex - if not self.file_regex and args.file_name_glob: - self.file_regex = fnmatch.translate('*/' + args.file_name_glob) - - self.src_dir_paths = [self.src_dir_path, self.ent_src_dir_path] - - for dir_path in self.src_dir_paths: - if not os.path.isdir(dir_path): - raise RuntimeError("Directory does not exist, or is not a directory: %s" % dir_path) - - -class CMakeDepGraph: - """ - A light-weight class representing the dependency graph of CMake targets imported from the - yb_cmake_deps.txt file that we generate in our top-level CMakeLists.txt. This dependency graph - does not have any nodes for source files and object files. 
- """ - - build_root: str - cmake_targets: Set[str] - cmake_deps_path: str - cmake_deps: Dict[str, Set[str]] - - def __init__(self, build_root: str) -> None: - self.build_root = build_root - self.cmake_deps_path = os.path.join(self.build_root, 'yb_cmake_deps.txt') - self._load() - - def _load(self) -> None: - logging.info("Loading dependencies between CMake targets from '{}'".format( - self.cmake_deps_path)) - self.cmake_deps = {} - self.cmake_targets = set() - with open(self.cmake_deps_path) as cmake_deps_file: - for line in cmake_deps_file: - line = line.strip() - if not line: - continue - items = [item.strip() for item in line.split(':')] - if len(items) != 2: - raise RuntimeError( - "Expected to find two items when splitting line on ':', found {}:\n{}", - len(items), line) - lhs, rhs = items - if lhs in IGNORED_CMAKE_TARGETS: - continue - cmake_dep_set = self._get_cmake_dep_set_of(lhs) - - for cmake_dep in rhs.split(';'): - if cmake_dep in IGNORED_CMAKE_TARGETS: - continue - cmake_dep_set.add(cmake_dep) - - for cmake_target, cmake_target_deps in self.cmake_deps.items(): - adding_targets = [cmake_target] + list(cmake_target_deps) - self.cmake_targets.update(set(adding_targets)) - logging.info("Found {} CMake targets in '{}'".format( - len(self.cmake_targets), self.cmake_deps_path)) - - def _get_cmake_dep_set_of(self, target: str) -> Set[str]: - """ - Get CMake dependencies of the given target. What is returned is a mutable set. Modifying - this set modifies this CMake dependency graph. - """ - deps = self.cmake_deps.get(target) - if deps is None: - deps = set() - self.cmake_deps[target] = deps - self.cmake_targets.add(target) - return deps - - def add_dependency(self, from_target: str, to_target: str) -> None: - if from_target in IGNORED_CMAKE_TARGETS or to_target in IGNORED_CMAKE_TARGETS: - return - self._get_cmake_dep_set_of(from_target).add(to_target) - self.cmake_targets.add(to_target) - - def get_recursive_cmake_deps(self, cmake_target: str) -> Set[str]: - result: Set[str] = set() - visited: Set[str] = set() - - def walk(cur_target: str, add_this_target: bool = True) -> None: - if cur_target in visited: - return - visited.add(cur_target) - if add_this_target: - result.add(cur_target) - for dep in self.cmake_deps.get(cur_target, []): - walk(dep) - - walk(cmake_target, add_this_target=False) - return result - - class DependencyGraphBuilder: """ Builds a dependency graph from the contents of the build directory. Each node of the graph is a file (an executable, a dynamic library, or a source file). 
""" - conf: Configuration + conf: DepGraphConf compile_dirs: Set[str] compile_commands: Dict[str, Any] useful_base_dirs: Set[str] @@ -620,7 +109,7 @@ class DependencyGraphBuilder: dep_graph: 'DependencyGraph' cmake_dep_graph: CMakeDepGraph - def __init__(self, conf: Configuration) -> None: + def __init__(self, conf: DepGraphConf) -> None: self.conf = conf self.compile_dirs = set() self.useful_base_dirs = set() @@ -715,7 +204,7 @@ def match_cmake_targets_with_files(self) -> None: self.cmake_target_to_node[cmake_target] = list(nodes)[0] if unmatched_cmake_targets: - logging.warn( + logging.warning( "These CMake targets do not have any associated files: %s", sorted(unmatched_cmake_targets)) @@ -731,7 +220,7 @@ def match_cmake_targets_with_files(self) -> None: node.add_dependency(self.cmake_target_to_node[cmake_target_dep]) def resolve_rel_path(self, rel_path: str) -> Optional[str]: - if is_abs_path(rel_path): + if os.path.isabs(rel_path): return rel_path if rel_path in self.unresolvable_rel_paths: @@ -760,7 +249,7 @@ def resolve_rel_path(self, rel_path: str) -> Optional[str]: return resolved def resolve_dependent_rel_path(self, rel_path: str) -> str: - if is_abs_path(rel_path): + if os.path.isabs(rel_path): return rel_path if is_object_file(rel_path): return os.path.join(self.conf.build_root, rel_path) @@ -888,7 +377,7 @@ def parse_link_command(self, link_command: str, link_txt_path: str) -> None: return raise RuntimeError("Could not find output path for link command: %s" % link_command) - if not is_abs_path(output_path): + if not os.path.isabs(output_path): output_path = os.path.abspath(os.path.join(base_dir, output_path)) output_node = self.dep_graph.find_or_create_node(output_path, source_str=link_txt_path) output_node.validate_existence() @@ -904,19 +393,31 @@ def parse_link_command(self, link_command: str, link_txt_path: str) -> None: def build(self) -> 'DependencyGraph': compile_commands_path = os.path.join(self.conf.build_root, 'compile_commands.json') cmake_deps_path = os.path.join(self.conf.build_root, 'yb_cmake_deps.txt') - if not os.path.exists(compile_commands_path) or not os.path.exists(cmake_deps_path): - + run_build = False + for file_path in [compile_commands_path, cmake_deps_path]: + if not os.path.exists(file_path): + logging.info("File %s does not exist, will re-run the build.", file_path) + run_build = True + if run_build: # This is mostly useful during testing. We don't want to generate the list of compile # commands by default because it takes a while, so only generate it on demand. os.environ['YB_EXPORT_COMPILE_COMMANDS'] = '1' mkdir_p(self.conf.build_root) - subprocess.check_call( - [os.path.join(self.conf.yb_src_root, 'yb_build.sh'), - self.conf.build_type, - '--cmake-only', - '--no-rebuild-thirdparty', - '--build-root', self.conf.build_root]) + build_cmd = [ + os.path.join(self.conf.yb_src_root, 'yb_build.sh'), + self.conf.build_type, + '--cmake-only', + '--no-rebuild-thirdparty', + '--build-root', self.conf.build_root + ] + if self.conf.build_args: + # This is not ideal because it does not give a way to specify arguments with + # embedded spaces but is OK for our needs here. 
+                build_cmd.extend(self.conf.build_args.strip().split())
+
+            logging.info("Running the build: %s", shlex_join(build_cmd))
+            subprocess.check_call(build_cmd)
 
         logging.info("Loading compile commands from '{}'".format(compile_commands_path))
         with open(compile_commands_path) as commands_file:
@@ -940,362 +441,6 @@ def build(self) -> 'DependencyGraph':
         return self.dep_graph
 
 
-class DependencyGraph:
-
-    conf: Configuration
-    node_by_path: Dict[str, Node]
-    nodes_by_basename: Optional[Dict[str, Set[Node]]]
-    cmake_dep_graph: Optional[CMakeDepGraph]
-
-    canonicalization_cache: Dict[str, str] = {}
-
-    def __init__(self,
-                 conf: Configuration,
-                 json_data: Optional[List[Dict[str, Any]]] = None) -> None:
-        """
-        @param json_data optional results of JSON parsing
-        """
-        self.conf = conf
-        self.node_by_path = {}
-        if json_data:
-            self.init_from_json(json_data)
-        self.nodes_by_basename = None
-        self.cmake_dep_graph = None
-
-    def get_cmake_dep_graph(self) -> CMakeDepGraph:
-        if self.cmake_dep_graph:
-            return self.cmake_dep_graph
-
-        cmake_dep_graph = CMakeDepGraph(self.conf.build_root)
-        for node in self.get_nodes():
-            proto_lib = node.get_containing_proto_lib()
-            if proto_lib and self.conf.verbose:
-                logging.info("node: %s, proto lib: %s", node, proto_lib)
-
-        self.cmake_dep_graph = cmake_dep_graph
-        return cmake_dep_graph
-
-    def find_node(self,
-                  path: str,
-                  must_exist: bool = True,
-                  source_str: Optional[str] = None) -> Node:
-        assert source_str is None or not must_exist
-        path = os.path.abspath(path)
-        node = self.node_by_path.get(path)
-        if node:
-            return node
-        if must_exist:
-            raise RuntimeError(
-                ("Node not found by path: '{}' (expected to already have this node in our "
-                 "graph, not adding).").format(path))
-        node = Node(path, self, source_str)
-        self.node_by_path[path] = node
-        return node
-
-    def find_or_create_node(self, path: str, source_str: Optional[str] = None) -> Node:
-        """
-        Finds a node with the given path or creates it if it does not exist.
-        @param source_str a string description of how we came up with this node's path
-        """
-
-        canonical_path = self.canonicalization_cache.get(path)
-        if not canonical_path:
-            canonical_path = os.path.realpath(path)
-            if canonical_path.startswith(self.conf.build_root + '/'):
-                canonical_path = self.conf.build_root + '/' + \
-                    canonical_path[len(self.conf.build_root) + 1:]
-            self.canonicalization_cache[path] = canonical_path
-
-        return self.find_node(canonical_path, must_exist=False, source_str=source_str)
-
-    def create_node_from_json(self, node_json_data: Dict[str, Any]) -> Node:
-        node = self.find_or_create_node(node_json_data['path'])
-        node.update_from_json(node_json_data)
-        return node
-
-    def init_from_json(self, json_nodes: List[Dict[str, Any]]) -> None:
-        id_to_node = {}
-        id_to_dep_ids = {}
-        for node_json in json_nodes:
-            node_id = node_json['id']
-            id_to_node[node_id] = self.create_node_from_json(node_json)
-            id_to_dep_ids[node_id] = node_json.get('deps') or []
-        for node_id, dep_ids in id_to_dep_ids.items():
-            node = id_to_node[node_id]
-            for dep_id in dep_ids:
-                node.add_dependency(id_to_node[dep_id])
-
-    def find_nodes_by_regex(self, regex_str: str) -> List[Node]:
-        filter_re = re.compile(regex_str)
-        return [node for node in self.get_nodes() if filter_re.match(node.path)]
-
-    def find_nodes_by_basename(self, basename: str) -> Optional[Set[Node]]:
-        if not self.nodes_by_basename:
-            # We are lazily initializing the basename -> node map, and any changes to the graph
-            # after this point will not get reflected in it.
This is OK as we're only using this - # function after the graph has been built. - self.nodes_by_basename = { - k: set(v) - for k, v in group_by( - self.get_nodes(), - lambda node: os.path.basename(node.path)).items() - } - assert self.nodes_by_basename is not None - return self.nodes_by_basename.get(basename) - - def find_affected_nodes( - self, - start_nodes: Set[Node], - requested_node_type: NodeType = NodeType.ANY) -> Set[Node]: - if self.conf.verbose: - logging.info("Starting with the following initial nodes:") - for node in start_nodes: - logging.info(" {}".format(node)) - - results = set() - visited = set() - - def dfs(node: Node) -> None: - if ((requested_node_type == NodeType.ANY or - node.node_type == requested_node_type) and node not in start_nodes): - results.add(node) - if node in visited: - return - visited.add(node) - for dep in node.reverse_deps: - dfs(dep) - - for node in start_nodes: - dfs(node) - - return results - - def affected_basenames_by_basename_for_test( - self, basename: str, node_type: NodeType = NodeType.ANY) -> Set[str]: - nodes_for_basename = self.find_nodes_by_basename(basename) - if not nodes_for_basename: - self.dump_debug_info() - raise RuntimeError( - "No nodes found for file name '{}' (total number of nodes: {})".format( - basename, len(self.node_by_path))) - return set([os.path.basename(node.path) - for node in self.find_affected_nodes(nodes_for_basename, node_type)]) - - def save_as_json(self, output_path: str) -> None: - """ - Converts the dependency graph into a JSON representation, where every node is given an id, - so that dependencies are represented concisely. - """ - with open(output_path, 'w') as output_file: - next_node_id = [1] # Use a mutable object so we can modify it from closure. - path_to_id: Dict[str, int] = {} - output_file.write("[") - - def get_node_id(node: Node) -> int: - node_id = path_to_id.get(node.path) - if not node_id: - node_id = next_node_id[0] - path_to_id[node.path] = node_id - next_node_id[0] = node_id + 1 - return node_id - - is_first = True - for node_path, node in self.node_by_path.items(): - node_json = node.as_json() - dep_ids = [get_node_id(dep) for dep in node.deps] - node_json.update(id=get_node_id(node)) - if dep_ids: - node_json.update(deps=dep_ids) - if not is_first: - output_file.write(",\n") - is_first = False - output_file.write(json.dumps(node_json)) - output_file.write("\n]\n") - - logging.info("Saved dependency graph to '{}'".format(output_path)) - - def validate_node_existence(self) -> None: - logging.info("Validating existence of build artifacts") - for node in self.get_nodes(): - node.validate_existence() - - def get_nodes(self) -> Iterable[Node]: - return self.node_by_path.values() - - def dump_debug_info(self) -> None: - logging.info("Dumping all graph nodes for debugging ({} nodes):".format( - len(self.node_by_path))) - for node in sorted(self.get_nodes(), key=lambda node: str(node)): - logging.info(node) - - def _add_proto_generation_deps(self) -> None: - """ - Add dependencies of .pb.{h,cc} files on the corresponding .proto file. We do that by - finding .proto and .pb.{h,cc} nodes in the graph independently and matching them - based on their path relative to the source root. - - Additionally, we are inferring dependencies between binaries (protobuf libs or in some - cases other libraries or even tests) that use a .pb.cc file on the CMake target that - generates these files (e.g. gen_src_yb_rocksdb_db_version_edit_proto). 
We add these - inferred dependencies to the separate CMake dependency graph. - """ - proto_node_by_rel_path: Dict[str, Node] = {} - pb_h_cc_nodes_by_rel_path: Dict[str, List[Node]] = {} - - cmake_dep_graph = self.get_cmake_dep_graph() - assert cmake_dep_graph is not None - - for node in self.get_nodes(): - basename = os.path.basename(node.path) - if node.path.endswith('.proto'): - proto_rel_path = node.path_rel_to_src_root() - if proto_rel_path: - proto_rel_path = proto_rel_path[:-6] - if proto_rel_path.startswith('ent/'): - # Remove the 'ent/' prefix because there is no separate 'ent/' prefix - # in the build directory. - proto_rel_path = proto_rel_path[4:] - if proto_rel_path in proto_node_by_rel_path: - raise RuntimeError( - "Multiple .proto nodes found that share the same " - "relative path to source root: %s and %s" % ( - proto_node_by_rel_path[proto_rel_path], node - )) - - proto_node_by_rel_path[proto_rel_path] = node - else: - match = PROTO_OUTPUT_FILE_NAME_RE.match(basename) - if match: - proto_output_rel_path = node.path_rel_to_build_root() - if proto_output_rel_path: - append_to_list_in_dict( - pb_h_cc_nodes_by_rel_path, - os.path.join( - os.path.dirname(proto_output_rel_path), - match.group(1) - ), - node) - if node.path.endswith('.pb.cc'): - proto_gen_cmake_target = node.get_proto_gen_cmake_target() - for containing_binary in node.get_containing_binaries(): - containing_cmake_target = containing_binary.get_cmake_target() - assert containing_cmake_target is not None - proto_gen_cmake_target = node.get_proto_gen_cmake_target() - assert proto_gen_cmake_target is not None - cmake_dep_graph.add_dependency( - containing_cmake_target, - proto_gen_cmake_target - ) - - for rel_path in proto_node_by_rel_path: - if rel_path not in pb_h_cc_nodes_by_rel_path: - raise ValueError( - "For relative path %s, found a proto file (%s) but no .pb.{h,cc} files" % - (rel_path, proto_node_by_rel_path[rel_path])) - - for rel_path in pb_h_cc_nodes_by_rel_path: - if rel_path not in proto_node_by_rel_path: - raise ValueError( - "For relative path %s, found .pb.{h,cc} files (%s) but no .proto" % - (rel_path, pb_h_cc_nodes_by_rel_path[rel_path])) - - # This is what we've verified above in two directions separately. - assert(set(proto_node_by_rel_path.keys()) == set(pb_h_cc_nodes_by_rel_path.keys())) - - for rel_path in proto_node_by_rel_path.keys(): - proto_node = proto_node_by_rel_path[rel_path] - # .pb.{h,cc} files need to depend on the .proto file they were generated from. - for pb_h_cc_node in pb_h_cc_nodes_by_rel_path[rel_path]: - pb_h_cc_node.add_dependency(proto_node) - - def _check_for_circular_dependencies(self) -> None: - logging.info("Checking for circular dependencies") - visited = set() - stack = [] - - def walk(node: Node) -> None: - if node in visited: - return - try: - stack.append(node) - visited.add(node) - for dep in node.deps: - if dep in visited: - if dep in stack: - raise RuntimeError("Circular dependency loop found: %s", stack) - return - walk(dep) # type: ignore - finally: - stack.pop() - - for node in self.get_nodes(): - walk(node) # type: ignore - - logging.info("No circular dependencies found -- this is good (visited %d nodes)", - len(visited)) - - def validate_proto_deps(self) -> None: - """ - Make sure that code that depends on protobuf-generated files also depends on the - corresponding protobuf library target. - """ - - logging.info("Validating dependencies on protobuf-generated headers") - - # TODO: only do this during graph generation. 
- self._add_proto_generation_deps() - self._check_for_circular_dependencies() - - # For any .pb.h file, we want to find all .cc.o files that depend on it, meaning that they - # directly or indirectly include that .pb.h file. - # - # Then we want to look at the containing executable or library of the .cc.o file and make - # sure it has a CMake dependency (direct or indirect) on the protobuf generation target of - # the form gen_src_yb_common_redis_protocol_proto. - # - # However, we also need to get the CMake target name corresponding to the binary - # containing the .cc.o file. - - proto_dep_errors = [] - - for node in self.get_nodes(): - if not node.path.endswith('.pb.cc.o'): - continue - source_deps = [dep for dep in node.deps if dep.path.endswith('.cc')] - if len(source_deps) != 1: - raise ValueError( - "Could not identify a single source dependency of node %s. Found: %s. " % - (node, source_deps)) - source_dep = source_deps[0] - pb_h_path = source_dep.path[:-3] + '.h' - pb_h_node = self.node_by_path[pb_h_path] - proto_gen_target = pb_h_node.get_proto_gen_cmake_target() - - for rev_dep in pb_h_node.reverse_deps: - if rev_dep.path.endswith('.cc.o'): - containing_binaries: Optional[List[Node]] = rev_dep.get_containing_binaries() - assert containing_binaries is not None - for binary in containing_binaries: - binary_cmake_target: Optional[str] = binary.get_cmake_target() - assert binary_cmake_target is not None - - recursive_cmake_deps = self.get_cmake_dep_graph().get_recursive_cmake_deps( - binary_cmake_target) - if proto_gen_target not in recursive_cmake_deps: - proto_dep_errors.append( - "CMake target %s does not depend directly or indirectly on target " - "%s but uses the header file %s. Recursive cmake deps of %s: %s" % - (binary_cmake_target, proto_gen_target, pb_h_path, - binary_cmake_target, recursive_cmake_deps) - ) - if proto_dep_errors: - for error_msg in proto_dep_errors: - logging.error("Protobuf dependency error: %s", error_msg) - raise RuntimeError( - "Found targets that use protobuf-generated header files but do not declare the " - "dependency explicitly. See the log messages above.") - - class DependencyGraphTest(unittest.TestCase): dep_graph: Optional[DependencyGraph] = None @@ -1407,50 +552,6 @@ def run_self_test(dep_graph: DependencyGraph) -> None: sys.exit(1) -def get_file_category(rel_path: str) -> str: - """ - Categorize file changes so that we can decide what tests to run. - - @param rel_path: path relative to the source root (not to the build root) - - >>> get_file_category('src/postgres/src/backend/executor/execScan.c') - 'postgres' - """ - if os.path.isabs(rel_path): - raise IOError("Relative path expected, got an absolute path: %s" % rel_path) - basename = os.path.basename(rel_path) - - if rel_path.split(os.sep)[0] in DIRECTORIES_THAT_DO_NOT_AFFECT_TESTS: - return CATEGORY_DOES_NOT_AFFECT_TESTS - - if rel_path == 'yb_build.sh': - # The main build script is being run anyway, so we hope that most issues will come out at - # that stage. - return CATEGORY_DOES_NOT_AFFECT_TESTS - - if basename == 'CMakeLists.txt' or basename.endswith('.cmake'): - return 'cmake' - - if rel_path.startswith('src/postgres'): - return 'postgres' - - logging.info('rel_path=%s', rel_path) - if rel_path.startswith('src/') or rel_path.startswith('ent/src/'): - return 'c++' - - if rel_path.startswith('python/'): - return 'python' - - # Top-level subdirectories that map one-to-one to categories. 
-    for category_dir in ['java', 'thirdparty']:
-        if rel_path.startswith(category_dir + '/'):
-            return category_dir
-
-    if rel_path.startswith('build-support/'):
-        return 'build_scripts'
-    return 'other'
-
-
 def main() -> None:
     parser = argparse.ArgumentParser(
         description='A tool for working with the dependency graph')
@@ -1482,7 +583,8 @@ def main() -> None:
                         required=True,
                         help='E.g. /build/debug-gcc-dynamic-community')
     parser.add_argument('command',
-                        choices=COMMANDS,
+                        type=Command,
+                        choices=list(Command),
                         help='Command to perform')
     parser.add_argument('--output-test-config',
                         help='Output a "test configuration file", which is a JSON containing the '
@@ -1492,6 +594,23 @@ def main() -> None:
                         action='store_true',
                         help='Skip checking for file existence. Allows using the tool after '
                              'build artifacts have been deleted.')
+    parser.add_argument('--build-args',
+                        help='Extra arguments to pass to yb_build.sh. The build is invoked e.g. '
+                             'if the compilation database file is missing.')
+    parser.add_argument('--link-cmd-out-file',
+                        help='For the %s command, write the linker arguments (one per line) '
+                             'to the given file.' % Command.LINK_WHOLE_PROGRAM.value)
+    parser.add_argument('--lto-output-suffix',
+                        default="-lto",
+                        help='The suffix to append to LTO-enabled binaries produced by '
+                             'the %s command' % Command.LINK_WHOLE_PROGRAM.value)
+    parser.add_argument(
+        '--run-linker',
+        help='Whether to actually run the linker. Setting this to false might be useful when '
+             'debugging, combined with --link-cmd-out-file.',
+        type=arg_str_to_bool,
+        default=True)
+
     args = parser.parse_args()
 
     if args.file_regex and args.file_name_glob:
@@ -1507,14 +626,20 @@ def main() -> None:
         raise RuntimeError(
             "Neither of --file-regex, --file-name-glob, --git-{diff,commit}, or "
             "--rebuild-graph are specified, and the command is not one of: " +
-            ", ".join(COMMANDS_NOT_NEEDING_TARGET_SET))
+            ", ".join([cmd.value for cmd in COMMANDS_NOT_NEEDING_TARGET_SET]))
 
     log_level = logging.INFO
     logging.basicConfig(
         level=log_level,
         format="[%(filename)s:%(lineno)d] %(asctime)s %(levelname)s: %(message)s")
 
-    conf = Configuration(args)
+    conf = DepGraphConf(
+        verbose=args.verbose,
+        build_root=args.build_root,
+        incomplete_build=args.incomplete_build,
+        file_regex=args.file_regex,
+        file_name_glob=args.file_name_glob,
+        build_args=args.build_args)
     if conf.file_regex and args.git_diff:
         raise RuntimeError(
             "--git-diff is incompatible with --file-{regex,name-glob}")
@@ -1543,10 +668,10 @@ def main() -> None:
 
     # Commands that do not require an "initial set of targets"
     # ---------------------------------------------------------------------------------------------
 
-    if cmd == SELF_TEST_CMD:
+    if cmd == Command.SELF_TEST:
         run_self_test(dep_graph)
         return
-    elif cmd == DEBUG_DUMP_CMD:
+    if cmd == Command.DEBUG_DUMP:
         dep_graph.dump_debug_info()
         return
 
@@ -1554,7 +679,7 @@ def main() -> None:
     # Figure out the initial set of targets based on a git commit, a regex, etc.
# --------------------------------------------------------------------------------------------- - updated_categories = set() + updated_categories: Set[SourceFileCategory] = set() file_changes = [] initial_nodes: Iterable[Node] if args.git_diff: @@ -1599,21 +724,37 @@ def main() -> None: for file_path in file_changes ] - file_changes_by_category = group_by(file_changes, get_file_category) + if cmd == Command.LINK_WHOLE_PROGRAM: + link_whole_program( + dep_graph=dep_graph, + initial_nodes=initial_nodes, + link_cmd_out_file=args.link_cmd_out_file, + run_linker=args.run_linker, + lto_output_suffix=args.lto_output_suffix) + return + + file_changes_by_category: Dict[SourceFileCategory, List[str]] = group_by( + file_changes, get_file_category) + + # Same as file_changes_by_category, but with string values of categories instead of enum + # elements. + file_changes_by_category_str: Dict[str, List[str]] = {} for category, changes in file_changes_by_category.items(): logging.info("File changes in category '%s':", category) for change in sorted(changes): logging.info(" %s", change) + file_changes_by_category_str[category.value] = changes + updated_categories = set(file_changes_by_category.keys()) results: Set[Node] = set() - if cmd == LIST_AFFECTED_CMD: + if cmd == Command.AFFECTED: results = dep_graph.find_affected_nodes(set(initial_nodes), args.node_type) - elif cmd == LIST_DEPS_CMD: + elif cmd == Command.DEPS: for node in initial_nodes: results.update(node.deps) - elif cmd == LIST_REVERSE_DEPS_CMD: + elif cmd == Command.REVERSE_DEPS: for node in initial_nodes: results.update(node.reverse_deps) else: @@ -1645,8 +786,8 @@ def main() -> None: user_said_all_cpp_tests = get_bool_env_var('YB_RUN_ALL_CPP_TESTS') user_said_all_java_tests = get_bool_env_var('YB_RUN_ALL_JAVA_TESTS') - cpp_files_changed = 'c++' in updated_categories - java_files_changed = 'java' in updated_categories + cpp_files_changed = SourceFileCategory.CPP in updated_categories + java_files_changed = SourceFileCategory.JAVA in updated_categories yb_master_or_tserver_changed = bool(affected_basenames & set(['yb-master', 'yb-tserver'])) run_cpp_tests = select_all_tests_for_now or cpp_files_changed or user_said_all_cpp_tests @@ -1665,7 +806,7 @@ def main() -> None: else: logging.info( "All tests should be run based on file changes in these categories: {}".format( - ', '.join(sorted(unsafe_categories)))) + ', '.join(sorted([category.value for category in unsafe_categories])))) else: if run_cpp_tests: if user_said_all_cpp_tests: @@ -1690,7 +831,7 @@ def main() -> None: test_conf = dict( run_cpp_tests=run_cpp_tests, run_java_tests=run_java_tests, - file_changes_by_category=file_changes_by_category + file_changes_by_category=file_changes_by_category_str ) if test_filter_re: test_conf.update(test_filter_re=test_filter_re) @@ -1714,7 +855,7 @@ def main() -> None: logging.info("Wrote a test configuration to {}".format(args.output_test_config)) else: # For ad-hoc command-line use, mostly for testing and sanity-checking. 
- for node in sorted(results, key=lambda node: [node.node_type, node.path]): + for node in sorted(results, key=lambda node: [node.node_type.value, node.path]): print(node) logging.info("Found {} results".format(len(results))) diff --git a/python/yb/llvm_urls.py b/python/yb/llvm_urls.py index f4c06f12f21f..dd1224e458d8 100644 --- a/python/yb/llvm_urls.py +++ b/python/yb/llvm_urls.py @@ -61,4 +61,5 @@ def get_llvm_url(compiler_type: str) -> Optional[str]: raise ValueError("Ambiguous LLVM URLs: %s" % candidate_urls) if not candidate_urls: return None - return candidate_urls[0] + candidate_url = candidate_urls[0] + return candidate_url diff --git a/python/yb/lto.py b/python/yb/lto.py new file mode 100644 index 000000000000..03f58850d785 --- /dev/null +++ b/python/yb/lto.py @@ -0,0 +1,386 @@ +# Copyright (c) Yugabyte, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except +# in compliance with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software distributed under the License +# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express +# or implied. See the License for the specific language governing permissions and limitations +# under the License. + +import os +import time +import subprocess +import logging + +from typing import Iterable, Set, List, Optional, Tuple +from yb.dep_graph_common import ( + DependencyGraph, + Node, + NodeType, + DYLIB_SUFFIX, +) + +from yb.common_util import ( + read_file, + write_file, + shlex_join, +) +from yb.build_paths import BuildPaths + +from yugabyte_pycommon import WorkDirContext # type: ignore + +from pathlib import Path + + +SKIPPED_ARGS: Set[str] = { + '-fPIC', + '-lm', + '-lrt', + '-ldl', + '-latomic', + '-lcrypt', + '-pthread', + '-flto=thin', + '-flto=full', + '-Werror', + '-Wall', +} + + +def get_static_lib_paths(thirdparty_path: str) -> List[str]: + static_lib_paths = [] + for thirdparty_subdir in ['common', 'uninstrumented']: + for root, dirs, files in os.walk( + os.path.join(thirdparty_path, 'installed', thirdparty_subdir)): + for file_name in files: + if file_name.endswith('.a'): + static_lib_path = os.path.join(root, file_name) + static_lib_paths.append(static_lib_path) + if not static_lib_paths: + raise ValueError( + "Did not find any static libraries at third-party path %s" % thirdparty_path) + return static_lib_paths + + +def get_yb_pgbackend_link_cmd(build_root: str) -> Tuple[str, List[str]]: + """ + Returns a tuple containing the Postgres backend build directory and a list representation + of the linker command line for the yb_pgbackend library. + """ + pg_backend_build_dir = os.path.join(build_root, 'postgres_build', 'src', 'backend') + if not os.path.exists(pg_backend_build_dir): + raise IOError("Directory does not exist: %s" % pg_backend_build_dir) + + prefix_str = 'link_cmd_libyb_pgbackend.so.' + matched_files = [ + os.fspath(p) for p in Path(pg_backend_build_dir).glob('%s*' % prefix_str) + ] + if len(matched_files) != 1: + raise ValueError( + "Looking for the build command for the yb_pgbackend library, failed to find exactly " + "one file starting with %s in %s. Got %s" % ( + prefix_str, pg_backend_build_dir, matched_files)) + return pg_backend_build_dir, read_file(matched_files[0]).strip().split() + + +class LinkCommand: + """ + A wrapper around an array to manipulate a linker command line. 
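+
+    A minimal usage sketch (illustrative only, not part of the original change):
+
+        cmd = LinkCommand(['clang++'])
+        cmd.add_new_arg('-lm')         # dropped: '-lm' is listed in SKIPPED_ARGS
+        cmd.add_new_arg('foo.cc.o')    # kept
+        assert cmd.contains('foo.cc.o')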
+ """ + args: List[str] + + def __init__(self, initial_args: List[str] = []) -> None: + self.args = list(initial_args) + + def append(self, arg: str) -> None: + self.args.append(arg) + + def extend(self, more_args: List[str]) -> None: + self.args.extend(more_args) + + def add_new_arg(self, new_arg: str) -> None: + should_skip = ( + new_arg in SKIPPED_ARGS or + new_arg.startswith('--gcc-toolchain') + ) + + if should_skip: + logging.info("Skipping %s", new_arg) + return + + self.append(new_arg) + + def contains(self, arg: str) -> bool: + return arg in self.args + + +def process_original_link_cmd(original_link_cmd: List[str]) -> List[str]: + """ + Remove unneeded parts of the original linker command. + """ + if len(original_link_cmd) < 6: + raise ValueError("Original linker command is too short: %s" % shlex_join(original_link_cmd)) + assert original_link_cmd[0] == ':' + assert original_link_cmd[1] == '&&' + assert original_link_cmd[2].endswith('/compiler-wrappers/c++') + assert original_link_cmd[-2] == '&&' + assert original_link_cmd[-1] == ':' + + return original_link_cmd[3:-2] + + +def is_yb_library(rel_path: str) -> bool: + return rel_path.startswith(('lib/', 'postgres/lib/')) and rel_path.endswith(DYLIB_SUFFIX) + + +class LinkHelper: + """ + Helps with creating and running a custom linking command for YugabyteDB outside of the build + system. + """ + dep_graph: DependencyGraph + initial_node: Node + + # Arguments to the linker in the original command that produces the initial_node target. + # Does not include the compiler driver executable. + original_link_args: List[str] + + build_root: str + build_paths: BuildPaths + llvm_path: str + thirdparty_path: str + clang_cpp_path: str + + # Dependency graph nodes corresponding to the object files present in the original linker + # command. Used for deduplication. + obj_file_graph_nodes: Set[Node] + + new_args: LinkCommand + + # Build directory of the Postgres backend. + pg_backend_build_dir: str + + # The command for linking the yb_pgbackend library. + yb_pgbackend_link_cmd: List[str] + + lto_output_suffix: Optional[str] + + # Populated by consume_original_link_cmd. + final_output_name: str + + def __init__( + self, + dep_graph: DependencyGraph, + initial_node: Node, + lto_output_suffix: Optional[str]) -> None: + self.dep_graph = dep_graph + self.initial_node = initial_node + + self.build_root = self.dep_graph.conf.build_root + self.build_paths = BuildPaths(self.build_root) + + self.llvm_path = self.build_paths.get_llvm_path() + self.thirdparty_path = self.build_paths.get_thirdparty_path() + self.clang_cpp_path = self.build_paths.get_llvm_tool_path('clang++') + + assert initial_node.link_cmd + self.original_link_args = process_original_link_cmd(initial_node.link_cmd) + self.static_lib_paths = get_static_lib_paths(self.thirdparty_path) + self.new_args = LinkCommand([self.clang_cpp_path]) + self.obj_file_graph_nodes = set() + self.pg_backend_build_dir, self.yb_pgbackend_link_cmd = get_yb_pgbackend_link_cmd( + self.build_root) + + self.lto_output_suffix = lto_output_suffix + + def convert_to_static_lib(self, arg: str) -> Optional[str]: + """ + Given an argument to the original linker command, try to interpret it as a library, either + specified as a shared library path, or using -l... syntax, and return the corresponding + static library path if available. 
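+
+        Illustrative example (hypothetical paths): '/some/dir/libsnappy.so' would be replaced
+        with '/some/dir/libsnappy.a' or '/some/dir/libsnappy-s.a' if one of those files exists,
+        and '-lsnappy' would be resolved against the static libraries found under the
+        third-party directory.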
+        """
+        if arg.startswith('/') and arg.endswith('.so'):
+            arg_static_prefix = arg[:-3]
+
+            for suffix in ['.a', '-s.a']:
+                arg_static = arg_static_prefix + suffix
+                if os.path.exists(arg_static):
+                    logging.info("Using static library %s instead of shared library %s",
+                                 arg_static, arg)
+                    return arg_static
+            logging.info("Did not find static library corresponding to %s", arg)
+
+        if arg.startswith('-l'):
+            logging.info("Looking for static lib for: %s", arg)
+            lib_name = arg[2:]
+            for static_lib_path in self.static_lib_paths:
+                static_lib_basename = os.path.basename(static_lib_path)
+                if (static_lib_basename == 'lib' + lib_name + '.a' or
+                        static_lib_basename == 'lib' + lib_name + '-s.a'):
+                    logging.info("Found static lib for %s: %s", lib_name, static_lib_path)
+                    return static_lib_path
+            logging.info("Did not find a static lib for %s", lib_name)
+
+        if arg.endswith('.so') or '.so.' in arg:
+            logging.info("Still using a shared library: %s", arg)
+
+        return None
+
+    def process_arg(self, arg: str) -> None:
+        if arg in SKIPPED_ARGS:
+            logging.info("Skipping argument: %s", arg)
+            return
+
+        new_arg = self.convert_to_static_lib(arg)
+        if new_arg:
+            if not self.new_args.contains(new_arg):
+                self.new_args.add_new_arg(new_arg)
+        else:
+            self.new_args.add_new_arg(arg)
+
+    def consume_original_link_cmd(self) -> None:
+        """
+        Goes over the original linker command and reuses some of its arguments for the new command.
+        """
+        with WorkDirContext(self.build_root):
+            expect_output_name = False
+            output_name: Optional[str] = None
+            for arg in self.original_link_args:
+                if arg == '-o':
+                    expect_output_name = True
+                    continue
+                if expect_output_name:
+                    if output_name:
+                        raise ValueError(
+                            "Found multiple output names in the original link command: "
+                            "%s and %s" % (output_name, arg))
+                    output_name = arg
+                    expect_output_name = False
+                    continue
+                expect_output_name = False
+
+                if is_yb_library(arg):
+                    logging.info("Skipping YB library: %s", arg)
+                    continue
+
+                if arg.endswith('.cc.o'):
+                    # E.g. tablet_server_main.cc.o.
+                    # Remember this node for later deduplication.
+                    self.obj_file_graph_nodes.add(self.dep_graph.find_node(os.path.realpath(arg)))
+
+                self.process_arg(arg)
+
+            if not output_name:
+                raise ValueError("Did not find an output name in the original link command")
+            self.final_output_name = os.path.abspath(output_name)
+            if self.lto_output_suffix is not None:
+                self.final_output_name += self.lto_output_suffix
+            logging.info("Final output file name: %s", self.final_output_name)
+            self.new_args.extend(['-o', self.final_output_name])
+
+    def add_leaf_object_files(self) -> None:
+        """
+        Goes over all the object files that the original node transitively depends on, and adds
+        them to the link command if they have not already been added.
+        """
+
+        transitive_deps = self.initial_node.get_recursive_deps(
+            skip_node_types=set([NodeType.EXECUTABLE]))
+        with WorkDirContext(self.build_root):
+            # Sort nodes by path for determinism.
+            for node in sorted(list(transitive_deps), key=lambda dep: dep.path):
+                if node in self.obj_file_graph_nodes:
+                    # Dedup .cc.o files already existing on the command line.
+                    continue
+
+                if (node.node_type == NodeType.OBJECT and
+                        os.path.basename(os.path.dirname(node.path)) != 'yb_common_base.dir'):
+                    self.new_args.add_new_arg(node.path)
+
+            for arg in self.yb_pgbackend_link_cmd:
+                if arg.endswith('.o'):
+                    if os.path.basename(arg) == 'main_cpp_wrapper.cc.o':
+                        # TODO: why is this file even linked into libyb_pgbackend?
+                        continue
+                    self.new_args.append(os.path.join(self.pg_backend_build_dir, arg))
+                    continue
+                if (arg.startswith('-l') and
+                        not self.new_args.contains(arg) and
+                        not arg.startswith('-lyb_')):
+                    self.process_arg(arg)
+
+    def add_final_args(self) -> None:
+        self.new_args.extend([
+            '-L%s' % os.path.join(self.build_root, 'postgres', 'lib'),
+            '-l:libpgcommon.a',
+            '-l:libpgport.a',
+            '-l:libpq.a',
+            '-fwhole-program',
+            '-Wl,-v',
+            '-nostdlib++',
+            '-flto=full',
+        ])
+
+        for lib_name in ['libc++.a', 'libc++abi.a']:
+            self.new_args.append(os.path.join(
+                self.thirdparty_path, 'installed', 'uninstrumented', 'libcxx', 'lib', lib_name))
+
+        with WorkDirContext(self.build_root):
+            self.write_link_cmd_file(self.final_output_name + '_lto_link_cmd_args.txt')
+
+    def run_linker(self) -> None:
+        with WorkDirContext(self.build_root):
+            start_time_sec = time.time()
+            logging.info("Running linker")
+            try:
+                subprocess.check_call(self.new_args.args)
+            except subprocess.CalledProcessError as ex:
+                # Avoid printing the extremely long command line, but do not swallow the failure.
+                logging.error("Linker returned exit code %d", ex.returncode)
+                raise
+            elapsed_time_sec = time.time() - start_time_sec
+            logging.info("Linking finished in %.1f sec", elapsed_time_sec)
+
+    def write_link_cmd_file(self, out_path: str) -> None:
+        logging.info("Writing the linker command line (one argument per line) to %s",
+                     os.path.abspath(out_path))
+        write_file('\n'.join(self.new_args.args), out_path)
+
+
+def link_whole_program(
+        dep_graph: DependencyGraph,
+        initial_nodes: Iterable[Node],
+        link_cmd_out_file: Optional[str],
+        run_linker: bool,
+        lto_output_suffix: Optional[str]) -> None:
+    initial_node_list = list(initial_nodes)
+    assert len(initial_node_list) == 1
+    initial_node = initial_node_list[0]
+
+    link_helper = LinkHelper(
+        dep_graph=dep_graph,
+        initial_node=initial_node,
+        lto_output_suffix=lto_output_suffix
+    )
+
+    # We stop recursive traversal at executables because those are just code generators
+    # (protoc-gen-insertions, protoc-gen-yrpc, bfql_codegen, bfpg_codegen).
+
+    conf = dep_graph.conf
+
+    link_helper.consume_original_link_cmd()
+    link_helper.add_leaf_object_files()
+    link_helper.add_final_args()
+
+    with WorkDirContext(conf.build_root):
+        if link_cmd_out_file:
+            link_helper.write_link_cmd_file(link_cmd_out_file)
+        if not run_linker:
+            return
+        link_helper.run_linker()
diff --git a/python/yb/release_util.py b/python/yb/release_util.py
index 9e79ef29bf74..877f3f0d1d69 100644
--- a/python/yb/release_util.py
+++ b/python/yb/release_util.py
@@ -1,5 +1,5 @@
 """
-Copyright (c) YugaByte, Inc.
+Copyright (c) Yugabyte, Inc.
 
 This module provides utilities for generating and publishing release.
""" @@ -12,12 +12,18 @@ import shutil import sys import re -import distro +import distro # type: ignore from subprocess import call, check_output from xml.dom import minidom from yb.command_util import run_program, mkdir_p, copy_deep -from yb.common_util import get_thirdparty_dir, is_macos +from yb.common_util import ( + get_thirdparty_dir, + is_macos, + get_compiler_type_from_build_root, +) + +from typing import Dict, Any, Optional, cast, List RELEASE_MANIFEST_NAME = "yb_release_manifest.json" RELEASE_VERSION_FILE = "version.txt" @@ -26,23 +32,53 @@ class ReleaseUtil(object): """Packages a YugaByte package with the appropriate file naming schema.""" - def __init__(self, repository, build_type, distribution_path, force, commit, build_root, - package_name): + release_manifest: Dict[str, Any] + base_version: str + + repository: str + build_type: str + distribution_path: str + force: bool + commit: str + build_root: str + package_name: str + + def __init__( + self, + repository: str, + build_type: str, + distribution_path: str, + force: bool, + commit: Optional[str], + build_root: str, + package_name: str) -> None: + """ + :param repository: the path to YugabyteDB repository (also known as YB_SRC_ROOT). + :param build_type: build type such as "release". + :param distribution_path: the directory where to place the resulting archive. + :param force: whether to skip the prompt in case there are local uncommitted changes. + :param commit: the Git commit SHA1 to use. If not specified, it is autodetected. + :param build_root: the build root directory corresponding to the build type. + :param package_name: the name of the top-level section of yb_release_manifest.json, such as + "all" or "cli", specifying the set of files to include. + """ self.repo = repository self.build_type = build_type self.build_path = os.path.join(self.repo, 'build') self.distribution_path = distribution_path self.force = force self.commit = commit or ReleaseUtil.get_head_commit_hash() - self.base_version = None - with open(os.path.join(self.repo, RELEASE_VERSION_FILE)) as v: + + base_version = None + with open(os.path.join(self.repo, RELEASE_VERSION_FILE)) as version_file: # Remove any build number in the version.txt. 
- self.base_version = v.read().split("-")[0] - assert self.base_version is not None, \ + base_version = version_file.read().split("-")[0] + assert base_version is not None, \ 'Unable to read {0} file'.format(RELEASE_VERSION_FILE) + self.base_version = base_version - with open(os.path.join(self.repo, RELEASE_MANIFEST_NAME)) as f: - self.release_manifest = json.load(f)[package_name] + with open(os.path.join(self.repo, RELEASE_MANIFEST_NAME)) as release_manifest_file: + self.release_manifest = json.load(release_manifest_file)[package_name] assert self.release_manifest is not None, \ 'Unable to read {0} file'.format(RELEASE_MANIFEST_NAME) self.build_root = build_root @@ -52,13 +88,13 @@ def __init__(self, repository, build_type, distribution_path, force, commit, bui logging.info("Java project version from pom.xml: {}".format(self.java_project_version)) self._rewrite_manifest() - def get_release_manifest(self): + def get_release_manifest(self) -> Dict[str, Any]: return self.release_manifest - def get_seed_executable_patterns(self): - return self.release_manifest['bin'] + def get_seed_executable_patterns(self) -> List[str]: + return cast(List[str], self.release_manifest['bin']) - def expand_value(self, old_value): + def expand_value(self, old_value: str) -> str: """ Expand old_value with the following changes: - Replace ${project.version} with the Java version from pom.xml. @@ -82,7 +118,7 @@ def expand_value(self, old_value): old_value, new_value)) return new_value - def _rewrite_manifest(self): + def _rewrite_manifest(self) -> None: """ Rewrite the release manifest expanding values using expand_value function. """ @@ -94,7 +130,7 @@ def _rewrite_manifest(self): for i in range(len(values)): values[i] = self.expand_value(values[i]) - def repo_expand_path(self, path): + def repo_expand_path(self, path: str) -> str: """ If path is relative treat it as a path within repo and make it absolute. """ @@ -102,7 +138,7 @@ def repo_expand_path(self, path): path = os.path.join(self.repo, path) return path - def create_distribution(self, distribution_dir): + def create_distribution(self, distribution_dir: str) -> None: """This method would read the release_manifest and traverse through the build directory and copy necessary files/symlinks into the distribution_dir Args: @@ -127,7 +163,7 @@ def create_distribution(self, distribution_dir): os.path.join(current_dest_dir, os.path.basename(file_path))) logging.info("Created the distribution at '{}'".format(distribution_dir)) - def update_manifest(self, distribution_dir): + def update_manifest(self, distribution_dir: str) -> None: for release_subdir in ['bin']: if release_subdir in self.release_manifest: del self.release_manifest[release_subdir] @@ -145,10 +181,10 @@ def update_manifest(self, distribution_dir): json.dumps(self.release_manifest, indent=2, sort_keys=True)) @staticmethod - def get_head_commit_hash(): - return check_output(["git", "rev-parse", "HEAD"]).strip().decode() + def get_head_commit_hash() -> str: + return check_output(["git", "rev-parse", "HEAD"]).strip().decode('utf-8') - def get_release_file(self): + def get_release_file(self) -> str: """ This method does couple of checks before generating the release file name. - Checks if there are local uncommitted changes. @@ -159,7 +195,15 @@ def get_release_file(self): Returns: (string): Release file path. 
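+
+        For example, a hypothetical result could look like
+        build/yugabyte-2.11.2.0-abc1234-release-clang12-linux-x86_64.tar.gz
+        (base version, commit, build type, compiler type, OS, and architecture).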
""" - release_name = "{}-{}-{}".format(self.base_version, self.commit, self.build_type) + components: List[str] = [ + self.base_version, + self.commit, + self.build_type + ] + compiler_type = get_compiler_type_from_build_root(self.build_root) + if compiler_type != 'gcc': + components.append(compiler_type) + release_name = "-".join(components) system = platform.system().lower() if system == "linux": @@ -169,7 +213,10 @@ def get_release_file(self): release_name, system, platform.machine().lower()) return os.path.join(self.build_path, release_file_name) - def generate_release(self): + def generate_release(self) -> str: + """ + Generates a release package and returns the path to the release file. + """ yugabyte_folder_prefix = "yugabyte-{}".format(self.base_version) tmp_parent_dir = self.distribution_path + '.tmp_for_tar_gz' os.mkdir(tmp_parent_dir) @@ -182,7 +229,7 @@ def generate_release(self): tmp_distribution_dir = os.path.join(tmp_parent_dir, yugabyte_folder_prefix) shutil.move(self.distribution_path, tmp_distribution_dir) - def change_permissions(mode): + def change_permissions(mode: str) -> None: logging.info( "Changing permissions recursively on directory '%s': %s", tmp_distribution_dir, mode) @@ -206,7 +253,7 @@ def change_permissions(mode): os.rmdir(tmp_parent_dir) -def check_for_local_changes(): +def check_for_local_changes() -> None: is_dirty = False if check_output(["git", "diff", "origin/master"]).strip(): logging.error("Local changes exists. This shouldn't be an official release.") diff --git a/python/yb/source_files.py b/python/yb/source_files.py new file mode 100644 index 000000000000..cbecc1092108 --- /dev/null +++ b/python/yb/source_files.py @@ -0,0 +1,117 @@ +# Copyright (c) Yugabyte, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except +# in compliance with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software distributed under the License +# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express +# or implied. See the License for the specific language governing permissions and limitations +# under the License. + +import os + +from enum import Enum +from typing import Any, Set + + +# As of August 2019, there is nothing in the "bin", "managed" and "www" directories that +# is being used by tests. +# If that changes, this needs to be updated. Note that the "bin" directory here is the +# yugabyte/bin directory in the source tree, not the "bin" directory under the build +# directory, so it only has scripts and not yb-master / yb-tserver binaries. +DIRECTORIES_THAT_DO_NOT_AFFECT_TESTS = [ + 'architecture', + 'bin', + 'cloud', + 'community', + 'docs', + 'managed', + 'sample', + 'www', +] + + +class SourceFileCategory(Enum): + """ + We classify source files in code changes into these categories. 
+ """ + + CMAKE = 'cmake' + POSTGRES = 'postgres' + CPP = 'c++' + JAVA = 'java' + PYTHON = 'python' + DOES_NOT_AFFECT_TESTS = 'does_not_affect_tests' + BUILD_SCRIPTS = 'build_scripts' + OTHER = 'other' + + def __str__(self) -> str: + return self.value + + def __lt__(self, other: Any) -> bool: + assert isinstance(other, SourceFileCategory) + return self.value < other.value + + def __eq__(self, other: Any) -> bool: + assert isinstance(other, SourceFileCategory) + return self.value == other.value + + def __hash__(self) -> int: + return hash(self.value) + + +# File changes in any category other than these will cause all tests to be re-run. Even though +# changes to Python code affect the test framework, we can consider this Python code to be +# reasonably tested already, by running doctests, this script, the packaging script, etc. We can +# remove "python" from this whitelist in the future. +CATEGORIES_NOT_CAUSING_RERUN_OF_ALL_TESTS: Set[SourceFileCategory] = { + SourceFileCategory.JAVA, + SourceFileCategory.CPP, + SourceFileCategory.PYTHON, + SourceFileCategory.DOES_NOT_AFFECT_TESTS +} + + +def get_file_category(rel_path: str) -> SourceFileCategory: + """ + Categorize file changes so that we can decide what tests to run. + + @param rel_path: path relative to the source root (not to the build root) + + >>> get_file_category('src/postgres/src/backend/executor/execScan.c').value + 'postgres' + + """ + if os.path.isabs(rel_path): + raise IOError("Relative path expected, got an absolute path: %s" % rel_path) + basename = os.path.basename(rel_path) + + if rel_path.split(os.sep)[0] in DIRECTORIES_THAT_DO_NOT_AFFECT_TESTS: + return SourceFileCategory.DOES_NOT_AFFECT_TESTS + + if rel_path == 'yb_build.sh': + # The main build script is being run anyway, so we hope that most issues will come out at + # that stage. 
+ return SourceFileCategory.DOES_NOT_AFFECT_TESTS + + if basename == 'CMakeLists.txt' or basename.endswith('.cmake'): + return SourceFileCategory.CMAKE + + if rel_path.startswith('src/postgres'): + return SourceFileCategory.POSTGRES + + if rel_path.startswith('src/') or rel_path.startswith('ent/src/'): + return SourceFileCategory.CPP + + if rel_path.startswith('python/'): + return SourceFileCategory.PYTHON + + if rel_path.startswith('java/'): + return SourceFileCategory.JAVA + + if rel_path.startswith('build-support/'): + return SourceFileCategory.BUILD_SCRIPTS + + return SourceFileCategory.OTHER diff --git a/python/yb/thirdparty_tool.py b/python/yb/thirdparty_tool.py index a3a3d8d8fd08..96dec828a9dd 100644 --- a/python/yb/thirdparty_tool.py +++ b/python/yb/thirdparty_tool.py @@ -615,11 +615,19 @@ def main() -> None: if args.save_thirdparty_url_to_file: make_parent_dir(args.save_thirdparty_url_to_file) write_file(thirdparty_url, args.save_thirdparty_url_to_file) - if args.save_llvm_url_to_file and thirdparty_release.is_linuxbrew: + if (args.save_llvm_url_to_file and + thirdparty_release.compiler_type.startswith('clang') and + thirdparty_release.is_linuxbrew): llvm_url = get_llvm_url(thirdparty_release.compiler_type) if llvm_url is not None: + logging.info(f"Download URL for the LLVM toolchain: {llvm_url}") make_parent_dir(args.save_llvm_url_to_file) write_file(llvm_url, args.save_llvm_url_to_file) + else: + logging.info("Could not determine LLVM URL for compiler type %s" % + thirdparty_release.compiler_type) + else: + logging.info("Not a Linuxbrew URL, not saving LLVM URL to file") if __name__ == '__main__': diff --git a/scripts/installation/bin/yb-ctl b/scripts/installation/bin/yb-ctl index 55ddd98c075a..4e8428b51ddd 100755 --- a/scripts/installation/bin/yb-ctl +++ b/scripts/installation/bin/yb-ctl @@ -219,6 +219,14 @@ def validate_daemon_type(daemon_type): def get_binary_name_for_daemon_type(daemon_type): + binary_name = None + # Allow the user to use binaries of their choice for yb-master and/or yb-tserver. + if daemon_type == DAEMON_TYPE_MASTER: + binary_name = os.getenv('YB_CTL_MASTER_DAEMON_FILE_NAME') + elif daemon_type == DAEMON_TYPE_TSERVER: + binary_name = os.getenv('YB_CTL_TSERVER_DAEMON_FILE_NAME') + if binary_name is not None: + return binary_name return "yb-{}".format(daemon_type) @@ -318,8 +326,8 @@ def is_yugabyte_db_installation_dir(top_dir): return False return has_rel_paths( top_dir, [ - 'bin/yb-master', - 'bin/yb-tserver', + 'bin/' + get_binary_name_for_daemon_type(DAEMON_TYPE_MASTER), + 'bin/' + get_binary_name_for_daemon_type(DAEMON_TYPE_TSERVER), 'postgres/bin/postgres' ]) @@ -1225,8 +1233,8 @@ class ClusterControl: # Regex to get process based on daemon type and bind address. Note the explicit space after # rpc_bind_addresses: This is to be able to distinguish between bind addresses with the # same prefix like 127.0.0.1, 127.0.0.11, 127.0.0.12. 
- return "yb-{} .* --rpc_bind_addresses {} ".format( - daemon_id.daemon_type, + return "{} .* --rpc_bind_addresses {} ".format( + get_binary_name_for_daemon_type(daemon_id.daemon_type), self.options.get_ip_address(daemon_id)) def get_pid(self, daemon_id): diff --git a/src/yb/common/transaction.cc b/src/yb/common/transaction.cc index 717250a9dc9f..ebb804b6f21b 100644 --- a/src/yb/common/transaction.cc +++ b/src/yb/common/transaction.cc @@ -28,7 +28,7 @@ namespace yb { YB_STRONGLY_TYPED_UUID_IMPL(TransactionId); -const std::string kGlobalTransactionsTableName = "transactions"; +const char* kGlobalTransactionsTableName = "transactions"; const std::string kMetricsSnapshotsTableName = "metrics"; const std::string kTransactionTablePrefix = "transactions_"; diff --git a/src/yb/common/transaction.h b/src/yb/common/transaction.h index f3aa9aedf5da..7d5f5b08574c 100644 --- a/src/yb/common/transaction.h +++ b/src/yb/common/transaction.h @@ -320,7 +320,7 @@ std::ostream& operator<<(std::ostream& out, const TransactionMetadata& metadata) MonoDelta TransactionRpcTimeout(); CoarseTimePoint TransactionRpcDeadline(); -extern const std::string kGlobalTransactionsTableName; +extern const char* kGlobalTransactionsTableName; extern const std::string kMetricsSnapshotsTableName; extern const std::string kTransactionTablePrefix; diff --git a/src/yb/common/wire_protocol.h b/src/yb/common/wire_protocol.h index d23f01d9b848..e39c17499e6a 100644 --- a/src/yb/common/wire_protocol.h +++ b/src/yb/common/wire_protocol.h @@ -48,6 +48,7 @@ #include "yb/util/net/net_fwd.h" #include "yb/util/status_ec.h" #include "yb/util/type_traits.h" +#include "yb/util/result.h" namespace yb { diff --git a/src/yb/consensus/consensus_queue.cc b/src/yb/consensus/consensus_queue.cc index 941941c15871..1e43c3d02dc9 100644 --- a/src/yb/consensus/consensus_queue.cc +++ b/src/yb/consensus/consensus_queue.cc @@ -137,8 +137,6 @@ static bool RpcThrottleThresholdBytesValidator(const char* flagname, int32_t val } // namespace DECLARE_int32(rpc_throttle_threshold_bytes); -__attribute__((unused)) -DEFINE_validator(rpc_throttle_threshold_bytes, &RpcThrottleThresholdBytesValidator); namespace yb { namespace consensus { @@ -1607,5 +1605,20 @@ Result PeerMessageQueue::TEST_GetLastOpIdWithType( return log_cache_.TEST_GetLastOpIdWithType(max_allowed_index, op_type); } +Status ValidateFlags() { + // Normally we would have used + // DEFINE_validator(rpc_throttle_threshold_bytes, &RpcThrottleThresholdBytesValidator); + // right after defining the rpc_throttle_threshold_bytes flag. However, this leads to a segfault + // in the LTO-enabled build, presumably due to indeterminate order of static initialization. + // Instead, we invoke this function from master/tserver main() functions when static + // initialization is already finished. 
+ if (!RpcThrottleThresholdBytesValidator( + "rpc_throttle_threshold_bytes", FLAGS_rpc_throttle_threshold_bytes)) { + return STATUS(InvalidArgument, "Flag validation failed"); + } + + return Status::OK(); +} + } // namespace consensus } // namespace yb diff --git a/src/yb/consensus/consensus_queue.h b/src/yb/consensus/consensus_queue.h index 2b62b1696524..f5b6d73ff038 100644 --- a/src/yb/consensus/consensus_queue.h +++ b/src/yb/consensus/consensus_queue.h @@ -618,6 +618,8 @@ class PeerMessageQueueObserver { virtual ~PeerMessageQueueObserver() {} }; +Status ValidateFlags(); + } // namespace consensus } // namespace yb diff --git a/src/yb/gutil/atomicops-internals-tsan.h b/src/yb/gutil/atomicops-internals-tsan.h index b463b9704e8f..aafd8d8f1220 100644 --- a/src/yb/gutil/atomicops-internals-tsan.h +++ b/src/yb/gutil/atomicops-internals-tsan.h @@ -39,7 +39,7 @@ struct AtomicOps_x86CPUFeatureStruct { bool has_sse2; // Processor has SSE2. }; BASE_EXPORT extern struct AtomicOps_x86CPUFeatureStruct - AtomicOps_Internalx86CPUFeatures; + YbAtomicOps_Internalx86CPUFeatures; #define ATOMICOPS_COMPILER_BARRIER() __asm__ __volatile__("" : : : "memory") diff --git a/src/yb/gutil/atomicops-internals-x86.cc b/src/yb/gutil/atomicops-internals-x86.cc index 771696b366a4..44ae73777201 100644 --- a/src/yb/gutil/atomicops-internals-x86.cc +++ b/src/yb/gutil/atomicops-internals-x86.cc @@ -70,14 +70,14 @@ // Set the flags so that code will run correctly and conservatively // until InitGoogle() is called. -struct AtomicOps_x86CPUFeatureStruct AtomicOps_Internalx86CPUFeatures = { +struct AtomicOps_x86CPUFeatureStruct YbAtomicOps_Internalx86CPUFeatures = { false, // bug can't exist before process spawns multiple threads false, // no SSE2 false, // no cmpxchg16b }; -// Initialize the AtomicOps_Internalx86CPUFeatures struct. -static void AtomicOps_Internalx86CPUFeaturesInit() { +// Initialize the YbAtomicOps_Internalx86CPUFeatures struct. +static void YbAtomicOps_Internalx86CPUFeaturesInit() { uint32 eax; uint32 ebx; uint32 ecx; @@ -109,29 +109,29 @@ static void AtomicOps_Internalx86CPUFeaturesInit() { if (strcmp(vendor, "AuthenticAMD") == 0 && // AMD family == 15 && 32 <= model && model <= 63) { - AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug = true; + YbAtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug = true; } else { - AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug = false; + YbAtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug = false; } // edx bit 26 is SSE2 which we use to tell use whether we can use mfence - AtomicOps_Internalx86CPUFeatures.has_sse2 = ((edx >> 26) & 1); + YbAtomicOps_Internalx86CPUFeatures.has_sse2 = ((edx >> 26) & 1); // ecx bit 13 indicates whether the cmpxchg16b instruction is supported - AtomicOps_Internalx86CPUFeatures.has_cmpxchg16b = ((ecx >> 13) & 1); + YbAtomicOps_Internalx86CPUFeatures.has_cmpxchg16b = ((ecx >> 13) & 1); VLOG(1) << "vendor " << vendor << " family " << family << " model " << model << " amd_lock_mb_bug " << - AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug << - " sse2 " << AtomicOps_Internalx86CPUFeatures.has_sse2 << - " cmpxchg16b " << AtomicOps_Internalx86CPUFeatures.has_cmpxchg16b; + YbAtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug << + " sse2 " << YbAtomicOps_Internalx86CPUFeatures.has_sse2 << + " cmpxchg16b " << YbAtomicOps_Internalx86CPUFeatures.has_cmpxchg16b; } // AtomicOps initialisation routine for external use. 
void AtomicOps_x86CPUFeaturesInit() { - AtomicOps_Internalx86CPUFeaturesInit(); + YbAtomicOps_Internalx86CPUFeaturesInit(); } #endif diff --git a/src/yb/gutil/atomicops-internals-x86.h b/src/yb/gutil/atomicops-internals-x86.h index 3d11c7dc0c53..bea24943c972 100644 --- a/src/yb/gutil/atomicops-internals-x86.h +++ b/src/yb/gutil/atomicops-internals-x86.h @@ -60,7 +60,7 @@ struct AtomicOps_x86CPUFeatureStruct { bool has_sse2; // Processor has SSE2. bool has_cmpxchg16b; // Processor supports cmpxchg16b instruction. }; -extern struct AtomicOps_x86CPUFeatureStruct AtomicOps_Internalx86CPUFeatures; +extern struct AtomicOps_x86CPUFeatureStruct YbAtomicOps_Internalx86CPUFeatures; #define ATOMICOPS_COMPILER_BARRIER() __asm__ __volatile__("" : : : "memory") @@ -119,7 +119,7 @@ inline Atomic32 Acquire_AtomicExchange(volatile Atomic32* ptr, Atomic32 new_value) { CheckNaturalAlignment(ptr); Atomic32 old_val = NoBarrier_AtomicExchange(ptr, new_value); - if (AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { + if (YbAtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { __asm__ __volatile__("lfence" : : : "memory"); } return old_val; @@ -149,7 +149,7 @@ inline Atomic32 Barrier_AtomicIncrement(volatile Atomic32* ptr, : "+r" (temp), "+m" (*ptr) : : "memory"); // temp now holds the old value of *ptr - if (AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { + if (YbAtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { __asm__ __volatile__("lfence" : : : "memory"); } return temp + increment; @@ -159,7 +159,7 @@ inline Atomic32 Acquire_CompareAndSwap(volatile Atomic32* ptr, Atomic32 old_value, Atomic32 new_value) { Atomic32 x = NoBarrier_CompareAndSwap(ptr, old_value, new_value); - if (AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { + if (YbAtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { __asm__ __volatile__("lfence" : : : "memory"); } return x; @@ -200,7 +200,7 @@ inline void Acquire_Store(volatile Atomic32* ptr, Atomic32 value) { #else inline void MemoryBarrier() { - if (AtomicOps_Internalx86CPUFeatures.has_sse2) { + if (YbAtomicOps_Internalx86CPUFeatures.has_sse2) { __asm__ __volatile__("mfence" : : : "memory"); } else { // mfence is faster but not present on PIII Atomic32 x = 0; @@ -209,7 +209,7 @@ inline void MemoryBarrier() { } inline void Acquire_Store(volatile Atomic32* ptr, Atomic32 value) { - if (AtomicOps_Internalx86CPUFeatures.has_sse2) { + if (YbAtomicOps_Internalx86CPUFeatures.has_sse2) { CheckNaturalAlignment(ptr); *ptr = value; __asm__ __volatile__("mfence" : : : "memory"); @@ -274,7 +274,7 @@ inline Atomic64 NoBarrier_AtomicExchange(volatile Atomic64* ptr, inline Atomic64 Acquire_AtomicExchange(volatile Atomic64* ptr, Atomic64 new_value) { Atomic64 old_val = NoBarrier_AtomicExchange(ptr, new_value); - if (AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { + if (YbAtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { __asm__ __volatile__("lfence" : : : "memory"); } return old_val; @@ -304,7 +304,7 @@ inline Atomic64 Barrier_AtomicIncrement(volatile Atomic64* ptr, : "+r" (temp), "+m" (*ptr) : : "memory"); // temp now contains the previous value of *ptr - if (AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { + if (YbAtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { __asm__ __volatile__("lfence" : : : "memory"); } return temp + increment; @@ -420,7 +420,7 @@ inline Atomic64 Acquire_AtomicExchange(volatile Atomic64* ptr, Atomic64 new_val) { CheckNaturalAlignment(ptr); Atomic64 old_val = NoBarrier_AtomicExchange(ptr, new_val); - if 
(AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { + if (YbAtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { __asm__ __volatile__("lfence" : : : "memory"); } return old_val; @@ -448,7 +448,7 @@ inline Atomic64 Barrier_AtomicIncrement(volatile Atomic64* ptr, Atomic64 increment) { CheckNaturalAlignment(ptr); Atomic64 new_val = NoBarrier_AtomicIncrement(ptr, increment); - if (AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { + if (YbAtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { __asm__ __volatile__("lfence" : : : "memory"); } return new_val; @@ -510,7 +510,7 @@ inline Atomic64 Acquire_CompareAndSwap(volatile Atomic64* ptr, Atomic64 old_value, Atomic64 new_value) { Atomic64 x = NoBarrier_CompareAndSwap(ptr, old_value, new_value); - if (AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { + if (YbAtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) { __asm__ __volatile__("lfence" : : : "memory"); } return x; diff --git a/src/yb/gutil/dynamic_annotations.c b/src/yb/gutil/dynamic_annotations.c index 14a813769b70..d84484e37e8f 100644 --- a/src/yb/gutil/dynamic_annotations.c +++ b/src/yb/gutil/dynamic_annotations.c @@ -141,7 +141,7 @@ void AnnotateFlushState(const char *file, int line){} #if DYNAMIC_ANNOTATIONS_EXTERNAL_IMPL == 0 -static int GetRunningOnValgrind(void) { +static int GetYbRunningOnValgrind(void) { #ifdef RUNNING_ON_VALGRIND if (RUNNING_ON_VALGRIND) return 1; #endif @@ -153,24 +153,24 @@ static int GetRunningOnValgrind(void) { } /* See the comments in dynamic_annotations.h */ -int RunningOnValgrind(void) { +int YbRunningOnValgrind(void) { static volatile int running_on_valgrind = -1; int local_running_on_valgrind = running_on_valgrind; /* C doesn't have thread-safe initialization of statics, and we don't want to depend on pthread_once here, so hack it. */ ANNOTATE_BENIGN_RACE(&running_on_valgrind, "safe hack"); if (local_running_on_valgrind == -1) - running_on_valgrind = local_running_on_valgrind = GetRunningOnValgrind(); + running_on_valgrind = local_running_on_valgrind = GetYbRunningOnValgrind(); return local_running_on_valgrind; } /* See the comments in dynamic_annotations.h */ -double ValgrindSlowdown(void) { - /* Same initialization hack as in RunningOnValgrind(). */ +double YbValgrindSlowdown(void) { + /* Same initialization hack as in YbRunningOnValgrind(). */ static volatile double slowdown = 0.0; double local_slowdown = slowdown; ANNOTATE_BENIGN_RACE(&slowdown, "safe hack"); - if (RunningOnValgrind() == 0) { + if (YbRunningOnValgrind() == 0) { return 1.0; } if (local_slowdown == 0.0) { diff --git a/src/yb/gutil/dynamic_annotations.h b/src/yb/gutil/dynamic_annotations.h index 9fbc73e5d2e6..23e149450274 100644 --- a/src/yb/gutil/dynamic_annotations.h +++ b/src/yb/gutil/dynamic_annotations.h @@ -419,7 +419,7 @@ default crosstool/GCC supports these GCC attributes. 
*/ #define ANNOTALYSIS_STATIC_INLINE -#define ANNOTALYSIS_SEMICOLON_OR_EMPTY_BODY ; +#define ANNOTALYSIS_SEMICOLON_OR_EMPTY_BODY ; // NOLINT #define ANNOTALYSIS_IGNORE_READS_BEGIN #define ANNOTALYSIS_IGNORE_READS_END #define ANNOTALYSIS_IGNORE_WRITES_BEGIN @@ -488,11 +488,11 @@ extern "C" { void AnnotateRWLockCreate(const char *file, int line, void *lock); void AnnotateRWLockCreateStatic(const char *file, int line, void *lock); void AnnotateRWLockDestroy(const char *file, int line, void *lock); -void AnnotateRWLockAcquired(const char *file, int line, void *lock, long is_w); -void AnnotateRWLockReleased(const char *file, int line, void *lock, long is_w); +void AnnotateRWLockAcquired(const char *file, int line, void *lock, long is_w); // NOLINT +void AnnotateRWLockReleased(const char *file, int line, void *lock, long is_w); // NOLINT void AnnotateBarrierInit(const char *file, int line, - const volatile void *barrier, long count, - long reinitialization_allowed); + const volatile void *barrier, long count, // NOLINT + long reinitialization_allowed); // NOLINT void AnnotateBarrierWaitBefore(const char *file, int line, const volatile void *barrier); void AnnotateBarrierWaitAfter(const char *file, int line, @@ -508,10 +508,10 @@ void AnnotateCondVarSignalAll(const char *file, int line, const volatile void *cv); void AnnotatePublishMemoryRange(const char *file, int line, const volatile void *address, - long size); + long size); // NOLINT void AnnotateUnpublishMemoryRange(const char *file, int line, const volatile void *address, - long size); + long size); // NOLINT void AnnotatePCQCreate(const char *file, int line, const volatile void *pcq); void AnnotatePCQDestroy(const char *file, int line, @@ -529,7 +529,7 @@ void AnnotateBenignRace(const char *file, int line, const char *description); void AnnotateBenignRaceSized(const char *file, int line, const volatile void *address, - long size, + long size, // NOLINT const char *description); void AnnotateMutexIsUsedAsCondVar(const char *file, int line, const volatile void *mu); @@ -568,23 +568,23 @@ void AnnotateFlushState(const char *file, int line); If for some reason you can't use "valgrind.h" or want to fake valgrind, there are two ways to make this function return non-zero: - Use environment variable: export RUNNING_ON_VALGRIND=1 - - Make your tool intercept the function RunningOnValgrind() and + - Make your tool intercept the function YbRunningOnValgrind() and change its return value. */ -int RunningOnValgrind(void); +int YbRunningOnValgrind(void); -/* ValgrindSlowdown returns: - * 1.0, if (RunningOnValgrind() == 0) - * 50.0, if (RunningOnValgrind() != 0 && getenv("VALGRIND_SLOWDOWN") == NULL) +/* YbValgrindSlowdown returns: + * 1.0, if (YbRunningOnValgrind() == 0) + * 50.0, if (YbRunningOnValgrind() != 0 && getenv("VALGRIND_SLOWDOWN") == NULL) * atof(getenv("VALGRIND_SLOWDOWN")) otherwise This function can be used to scale timeout values: EXAMPLE: for (;;) { DoExpensiveBackgroundTask(); - SleepForSeconds(5 * ValgrindSlowdown()); + SleepForSeconds(5 * YbValgrindSlowdown()); } */ -double ValgrindSlowdown(void); +double YbValgrindSlowdown(void); /* AddressSanitizer annotations from LLVM asan_interface.h */ @@ -647,7 +647,7 @@ void __asan_set_death_callback(void (*callback)(void)); one can use ... 
= ANNOTATE_UNPROTECTED_READ(x); */ template - inline T ANNOTATE_UNPROTECTED_READ(const volatile T &x) + inline T ANNOTATE_UNPROTECTED_READ(const volatile T &x) // NOLINT ANNOTALYSIS_UNPROTECTED_READ { ANNOTATE_IGNORE_READS_BEGIN(); T res = x; @@ -666,7 +666,7 @@ void __asan_set_death_callback(void (*callback)(void)); } \ }; \ static static_var ## _annotator the ## static_var ## _annotator;\ - } + } // namespace template class UnprotectedWriter { @@ -753,7 +753,7 @@ void __asan_set_death_callback(void (*callback)(void)); #if defined(__cplusplus) #undef ANNOTATE_UNPROTECTED_READ template - inline T ANNOTATE_UNPROTECTED_READ(const volatile T &x) + inline T ANNOTATE_UNPROTECTED_READ(const volatile T &x) // NOLINT ANNOTALYSIS_UNPROTECTED_READ { ANNOTATE_IGNORE_READS_BEGIN(); T res = x; @@ -797,7 +797,7 @@ void __asan_set_death_callback(void (*callback)(void)); #if defined(__cplusplus) #undef ANNOTATE_UNPROTECTED_READ template - inline T ANNOTATE_UNPROTECTED_READ(const volatile T &x) { + inline T ANNOTATE_UNPROTECTED_READ(const volatile T &x) { // NOLINT ANNOTATE_IGNORE_READS_BEGIN(); T res = x; ANNOTATE_IGNORE_READS_END(); diff --git a/src/yb/gutil/once.cc b/src/yb/gutil/once.cc index 45924935f588..41c43f374d77 100644 --- a/src/yb/gutil/once.cc +++ b/src/yb/gutil/once.cc @@ -39,15 +39,15 @@ void GoogleOnceInternalInit(Atomic32 *control, void (*func)(), "or there's a memory corruption."; } } - static const base::internal::SpinLockWaitTransition trans[] = { + static const yb::base::internal::SpinLockWaitTransition trans[] = { { GOOGLE_ONCE_INTERNAL_INIT, GOOGLE_ONCE_INTERNAL_RUNNING, true }, { GOOGLE_ONCE_INTERNAL_RUNNING, GOOGLE_ONCE_INTERNAL_WAITER, false }, { GOOGLE_ONCE_INTERNAL_DONE, GOOGLE_ONCE_INTERNAL_DONE, true } }; // Short circuit the simplest case to avoid procedure call overhead. - if (base::subtle::Acquire_CompareAndSwap(control, GOOGLE_ONCE_INTERNAL_INIT, + if (::base::subtle::Acquire_CompareAndSwap(control, GOOGLE_ONCE_INTERNAL_INIT, GOOGLE_ONCE_INTERNAL_RUNNING) == GOOGLE_ONCE_INTERNAL_INIT || - base::internal::SpinLockWait(control, ARRAYSIZE(trans), trans) == + yb::base::internal::SpinLockWait(control, ARRAYSIZE(trans), trans) == GOOGLE_ONCE_INTERNAL_INIT) { if (func != nullptr) { (*func)(); @@ -56,9 +56,9 @@ void GoogleOnceInternalInit(Atomic32 *control, void (*func)(), } ANNOTATE_HAPPENS_BEFORE(control); int64 old_control = base::subtle::NoBarrier_Load(control); - base::subtle::Release_Store(control, GOOGLE_ONCE_INTERNAL_DONE); + ::base::subtle::Release_Store(control, GOOGLE_ONCE_INTERNAL_DONE); if (old_control == GOOGLE_ONCE_INTERNAL_WAITER) { - base::internal::SpinLockWake(control, true); + yb::base::internal::SpinLockWake(control, true); } } // else *control is already GOOGLE_ONCE_INTERNAL_DONE } diff --git a/src/yb/gutil/spinlock.cc b/src/yb/gutil/spinlock.cc index 99ee0232a146..ba9707cf238b 100644 --- a/src/yb/gutil/spinlock.cc +++ b/src/yb/gutil/spinlock.cc @@ -148,8 +148,7 @@ void SpinLock::SlowLock() { } // Wait for an OS specific delay. - base::internal::SpinLockDelay(&lockword_, lock_value, - ++lock_wait_call_count); + yb::base::internal::SpinLockDelay(&lockword_, lock_value, ++lock_wait_call_count); // Spin again after returning from the wait routine to give this thread // some chance of obtaining the lock. 
lock_value = SpinLoop(wait_start_time, &wait_cycles); @@ -169,7 +168,7 @@ void SpinLock::SlowLock() { enum { PROFILE_TIMESTAMP_SHIFT = 7 }; void SpinLock::SlowUnlock(uint64 wait_cycles) { - base::internal::SpinLockWake(&lockword_, false); // wake waiter if necessary + yb::base::internal::SpinLockWake(&lockword_, false); // wake waiter if necessary // Collect contentionz profile info, expanding the wait_cycles back out to // the full value. If wait_cycles is <= kSpinLockSleeper, then no wait diff --git a/src/yb/gutil/spinlock_internal.cc b/src/yb/gutil/spinlock_internal.cc index 322d921d84a8..3a4f2157fd4d 100644 --- a/src/yb/gutil/spinlock_internal.cc +++ b/src/yb/gutil/spinlock_internal.cc @@ -45,21 +45,27 @@ */ // The OS-specific header included below must provide two calls: -// base::internal::SpinLockDelay() and base::internal::SpinLockWake(). +// yb::base::internal::SpinLockDelay() and yb::base::internal::SpinLockWake(). // See spinlock_internal.h for the spec of SpinLockWake(). // void SpinLockDelay(volatile Atomic32 *w, int32 value, int loop) // SpinLockDelay() generates an apprproate spin delay on iteration "loop" of a // spin loop on location *w, whose previously observed value was "value". // SpinLockDelay() may do nothing, may yield the CPU, may sleep a clock tick, -// or may wait for a delay that can be truncated by a call to SpinlockWake(w). -// In all cases, it must return in bounded time even if SpinlockWake() is not +// or may wait for a delay that can be truncated by a call to SpinLockWake(w). +// In all cases, it must return in bounded time even if SpinLockWake() is not // called. #include "yb/gutil/spinlock_internal.h" // forward declaration for use by spinlock_*-inl.h -namespace base { namespace internal { static int SuggestedDelayNS(int loop); }} +namespace yb { +namespace base { +namespace internal { +static int SuggestedDelayNS(int loop); +} // namespace internal +} // namespace base +} // namespace yb #if defined(_WIN32) #include "yb/gutil/spinlock_win32-inl.h" @@ -69,6 +75,7 @@ namespace base { namespace internal { static int SuggestedDelayNS(int loop); }} #include "yb/gutil/spinlock_posix-inl.h" #endif +namespace yb { namespace base { namespace internal { @@ -78,14 +85,14 @@ int32 SpinLockWait(volatile Atomic32 *w, int n, int32 v; bool done = false; for (int loop = 0; !done; loop++) { - v = base::subtle::Acquire_Load(w); + v = ::base::subtle::Acquire_Load(w); int i; for (i = 0; i != n && v != trans[i].from; i++) { } if (i == n) { SpinLockDelay(w, v, loop); // no matching transition } else if (trans[i].to == v || // null transition - base::subtle::Acquire_CompareAndSwap(w, v, trans[i].to) == v) { + ::base::subtle::Acquire_CompareAndSwap(w, v, trans[i].to) == v) { done = trans[i].done; } } @@ -97,10 +104,10 @@ static int SuggestedDelayNS(int loop) { // Weak pseudo-random number generator to get some spread between threads // when many are spinning. #ifdef BASE_HAS_ATOMIC64 - static base::subtle::Atomic64 rand; - uint64 r = base::subtle::NoBarrier_Load(&rand); + static ::base::subtle::Atomic64 rand; + uint64 r = ::base::subtle::NoBarrier_Load(&rand); r = 0x5deece66dLL * r + 0xb; // numbers from nrand48() - base::subtle::NoBarrier_Store(&rand, r); + ::base::subtle::NoBarrier_Store(&rand, r); r <<= 16; // 48-bit random number now in top 48-bits. 
if (loop < 0 || loop > 32) { // limit loop to 0..32 @@ -115,9 +122,9 @@ static int SuggestedDelayNS(int loop) { return static_cast(r >> (44 - (loop >> 3))); #else static Atomic32 rand; - uint32 r = base::subtle::NoBarrier_Load(&rand); + uint32 r = ::base::subtle::NoBarrier_Load(&rand); r = 0x343fd * r + 0x269ec3; // numbers from MSVC++ - base::subtle::NoBarrier_Store(&rand, r); + ::base::subtle::NoBarrier_Store(&rand, r); r <<= 1; // 31-bit random number now in top 31-bits. if (loop < 0 || loop > 32) { // limit loop to 0..32 @@ -135,3 +142,4 @@ static int SuggestedDelayNS(int loop) { } // namespace internal } // namespace base +} // namespace yb diff --git a/src/yb/gutil/spinlock_internal.h b/src/yb/gutil/spinlock_internal.h index bfc2790258cb..a983097d4e6f 100644 --- a/src/yb/gutil/spinlock_internal.h +++ b/src/yb/gutil/spinlock_internal.h @@ -54,9 +54,9 @@ #include "yb/gutil/atomicops.h" #include "yb/gutil/integral_types.h" +namespace yb { namespace base { namespace internal { - // SpinLockWait() waits until it can perform one of several transitions from // "from" to "to". It returns when it performs a transition where done==true. struct SpinLockWaitTransition { @@ -76,5 +76,6 @@ void SpinLockDelay(volatile Atomic32 *w, int32 value, int loop); } // namespace internal } // namespace base +} // namespace yb #endif // YB_GUTIL_SPINLOCK_INTERNAL_H diff --git a/src/yb/gutil/spinlock_linux-inl.h b/src/yb/gutil/spinlock_linux-inl.h index 2644fe3c8b8a..9a9e40837f29 100644 --- a/src/yb/gutil/spinlock_linux-inl.h +++ b/src/yb/gutil/spinlock_linux-inl.h @@ -77,6 +77,7 @@ static struct InitModule { } // anonymous namespace +namespace yb { namespace base { namespace internal { @@ -86,7 +87,7 @@ void SpinLockDelay(volatile Atomic32 *w, int32 value, int loop) { struct timespec tm; tm.tv_sec = 0; if (have_futex) { - tm.tv_nsec = base::internal::SuggestedDelayNS(loop); + tm.tv_nsec = yb::base::internal::SuggestedDelayNS(loop); } else { tm.tv_nsec = 2000001; // above 2ms so linux 2.4 doesn't spin } @@ -113,3 +114,4 @@ void SpinLockWake(volatile Atomic32 *w, bool all) { } // namespace internal } // namespace base +} // namespace yb diff --git a/src/yb/gutil/spinlock_posix-inl.h b/src/yb/gutil/spinlock_posix-inl.h index d43db6a30a56..74a0257c96ff 100644 --- a/src/yb/gutil/spinlock_posix-inl.h +++ b/src/yb/gutil/spinlock_posix-inl.h @@ -47,12 +47,16 @@ * This file is a Posix-specific part of spinlock_internal.cc */ +#ifndef YB_GUTIL_SPINLOCK_POSIX_INL_H +#define YB_GUTIL_SPINLOCK_POSIX_INL_H + #include #if defined(HAVE_SCHED_H) || defined(__APPLE__) #include /* For sched_yield() */ #endif #include /* For nanosleep() */ +namespace yb { namespace base { namespace internal { @@ -64,7 +68,7 @@ void SpinLockDelay(volatile Atomic32 *w, int32 value, int loop) { } else { struct timespec tm; tm.tv_sec = 0; - tm.tv_nsec = base::internal::SuggestedDelayNS(loop); + tm.tv_nsec = yb::base::internal::SuggestedDelayNS(loop); nanosleep(&tm, NULL); } errno = save_errno; @@ -75,3 +79,6 @@ void SpinLockWake(volatile Atomic32 *w, bool all) { } // namespace internal } // namespace base +} // namespace yb + +#endif // YB_GUTIL_SPINLOCK_POSIX_INL_H diff --git a/src/yb/gutil/spinlock_win32-inl.h b/src/yb/gutil/spinlock_win32-inl.h index d5d0b5f51c43..b375ccf940d6 100644 --- a/src/yb/gutil/spinlock_win32-inl.h +++ b/src/yb/gutil/spinlock_win32-inl.h @@ -47,9 +47,12 @@ * This file is a Win32-specific part of spinlock_internal.cc */ +#ifndef YB_GUTIL_SPINLOCK_WIN32_INL_H +#define YB_GUTIL_SPINLOCK_WIN32_INL_H #include +namespace yb { 
namespace base { namespace internal { @@ -67,3 +70,6 @@ void SpinLockWake(volatile Atomic32 *w, bool all) { } // namespace internal } // namespace base +} // namespace yb + +#endif // YB_GUTIL_SPINLOCK_WIN32_INL_H diff --git a/src/yb/gutil/sysinfo.cc b/src/yb/gutil/sysinfo.cc index 343f911cf3e9..ffbecd74f5d9 100644 --- a/src/yb/gutil/sysinfo.cc +++ b/src/yb/gutil/sysinfo.cc @@ -200,7 +200,7 @@ static int ReadMaxCPUIndex() { static void InitializeSystemInfo() { bool saw_mhz = false; - if (RunningOnValgrind()) { + if (YbRunningOnValgrind()) { // Valgrind may slow the progress of time artificially (--scale-time=N // option). We thus can't rely on CPU Mhz info stored in /sys or /proc // files. Thus, actually measure the cps. diff --git a/src/yb/master/master_main.cc b/src/yb/master/master_main.cc index 1a28933b1374..f53b2cc67cc1 100644 --- a/src/yb/master/master_main.cc +++ b/src/yb/master/master_main.cc @@ -37,6 +37,7 @@ #include "yb/common/wire_protocol.h" #include "yb/consensus/log_util.h" +#include "yb/consensus/consensus_queue.h" #include "yb/gutil/sysinfo.h" @@ -53,6 +54,9 @@ #include "yb/util/mem_tracker.h" #include "yb/util/result.h" #include "yb/util/ulimit_util.h" +#include "yb/util/debug/trace_event.h" + +#include "yb/tserver/server_main_util.h" DECLARE_bool(callhome_enabled); DECLARE_bool(evict_failed_followers); @@ -110,36 +114,13 @@ static int MasterMain(int argc, char** argv) { // the desired replication factor. (It's not turtles all the way down!) FLAGS_evict_failed_followers = false; - // Only write FATALs by default to stderr. - FLAGS_stderrthreshold = google::FATAL; - - // Do not sync GLOG to disk for INFO, WARNING. - // ERRORs, and FATALs will still cause a sync to disk. - FLAGS_logbuflevel = google::GLOG_WARNING; - ParseCommandLineFlags(&argc, &argv, true); - if (argc != 1) { - std::cerr << "usage: " << argv[0] << std::endl; - return 1; - } - LOG_AND_RETURN_FROM_MAIN_NOT_OK(log::ModifyDurableWriteFlagIfNotODirect()); - LOG_AND_RETURN_FROM_MAIN_NOT_OK(InitYB(MasterOptions::kServerType, argv[0])); - LOG(INFO) << "NumCPUs determined to be: " << base::NumCPUs(); - - MemTracker::SetTCMallocCacheMemory(); - - LOG_AND_RETURN_FROM_MAIN_NOT_OK(GetPrivateIpMode()); + LOG_AND_RETURN_FROM_MAIN_NOT_OK(MasterTServerParseFlagsAndInit( + MasterOptions::kServerType, &argc, &argv)); auto opts_result = MasterOptions::CreateMasterOptions(); LOG_AND_RETURN_FROM_MAIN_NOT_OK(opts_result); enterprise::Master server(*opts_result); - if (FLAGS_remote_boostrap_rate_limit_bytes_per_sec > 0) { - LOG(WARNING) << "Flag remote_boostrap_rate_limit_bytes_per_sec has been deprecated. " - << "Use remote_bootstrap_rate_limit_bytes_per_sec flag instead"; - FLAGS_remote_bootstrap_rate_limit_bytes_per_sec = - FLAGS_remote_boostrap_rate_limit_bytes_per_sec; - } - SetDefaultInitialSysCatalogSnapshotFlags(); // ============================================================================================== diff --git a/src/yb/server/webserver_options.cc b/src/yb/server/webserver_options.cc index 1624fb673d93..b0e8ee04beac 100644 --- a/src/yb/server/webserver_options.cc +++ b/src/yb/server/webserver_options.cc @@ -60,7 +60,9 @@ DEFINE_string( "present in the list of comma separated rpc_bind_addresses"); TAG_FLAG(webserver_interface, advanced); -DEFINE_string(webserver_doc_root, yb::GetDefaultDocumentRoot(), +// We use an empty default value here because we can't call GetDefaultDocumentRoot from flag +// initilization. Instead, we call GetDefaultDocumentRoot if we find that the flag is empty. 
+DEFINE_string(webserver_doc_root, "", "Files under are accessible via the debug webserver. " "Defaults to $YB_HOME/www, or if $YB_HOME is not set, disables the document " "root"); @@ -98,12 +100,13 @@ static string GetDefaultDocumentRoot() { WebserverOptions::WebserverOptions() : bind_interface(FLAGS_webserver_interface), port(FLAGS_webserver_port), - doc_root(FLAGS_webserver_doc_root), enable_doc_root(FLAGS_webserver_enable_doc_root), certificate_file(FLAGS_webserver_certificate_file), authentication_domain(FLAGS_webserver_authentication_domain), password_file(FLAGS_webserver_password_file), num_worker_threads(FLAGS_webserver_num_worker_threads) { + doc_root = FLAGS_webserver_doc_root.empty() ? + GetDefaultDocumentRoot() : FLAGS_webserver_doc_root; } } // namespace yb diff --git a/src/yb/tserver/CMakeLists.txt b/src/yb/tserver/CMakeLists.txt index 6c9876329cec..8c8b4bfc57d8 100644 --- a/src/yb/tserver/CMakeLists.txt +++ b/src/yb/tserver/CMakeLists.txt @@ -208,6 +208,7 @@ set(TSERVER_SRCS ts_tablet_manager.cc tserver-path-handlers.cc tserver_metrics_heartbeat_data_provider.cc + server_main_util.cc ${TSERVER_SRCS_EXTENSIONS}) set(TSERVER_DEPS diff --git a/src/yb/tserver/server_main_util.cc b/src/yb/tserver/server_main_util.cc new file mode 100644 index 000000000000..fa0fc8dfb9f6 --- /dev/null +++ b/src/yb/tserver/server_main_util.cc @@ -0,0 +1,79 @@ +// Copyright (c) Yugabyte, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except +// in compliance with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software distributed under the License +// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express +// or implied. See the License for the specific language governing permissions and limitations +// under the License. +// + +#include "yb/tserver/server_main_util.h" + +#include + +#include "yb/util/init.h" +#include "yb/util/flags.h" +#include "yb/util/status.h" +#include "yb/util/debug/trace_event.h" +#include "yb/util/mem_tracker.h" + +#include "yb/common/wire_protocol.h" + +#include "yb/server/skewed_clock.h" + +#include "yb/consensus/log_util.h" +#include "yb/consensus/consensus_queue.h" + +DECLARE_int64(remote_bootstrap_rate_limit_bytes_per_sec); + +// Deprecated because it's misspelled. But if set, this flag takes precedence over +// remote_bootstrap_rate_limit_bytes_per_sec for compatibility. +DECLARE_int64(remote_boostrap_rate_limit_bytes_per_sec); + +namespace yb { + +Status MasterTServerParseFlagsAndInit(const std::string& server_type, int* argc, char*** argv) { + debug::EnableTraceEvents(); + + // Do not sync GLOG to disk for INFO, WARNING. + // ERRORs, and FATALs will still cause a sync to disk. + FLAGS_logbuflevel = google::GLOG_WARNING; + + // Only write FATALs by default to stderr. 
+ FLAGS_stderrthreshold = google::FATAL; + + server::SkewedClock::Register(); + + ParseCommandLineFlags(argc, argv, /* remove_flag= */ true); + if (*argc != 1) { + std::cerr << "usage: " << (*argv)[0] << std::endl; + return STATUS(InvalidArgument, "Error parsing command-line flags"); + } + + RETURN_NOT_OK(log::ModifyDurableWriteFlagIfNotODirect()); + + RETURN_NOT_OK(InitYB(server_type, (*argv)[0])); + + RETURN_NOT_OK(consensus::ValidateFlags()); + + if (FLAGS_remote_boostrap_rate_limit_bytes_per_sec > 0) { + LOG(WARNING) << "Flag remote_boostrap_rate_limit_bytes_per_sec has been deprecated. " + << "Use remote_bootstrap_rate_limit_bytes_per_sec flag instead"; + FLAGS_remote_bootstrap_rate_limit_bytes_per_sec = + FLAGS_remote_boostrap_rate_limit_bytes_per_sec; + } + + RETURN_NOT_OK(GetPrivateIpMode()); + + LOG(INFO) << "NumCPUs determined to be: " << base::NumCPUs(); + + MemTracker::SetTCMallocCacheMemory(); + + return Status::OK(); +} + +} // namespace yb diff --git a/src/yb/tserver/server_main_util.h b/src/yb/tserver/server_main_util.h new file mode 100644 index 000000000000..553f9a89f204 --- /dev/null +++ b/src/yb/tserver/server_main_util.h @@ -0,0 +1,29 @@ +// Copyright (c) Yugabyte, Inc. +// +// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except +// in compliance with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software distributed under the License +// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express +// or implied. See the License for the specific language governing permissions and limitations +// under the License. +// + +// Utilities used in main functions of server processes. + +#ifndef YB_TSERVER_SERVER_MAIN_UTIL_H +#define YB_TSERVER_SERVER_MAIN_UTIL_H + +#include + +#include "yb/util/status.h" + +namespace yb { + +Status MasterTServerParseFlagsAndInit(const std::string& server_type, int* argc, char*** argv); + +} // namespace yb + +#endif // YB_TSERVER_SERVER_MAIN_UTIL_H diff --git a/src/yb/tserver/tablet_server_main.cc b/src/yb/tserver/tablet_server_main.cc index 677880ac1f14..6644cfc8c882 100644 --- a/src/yb/tserver/tablet_server_main.cc +++ b/src/yb/tserver/tablet_server_main.cc @@ -41,6 +41,7 @@ #endif #include "yb/consensus/log_util.h" +#include "yb/consensus/consensus_queue.h" #include "yb/encryption/header_manager_impl.h" #include "yb/encryption/encrypted_file_factory.h" @@ -68,9 +69,12 @@ #include "yb/util/size_literals.h" #include "yb/util/net/net_util.h" #include "yb/util/status_log.h" +#include "yb/util/debug/trace_event.h" #include "yb/rocksutil/rocksdb_encrypted_file_factory.h" +#include "yb/tserver/server_main_util.h" + using namespace std::placeholders; using yb::redisserver::RedisServer; @@ -163,35 +167,9 @@ int TabletServerMain(int argc, char** argv) { } else { LOG(INFO) << "Failed to get tablet's host name, keeping default metric_node_name"; } - // Do not sync GLOG to disk for INFO, WARNING. - // ERRORs, and FATALs will still cause a sync to disk. - FLAGS_logbuflevel = google::GLOG_WARNING; - - server::SkewedClock::Register(); - - // Only write FATALs by default to stderr. 
- FLAGS_stderrthreshold = google::FATAL; - - ParseCommandLineFlags(&argc, &argv, true); - if (argc != 1) { - std::cerr << "usage: " << argv[0] << std::endl; - return 1; - } - LOG_AND_RETURN_FROM_MAIN_NOT_OK(log::ModifyDurableWriteFlagIfNotODirect()); - LOG_AND_RETURN_FROM_MAIN_NOT_OK(InitYB(TabletServerOptions::kServerType, argv[0])); - - LOG(INFO) << "NumCPUs determined to be: " << base::NumCPUs(); - - if (FLAGS_remote_boostrap_rate_limit_bytes_per_sec > 0) { - LOG(WARNING) << "Flag remote_boostrap_rate_limit_bytes_per_sec has been deprecated. " - << "Use remote_bootstrap_rate_limit_bytes_per_sec flag instead"; - FLAGS_remote_bootstrap_rate_limit_bytes_per_sec = - FLAGS_remote_boostrap_rate_limit_bytes_per_sec; - } - - MemTracker::SetTCMallocCacheMemory(); - CHECK_OK(GetPrivateIpMode()); + LOG_AND_RETURN_FROM_MAIN_NOT_OK(MasterTServerParseFlagsAndInit( + TabletServerOptions::kServerType, &argc, &argv)); SetProxyAddresses(); diff --git a/src/yb/util/debug/trace_event.h b/src/yb/util/debug/trace_event.h index d1aeb3f96146..8264d9bbb3fd 100644 --- a/src/yb/util/debug/trace_event.h +++ b/src/yb/util/debug/trace_event.h @@ -902,7 +902,8 @@ TRACE_EVENT_API_CLASS_EXPORT extern \ #define INTERNAL_TRACE_EVENT_ADD_SCOPED(category_group, name, ...) \ INTERNAL_TRACE_EVENT_GET_CATEGORY_INFO(category_group); \ trace_event_internal::ScopedTracer INTERNAL_TRACE_EVENT_UID(tracer); \ - if (INTERNAL_TRACE_EVENT_CATEGORY_GROUP_ENABLED_FOR_RECORDING_MODE()) { \ + if (INTERNAL_TRACE_EVENT_CATEGORY_GROUP_ENABLED_FOR_RECORDING_MODE() && \ + ::yb::debug::TraceEventsEnabled()) { \ yb::debug::TraceEventHandle h = trace_event_internal::AddTraceEvent( \ TRACE_EVENT_PHASE_COMPLETE, \ INTERNAL_TRACE_EVENT_UID(category_group_enabled), \ diff --git a/src/yb/util/debug/trace_event_impl.cc b/src/yb/util/debug/trace_event_impl.cc index f6ba2aed54b1..6a5b8132eb81 100644 --- a/src/yb/util/debug/trace_event_impl.cc +++ b/src/yb/util/debug/trace_event_impl.cc @@ -61,6 +61,10 @@ using base::SpinLockHolder; using strings::SubstituteAndAppend; using std::string; +// This is used to avoid generating trace events during global initialization. If we allow that to +// happen, it may lead to segfaults (https://github.com/yugabyte/yugabyte-db/issues/11033). +std::atomic trace_events_enabled{false}; + __thread TraceLog::PerThreadInfo* TraceLog::thread_local_info_ = nullptr; namespace { @@ -2402,6 +2406,10 @@ const CategoryFilter::StringList& return delays_; } +void EnableTraceEvents() { + trace_events_enabled.store(true); +} + } // namespace debug } // namespace yb diff --git a/src/yb/util/debug/trace_event_impl.h b/src/yb/util/debug/trace_event_impl.h index 171dcd3d3e1a..44b315d0f7a7 100644 --- a/src/yb/util/debug/trace_event_impl.h +++ b/src/yb/util/debug/trace_event_impl.h @@ -738,6 +738,14 @@ class BASE_EXPORT TraceLog { DISALLOW_COPY_AND_ASSIGN(TraceLog); }; +extern std::atomic trace_events_enabled; + +inline bool TraceEventsEnabled() { + return trace_events_enabled.load(std::memory_order_relaxed); +} + +void EnableTraceEvents(); + } // namespace debug } // namespace yb diff --git a/src/yb/util/test_util.cc b/src/yb/util/test_util.cc index b3f6b9e9b374..b28577baec45 100644 --- a/src/yb/util/test_util.cc +++ b/src/yb/util/test_util.cc @@ -47,6 +47,7 @@ #include "yb/util/status_format.h" #include "yb/util/status_log.h" #include "yb/util/thread.h" +#include "yb/util/debug/trace_event.h" DEFINE_string(test_leave_files, "on_failure", "Whether to leave test files around after the test run. 
" @@ -75,6 +76,7 @@ YBTest::YBTest() : env_(new EnvWrapper(Env::Default())), test_dir_(GetTestDataDirectory()) { InitThreading(); + debug::EnableTraceEvents(); } // env passed in from subclass, for tests that run in-memory diff --git a/src/yb/util/trace-test.cc b/src/yb/util/trace-test.cc index a966ffdb3521..63c5a7b8df84 100644 --- a/src/yb/util/trace-test.cc +++ b/src/yb/util/trace-test.cc @@ -484,17 +484,30 @@ TEST_F(TraceEventCallbackTest, TraceEventCallback) { TraceLog::GetInstance()->SetEventCallbackDisabled(); TRACE_EVENT_INSTANT0("all", "after callback removed", TRACE_EVENT_SCOPE_GLOBAL); + const auto n = std::min(collected_events_names_.size(), collected_events_phases_.size()); + for (size_t i = 0; i < n; ++i) { + const auto& name = collected_events_names_[i]; + const auto phase = collected_events_phases_[i]; + LOG(INFO) << "Collected event #" << i << ": name=" << name << ", phase=" << phase; + } + ASSERT_EQ(5u, collected_events_names_.size()); + EXPECT_EQ("event1", collected_events_names_[0]); EXPECT_EQ(TRACE_EVENT_PHASE_INSTANT, collected_events_phases_[0]); + EXPECT_EQ("event2", collected_events_names_[1]); EXPECT_EQ(TRACE_EVENT_PHASE_INSTANT, collected_events_phases_[1]); + EXPECT_EQ("duration", collected_events_names_[2]); EXPECT_EQ(TRACE_EVENT_PHASE_BEGIN, collected_events_phases_[2]); + EXPECT_EQ("event3", collected_events_names_[3]); EXPECT_EQ(TRACE_EVENT_PHASE_INSTANT, collected_events_phases_[3]); + EXPECT_EQ("duration", collected_events_names_[4]); EXPECT_EQ(TRACE_EVENT_PHASE_END, collected_events_phases_[4]); + for (size_t i = 1; i < collected_events_timestamps_.size(); i++) { EXPECT_LE(collected_events_timestamps_[i - 1], collected_events_timestamps_[i]);