[teammgrd]: Improve LAGs cleanup on shutdown: send SIGTERM directly to PID. (#1841)

This PR fixes the LAGs cleanup degradation caused by the python2.7 -> python3 migration.
The approach is to replace the `teamd -k -t` call with a raw `SIGTERM` and to add a PID liveness check.
This ensures that `teammgrd` stops only after all managed processes have exited.
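
The idea described above, signalling a known PID and then checking whether it is still alive, can be sketched with plain POSIX calls. This is only an illustration and not the code in this commit, which shells out to `kill -TERM` and `tail -f --pid`. The names `isProcessAlive` and `terminateAndWait` are hypothetical, the 10 ms polling interval is arbitrary, and the sketch assumes the target is not an unreaped child of the caller (a zombie child still answers `kill(pid, 0)`).

```cpp
#include <cerrno>
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <thread>

// Hypothetical helper: true if a process with this PID currently exists.
// kill(pid, 0) runs the existence and permission checks without sending a signal.
bool isProcessAlive(pid_t pid)
{
    if (pid <= 0)
    {
        return false; // 0 and negative values address process groups, not one PID
    }
    if (kill(pid, 0) == 0)
    {
        return true;
    }
    return errno == EPERM; // the process exists, but we may not signal it
}

// Hypothetical helper: send SIGTERM, then poll until the process is gone.
// Returns true once the PID no longer exists, false on error or timeout.
bool terminateAndWait(pid_t pid, std::chrono::milliseconds timeout)
{
    if (pid <= 0)
    {
        return false;
    }
    if (kill(pid, SIGTERM) != 0 && errno != ESRCH)
    {
        return false; // ESRCH just means the process has already exited
    }

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (isProcessAlive(pid))
    {
        if (std::chrono::steady_clock::now() >= deadline)
        {
            return false;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    return true;
}
```

Unlike a blocking wait, the polling loop lets the caller bound how long shutdown may take per process.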

resolves: #8071

**What I did**
* Replaced the `teamd -k -t` call with a raw `SIGTERM`
* Added a PID liveness check

**Why I did it**
* To fix the LAGs cleanup timeout issue caused by the python2.7 -> python3 upgrade

**How I verified it**
1. Configure 64 LAG RIFs
2. Reload config
nazariig authored Sep 17, 2021
1 parent 002bb1d commit fbdcaae
Showing 3 changed files with 43 additions and 9 deletions.
45 changes: 40 additions & 5 deletions cfgmgr/teammgr.cpp
@@ -112,18 +112,53 @@ void TeamMgr::doTask(Consumer &consumer)
     }
 }
 
-
 void TeamMgr::cleanTeamProcesses()
 {
     SWSS_LOG_ENTER();
     SWSS_LOG_NOTICE("Cleaning up LAGs during shutdown...");
-    for (const auto& it: m_lagList)
+
+    std::unordered_map<std::string, pid_t> aliasPidMap;
+
+    for (const auto& alias: m_lagList)
+    {
+        std::string res;
+        pid_t pid;
+
+        {
+            std::stringstream cmd;
+            cmd << "cat " << shellquote("/var/run/teamd/" + alias + ".pid");
+            EXEC_WITH_ERROR_THROW(cmd.str(), res);
+
+            pid = static_cast<pid_t>(std::stoul(res, nullptr, 10));
+            aliasPidMap[alias] = pid;
+
+            SWSS_LOG_INFO("Read port channel %s pid %d", alias.c_str(), pid);
+        }
+
+        {
+            std::stringstream cmd;
+            cmd << "kill -TERM " << pid;
+            EXEC_WITH_ERROR_THROW(cmd.str(), res);
+
+            SWSS_LOG_INFO("Sent SIGTERM to port channel %s pid %d", alias.c_str(), pid);
+        }
+    }
+
+    for (const auto& cit: aliasPidMap)
     {
-        //This will call team -k kill -t <teamdevicename> which internally send SIGTERM
-        removeLag(it);
+        const auto &alias = cit.first;
+        const auto &pid = cit.second;
+
+        std::stringstream cmd;
+        std::string res;
+
+        SWSS_LOG_NOTICE("Waiting for port channel %s pid %d to stop...", alias.c_str(), pid);
+
+        cmd << "tail -f --pid=" << pid << " /dev/null";
+        EXEC_WITH_ERROR_THROW(cmd.str(), res);
     }
 
-    return;
+    SWSS_LOG_NOTICE("LAGs cleanup is done");
 }
 
 void TeamMgr::doLagTask(Consumer &consumer)
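
The first loop in `cleanTeamProcesses()` obtains each PID by running `cat` on the PID file and parsing the output with `std::stoul`, which throws if the file is empty or malformed. As a hedged alternative sketch (not part of this commit), the file could be read directly and validated. `readPidFile` is a hypothetical helper, and the sketch assumes C++17 for `std::optional`.

```cpp
#include <fstream>
#include <limits>
#include <optional>
#include <string>
#include <sys/types.h>

// Hypothetical helper: read a decimal PID from a PID file.
// Returns std::nullopt if the file is missing, unparsable, or holds a
// value that cannot be a valid single-process PID.
std::optional<pid_t> readPidFile(const std::string &path)
{
    std::ifstream in(path);
    long long value = 0;

    if (!(in >> value))
    {
        return std::nullopt; // missing file or non-numeric content
    }
    if (value <= 0 || value > std::numeric_limits<pid_t>::max())
    {
        return std::nullopt; // out of range for pid_t
    }
    return static_cast<pid_t>(value);
}
```

A caller could skip aliases for which this returns `std::nullopt` instead of aborting the whole cleanup, though whether that is the desired behavior here is a design decision for the maintainers.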
2 changes: 0 additions & 2 deletions cfgmgr/teammgr.h
@@ -32,7 +32,6 @@ class TeamMgr : public Orch
     ProducerStateTable m_appLagTable;
 
     std::set<std::string> m_lagList;
-    std::map<std::string, pid_t> m_lagPIDList;
 
     MacAddress m_mac;
 
@@ -50,7 +49,6 @@ class TeamMgr : public Orch
     bool setLagMtu(const std::string &alias, const std::string &mtu);
     bool setLagLearnMode(const std::string &alias, const std::string &learn_mode);
     bool setLagTpid(const std::string &alias, const std::string &tpid);
-
 
     bool isPortEnslaved(const std::string &);
     bool findPortMaster(std::string &, const std::string &);
5 changes: 3 additions & 2 deletions cfgmgr/teammgrd.cpp
@@ -66,7 +66,7 @@ int main(int argc, char **argv)
         }
 
         while (!received_sigterm)
-        {
+        {
             Selectable *sel;
             int ret;
 
@@ -91,7 +91,8 @@ int main(int argc, char **argv)
     catch (const exception &e)
     {
         SWSS_LOG_ERROR("Runtime error: %s", e.what());
+        return EXIT_FAILURE;
     }
 
-    return -1;
+    return EXIT_SUCCESS;
 }
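
The exit-code change above means a normal `SIGTERM`-driven shutdown now reports success, while a runtime exception reports failure. The general pattern can be sketched as follows. This is an assumption-laden illustration, not the daemon's actual `main`: `runUntilSigterm`, `onSigterm`, and `g_receivedSigterm` are hypothetical names, and `std::signal` is used for brevity where the real code may install its handler differently.

```cpp
#include <csignal>
#include <cstdlib>

// Flag set from the signal handler; sig_atomic_t is safe to write there.
static volatile std::sig_atomic_t g_receivedSigterm = 0;

extern "C" void onSigterm(int)
{
    g_receivedSigterm = 1;
}

// Hypothetical main-loop skeleton: run `step` until SIGTERM arrives,
// then run `cleanup` once and report a clean exit.
int runUntilSigterm(void (*step)(), void (*cleanup)())
{
    std::signal(SIGTERM, onSigterm);

    while (!g_receivedSigterm)
    {
        step();
    }

    cleanup(); // e.g. stop managed processes before the daemon exits
    return EXIT_SUCCESS;
}
```

Keeping the handler down to a single flag write leaves all real work, including the cleanup, in normal (non-signal) context.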
