Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1060: Catching up a few bug fixes #852

Merged
merged 3 commits into from
Mar 15, 2024

Conversation

baemyung
Copy link
Contributor

A few number of bugs (which could be critical) have been fixed, and it is catching up into downstream.
They are

edtanous and others added 3 commits March 12, 2024 10:10
nlohmann::json::clear() has different behavior dependent on what the
underlying object is, rather than doing the expected behavior of
completely clearing the json object.  This didn't matter because of a
similar bug in http_connection that relied on nlohmann:json::empty()
which is ALSO type dependent, so these worked.

Unfortunately, in 02e01b5 we wanted to
allow empty objects, and this bug was exposed.

There are two places where clear() is used, once in Response, which is
clearly not the intent, which is to reset the object to the original
constructed state.  The other place we call clear is in Manager, where
we use it to clear incremental results.  That was a previous best practice that has been eliminated everywhere else (now we return as
many results with the error as we are able).  It has been removed.

Tested: Logging into the webui in firefox no longer core dumps.

Signed-off-by: Ed Tanous <edtanous@google.com>
Change-Id: Ic89a037b30fb40c0e6eaeb939cae6e006dd0ffac
http_client is not handling connection termination by server due to
keep alive timeout. At present client is not aware of connection
termination from server. So whenever next redfish is event ready to be
sent, the client will try send/receives data over broken connection.
After failed operation the client will try to restart the connection
by closing the current connection.

Problems:
1) Restart is not attempted on all failure paths.
Eg: stream_truncated error was ignored, which usually happens when try
to read from broken connection, due to which retry is never performed.

2) Ssl shutdown over broken connection often fails to call the shutdown
callback

3) ssl session was reused for new connection attempt. Which is wrong

Solution:

This patch will try to reattempt the connection in all failure cases.

It uses new socket object and  new ssl session for the retries

Tested by:

Test normal event flow between redfish-event clients and the BMC
Test failure event flow between redfish-event clients and the BMC
Tested the bad path by keeping the setup idle for 3 hours on the above
two setups. Verified the events flow after this idle time

Change-Id: I3d725b9d77bea22e2e8860e01ee0dfc971789008
Signed-off-by: Abhilash Raju <abhilash.kollam@gmail.com>
Signed-off-by: Ed Tanous <ed@tanous.net>
When we call a null weakptr, it will cause a crash.
Add nullptr check for weakptr can avoid this situation.

Tested:
bmcweb.service did not experience core-dump.

Change-Id: I4490d68c70ea5d43681f4fb18b3859afb01ed70a
Signed-off-by: Zhao Gang <zhaogang.0108@bytedance.com>
@gtmills
Copy link
Contributor

gtmills commented Mar 14, 2024

@baemyung Do we think this fixes the core dump we are seeing ?

@raviteja-b
Copy link
Contributor

@baemyung have you verified basic event flow with this PR with HMC connected?

@raviteja-b raviteja-b self-requested a review March 14, 2024 04:41
@@ -380,6 +380,16 @@ class ConnectionInfo : public std::enable_shared_from_this<ConnectionInfo>
unsigned int respCode = parser->get().result_int();
BMCWEB_LOG_DEBUG << "recvMessage() Header Response Code: " << respCode;

// Handle the case of stream_truncated. Some servers close the ssl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we take the latest code from upstream? It does not uses socket pointer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 1060, we can't rebase with upstream but we try to catch up some potentially critical bug fixes if there are.
However, this particular lines are already synced with upstream - related to socket handling.

@@ -153,7 +153,7 @@ struct Response
{
BMCWEB_LOG_DEBUG << this << " Clearing response containers";
stringResponse.emplace(response_type{});
jsonValue.clear();
jsonValue = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jsonValue is an object not a pointer right? should't it be like jsonValue=nlohmann::json();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both ways can be used - nlohmann/json#2156
Here, we're trying to sync up with what upstream does - https://github.com/openbmc/bmcweb/blob/9970e93f794d0110de436828775e333e81330ad5/http/http_response.hpp#L194

@baemyung
Copy link
Contributor Author

@baemyung have you verified basic event flow with this PR with HMC connected?

I have asked the tester of 595145 to try it out,and waiting for the result.

@baemyung
Copy link
Contributor Author

@baemyung Do we think this fixes the core dump we are seeing ?

I don't know at this time, but these seem critical ones to resolve regardless.

@gtmills
Copy link
Contributor

gtmills commented Mar 15, 2024

Let's get this in. We can do something more later

@rfrandse rfrandse merged commit b5d73b0 into ibm-openbmc:1060 Mar 15, 2024
1 check passed
@baemyung baemyung deleted the 1060-catchup-bug-fixes branch March 21, 2024 01:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants