Enable Metrics status #889

Merged: 6 commits, Aug 22, 2023
33 changes: 19 additions & 14 deletions design/KruizePromQL.md
@@ -10,6 +10,7 @@ The following are the available Kruize APIs that you can monitor:
- `listRecommendations` (GET): API for listing recommendations.
- `listExperiments` (GET): API for listing experiments.
- `updateResults` (POST): API for updating experiment results.
- `updateRecommendations` (POST): API for updating recommendations for an experiment.

## Time taken for KruizeAPI metrics

@@ -21,14 +22,15 @@ To monitor the performance of these APIs, you can use the following metrics:

Here are some sample metrics for the mentioned APIs which can run in Prometheus:

- `kruizeAPI_seconds_count{api="createExperiment", application="Kruize", method="POST"}`: Returns the count of invocations for the `createExperiment` API.
- `kruizeAPI_seconds_sum{api="createExperiment", application="Kruize", method="POST"}`: Returns the sum of the time taken by the `createExperiment` API.
- `kruizeAPI_seconds_max{api="createExperiment", application="Kruize", method="POST"}`: Returns the maximum time taken by the `createExperiment` API.
- `kruizeAPI_seconds_count{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the count of successful invocations for the `createExperiment` API.
- `kruizeAPI_seconds_count{api="createExperiment", application="Kruize", method="POST", status="failure"}`: Returns the count of failed invocations for the `createExperiment` API.
- `kruizeAPI_seconds_sum{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the sum of the time taken by successful invocations of the `createExperiment` API.
- `kruizeAPI_seconds_max{api="createExperiment", application="Kruize", method="POST", status="success"}`: Returns the maximum time taken by a successful invocation of the `createExperiment` API.

By changing the values of the `api` and `method` labels, you can gather metrics for other Kruize APIs such as `listRecommendations`, `listExperiments`, and `updateResults`.
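
As a rough illustration of swapping label values, the sketch below assembles a selector string from the label names used in this document (`api`, `application`, `method`, `status`). The class `KruizeSelector` and its methods are hypothetical helpers, not part of Kruize, and the sketch assumes label values are plain identifiers that need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Kruize): builds a PromQL selector string.
final class KruizeSelector {

    // Joins labels as key="value" pairs inside braces, in insertion order.
    // Assumption: values contain no quotes or backslashes, so no escaping is done.
    static String selector(String metric, Map<String, String> labels) {
        String body = labels.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(", "));
        return metric + "{" + body + "}";
    }

    // suffix is one of "count", "sum" or "max", matching the metrics listed above.
    static String apiSelector(String suffix, String api, String method, String status) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("api", api);
        labels.put("application", "Kruize");
        labels.put("method", method);
        labels.put("status", status);
        return selector("kruizeAPI_seconds_" + suffix, labels);
    }

    public static void main(String[] args) {
        // Failed invocations of listExperiments instead of createExperiment.
        System.out.println(apiSelector("count", "listExperiments", "GET", "failure"));
    }
}
```

The resulting string can be passed as the `query` parameter in the same way as the `curl` command in this section.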

Here is a sample command to collect the metric through `curl`
- `curl --silent -G -kH "Authorization: Bearer ${TOKEN}" --data-urlencode 'query=kruizeAPI_seconds_sum{api="listRecommendations", application="Kruize", method="GET"}' ${PROMETHEUS_URL} | jq` :
- `curl --silent -G -kH "Authorization: Bearer ${TOKEN}" --data-urlencode 'query=kruizeAPI_seconds_sum{api="listRecommendations", application="Kruize", method="GET", status="success"}' ${PROMETHEUS_URL} | jq` :
Returns the sum of the time taken by `listRecommendations` API.

Sample Output:
@@ -64,15 +66,17 @@ Sample Output:

The following are the available Kruize DB methods that you can monitor:

- `addRecommendationToDB`: Method for adding a recommendation to the database.
- `addResultsToDB`: Method for adding experiment results to the database.
- `loadAllRecommendations`: Method for loading all recommendations from the database.
- `loadAllExperiments`: Method for loading all experiments from the database.
- `addExperimentToDB`: Method for adding an experiment to the database.
- `loadResultsByExperimentName`: Method for loading experiment results by experiment name.
- `addResultToDB`: Method for adding experiment results to the database.
- `addBulkResultsToDBAndFetchFailedResults`: Method for adding bulk experiment results to the database and fetching the failed results.
- `addRecommendationToDB`: Method for adding a recommendation to the database.
- `loadExperimentByName`: Method for loading an experiment by name.
- `loadAllResults`: Method for loading all experiment results from the database.
- `loadResultsByExperimentName`: Method for loading experiment results by experiment name.
- `loadRecommendationsByExperimentName`: Method for loading recommendations by experiment name.
- `loadRecommendationsByExperimentNameAndDate`: Method for loading recommendations by experiment name and date.
- `addPerformanceProfileToDB`: Method to add performance profile to the database.
- `loadPerformanceProfileByName`: Method to load a specific performance profile.
- `loadAllPerformanceProfiles`: Method to load all performance profiles.

## Time taken for KruizeDB metrics

@@ -84,14 +88,15 @@ To monitor the performance of these methods, you can use the following metrics:

Here are some sample metrics for the mentioned DB methods which can run in Prometheus:

- `kruizeDB_seconds_count{application="Kruize", method="loadAllExperiments"}`: Number of times the `loadAllExperiments` method was called.
- `kruizeDB_seconds_sum{application="Kruize", method="loadAllExperiments"}`: Total time taken by the `loadAllExperiments` method.
- `kruizeDB_seconds_max{application="Kruize", method="loadAllExperiments"}`: Maximum time taken by the `loadAllExperiments` method.
- `kruizeDB_seconds_count{application="Kruize", method="addExperimentToDB", status="success"}`: Number of successful invocations of the `addExperimentToDB` method.
- `kruizeDB_seconds_count{application="Kruize", method="addExperimentToDB", status="failure"}`: Number of failed invocations of the `addExperimentToDB` method.
- `kruizeDB_seconds_sum{application="Kruize", method="addExperimentToDB", status="success"}`: Total time taken by successful invocations of the `addExperimentToDB` method.
- `kruizeDB_seconds_max{application="Kruize", method="addExperimentToDB", status="success"}`: Maximum time taken by a successful invocation of the `addExperimentToDB` method.

By changing the value of the `method` label, you can gather metrics for other Kruize DB methods.
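
The `_sum` and `_count` series for the same label set can be combined into a mean time per call, and the `success` and `failure` counts into a failure ratio. The following is a minimal sketch of that arithmetic; the class `KruizeMetricMath` and the sample numbers are made up for illustration and are not Kruize output.

```java
// Hypothetical helper (not part of Kruize): arithmetic on scraped metric values.
final class KruizeMetricMath {

    // Mean seconds per call = _sum / _count; 0 when nothing has been recorded yet.
    static double meanSeconds(double sumSeconds, double count) {
        return count == 0 ? 0.0 : sumSeconds / count;
    }

    // Fraction of calls that failed, from the status="success" and
    // status="failure" counts of the same method; 0 when there were no calls.
    static double failureRatio(double successCount, double failureCount) {
        double total = successCount + failureCount;
        return total == 0 ? 0.0 : failureCount / total;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        System.out.println(meanSeconds(1.5, 6));   // 1.5 s over 6 calls
        System.out.println(failureRatio(9, 1));    // 1 failure out of 10 calls
    }
}
```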

Here is a sample command to collect the metric through `curl`
- `curl --silent -G -kH "Authorization: Bearer ${TOKEN}" --data-urlencode 'query=kruizeDB_seconds_sum{method="loadRecommendationsByExperimentName"}' ${PROMETHEUS_URL} | jq` :
- `curl --silent -G -kH "Authorization: Bearer ${TOKEN}" --data-urlencode 'query=kruizeDB_seconds_sum{application="Kruize", method="loadRecommendationsByExperimentName", status="success"}' ${PROMETHEUS_URL} | jq` :
Returns the sum of the time taken by `loadRecommendationsByExperimentName` method.

Sample Output:
@@ -80,6 +80,7 @@ public void init(ServletConfig config) throws ServletException {

@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
String statusValue = "failure";
Timer.Sample timerCreateExp = Timer.start(MetricsConfig.meterRegistry());
Map<String, KruizeObject> mKruizeExperimentMap = new ConcurrentHashMap<String, KruizeObject>();;
try {
@@ -112,8 +113,10 @@ protected void doPost(HttpServletRequest request, HttpServletResponse response)
ExperimentDAO experimentDAO = new ExperimentDAOImpl();
addedToDB = new ExperimentDBService().addExperimentToDB(validAPIObj);
}
if (addedToDB.isSuccess())
if (addedToDB.isSuccess()) {
sendSuccessResponse(response, "Experiment registered successfully with Kruize.");
statusValue = "success";
}
else {
sendErrorResponse(response, null, HttpServletResponse.SC_BAD_REQUEST, addedToDB.getMessage());
}
@@ -127,7 +130,10 @@ protected void doPost(HttpServletRequest request, HttpServletResponse response)
LOGGER.error("Unknown exception caught: " + e.getMessage());
sendErrorResponse(response, e, HttpServletResponse.SC_INTERNAL_SERVER_ERROR, "Internal Server Error: " + e.getMessage());
} finally {
if (null != timerCreateExp) timerCreateExp.stop(MetricsConfig.timerCreateExp);
if (null != timerCreateExp) {
MetricsConfig.timerCreateExp = MetricsConfig.timerBCreateExp.tag("status", statusValue).register(MetricsConfig.meterRegistry());
timerCreateExp.stop(MetricsConfig.timerCreateExp);
}
}
}
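
The pattern in this hunk (default the status to `"failure"`, flip it to `"success"` only on the success path, and record the timing in `finally`) can be sketched without Micrometer. In the sketch below, the `StatusTaggedTiming` class and its two maps are a hypothetical stand-in for a meter registry, used only so the example runs on the standard library. It is not how Kruize or Micrometer stores metrics.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.BooleanSupplier;

// Stand-in for a metrics registry: one counter and one time total per (api, status).
final class StatusTaggedTiming {
    static final Map<String, LongAdder> COUNTS = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> NANOS = new ConcurrentHashMap<>();

    // work returns true when the request succeeded, false when it was rejected.
    static void handle(String api, BooleanSupplier work) {
        String statusValue = "failure";      // pessimistic default, as in the diff
        long start = System.nanoTime();
        try {
            if (work.getAsBoolean()) {
                statusValue = "success";     // set only on the success path
            }
        } catch (RuntimeException e) {
            // An error response would be sent here; statusValue stays "failure".
        } finally {
            // Always record, whichever path was taken.
            String key = api + "|" + statusValue;
            COUNTS.computeIfAbsent(key, k -> new LongAdder()).increment();
            NANOS.computeIfAbsent(key, k -> new LongAdder()).add(System.nanoTime() - start);
        }
    }

    static long count(String api, String status) {
        LongAdder adder = COUNTS.get(api + "|" + status);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        handle("createExperiment", () -> true);
        handle("createExperiment", () -> false);
        handle("createExperiment", () -> { throw new IllegalStateException("boom"); });
        System.out.println(count("createExperiment", "success") + " success, "
                + count("createExperiment", "failure") + " failure");
    }
}
```

Because the default is `"failure"`, any early return or exception is counted as a failure without extra bookkeeping on each error branch.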

103 changes: 55 additions & 48 deletions src/main/java/com/autotune/analyzer/services/ListExperiments.java
@@ -102,6 +102,7 @@ public void init(ServletConfig config) throws ServletException {

@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
String statusValue = "failure";
Timer.Sample timerListExp = Timer.start(MetricsConfig.meterRegistry());
response.setStatus(HttpServletResponse.SC_OK);
response.setContentType(JSON_CONTENT_TYPE);
@@ -120,63 +121,69 @@ protected void doGet(HttpServletRequest request, HttpServletResponse response) t
invalidParams.add(param);
}
}
if (invalidParams.isEmpty()) {
// Set default values if absent
if (results == null || results.isEmpty())
results = "false";
if (recommendations == null || recommendations.isEmpty())
recommendations = "false";
if (latest == null || latest.isEmpty())
latest = "true";
// Validate query parameter values
if (isValidBooleanValue(results) && isValidBooleanValue(recommendations) && isValidBooleanValue(latest)) {
try {
// Fetch experiments data from the DB and check if the requested experiment exists
loadExperimentsFromDatabase(mKruizeExperimentMap, experimentName);
// Check if experiment exists
if (experimentName != null && !mKruizeExperimentMap.containsKey(experimentName)) {
error = true;
sendErrorResponse(
response,
new Exception(AnalyzerErrorConstants.APIErrors.ListRecommendationsAPI.INVALID_EXPERIMENT_NAME_EXCPTN),
HttpServletResponse.SC_BAD_REQUEST,
String.format(AnalyzerErrorConstants.APIErrors.ListRecommendationsAPI.INVALID_EXPERIMENT_NAME_MSG, experimentName)
);
}
if (!error) {
// create Gson Object
Gson gsonObj = createGsonObject();

// Modify the JSON response here based on query params.
gsonStr = buildResponseBasedOnQuery(mKruizeExperimentMap, gsonObj, results, recommendations, latest, experimentName);
if (gsonStr.isEmpty()) {
gsonStr = generateDefaultResponse();
try {
if (invalidParams.isEmpty()) {
// Set default values if absent
if (results == null || results.isEmpty())
results = "false";
if (recommendations == null || recommendations.isEmpty())
recommendations = "false";
if (latest == null || latest.isEmpty())
latest = "true";
// Validate query parameter values
if (isValidBooleanValue(results) && isValidBooleanValue(recommendations) && isValidBooleanValue(latest)) {
try {
// Fetch experiments data from the DB and check if the requested experiment exists
loadExperimentsFromDatabase(mKruizeExperimentMap, experimentName);
// Check if experiment exists
if (experimentName != null && !mKruizeExperimentMap.containsKey(experimentName)) {
error = true;
sendErrorResponse(
response,
new Exception(AnalyzerErrorConstants.APIErrors.ListRecommendationsAPI.INVALID_EXPERIMENT_NAME_EXCPTN),
HttpServletResponse.SC_BAD_REQUEST,
String.format(AnalyzerErrorConstants.APIErrors.ListRecommendationsAPI.INVALID_EXPERIMENT_NAME_MSG, experimentName)
);
}
if (!error) {
// create Gson Object
Gson gsonObj = createGsonObject();

// Modify the JSON response here based on query params.
gsonStr = buildResponseBasedOnQuery(mKruizeExperimentMap, gsonObj, results, recommendations, latest, experimentName);
if (gsonStr.isEmpty()) {
gsonStr = generateDefaultResponse();
}
response.getWriter().println(gsonStr);
response.getWriter().close();
statusValue = "success";
}
response.getWriter().println(gsonStr);
response.getWriter().close();
} catch (Exception e) {
LOGGER.error("Exception: " + e.getMessage());
e.printStackTrace();
sendErrorResponse(response, e, HttpServletResponse.SC_INTERNAL_SERVER_ERROR, e.getMessage());
}
} catch (Exception e) {
LOGGER.error("Exception: " + e.getMessage());
e.printStackTrace();
sendErrorResponse(response, e, HttpServletResponse.SC_INTERNAL_SERVER_ERROR, e.getMessage());
} finally {
if (null != timerListExp) timerListExp.stop(MetricsConfig.timerListExp);
} else {
sendErrorResponse(
response,
new Exception(AnalyzerErrorConstants.APIErrors.ListRecommendationsAPI.INVALID_QUERY_PARAM_VALUE),
HttpServletResponse.SC_BAD_REQUEST,
String.format(AnalyzerErrorConstants.APIErrors.ListRecommendationsAPI.INVALID_QUERY_PARAM_VALUE)
);
}
} else {
sendErrorResponse(
response,
new Exception(AnalyzerErrorConstants.APIErrors.ListRecommendationsAPI.INVALID_QUERY_PARAM_VALUE),
new Exception(AnalyzerErrorConstants.APIErrors.ListRecommendationsAPI.INVALID_QUERY_PARAM),
HttpServletResponse.SC_BAD_REQUEST,
String.format(AnalyzerErrorConstants.APIErrors.ListRecommendationsAPI.INVALID_QUERY_PARAM_VALUE)
String.format(AnalyzerErrorConstants.APIErrors.ListRecommendationsAPI.INVALID_QUERY_PARAM, invalidParams)
);
}
} else {
sendErrorResponse(
response,
new Exception(AnalyzerErrorConstants.APIErrors.ListRecommendationsAPI.INVALID_QUERY_PARAM),
HttpServletResponse.SC_BAD_REQUEST,
String.format(AnalyzerErrorConstants.APIErrors.ListRecommendationsAPI.INVALID_QUERY_PARAM, invalidParams)
);
} finally {
if (null != timerListExp) {
MetricsConfig.timerListExp = MetricsConfig.timerBListExp.tag("status", statusValue).register(MetricsConfig.meterRegistry());
timerListExp.stop(MetricsConfig.timerListExp);
}
}
}

@@ -73,6 +73,7 @@ public void init(ServletConfig config) throws ServletException {

@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
String statusValue = "failure";
Timer.Sample timerListRec = Timer.start(MetricsConfig.meterRegistry());
response.setContentType(JSON_CONTENT_TYPE);
response.setCharacterEncoding(CHARACTER_ENCODING);
@@ -197,6 +198,7 @@ protected void doGet(HttpServletRequest request, HttpServletResponse response) t
checkForTimestamp,
monitoringEndTimestamp);
recommendationList.add(listRecommendationsAPIObject);
statusValue = "success";
} catch (Exception e) {
LOGGER.error("Not able to generate recommendation for expName : {} due to {}", ko.getExperimentName(), e.getMessage());
}
@@ -233,7 +235,10 @@ public boolean shouldSkipClass(Class<?> clazz) {
e.printStackTrace();
sendErrorResponse(response, e, HttpServletResponse.SC_INTERNAL_SERVER_ERROR, e.getMessage());
} finally {
if (null != timerListRec) timerListRec.stop(MetricsConfig.timerListRec);
if (null != timerListRec) {
MetricsConfig.timerListRec = MetricsConfig.timerBListRec.tag("status", statusValue).register(MetricsConfig.meterRegistry());
timerListRec.stop(MetricsConfig.timerListRec);
}
}
}

@@ -37,6 +37,7 @@
import io.micrometer.core.instrument.Timer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import io.micrometer.core.instrument.Timer;

import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
@@ -79,6 +80,8 @@ public void init(ServletConfig config) throws ServletException {
*/
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
String statusValue = "failure";
Timer.Sample timerBUpdateRecommendations = Timer.start(MetricsConfig.meterRegistry());
try {
// Get the values from the request parameters
String experiment_name = request.getParameter(KruizeConstants.JSONKeys.EXPERIMENT_NAME);
@@ -163,12 +166,11 @@ protected void doPost(HttpServletRequest request, HttpServletResponse response)
KruizeObject kruizeObject = mainKruizeExperimentMAP.get(experiment_name);
new ExperimentInitiator().generateAndAddRecommendations(kruizeObject, experimentResultDataList, interval_start_time, interval_end_time);
ValidationOutputData validationOutputData = new ExperimentDBService().addRecommendationToDB(mainKruizeExperimentMAP, experimentResultDataList);
if (validationOutputData.isSuccess())
if (validationOutputData.isSuccess()) {
sendSuccessResponse(response, kruizeObject, interval_end_time);
else {

statusValue = "success";
} else {
sendErrorResponse(response, null, HttpServletResponse.SC_INTERNAL_SERVER_ERROR, validationOutputData.getMessage());

}
} catch (Exception e) {
e.printStackTrace();
@@ -188,6 +190,11 @@ protected void doPost(HttpServletRequest request, HttpServletResponse response)
LOGGER.error("Exception: " + e.getMessage());
e.printStackTrace();
sendErrorResponse(response, e, HttpServletResponse.SC_INTERNAL_SERVER_ERROR, e.getMessage());
} finally {
if (null != timerBUpdateRecommendations) {
MetricsConfig.timerUpdateRecomendations = MetricsConfig.timerBUpdateRecommendations.tag("status", statusValue).register(MetricsConfig.meterRegistry());
timerBUpdateRecommendations.stop(MetricsConfig.timerUpdateRecomendations);
}
}
}
