This repository has been archived by the owner on Sep 22, 2024. It is now read-only.

change triplet locations #411

Open
6 of 8 tasks
Tracked by #527
bartr opened this issue Feb 1, 2021 · 1 comment
Labels
FC:PnP-deploy, PreProd (User Story related to Epic), Testing:Load (EE Fundamentals: testing)
Milestone
M2.02

Comments

@bartr
Member

bartr commented Feb 1, 2021

  • To better simulate "triplets", should we put our regions closer together?
  • East US 2 and East US are the closest
  • West US 2, West US and West Central US
  • Central US, North Central US and West Central US
  • Europe and Asia have some data centers close to each other

We could add clusters in North Central US and West Central US to test the hypothesis.
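One way to compare candidate triplets like the ones listed above is to rank every 3-region combination by its worst pairwise round-trip latency. The Python below is a minimal sketch, assuming you already have a latency table from published round-trip statistics or your own probes; the function names and the table shape are hypothetical and not part of NGSA.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical input: round-trip latency (ms) keyed by an unordered region pair.
# The numbers must come from real measurements; none are assumed here.
LatencyTable = Dict[FrozenSet[str], float]


def worst_pair_latency(triplet: Sequence[str], rtt: LatencyTable) -> float:
    """Largest round-trip latency between any two regions in the triplet."""
    return max(rtt[frozenset(pair)] for pair in combinations(triplet, 2))


def rank_triplets(regions: Sequence[str], rtt: LatencyTable) -> List[Tuple[float, Tuple[str, ...]]]:
    """Score every 3-region combination; the tightest triplet comes first."""
    scored = [(worst_pair_latency(t, rtt), t) for t in combinations(regions, 3)]
    return sorted(scored)
```

Scoring by the worst pair rather than the average keeps one distant region from hiding behind two close ones.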

Task Lists:

We created three clusters following the NGSA-AKS documentation.

Differences between the preprod and test clusters:

  • Test clusters do not run FluxCD
  • Test clusters run only ngsa-cosmos

Log Analytics tables

[Image: ngsa-pre-log results table]

Kusto query used to generate the table above (data limited to ngsa-cosmos):

ngsa_CL
| where Zone_s != "DEBUG" and k_app_s == "ngsa-cosmos"
| extend E2EIngestionLatency = ingestion_time() - TimeGenerated
| summarize count(), avg(E2EIngestionLatency), percentiles(E2EIngestionLatency,50,95) by Zone_s

[Image: triplet1-test-log results table]

Kusto query used to generate the table above (no k_app_s filter is needed because the test clusters run only ngsa-cosmos):

ngsa_CL 
| where Zone_s != "DEBUG" 
| extend E2EIngestionLatency = ingestion_time() - TimeGenerated
| summarize count(), avg(E2EIngestionLatency), percentiles(E2EIngestionLatency,50,95) by Zone_s

Update 25 Feb:
[Image: ngsa-pre-log results table]

ngsa_CL
| where Zone_s != "DEBUG" and k_app_s == "ngsa-cosmos"
| where Date_t between (datetime(2021-02-24T20:00:00Z) .. datetime(2021-02-24T22:00:00Z))
| extend AgentLatencyT = _TimeReceived - Date_t // Date_t stands in for TimeGenerated, which FluentBit does not set
| summarize count(), avg(AgentLatencyT), percentiles(AgentLatencyT,50,95) by Zone_s

[Image: triplet1-test-log results table]

ngsa_CL
| where Zone_s != "DEBUG"
| where Date_t between (datetime(2021-02-24T20:00:00Z) .. datetime(2021-02-24T22:00:00Z)) 
//| extend AgentLatency = ingestion_time() - Date_t // ingestion_time() alternative, not used
| extend AgentLatencyT = _TimeReceived - Date_t // Date_t stands in for TimeGenerated, which FluentBit does not set
| summarize count(), avg(AgentLatencyT), percentiles(AgentLatencyT,50,95) by Zone_s

In the new queries we use Date_t instead of TimeGenerated because FluentBit does not set TimeGenerated (this may be a FluentBit bug).

According to the documentation, if TimeGenerated is not set at the source, it is set at the ingestion endpoint and equals the hidden column _TimeReceived. In that case _TimeReceived - TimeGenerated is always zero and says nothing about agent latency, so _TimeReceived - Date_t is the better measure.
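To make the latency definition and the per-zone summary concrete, here is a Python sketch that mirrors the shape of the KQL summarize (count, average, p50, p95 of _TimeReceived - Date_t by Zone_s). It assumes, hypothetically, that rows have been exported as dicts whose Date_t and _TimeReceived values are Python datetime objects; the function names are illustrative. Kusto's percentiles() returns an estimate, so this exact nearest-rank version can differ slightly from the portal numbers.

```python
import math
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, Iterable, List, Mapping, Tuple

# (count, avg seconds, p50 seconds, p95 seconds), like the KQL summarize output
ZoneStats = Tuple[int, float, float, float]


def nearest_rank(sorted_values: List[float], pct: float) -> float:
    """Exact nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def agent_latency_by_zone(records: Iterable[Mapping[str, Any]]) -> Dict[str, ZoneStats]:
    """Summarize _TimeReceived - Date_t per Zone_s, skipping DEBUG rows.

    Each record is assumed (hypothetically) to be an exported row whose
    Date_t and _TimeReceived values are datetime objects.
    """
    latencies: Dict[str, List[float]] = defaultdict(list)
    for row in records:
        if row["Zone_s"] == "DEBUG":  # same filter as: where Zone_s != "DEBUG"
            continue
        received: datetime = row["_TimeReceived"]
        sent: datetime = row["Date_t"]
        latencies[row["Zone_s"]].append((received - sent).total_seconds())

    summary: Dict[str, ZoneStats] = {}
    for zone, values in latencies.items():
        values.sort()
        summary[zone] = (
            len(values),
            sum(values) / len(values),
            nearest_rank(values, 50),
            nearest_rank(values, 95),
        )
    return summary
```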

We also use _TimeReceived instead of ingestion_time() because of feedback from the Observability PM:
"Since we batch data on ingest, you may have different records, from different regions, be batched together while ingested into Kusto."
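To see why a shared batch timestamp can blur per-region numbers, here is a toy Python model. It assumes, hypothetically, that every record in a batch gets the same ingestion time, the way ingestion_time() would if the batch is ingested as a unit; the record shape and function name are illustrative, not how Kusto actually schedules batches.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical record shape: (zone, Date_t, _TimeReceived)
Record = Tuple[str, datetime, datetime]


def latencies_in_batch(batch: List[Record], batch_ingestion_time: datetime) -> List[Tuple[str, float, float]]:
    """Return (zone, ingestion-based latency, receive-based latency) in seconds per record.

    Toy assumption: all records in the batch share one ingestion timestamp.
    """
    results = []
    for zone, date_t, received in batch:
        via_ingestion = (batch_ingestion_time - date_t).total_seconds()
        via_receive = (received - date_t).total_seconds()
        results.append((zone, via_ingestion, via_receive))
    return results
```

For example, if a zone-a record (Date_t 20:00:00, received 20:00:01) and a zone-b record (Date_t 20:00:08, received 20:00:09) share a batch ingested at 20:00:10, the function reports ingestion-based latencies of 10 s and 2 s, but a receive-based latency of 1 s for both.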

Going forward, we can either keep using Date_t in our queries or change FluentBit to copy Date_t into TimeGenerated.

NOTE: Central US have

@bartr bartr added this to the M2.02 milestone Feb 1, 2021
@bartr bartr added PreProd User Story related to Epic Testing:Load EE Fundamentals: testing TestLoad User Story related to Epic labels Feb 1, 2021
@atxryan atxryan assigned atxryan and kforeverisback and unassigned atxryan Feb 11, 2021
@atxryan
Member

atxryan commented Feb 12, 2021

Per @kforeverisback:
Questions:
Why

  1. Do we want to bring the regions closer together because:
    1.1 we want to lower latency between the clusters and services?
    1.2 we want redundancy for Log Analytics or Cosmos, i.e. use paired regions for redundancy?

How

  2. Should we create Log Analytics, clusters, or both in different regions and compare latency and other metrics?
    2.1 One example: create resources in two geographically close regions and one in a paired region,
    e.g. clusters in East US 2 and Central US2, another in West East US 2, and Log Analytics + Cosmos in West Central.
    Alternatively, create resources in two paired regions and one geographically close region.
  3. Do we need to change anything about Cosmos?
    3.1 Cosmos is currently in Central US (roughly in the middle of all clusters).
    3.2 Since it is in the middle (and we only read from it, never write), it might not be necessary to change the current Cosmos setup.
  4. Can we figure out the best regions to use without creating resources?
    4.1 There may be documentation or blog posts about latency between paired (or unpaired) regions.
    4.2 If none exist, we can write one if required.

This was referenced Feb 24, 2021
@atxryan atxryan added the BLOCKED Blocked issue or issue will span multiple sprints label Mar 3, 2021
@bartr bartr added FC:PnP-deploy and removed BLOCKED Blocked issue or issue will span multiple sprints FC:Scaling TestLoad User Story related to Epic labels Mar 12, 2021

5 participants