change triplet locations #411

bartr · 2021-02-01T18:49:56Z (edited by gortegaMS)

To better simulate "triplets", should we put our regions closer together?
- East US 2 and East US are the closest
- West US 2, West US, and West Central US are close to each other
- Central US, North Central US, and West Central US are close to each other
- Europe and Asia have some DCs close to each other

We could add a cluster in North Central US and West Central US to test the hypothesis.

Task list:

Create resources in West Region
- Create CosmosDB and Log Analytics in West US2
- Create AKS cluster in West US2
- Create AKS cluster in West US
- Create AKS cluster in West Central US
- Measure FluentBit and Log Analytics latency
- Compute latency by subtracting TimeGenerated from _TimeReceived.
  https://docs.microsoft.com/en-us/azure/azure-monitor/platform/data-ingestion-time
  https://docs.microsoft.com/en-us/azure/azure-monitor/platform/log-standard-columns
Create resources in Central Region
Create resources in East Region

We created three clusters following the NGSA-AKS Documentation.

Differences between the preprod and test clusters:

- Test clusters do not run FluxCD
- Test clusters run only ngsa-cosmos

Log Analytics tables

ngsa-pre-log

Kusto query used to generate the table above (data limited to ngsa-cosmos):

ngsa_CL 
| where Zone_s != "DEBUG" and k_app_s == "ngsa-cosmos"
| extend E2EIngestionLatency = ingestion_time() - TimeGenerated
| summarize count(), avg(E2EIngestionLatency), percentiles(E2EIngestionLatency,50,95) by Zone_s

triplet1-test-log

Kusto Query used to generate the table above:

ngsa_CL 
| where Zone_s != "DEBUG" 
| extend E2EIngestionLatency = ingestion_time() - TimeGenerated
| summarize count(), avg(E2EIngestionLatency), percentiles(E2EIngestionLatency,50,95) by Zone_s

Update 25 Feb:
ngsa-pre-log

ngsa_CL
| where Zone_s != "DEBUG" and k_app_s == "ngsa-cosmos"
| where Date_t between (datetime(2021-02-24T20:00:00Z) .. datetime(2021-02-24T22:00:00Z))
| extend AgentLatencyT = _TimeReceived - Date_t //_TimeReceived - TimeGenerated
| summarize count(), avg(AgentLatencyT), percentiles(AgentLatencyT,50,95) by Zone_s

triplet1-test-log

ngsa_CL
| where Zone_s != "DEBUG"
| where Date_t between (datetime(2021-02-24T20:00:00Z) .. datetime(2021-02-24T22:00:00Z)) 
//| extend AgentLatency = ingestion_time() - Date_t //_TimeReceived - TimeGenerated
| extend AgentLatencyT = _TimeReceived - Date_t //_TimeReceived - TimeGenerated
| summarize count(), avg(AgentLatencyT), percentiles(AgentLatencyT,50,95) by Zone_s
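For checking the numbers outside Log Analytics, here is a minimal Python sketch of what the 25 Feb queries compute: drop DEBUG rows, keep rows whose Date_t falls inside an inclusive window (like KQL `between`), take _TimeReceived - Date_t, and summarize count, average, p50 and p95 per Zone_s. The LogRecord class and its field names are hypothetical stand-ins for the ngsa_CL columns, and the optional `app` filter mirrors the k_app_s == "ngsa-cosmos" clause of the ngsa-pre-log query. It uses exact nearest-rank percentiles, while Kusto's percentiles() returns estimates, so results can differ slightly on large data sets.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class LogRecord:
    # Hypothetical stand-in for one ngsa_CL row.
    zone: str                # Zone_s
    app: str                 # k_app_s
    date: datetime           # Date_t, stamped at the source
    time_received: datetime  # _TimeReceived, stamped at the ingestion endpoint


def nearest_rank(sorted_vals: List[timedelta], pct: float) -> timedelta:
    # Smallest value such that at least pct% of the samples are <= it.
    k = max(1, math.ceil(pct / 100 * len(sorted_vals)))
    return sorted_vals[k - 1]


def summarize_agent_latency(records: List[LogRecord], start: datetime, end: datetime,
                            app: Optional[str] = None) -> Dict[str, dict]:
    by_zone: Dict[str, List[timedelta]] = defaultdict(list)
    for r in records:
        if r.zone == "DEBUG" or (app is not None and r.app != app):
            continue
        if not start <= r.date <= end:  # KQL `between` includes both ends
            continue
        by_zone[r.zone].append(r.time_received - r.date)  # AgentLatencyT

    summary = {}
    for zone, latencies in by_zone.items():
        latencies.sort()
        summary[zone] = {
            "count": len(latencies),
            "avg": sum(latencies, timedelta()) / len(latencies),
            "p50": nearest_rank(latencies, 50),
            "p95": nearest_rank(latencies, 95),
        }
    return summary
```

Passing `app="ngsa-cosmos"` corresponds to the ngsa-pre-log query; leaving it out corresponds to the triplet1-test-log query.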

The new queries use Date_t instead of TimeGenerated because FluentBit does not set TimeGenerated (this might be a FluentBit bug).

According to the documentation, if TimeGenerated is not set at the source, it is set at the ingestion endpoint and equals the hidden column _TimeReceived, so _TimeReceived - TimeGenerated would always be zero. Hence _TimeReceived - Date_t is a better measure of agent latency.
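The reasoning above can be checked with a small Python sketch. It models the documented fallback (TimeGenerated is filled in at the ingestion endpoint with the same value as _TimeReceived when the source does not set it) and shows why the TimeGenerated-based latency collapses to zero for FluentBit records while the Date_t-based latency still shows the delay. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def effective_time_generated(source_time_generated: Optional[datetime],
                             time_received: datetime) -> datetime:
    # If the source did not set TimeGenerated, it is filled in at the
    # ingestion endpoint with the same value as _TimeReceived.
    if source_time_generated is None:
        return time_received
    return source_time_generated


def latency_from_time_generated(source_time_generated: Optional[datetime],
                                time_received: datetime) -> timedelta:
    # _TimeReceived - TimeGenerated
    return time_received - effective_time_generated(source_time_generated, time_received)


def latency_from_date(date_t: datetime, time_received: datetime) -> timedelta:
    # _TimeReceived - Date_t, where Date_t is stamped at the source.
    return time_received - date_t


sent = datetime(2021, 2, 24, 20, 0, 2)
received = datetime(2021, 2, 24, 20, 0, 5)
# FluentBit record without TimeGenerated: the TimeGenerated-based latency hides the 3 s delay.
print(latency_from_time_generated(None, received))  # 0:00:00
print(latency_from_date(sent, received))            # 0:00:03
```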

We also use _TimeReceived instead of ingestion_time() because of feedback from the Observability PM:
"Since we batch data on ingest, you may have different records, from different regions, be batched together while ingested into Kusto"

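To make the PM's point concrete, here is a toy Python model of batching. It assumes, purely for illustration, that everything received within the same fixed window is ingested as one batch when that window closes; the actual batching policy is not described in this issue. Under that assumption, records from different regions in the same batch share one ingestion time, so ingestion_time() - Date_t mixes batching delay into the per-region numbers, whereas _TimeReceived is stamped per record. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

EPOCH = datetime(1970, 1, 1)


def batch_ingestion_times(received: List[datetime], window: timedelta) -> List[datetime]:
    # Toy model: a record received in window n is ingested when window n closes.
    times = []
    for t in received:
        n = (t - EPOCH) // window  # index of the fixed window containing t
        times.append(EPOCH + (n + 1) * window)
    return times


t0 = datetime(2021, 2, 24, 20, 0, 0)
east_received = t0 + timedelta(seconds=1)
west_received = t0 + timedelta(seconds=4)
east_ingest, west_ingest = batch_ingestion_times([east_received, west_received], timedelta(seconds=5))
# Both records land in the same batch, so they share one ingestion time.
print(east_ingest == west_ingest)  # True
```
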
We can either keep using Date_t in our queries or change FluentBit to copy Date_t into TimeGenerated.
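For the second option, the Azure Monitor HTTP Data Collector API (which the FluentBit azure output posts to) accepts a `time-generated-field` request header naming a record field whose ISO 8601 value becomes TimeGenerated. Newer FluentBit releases document a `Time_Generated` option on the azure output, used together with `Time_Key`, that sets this header; whether the version we deploy supports it still needs checking. The Python sketch below only builds the request headers to show the mechanism and sends nothing. It assumes the source field is called `Date` (it shows up as Date_t) and that the log type is `ngsa` (table ngsa_CL); the function name is hypothetical.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate
from typing import Dict, Optional


def data_collector_headers(workspace_id: str, shared_key_b64: str, body: bytes,
                           log_type: str, time_field: str,
                           rfc1123_date: Optional[str] = None) -> Dict[str, str]:
    date = rfc1123_date or formatdate(usegmt=True)
    # SharedKey signature over the string-to-sign for POST /api/logs.
    string_to_sign = f"POST\n{len(body)}\napplication/json\nx-ms-date:{date}\n/api/logs"
    digest = hmac.new(base64.b64decode(shared_key_b64),
                      string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return {
        "Content-Type": "application/json",
        "Authorization": f"SharedKey {workspace_id}:{signature}",
        "Log-Type": log_type,                # "ngsa" -> custom table ngsa_CL
        "x-ms-date": date,
        "time-generated-field": time_field,  # e.g. "Date": its value becomes TimeGenerated
    }
```

With `time_field="Date"`, each record's Date value would be used as TimeGenerated, so the original TimeGenerated-based queries would measure the same thing as the Date_t-based ones.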

NOTE: Central US have


atxryan · 2021-02-12T17:36:41Z

Per @kforeverisback:
Questions:
Why

1. Do we want to bring the regions closer together because:
1.1 We want to lower latency between clusters and services?
1.2 We want redundancy for Log Analytics or Cosmos, i.e., use paired regions for redundancy?

How
2. Should we create Log Analytics, clusters, or both in different regions and compare latency and other metrics?
2.1 One example: create resources in two proximally close regions and one in a paired region,
e.g., clusters in East US2 and Central US2 and another in West East US 2, with Log Analytics and Cosmos in WestCentral.
Alternatively, create resources in two paired regions and one proximally close region.
3. Do we need to change anything about Cosmos?
3.1 Cosmos is currently in CentralUS (geographically in the middle of all clusters).
3.2 Since it is in the middle (and we are only reading from it, not writing), it might be necessary to meddle with the current Cosmos setup.
4. Can we figure out the best regions to use without creating resources?
4.1 There might be some doc/blog/post about the latency of paired (or unpaired) regions.
4.2 If there are none, we can write one if required.
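For item 4, if we can get a region-to-region latency table (from published figures or our own measurements; none is included here), choosing a triplet can be automated. The Python sketch below uses one possible rule, chosen for illustration: pick the three regions whose worst pairwise round-trip time is smallest. The function name and input shape are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Tuple


def best_triplet(rtt_ms: Dict[FrozenSet[str], float],
                 regions: Iterable[str]) -> Tuple[str, ...]:
    # rtt_ms maps an unordered region pair, e.g. frozenset({"eastus", "eastus2"}),
    # to its round-trip time in milliseconds; every pair must be present.
    def worst_pair(triplet: Tuple[str, ...]) -> float:
        return max(rtt_ms[frozenset(pair)] for pair in combinations(triplet, 2))

    # Score every 3-region combination by its slowest link and keep the best one.
    return min(combinations(sorted(regions), 3), key=worst_pair)
```

Replacing `max` with `sum` would optimize average rather than worst-case latency; which rule fits depends on whether the goal is 1.1 (latency) or 1.2 (redundancy).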

bartr added this to the M2.02 milestone Feb 1, 2021

bartr added the PreProd (User Story related to Epic), Testing:Load, EE Fundamentals: testing, and TestLoad (User Story related to Epic) labels Feb 1, 2021

atxryan mentioned this issue in Sprint 1 Goal - PnP spike, Scaling spike, and Prometheus demo #448 (closed, 21 tasks) Feb 8, 2021

atxryan added the FC:Scaling label Feb 10, 2021

atxryan assigned atxryan and kforeverisback and unassigned atxryan Feb 11, 2021

gortegaMS assigned PurpleBriar and gortegaMS Feb 12, 2021

This was referenced Feb 24, 2021 in Sprint 2 tracking #501 (closed) and Sprint 3 Tracking #527 (closed)

atxryan added the BLOCKED label (blocked issue or issue will span multiple sprints) Mar 3, 2021

bartr added the FC:PnP-deploy label and removed the BLOCKED, FC:Scaling, and TestLoad labels Mar 12, 2021

dsturgell unassigned kforeverisback, PurpleBriar and gortegaMS Mar 25, 2021
