Terminology

Latency: time spent waiting to be serviced
- Overloaded term (sometimes it can refer to response time)
Response time: time for an operation to complete (waiting time/latency + service time)
Saturation: amount/degree of queued work
Bottleneck: limiting factor (like a limiting reagent) for the system
Workload: Input to the system or load applied, jobs
Cache: Faster storage area tier
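A minimal sketch of the latency / response time relationship defined above, assuming each operation is timestamped when it is enqueued, when service starts, and when it finishes. The `Operation` class and its field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    enqueued_at: float  # when the request arrived and began waiting (seconds)
    started_at: float   # when service began (seconds)
    finished_at: float  # when service completed (seconds)

    @property
    def latency(self) -> float:
        # Time spent waiting to be serviced.
        return self.started_at - self.enqueued_at

    @property
    def service_time(self) -> float:
        # Time spent actually being serviced.
        return self.finished_at - self.started_at

    @property
    def response_time(self) -> float:
        # Response time = waiting time (latency) + service time.
        return self.latency + self.service_time


op = Operation(enqueued_at=0.0, started_at=0.25, finished_at=1.0)
print(op.latency, op.service_time, op.response_time)
```

Under the overloaded usage noted above, some tools would report `response_time` here under the name "latency", so it is worth checking which definition a given metric uses.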

Performance analysis (PA) cycle: analysis -> detect bottleneck -> find bottleneck -> solve bottleneck -> analysis

Time scales

Utilization:

Time based: E[time server was busy]
1. $U = \frac{B}{T}$; U = utilization; B = total time system busy over T the observation period
Capacity based: System/component’s ability to deliver amount of throughput
1. Proportion of system/component’s resources currently working
2. At 100% capacity-based utilization, saturation has been reached; below 100%, the resource can still accept more work
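The time-based definition $U = \frac{B}{T}$ can be sketched as follows, assuming the busy periods are available as non-overlapping `(start, end)` intervals in seconds. The function name and the interval representation are assumptions made for this example.

```python
def time_based_utilization(busy_intervals, period_start, period_end):
    """Return U = B / T for one observation period.

    busy_intervals: non-overlapping (start, end) pairs during which the
    server was busy. Intervals are clipped to the observation period.
    """
    T = period_end - period_start
    if T <= 0:
        raise ValueError("observation period must have positive length")

    B = 0.0
    for start, end in busy_intervals:
        # Count only the part of the interval inside the observation period.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            B += clipped_end - clipped_start
    return B / T


# Busy for 2 s, 1 s, and (after clipping at t=10) 1 s of a 10 s period.
u = time_based_utilization([(0, 2), (5, 6), (9, 12)], 0, 10)
print(u)  # 0.4
```

Note that this measures only how long the server was busy. It says nothing about how much more work the server could have accepted while busy, which is what the capacity-based definition captures.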

Caching

Perspectives in System Performance

Resource Analysis Perspective [by sysadmin]
1. Start @devices (resource level)
2. Includes perf issue investigations, and capacity planning
3. Demand supply
Workload Analysis Perspective [by devs]
1. Targets
  1. Requests (workload applied)
  2. Latency (response time)
  3. Completion (error rate)
2. Metrics
  1. Throughput
  2. Latency

No clear methodology until recently!

Tools (OS specific application tools)
1. List avail. perf tools
2. For tool T, list useful metrics
3. For metric M, list ways of interpretation
USE (Utilization - Saturation - Errors)
1. Resources: CPU/RAM/NIC/STORAGE/ACCELERATORS
2. Some resources cannot be fully monitored?
3. Machine health
RED (Request rate - Errors - Duration)
1. Usually for cloud services/microservices. Check per service.
2. User health
Workload characterization
1. Inputs rather than resultant perf.
2. Who is causing the load
3. Why the load is being called
4. What the attributes of the load are
  1. IOPS/Throughput/Direction (R/W).
  2. Include the variance when possible/appropriate
5. How the load is changing over time
Monitoring
1. Performance statistics over time.
2. For capacity planning/quantifying growth/peak usage
3. Time series: Historic values (time-based patterns)
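As a rough illustration of the workload characterization attributes listed above (IOPS, throughput, read/write direction, variance), the sketch below summarizes a small I/O trace. The trace format `(timestamp_s, op, size_bytes)` and the sample values are made up for this example and do not come from any particular tool.

```python
from statistics import mean, pvariance

# Hypothetical I/O trace: (timestamp in seconds, "R" or "W", size in bytes).
trace = [
    (0.1, "R", 4096),
    (0.4, "W", 8192),
    (0.9, "R", 4096),
    (1.2, "R", 16384),
    (1.7, "W", 4096),
    (2.5, "R", 4096),
]


def characterize(trace, window_s):
    """Summarize the load applied during an observation window of window_s seconds."""
    if not trace or window_s <= 0:
        raise ValueError("need a non-empty trace and a positive window")
    sizes = [size for _, _, size in trace]
    reads = sum(1 for _, op, _ in trace if op == "R")
    return {
        "iops": len(trace) / window_s,                 # operations per second
        "throughput_bytes_per_s": sum(sizes) / window_s,
        "read_fraction": reads / len(trace),           # direction (R/W) mix
        "mean_io_size": mean(sizes),
        "io_size_variance": pvariance(sizes),          # spread, not just the average
    }


summary = characterize(trace, window_s=3.0)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Running the same summary over successive windows and storing the results as a time series would address the remaining item, how the load changes over time.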

Availability:

This is only relevant for fixed amount of resources (c.f. 15 years ago?):

Nowadays we have automatic scaling.

Provisioning of resources has 3 main strategies:

Capacity of a system:

throughput and response time

Service-Level agreements (SLA):

determination of what app users can expect for
- response time
- throughput
- system availability
- reliability
Ties IT costs to the SLA.
If these are not met, penalties apply (e.g. paying a fine).
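A minimal sketch of checking measurements against SLA targets, covering two of the items above (response time and availability). The target values, the choice of the 90th percentile, and the names `SLA`, `percentile`, and `sla_violations` are all assumptions for illustration; a real SLA defines its own metrics and thresholds.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile; p is a percentage in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical SLA targets (illustrative values only).
SLA = {"p90_response_s": 0.5, "availability": 0.999}


def sla_violations(response_times_s, uptime_h, total_h, sla=SLA):
    """Return a list of human-readable violations; empty means the SLA was met."""
    violations = []

    p90 = percentile(response_times_s, 90)
    if p90 > sla["p90_response_s"]:
        violations.append(f"p90 response time {p90:.2f}s exceeds target")

    availability = uptime_h / total_h
    if availability < sla["availability"]:
        violations.append(f"availability {availability:.4%} below target")

    return violations


response_times = [0.12, 0.20, 0.18, 0.95, 0.22, 0.19, 0.25, 0.21, 0.17, 0.23]
print(sla_violations(response_times, uptime_h=719.5, total_h=720.0))
print(sla_violations(response_times, uptime_h=715.0, total_h=720.0))
```

A percentile is used instead of the mean so that a single slow outlier (0.95 s here) does not dominate the check. Which statistic the SLA actually specifies is a contractual choice.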

Inputs and Outputs of the system

Determined only via observation: measurements can be taken, or existing knowledge can be used (e.g. from another system).

We refine this model until validation/calibration passes.

Data collection: How do we determine param values for basic components?

No data collection facilities available:
1. Benchmark (Synthetic workload e.g. locust.io)
2. Industry practice
3. Rules of thumb (ROTs)
Some:
1. All of the above and measurements
All/Detailed:
1. Use measurements only