Availability

Overview

Availability describes whether a product or service is operational and accessible when users need it. In business terms, it answers a simple question: can people use the service right now? In technical terms, it reflects whether the required systems, dependencies, and interfaces are functioning within expected limits.

Availability is related to, but not the same as, uptime and reliability:

Availability focuses on whether the service is usable at a given time.
Uptime usually refers to the amount of time a system is running without interruption.
Reliability describes how consistently a system performs correctly over time.

The scope of availability can vary. It may apply to a full service, a specific application, a component, or an entire environment.

Why availability matters

High availability supports business continuity, user satisfaction, and operational stability. When a service is unavailable, users may be unable to complete tasks, internal teams may lose productivity, and customer trust may be affected.

Availability is commonly tracked to:

Measure service quality and customer experience
Support service-level objectives and agreements
Identify recurring operational issues
Track the impact of incidents and maintenance

How availability is calculated

A simple way to calculate availability is:

Availability = available time / total time

It is usually reported as a percentage over a defined measurement period, such as a day, month, or quarter.

Example:

If a service was available for 719 of the 720 hours in a 30-day month, availability is 719 / 720 ≈ 99.86%.
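
The calculation above can be written as a small Python function. This is a minimal sketch: the function name `availability_percent` is hypothetical, and the inputs are assumed to be hours that have already been measured for the period.

```python
def availability_percent(available_hours: float, total_hours: float) -> float:
    """Return availability as a percentage: available time / total time * 100."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= available_hours <= total_hours:
        raise ValueError("available_hours must be between 0 and total_hours")
    return available_hours / total_hours * 100


# The monthly example from the text: 719 of 720 hours available.
print(f"{availability_percent(719, 720):.2f}%")  # → 99.86%
```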

Depending on the policy, some periods may be excluded from the calculation, such as planned maintenance windows or approved downtime. Always confirm the measurement rules used for your environment.
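
To illustrate how exclusions change the result, the sketch below assumes one possible policy: approved windows are removed from the measured period, and only downtime outside those windows counts against availability. The function name and all figures are hypothetical; your own measurement rules may differ.

```python
MINUTES_IN_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def availability_excluding_windows(total_min, unplanned_downtime_min, excluded_min):
    """Availability (%) under an assumed policy that excludes approved windows.

    excluded_min is removed from the measured period; unplanned_downtime_min
    should count only downtime that happened outside those windows.
    """
    eligible_min = total_min - excluded_min
    if eligible_min <= 0:
        raise ValueError("excluded time must be shorter than the measurement period")
    return (eligible_min - unplanned_downtime_min) / eligible_min * 100


# Hypothetical month: a 240-minute maintenance window plus 30 minutes of unplanned downtime.
excluded = availability_excluding_windows(MINUTES_IN_30_DAYS, 30, 240)
# The same month if the maintenance window is counted as downtime instead.
included = (MINUTES_IN_30_DAYS - 240 - 30) / MINUTES_IN_30_DAYS * 100
print(f"{excluded:.2f}% vs {included:.3f}%")  # → 99.93% vs 99.375%
```

The same month can be reported at very different levels depending on the policy, which is why consistent measurement rules matter.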

What affects availability

Availability can be reduced by many factors, including:

Infrastructure failures
Application errors or crashes
Network issues
Database or storage problems
Failures in upstream or downstream dependencies
Problematic deployments or configuration changes
Planned maintenance
External services and third-party integrations

Even if your core system is healthy, a dependency outage can still make the service unavailable to users.
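
One rough way to see this effect is to multiply the availability of every component the service needs in order to work. The sketch assumes that failures are independent and that any single component failure makes the service unavailable; real dependencies often fail together, so treat the result as an estimate. The component figures are hypothetical.

```python
from math import prod


def serial_availability(component_availabilities):
    """Estimated availability of a service that needs every listed component to work.

    Assumes component failures are independent; correlated failures make this inaccurate.
    """
    return prod(component_availabilities)


# Hypothetical: a core service, its database, and a third-party API, each at 99.9%.
combined = serial_availability([0.999, 0.999, 0.999])
print(f"{combined * 100:.2f}%")  # → 99.70%
```

Three components that each meet 99.9% combine to roughly 99.7%, so the service as a whole can miss a target that each component meets on its own.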

Monitoring availability

Availability is typically monitored using metrics, dashboards, alerts, and health checks.

Metrics: Track successful requests, failed requests, response codes, and service checks.
Dashboards: Show current status and trends over time.
Alerts: Notify teams when availability drops below a threshold.
Health checks: Confirm whether critical endpoints or components are responding as expected.
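
Request metrics can be turned into an availability figure by dividing successful requests by total requests, and an alert can fire when that figure drops below a threshold. In this sketch, `RequestCounts`, `should_alert`, the 99.5% threshold, and treating a window with no traffic as available are all assumptions; which response codes count as failures (for example, server errors only) depends on your own rules.

```python
from dataclasses import dataclass


@dataclass
class RequestCounts:
    successful: int
    failed: int


def request_availability(counts: RequestCounts) -> float:
    """Availability as successful requests / total requests, in percent."""
    total = counts.successful + counts.failed
    if total == 0:
        return 100.0  # assumption: a window with no traffic counts as available
    return counts.successful / total * 100


def should_alert(counts: RequestCounts, threshold_percent: float = 99.5) -> bool:
    """True when availability for the window falls below the alert threshold."""
    return request_availability(counts) < threshold_percent


window = RequestCounts(successful=9_940, failed=60)
print(f"{request_availability(window):.2f}%", should_alert(window))  # → 99.40% True
```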

Use reliable data sources, and choose monitoring intervals based on how critical the service is. When reviewing data, look for:

Sudden drops that may indicate an incident
Short spikes in errors that may reflect transient issues
Partial outages affecting only some users, regions, or features
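
Partial outages can hide inside an overall figure. The sketch below computes availability per segment (here, hypothetical regions) and flags any segment below an assumed 99% threshold, even though the overall rate still looks healthy. The region names, counts, threshold, and the helper name `low_availability_segments` are illustrative only.

```python
def low_availability_segments(counts_by_segment, threshold_percent=99.0):
    """Return {segment: availability %} for segments below the threshold.

    counts_by_segment maps a segment name to (successful, failed) request counts.
    """
    flagged = {}
    for segment, (ok, failed) in counts_by_segment.items():
        total = ok + failed
        if total == 0:
            continue  # no traffic in this window, so nothing to judge
        rate = ok / total * 100
        if rate < threshold_percent:
            flagged[segment] = round(rate, 2)
    return flagged


# Hypothetical request counts for one monitoring window.
traffic = {
    "eu-west": (4_990, 10),
    "us-east": (5_000, 0),
    "ap-south": (950, 50),
}
ok_total = sum(ok for ok, _ in traffic.values())
all_total = sum(ok + failed for ok, failed in traffic.values())
print(f"overall: {ok_total / all_total * 100:.2f}%")  # → overall: 99.45%
print(low_availability_segments(traffic))  # → {'ap-south': 95.0}
```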

Improving availability

Availability can often be improved through both technical and operational practices.

Redundancy: Remove single points of failure.
Failover: Automatically switch to healthy systems when a component fails.
Scaling: Ensure capacity is sufficient for demand.
Backups: Protect data and support recovery after failures.
Testing: Validate recovery procedures, failover, and deployment changes.
Incident response: Detect, triage, and resolve issues quickly.
Change management: Review changes before release and use maintenance windows when needed.
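
Redundancy helps because a component with several interchangeable replicas is unavailable only when all of them fail at once. The sketch uses the common estimate 1 - (1 - A)^n for n replicas that each have availability A. It assumes independent failures and instant, reliable failover, which real systems rarely achieve, so treat the numbers as optimistic. The helper name `parallel_availability` is hypothetical.

```python
def parallel_availability(replica_availability: float, replicas: int) -> float:
    """Estimated availability when any one of `replicas` identical replicas can serve.

    Assumes independent failures and instant, reliable failover.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The component is down only if every replica is down at the same time.
    return 1 - (1 - replica_availability) ** replicas


for n in (1, 2, 3):
    print(n, f"{parallel_availability(0.99, n) * 100:.4f}%")
# 1 99.0000%
# 2 99.9900%
# 3 99.9999%
```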

Troubleshooting low availability

If availability drops, start by checking the most likely causes.

Review recent incidents and deployment history.
Check logs, error rates, and health check results.
Verify the status of dependencies and third-party services.
Confirm the monitoring configuration and selected time range.
Determine whether the issue is full, partial, or intermittent.
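
The last step can be approached by comparing recent health check results across probe locations. The function below is a hypothetical triage heuristic, not a standard method: it reports a full outage when every location is failing in the latest round, a partial outage when only some are, and an intermittent issue when the latest round passes but earlier rounds in the window had failures. The location names are illustrative.

```python
def classify_outage(rounds):
    """Hypothetical triage heuristic for recent health check results.

    rounds: list of check rounds, oldest first. Each round maps a probe
    location to True (check passed) or False (check failed).
    """
    if not rounds or not rounds[-1]:
        return "unknown"  # no data to judge
    latest = rounds[-1]
    failing_now = [location for location, passed in latest.items() if not passed]
    if len(failing_now) == len(latest):
        return "full"  # every location is failing right now
    if failing_now:
        return "partial"  # only some locations are failing
    if any(not passed for r in rounds for passed in r.values()):
        return "intermittent"  # passing now, but failures occurred earlier in the window
    return "healthy"


history = [
    {"us-east": True, "eu-west": True},
    {"us-east": False, "eu-west": True},
    {"us-east": True, "eu-west": True},
]
print(classify_outage(history))  # → intermittent
```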

If the cause is not clear or the issue is ongoing, escalate according to your support process, which may include contacting support.

Best practices

Define clear SLOs and SLAs for critical services.
Review availability trends on a regular basis.
Document maintenance windows, incidents, and recovery actions.
Use consistent measurement rules across teams and reports.
Track both overall availability and component-level availability where relevant.
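
An availability SLO can be translated into the downtime it allows over a period, often called an error budget, which makes the target easier to discuss and track. The sketch below is illustrative: `allowed_downtime` is a hypothetical helper, and the 30-day period is an assumption.

```python
from datetime import timedelta


def allowed_downtime(slo_percent: float, period: timedelta) -> timedelta:
    """Downtime an availability SLO permits over a period (the error budget)."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be greater than 0 and at most 100")
    return period * (1 - slo_percent / 100)


# Hypothetical 99.9% monthly target, using a 30-day month.
print(allowed_downtime(99.9, timedelta(days=30)))  # → 0:43:12
```

A 99.9% target over 30 days therefore leaves about 43 minutes for all incidents and any downtime that the policy does not exclude.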

Related terms

Uptime
Downtime
Reliability
Resilience
SLA
SLO
Incident