Case Study

Improved Platform Stability

Shortened Mean Time to Recover

Reduced Fault Detection Time

Enhancing Third-Party Service Reliability for a Leading Mobile App Platform

User type

Ops Team

Team facts

16 Ops.

1 location

Testany solution

Gatekeeper
Plan
Output Relay

23,000

test pipelines executed per month

70%

Reduced time to locate third-party dependency issues

The background

A renowned mobile application platform’s operations team faced dual challenges: meeting platform reliability KPIs and reducing the Mean Time to Repair (MTTR). The team relied heavily on various third-party services, and the reliability of these external services became a major concern. The unpredictability of third-party service outages had a direct impact on the overall platform’s stability and user experience.

To address these challenges, the team turned to Testany Platform, deploying automated test pipelines for their third-party services and leveraging a combination of monitoring alerts and high-frequency probing to significantly reduce the time to detect and resolve service reliability issues.

18 min

Reduction in Mean Time to Recover

The challenge

1. Platform Dependency on Multiple Third-Party Services: The platform’s reliance on external services created instability, especially during high-traffic periods. The team faced the challenge of monitoring the availability and reliability of these third-party services in real time.

2. Long Fault Detection and Repair Cycle: Although the platform’s monitoring system could detect issues to some extent, the fault resolution process was slow. The time to fix issues, especially those involving third-party services, was too long, affecting both user experience and business continuity.

3. Lack of Systematic Third-Party Service Monitoring: Existing monitoring tools were primarily focused on the platform’s internal services. There was no efficient or systematic approach to monitoring the health and reliability of third-party services, making it difficult to identify issues proactively.

The solution

To enhance reliability monitoring and accelerate fault resolution, the operations team implemented the Testany platform to strengthen the monitoring capabilities for third-party services. The implementation involved the following key steps:

1. Deploying Business Logic-Driven Automated Test Pipelines: The operations team deployed automated test pipelines for multiple third-party services using Testany Platform. These test pipelines go beyond traditional availability checks by simulating real business processes and covering end-to-end testing of the business logic and user journeys that depend on third-party services. This confirms not only that the services are reachable but also that the business flows built on them behave correctly. When a failure occurs, the team can quickly pinpoint the issue and trigger the appropriate fault recovery process (internal or external), maximizing efficiency in problem resolution.

2. Combining Monitoring Alerts with High-Frequency Probing: The operations team integrated Testany Platform with their existing monitoring system and set up an alert mechanism based on fault types. When an alert is triggered by the monitoring system, it automatically activates the corresponding Testany test pipelines based on preset conditions, allowing the team to quickly locate faults, diagnose causes, and verify whether the service has recovered. This mechanism enables faster and more accurate fault identification and allows the team to immediately validate the business functionality as soon as the service is restored, significantly reducing MTTR.

3. Automating and Accelerating Fault Response: Through Testany’s high-frequency probing, the system can immediately trigger a fault diagnosis process when a third-party service fails. The operations team can quickly pinpoint the problem and initiate repairs, greatly reducing manual intervention time and human error, which further accelerates the fault resolution process.
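The alert-driven flow described in the steps above can be illustrated with a small sketch. It is not Testany's API: the `Alert` type, the `ROUTES` table, the pipeline ids, and the `run_pipeline` callable are all hypothetical names chosen for illustration. It assumes only that a pipeline can be started by id and reports pass or fail. The sketch maps an alert's fault type to preset pipelines, then re-runs a pipeline until it passes several times in a row to confirm recovery.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical: runs a test pipeline by id and returns True if it passed.
PipelineRunner = Callable[[str], bool]


@dataclass
class Alert:
    service: str     # third-party service named in the alert (illustrative)
    fault_type: str  # fault category reported by the monitoring system


# Hypothetical preset conditions: (service, fault type) -> pipelines to run.
ROUTES: Dict[Tuple[str, str], List[str]] = {
    ("payments", "timeout"): ["payments-checkout-journey"],
    ("sms", "error_rate"): ["sms-login-otp-journey"],
}


def pipelines_for(alert: Alert) -> List[str]:
    """Return the pipeline ids configured for this alert, or an empty list."""
    return ROUTES.get((alert.service, alert.fault_type), [])


def handle_alert(alert: Alert, run_pipeline: PipelineRunner) -> Dict[str, bool]:
    """Run every pipeline mapped to the alert and collect pass/fail per id."""
    return {pid: run_pipeline(pid) for pid in pipelines_for(alert)}


def probe_until_recovered(
    pipeline_id: str,
    run_pipeline: PipelineRunner,
    max_attempts: int,
    required_passes: int = 2,
    wait: Callable[[], None] = lambda: None,
) -> bool:
    """Re-run a pipeline until it passes `required_passes` times in a row.

    Returns False if that streak is not reached within `max_attempts` runs.
    `wait` is called between runs; pass a sleep function for real probing.
    """
    streak = 0
    for _ in range(max_attempts):
        if run_pipeline(pipeline_id):
            streak += 1
            if streak >= required_passes:
                return True
        else:
            streak = 0  # a failure resets the recovery streak
        wait()
    return False
```

Requiring consecutive passes rather than a single pass is a design choice in this sketch, meant to avoid declaring recovery on a service that is still flapping. The source does not state how recovery is confirmed in the actual deployment.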

The results

The integration of Testany Platform resulted in substantial improvements across several areas:

• 70% Reduction in Time to Detect Third-Party Service Reliability Issues: By leveraging Testany’s automated test pipelines and real-time monitoring alerts, the platform was able to detect third-party service failures more quickly, allowing the team to react faster and prevent issues from escalating.

• 18-Minute Reduction in Mean Time to Repair (MTTR): With automated issue detection and faster diagnostics, the team significantly reduced the time it took to resolve issues. The average repair time for third-party service failures dropped by 18 minutes, minimizing the disruption to users.

• Improved Platform Stability and User Satisfaction: With the ability to proactively detect and resolve third-party service issues, the platform’s stability was greatly enhanced, particularly during high-demand periods. This led to fewer user disruptions and a notable increase in overall user satisfaction.