Will active service assurance test traffic impact the user experience?
As with many things, the answer is “it depends”. If you have a 2Mbps connection and you configure a probe to send 2Mbps continuously through it, then yes, there will be a big impact. There are two main types of test:
- Stress / Activation tests. These are expected to be user impacting, as their purpose is to test a connection fully, up to its design limit. They should not, of course, be run when users are active. They are strictly time-limited tests: they run, finish and show results.
- Continuous Monitoring. These are designed to co-exist with users. Typically, you can think of a test agent as adding an extra user on the network.
Configured suitably, active service assurance will not have a noticeable effect on user traffic.
How can active service assurance show me the real user experience when it is based on test traffic?
It is true that network analytics looks at real user traffic and can measure quality KPIs of that traffic, thus giving a detailed picture of what the user experienced. Only network analytics can tell you what users are actually doing: who is downloading the most data, who is the top Skype user, who uses Fortnite and so on.
Active service assurance, however, IS real user traffic; it is just created by software and not by an actual user. Therefore, by measuring quality KPIs of this software-generated user traffic, you get the same view of the user experience. The critical difference between the two is that every time you find a problem and solve it using active assurance, you have saved the user experience of real users AND avoided a possible support escalation.
How much will the active assurance test traffic cost from my cloud provider?
If you are thinking of performing either Cloud Connectivity testing or longer term Cloud monitoring, then you will incur fees for the test traffic. Of course, you need to check the charging structure of your own cloud provider, but typically they charge for traffic leaving the cloud. At the time of writing, these costs are below 15p per GByte transferred (for AWS, depending on volume levels) and are generally reducing year on year. Most tests do not create large volumes of traffic. They are divided into roughly two types:
- Stress testing. This can be at high data rates, for example to check that a connection can reach 100Mbps or 1Gbps. However, the tests are time limited and not usually repeated frequently, perhaps only on first commissioning, or at weekends to re-confirm a capability. Given the low number of test runs and their short duration, the data volumes created are low.
- Continuous monitoring. This traffic can be continuous, for example a 10kbps UDP stream, to constantly measure loss, latency and jitter. Other tests might be towards HTTP or DNS services, to check performance. This type of continuous testing is designed to run during operation and adds small data volumes, and hence low cloud throughput fees.
For most users, cloud fees for active assurance test traffic will be less than the price of a coffee each day.
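As a rough illustration of the volumes involved, the sketch below estimates the traffic and fees for the two test types. The 15p per GByte price and the 10kbps and 1Gbps rates come from the text above; the 30-day month, the 60-second stress test duration, the decimal GByte and the simplification that all test traffic is billed as outbound are assumptions made only for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BITS_PER_GB = 8 * 10**9  # assumption: decimal gigabyte (10^9 bytes)


def volume_gb(rate_bps: float, duration_s: float) -> float:
    """Data volume in GByte for a stream of rate_bps lasting duration_s."""
    return rate_bps * duration_s / BITS_PER_GB


def egress_cost_gbp(volume: float, price_per_gb: float = 0.15) -> float:
    """Fee in pounds, assuming every byte is billed at price_per_gb."""
    return volume * price_per_gb


# Continuous monitoring: a 10kbps UDP stream running for an assumed 30 days.
monitoring_gb = volume_gb(10_000, 30 * SECONDS_PER_DAY)

# Stress test: 1Gbps for an assumed 60 seconds.
stress_gb = volume_gb(1_000_000_000, 60)

print(f"monitoring: {monitoring_gb:.2f} GB, about £{egress_cost_gbp(monitoring_gb):.2f} per month")
print(f"stress run: {stress_gb:.2f} GB, about £{egress_cost_gbp(stress_gb):.2f} per run")
```

Under these assumptions a single monitoring stream moves about 3.24 GByte in a month, and one 60-second 1Gbps stress run moves 7.5 GByte. Your own figures will depend on how many streams you run and on your provider's actual price list.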
How much will it cost to host a probe in my public cloud?
The Netrounds test agent is optimized for a small footprint. It needs only a single vCPU, a minimum of 256MB RAM and 2GB storage. Most cloud providers will charge a comparatively low fee for this minimal resource requirement, but you should check with your provider to find out the expected costs.
How much does active service assurance cost?
There are two different cost models: either a SaaS delivered service, which is a pure OpEx model, or a fully purchased solution with a mix of initial CapEx and ongoing support fees. Normally the SaaS model is more attractive commercially. The exceptions are when a large deployment (i.e. more than 50 to 100 probes) is planned, or when corporate policy simply does not allow a SaaS system.
Two main factors determine the cost: scale and the features required. Each probe you wish to deploy has a fee associated with it. The fee will also depend on whether you need to purchase hardware, or only software. In the latter case, you will need to provide your own suitable platform (see the platforms question below).
The next pricing factor involves the features that you need. Typically, additional fees are charged for more specialised tests, such as Video, advanced turn up tests such as RFC 6349 or Y.1564, Wi-Fi, or 4G. More basic tests, such as UDP, TCP, HTTP, DNS and VoIP, are included by default. Please contact Frame if you have a specific test in mind and want to know if it is a default feature or not.
Along with the probe count, the total number of simultaneous test streams you need to run will affect the price. A minimum system has 500 streams and further purchases are made in blocks of 500.
- SaaS is the better option for “smaller” scale solutions (fewer than roughly 50 to 100 probes)
- Price is based on the number of probes, features and total test streams.
- Platform costs for the probe need to be considered, either dedicated HW, or virtualised.
- Cloud related fees from your cloud provider (i.e. if deploying probes into a cloud) are likely the smallest factor, but should still be included.
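The stream licensing rule described above (a minimum of 500 streams, with further purchases in blocks of 500) amounts to rounding up to the next block. A minimal sketch, in which the function name is hypothetical and no actual prices are implied:

```python
import math

BLOCK_SIZE = 500  # from the text: minimum system and increment are both 500 streams


def licensed_streams(required: int) -> int:
    """Round the required simultaneous test streams up to whole blocks of 500."""
    if required < 0:
        raise ValueError("required stream count cannot be negative")
    blocks = max(1, math.ceil(required / BLOCK_SIZE))  # at least the minimum system
    return blocks * BLOCK_SIZE


print(licensed_streams(320))   # 500
print(licensed_streams(1200))  # 1500
```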
Who are the main active service assurance solution providers?
Active service assurance is a large and quite crowded market, so there is no shortage of choice. For some larger vendors, active service assurance is an add-on to a main offering; others focus on particular market segments:
- Cisco / ThousandEyes – Enterprise focused
- Cisco / IPSLA – basic functionality
In addition to the main global players mentioned above, there are many smaller vendors serving their local geographic markets. Big is not always beautiful. It's always a good idea to look beyond the industry giants, to find innovation from vendors that are often more nimble.
What platforms are supported by the active service assurance probe software?
Key to the success of an active assurance solution is the ability to deploy a test probe almost anywhere. A hardware platform would typically need a minimum quad core Xeon CPU, 4GB RAM and 16GB of storage, in addition to suitable interface cards and ports. A virtual deployment requires 1 vCPU, 256MB RAM (512MB ideally) and 2 GB storage.
A laptop or other hardware unit that can boot from a USB stick can often be used as a probe, sometimes on a temporary basis.
How can we try active service assurance?
A trial can be set up quickly and easily; a simple system can often be ready within one day. Once an account on the SaaS platform is created for you, you will be able to download the probe software and deploy it on your platform. From there, it's simple to use the GUI to create a first test and get started. You can try more functionality by deploying two or more probes, so that you can run tests between them: ideally, one in the “site / cloud under test” and another in a different location. Frame will be available to assist in the whole process.
The trial system (and the production SaaS) is hosted in AWS (in Europe), so the test agent probes you deploy need to be able to reach public IP addresses.
Does active service assurance work within GDPR rules?
GDPR concerns personal data, such as names, email addresses, IP addresses, credit card details and so on. The test agents generate their own synthetic network traffic and don’t have access to real user data. No personal data is accessed, processed or transferred by the testing.
I already have network analytics, why do I need active service assurance?
Network analytics is a powerful solution, for example to look at a small office network, or to cover many offices and inspect network traffic remotely. However, there are some fundamental differences between the two approaches, and they often complement each other, giving a more complete solution.
Network analytics inspects user traffic and reports many KPIs on that traffic. This is very useful, especially for tracking quality metrics and usage over time in order to plan capacity upgrades. Service Assurance, on the other hand, generates its own traffic and inspects that. Critically, this means that it is possible to see problems in the test traffic before users experience the same issue. In active service assurance, testing is proactive and continuous, whereas analytics solutions are reactive and only work when there is suitable user traffic to analyse.
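To make "generates its own traffic and inspects that" concrete, here is a minimal sketch of how loss, latency and jitter could be derived from a synthetic packet stream. It is not the product's implementation: the `Packet` type and `stream_kpis` function are hypothetical, one-way delay assumes the two probes have synchronised clocks, and jitter is taken as the mean absolute change in delay between consecutive received packets, which is one of several definitions in use.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    seq: int            # sequence number set by the sending probe
    sent_ms: float      # send timestamp (sender clock)
    received_ms: float  # receive timestamp (receiver clock, assumed synchronised)


def stream_kpis(sent_count: int, received: list[Packet]) -> dict:
    """Summarise a test stream as loss (%), mean latency (ms) and jitter (ms)."""
    if sent_count <= 0:
        raise ValueError("sent_count must be positive")
    ordered = sorted(received, key=lambda p: p.seq)
    delays = [p.received_ms - p.sent_ms for p in ordered]
    # Jitter here: mean absolute difference between consecutive delays.
    changes = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return {
        "loss_pct": 100.0 * (sent_count - len(ordered)) / sent_count,
        "latency_ms": mean(delays) if delays else None,
        "jitter_ms": mean(changes) if changes else 0.0,
    }


# Five packets sent, packet 3 never arrived.
received = [
    Packet(0, 0.0, 20.0),
    Packet(1, 100.0, 122.0),
    Packet(2, 200.0, 221.0),
    Packet(4, 400.0, 425.0),
]
kpis = stream_kpis(5, received)
print(kpis)
```

Because the sender knows exactly what it transmitted, loss can be measured directly, which is not possible when passively observing user traffic whose original volume is unknown.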
In many ways active service assurance enhances a network analytics solution. The traffic created artificially by the active assurance solution can appear in the analytics solution and be part of the overall monitoring view that Operations look at.
How difficult is it to deploy active service assurance?
It's not uncommon for a simple deployment to be completed within a week. For example, to test data centre connectivity, the steps after a SaaS purchase would be:
- Log in to your account and download the software
- Deploy the probe in your datacentre
- Run some basic config, to ensure it can reach your account (simple command line actions)
- The probe will start up, automatically call home and appear in your account.
- Give the probe a meaningful label, to make later selection easier, e.g. “DC01-probe”
- Use the GUI to configure your first test
- Repeat as necessary to deploy further probes and configure more tests.
Most customers can deploy a solution without help, but assistance is available. For more advanced integrations, for example with an automation framework such as Jenkins / Ansible or an orchestrator like Cisco’s NSO, detailed documentation and professional services are available.
Why monitor public clouds when they are totally reliable?
There are a few parts to the answer to this great question.
Firstly, it’s not completely true. Whilst public cloud providers do put a huge amount of effort into reliability and uptime, outages and degradations do occur. All the main cloud providers publish a dashboard of availability, for example: AWS, Azure and Google compute.
Secondly, even if your cloud provider is running smoothly, does that guarantee your services are ok? Unfortunately not, which is why VNF monitoring at the service level is important. By verifying the performance of each of your VNFs in a service chain, you have the best chance of keeping your service chain running optimally.
Lastly, what actually matters is the end to end performance, not just the performance in the cloud. Your cloud provider will typically not be responsible for the last mile to your end users. This cloud connectivity link in the chain can have a major impact on end to end performance. If you cannot determine whether a problem is in the cloud OR between your location and the cloud, you are at a big disadvantage when troubleshooting.
I have monitoring in place today, why should I add active service assurance?
The vast majority of network operators today still rely on tried and trusted management or performance based KPIs, such as CPU load, free memory, link utilisation, disk space and so on. These parameters tell you about the health of the platforms used in a network. The long-held assumption was that if the platform was ok, then the service it was delivering was also ok.
Anyone who has spoken to a network operations engineer will know that a bank of green lights on an operations dashboard does NOT mean that all users are happy. Service impacting issues can occur even on platforms that are at low load and appearing to be running smoothly. Misconfigurations, incorrect software patches or default builds left on network elements will not show up as management errors. Only by testing the service can you see these issues.
Active Service Assurance adds a critical additional layer of monitoring:
- Fewer errors leak out into the user domain.
- Customer escalations are reduced.
- Operations work more strategically and focus on improvement instead of firefighting.
- Customer satisfaction increases.
- Network operations engineers’ job satisfaction increases.
How does active service assurance impact Mean Time to Repair (MTTR)?
Let's imagine two scenarios: firstly Company A, without active service assurance, and secondly Company B, with it.
Company A have management and performance monitoring in place, which all reports ok, but still receive calls from employees complaining about poor voice quality on conference calls held in various parts of the office. In fact the employees have been having problems for many days, but it has now got too bad to ignore. The network engineer now starts to investigate. If he has an analytics solution, he can start to dig through historical data, looking for conference call traffic at the times the employees said they had problems. He finds some traffic, but there is no clear trend: some traffic is perfect, some has higher packet loss, but it's not clear whether that is the root cause. And on he goes, hunting for the needle in the haystack. MTTR = Initial weeks of poor quality before users complained + 1 day(?) to find root cause (much longer without a network analytics solution) + access point installation time.
Company B have active assurance in place, which is set up to continuously monitor key network parameters such as loss, latency and jitter in every office, including via Wi-Fi access points. On Monday morning, an alert goes off due to latency rising above a threshold at two Wi-Fi access points in two different offices. The network engineer looks at real time analytics and sees traffic spikes to the same Wi-Fi access points related to OS updates. Other access points are coping ok and their latency is still acceptable. He has established the root cause of a service impacting issue BEFORE ANY USERS COMPLAIN. Too many users are connected to the access points in question, so an additional one will be installed. MTTR = 1 hour + access point installation time.
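The alert in the Company B scenario can be pictured as a simple threshold check on the continuous latency measurements. The sketch below is only an illustration: the access point names, the 50 ms threshold and the rule of three consecutive breaches are all assumptions, not product defaults.

```python
def breaching_probes(latency_ms_by_probe: dict[str, list[float]],
                     threshold_ms: float,
                     consecutive: int = 3) -> list[str]:
    """Return probes whose last `consecutive` latency samples all exceed the threshold."""
    alerts = []
    for probe, samples in latency_ms_by_probe.items():
        recent = samples[-consecutive:]
        # Require a full window so a single spike does not raise an alert.
        if len(recent) == consecutive and all(s > threshold_ms for s in recent):
            alerts.append(probe)
    return sorted(alerts)


# Hypothetical Monday morning measurements, oldest sample first.
samples = {
    "office1-ap3": [18.0, 22.0, 64.0, 71.0, 80.0],
    "office1-ap4": [17.0, 19.0, 21.0, 20.0, 18.0],
    "office2-ap1": [20.0, 58.0, 66.0, 90.0, 95.0],
    "office2-ap2": [19.0, 75.0, 21.0, 22.0, 20.0],  # one spike only
}
alerts = breaching_probes(samples, threshold_ms=50.0)
print(alerts)  # ['office1-ap3', 'office2-ap1']
```

Requiring several consecutive breaches is a common way to avoid alerting on a single transient spike, at the cost of a slightly later alert.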
How can active service assurance test an SD-WAN link when it is tunnelled?
Testing SD-WAN presents some unique challenges for any testing solution, including active service assurance. The SD-WAN tunnel does of course create a security barrier if you try to inject traffic into the tunnel directly. However, a test probe injects traffic before the SD-WAN edge, just as a normal user would. The SD-WAN edge will then put the test traffic into the tunnel in the usual way.
SD-WAN controllers will usually allow different priorities of traffic, from real time, through to best effort. To test this behaviour, the probe can generate VoIP traffic (usually classified as real time) and then attempt to load the SD-WAN link with best effort UDP or TCP traffic. Correct behaviour should be that the VoIP traffic will always get through and the best effort traffic will suffer if capacity is reached. This type of stress testing should usually only be performed during turn up testing, to fully test the SD-WAN link ahead of production use.
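The pass criterion for this priority test can be written down as a small decision rule. The sketch below is an assumption-laden illustration: it uses packet loss as the only KPI, takes best effort UDP loss as the sign that the link was saturated, and the 1% VoIP loss limit and the function name are hypothetical.

```python
def qos_verdict(voip_loss_pct: float,
                best_effort_loss_pct: float,
                max_voip_loss_pct: float = 1.0) -> str:
    """Classify one SD-WAN priority test run as 'pass', 'fail' or 'inconclusive'."""
    if best_effort_loss_pct == 0.0:
        # Best effort traffic did not suffer, so capacity was probably not
        # reached and prioritisation was never actually exercised.
        return "inconclusive"
    if voip_loss_pct <= max_voip_loss_pct:
        return "pass"  # real time traffic got through while the link was loaded
    return "fail"      # VoIP suffered alongside best effort traffic


print(qos_verdict(voip_loss_pct=0.0, best_effort_loss_pct=12.5))  # pass
print(qos_verdict(voip_loss_pct=6.0, best_effort_loss_pct=11.0))  # fail
print(qos_verdict(voip_loss_pct=0.0, best_effort_loss_pct=0.0))   # inconclusive
```

The "inconclusive" case matters in practice: if the best effort stream shows no degradation, the test has not shown that VoIP is protected, only that the link was not full.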