Monitoring Systems and Applications in the Cloud

By Lawrence Garvin, Server Monitoring Head Geek, SolarWinds

To truly appreciate the simplicity of monitoring in the cloud, it helps to first separate the hype from the reality. What is the cloud? The cloud is just a collection of applications and services layered on top of the same virtualization and storage services that we’re already familiar with in the local data center. The difference is that those services are now available to all users through self-service and can be provisioned in very short time frames. Thus, monitoring the cloud is really no different from monitoring the systems, services, and applications that we already have. There are just more of them to monitor, and the way they’re monitored is the key difference.

Private Cloud

So let’s start simple, with our own private cloud operating from the corporate data center. Prior to implementing cloud services, IT most likely had full control over the creation and removal of hosts, virtual systems, storage, and network infrastructure. This was all orchestrated in response to specific business objectives. But now, with the cloud, these events occur on demand, requested by entities outside the scope of IT. As such, monitoring must detect and react to these events much more quickly.

In a private cloud, you’re confronted with challenges that didn’t exist in a simple data center. What should you monitor now?

* Capacity Planning

Because virtual machines are created and storage resources are consumed daily in an on-demand environment, it’s critical that you know your environment’s capacity and when it will run out.
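One rough way to estimate when capacity will run out is to fit a straight line to recent daily usage and project it forward. The Python sketch below assumes one usage sample per day and steady, linear growth. The function name and the sample figures are illustrative and do not come from any particular monitoring product.

```python
from typing import Optional, Sequence


def days_until_full(daily_used_gb: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Estimate days until capacity is exhausted from a least-squares linear trend.

    daily_used_gb holds one sample per day, oldest first (an assumption of this sketch).
    Returns None when there is too little data or usage is flat or shrinking.
    """
    n = len(daily_used_gb)
    if n < 2:
        return None

    # Least-squares slope of usage against the day index 0..n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used_gb) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_used_gb))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    growth_per_day = covariance / variance

    if growth_per_day <= 0:
        return None

    remaining_gb = max(capacity_gb - daily_used_gb[-1], 0.0)
    return remaining_gb / growth_per_day
```

For example, with usage growing from 100 GB to 140 GB in 10 GB daily steps against a 200 GB pool, this estimates about six days of headroom. Real growth is rarely this linear, so treat the result as an early warning rather than a forecast.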

* Granular Reporting

One of the tenets of cloud services is that usage is billed on a pay-per-use basis. IT is no longer simply a cost center; departments and individuals who consume resources from the cloud are expected to fund their use. As a result, cloud management and monitoring tools must offer per-resource reporting.
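As a minimal sketch of what per-resource reporting feeds into, the Python below rolls metered usage up into a charge per department. The record fields, resource names, and rates are all hypothetical; a real tool would pull them from its own metering data and price list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, NamedTuple


class UsageRecord(NamedTuple):
    department: str
    resource: str    # hypothetical resource type, e.g. "vcpu_hours"
    quantity: float  # metered amount of that resource


# Hypothetical price per unit of each resource type.
RATES = {"vcpu_hours": 0.05, "storage_gb_days": 0.002}


def charge_by_department(records: Iterable[UsageRecord],
                         rates: Mapping[str, float]) -> Dict[str, float]:
    """Sum quantity * rate for every record, grouped by department."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.department] += record.quantity * rates[record.resource]
    # Round only at the end so small per-record errors do not accumulate.
    return {dept: round(cost, 2) for dept, cost in totals.items()}
```

Grouping by department is only one choice; the same loop could key on an individual owner or a project tag if that is how costs are recovered.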

* Reducing VM Sprawl

Just as fast as virtual machines are created and storage resources are consumed, they can be abandoned. How well you keep track of these abandoned resources will have a major impact on your capacity planning decisions.
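One simple way to spot candidates for cleanup is to flag virtual machines that have shown no activity for a while and are barely using CPU. The Python sketch below assumes the monitoring tool can supply a last-activity timestamp and an average CPU figure per VM. The 30-day and 2 percent thresholds are illustrative defaults, not recommendations.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple


class VM(NamedTuple):
    name: str
    owner: str
    last_activity: datetime   # last login or request seen (assumed to be available)
    avg_cpu_percent: float    # average CPU over the reporting period


def find_abandoned(vms: Iterable[VM], now: datetime,
                   idle_days: int = 30, cpu_threshold: float = 2.0) -> List[str]:
    """Return names of VMs that are both idle for idle_days and below cpu_threshold."""
    cutoff = now - timedelta(days=idle_days)
    return [
        vm.name
        for vm in vms
        if vm.last_activity < cutoff and vm.avg_cpu_percent < cpu_threshold
    ]
```

Requiring both conditions avoids flagging a batch server that nobody logs in to but that is still doing work. The flagged list is a starting point for contacting owners, not a deletion list.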

Public Cloud

The two great challenges inherent to working with a public cloud are that it’s “over there” and that it’s “outside the firewall.” Some part of it may be under the control of other entities, but end users experiencing issues with systems, services, or applications will go right to IT for help. Different types of public cloud services affect how you manage and monitor them. We’re going to talk about the two most relevant.

Applications & Services

Where applications are concerned, typically referred to as Software as a Service (SaaS), the focus will be on application performance. As long as the applications and services are available and performing to the expectations of the users, it’s all good. One thing to be aware of is application responsiveness. Monitoring transaction response times (how long it takes from the time the client request is received at the server to the time the application responds) is every bit as important as monitoring availability. One way to do this is to place a synthetic client within the cloud environment and monitor the responses it receives. This eliminates the external network from the equation.
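A synthetic client can be as simple as a script that repeats a representative transaction and records how long each run takes. The Python sketch below times any callable passed to it, so the transaction itself (an HTTP request, a login, a database query) is left to the caller. The function name and the choice of summary statistics are assumptions of this example.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_transactions(transaction: Callable[[], object], runs: int = 5) -> Dict[str, float]:
    """Run the transaction several times and summarize the elapsed time in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")

    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        transaction()
        samples.append(time.perf_counter() - start)

    return {
        "min_s": min(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }
```

For an HTTP-based service, the transaction could be something like `lambda: urllib.request.urlopen(url, timeout=10).read()`, where `url` is whatever endpoint represents a typical user request. Running the script from inside the provider's environment measures the application alone; running it from the office measures the application plus the network path.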

Generally, an IT department won’t have a lot of involvement in addressing performance issues, except when they’re a function of network issues. However, IT can get involved to monitor if need be.

Here are two important areas to monitor for hosted application and service scenarios that the IT department does have control over:

Network Bandwidth

To make sure network bandwidth can support cloud services, you need answers to a few questions:
* What connection speed is available from the end user’s device?
* How many end users are sharing the available bandwidth?
* How many of those end users are working from mobile devices, thus on a wireless infrastructure, rather than wired?
* Is there redundancy in the connection availability?

Ensuring that sufficient bandwidth is installed is a critical aspect of successfully implementing public cloud services. Monitoring that bandwidth is the only way to know that it’s being used for its intended purposes, and when the available capacity is reaching its limits.
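Knowing when available capacity is reaching its limits usually comes down to comparing two readings of an interface's byte counter against the link speed. The Python sketch below assumes the counter values have already been collected (for example, via SNMP or the device's own API) and that the counter is 64 bits wide by default. The function name and parameters are illustrative.

```python
def utilization_percent(bytes_start: int, bytes_end: int, interval_s: float,
                        link_bps: float, counter_bits: int = 64) -> float:
    """Percent of link capacity used between two byte-counter readings.

    Handles a single counter wrap between the readings; counter_bits is the
    assumed width of the device's counter.
    """
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")

    delta_bytes = bytes_end - bytes_start
    if delta_bytes < 0:
        # The counter rolled over to zero between the two readings.
        delta_bytes += 2 ** counter_bits

    bits_per_second = delta_bytes * 8 / interval_s
    return bits_per_second / link_bps * 100
```

For example, 375,000,000 bytes transferred in 60 seconds on a 100 Mbps link works out to 50 percent utilization. Sampling at short, regular intervals matters, because a long interval averages away the peaks that users actually feel.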

Network Latency

Latency is the amount of time it takes for data to get from point ‘A’ to point ‘B’. It largely defines how responsive an application or cloud-hosted service feels to the user. You can have gigabits per second of bandwidth between the end user and the cloud service provider, but if the data takes too long to get from one to the other, the experience will be quite unsatisfactory.
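A back-of-the-envelope model shows why bandwidth alone does not rescue a high-latency connection. The Python sketch below assumes a deliberately simplified model in which an interaction costs a fixed number of request/response round trips plus the time to push the payload through the link. It ignores protocol details such as connection setup and congestion control, and the function name and figures are illustrative.

```python
def estimated_response_time_s(payload_bytes: int, bandwidth_bps: float,
                              rtt_s: float, round_trips: int) -> float:
    """Simplified estimate: latency cost of the round trips plus transfer time."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth_bps must be positive")
    latency_cost = round_trips * rtt_s
    transfer_cost = payload_bytes * 8 / bandwidth_bps
    return latency_cost + transfer_cost


# The same 100 KB interaction over a 1 Gbps link, needing 20 round trips:
nearby = estimated_response_time_s(100_000, 1e9, rtt_s=0.005, round_trips=20)
distant = estimated_response_time_s(100_000, 1e9, rtt_s=0.150, round_trips=20)
```

Under these assumptions the nearby case comes to roughly 0.1 seconds and the distant case to roughly 3 seconds, even though the transfer itself takes under a millisecond in both. Adding more bandwidth would change neither number in any way a user would notice.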

Infrastructure

Infrastructure as a Service (IaaS) is the creation of an entire virtual machine within a cloud infrastructure. Not just the application is available, but all of the underlying system capability as well. The ability to choose the CPU, RAM, and disk configurations of the machine, the operating system, and the services and applications is a significant feature of IaaS. More importantly, the responsibility for all of the underlying systems comes with the privilege of choice. So, just like a host or virtual machine sitting in the corporate data center, all of the systems, components, and services associated with these virtual machines need to be monitored, in addition to the application and service monitoring previously mentioned.

Depending on your cloud provider, some of these monitoring functions may be furnished, and you’re just a consumer reacting as necessary when the monitoring service provides significant information. In other cases, you may need to implement your own monitoring services to keep track of what’s going on with those machines. If you need to implement your own monitoring services, there will likely be an added expense for additional virtual machine resources.

Monitoring is a Must

In the end, there isn’t much difference between what you monitor when cloud services are in the mix, but there’s likely a significant difference in why you’re monitoring. In the old days with fixed implementation schedules, monitoring was more about availability and reacting when it disappeared. With cloud environments, the focus is more on actual performance and capacity planning. Cloud environments are far more dependent on high-latency networks, such as the Internet, and much more susceptible to capacity challenges, due to their rapid utilization and growth.