AppFirst Rolls Out Cloud Server Management for Top Cloud Providers

Amazon, SoftLayer, GoGrid and Rackspace Integration in New Version

With Rackspace’s recent acquisition of CloudKick, AppFirst, the only provider of complete data collection and proactive monitoring-as-a-service (MaaS) solutions, is now the lone independent cloud server monitoring and management solution on the market. Demonstrating its continued commitment to providing real-time information for cloud management and monitoring, the company will roll out a new version of AppFirst this week with deep integration with top cloud providers Amazon, SoftLayer, GoGrid and Rackspace. Additional providers will be supported in the coming months. This announcement positions AppFirst as the only comprehensive server monitoring and management solution that works with any and all applications, independent of language and no matter where the servers are running: on premises, in the cloud or in any combination.

“We’ve heard loud and clear through our rapid customer adoption that a pure, independent cloud monitoring and management platform built entirely as SaaS is the best model to power the continued growth and success of the cloud computing industry,” said David Roth, president and CEO of AppFirst. “The Rackspace acquisition of CloudKick validates our position: managing and monitoring cloud servers is a critical business component for companies around the world. With this change in the competitive landscape, AppFirst becomes THE only independent vendor of server monitoring and management solutions available.”

The new version is built on the same technology as AppFirst Professional and AppFirst Basic and allows users to quickly add, remove and manage cloud servers directly from the AppFirst interface, which installs collectors on servers automatically. Once the collector begins capturing real-time data, each server can be managed and monitored right from the AppFirst dashboard.

AppFirst captures everything occurring in the entire application, and its Deterministic Root Cause feature lets users drive straight to the root cause of each and every incident, eliminating the hours, days and sometimes weeks it takes to track down the source of an issue. Using patent-pending technology, AppFirst removes the need for statistical methods to isolate the causes of performance degradation. Older monitoring methods leave ample room for the proposed cause to be wrong: they either poll a few components of an application for data or are specific to a language runtime, so users do not see everything across the application stack.

Alerting changes

By customer request, we have made a change to alerting. From now on, you will receive only one email/SMS alert per incident, for every incident type except process down. Process-down alerts remain as they are: you’ll be alerted at whatever interval is specified in the alert. If an incident recurs, you will be alerted again, once.

We appreciate customers giving us feedback, and we take it seriously. Keep it coming!

Commentary on Rackspace acquisition of CloudKick

Rackspace announced today that it has acquired CloudKick. I am already getting calls and emails asking for my thoughts, as folks are naturally curious what this might mean for AppFirst, so I thought a quick post might be a good idea. In short, we think this is great news for AppFirst.

First of all, we want to extend our congratulations to Alex and the entire team at CloudKick, as we think this is a great outcome for them. Given their server-level monitoring offering, which many would estimate covers around 90% of what Rackers look to monitor, and Rackspace’s history with them going back to their launch at UTR 2009, this appears to be a solid fit.

For AppFirst, the announcement came as no surprise, and we find it flattering that Rackspace, along with Amazon, has now copied our free Basic Server offering. AppFirst remains the industry’s only comprehensive cloud monitoring and management solution that works with any and all applications, independent of language, no matter where the servers are running: on premises, in the cloud or hybrid.

On its site and in today’s press release, CloudKick is positioned as a solution for multiple IaaS clouds (all of which are competitors to Rackspace). To the CloudKick users who are not running their servers or cloud instances at Rackspace and who have been tweeting their concerns today, my message is simple: AppFirst is your answer. We will not only provide the server monitoring you were getting, but also a view of your entire application as it runs that nobody else can show you.

We’ve heard loud and clear through our rapid customer adoption that a pure, independent cloud monitoring and management platform built entirely as SaaS is the best model to power the continued growth and success of the cloud computing industry. We made that decision when we founded the company in 2009, and we remain committed to that path. We think the amazing traction we’ve had since launching AppFirst last April is a testament to that vision.

And so to our customers, partners, friends and employees: rest assured that we will continue to be the innovation leader in this space and stay true to our vision. We are committed to every application performing to users’ expectations, and to enabling the technical people responsible for making this happen to be proactive and to deliver this performance at the lowest cost. Only AppFirst provides the 100% visibility required to identify which parts of your application you can run for less. Our customers don’t have to guess or settle for so-called “good enough” tools that provide incomplete data and thus really aren’t “good enough.”

AppFirst begins a trend offering free server monitoring

About a month ago we began giving our server monitoring solution away for free. The whole idea was to demonstrate that server monitoring is critical to businesses and that AppFirst does it best. Obviously the idea hit home, because just recently, Amazon began offering its server monitoring solution for free as well. A coincidence? We think not.

Server monitoring is important – we realize that. But it needs to be done well, not just in the same old way it’s been done for years, which is how companies like Amazon are approaching it. And aside from that, what’s truly critical to a business is deeper, clearer insight into their applications – not just their servers. What application is causing the slowdown? What server is it running on? What is the exact process in that application? AppFirst is the only company that delivers that insight – doesn’t matter the OS, doesn’t matter the application and doesn’t depend on polling – with real-time monitoring and real-time answers.

A RAM-Based Data Architecture

RAM-Based Data Model

Background

Writing to and reading from disk has proven too slow for any reasonable load, given our need to process continuous and concurrent data streams. Our backend architecture therefore uses a RAM-based data model, with distributed memory based on memcached. We have found that placing data in memcached and subsequently writing it to disk asynchronously is a viable solution.
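As an illustration only, the write-behind pattern described above (RAM first, disk later) can be sketched in Python. This is not AppFirst's code: a plain dict stands in for memcached, a local directory stands in for the disk-backed store, and the class WriteBehindStore and its methods are hypothetical names.

```python
import os
import queue
import tempfile
import threading


class WriteBehindStore:
    """Hypothetical write-behind store: a dict stands in for memcached and a
    local directory stands in for the disk-backed store."""

    def __init__(self, directory):
        self.directory = directory
        self.cache = {}                      # RAM copy of every value
        self.lock = threading.Lock()
        self.pending = queue.Queue()         # keys waiting to be written to disk
        self.writer = threading.Thread(target=self._flush_loop, daemon=True)
        self.writer.start()

    def put(self, key, value: bytes):
        # Update RAM and enqueue the key; the caller never waits on disk I/O.
        with self.lock:
            self.cache[key] = value
        self.pending.put(key)

    def get(self, key) -> bytes:
        # Serve from RAM when possible; fall back to disk (e.g. a fresh instance).
        with self.lock:
            if key in self.cache:
                return self.cache[key]
        with open(os.path.join(self.directory, key), "rb") as f:
            return f.read()

    def _flush_loop(self):
        # Background writer: persist queued keys until the None sentinel arrives.
        while True:
            key = self.pending.get()
            try:
                if key is None:
                    return
                with self.lock:
                    value = self.cache[key]
                with open(os.path.join(self.directory, key), "wb") as f:
                    f.write(value)
            finally:
                self.pending.task_done()

    def close(self):
        # Block until every queued write has reached disk, then stop the writer.
        self.pending.put(None)
        self.pending.join()
        self.writer.join()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = WriteBehindStore(d)
        store.put("server-1", b"cpu=42")
        print(store.get("server-1"))         # served from RAM
        store.close()
        with open(os.path.join(d, "server-1"), "rb") as f:
            print(f.read())                  # now persisted on disk
```

Each put returns as soon as RAM is updated; the trade-off is that anything still queued is lost if the process dies before the writer flushes it, which is one reason replication appears among the requirements below.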

Writing data to disk sequentially is much faster than writing a large number of small files in random order. The same is true for reads. Random access to data on disk simply does not scale to large data models. The amount of data collected for a viable server monitor is fairly large, and the application monitoring data is much larger still.
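To illustrate the sequential-write point above, here is a hypothetical append-only log (the AppendLog class is an assumption for illustration, not a description of our storage format): many small records go into one file written front to back, and an in-memory index of offsets still allows each record to be read individually.

```python
import os
import tempfile


class AppendLog:
    """Hypothetical append-only log: small records are appended to one file
    sequentially rather than written as many small files in random order."""

    def __init__(self, path):
        self.index = {}                      # key -> (offset, length), kept in RAM
        self.f = open(path, "ab+")           # append mode: writes always go to the end

    def append(self, key, data: bytes):
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()               # current end of file = start of record
        self.f.write(data)
        self.index[key] = (offset, len(data))

    def read(self, key) -> bytes:
        offset, length = self.index[key]
        self.f.flush()                       # push buffered appends to the file first
        self.f.seek(offset)
        return self.f.read(length)

    def close(self):
        self.f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = AppendLog(os.path.join(d, "metrics.log"))
        for i in range(3):
            log.append(f"sample-{i}", f"cpu={i * 10}".encode())
        print(log.read("sample-2"))          # b'cpu=20'
        log.close()
```

In this sketch the index lives only in RAM; a real design would need to persist or rebuild it after a restart.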

Our requirements for a persistence solution include:

  • Replication – It’s not feasible to do traditional backups when the data is quite large.
  • Fault Tolerance – When a server in the storage cluster goes down we should not lose data or access to the cluster.
  • Dynamic Expansion – We want the ability to add servers to a storage cluster without needing to restart.
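Replication and dynamic expansion are often addressed together with consistent hashing, the scheme behind many memcached client libraries and Cassandra's token ring. The sketch below is a hypothetical illustration, not something we built: each key is assigned to the next `replicas` distinct servers clockwise on a hash ring, so adding a server moves only the keys that now fall to it, and removing a failed server hands its keys to the next servers on the ring. The HashRing class, its parameters and the use of MD5 for ring positions are all assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable 64-bit ring position derived from the first 8 bytes of MD5.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes and N-way replication."""

    def __init__(self, replicas=2, vnodes=100):
        self.replicas = replicas
        self.vnodes = vnodes
        self.ring = []                       # sorted list of (position, server)

    def add_server(self, server):
        # Dynamic expansion: insert the new server's virtual nodes; others stay put.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{server}#{i}"), server))

    def remove_server(self, server):
        # Fault tolerance: drop a failed server; its keys fall to the next nodes.
        self.ring = [(p, s) for p, s in self.ring if s != server]

    def servers_for(self, key):
        """Return up to `replicas` distinct servers responsible for `key`."""
        if not self.ring:
            return []
        start = bisect.bisect(self.ring, (_hash(key),))
        owners = []
        for i in range(len(self.ring)):
            _, server = self.ring[(start + i) % len(self.ring)]
            if server not in owners:
                owners.append(server)
                if len(owners) == self.replicas:
                    break
        return owners
```

With enough virtual nodes per server, keys spread roughly evenly, and a key's replicas always land on distinct servers as long as at least `replicas` servers are on the ring.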

Research Results

We have looked at numerous technologies for a persistence solution. We defined a set of tests meant to roughly model our production data. In simple terms, we write 1,000 files of 100 KB each from 10 threads, read them back and validate the data. This is followed by 100,000 writes of 100 KB files from 100 threads, again with a read-back and validation. We timed only the 1,000-file tests; the 100,000-file test is a basic load validation rather than a timing exercise. Here are the results for the 1,000-file test:

  Technology    Writes          Reads
  1. NFS        115-200 secs    8 secs
  2. MogileFS   125-130 secs    52-55 secs
  3. MongoDB    40-45 secs      40 secs
  4. Lustre     70-75 secs      14 secs
  5. Membase    74-76 secs      50 secs
  6. Cassandra  50-80 secs      50-80 secs

Things We Learned

We built an async write capability using memcached, a queue server, a write process and NFS. We started by looking at distributed file systems as an NFS replacement. We discovered that technologies like Membase and Cassandra combine a RAM-based data model with async writes to a backing store that uses distributed storage. As we gained experience with Membase and Cassandra, we realized that we didn’t necessarily need to build our own RAM storage with an async backing store on a distributed file system.

There is evidence, much of it anecdotal, that NFS has issues under heavy load, and we wanted first-hand experience. We found that NFS does break down: under heavy write load, the client pauses for a few seconds.

The data clearly shows that NFS and Lustre reads are significantly faster than those of any other technology we tested. Because these are kernel-mode file systems, as opposed to user-mode file systems such as those built on FUSE, they benefit from the file buffer cache managed by the kernel, which gives them much better read performance. This also made it clear that our tests were not at all representative of random access: since we read the files back right after writing them, the reads were largely served from the kernel’s buffer cache, which produced the dramatic differences. We decided not to attempt to model realistic random file access.

Among the distributed file systems we tested, it is clear to us that Lustre is far and away the most reliable and scalable. Administering a Lustre deployment is no more difficult than administering the others, and it is far more predictable.

Conclusions

While Lustre is almost certainly the more stable and scalable technology, we have a few concerns about deploying it in production: every client needs kernel modules, and each server in the storage cluster needs not only kernel modules but also a patched kernel. Moreover, we would still need to provide a RAM-based solution with an async backing store in front of Lustre.

We are moving forward to further evaluate Membase and Cassandra. Both have a RAM-based architecture and a viable distributed storage model. We will test replication, failover and dynamic expansion.

Stay tuned, we’ll fill you in on the results and our final decision.

This week at AppFirst: 12/01/10

Features:

  • Maintenance Windows – The alerting page has a new sub tab called “Maintenance Windows” where you can suppress alerts.
  • Clear alert history – The full view of the alert history widget now has a link that lets you remove all your triggered alerts.
  • Refer a Friend – You can now refer a friend to win awesome prizes!
  • Clicking on the Resolve tab now takes you directly to the page rather than to the root cause wizard first. You can still get to the root cause wizard with a new link on the options bar of Resolve.

Bug Fixes:

  • Fix an issue with displaying incorrect thread data in Data Insight.
  • Fix a time issue in the graph on the Server and Service Widgets.
  • Fix the number of servers running displayed in the Application Widget.
  • Fix an error when process names are longer than 64 characters.
  • Fix a problem with named pipe path names on the Windows collector.

Have you noticed a problem that needs fixing, or do you have a great idea for a new feature on AppFirst? Let us know! We rely on your feedback to make a product that works for everyone!