The Value of Context in Web Scale IT

Frequently in the world of monitoring we hear about the miraculous solutions provided by a single agent: how every problem can be solved with transaction tracing, user experience monitoring, and an understanding of how the performance of the software pipeline applies to the business.

All of these are great starting points, but today I’ll go through the case where the software pipeline is affected by the environment in which it lives, and application monitoring only shows the symptoms.

For example, my dashboard this morning:

AppFirst Dashboard Frontend Response Time Spike

The specific item in question, “Frontend Response Time,” has a high value for a period this morning.

Frontend response time spike

And as I look into this value, it becomes evident that this is being caused by something external to the software stack.

As a quick background, the eCommerce solution I’m running (Magento) uses files to keep track of each user as they access the site. This is a normal activity that isn’t excessively risky. However, it creates thousands of session files, which quickly eat up all available inode space if no action is taken.

Session files eat up all available inode space if no action is taken
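As a rough way to watch for this condition, the sketch below reads the inode counts of a filesystem with Python’s os.statvfs (available on Unix-like systems). The function name inode_usage and the path "/" are illustrative choices, not part of AppFirst or Magento.

```python
import os


def inode_usage(path="/"):
    """Return (used, total, percent_used) inode counts for the filesystem at `path`."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes on the filesystem
    free = st.f_ffree    # free inodes
    if total == 0:
        # Some filesystems do not report a fixed inode count.
        return 0, 0, 0.0
    used = total - free
    return used, total, 100.0 * used / total


if __name__ == "__main__":
    used, total, percent = inode_usage("/")
    print(f"inodes used: {used} of {total} ({percent:.1f}%)")
```

A percentage that climbs toward 100 while disk space is still available is the pattern described above: many small files exhausting inodes before they exhaust blocks.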

Now, as part of normal maintenance, removing these excess files is easy enough. However, a simple ‘rm -f’ command is insufficient: there are too many items for a single command line. This is easily remedied with xargs:

Removing excessive files with xargs
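The exact command from the original screenshot is not reproduced here. In a shell, this kind of cleanup is commonly written along the lines of `find <session dir> -type f -print0 | xargs -0 rm -f`, which hands the file names to rm in batches. The Python sketch below applies the same batching idea. The "sess_" prefix (PHP’s default for file-based sessions), the batch size, and the function name are assumptions for illustration.

```python
import os
from itertools import islice


def remove_session_files(session_dir, prefix="sess_", batch_size=1000):
    """Delete regular files in `session_dir` whose names start with `prefix`.

    Files are collected and removed one batch at a time, so the complete
    list of names is never built up in one place.
    """
    removed = 0
    while True:
        # Re-scan for each batch so nothing is deleted while the
        # directory is still being iterated.
        with os.scandir(session_dir) as entries:
            matching = (
                entry.path
                for entry in entries
                if entry.name.startswith(prefix)
                and entry.is_file(follow_symlinks=False)
            )
            batch = list(islice(matching, batch_size))
        if not batch:
            return removed
        for path in batch:
            try:
                os.unlink(path)
                removed += 1
            except FileNotFoundError:
                # Another process already removed this file.
                pass
```

Calling remove_session_files on the session directory (for example a hypothetical /path/to/magento/var/session) returns the number of files removed. As with xargs, a run over hundreds of thousands of files still takes time and generates disk activity.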

However, execution like this takes time, and we can see that time in the data we’ve collected from the system, graphed in-line with the web stack performance data.

System and web stack performance graph

The blue line is the eCommerce response time, the purple line is the system disk utilization, and the orange line is the number of files being accessed by xargs. You can see the clear correlation between the file removal and the response time, as well as the reduction in storage space. As soon as the files have been removed, the response time recovers.

If you’d like to stop treating symptoms and see how the environmental context affects your business, sign up for a free 30 day trial at http://www.appfirst.com/signup/.

It’s Here: The New AppFirst UI

We’ve been working very hard in recent weeks to prepare for the release of our updated user interface here at AppFirst. We are excited about all the new features and over the next few months we will add even more functionality and flexibility to our product.

I would like to share some of the new features available with the new user interface. There are many changes and improvements that have been made throughout the UI and we’ll touch on some of them in this blog post.

For starters, we have incorporated responsive design into many of our views and are moving toward a full HTML5 implementation. With the plethora of devices on the market today, we wanted a better experience for our customers across all devices. Additionally, we have improved several of the workflows to make them more intuitive and easier to work with.

So what are some of the new features? Here are some of the highlights:

Autodetect

Autodetect is a great new feature for our users. AppFirst has always supported some level of autodetection; however, the new Autodetect is much more integrated into the user interface, with “opt in” and “opt out” options at every step. For every software component we find, Autodetect can find and configure alerts, log sources, polled data items, process groups, and dashboards.

First-time users will use Autodetect when they first install a collector. However, Autodetect is also available through Admin so users can run it at any time on any number of servers they choose. We are really excited about this new feature and we think our customers will be as well.

Dashboards

Customers have always loved our dashboards. Now we’ve taken them to the next level with new features customers have been asking for: the ability to duplicate dashboards, copy widgets from one dashboard to another, copy widgets within a dashboard, and a new feature called TV Mode. TV Mode allows customers to display their favorite dashboards on a large centralized TV within their office or network operations center. Because some computer monitors are on the small side, TV Mode allows you to display more widgets than will fit in the normal view. Simply choose how many widgets to view and how often to rotate views, and TV Mode will rotate through all your widgets in an “ad rotator” fashion.

Dashboard Widget Market

Our new Widget Market is where users will find all of the widgets that can be added to dashboards. Widgets provide small templated views of specific metrics, whether it’s system metrics, log metrics, process metrics, or any other data source in AppFirst.

Many of our widgets now support an extra level of functionality as well.

  • The ability to drill down with Correlate in context of the data from that widget.
  • Using the Polled Data Metrics widget, you can create a more detailed view of specific application components, such as Apache, IIS, Oracle, and so on. We call this the “Widget Detail” view.

The market is still in its early stages and we plan to add many more widgets over the coming weeks. Stay tuned for updates!

Servers

The Servers view has also been redesigned to quickly and easily provide more information about all your servers, VMs, or cloud instances. Like many of the views in the new user interface, there are more options for searching, filtering, and paging large numbers of servers. Multiple metrics are shown for multiple servers and users can toggle between a list-view and a grid-view. Clicking on a single server displays a wide range of details in a tabbed view format.

Consolidated Alerts

Alert History and Alert Status have been combined into a single new Alerts view, establishing a logical workflow to investigate alerts. Use the visual bar chart to easily identify the volume and criticality of alerts over time. You can then dive into the time interval you want to investigate and view the alerts that triggered at that point in time. We included a Resolve button so you can clear alerts of their critical or warning status, indicating to your team that things are now OK.

Administration

Virtually everything in Admin has been reorganized or redesigned. This new design offers a streamlined way for users to find and configure Admin options, including adding new users and collectors, organizing servers and process groups, and accessing the new Autodetect feature. Users will also like the new Setup area for configuring alerts, logs, devices, and polled data items.

First Time User Experience

New users will now be guided through a much more complete experience when installing their first collector. Collector options have been expanded from Linux and Windows to include other operating systems like Solaris, FreeBSD, and AIX. Additionally, we’ve included links to easily access our integrations with configuration management tools like Chef and Puppet.

After installing the first collector, users will head to Autodetect where alerts, logs, polled data items, process groups, and even dashboards can be configured automatically. Now you can spend less time considering how and what data to collect in your environment.

 

So that’s a quick tour of some of the new changes and some insights into the usability improvements. We are excited about all the new changes and are always interested in hearing from you about what you like and what you think we could improve upon. Let us know your thoughts!

*Note – current customers will have the ability to return to the classic interface for the next several weeks to help with the transition.

Now in Beta: AppFirst’s New UI

Today AppFirst is announcing the release of our new user interface into public beta. In this beta, the goal was to bring visibility of your applications and infrastructure to the forefront, allowing you to:

  • Redefine how you approach troubleshooting
  • Integrate and visualize other data sets for complete visibility
  • Group all application components to increase transparency and accountability for application-specific service levels

Therefore, we’ve made some big improvements in the UI functionality in addition to new feature releases.

Dashboard Widget Market

We’ve implemented a Widget Market to streamline the creation of Dashboards. In addition to the enhanced UI, we’ve developed three new Dashboard widgets, allowing you to better visualize your key metrics.

Dashboard TV Mode

We’ve developed a Dashboard TV Mode designed for display on large monitors for your whole team. See it in action below.

Admin Market

All of our partner integrations can now be configured through our new Admin Market. Easily connect your other tools into AppFirst for a complete visualization into every component in your environment. Supported tools right now include CloudWatch, AppDynamics, New Relic, and PagerDuty. Many more to come.

Process Groups

Applications have been renamed to Process Groups in AppFirst. With an additional 5 Community Templates added, you can group more of your application components with preconfigured templates. For example, use our Apache Community Template to group all Apache processes across any number of servers (could be one server, could be one million). Once set up, that Apache Process Group will automatically detect and collect those processes that match the Template rules (httpd and apache). You no longer have to worry about telling your agents what to look for. AppFirst does it automatically.
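To make the idea of template rules concrete, here is a minimal sketch of grouping processes across servers by name. It assumes a rule is a case-insensitive substring match on the process name; how AppFirst actually evaluates template rules is not described in this post, and the Process type and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    server: str
    pid: int
    name: str


def match_process_group(processes, name_rules):
    """Return the processes whose name contains any of the rule strings.

    Substring matching is an assumption made for this sketch.
    """
    rules = tuple(rule.lower() for rule in name_rules)
    return [
        proc for proc in processes
        if any(rule in proc.name.lower() for rule in rules)
    ]


if __name__ == "__main__":
    # Hypothetical process list spanning several servers.
    running = [
        Process("web01", 101, "httpd"),
        Process("web02", 202, "apache2"),
        Process("db01", 303, "mysqld"),
    ]
    for proc in match_process_group(running, ["httpd", "apache"]):
        print(proc.server, proc.pid, proc.name)
```

Because the match runs against whatever processes are currently reported, a newly started process that satisfies a rule joins the group without any per-server configuration.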

Updated Navigation through AppFirst

You’ll notice that we changed the horizontal navigation to a vertical navigation, giving you more real estate to view your data. You’ll also notice we’ve implemented breadcrumb navigation in deeper parts of AppFirst, making it easier to find your way around.

Try it today by clicking the Beta link at the top of our product or by clicking here. The future user interface of AppFirst is in your hands (literally), so let us know what you think. We review all feedback and consider it for future iterations. And stay tuned for more in-depth posts on our new UI in the coming weeks.

Big Data Growing Pains with HBase

Over the last week there has been a lot going on at AppFirst. With the data delays on the console, many of our users have been feeling it as well.

We’ve built a robust pipeline to handle the data from thousands of servers streaming data into our backend. We’ve learned a tremendous amount about the limitations of various software including Nginx, RabbitMQ, Redis and now, HBase.

HBase is the core for the large volume data store for our public SaaS offering, and over the last month we’ve been reaching some disk usage limits that have required frequent maintenance and capacity management. This is nothing new in the operations world, but what we’ve learned is HBase doesn’t take as nicely to server removals as we believed.

During the last week we’ve been upgrading capacity and performing data maintenance with more regularity. Adding storage should be transparent to HBase. However, in the process of taking a node down, adding disks, and rejoining the cluster, the change causes enough of an event to back up our queues.

We are actively working to find new ways to improve this flow and minimize the impact to our customers. It is important to note that while there was a delay in data processing to our backend, there was no data loss – our queueing systems managed this condition as expected. Once HBase was back online, it handled data storage very well.

Additionally, over the weekend, we found that our implementation of ZooKeeper was performing a significant amount of disk read/write activity. Because ZooKeeper is an in-memory distributed coordination and synchronization tool, heavy disk I/O is a serious operational concern.

When examining the source of this disk I/O, we found that our ZooKeeper was configured to use 9 nodes. Apache recommends that ZooKeeper run in a 3, 5, or 7 node configuration. As you can see from the graph, once we updated the configuration to run on a 5-node setup, disk I/O fell dramatically.
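The post does not say why the smaller ensemble reduced disk I/O. One relevant piece of background is that ZooKeeper commits a write only after a majority of voting servers acknowledge it, and each voting server persists every transaction to its log on disk, so a larger ensemble means more servers doing that work for each write. The sketch below only works through the majority arithmetic; the function names are illustrative.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Servers that can fail while a majority remains available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for nodes in (3, 5, 7, 9):
        print(
            f"{nodes} nodes: quorum of {quorum_size(nodes)}, "
            f"tolerates {tolerated_failures(nodes)} failures"
        )
```

Moving from 9 nodes to 5 lowers the quorum from 5 acknowledgements to 3 and leaves four fewer servers logging each transaction, at the cost of tolerating two failures instead of four.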

We are aware of the impact the configuration and maintenance has on our users and infrastructure. We have been developing a new storage model that improves data maintenance capability and we continue to analyze the impact of system configurations. These changes will result in a much smoother experience.

Achieving Web-Scale in the Enterprise

On Tuesday, AppFirst CEO David Roth and Claus Moldt, CEO of m>Path and former Global CIO of Salesforce.com, held a Q&A, “The Modern Data Center: Metrics for Achieving Web-Scale and IT as a Service.”

Claus shared his best practices for achieving business agility through web-scale IT from his tenure at Salesforce.com and eBay.

View the slides below:

Click here to view the webinar on-demand where Claus Moldt walks through how he built out a web-scale framework at Salesforce and eBay.

Join us on January 9 for a Live Webinar

We will be hosting a live webinar on Thursday, January 9 at 2:00PM EST titled Complete Visibility Across Your IT Ecosystem. In this webinar, systems engineer Michael Forhan will walk through the AppFirst product and explain how you can use our patented Miss Nothing data collection for continuous and complete visibility into your application’s topology and resource utilization throughout your entire IT ecosystem.

Once equipped with the right tools and knowledge, you’ll be able to:

  • Solve issues before users notice
  • Optimize costs through metric-driven capacity planning
  • Achieve IT as a Service

Register now, invite your friends, and step up your IT game with AppFirst.

Will the Infrastructure Move Live Up to the Promise? Potentially. But First, Know Your Application.

While talking to customers, we hear stories of virtualizing applications or moving them to the cloud with the promise of better performance and/or reduced cost. All too often, that promise isn’t met, and users, both internal and external, complain that the application isn’t performing like it used to. A successful move to virtualization or the cloud starts with understanding the resources your application needs to perform its best.

We define an application as the full application system. This means all processes that make up the application, the infrastructure the application relies on, and the networks that the application utilizes. By collecting data around the full application system, you’re able to get a true application footprint.

Why is this beneficial?

Because there’s only one thing worse than investing time and money in virtualization or the cloud only to find out that it doesn’t work the way it was promised: Hearing about it from your users.

Your application’s system has many moving parts. Applications are no longer static programs that have to be updated only once a year, and the infrastructure that your application runs on affects how it performs.

  • If you’re running in a public cloud, there’s a good chance you’re sharing resources with other applications.
  • If you’re moving to a virtual environment, you need to make sure what you’re virtualizing still plays nice with the rest of your IT environment.
  • If you’re an agile shop and making weekly pushes (if not quicker), you need to know how your application continues to evolve and what your dynamic application requires.

Now more than ever it’s important to understand what resources your application needs to perform at its best.

Sizing your Application

Knowing how much of a resource hog your application is (application footprint) gives you the data you need to size your application. Sizing your application is important for a few reasons:

  • Migrations: Whether you’re moving to the cloud or a virtualized environment, it’s necessary to understand what your applications need before you make the move. Sure they perform well on your local servers, but what’s going to happen when they’re sharing resources with stranger applications? What happens when they’re running on a VM and have different resource requirements?
     
    When doing a migration, understanding what your application needs is specific to you. All applications are not created equal and all applications are unique. See below for an example of how our Apache application performs from one week to the next.

  • Fixing Failed Migrations: On the other side of the coin, you could have recently moved to a cloud or virtual environment with the promise of better application performance. But now that you’ve invested in this new infrastructure, your application is performing worse than it was before and you’re hearing it from your users. The question to ask yourself then is, “Did we right-size our application?” or “Were we prepared to make the investment in this migration?”
     
    Knowing your application footprint is key in enabling you to size your application for the resources it needs to perform. Understanding the behavior of your applications will make a migration much smoother and lead to fewer headaches and fewer sleepless nights.

  • Cost Efficiency: Knowing what resources your applications need allows you to properly size them for whatever infrastructure you use (or move to). Graph your application over time and drill down into all the running processes to understand which processes are the culprits of performance degradation. Not only can you see how the application performs over time, but you can also see how to improve it by addressing poorly running processes.

How does this work in AppFirst?

We define applications by grouping any number of processes across any number of servers. This means, for example, that your “Apache” application would include all your running httpd processes. And if a new process spun up that matched the criteria for your Apache application, it would be automatically added so you can have a continuous view of your application’s performance.

AppFirst Process List

Graphing your applications using our Correlate tool allows you not just to visualize the performance of your apps, but also to drill down to the process level and understand which processes on which servers are the culprits of performance degradation. We believe it’s important to build a solution in which both the application performance data and the data about what caused the application performance degradation are in the same place. That way you have all the information at your fingertips: current data and historical data.

AppFirst Correlation

Business Critical Apps in Cloud or Virtual Environments

Best said in Bernd Harzog’s white paper on “The Need for Application Operations in the Dynamic Data Center and the Cloud:”

It is essential that organizations ensure the performance of these applications. The teams that own the virtual infrastructure will not be allowed to virtualize business critical and performance critical applications unless the performance (response time) of these applications can be assured. Attempting to infer the performance of applications that run in virtualized and cloud based environments will not work. This is why a solution that directly measures the response time of the application is needed.

This is especially true for any application that runs in a public cloud, as the cloud vendor cannot or will not provide infrastructure performance metrics that prove the quality and speed of the infrastructure. Therefore, both virtualized and cloud based applications will require modern, agile, self-configuring solutions that measure response time end-to-end across distributed environments that (in the case of clouds) span organizational boundaries.

Here are the capabilities and benefits that AppFirst can provide for these business critical applications. To read more, check out the full white paper by Bernd Harzog.

Top 2013 IT as a Service Resources

As we close the doors on another year, we’d like to share our list of the top resources we’ve found. Last year’s list covered DevOps resources; this year, we’re bringing you the top IT as a Service resources. They’re the best reads, presentations, and thought leadership pieces from top people and companies in the industry.

Without further ado, here’s what to read to get ready for 2014.

1. The Three Transformations of ITaaS

Thought leader and then EMC Global Marketing CTO Chuck Hollis (now VMware Chief Strategist) breaks down the three transformations of ITaaS: The Infrastructure Transformation, the Operations Transformation, and the Application Transformation.

2. Vision from the Top 2013: David Roth, AppFirst

AppFirst’s CEO and fearless leader David Roth shares his thoughts on the future of Enterprise IT, federated cloud architectures, and the ecosystem needed to support it.

3. IT as a Service and the Future of Private Clouds

Citrix and CIO.com teamed up to publish a white paper on their findings from a recent IDG Research Services survey and the trends they’re noticing in the industry. The result? IT leaders must embrace new architectures to become service providers to their organizations.

4. IT Evolution: Today and Tomorrow

During VMworld last month, VMware released insights from its 2013 Journey to IT as a Service Survey, based on feedback from more than 1,000 CIOs and other IT decision makers. What’d they find out? IT organizations are moving through three stages of transformation: IT Productivity, Business Productivity, and ITaaS.

5. IT as a Service: A Work in Progress

In another survey with IDG Research Services, this time EMC and VMware worked with CIO.com to publish a white paper on how ITaaS is working for IT leaders across the globe. The paper goes into details on their challenges, progress, and payoff.

In addition, here are our top blog posts from this year.


AppFirst’s Top Blog Posts from 2013

1. Best Practices for Managing HBase in a High Write Environment

2. Managing IT Lifecycle with Continuous Collection

3. AppFirst and StatsD

4. Why Application Operations is needed in the Dynamic Data Center and the Cloud

5. Engine Yard Partners with AppFirst for Monitoring and Alerting Solution

6. An Early Look Into Our Newest Log Applications: Log Search and Log Watch

7. AppFirst and Nagios Plugins: The Real Scoop

8. AppFirst Summer Intern Competition 2013

9. Halloween Infographic and Horror Story Winners

10. EMA Radar Report Identifies AppFirst As Value Leader In Advanced Performance Analytics

 

Share your favorite reads or other pieces of content in the comments and make sure to have a happy New Year!