Web-Scale IT in the Enterprise

Enterprise IT organizations are turning to a portfolio of SaaS, PaaS and hybrid cloud services to quickly respond to the business at a lower price point while improving service levels. Yet a new level of transparency and data precision is required to effectively deliver on this promise. Only AppFirst delivers a miss-nothing approach to collecting data across the entire IT ecosystem. With this foundational data set, it is now possible for organizations to achieve:

  1. A steady-state high-definition view of your end-to-end application infrastructure as compared to the still-picture model that was state-of-the-art with legacy technologies.
  2. Significantly lower costs while maintaining business-critical service levels.
  3. A deterministic view of the n-tier application environment, eliminating the need to make probabilistic assumptions to manage the system.
  4. The correct data view to fully leverage automation and predictive analytics technologies.

This brief discusses the key learnings that Claus Moldt, former Global CIO for Salesforce.com, took away from addressing these challenges. The need for a new approach is reiterated by the following Gartner research note on Building a Modern APM Architecture for the World of Web Scale IT.

Read the following report on Web-Scale IT in the Enterprise.

Achieving Web-Scale in the Enterprise

On Tuesday, AppFirst CEO David Roth and Claus Moldt, CEO of m>Path and former Global CIO of Salesforce.com, held a Q&A, “The Modern Data Center: Metrics for Achieving Web-Scale and IT as a Service.”

Claus shared his best practices for achieving business agility through web-scale IT from his tenure at Salesforce.com and eBay.

View the slides below:

Click here to view the webinar on-demand where Claus Moldt walks through how he built out a web-scale framework at Salesforce and eBay.

Top 2013 IT as a Service Resources

As we close the doors on another year, we’d like to share our list of the top resources we’ve found. Unlike last year’s list of DevOps resources, this year we’re bringing you a list of the top IT as a Service resources. They’re the best reads, presentations, and thought leadership pieces from top people and companies in the industry.

Without further ado, here’s what to read to get ready for 2014.

1. The Three Transformations of ITaaS

Thought leader and then EMC Global Marketing CTO Chuck Hollis (now VMware Chief Strategist) breaks down the three transformations of ITaaS: The Infrastructure Transformation, the Operations Transformation, and the Application Transformation.

2. Vision from the Top 2013: David Roth, AppFirst

AppFirst’s CEO and fearless leader David Roth shares his thoughts on the future of Enterprise IT, federated cloud architectures, and the ecosystem needed to support it.

3. IT as a Service and the Future of Private Clouds

Citrix and CIO.com teamed up to publish a white paper on their findings from a recent IDG Research Services survey and the trends they’re noticing in the industry. The result? IT leaders must embrace new architectures to become service providers to their organizations.

4. IT Evolution: Today and Tomorrow

During VMworld last month, VMware released insights from its 2013 Journey to IT as a Service Survey, based on feedback from more than 1,000 CIOs and other IT decision makers. What’d they find out? IT organizations are moving through three stages of transformation: IT Productivity, Business Productivity, and ITaaS.

5. IT as a Service: A Work in Progress

In another survey with IDG Research Services, this time EMC and VMware worked with CIO.com to publish a white paper on how ITaaS is working for IT leaders across the globe. The paper goes into details on their challenges, progress, and payoff.

In addition, here are our top blog posts from this year.

AppFirst’s Top Blog Posts from 2013

1. Best Practices for Managing HBase in a High Write Environment

2. Managing IT Lifecycle with Continuous Collection

3. AppFirst and StatsD

4. Why Application Operations is needed in the Dynamic Data Center and the Cloud

5. Engine Yard Partners with AppFirst for Monitoring and Alerting Solution

6. An Early Look Into Our Newest Log Applications: Log Search and Log Watch

7. AppFirst and Nagios Plugins: The Real Scoop

8. AppFirst Summer Intern Competition 2013

9. Halloween Infographic and Horror Story Winners

10. EMA Radar Report Identifies AppFirst As Value Leader In Advanced Performance Analytics


Share your favorite reads or other pieces of content in the comments and make sure to have a happy New Year!

Halloween Infographic and Horror Story Winners

With Halloween just a few days away, we’re here to give an early scare of our own! First, check out our Monitor Horror Story Infographic: The Haunted House of Application Performance! You’ll find some scary monitoring statistics and the monitoring costume party of IT.

And then, check out the winners of our Horror Story contest. Hopefully these scary tales won’t ruin the true spirit of Halloween!

Application Monitoring Infographic

Make sure a light’s on when you’re reading these horror stories.

First Place Story

When monitoring is out and away, the ghouls are at play
Company: Random Software Company

So I am hired as a Sr. Network engineer for this company. From my first day working there, without any knowledge of the infrastructure, I am asked to fix tons of little gremlins in the network and their systems. From day one I hear complaints on slowness, connection issues, application disconnect issues, VPN issues, and more. At this point there had not been a network person on staff for over a year. They had one person in their IT staff that had no time to do anything other than fix printers and user applications.

During my first week there I started looking over the network, web servers, and back end database servers. At this point it is very clear that they have no monitoring in place at all. No way to tell if servers are overutilized or if databases are functioning properly. They also have no monitoring on any of the network devices. It is my second week in that I am hit hard with the realization that there is just nothing in place yet and they want me to fly in like a superhero and fix all their problems in the blink of an eye.

So my first task is to implement some sort of monitoring to allow me to get some insight into the inner workings of the network and systems in the company. I evaluated a lot of different products that week including AppFirst. Right away I found a ton of problems including the fact that the MPLS network was set up in such a way that all traffic coming in was forced from the entry point from the internet located in the Midwest, over to California to a very small remote office, and then back to the location in the Midwest. This was causing all traffic for the websites to hit our firewall in the Midwest then travel to California and then travel back to the Midwest to hit the web servers for no reason at all. Fixing this issue alone solved all of the application issues.

“…they want me to fly in like a superhero and fix all their problems in the blink of an eye.”

From there, I focused on the web servers and back end servers to see what other problems were occurring. I noticed right away that multiple servers had very little or no hard drive space left, which was causing issues. I also noticed that resources were being overutilized on our database machines. I quickly put an action plan together to resolve those issues as well.

To make a very long story short, I have been at this company for about two months now and have fixed a ton of problems going from no monitoring at all to what we have in place now.

Second Place Story

Server ghouls haunt bulk ingestion
Company: Roshvert

I was supposed to monitor an Ingestion Server that was performing a bulk ingestion through an EC2 instance with around 200 GB of data to be ingested to another server.

Since it was a huge amount of data and the ingestion would take another day to complete, I kept the ingestion going and the logs were performing well. I decided then that I’d log in early tomorrow morning to check the ingestion status. During this time, the log files were supposed to be created automatically through the ingestion and the name of the log file for any particular day should be log_dd-mm-yyyy.txt with date of that day mentioned. It was a staging server and the code was supposed to be supplied for UAT in a day or two.
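As a rough illustration of the naming convention described above, the sketch below builds the expected log file name for a given day and flags a log that is missing, empty, or has gone quiet. The function names, the 30-minute threshold in the usage note, and the idea of checking the file's modification time are assumptions made for illustration; the story does not say how the logs were actually watched.

```python
from datetime import date, datetime, timedelta
from pathlib import Path


def log_name(day: date) -> str:
    """Build the daily log file name in the log_dd-mm-yyyy.txt pattern."""
    return f"log_{day.strftime('%d-%m-%Y')}.txt"


def is_stalled(log_path: Path, now: datetime, max_quiet: timedelta) -> bool:
    """Return True when the log is missing, empty, or not written recently.

    Hypothetical helper: the quiet-period rule is an assumption, not
    something the original setup is known to have used.
    """
    if not log_path.exists():
        return True
    stat = log_path.stat()
    if stat.st_size == 0:
        return True
    last_write = datetime.fromtimestamp(stat.st_mtime)
    return now - last_write > max_quiet
```

A scheduled job could call is_stalled(Path(log_name(date.today())), datetime.now(), timedelta(minutes=30)) and raise an alert when it returns True, instead of waiting for someone to log in the next morning.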

I logged in early the next morning to check the ingestion status. I was totally puzzled as I couldn’t make out what was happening:

  • The log file for the previous day log_27-08-2013.txt was showing everything went well until 11pm midnight and no logs thereafter.
  • The log file for today log_28-08-2013.txt got created with no data in it.
  • The ingestion process was running with no errors.
  • The server logs showed no errors.
  • The system never went down.
  • Nearly 150 GB of data was still to be ingested and was not progressing at all.
  • None of the logs showed any updates as to why the ingestion was not progressing.

Since the delivery was urgent, I stopped the ingestion on the instance and restarted it. To my horror, the ingestion was not progressing at all. I tried running ingestion on other instances, and it worked fine.

Then something hit me, and I went back to check the logs of ingestion. The ingestion logs still showed nothing with 0 kb space used by the logs. Wait!!! Space? 0kb? 150 GB data still remaining?

I immediately checked the disk space and found zero space available. Whoaa!!!

What actually happened is that while performing the ingestion, the server created a duplicate copy of the data on the same instance, and this data remains there until the entire ingestion completes. Around 250 GB of disk space was used by ingestion by midnight and the disk was full. I immediately attached a bigger volume to the instance and restarted the ingestion. Thankfully it was complete in a few hours and that saved me from big trouble!!!
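A pre-flight check along the following lines might have caught the problem before the disk filled up. This is only a minimal sketch: the function names, the single extra copy, and the 10% safety margin are assumptions for illustration, and it uses Python's standard shutil.disk_usage rather than anything specific to the ingestion tool in the story.

```python
import shutil

GB = 1024 ** 3


def required_bytes(data_bytes: int, extra_copies: int = 1, margin: float = 0.10) -> int:
    """Space needed for the temporary duplicate copies plus a safety margin."""
    return int(data_bytes * extra_copies * (1 + margin))


def has_headroom(path: str, data_bytes: int, extra_copies: int = 1,
                 margin: float = 0.10) -> bool:
    """Return True if the volume holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes(data_bytes, extra_copies, margin)
```

Calling has_headroom("/", 200 * GB) before starting a 200 GB ingestion would report whether the volume can hold the duplicate copy; running the same check periodically during the job would show the space shrinking long before it reaches zero.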

Third Place Story

The unknown ghoul is still on the loose
Company: Overno

We’re a Nagios shop and every now and then we get alerts for machines that simply don’t exist. We’ve paid people to check our configurations and no one can find anything wrong.

The latest server ghost appeared a few nights ago and every admin was paged for a machine that was down. We have a relatively new admin who flipped out because he thought we added a new server that he couldn’t reach. He got the datacenter on the phone immediately to have them look into it — needless to say the technician thought we were out of our minds for making them look into a server that never existed. We checked the other machines, thinking that wires had crossed somewhere and they were all fine.

So we still don’t have any idea what even triggered Nagios to alert everyone.

Have any scary stories you want to share before the holiday? Let us know in the comments!

Watch the OpenStack/AWS API Debate from our last OpenStack NYC Meetup

As you may (or may not) know, AppFirst is the head organizer for the OpenStack NYC Meetup. Along with the rest of the organizers, we’ve thrown together some superb Meetups recently. Back on September 12, we hosted another installment of the popular OpenStack/AWS API debate with Randy Bias (Cloudscaling), Nati Shalom (Gigaspaces), and Alex Freedland (Mirantis). Moderating the discussion was Dave McCrory (Warner Music Group).

Lucky for us (and you), our good friends at G33ktalk recorded the Meetup and made the video available for us. You can watch the debate here. It’s some pretty great stuff, not gonna lie.

And make sure to join us for our next Meetup on October 8! Rob Hirschfeld, Sr Distinguished Cloud Architect and community-elected OpenStack Foundation board member, will be discussing the OpenStack Core definition.

Make Sure Application Performance Is Not Hurting Your Business

In another example from a sports-heavy weekend, we take a look into the impact of performance in the life of fantasy sports.

For all who play fantasy sports, Sunday is the most important day of the week. For baseball, it’s the last day to hold off an impending comeback or make a push for that win (especially now, during the Fantasy Baseball playoffs). And for football, when your eyes are not locked on TV screens or wings and beer, they’re laser-focused on your own personal Sunday rosters.

So when Yahoo Fantasy Sports, the leading fantasy sports platform (full disclosure: I’m a big Yahoo Fantasy Sports user) goes down on the most important day of the week, you better make sure you resolve that issue FAST.

Yahoo Fantasy Sports was down Sunday for about 15 minutes according to my experience. That could easily make or break your week.

Here’s what Yahoo’s users had to say while they were down:

The first rush of tweets:

As you can see in the top tweet, sentiments of leaving Yahoo for a competitor have already started.

Then more:

You can see a retweet of Yahoo’s status in that stream, but the longer you’re down, the worse it gets. Next up on the bad tweets for Yahoo comes a tweet for an infographic about the login times and outages for Yahoo Fantasy Sports and their competitors.

And by the time Yahoo is back, this user nails the point home. It really is #toolittletoolate.

Performance is everything nowadays. Because there is so much choice in the marketplace, vendor lock-in isn’t as prevalent as it once was. You need to delight your users by giving them an amazing experience. Because if your website or application falters, your competitors are only one click away.

Why Application Operations is needed in the Dynamic Data Center and the Cloud

Massive changes are occurring to how applications are built and how they are deployed and run. The benefits of these changes are:

  1. Dramatically increased responsiveness to the business (business agility)
  2. Increased operational flexibility, and
  3. Reduced operating costs.

The environments onto which these applications are deployed are also undergoing a fundamental change. Virtualized environments offer increased operational agility, which translates into a more responsive IT Operations organization. Cloud Computing offers application owners a complete outsourced alternative to internal data center execution environments. IT organizations are in turn responding to public cloud with IT as a Service initiatives.

Taken together, these changes replace a monolithic, dedicated application environment that did not change very quickly with a distributed, shared, and rapidly changing environment that creates new application performance management challenges.

This paper lists and addresses those challenges for IT Operations.

First generation APM solutions were built around a set of assumptions that are in many cases no longer true today.

  1. These solutions assumed that the application was going to get built or bought, then run inside the firewalls of the enterprise data center.
  2. They assumed that applications were going to get built in Java or .NET, which were for a while the dominant development environments used by developers.
  3. They assumed that the average application would only get enhanced once or at most twice a year.
  4. Finally, many first generation APM solutions completely ignore the fact that in most enterprises, 80% of the applications are purchased commercial applications and not custom developed by the enterprise itself.

In fact all of the above assumptions are being invalidated in modern enterprise environments. The dynamics listed below are combining to create an entirely new and different enterprise computing environment which must be addressed by entirely new application performance management tools. This new enterprise computing environment is depicted in the diagram below, and described in detail in the following white paper.

Modern enterprise computing environment

Download this white paper to read more about the new enterprise application operations environment and the criteria for evaluating a virtualization and cloud aware APM solution.

The Need for Application Operations in the Dynamic Data Center and the Cloud, by Bernd Harzog, CEO of APM Experts.

Lessons from Distill

Last week was Engine Yard’s Distill 2013 Cloud Conference. It was two days of Keynotes, breakout sessions and food with some tremendous players in Business and Development. From Nolan Bushnell, founder of Atari, to Richard Rodger, the COO of NearForm, there were a lot of great keynotes and sessions on development, a vision for the Cloud, and having a bit of fun as geeks.

What I found interesting was a common thread among sessions for modular, fast paced development, whether it was Fred George’s talk on Anarchy, Richard Rodger’s talk on Rapid Development or Richard Watson on Cloud-Aware Applications.

The cloud is a new and drastically different way to see your business as an infrastructure. We have done away with our own data centers and hardware in pursuit of flexibility, elasticity and scalability. It isn’t just about hardware. The startup culture has thrived in this new dynamic – companies like Netflix, who were born in the cloud, have used it as a way to redefine operations from “mean-time between failure” to “mean-time to recovery.” They were raised with unstable resources and instead of lamenting the failure of the cloud, they rewrote how they approached infrastructure and in turn changed the meaning of software development.
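To see why the shift from “mean-time between failure” to “mean-time to recovery” matters, it helps to look at the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The short sketch below applies it; the function name and the hour figures in the usage note are made up for illustration and are not numbers from Netflix or from any talk at Distill.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative numbers only: a system that fails rarely but recovers slowly
# versus one that fails ten times as often but recovers in well under a minute.
rare_failures_slow_recovery = availability(mtbf_hours=1000, mttr_hours=1)
frequent_failures_fast_recovery = availability(mtbf_hours=100, mttr_hours=0.01)
```

With these made-up figures, the second system comes out ahead even though it fails ten times as often, which is the arithmetic behind designing for quick recovery on unstable resources rather than trying to prevent every failure.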

Infrastructure gave way to IaaS and PaaS. And with all the criticism, it is likely to stay. This change isn’t alone. Startups have found ways to create a cloud of their development teams. In many of the conversations I had, the developers, the company – they were people spread to the wind. They weren’t commuting into one office, they were national or international and relied on collaborative tools to help build those interoffice bonds. The only thing the developer needs is a computer and internet access and the startup is moments from giving birth to a new product. Google sees innovation as a scientific measure of interactions. Even the CEO of Yahoo has pulled back its remote workforce. But what I saw at Distill was successful startups with people spread from Japan to the Netherlands, Australia to South Dakota. Is this a fundamental shift? I wouldn’t be able to say, but it may come across our screens some day as DaaS – Development as a Service.

I’ve mentioned IaaS and maybe someday DaaS, but a lot of the talk in sessions and during cocktails was about development itself. What is the best way to deploy onto the cloud, what is the best way to write reusable code, how do we respond rapidly to business change? The answer seemed to be micro-services. By creating small individual pieces, programs that did one thing very well – we reduce overhead and increase scalability. We give meaning to elasticity. By developing hundreds to thousands of small programs, removing deep levels of connascence (thank you Jim Weirich), and leveraging developer-driven re-examination and re-factoring, we create responsive, elastic cloud applications.

Ultimately I came away from Distill with two things: 1 – Distill was a great name for summarizing the state of change in development. 2 – The cloud is much more than a collection of servers you can rent when you need. Breaking down infrastructure, software stacks and developers, we have created an environment where not only is the application responsive and dynamic, but so is the infrastructure that supports it.