Another great time in Vegas for AWS re:Invent! In addition to great conversations, great speakers, and our new CloudWatch integration, our Director of Solutions Architecture, Bob Fox, was interviewed by Tim Crawford, CIO Strategic Advisor for AVOA, about how CIOs are taking advantage of the AppFirst platform.
- We’re providing CIOs the ability to understand the real costs of applications, whether they’re running locally or in the cloud.
- We’re providing their development teams with access to see how their code is running in production.
- We’re providing their support teams with visibility into what’s going on, helping reduce mean time to investigate and repair.
The decision to open our new log applications to the public was not one we took lightly. Giving our customers the ability to search all of their log files for any keyword is quite taxing on our system, so we had to take several precautions. To ensure the reliability of our entire architecture, we decided to create a separate web server solely responsible for retrieving log data from our persistent store, HBase. By making this an isolated subsystem, we don’t run the risk of a potentially large query bogging everything else down as well.
Since this new web server has only one purpose, we knew we wanted it to be lightweight and performant. After much research, we ultimately decided on a Sinatra, Puma, and Nginx combination running on JRuby. Puma is a relatively new Ruby/Rack web server that excels at handling many concurrent requests, and several published benchmarks show it consuming less memory than comparable setups with Unicorn or Passenger. On our low-end development machine, we found that our log application web server was able to serve 50 requests per minute for queries to HBase of approximately 3-4 MB each. As our log search and log watch applications see more usage, we can replicate this web server and place a load balancer in front of it to handle the increased activity.
We’ve also taken several approaches to make the web server resilient against possible attacks and misuse. Because we allow users to search for any keyword, they could potentially make enormous queries against our system. Imagine a user searches for “1” across multiple log files: since virtually every log message carries a date/time stamp, this would return nearly every message in the user-defined time span. That kind of query is counterproductive to what a search function should do, so to guard against it, we’ve set generous but firm data limits on our users.
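As a hypothetical sketch of that kind of guardrail (the limit value and names here are illustrative, not our actual quotas), a hard cap on returned messages might look like this:

```ruby
# Illustrative only: cap how many log messages a single search may
# return, so a pathological query (like a search for "1") is truncated
# instead of dragging an entire time span's worth of data back.
MAX_RESULTS = 10_000 # hypothetical limit, not our real quota

def limited_search(messages, keyword, limit = MAX_RESULTS)
  hits = []
  truncated = false
  messages.each do |msg|
    if hits.size >= limit
      truncated = true # tell the UI the result set was cut short
      break
    end
    hits << msg if msg.include?(keyword)
  end
  { hits: hits, truncated: truncated }
end
```

The `truncated` flag lets the UI tell the user their query was cut short and suggest a narrower keyword or time span.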
We’ve also implemented a chunking feature to improve the responsiveness of the log search application. If the user makes a large query spanning multiple hours or days, the web application breaks the query up into smaller, manageable pieces. The UI displays the most recent data first and then backfills with older data.
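The heart of that chunking logic can be sketched in Ruby like this (the 15-minute window size and the method name are illustrative, not our production values):

```ruby
CHUNK_SECONDS = 15 * 60 # hypothetical window size

# Split [start_ts, end_ts] into fixed-size windows, most recent first,
# so the UI can render fresh results while older windows backfill.
def chunk_time_range(start_ts, end_ts, chunk = CHUNK_SECONDS)
  chunks = []
  hi = end_ts
  while hi > start_ts
    lo = [hi - chunk, start_ts].max
    chunks << [lo, hi]
    hi = lo
  end
  chunks
end

# A one-hour query becomes four 15-minute windows:
chunk_time_range(0, 3600)
# => [[2700, 3600], [1800, 2700], [900, 1800], [0, 900]]
```

Each window becomes its own small HBase query, so no single request holds the web server for long.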
It’s difficult to predict how this feature will be used in production, but it’s always important to prepare for the worst. Stay tuned to see how our preparations have panned out.
Two is better than one!
Managing all your performance alerts is now easier than ever. AppFirst now integrates with PagerDuty, providing the most complete operational data set and alert management solution available for operations teams.
AppFirst allows organizations to fully understand the resources required by an application and the manner in which it interacts with other applications, users and systems. By integrating with PagerDuty, these two important solutions now deliver a single, proactive alert management solution to get ahead of issues before end users notice.
With the AppFirst + PagerDuty Integration, you can now rest assured knowing any alerts you set will trigger and alert the right person (even if they’re fast asleep).
Setting up the AppFirst–PagerDuty integration can be done in three easy steps:
- Navigate to our Partners Page (Administration – Partners)
- Select PagerDuty from the drop-down menu
- Add your PagerDuty Account ID, API Key, and give it an Account Name.
Done! (not really a step)
- Note: Your Account ID is the first part of your PagerDuty subdomain, e.g. the CompanyName in CompanyName.PagerDuty.com
To add or edit an alert to include PagerDuty alert details, head to Administration – Alerts. When configuring the alert, select the Users you want to receive the alert, as well as the PagerDuty partner service.
And that’s it! The full integration guide is available on PagerDuty’s website.
Set those alerts, and get some sleep.
- The AppFirst Team
AppFirst is back at AWS re:Invent this week and is proud to announce our newest integration into our big data platform: CloudWatch. Last year, almost everybody we talked to asked us if we integrate with CloudWatch. And now we do!
So why use AppFirst if you already use CloudWatch?
With this new integration, AppFirst is offering CloudWatch users a single source of truth into the real resource utilization and configurations of their AWS instances.
AppFirst allows an organization to fully understand the resources required by an application, as well as the manner in which it interacts with other applications, users, and systems. We do this by collecting very granular metrics about your app stacks and providing this data to you in real time. We’ll show you per-process metrics for CPU, disk, average response time, socket responses, files opened, file reads/writes, memory, network connections, network I/O, threads, and much more. We call this our Miss Nothing Data.
But why is this important, you may ask?
We work hard to give our users a complete view into their applications and infrastructure. By integrating CloudWatch metrics into our platform and normalizing them with our Miss Nothing Data (as well as your logs, statsd, Nagios, and much more), you’ll get unparalleled visibility. This data helps you:
- Get a firm understanding of your AWS billing info per AWS service
- Understand your application infrastructure over time – AppFirst stores data for up to one year, enabling users to visualize historical trends.
- Recapture stolen time – A visual graph provides at-a-glance confirmation when instances need to be moved to another region for better performance and resource utilization.
- Understand application footprints – Recording of the actual application footprint running over time enables accurate comparing and contrasting of the same application running on different servers.
- Re-imagine capacity planning – Add or remove resources based on live application behavior – the direct measure of service effectiveness.
Don’t be limited. You can’t manage what you can’t see. But here is what you can see with AppFirst and CloudWatch:
When we closed our Series B round late last year, it was all about growing and expanding. We had customers. We had revenue. But as a company, AppFirst was still running in Lean Startup mode. When co-founder Donn Rochette and I started AppFirst, our vision was of a SaaS company comprising mostly SMB users, but one that would expand to enterprise business in the future, when enterprises were ready to embrace SaaS solutions. The growth of the cloud for business applications has driven a proliferation of public, private and hybrid clouds — and it has also driven AppFirst’s growth. By the end of last year, we had indeed captured the attention of larger enterprises, with a number of global companies calling us about our solution.
The Series B funding enabled us to begin our transition from lean startup to scalable business and to invest in the right people. Since inception we had focused on recruiting smart engineers, who built out our amazing platform; going forward, we needed to hire a talented and savvy sales team, as well as solutions architects who could shape solutions on top of the AppFirst platform. That funding has also enabled us to add a number of strategic, client-facing hires to our team in the last few months, including enterprise sales associates, field systems engineers, and sales directors.
Our latest addition to the AppFirst team is Bob Fox, who joined us in early September from Splunk. Bob will continue to build out our sales team and will work closely with our customers to ensure the new features we are developing continue to match their needs and expectations.
We have similar DNA to Splunk. Like us, the people at Splunk saw the market clearly and understood that, no matter how powerful their technical vision might be, customers would drive market acceptance by sharing their specific needs for Operational Intelligence. Bob was driven by that at Splunk, where he was instrumental in how the company listened to, and partnered with, its client base to build horizontal and vertical solutions on top of its platform. Now he gets to roll up his sleeves and help us reach our next level of success. We are lucky to have his talents with us here at AppFirst.
Although all of us at AppFirst are thoroughly enmeshed in technology in our business lives, we work hard to keep it about the people who matter: our customers. Our growing ranks of customers are excited about accelerating the speed of value they can realize from this immense data platform we provide — this is driving our growth and we want to tackle it in a variety of ways. In order to make our customers happy, we work to get the right people on board to make that happen. In the end, everybody wins.
Root cause analysis is one of the most challenging topics in the application performance monitoring industry, and one of the most difficult areas to get right. There are three main approaches to root cause analysis:
Inference from resource utilization data
This legacy approach involved creating baselines for large numbers of resource utilization metrics, and then trying to infer an application problem whenever one of those metrics went out of bounds. This approach has proven useless in the modern abstracted, shared, dynamic, and distributed data center: there are simply too many metrics for which to set manual thresholds, many of the metrics are skewed by virtualization, and it becomes an impossible task to configure the monitoring system to strike an acceptable balance between missed events and false alarms.
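To make that scaling problem concrete, here is a toy Ruby illustration (not any vendor’s actual code) of baseline-and-threshold alerting; the sensitivity `k` is exactly the knob that has to be hand-tuned per metric:

```ruby
# Flag a sample that falls more than k standard deviations from the
# mean of a baseline window. Multiply this by thousands of metrics,
# each needing its own hand-tuned k, and the approach stops scaling.
def out_of_bounds?(baseline, sample, k = 3.0)
  mean  = baseline.sum.to_f / baseline.size
  var   = baseline.sum { |x| (x - mean)**2 } / baseline.size
  sigma = Math.sqrt(var)
  (sample - mean).abs > k * sigma
end
```

Set `k` too low and you drown in false alarms; set it too high and you miss real events, which is the trade-off described above.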
Statistical correlation of response time, throughput and resource utilization metrics
Most products cannot directly link the response time of an application with the chain of actions in the infrastructure that supports that application. They therefore rely on statistical correlation: if response time deviated at a point in time and certain resource utilization metrics were out of bounds at the same time, then those metrics probably point to a constraint that is the cause of the performance problem. The trouble with this method of root cause analysis is that it is prone either to too many false alarms or to too many missed issues that should have been caught (false negatives).
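For illustration, the statistical machinery here boils down to something like a Pearson correlation between the response-time series and each resource metric (a toy sketch, not any product’s implementation); a high |r| is only a coincidence-prone hint, which is where the false positives and negatives come from:

```ruby
# Pearson correlation coefficient between two equal-length series.
# Correlation-based root cause analysis flags metrics whose |r| with
# response time is high -- but correlation is not causation.
def pearson(xs, ys)
  n   = xs.size.to_f
  mx  = xs.sum / n
  my  = ys.sum / n
  cov = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  sx  = Math.sqrt(xs.sum { |x| (x - mx)**2 })
  sy  = Math.sqrt(ys.sum { |y| (y - my)**2 })
  cov / (sx * sy)
end
```

Two unrelated metrics that happen to spike together score just as highly as a true cause, which is why this approach cannot give a definitive answer.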
Deterministic root cause
This is the most difficult of the approaches to implement because it requires solutions to directly measure response time and accurate resource utilization metrics, and to deterministically (not statistically) link these metrics together through topology discovery. This is also the only method that promises to give operations teams immediate answers to the question of what issue in the infrastructure is causing the application performance issue. Deterministic root cause is also the only approach that works both for resource utilization issues and for issues caused by configuration changes.
Download this white paper to read more about why deterministic root cause is essential in the new enterprise applications environment.
The Need for Application Operations in the Dynamic Data Center and the Cloud, by Bernd Harzog, CEO of APM Experts
This is AppFirst’s second time sponsoring Velocity Conference, and we can’t say enough about it. We met some great people once again and got great insight into the types of pain the web performance community deals with today. For me personally, I love hearing what issues our community is wrestling with on a daily basis. We’re working hard every day at AppFirst to ease their pains and make their jobs easier.
Probably the coolest thing we’ve done at a conference came on Thursday, the last day. We had Gene Kim join us at our booth for a book signing of his newest book, The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win. We were offering 100 books, and people lined up like there was a new iPhone on sale. And Gene was great. He whipped through 100 books like a champ, while taking the time to meet everyone and chat for a bit (and this is after his session on DevOps with three other web performance rockstars).
Check out some pics from the signing below. Maybe you’ll even see us at Velocity New York this October!
All Savage IO DataBricks now feature powerful, enterprise-class monitoring; deliver unified visibility to all data sources.
CLOUD EXPO – New York, NY June 11, 2013 – Savage IO, Inc., provider of innovative storage solutions, announced today that it has added Savi 360, powerful, enterprise-class remote monitoring capabilities, to all DataBricks, its high-performing, low-cost alternative to legacy storage. At the same time, the company announced it has partnered with AppFirst to power Savi 360 by collecting data and delivering unified visibility into all data points throughout the server. Now DataBrick users can completely control every aspect of their datacenter hardware, stay on top of what’s happening in real time and understand how performance is impacting their business. Savage IO has also boosted throughput up to 22 gigabytes, offering data centers a Converged Storage option that raises the bar for storage and performance.