Solaris Collector – Now in Beta

Introduction

AppFirst is releasing support for our collection technology on Solaris platforms. We are doing a beta release with a small group of users to get validation and feedback.

Why Solaris?

Solaris certainly isn’t a growth market. So why develop a collection technology for Solaris platforms? I suppose the simple answer is: completeness.

We have a number of customers who continue to use proprietary Unix systems. These systems most often host critical services, and in most cases there has been significant investment in the services they host, particularly on Solaris.

As AppFirst continues to pursue the goal of assembling the most complete IT operations data in one collection, it is only natural to include Solaris at some point.

Supported platforms

This release supports Solaris 10 x86, Solaris 11 x86, Solaris 10 Sparc, and Solaris 11 Sparc. Our test matrix is getting fairly large at this point as we include all the necessary instances of Windows, Linux, FreeBSD, and Solaris.

We are investigating support of Solaris 8 and 9 in branded zones. This is a bit of a stretch. We know that. Everyone should be off Solaris 8 by now, no doubt. But it’s sort of like what George Mallory said of climbing Mt. Everest: because it’s there. We want to know if it’s possible. Stay tuned. For those of you scoffing at why we would even consider Solaris 8 and 9, chances are you will be vindicated. For those of you who share my curiosity in the matter, we’ll let you know the why and wherefore.

Vagaries of Solaris

We were pleasantly surprised to find that the detailed OS services we use are actually pretty consistent across Solaris platforms. It’s been a few years since I dabbled with Solaris, and it was quite refreshing to see what has been done with Solaris 11.

Many of the open source libraries that are the basis of application stacks in a Linux world are readily available in Solaris 11. Our list includes things like curl, libcurl, libz and openssl. A pleasant surprise. Not the case with Solaris 10, but that’s to be expected.

The biggest surprise we ran into involved atomic operations on 32-bit Sparc. Where possible, we have been using the intrinsic functions provided by gcc. These are available and work well on Solaris i386, x86_64, and 64-bit Sparc. There is no atomic CAS (compare-and-swap) intrinsic for 32-bit Sparc, however. Apparently there is an issue with the 32-bit Sparc instruction set, and the gcc developers simply decided not to implement it. Fortunately, Solaris provides a native atomic CAS that works on 32-bit Sparc.
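To make the workaround concrete, here is a minimal sketch of the kind of compare-and-swap wrapper involved: it uses the gcc __sync builtin where it exists and falls back to the native atomic_cas_32() from the Solaris <atomic.h> header on 32-bit Sparc. The function and counter names are illustrative only, not our actual collector code.

```c
/*
 * Minimal sketch of a 32-bit compare-and-swap wrapper.
 * Uses the gcc __sync builtin where it is available and falls back to
 * the native atomic_cas_32() from Solaris <atomic.h> on 32-bit Sparc.
 * Names here are illustrative, not production collector code.
 */
#include <stdint.h>
#include <stdio.h>

#if defined(__sparc) && !defined(__sparcv9) && !defined(__arch64__)
#include <atomic.h>   /* Solaris native atomic operations (in libc) */

static uint32_t cas32(volatile uint32_t *target, uint32_t expected, uint32_t newval)
{
    /* Returns the value previously stored at *target, like the gcc builtin. */
    return atomic_cas_32(target, expected, newval);
}
#else
static uint32_t cas32(volatile uint32_t *target, uint32_t expected, uint32_t newval)
{
    /* gcc intrinsic: available on Solaris i386, x86_64, and 64-bit Sparc. */
    return __sync_val_compare_and_swap(target, expected, newval);
}
#endif

/* Example use: a lock-free increment built on top of the CAS wrapper. */
static uint32_t atomic_increment(volatile uint32_t *counter)
{
    uint32_t old;
    do {
        old = *counter;
    } while (cas32(counter, old, old + 1) != old);
    return old + 1;
}

int main(void)
{
    volatile uint32_t counter = 0;
    printf("counter is now %u\n", atomic_increment(&counter));
    return 0;
}
```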

We were a bit mystified by curl on Solaris 10 Sparc. It seems pretty silly: http://curl.haxx.se/mail/lib-2008-09/0051.html. The net effect is that you end up building both a 32-bit and a 64-bit curl from source. It’s not an issue on Solaris 11 Sparc. While it’s an annoyance, it’s not a show stopper.

Sparc hardware

As you would expect, we did the bulk of our development on Solaris 10 & 11 x86 using VMs. For whatever it’s worth, we never did get Parallels to create a VM using Solaris. We had to use VMware.

We tested on Sparc hardware using a customer’s servers. While this caused a bit of trepidation, since we were doing our testing on someone else’s servers, it actually worked quite well.

We don’t have plans to install Sparc hardware any time soon. Instead, we plan to provide support using Sparc hardware belonging to the people who have the servers and the need for the information. Is this a feasible, workable approach? We would like to hear from you on this topic.

What’s Next?

As you can probably guess, the next platform is AIX. We are serious in our quest for complete IT operational data. That includes all data from all platforms. We’d like to hear from you. Let us know what you think.

And, if you’re a Solaris user and want to try our Solaris collector in beta, sign up for our Beta program and we’ll follow up with a custom installer for you.

An Early Look Into Our Newest Log Applications: Log Search and Log Watch

Log Search

We just finished a beta cycle on a new application, Log Search, that provides the ability to search any logs collected by AppFirst. We’re pretty excited about this capability. We know there are a lot of log search tools available, but we think what we are able to offer is pretty cool. The ability to correlate, and now search, log data along with all the other data collected and streamed by AppFirst represents a unique solution. (Yes, that means our patent-pending Miss Nothing Data, Nagios plugins, Windows Performance Counters, and StatsD.)

We’ve been collecting log data for a while now – available for alerting and correlation with the rest of our data. With our new Log Search application, you can search any and all logs for keywords and severity level in any time range you want. You can even throw two keywords in there if you’d like.

We spend a lot of time looking at logs when we are working with our production systems. We’ve found that having all logs from all sources consolidated in one place has been really useful for us. No more having to go into each box and search for the log sources individually. They’re all now in one place, making search a hell of a lot easier.

Log Watch

There are a few scenarios where we’ve found we want to watch a particular log file to see what’s happening right now. In these cases, it’s less about looking for specific entries in log data and more about following what’s transpiring. Since we use this capability a lot, we added support for it in our new application. We call it Log Watch. It’s like using the command “tail -f /var/log/messages” in Linux/Unix.
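For anyone unfamiliar with the pattern, here is a minimal sketch of what “tail -f” does: open the file, seek to the end, and keep polling for newly appended lines. This is just an illustration of the behavior Log Watch mimics, not our collector code, and it ignores details such as log rotation.

```c
/*
 * Illustration of the basic "tail -f" pattern that Log Watch mimics.
 * Open the file, seek to the end, then poll for newly appended lines.
 * Generic sketch only; it does not handle log rotation or truncation.
 */
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/var/log/messages";
    FILE *fp = fopen(path, "r");
    char line[4096];

    if (fp == NULL) {
        perror(path);
        return 1;
    }
    fseek(fp, 0, SEEK_END);          /* start from the current end of the file */

    for (;;) {
        if (fgets(line, sizeof(line), fp) != NULL) {
            fputs(line, stdout);     /* a new entry arrived: show it */
            fflush(stdout);
        } else {
            clearerr(fp);            /* clear EOF and wait for more data */
            sleep(1);
        }
    }
    /* not reached */
}
```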

In a lot of scenarios there is nothing that replaces the ability to search logs for specific information. We will expand our keyword search capability to include support for a search language as we proceed (this would include Boolean expressions). We are also looking at supporting SQL statements at some level. We’re not sure if this is needed or how useful it would be; it’s something we’re looking into.

Speed Searching

Search results are pretty fast. This is something we were worried about. We took a very different approach to search technology, and it wasn’t obvious how performance would work out when we started. We use HBase for persistent storage. We really wanted to avoid the need for any sort of ETL – Extract, Transform, Load – into a separate search technology. An ETL would require additional storage and increased complexity. We wanted to find out if we could perform searches directly on information resident in HBase.

Turns out, you really can perform searches directly from HBase. In our design, it greatly simplifies search capabilities. Now that we are able to search log data, we can extend the service to search any of the data that we aggregate. We hope to provide this capability in the near future.

We made use of an HBase feature to perform column searches as the basis for locating specific log entries. We had to modify the way we store log data in HBase in order to make this effective, but it gave us the performance and accuracy we were looking for. We were pretty excited when we figured out how to make this work. The details of how HBase works and the specifics of column organization are beyond the scope of this blog post. Maybe we should do a blog on these details at some point. Let us know in the comments if you’re interested.

Of course, making something like this work and making it work at scale are very different tasks. We provide REST APIs to perform search on log data and our Log Search application uses these public APIs. You can extend the log application or create your own using the same APIs.

We found that we needed to pay a lot of attention to the size of the data being returned. There is a practical limit to the size of a JSON object that you want to return to the browser. Part of the solution was to add paging to the APIs. One of the biggest issues to deal with is that it’s difficult to determine up front how much data a given search will produce. Indexing the data helps with this. Of course, indexing is fundamental to search, but we are looking at ways to avoid the need to index massive amounts of data. We aren’t sure where the indexing question is going to end up.
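To make the paging idea concrete, here is a hedged sketch of a client walking through search results page by page with libcurl. The endpoint and its keyword, page, and page_size parameters are hypothetical placeholders used only to illustrate the pattern, not our actual API; the real URLs and parameters are in the API documentation.

```c
/*
 * Sketch of paging through a log-search REST API with libcurl.
 * The URL and query parameters below are hypothetical placeholders,
 * shown only to illustrate the paging pattern.
 */
#include <stdio.h>
#include <curl/curl.h>

/* Collect the response body; a real client would parse the JSON here. */
static size_t on_body(char *data, size_t size, size_t nmemb, void *userdata)
{
    size_t n = size * nmemb;
    (void)userdata;
    fwrite(data, 1, n, stdout);
    return n;                        /* tell libcurl we consumed everything */
}

int main(void)
{
    CURL *curl;
    char url[512];
    long status = 200;
    int page;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    curl = curl_easy_init();
    if (curl == NULL)
        return 1;

    /* Fetch one page at a time. Here we simply stop on a non-200 status;
       a real client would look for an end-of-results marker in the JSON. */
    for (page = 0; page < 100 && status == 200; page++) {
        snprintf(url, sizeof(url),
                 "https://api.example.com/logsearch?keyword=error&page=%d&page_size=100",
                 page);
        curl_easy_setopt(curl, CURLOPT_URL, url);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, on_body);
        if (curl_easy_perform(curl) != CURLE_OK)
            break;
        curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
    }

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```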

There is a lot of detail in how search can be performed using HBase, and we didn’t intend to get into all of it just yet. For this post, we just wanted to give an overview of the approach as we release our Log Search application. If there’s interest, we can describe the details in a future post; let us know. We’d also love to hear from people who are looking at similar approaches, whether to cooperate or maybe even commiserate. Thanks.

Donn Rochette, CTO

P.S. If you’d like to try our beta version of Log Search and Log Watch, you can sign up for our Beta Program.

The Value of Real Data

For better or worse, I’m a baseball fan. It’s something I grew up with. My grandparents were devout Detroit Tigers fans. Some of my fondest memories are of listening to the game on the radio with my grandparents and extended family.

One of the talking points in baseball circles these days is the pace of the game. It was a recent topic when the Red Sox and Yankees took nearly 5 hours to play a game last month. There are a few generally accepted reasons why some of the games take longer than others. Most people passionately describe the time consumed when the catcher goes to the mound to talk to the pitcher and the amount of time consumed by the pitcher between pitches. In fact, MLB has instituted new rules this year that are supposed to govern some of these times.

A funny thing happened recently in the midst of all of this discussion. A baseball writer (didn’t write down his name…) sat down with video of several notoriously long games and measured times. He used a stopwatch to measure the amount of time taken by each individual activity. With absolute times in hand, he could tell precisely where time was consumed. The activities that took the most time were not the generally accepted culprits. It turned out, for example, that Derek Jeter stepping out of the batter’s box and walking around between pitches consumed a lot more time than any of the pitchers or visits to the mound by a catcher.

But why bore you with all this baseball talk? It’s an example from my personal frame of reference that illustrates an aspect of the human condition that affects the management of IT infrastructure. It’s really easy to accept that something is factual when one or more individuals passionately insist that it is so. That system is slow because the Java app is using too much memory, or it’s slow because the database is overextended. It’s all too easy to accept. I’ve done it.

We had a recent performance issue with our back-end servers. Our entire development team, myself included, was convinced that the issue stemmed from the aggregation of data. We all just accepted that as reality because it made a lot of sense. It was logical. However, when we looked at the visualization of the applications on our infrastructure, it was very clear very quickly that we were all wrong. Data aggregation was actually consuming less than 5% of CPU resources. It was the act of responding to API requests that was consuming up to 30% of CPU.

The point is: get the facts. Easier said than done, I know. You don’t have access to the real facts if you are looking at server resources alone – they are insufficient to show the detail you need. Transaction times alone aren’t going to fully inform you. All of the copious information that you can get from individual components is a patchwork of data, much of it contradictory. The information from byte code insertion tools is detailed, but it doesn’t help much with the management of your application infrastructure. Maybe if you wrote much of the code yourself, you can get what you need. Either way, it’s real work to get what you need.

What you need is a consistent view of the apps running on your infrastructure. A view that is the same for all apps: web server, app server, database, Java, .NET, Ruby, PHP…well, you get the idea. If you are managing infrastructure, do you care which line of code is calling a SQL statement? Do you care if you’re dealing with Java or Ruby? You do care about the performance profile, about what resources the apps require, and about bottlenecks. So why not look at what matters to you?

Give it a try – it’s easy and not at all scary: http://www.appfirst.com/sandbox/