Wednesday’s Maintenance: What’s New at AppFirst

We appreciate your patience during our scheduled maintenance on Wednesday, January 26, 2011, at 5:00 PM EST. Here’s a list of all the features we’re adding and the bugs we’ve fixed.

 

Features

  • Alert End Dates – Alerts now have an end date. This means no more confusing “reset interval” parameter when you create an alert. Instead, we’ve created “Alert Email Digests.”
  • Alert Email Digests – Visit your profile to set your alert email preferences. You can still choose to receive all alert emails, or instead get periodic digests summarizing all alerts that happened.
  • New look and feel – We’ve updated the look and feel of the application dashboard to be more in line with our website design.
  • Improved scalability – As we add more customers, we constantly have to keep scaling our backend processing.
  • Faster summarization – New data is displayed twice as fast, making it available to you after one minute rather than two.

 

Bug Fixes

  • Deleting a collector is now faster and more reliable.
  • The number of running processes displayed in the Resolve Tab now matches the number shown in the drop-down list.

 

Happy monitoring!

The AppFirst Team

 

George Vanecek speaks at the first OpenStack NY Meetup

The inaugural OpenStack NY user group Meetup was a success. About 40 souls braved the weather to come listen to George Vanecek, PhD, principal architect at Huawei, share why cloud computing is important and, specifically, why his team is excited about OpenStack. We can’t share the pizza and beer with you, but you can hear a synopsis of George’s talk below. You can also check out some pictures of the event at OpenStack’s Flickr page.

We look forward to seeing familiar faces and new faces at the next OpenStack NY user group Meetup. Stay tuned for details!

Is there a skewing of time and resources in a virtual world?

The question comes up often, and the answer is absolutely yes: there is a skewing of time and resource usage values that occurs with virtualization. It’s an artifact of the way the hypervisor works. You can think of time in any individual VM as being “eventually consistent.” When using a hypervisor, the OS is scheduled to run in much the same way as the processes in your application are scheduled to run.

Since the OS is not running all the time, as it is on a physical server without a hypervisor, the values that you see from the OS are oftentimes skewed. This is because the OS is not aware that it is not running all the time. For example, Linux uses a periodic timer interrupt to create what it calls jiffies. This is just a counter that is incremented when the timer interrupt occurs. The interrupt typically occurs every 10 ms (the interval varies with the kernel configuration and hardware architecture; a 1 ms tick is also common). With a 10 ms tick, 100 jiffies is 1 second of elapsed time. You can refer to the constant HZ (param.h) for the number of jiffies per second. Windows is a bit more complete in the way it utilizes timer hardware to account for the potential skewing. I won’t bore you with all the details, but you can rely on the performance counter values to be correct to a large degree.
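
As a rough illustration of the jiffies arithmetic, the sketch below converts a tick count into elapsed seconds. The function name jiffies_to_seconds and the default rate of 100 ticks per second are assumptions made for this example; the real rate is whatever HZ is on the kernel in question.

```python
def jiffies_to_seconds(jiffies: int, hz: int = 100) -> float:
    """Convert a jiffies count to seconds.

    `hz` is the number of timer interrupts per second (the kernel's HZ).
    The default of 100 is an assumption for this example only.
    """
    if hz <= 0:
        raise ValueError("hz must be positive")
    return jiffies / hz


if __name__ == "__main__":
    # HZ = 100 means a 10 ms tick, so 100 jiffies is one second.
    print(jiffies_to_seconds(100))
    # The same count covers a tenth of that on a kernel built with HZ = 1000.
    print(jiffies_to_seconds(100, hz=1000))
```

Note that the result is only as good as the tick count itself: if the guest OS missed timer interrupts while it was descheduled, the computed elapsed time falls short of wall clock time.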

So, given that the OS relies on continuous operation in order to calculate time and time-related resource utilization, things can get skewed. The better hypervisors attempt to account for the missing time. For example, if an OS has been in a wait queue while the hypervisor schedules other OSs to run, a period of time elapses. When the OS is next scheduled to run by the hypervisor, it is not aware of the actual amount of time that has elapsed. So the hypervisor tries to fix this by, for example, simulating (virtualizing) a number of additional timer interrupts so that the OS catches up; in the case of Linux, that means incrementing jiffies a number of times. The OS eventually catches up, more or less. That’s what I meant by eventually consistent.
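
The catch-up behavior can be sketched with a toy simulation. This is not how any particular hypervisor is implemented; the function simulate_guest_clock, the schedule format, and the 10 ms tick are all assumptions chosen to show the guest clock falling behind and then catching up.

```python
def simulate_guest_clock(schedule, tick_ms=10, replay_missed=True):
    """Return (wall_ms, guest_ms) pairs, one per time slice.

    `schedule` is a list of (duration_ms, guest_running) slices, with
    durations that are multiples of `tick_ms`. The guest only counts
    ticks while it is running. When `replay_missed` is true, the
    hypothetical hypervisor injects the backlog of missed ticks the
    next time the guest is scheduled.
    """
    wall_ms = 0
    jiffies = 0
    missed = 0
    history = []
    for duration_ms, guest_running in schedule:
        ticks = duration_ms // tick_ms
        wall_ms += duration_ms
        if guest_running:
            if replay_missed:
                jiffies += missed  # catch-up interrupts
            missed = 0             # backlog is replayed or lost
            jiffies += ticks
        else:
            missed += ticks        # guest is in the wait queue
        history.append((wall_ms, jiffies * tick_ms))
    return history


if __name__ == "__main__":
    # Guest runs 100 ms, waits 50 ms, then runs another 100 ms.
    slices = [(100, True), (50, False), (100, True)]
    for wall, guest in simulate_guest_clock(slices):
        print(f"wall={wall} ms guest={guest} ms")
```

After the middle slice the guest believes 100 ms have passed while 150 ms have elapsed on the wall clock. With replay enabled the two agree again by the end of the third slice; with replay_missed=False the guest stays 50 ms behind.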

This virtualization of timer-related hardware works sort of OK for wall clock time. As you have experienced, it’s not very good for real-time and small, critical measurements. The values calculated by the OS can be misleading. It’s an artifact of virtualization. Here’s something to think about: what happens if you want to know response time on a socket connection? It needs to be accurate with respect to wall clock time because that’s what the client sees as response time. If you use calculations that do not overcome this skew, then you get really messed up values, especially when you are dealing with microsecond values.

 

So, what do you do if you want to get viable timing and resource usage values for virtualized environments, including clouds? You need to use tools that understand this issue and provide you with correct values. Such a solution cannot rely directly on OS interfaces to provide those values. However, most monitoring solutions do exactly that: they use OS capabilities to gather the information for monitoring.

 

The solution turns out to be pretty low-level and somewhat complex. I won’t bore you with all the details (unless someone wants to get into it), so let me summarize it this way: 1) an effective cloud monitoring solution has to gather raw values independently of the OS and calculate its results independently, and 2) an effective cloud monitoring solution must use a hardware mechanism that is not skewed by the fact that the OS is not running constantly. Number 1 is kind of difficult to explain, but if there is interest it can perhaps be another blog post.

 

Number 2 can be summarized, maybe. At AppFirst we use the TSC (time stamp counter) on Intel and AMD CPUs. Before you stop reading and tell me I’m wrong, let me clarify. A few Google searches will point you at various docs describing why this timing mechanism is not safe to use. There are potential problems in two areas. First, does the counter increment consistently when the CPU frequency varies (which happens a lot with power management and other activities)? Second, do all cores start with the same value and increment in the same way? With older CPUs these were serious limitations. With newer CPUs, both Intel and AMD have resolved these issues. Where you have a CPU that supports what Intel calls an Invariant TSC, you can rely on the counter to be consistent. There is a lot of confusion about this. I’ve found that the only information to rely on for this topic is section 16.11 of the Intel Systems Programming Guide.

 

A solution that attempts to use an Invariant TSC has to take care in several ways. We check to see if the CPU supports an Invariant TSC. Where it does, values can be quite accurate and they avoid the skew created by virtualization. Of course, it’s a lot of fairly heavy lifting to collect all the necessary data and make the calculations independent of the OS, but frankly, it’s the only way to get you accurate information. So, that’s what we do at AppFirst.
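
For readers who want to try a similar check themselves, here is a minimal sketch for Linux on x86. It is not AppFirst’s collector code. Instead of issuing the CPUID instruction, it reads the feature flags the kernel lists in /proc/cpuinfo and treats the presence of both constant_tsc and nonstop_tsc as a proxy for an Invariant TSC; that proxy, and the function name has_invariant_tsc, are assumptions of this example. Inside a VM the flags reflect whatever the hypervisor chooses to expose.

```python
def has_invariant_tsc(cpuinfo_text: str) -> bool:
    """Return True if every listed CPU reports constant_tsc and nonstop_tsc.

    `cpuinfo_text` is the content of /proc/cpuinfo. Using these two
    flags as a stand-in for the Invariant TSC bit is an assumption
    of this sketch.
    """
    required = {"constant_tsc", "nonstop_tsc"}
    flag_sets = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flag_sets.append(set(value.split()))
    # No "flags" lines at all (for example on non-x86) counts as unsupported.
    return bool(flag_sets) and all(required <= flags for flags in flag_sets)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("Invariant TSC:", has_invariant_tsc(f.read()))
    except OSError:
        print("No /proc/cpuinfo here; this sketch only applies to Linux")
```

Requiring the flags on every CPU entry, rather than just the first, is a deliberately conservative choice: a timing source is only useful if all cores behave the same way.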

 

In order to really use the cloud and virtual environments, you need the real data. Guesses aren’t something you should be running your business on.

 

 

OpenStack New York Meetup is Next Wednesday

Happy New Year!

We’d like to remind you that the first OpenStack Meetup event in New York City is coming up next Wednesday. It will be held on January 12th, 2011, at Dogpatch Labs (36 East 12th Street).

OpenStack is a collection of open source technology products delivering a scalable, secure, standards-based cloud computing software solution. It is used by corporations, service providers, VARs, SMBs, researchers, and global data centers looking to build large-scale private or public clouds, leveraging the support and resulting technology of a global open source community.

Our guest speaker will be George Vanecek, PhD, a principal architect at Huawei’s US Innovation Center R&D. Huawei is a world-leading telecom solutions provider, innovating to provide robust, scalable IaaS and PaaS services to its customers. When looking around the market, Dr. Vanecek and his team considered building everything from scratch and adopting commercial technologies. In the end, they decided to adopt OpenStack. In this open discussion, Dr. Vanecek will share what led him and his team to decide on OpenStack and some of the concerns, both technical and business, that he wrestled with.

Also, Michael Mayo, cloud developer evangelist at Rackspace Hosting and Rackspace Cloud, will be joining us from the West Coast. To RSVP, please go to: http://www.meetup.com/OpenStack-New-York-Meetup/calendar/15634525/. And don’t forget to forward this event to anyone who might be interested.