Reminder: API Versions 3 & 4 will be shut off Monday, March 16th

This is a reminder that we will be shutting off API versions 3 & 4 on Monday, March 16, 2015, at which point you will be required to use API version 5.

All of the documentation you may need to set up API version 5 can be found at http://support.appfirst.com/apis/.

If you have any questions at all, please email support@appfirst.com.

Support for Zones & Containers

Jan 19, 2015

Back in April we announced the beta release of our Solaris Collector (as a reminder, the Solaris Collector works with Solaris 10 and 11 on both SPARC and Intel platforms). The Solaris Collector has been receiving rave reviews in beta, and the next step in improving Solaris collection was to add support for collecting within Solaris Zones. The AppFirst Collector now fully supports both Global and Non-Global Solaris Zones.
You can see all the details about how to install and view your zones in our docs: http://support.appfirst.com/docs/collectors/#zones

And why stop there?

We also decided you needed more insight into your Linux containers. The AppFirst Collector can now capture process and application details for specific Docker containers. See the support documentation for details on how to install on your host OS or directly into a container: http://support.appfirst.com/docs/collectors/#containers

Server Topology – Tech Preview

Server Topology Mapping

Today we are happy to announce the release of Technology Preview: a gateway for AppFirst customers to try new tools and features that are currently in development. The goal of Technology Preview is to create an early feedback loop for new features, allowing customers and partners to shape the tools that are most important to them. For more details, see our documentation.

Server/Device Topology

The first tool being released into Technology Preview is the Server/Device Topology.

Server Topology allows users to see, at a glance, the health and communication paths of their infrastructure. It was created to show the topology of your Servers and Devices as well as the metrics, processes, and logs associated with a given server or device. You can set metric thresholds to quickly see the overall health of a Server and its respective elements.

This was initially designed with two audiences in mind: operators and application owners.

Server Topology for Operations

For those responsible for the data center, Server Topology can act as a first alert of pinch-points in your infrastructure. A quick and simple visual lets you immediately identify systems that are running hot.

Server Topology for Application Owners

For those responsible for the applications running on top of the infrastructure, Server Topology provides an easy-to-digest view of any application crossover between environments.

In this situation, you can apply server tags by application. For example, let’s say you tag all the servers hosting Zookeeper, used as the management layer of our big data store. In this scenario the environment is designed to be self-contained, so it is critical that each backend is kept separate, with no communication between them. Server Topology allows users to quickly identify any crossover, which would indicate a misconfiguration on a server or in a line of code.

Getting Started: View A Topology

Viewing a topology is easy. Simply select a Server Tag to see its topology. If you want to see a specific layout, first configure a Server Tag comprised of those servers (Admin – Organize Servers).

The graph shows a topology of your servers and devices. Each server/device is colored according to its health, which is determined by the thresholds on the right of the screen.

  • Red indicates that a process is exceeding a set threshold

  • Yellow indicates that a process is above eighty percent (80%) of the threshold

  • Green indicates that a process is healthy according to the set thresholds

  • Grey indicates that no process information can be found for that server or device

These colors percolate upwards: if any process exceeds a threshold and turns red, the server/device will also be red. You will only see green if all of the processes associated with that server/device are below all of the thresholds. The health of each process is recalculated when new thresholds are set or added. Similarly, when a metric is deleted, the health colors are recalculated using only the remaining thresholds. You can edit the min, max, and step values of the thresholds.

Selecting a server or device populates a table below the topology graph, displaying which processes and metrics are causing problems by highlighting the cells that exceed the set thresholds.

  • Click to see the Logs associated with the server

  • Click to open Correlate, which allows you to drill down to very specific, granular metrics on a given server

  • Click to return to the Processes list

Feedback

The Server Topology tool is currently in Technology Preview. We’d love to hear your feedback on how it is helpful, how it could be improved, and what you like or dislike. Email product@appfirst.com or give feedback directly from the tool by clicking the feedback icon in the top right.

API Updates: V5

Key points for new API v5:

  • The new v5 API uses a different method for versioning. The old API took a version in the URL, like http://wwws.appfirst.com/api/v4/servers/. The new API takes a version in the HTTP Accept header, like "application/json; version=5", and the URL for servers is just http://wwws.appfirst.com/api/servers/ without a version in the URL (see the example request after this list).
  • The new API offers several return data formats that can be specified in the Accept header:
    • application/json; version=5
    • application/xml; version=5
    • application/yaml; version=5
  • The new API also has a web-browsable interface for convenience when developing. It is accessible at http://wwws.appfirst.com/api/dev/
    • Note that the /api/dev/ URL should only be used for development/testing and not in actual code, as it will move to /api/ when v5 becomes the default version. Version selection should always be done via the Accept header.
  • The return fields and input fields have changed slightly for certain endpoints in order to standardize field names and provide a more consistent interface for users. Applications currently interacting with v4 of the API will probably need to be updated, but the changes required should not be too complicated. The new return formats can be seen on the web-browsable interface. Support documentation will be updated shortly.
  • There is a new Python client wrapper available at https://github.com/appfirst/afapi which makes integrating API calls into Python code much more convenient.
  • The new API v5 will become the default on February 16, 2015, at which point v3/v4 will be deprecated. Versions 3 & 4 will be shut off on March 16, 2015.
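
For example, a v5 request for the server list could look like the following. This is a minimal curl sketch: the Accept header and URL are taken from the notes above, while authentication is omitted, so add whatever credentials your AppFirst account uses.

# Request the server list from API v5, selecting the version via the Accept header
curl -H "Accept: application/json; version=5" http://wwws.appfirst.com/api/servers/

Swapping the Accept header to "application/xml; version=5" or "application/yaml; version=5" returns the same data in the other supported formats.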

Monitoring IPMI Sensor Data

A customer of ours recently told us about their use-case for extending the AppFirst collector to support the data center team. Ops was already using AppFirst’s platform to collect, aggregate and correlate massive amounts of application & operations data for monitoring & troubleshooting. However, there was a glaring concern about how long it would take the data center guys (and gals) to log into their own system and check the current status and history of their hardware data.

In turn, they extended the AppFirst collector to capture IPMI sensor data to monitor the environmental trends of their physical hardware. Taken together they were able to see key metrics from the physical hardware all the way up to business performance.

Specifically, the data center team needed to track a few key metrics and analyze performance over time to see whether there was a trend toward failure.

  1. Fans: Are the system’s fans about to fail, working too hard or experiencing degradation?

  2. Temperature: Is the system temperature anything but stable?

  3. Voltage: Has a power supply gone bad? Specifically, the Power Good signal, which alerts the computer to an improper power supply and prevents it from attempting to operate on improper voltages and damaging itself (1).

How to get started

First, a little background. AppFirst’s collector supports the ingestion of multiple types of additional data: logs, StatsD, and polled data. Polled data can include scripts with Nagios-compatible output that collect data from APIs, management interfaces, SNMP, IPMI, or any other available source of data.
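
As a rough illustration, the Nagios plugin output format the collector can ingest is a status line followed by optional pipe-delimited performance data; the service label and metric names below are made up:

DISK OK - 42% used | disk_used_pct=42;80;90

The sections below use this same format to feed IPMI sensor readings into AppFirst.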

Collecting IPMI Data

AppFirst supports server hardware equipped with a BMC controller, which exposes a variety of hardware sensor data that can be monitored using the IPMI standard. IPMI sensor data can be gathered in-band (through the host O/S) or out-of-band (through a dedicated LAN connection independent of the system processor and host O/S). A free tool, FreeIPMI, is available to examine IPMI data, and AppFirst has written a series of scripts that allow you to take advantage of AppFirst’s big-data store and visualization tools to easily monitor this data.

Here is an example that uses in-band communication (out-of-band works as well). You can do this in two ways: with polled data or with log data. Both configurations are described below.

To run either method you will need to install FreeIPMI and OpenIPMI on your system.

You can build from source (FreeIPMI), or you may be able to find an RPM package compatible with your OS.
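
On many distributions the packages are available from the standard repositories; for example (package names can vary by distribution and release, so treat these as a starting point):

sudo yum install freeipmi OpenIPMI              # RHEL/CentOS
sudo apt-get install freeipmi-tools openipmi    # Debian/Ubuntu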

Before moving any further, make sure that you can execute the ipmi-sensors command. The output looks like this (but much longer):

[appfirst@servername: ~]$ sudo /usr/sbin/ipmi-sensors
ID | Name            | Type        | Reading | Units | Event
1  | Pwr Unit Status | Power Unit  | N/A     | N/A   | 'OK'
2  | IPMI Watchdog   | Watchdog 2  | N/A     | N/A   | 'OK'

Polled Data Configuration:

Once you have ipmi-sensors working, you will need a Perl script, check_ipmi_sensors, to format the output to be Nagios-compatible. We currently install the required script at the same time you install your collector, and it is located here:

/usr/share/appfirst/plugins/libexec/check_ipmi_sensors

If not, it’s available here: https://github.com/appfirst/nagios-plugins/blob/master/check_ipmi_sensors

You can test that it’s working by executing this command:

sudo /usr/share/appfirst/plugins/libexec/check_ipmi_sensors -t fan

and you should get output that looks like this:

fan OK | System_Fan_1=2156.00;System_Fan_2=2156.00;System_Fan_3=2156.00;
System_Fan_4=2156.00;Processor_1_Fan=2450.00

From your AppFirst UI, select Admin → Setup → Polled Data from the top menu and locate your collector. Click on the Server Hostname and add lines similar to these to the config file:

command[sensor_fan]=/usr/share/appfirst/plugins/libexec/check_ipmi_sensors -t fan
command[sensor_temp]=/usr/share/appfirst/plugins/libexec/check_ipmi_sensors -t temperature
command[sensor_voltage]=/usr/share/appfirst/plugins/libexec/check_ipmi_sensors -t voltage

The -t parameter is the sensor type. To get a list of sensor types, execute:

sudo /usr/sbin/ipmi-sensors -L

Note that your hardware may not report all of these.

Save the file and give the collector up to five minutes to pick up the new config and start polling the device. The new polled data commands will then be available for display in the Correlate tool.

Log Data Configuration:

You can monitor the output of any command as log data by simply capturing its output and appending it to a file, then configuring the collector to monitor that file as a log file. One way to do that is to execute the command as a cron job. You will also want to configure logrotate to prevent excess consumption of disk space.
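
For example, a cron entry that appends sensor readings every five minutes, plus a matching logrotate rule, might look like this (the log file name, paths, and rotation settings are illustrative; adjust them to your system):

# crontab entry (e.g. in root's crontab): append readings every five minutes
*/5 * * * * /usr/sbin/ipmi-sensors --output-sensor-state >> /var/log/ipmi-sensors-raw.log

# /etc/logrotate.d/ipmi-sensors-raw: keep a week of compressed history
/var/log/ipmi-sensors-raw.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}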

An alternative is to use our simple bash script, poll2log, and leave the storage to us.  It simulates log rotation, and only stores the output from one execution. This script should have been installed when you installed your collector, and should be located here:

/usr/share/appfirst/plugins/libexec/poll2log

But if not, you can find it here: https://github.com/appfirst/nagios-plugins/blob/master/poll2log

Add a line like this to your crontab (probably as root):

*/5 * * * * /usr/share/appfirst/plugins/libexec/poll2log "/usr/local/sbin/ipmi-sensors --output-sensor-state" /var/log/ipmi-sensors.log

From your AppFirst dashboard, select Administration | Logs, and click the Add Log button (upper left). Find your server in the pull-down list, set the Type to “File”, and set the File Path to /var/log/ipmi-sensors.log. Save the configuration and give the collector a few minutes to receive it. The new log files will then be available for display in the Correlate tool.

Out-of-Band Configuration:

You can monitor IPMI sensor data from another machine that has a collector installed. This allows you to see sensor data even when the target machine does not have a collector installed. You simply need to add the host information to the command line:

command[sensor_fan]=/usr/share/appfirst/plugins/libexec/check_ipmi_sensors -h 10.7.7.7 -s "-u username -p password" -t fan

Alternatively, the credentials can be supplied via environment variables. For example, export IPMI_USER=username and then replace "-u username" with "-u $IPMI_USER".
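
A minimal sketch of the environment-variable approach (IPMI_PASS is a hypothetical second variable added here for the password; adjust the names, and how the variables are made visible to the collector, to fit your environment):

export IPMI_USER=username
export IPMI_PASS=password
command[sensor_fan]=/usr/share/appfirst/plugins/libexec/check_ipmi_sensors -h 10.7.7.7 -s "-u $IPMI_USER -p $IPMI_PASS" -t fan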

Links that may be of interest:

Intel BMC Web Console Users Guide: http://download.intel.com/support/motherboards/server/sb/intel_rmm4_ibwc_userguide_r2_72.pdf

(1) Wikipedia: http://en.wikipedia.org/wiki/Power_good_signal