The decision to open our new log applications to the public was not one we took lightly. Letting customers search all of their log files for any keyword is quite taxing on our system, so we had to take several precautions. To protect the reliability of our entire architecture, we created a separate web server solely responsible for retrieving log data from HBase, our persistent data store. Because this is an isolated subsystem, we don’t run the risk of a potentially large query bogging everything else down.
Since this new web server has only one purpose, we wanted it to be lightweight and performant. After much research, we settled on a combination of Sinatra, Puma, and Nginx running on JRuby. Puma is a relatively new Ruby/Rack web server that excels at handling many concurrent requests. Several published comparisons show that it consumes less memory than similar setups with Unicorn or Passenger. On our low-end development machine, the log application web server was able to serve 50 requests per minute for HBase queries of approximately 3-4 MB each. As our log search and log watch applications see more usage, we can replicate this web server and place a load balancer in front of the replicas to handle the increased activity.
We’ve also taken several steps to make the web server resilient against possible attacks and misuse. Because we let users search for any keyword, they could potentially make ridiculously large queries to our system. Imagine a user searching for “1” across multiple log files. Assuming each log message includes a date/time, this would probably return every log message in the user-defined time span. A query like this defeats the purpose of a search function. To guard against this, we’ve set generous data limits for our users.
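To make the idea of a data limit concrete, here is a minimal Ruby sketch of one way such a cap could work. It is an illustration only, not a description of our actual implementation: the `collect_with_limit` helper, the `MAX_RESPONSE_BYTES` constant, and the 5 MB value are all hypothetical.

```ruby
# Hypothetical cap on how much log data one query may return.
# The 5 MB figure is an assumption chosen purely for illustration.
MAX_RESPONSE_BYTES = 5 * 1024 * 1024

# Collects log lines until adding the next one would exceed max_bytes.
# Returns the lines kept, their total size, and whether rows were dropped.
def collect_with_limit(rows, max_bytes = MAX_RESPONSE_BYTES)
  collected = []
  total = 0
  truncated = false

  rows.each do |row|
    size = row.bytesize
    if total + size > max_bytes
      # Stop reading as soon as the budget would be exceeded.
      truncated = true
      break
    end
    collected << row
    total += size
  end

  { rows: collected, bytes: total, truncated: truncated }
end
```

With a cap like this, a search for “1” stops once the budget is reached, and the `truncated` flag lets the UI tell the user to narrow the query.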
We’ve also implemented a chunking feature to improve the responsiveness of the log search application. If the user makes a large query spanning multiple hours or days, the web application breaks it up into smaller, more manageable queries. The UI displays the most recent data first and then backfills with older data.
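The chunking idea can be sketched in a few lines of Ruby. This is a simplified illustration under stated assumptions, not our production code: the `chunk_time_range` helper and the one-hour window size are hypothetical, and the sketch assumes the split is done purely by time.

```ruby
# Hypothetical helper: splits the range from..to into windows of at most
# chunk_seconds each, ordered newest first so that recent data can be
# fetched and displayed before older data is backfilled.
def chunk_time_range(from, to, chunk_seconds)
  raise ArgumentError, "chunk_seconds must be positive" unless chunk_seconds > 0

  chunks = []
  chunk_end = to
  while chunk_end > from
    # Clamp the last (oldest) window so it never starts before `from`.
    chunk_start = [chunk_end - chunk_seconds, from].max
    chunks << [chunk_start, chunk_end]
    chunk_end = chunk_start
  end
  chunks
end

# Usage: a three-hour search split into one-hour queries (assumed size).
from = Time.utc(2013, 1, 1, 0, 0, 0)
to   = Time.utc(2013, 1, 1, 3, 0, 0)
chunk_time_range(from, to, 3600).each do |chunk_start, chunk_end|
  puts "query #{chunk_start} .. #{chunk_end}"
end
```

Because the windows are generated newest first, the caller can issue the queries in list order and render each result as it arrives.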
It’s difficult to predict how this feature will be used in production, but it is always important to prepare for the worst. Stay tuned to see how our preparations pan out.