corevis - Node.js Core Dump Analysis

Posted by Dave Eddy on 19 Dec 2014

At Voxer, all of our production machines run on SmartOS, and as such, have access to really powerful tools to help with debugging and fixing problems.

One of the most helpful tools to use when debugging problematic node processes is mdb(1). With the v8 module for mdb(1), there is a ton of valuable information about node/JavaScript - including JavaScript objects, stack traces with filenames and numbers, etc. - that can be gathered from node processes. The engineers over at Joyent have been able to do a lot of cool things with it.

We've created a command-line tool to help with postmortem debugging of node.js processes. It uses mdb(1) under the hood to gather a slew of data, and then formats it into a single HTML page to be viewed in a web browser.

The code is released under the MIT license, and is hosted on GitHub:

https://github.com/Voxer/node-corevis

And an example of a troubled node process can be seen here:

http://us-east.manta.joyent.com/devops@voxer.com/public/corevis/example.html

BSD Now - Behind the Masq

Posted by Jen Canfield on 03 Nov 2014

A few weeks ago, Voxer's own Matt Ranney (Co-founder & former CTO) and George Kola (CTO) sat down with BSD Now to talk about Voxer's recent migration from SmartOS to FreeBSD. BSD Now is a weekly video podcast created by three technology enthusiasts who love BSD. Their podcast covers the latest news, provides extensive tutorials, and features interviews with people from all areas of the BSD community.

Do you have experience with FreeBSD? What were some of the challenges you ran into? Check out the video interview below and tell us what you think in the comments.

http://www.bsdnow.tv/

Introducing Zag

Posted by DJ Gardiner on 03 Mar 2014

The development and administration of complex systems is always a challenge. Keeping these systems running optimally can require monitoring thousands of metrics. This task is complicated when enhancements or fixes are applied. While log files are an essential tool for analyzing systems, they have their limitations. More flexible and efficient methods are required as the scale increases.

Voxer built Zag for this very purpose, and it has become an indispensable tool to help develop and maintain the Voxer system by collecting, aggregating, and visualizing metrics.

Zag's features include:

Dynamic zooming and panning of graphs
Analysis of historical or live data
Dashboards
Tags
Histograms, counters, and heat maps

Check out the project documentation, the source on GitHub, or skip to the installation instructions.

histogram

Recording a metric is as simple as agent.histogram("sample-histogram", 123) or agent.counter("sample-counter", 123).
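To make the difference between the two metric types concrete, here is a minimal in-memory sketch. `StubAgent` and its `summary` method are hypothetical stand-ins written for this illustration; they are not Zag's agent or its API, and only the `histogram(key, value)` and `counter(key, value)` call shapes come from the text above. The assumption is that a counter accumulates a running total, while a histogram keeps individual samples so that a distribution can be derived from them.

```javascript
'use strict';

// Hypothetical stand-in for a metrics agent; NOT Zag's implementation.
class StubAgent {
  constructor() {
    this.counters = new Map();   // key -> running total
    this.histograms = new Map(); // key -> array of raw samples
  }

  // A counter only needs the sum of everything recorded so far.
  counter(key, value) {
    this.counters.set(key, (this.counters.get(key) || 0) + value);
  }

  // A histogram keeps each sample so min, max, and median can be computed.
  histogram(key, value) {
    const samples = this.histograms.get(key) || [];
    samples.push(value);
    this.histograms.set(key, samples);
  }

  // Summarize the samples recorded for one histogram key.
  summary(key) {
    const sorted = [...(this.histograms.get(key) || [])].sort((a, b) => a - b);
    if (sorted.length === 0) return null;
    return {
      count: sorted.length,
      min: sorted[0],
      max: sorted[sorted.length - 1],
      median: sorted[Math.floor((sorted.length - 1) / 2)] // lower median
    };
  }
}

const agent = new StubAgent();
agent.counter('sample-counter', 123);
agent.counter('sample-counter', 2);
agent.histogram('sample-histogram', 200);
agent.histogram('sample-histogram', 123);
agent.histogram('sample-histogram', 100);

console.log(agent.counters.get('sample-counter')); // 125
console.log(agent.summary('sample-histogram'));
```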

heat map

Nagios HTML Email Templates

Posted by Dave Eddy on 24 Feb 2014

Generate HTML emails for Nagios service and host alerts

We've open sourced a script we created to turn your boring, plain-text Nagios emails into fancy HTML emails (with color!). You can check out the project on GitHub and take a look at the screenshots below.

https://github.com/Voxer/nagios-html-email

The code is written in Node.js, and the HTML templates are written using EJS. Installation is as easy as

[sudo] npm install -g nagios-html-email

and modifying a single Nagios config file.

Critical Service

Open Source Chef Cookbooks

Posted by Dave Eddy on 20 Feb 2014

Various cookbooks used and created at Voxer for Chef

We just open sourced some of the Chef cookbooks we have created and use here at Voxer. You can check them out on GitHub by clicking the link below.

https://github.com/Voxer/chef-cookbooks

Cookbooks

npm: Simple, no dependency, node.js package manager LWRP
pkgin: Simple, no dependency, SmartOS package manager LWRP
smf: Simple, single dependency, SMF LWRP

iSPDY Released

Posted on 03 Jan 2014

Voxer is always trying to lower the latency between our users. Our data shows that as we reduce latency, people send more messages. We've done a lot of optimizations on the backend to improve the performance of our technology stack: node.js, riak, redis, and SmartOS. The next set of improvements we have been working on is at the network protocol level: tuning stud and moving to the SPDY protocol, which forms the basis of HTTP 2.0.

Our mobile clients have always used HTTPS to talk to the backend, but this has caused a number of performance problems. The latency of mobile networks is unpredictable, and it fluctuates wildly. Even using persistent connections and HTTP pipelining, we used a pool of HTTPS connections to reduce client latency and to address issues like head-of-line blocking. Negotiating and maintaining a pool of TLS connections is complex and slow, and we haven't found a library that does this well on iOS.

The benefits of SPDY from a networking perspective are well documented. We have found that using SPDY for our application has made the application easier to understand, debug, and maintain. SPDY uses a single TCP socket for all requests, so error handling logic, queuing, retry, and background modes have all gotten simpler.

Twitter has released their CocoaSPDY library, which solves a similar problem to iSPDY. Had this library been available when we started using SPDY, we might have chosen to use it instead and extend it to our needs. At this point, iSPDY has been tailored for our specific use case at Voxer, and we will continue to maintain it going forward.

The most important feature in iSPDY that we are relying on heavily is Server Push streams. Older Voxer clients used WebSockets or traditional HTTP long-polling to get live updates, but both of these have various tradeoffs at the client, server, and networking levels. In our application, SPDY with server push is the ideal solution for live updates. A single long-lived TCP connection can be used to send multiple live media streams to a waiting client without waiting for a round trip or causing head-of-line blocking from a slow sender.

Here are our design goals for iSPDY:

low latency tuning options
low memory footprint
server push stream support
priority scheduling for outgoing data
trailing headers
ping support
transparent decompression of response using Content-Encoding header
background socket reclaim detection and support on iOS
VoIP mode support on iOS
optionally use IP instead of DNS name to connect to server and save DNS resolution time

Adding support for SPDY on our backend was relatively straightforward because we are using node.js and node-spdy. Fedor Indutny is the primary author of both iSPDY and node-spdy, so these two libraries work well together. We use stud for TLS termination, so node.js sees an unencrypted SPDY stream, which is supported by node-spdy. Older clients and those on other platforms will continue to send HTTPS, but node-spdy has HTTP fallback support. The server code handling these endpoints is almost entirely unchanged, which has made testing and integrating SPDY much easier.

You can download and play with iSPDY on GitHub.

Hardware-accelerated CRC-32 for Node.js

Posted by Anand Suresh on 02 Dec 2013

CRC and JavaScript

A Cyclic Redundancy Check or CRC is a common error detection scheme that catches accidental changes to data blocks. It is simple to implement even in hardware, provides fast results, and can be computed progressively, making it perfect for use with streaming data like network packets or disk blocks. However, it relies heavily on bit manipulation, a class of operations that has been sluggish in JavaScript.

The slowness comes from the way JavaScript represents numbers: as 64-bit floating-point values. A bitwise operation therefore has to cast the 64-bit number to a 32-bit integer, apply the operation, and then cast the result back to a 64-bit floating-point value. All of this happens under the hood, and while it usually doesn't affect performance much, the cost becomes noticeable when bit-manipulation operations fall along hot code paths, such as the CRC calculation.
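As an illustration of the kind of code being discussed, here is a minimal pure-JavaScript sketch of the common table-driven CRC-32 (the reflected polynomial 0xEDB88320 used by zlib and PNG). It is not the hardware-accelerated implementation this post is about; the names `CRC_TABLE` and `crc32` are chosen for this example. Every `^`, `&`, `>>>`, and `~` in the loop implicitly converts its operands to 32-bit integers, which is the conversion described above, and the optional second argument shows the progressive use over successive chunks.

```javascript
'use strict';

// Bitwise operators work on 32-bit integers, not on the underlying doubles.
console.log((2 ** 32 + 5) | 0); // 5: the high bits are discarded
console.log(0xEDB88320 | 0);    // -306674912: reinterpreted as signed
console.log(-1 >>> 0);          // 4294967295: reinterpreted as unsigned

// Build the 256-entry lookup table for the reflected polynomial 0xEDB88320.
const CRC_TABLE = new Int32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
  }
  CRC_TABLE[n] = c;
}

// Fold a chunk of bytes into a running CRC. Pass the result of the previous
// call as `previous` to continue a checksum across several chunks.
function crc32(bytes, previous = 0) {
  let crc = ~previous; // undo the final inversion applied by the last call
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xff] ^ (crc >>> 8);
  }
  return ~crc >>> 0; // invert, then reinterpret as an unsigned 32-bit value
}

// One-shot and progressive computation give the same result.
const whole = crc32(Buffer.from('123456789'));
const chunked = crc32(Buffer.from('6789'), crc32(Buffer.from('12345')));
console.log(whole.toString(16));   // cbf43926, the standard CRC-32 check value
console.log(whole === chunked);    // true
```

The inner loop runs once per input byte, so each byte costs several of these implicit conversions, which is why this path is a natural candidate for native or hardware-assisted code.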

Backpressure and Unbounded Concurrency in Node.js

Posted by Matt Ranney on 16 Sep 2013

Backpressure can be a hard problem in both distributed systems and in evented systems. Most busy node.js services will find themselves overwhelmed at some point due to lack of backpressure somewhere. Yes, I know that "backpressure" isn't a proper word. Neither is "performant", but it's a word we want to exist, so it does.

To do backpressure in node, you obviously need streams. Just listen to any talk or read any blog post from long-time node users, and you'll be convinced. Streams are here to help, and they are part of a balanced breakfast. The thing is, streams only fix a small part of this backpressure problem. In some ways, streams may make the problem worse. Certainly streams are better than "not streams", but there are some pretty serious problems here.
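For reference, the part of the problem that streams do handle looks roughly like this. It is a minimal sketch using Node's built-in `stream` and `events` modules; `makeSlowSink` and `writeAll` are hypothetical helpers written for this example, and the tiny `highWaterMark` is chosen only to make the effect easy to see. The producer checks the return value of `write()` and waits for the `'drain'` event before sending more.

```javascript
'use strict';
const { Writable } = require('stream');
const { once } = require('events');

// Hypothetical slow consumer: buffers at most 4 bytes before it signals
// the producer to stop, and finishes each write on a later tick.
function makeSlowSink(received) {
  return new Writable({
    highWaterMark: 4,
    write(chunk, encoding, callback) {
      received.push(chunk.toString());
      setImmediate(callback);
    }
  });
}

// Write every chunk, pausing whenever the sink reports that it is full.
// Resolves with the number of times the producer had to wait.
async function writeAll(sink, chunks) {
  let pauses = 0;
  for (const chunk of chunks) {
    if (!sink.write(chunk)) {
      pauses++;
      await once(sink, 'drain'); // backpressure: wait for the buffer to empty
    }
  }
  sink.end();
  await once(sink, 'finish');
  return pauses;
}

const received = [];
writeAll(makeSlowSink(received), ['aaaa', 'bbbb', 'cccc']).then((pauses) => {
  console.log('chunks delivered:', received.length, 'pauses:', pauses);
});
```

This only protects the buffer between one producer and one consumer. Nothing here limits how many such producers are started at once, which is where the unbounded concurrency in the title comes in.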

Check Riak

Posted by Dave Eddy on 19 Apr 2013

Author: Dave Eddy - Operations Engineer

At Voxer, we store our data in Riak, an open source, distributed database. As with any database running in production at scale, we've seen our share of issues. To be fair, we are really using Riak: hundreds of terabytes of data, billions of keys stored, and > 50 servers dedicated to Riak in production.

We have a small Operations team of 3 at Voxer, with no dedicated DBA on staff. As such, for every issue that we have encountered with Riak, we've scripted a check that detects it, so we can keep it from happening again. All of these checks are rolled into a script to give us a summary of Riak health.

This way, when we get woken up at 2am by a Nagios alert that Riak is down or unhappy, we can run this script for a quick summary of Riak health and step-by-step instructions to solve the issue.

check-riak

A script written and used by Voxer to check Riak health on SmartOS

We've open sourced the script that we use to assess Riak health. Check it out here:

https://github.com/Voxer/check-riak

Chef Part 2 - Performance

Posted by Dave Eddy on 22 Mar 2013

Author: Dave Eddy - Operations Engineer

If you haven't already, check out Part 1 of this series of blog posts to read about our migration from Hosted Chef to a private, self-hosted Chef instance.

Chef Part 1 - Migration

Then, read on to see how we made our Chef runs 16x faster.

Performance

We migrated all of our data from a Hosted Chef Server instance to a private one, almost transparently to all of our servers, and in the process saved a lot of money that we had been paying Opscode every month. So everything is good, right? Well, unfortunately, no.
