LosantPingPong: End-to-end Losant Connectivity Test

Losant is a simple and powerful IoT cloud platform. The platform provides an MQTT broker that sensor devices can connect to and report state in real time; the reported states are stored in the cloud, and can be visualized in several formats.

Recently, Losant sent me a Builder Kit, which contains an ESP8266 WiFi chip on an Adafruit Feather Huzzah, a TMP36 temperature sensor, and some other components. I followed the official instructions, and 1.5 hours later my room temperature showed up on a webpage.

I was excited, and decided to leave it running continuously so that I can monitor the activities of @TheTucsonHeat. However, the Losant platform shows the device as "offline" from time to time, and sometimes it never comes back online unless I press the hard-reset button. This is mostly because of the bad WiFi in my apartment: 1000ms delays, 10% or more packet loss, etc. Since MQTT runs over TCP, it's severely affected. To make things worse, our WiFi connects to the Internet through two levels of Network Address Translation (NAT) gateways; it seems that the ESP8266 sometimes fails to detect that its TCP connection to the Losant broker is broken, and thus takes no recovery action.

As a computer networking student, the first thing that comes to my mind is the end-to-end principle: the reliability of an application should be ensured by the end hosts, not by the network. To test whether my temperature sensor remains connected to Losant, I should send a message to the Losant platform, and let the platform send back a reply. And that's the idea of LosantPingPong: an end-to-end connectivity test between a connected device and the Losant platform.

LosantPingPong protocol

The LosantPingPong protocol is very simple:

  1. Periodically, the device sends a state report message {"act":"ping"} (a ping) to the Losant broker.
  2. A workflow at Losant replies with a device command "pong" (a pong), which is delivered to the device.
  3. If the device detects several lost pongs in a row, it concludes that the connection to the Losant broker is lost, and tries to reconnect.
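
The three steps reduce to a small state machine. Below is a minimal hardware-free sketch of it, so the logic can be checked with a desktop compiler before it goes onto the ESP8266. PingPongModel, tick, and onPong are hypothetical names used only for this illustration; the caller passes in the current time in milliseconds and acts on the returned value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the protocol; nothing here talks to Losant.
class PingPongModel
{
public:
  enum class Action { Idle, SendPing, Reconnect };

  PingPongModel(uint32_t pingInterval, int pongMissThreshold)
    : m_pingInterval(pingInterval)
    , m_pongMissThreshold(pongMissThreshold)
  {
    assert(pongMissThreshold > 0);
  }

  // Steps 1 and 3: call often; `now` is the current time in millis.
  // Unsigned subtraction keeps working when the clock wraps around.
  Action
  tick(uint32_t now)
  {
    if (now - m_lastPing < m_pingInterval) {
      return Action::Idle; // no ping is due yet
    }
    m_lastPing = now;

    if (m_hasPong) {
      m_nMissedPongs = 0; // only consecutive misses count
    }
    else if (++m_nMissedPongs >= m_pongMissThreshold) {
      m_nMissedPongs = 0;
      m_hasPong = true; // start fresh after the reconnect
      return Action::Reconnect;
    }

    m_hasPong = false; // now waiting for the pong to this ping
    return Action::SendPing;
  }

  // Step 2: call when the "pong" command arrives.
  void
  onPong()
  {
    m_hasPong = true;
  }

private:
  const uint32_t m_pingInterval;
  const int m_pongMissThreshold;
  uint32_t m_lastPing = 0;
  bool m_hasPong = true;
  int m_nMissedPongs = 0;
};
```

With an interval of 10 and a threshold of 3, a device whose last answered ping went out at time 10 still sends pings at 20, 30, and 40, and is told to reconnect at 50.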

Implementation - Losant platform

The Losant platform accepts an attribute in a state report only if the attribute is registered. Therefore, add an act attribute of String type to the device.

I could have used a Boolean type like the button attribute in the official instructions, but I used a String so that all kinds of actions (including the button) can share this one attribute.

And then, we need a workflow that looks like this:

  • Device node: select the Builder Kit.
  • Conditional node: enter expression {{data.act}} == 'ping'. (caution: use single quotes; Losant hates double quotes and gives an error if I enter "ping")
  • Device Command node: send command to the Builder Kit, and enter command name pong.

Implementation - Arduino ESP8266

I've separated LosantPingPong into a class, so that it can be reused across different projects. It's a good habit to write Doxygen for a reusable class~

#ifndef LOSANT_PINGPONG_HPP
#define LOSANT_PINGPONG_HPP

#include <Losant.h>

/**
 * \brief end-to-end Losant connectivity test
 *
 * 1. Losant device: create attribute `act` of string type
 * 2. Losant workflow: when device reports `{{data.act}} == 'ping'`, send a device command 'pong'
 * 3. sketch globals: declare LosantPingPong instance
 * 4. sketch loop(): invoke LosantPingPong::loop()
 * 5. sketch Losant command handler: invoke handlePong when command name is "pong"
 */
class LosantPingPong
{
public:
  /**
   * \param pingInterval interval between pings, in millis
   * \param pongMissThreshold how many missed pongs causes device.disconnect
   */
  explicit
  LosantPingPong(LosantDevice& device, int pingInterval = 10000, int pongMissThreshold = 6);

  /**
   * \brief send ping request to Losant, and disconnect if too many missed pongs
   *
   * Every ping should be answered with a pong. If not, it's counted as a missed pong.
   * When the number of consecutive missed pongs reaches the threshold, the LosantDevice
   * is disconnected. Other code is responsible for reconnecting.
   */
  void
  loop();

  /**
   * \brief record that a pong has arrived for the most recent ping
   */
  void
  handlePong(LosantCommand* cmd);

private:
  LosantDevice& m_device;
  const int m_pingInterval;
  const int m_pongMissThreshold;
  unsigned long m_lastPing;
  bool m_hasPong;
  int m_nMissedPongs;
};

#endif // LOSANT_PINGPONG_HPP

And here's the implementation of this class. The LosantPingPong::loop function checks whether a ping is due. If the device is not connected at that point, the function just resets its state and leaves reconnecting to the sketch. Otherwise, it checks whether a pong has been received for the previous ping. If the pong is missed, we increment the m_nMissedPongs counter, and disconnect the MQTT client once this counter reaches the threshold; if the pong has been received, we reset m_nMissedPongs to zero, because it counts missed pongs in a row, not total missed pongs. Finally, we send the ping as a state report to the Losant broker.

#include "LosantPingPong.hpp"

LosantPingPong::LosantPingPong(LosantDevice& device, int pingInterval, int pongMissThreshold)
  : m_device(device)
  , m_pingInterval(pingInterval)
  , m_pongMissThreshold(pongMissThreshold)
  , m_lastPing(millis())
  , m_hasPong(true)
  , m_nMissedPongs(0)
{
}

void
LosantPingPong::loop()
{
  if (millis() - m_lastPing < m_pingInterval) {
    return;
  }

  if (!m_device.connected()) {
    m_lastPing = millis();
    m_hasPong = true;
    m_nMissedPongs = 0;
    return;
  }

  if (m_hasPong) {
    m_nMissedPongs = 0;
  }
  else {
    ++m_nMissedPongs;
    if (m_nMissedPongs >= m_pongMissThreshold) {
      m_nMissedPongs = 0;
      m_device.disconnect();
      return;
    }
  }

  StaticJsonBuffer<200> jsonBuffer;
  JsonObject& root = jsonBuffer.createObject();
  root["act"] = "ping";
  m_device.sendState(root);

  m_lastPing = millis();
  m_hasPong = false;
}

void
LosantPingPong::handlePong(LosantCommand* cmd)
{
  m_hasPong = true;
}

Finally, we integrate the LosantPingPong class into the sketch. You may have noticed that the class only disconnects the MQTT client, and makes no attempt to reconnect it. The sketch is responsible for reconnecting when it detects that the MQTT client is disconnected (!device.connected()).

#include <Losant.h>
#include "LosantPingPong.hpp"
// other #includes

// WiFi credentials, etc

LosantDevice device(LOSANT_DEVICE_ID);
LosantPingPong losantPingPong(device);

void
handleCommand(LosantCommand* command)
{
  if (strcmp(command->name, "pong") == 0) {
    losantPingPong.handlePong(command);
  }
  // handle other commands
}

void
setup()
{
  // other setup steps

  device.onCommand(&handleCommand);
}

void
loop()
{
  if (WiFi.status() != WL_CONNECTED || !device.connected()) {
    // reconnect
  }

  device.loop();
  losantPingPong.loop();

  // other logic
}

Next steps

This feature has been running fine on my Losant Builder Kit. The bad WiFi in my apartment still disconnects the ESP8266 from time to time, but it has always been able to reconnect whenever possible.

The protocol and implementation can be improved in several aspects:

  • Both ping and pong should carry a sequence number, so that the device can ensure the pong is in reply to its most recent ping, instead of a duplicate message from sometime earlier.
  • Interval between pings should be randomized around the configured value, to avoid network congestion caused by synchronization.
  • The ping can be piggybacked onto another state report (such as the temperature report) to save bandwidth.
  • Likewise, the pong can also be piggybacked onto another device command, although this may be harder to implement.
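
The first two ideas can be sketched without any hardware. The following is only an illustration under stated assumptions: PingSequence and jitteredInterval are hypothetical names, and the code assumes the workflow copies the sequence number from the ping into the pong's command payload, which is not shown here. It uses the C++ standard library for randomness so it can be tried on a desktop; on the device, Arduino's random() could play the same role.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: tags each ping with a sequence number and accepts
// only the pong that echoes the most recent outstanding one.
class PingSequence
{
public:
  // Returns the sequence number to put into the next ping.
  uint32_t
  nextPing()
  {
    m_awaiting = true;
    return ++m_lastSeq;
  }

  // Returns true only if `seq` answers the most recent outstanding ping.
  bool
  acceptPong(uint32_t seq)
  {
    if (!m_awaiting || seq != m_lastSeq) {
      return false; // stale or duplicate pong
    }
    m_awaiting = false;
    return true;
  }

private:
  uint32_t m_lastSeq = 0;
  bool m_awaiting = false;
};

// Hypothetical helper: picks the next ping interval uniformly from
// [base - spread, base + spread], where spread is spreadPercent% of base.
template<typename Rng>
uint32_t
jitteredInterval(uint32_t base, uint32_t spreadPercent, Rng& rng)
{
  assert(spreadPercent <= 100);    // keeps base - spread from underflowing
  assert(base <= UINT32_MAX / 2);  // keeps base + spread from overflowing
  uint32_t spread =
    static_cast<uint32_t>(static_cast<uint64_t>(base) * spreadPercent / 100);
  std::uniform_int_distribution<uint32_t> dist(base - spread, base + spread);
  return dist(rng);
}
```

For example, with a std::mt19937 generator, jitteredInterval(10000, 20, rng) yields an interval between 8000 and 12000 milliseconds, and a pong carrying an older sequence number is rejected by acceptPong.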

Also, pings from a device, along with Disconnect events from the MQTT broker, can be recorded and used to visualize device uptime. I'll cover this idea in a future post.

UPDATE 2016-05-16: Brandon points out that MQTT has its own ping packet, which the Losant Arduino SDK sends every 15 seconds, and that the MQTT client disconnects if the ping times out. Therefore, LosantPingPong is unnecessary.