When I moved to the US a decade ago, Home Improvement was a popular television sitcom starring Tim Allen as Tim "The Tool Man" Taylor, the host of a made-up home improvement show.
On the support team, whenever we learn about a new debugging tool, we joke about 'our network debugging tool belt'. Over the years, this tool belt has grown to hold a nice variety of debugging tools, especially when you include platform-specific ones. We currently support a wide variety of platforms: from enterprise platforms (Linux, Solaris, Windows) to embedded platforms (including VxWorks, LynxOS and Integrity).
Below is a list of the tools we use most often on support. I am leaving out platform-specific tools, tools that monitor how a particular network stack processes network traffic (e.g., netstat), and general operating system debugging tools (vmstat, cpustat, lsof, gdb, strace, etc.).
One of the most common questions on support is: why are my two DDS applications not communicating? The answer usually falls into one of two categories: either a general networking or connectivity problem, or a configuration problem. Let's take a look at the tools we use to debug each.
Tools to verify connectivity
1. ping verifies whether you have basic connectivity between two hosts. Less commonly known is that ping can also generate multicast traffic. This is especially useful when your DDS discovery uses multicast and you suspect a multicast or time-to-live (TTL) problem.
For example, you can ping 224.0.0.1, the all-hosts group. Every multicast-capable host must join that group at start-up on all of its multicast-capable interfaces, so all multicast-capable hosts on the network should answer. There is one gotcha: on Linux kernel 2.6, replies to broadcast/multicast ping messages are disabled by default, whereas on Linux kernel 2.4 they are enabled by default. You can enable them by setting icmp_echo_ignore_broadcasts to 0 (reference: http://kerneltrap.org/node/16225).
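To check that gotcha on a particular Linux host, you can read the setting straight from /proc. The Python sketch below assumes a Linux system that exposes it as /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts; the function names are hypothetical and only meant to illustrate how to interpret the value.

```python
from pathlib import Path

# On Linux the sysctl is exposed as a file: "1" means broadcast/multicast
# echo requests are ignored, "0" means the host will answer them.
SYSCTL_PATH = Path("/proc/sys/net/ipv4/icmp_echo_ignore_broadcasts")


def answers_multicast_ping(raw_value: str) -> bool:
    """Interpret the sysctl contents: '0' means the host replies."""
    return raw_value.strip() == "0"


def check_host(path: Path = SYSCTL_PATH) -> str:
    """Describe whether this host will answer a ping to the all-hosts group."""
    if not path.exists():
        return "sysctl not found (not Linux, or /proc not mounted)"
    if answers_multicast_ping(path.read_text()):
        return "this host answers multicast pings (e.g. to 224.0.0.1)"
    return ("this host ignores multicast pings; as root, run: "
            "sysctl -w net.ipv4.icmp_echo_ignore_broadcasts=0")
```

Calling print(check_host()) on each host you expect to answer tells you quickly whether a missing reply is a network problem or just this kernel default.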
If you want to experiment with the time-to-live value, many ping implementations let you configure it via -t or -T.
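The TTL that ping lets you set is the same knob any multicast sender controls on its socket: with a TTL of 1, multicast stays on the local subnet, and routers only forward it further if the TTL is higher. As a rough illustration (this is not how RTI Data Distribution Service itself is configured), the Python sketch below creates a UDP socket whose multicast datagrams carry a chosen TTL. It is written with Linux and other Unix-like hosts in mind, and the function name is hypothetical.

```python
import socket
import struct


def make_multicast_sender(ttl: int) -> socket.socket:
    """Create a UDP socket whose outgoing multicast datagrams carry `ttl`."""
    if not 0 <= ttl <= 255:
        raise ValueError("TTL must fit in one byte (0-255)")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # TTL 1 keeps multicast on the local subnet; larger values let
    # multicast-enabled routers forward the datagram further.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("B", ttl))
    return sock
```

For example, make_multicast_sender(4).sendto(b"probe", ("239.255.0.1", 5007)) would send one datagram with TTL 4 to a test group and port of your choosing (both values here are just placeholders).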
3. rtiddsping is a command line utility found in the scripts directory of your RTI Data Distribution Service installation. It does essentially the same thing as ping, but it uses DDS instead of the ICMP protocol. If you were able to ping the hosts and wonder whether the issue lies in how your DDS application is configured, rtiddsping is a neutral application for verifying whether the DDS ping/pong topic works. Take a look at its flags for modifying the peer list, TTL, multicast and reliability QoS parameters.
Tools to analyze the data on the wire
4. One of my favorite network tools is Wireshark. Wireshark, previously called Ethereal, is a free packet sniffer that lets you inspect the data on the wire. It decodes packets using various protocol dissectors. The DDS wire protocol is commonly known as RTPS, the Real-Time Publish-Subscribe wire protocol. The RTPS dissector in earlier versions of Wireshark supported an early, non-standard version of the protocol; RTPS2 is the dissector that supports the OMG standard wire protocol. RTI contributed the source of this dissector to the Wireshark community, and we also offer it as part of our RTI Protocol Analyzer product. If you are using RTI Data Distribution Service 4.2 or later, use the RTPS2 dissector.
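To make the difference between the two protocol versions concrete: every RTPS message starts with a fixed 20-byte header, the four ASCII characters RTPS followed by a protocol version, a vendor ID and a GUID prefix. The Python sketch below, based on the header layout in the OMG RTPS 2.x specification, reads that header from a UDP payload you already have in hand. The "suggested dissector" rule is a simplification for illustration, and the names are hypothetical.

```python
import struct
from typing import NamedTuple

RTPS_HEADER_LEN = 20  # "RTPS" + version (2) + vendor ID (2) + GUID prefix (12)


class RtpsHeader(NamedTuple):
    major: int
    minor: int
    vendor_id: int
    guid_prefix: bytes


def parse_rtps_header(payload: bytes) -> RtpsHeader:
    """Parse the fixed header at the start of an RTPS message (a UDP payload)."""
    if len(payload) < RTPS_HEADER_LEN or payload[:4] != b"RTPS":
        raise ValueError("not an RTPS message")
    major, minor, vendor_id = struct.unpack_from(">BBH", payload, 4)
    return RtpsHeader(major, minor, vendor_id, payload[8:RTPS_HEADER_LEN])


def suggested_dissector(header: RtpsHeader) -> str:
    # Simplification: 2.x is the OMG-standard protocol decoded by RTPS2;
    # anything older is the pre-standard protocol.
    return "RTPS2" if header.major >= 2 else "RTPS"
```

The vendor ID also tells you which implementation sent the message; 0x0101 is the ID assigned to RTI.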
We use RTI Protocol Analyzer to debug discovery misconfigurations. You can observe which IP addresses are used in discovery announcements or which ports a particular node is contacting. You can also see whether the reliability protocol needs to repair samples, and how many. You can filter on Heartbeats as well as the corresponding ACK/NACK packets. Lastly, you can save a capture to a file and share it with the RTI support team, so that we can help analyze why something isn't working as expected.
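Filtering on Heartbeats and ACK/NACKs is possible because, after the header, an RTPS message is a sequence of submessages, each starting with a one-byte ID, a byte of flags and a two-byte length. The Python sketch below walks that sequence and counts submessage kinds. The ID values come from the OMG RTPS 2.x specification, only a subset of them is named, and the function name is hypothetical.

```python
import struct
from collections import Counter

RTPS_HEADER_LEN = 20
PAD, INFO_TS = 0x01, 0x09
# A subset of submessage IDs from the OMG RTPS 2.x specification.
SUBMESSAGE_NAMES = {
    PAD: "PAD",
    0x06: "ACKNACK",
    0x07: "HEARTBEAT",
    0x08: "GAP",
    INFO_TS: "INFO_TS",
    0x0E: "INFO_DST",
    0x15: "DATA",
    0x16: "DATA_FRAG",
}


def count_submessages(message: bytes) -> Counter:
    """Count the submessage kinds in one RTPS message."""
    if len(message) < RTPS_HEADER_LEN or message[:4] != b"RTPS":
        raise ValueError("not an RTPS message")
    counts = Counter()
    offset = RTPS_HEADER_LEN
    while offset + 4 <= len(message):
        sub_id, flags = message[offset], message[offset + 1]
        # Flag bit 0 (the E flag) set means this submessage is little-endian.
        fmt = "<H" if flags & 0x01 else ">H"
        (octets_to_next,) = struct.unpack_from(fmt, message, offset + 2)
        counts[SUBMESSAGE_NAMES.get(sub_id, f"0x{sub_id:02x}")] += 1
        if octets_to_next == 0 and sub_id not in (PAD, INFO_TS):
            break  # zero length here means "extends to the end of the message"
        offset += 4 + octets_to_next
    return counts
```

Summing these counters over many captured messages gives a rough picture of how much reliability traffic (HEARTBEAT, ACKNACK) accompanies the DATA.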
Make sure you monitor the data on both the send and the receive side: RTI Protocol Analyzer can only capture what is on its own wire. If you cannot install RTI Protocol Analyzer on the publisher or subscriber host, hooking up a simple hub is a good option. A switch won't do the job unless you enable port mirroring. (Doh, where did I leave my old hub?)
5. snoop is a packet sniffer for Solaris. Unfortunately, it does not dissect the RTPS protocol.
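Although snoop cannot decode RTPS itself, you can save a capture with snoop -o <file> and open it in Wireshark, which reads the snoop file format. If you would rather script a quick check, the Python sketch below reads the snoop format described in RFC 1761 and extracts UDP payloads that start with the RTPS magic bytes. It assumes an Ethernet capture of plain (untagged) IPv4 frames, and the helper names are hypothetical.

```python
import struct
from typing import Iterator, Optional, Tuple

SNOOP_MAGIC = b"snoop\x00\x00\x00"
DL_ETHERNET = 4  # RFC 1761 datalink type code for Ethernet


def snoop_records(blob: bytes) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, frame) pairs from an RFC 1761 snoop capture."""
    if blob[:8] != SNOOP_MAGIC:
        raise ValueError("not a snoop capture file")
    _version, datalink = struct.unpack_from(">II", blob, 8)
    if datalink != DL_ETHERNET:
        raise ValueError(f"unsupported datalink type {datalink}")
    offset = 16
    while offset + 24 <= len(blob):
        (_orig_len, incl_len, rec_len, _drops,
         secs, usecs) = struct.unpack_from(">6I", blob, offset)
        if rec_len < 24:
            break  # malformed record; stop rather than loop forever
        yield secs + usecs / 1e6, blob[offset + 24:offset + 24 + incl_len]
        offset += rec_len  # record length covers header, data and padding


def udp_payload(frame: bytes) -> Optional[bytes]:
    """Return the UDP payload of an untagged Ethernet/IPv4 frame, or None."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":
        return None  # too short, or not IPv4
    ihl = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
    if frame[23] != 17:           # IPv4 protocol number 17 is UDP
        return None
    udp_start = 14 + ihl
    if len(frame) < udp_start + 8:
        return None
    (udp_len,) = struct.unpack_from(">H", frame, udp_start + 4)
    # Use the UDP length field so any Ethernet padding is dropped.
    return frame[udp_start + 8:udp_start + udp_len]


def rtps_messages(blob: bytes) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, payload) for each UDP payload that looks like RTPS."""
    for ts, frame in snoop_records(blob):
        payload = udp_payload(frame)
        if payload is not None and payload[:4] == b"RTPS":
            yield ts, payload
```

Typical use would be to read the whole capture file in binary mode and iterate over rtps_messages(data) to see when RTPS traffic arrived and from which payloads.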
6. rtiddsspy is a command line utility, found in the scripts directory of your RTI Data Distribution Service installation, that automatically subscribes to publications in a specific DDS domain. You can observe both the discovery data and the user data. Take a look at its command line options for filtering out particular topics.
7. RTI Recorder is a more complete data recorder. It can be configured to record specific topics and fields to a database, which can later be queried using SQL or converted into XML, HTML or Excel. Since we often do not have access to your data source, it is best used in your test lab; you can then share the database file or converted file with us. The nice thing about Recorder is that it records a lot more than the user data: it also records meta-information such as timestamps.
Tools to analyze the state or configuration of the system
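Before sharing a recording, it can help to get a quick overview of what it contains. This Python sketch assumes the recording is stored as an SQLite database file; it lists each table and its row count without relying on any particular table layout, and the function name is hypothetical.

```python
import sqlite3


def summarize_tables(conn: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every table in an SQLite database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    summary = {}
    for name in names:
        # Table names come from the file itself, so escape any double quotes
        # before using the name as an SQL identifier.
        quoted = '"' + name.replace('"', '""') + '"'
        summary[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return summary
```

To use it, open the recording with sqlite3.connect(path), print summarize_tables(conn), and close the connection when you are done.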
8. We use RTI Analyzer when we suspect a configuration issue, e.g., a quality of service (QoS) mismatch. Another good use case is a typo in a topic name: DDS will consider it a different topic. Analyzer quickly shows you all the topics in your system, as well as the QoS parameters used by the different DDS entities (readers, writers, publishers, subscribers). The match analysis shows on one screen which readers and writers are compatible.
One common misunderstanding is that Analyzer can show you all quality of service parameters of a specific entity. It cannot: for example, resource limits and batching configuration are not shown. Those are local configuration parameters that are not sent out during discovery, and if a parameter is not sent out during discovery, RTI Analyzer does not know about it.
If you want to verify whether you configured batching correctly, I recommend using RTI Protocol Analyzer, as it distinguishes between regular data and batched data samples. It will even show the individual samples within a batch.
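The compatibility check behind a match analysis follows the DDS "requested vs offered" rule: a writer matches a reader only if it offers at least what the reader requests. The Python sketch below is a simplified illustration covering three policies (reliability, durability, deadline) plus the topic name; real matching also checks the type and more policies, and all names here are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Reliability(IntEnum):
    BEST_EFFORT = 0
    RELIABLE = 1


class Durability(IntEnum):
    # Ordered from weakest to strongest, as in the DDS specification.
    VOLATILE = 0
    TRANSIENT_LOCAL = 1
    TRANSIENT = 2
    PERSISTENT = 3


@dataclass
class Endpoint:
    topic_name: str
    reliability: Reliability
    durability: Durability
    deadline_sec: float = float("inf")  # an infinite deadline by default


def match_problems(writer: Endpoint, reader: Endpoint) -> List[str]:
    """Reasons a writer and reader would not match; empty means compatible."""
    if writer.topic_name != reader.topic_name:
        # A typo produces a different topic, so the two are never compared.
        return [f"different topics: {writer.topic_name!r} vs {reader.topic_name!r}"]
    problems = []
    # Requested vs offered: the writer must offer at least what the reader requests.
    if writer.reliability < reader.reliability:
        problems.append(f"reliability: reader requests {reader.reliability.name}, "
                        f"writer offers {writer.reliability.name}")
    if writer.durability < reader.durability:
        problems.append(f"durability: reader requests {reader.durability.name}, "
                        f"writer offers {writer.durability.name}")
    # For deadline the comparison flips: a shorter offered period is stronger.
    if writer.deadline_sec > reader.deadline_sec:
        problems.append(f"deadline: writer offers {writer.deadline_sec}s, "
                        f"reader requests {reader.deadline_sec}s")
    return problems
```

For example, a BEST_EFFORT, VOLATILE writer and a RELIABLE, TRANSIENT_LOCAL reader on the same topic yield two problems, while swapping their roles yields none.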