Recently we had some issues with multi-DC Cassandra clusters: increased latency and timeouts that only started after we created the second datacenter.
The development team had been tasked with ensuring they were using a DC-aware load balancing policy and local consistency levels (such as LOCAL_QUORUM), so that they only talked to the local DC and never to the second DC, which was hosted with a different cloud provider in a different country.
Using OpsCenter I could see there were some reads going directly to the new datacenter, but the development team could not find where they were coming from. Time to use ngrep — which describes itself as “like GNU grep applied to the network layer.”
Installing ngrep
apt install ngrep
Capturing Connections on Port 9042
ngrep " " -d any -x dst port 9042 and dst host xxx.xxx.xxx.xxx
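" ": match any packet whose payload contains a space, which in practice covers every CQL statement (effectively no string filter)
-d any: listen on all network devices
-x: dump packet contents as hexadecimal as well as ASCII
dst port 9042: limit to packets with destination port 9042 (the CQL native transport port)
dst host xxx.xxx.xxx.xxx: limit to packets arriving at this node’s IP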
This displays every matching packet arriving at this node on port 9042. Each packet starts with a header line like:
T zzz.zzz.zzz.zzz:42768 -> xxx.xxx.xxx.xxx:9042 [AP]
Here zzz.zzz.zzz.zzz is the host sending the CQL request, T marks a TCP packet, and [AP] shows the TCP flags (ACK and PSH). The packet dump under each header includes the CQL being sent, which gives you a good idea of where the requests are coming from.
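When many clients are connecting, it can help to tally the header lines by source address rather than reading them one by one. The following is a minimal sketch, assuming the header format shown above (protocol letter, source ip:port, an arrow, destination ip:port). The function name summarize_sources is hypothetical and not part of ngrep.

```shell
# Hypothetical helper: count ngrep header lines per source IP.
# Assumes headers look like: T 10.0.0.5:42768 -> 10.0.0.1:9042 [AP]
summarize_sources() {
  awk '
    # Header lines start with the protocol letter and have "->" as field 3;
    # the hex dump lines produced by -x do not match and are skipped.
    $1 == "T" && $3 == "->" {
      src = $2
      sub(/:[0-9]+$/, "", src)   # strip the trailing :port
      count[src]++
    }
    END { for (ip in count) print count[ip], ip }
  ' | sort -rn                   # busiest source first
}
```

Because the counts are only printed once the input ends, one way to use it is to save a capture to a file, stop ngrep, and then run summarize_sources < capture.txt (capture.txt being whatever file name you chose).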
Filtering Out OpsCenter
If you are using DataStax OpsCenter, the DataStax agent uses CQL to talk to Cassandra on the local node, and OpsCenter itself will also connect to the cluster. To remove these packets from the output:
ngrep " " -d any -x dst port 9042 \
and dst host xxx.xxx.xxx.xxx \
and not src host xxx.xxx.xxx.xxx \
and not src host yyy.yyy.yyy.yyy
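Here yyy.yyy.yyy.yyy is the IP address of the OpsCenter server. The first not src host clause excludes the local node itself, which is where the DataStax agent connects from.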
This gives you a clean view of which application clients are actually connecting to each node, making it straightforward to identify which services are ignoring your DC-aware load balancing policy and connecting to the wrong datacenter.
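If more than a couple of hosts need excluding (for example several monitoring machines), the filter expression can be assembled by a small function instead of typed by hand. This is only a sketch: build_filter is a hypothetical name, and the 192.0.2.x addresses in the usage line are documentation placeholders, not real hosts.

```shell
# Hypothetical helper: print a capture filter for CQL traffic arriving at one
# node, excluding any number of known source hosts (the node itself,
# OpsCenter, and so on).
# Usage: build_filter NODE_IP [EXCLUDED_IP ...]
build_filter() {
  node_ip=$1
  shift
  filter="dst port 9042 and dst host $node_ip"
  # Append one exclusion clause per remaining argument.
  for ip in "$@"; do
    filter="$filter and not src host $ip"
  done
  printf '%s\n' "$filter"
}

# Example (not run here): the substitution is left unquoted on purpose so the
# filter is split into separate words, as in the commands above.
# ngrep " " -d any -x $(build_filter 192.0.2.10 192.0.2.10 192.0.2.20)
```

Passing the node's own IP as the first excluded host reproduces the OpsCenter filter shown earlier.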