Brute Force vRealize Network Insight (vRNI) - Part 2: Install a vCenter Data Source
(Recap from Part 1) Why dig into vRNI? Well, two reasons, actually. One, I have always thought vRNI was really cool and valuable, and two, I signed up to teach one of the VMware Customer Connect webinars on vRNI Cloud. I know enough about vRNI to completely embarrass myself in front of complete strangers, so it was time to dig in.
According to the VMware vRNI Cloud User's Guide: "vRealize Network Insight Cloud delivers intelligent operations for software-defined networking and security."
Micro-segmentation is definitely vRNI's advertised use, but I think vRNI is also really helpful for the kind of Application Dependency Mapping (ADM) required for data center/cloud migration and D/R planning. It is important to know which hosts/VMs are talking to which other hosts/VMs before moving any of them.
(Please read Part 1 about gaining access to vRNI Cloud and adding an AWS Data Source. We will focus on adding a vCenter source in this post.)
Connect vRNI to vCenter
Connecting vRNI to vCenter was much easier than connecting vRNI to AWS. The vCenter flows come from vSphere Distributed Switches, which I do not have, so I hacked that part.
Add an On-Premises vCenter Collector
Select Infrastructure and Support from the Settings section in vRNI Cloud.
Select the "Add Collector VM" button.
If you haven't already downloaded the vRNI Collector VM OVA from my.vmware.com, select the DOWNLOAD button to get a copy. I won't go through the steps to install an OVA. You have probably done that before. If not, look here. One of the OVA install screens will ask for the unique shared secret that vRNI Cloud is displaying. Copy the secret from the vRNI Cloud screen and paste it into the OVA install screen.
When you boot your new vCenter Collector VM for the first time, you will have a little setup to do at login. Log in as the user "consoleuser" with the password "console". Type "setup" at the command prompt and then press [ENTER]. Fill in the password, networking and NTP answers. When you have answered all the questions successfully, your new Collector VM will register with vRNI Cloud and the vRNI Cloud screen will update with a confirmation. You can then select "CLOSE" on the vRNI Cloud screen.
Your new Collector VM should now be listed, and after a few minutes it should show a green status.
Add vCenter as a Data Source
Select Accounts and Data Sources from the Settings section in vRNI Cloud.
From the next screen, select ADD SOURCE.
Choose your new Collector VM from the drop-down list. Fill in the details of your vCenter server. If you have vSphere Distributed Switches (I do not), select the checkbox for "Enable NetFlow (IPFIX) on this vCenter" and select your vDS from the list. Give your vCenter Data Source a name and select SUBMIT.
You will now get vCenter inventory in vRNI Cloud. If you have a vDS, you will also get NetFlow data, which we need for Application Dependency Mapping (ADM) and micro-segmentation.
Hack: Collecting VM Flow Data Without vSphere Distributed Switches
So, to analyze flows, we need to collect flows. What if you don't use vSphere Distributed Switches, which are what provide NetFlow support on the vCenter side? Well, there is an open source NetFlow forwarder called softflowd that you can use.
Install the vRNI NetFlow Collector
This is pretty easy. Follow the same steps you did to install the vCenter Collector, but instead choose "Physical Flow Collector" as a Data Source. You even use the same source OVA file to create the new NetFlow Collector VM.
Install a Linux NetFlow Forwarder
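I followed these steps on one of my running Debian/Ubuntu Linux VMs to install and run the softflowd NetFlow forwarder. In the last command, 192.168.1.54:2055 is the address and port of my NetFlow Collector VM and ens192 is the network interface to monitor; substitute your own values: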
- $ sudo apt update
- $ sudo apt install softflowd
- $ sudo nohup softflowd -n 192.168.1.54:2055 -i ens192 -d
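If you run softflowd on more than one VM, it may help to build the command from a few settings rather than retyping it. The following is only a minimal sketch: the function name softflowd_cmd and its argument order are my own illustration, not part of softflowd. As I read the softflowd man page, -n sets the export target, -i the interface to watch, and -d keeps the process in the foreground rather than daemonising. The function only prints the command so you can review it before running it with sudo.

```shell
#!/bin/sh
# Sketch: assemble the softflowd command line from three settings.
# softflowd_cmd is a hypothetical helper name, not something softflowd ships.

# Print the softflowd command for a collector address, port and interface.
# Returns non-zero if the port is not a number in the range 1-65535.
softflowd_cmd() {
    collector_ip=$1
    collector_port=$2
    iface=$3

    # Reject empty or non-numeric ports before comparing them as integers.
    case $collector_port in
        ''|*[!0-9]*)
            echo "port must be numeric: $collector_port" >&2
            return 1
            ;;
    esac
    if [ "$collector_port" -lt 1 ] || [ "$collector_port" -gt 65535 ]; then
        echo "port out of range: $collector_port" >&2
        return 1
    fi

    # -n: where to export flows, -i: interface to watch,
    # -d: stay in the foreground instead of daemonising
    echo "softflowd -n ${collector_ip}:${collector_port} -i ${iface} -d"
}

# Example with the values from this post: print the command for review.
softflowd_cmd 192.168.1.54 2055 ens192
```

Run as shown, this prints the same softflowd invocation used in the steps above (without sudo and nohup, which you add when you actually launch it).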
If you have any physical switches on your network that support NetFlow, you can configure them to send flows to the address of your NetFlow collector VM and port 2055.
A Word on Firewalls
So, I was having trouble getting the flows from my Linux VM to my NetFlow Collector VM. What finally worked for me was to turn off the Linux firewall on both VMs and reboot both VMs. These are the steps I used:
- $ sudo systemctl stop ufw
- $ sudo systemctl disable ufw
- $ sudo shutdown -r now
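Turning the firewall off completely is fine in a home lab, but a narrower option would be to leave ufw running and open only the NetFlow port on the receiving side. I have not tested this on the Collector VM, so treat the following as a sketch under that assumption. The helper names run and allow_netflow and the DRY_RUN switch are made up for illustration; the ufw subcommands themselves (allow, status verbose) are standard. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch: open only the NetFlow port instead of disabling ufw entirely.
# DRY_RUN, run and allow_netflow are illustrative names, not from any tool.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = actually run them

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Allow inbound UDP on the given port (default 2055), then show the rules.
allow_netflow() {
    port=${1:-2055}
    run sudo ufw allow "${port}/udp"
    run sudo ufw status verbose
}

# Example: show the commands for the port used in this post.
allow_netflow 2055
```

To apply the rule for real, set DRY_RUN=0 before calling allow_netflow. If flows still do not arrive, falling back to the stop/disable steps above is the approach that worked for me.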
Take a look at some VM flows in vRNI
Once the flows are being successfully forwarded to the NetFlow collector, your Collector VM should show green in Infrastructure and Support and there should be a number in the corresponding "Flows" column in Accounts and Data Sources.
Let's start simple by looking at the vCenter environment discovered by vRNI Cloud.
vRNI > Environments > VMware vCenter
Let's sort the VMs by Total Network Traffic, descending, to find our top talkers.
Sort > Metrics > Total Network Traffic and then select ASC to switch from ascending to descending
Here are the top talkers over the last 24 hours in my Home Lab including the total network traffic through each VM.
In my case, ubuntu-nuc hosts my very busy TIG (Telegraf, InfluxDB, Grafana) stack and is collecting data from all virtual and physical hosts and even Plex usage. This is my workhorse VM.
ubuntu-zm runs my Zoneminder IP camera system and mostly just sends alerts to my phone when someone drives up to the house 😊
The Easy "Top Talker" Method
There is a built-in "Top Talker" display in the Analytics sidebar. Choose Analytics > Flow Insights.
By default, we get an analysis of the busiest "pair" of VMs - in this case, my two AWS VMs that copy an entire website between them once a minute.
If we change the Source to be "VM", we see our old friend ubuntu-zm.
Applications can be auto-discovered by vRNI or defined manually by tier. I created the simplest type of tiered application definition for my TIG Stack (Telegraf, InfluxDB, Grafana). The definition looks like this:
When I bring up the Applications screen, I see that the TIG application has experienced an anomaly. Let's dig into that.
Selecting the TIG Stack on the right brings us to this overview
We see that the telegraf tier is showing an anomaly. Selecting the word "Critical" under alerts brings us to this page:
Let's choose TROUBLESHOOT and then START NEW to find the root cause. It looks like the app in general, and the Telegraf tier in particular, were experiencing Network Traffic Rate anomalies. Let's choose ANALYSIS for the Application anomaly.
It looks like the telegraf feeds to the InfluxDB had a few high spikes. We should keep an eye on this for a few days to see if the spikes continue.
So vRNI Cloud is pretty awesome - especially if you have more supported devices in your environment than I do. Application Dependency Mapping and network root cause analysis are not easy. vRNI Cloud makes them easier. As always, thank you for taking the time to read this post. I hope you found this helpful. I welcome your feedback.