We recently started an eval of a product from NetQoS called SuperAgent. Basically, this device passively monitors application performance. It sits off a SPAN or mirror port on the network. We are also using a product called Gigastor, which feeds mirrored traffic to SuperAgent.
One thing to remember about SPAN configurations is that you have to be very selective about what you are spanning, because mirror ports can be quickly overwhelmed. A lot of times you know which servers you want to mirror traffic from, so you just span all of those source ports or, even worse, an entire source vlan.
The other thing to keep in mind is that you need to span in such a way as to prevent duplicate packets if possible. If you start blindly mirroring all your servers, you may find yourself with duplicate packets, which can throw off your packet loss and retransmission delay metrics. You have to remember that when mirrored servers talk to each other, you will certainly end up with duplicates. If you have a multi-tiered application where you have front-end servers talking to back-end, you definitely want to clean up those SPANs!
The most common method of traffic mirroring is SPAN or RSPAN on Cisco switches. A network tap is also a very good method.
Here is an example of a scenario where you need to take care in what ports you choose to mirror.
Here you have users communicating with an application that uses two tiers. The front-end queries the back-end server for the info requested by the client. All communication from the client is to the front-end server. In this case, if you choose to mirror both server ports, you will get duplicate packets due to communication between the servers. For example, when the front-end sends a SYN to the back-end, you see it once going into the front-end port and again when the traffic goes out the back-end port.
Since users don’t directly talk to the back-end, we really only need to span in one place. Simply span only the front-end server port (in both directions). As long as the front-end and back-end servers are on the same LAN with minimal latency between their connected ports, this will work. For example, when the front-end sends a request for data to the back-end, we would still be able to monitor the time it takes to get a response (among other metrics).
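As a rough sketch of the single-port approach, a local SPAN session on a Catalyst IOS switch might look like the following. The interface names are made up for illustration: GigabitEthernet1/0/10 stands in for the front-end server port and GigabitEthernet1/0/48 for the port the monitoring device is plugged into.

```
! mirror only the front-end server port, in both directions
monitor session 1 source interface GigabitEthernet1/0/10 both
! send the copies to the port where the monitoring device is attached
monitor session 1 destination interface GigabitEthernet1/0/48
```

The back-end port is deliberately left out of the session, so each packet between the front-end and back-end is copied only once.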
In our case, we have a more complicated application we are trying to monitor. Here is an example:
Users send requests for data to the Application server, which then has to ask the DB where the data lives. The DB responds with the metadata location and tells the Application server to talk to the NAS. Before providing the data location on the NAS to the clients, the Application server checks the location to make sure the data exists.
The second half of the total transaction time is the application responding to the client telling it where to connect to on the NAS for its requested data. So you end up with two pieces: how long does it take to find out where to go, and then how long to get the data from the NAS.
In our case, we are using RSPAN because the servers are all on different switches. RSPAN basically takes mirrored traffic and spits it out to a vlan of your choosing. Keep this in mind, because the span traffic now flows through the network and can suffer delays, possibly affecting the accuracy of your timings.
In this case, we want to make sure we clean up our span to minimize traffic going to the span destination port. We also want to filter out any duplicate packets.
Here’s a sample config using VACLs. Using a vlan access list allows you to filter the traffic before spitting it out on the RSPAN vlan. This is a great feature and makes filtering duplicates possible.
We span both directions on the Application and the NAS server because they have clients connecting to those ports. We don’t have to span the DB server because we will see the communication between the DB and Application server from the application span.
Since they are on different switches we can span to the same RSPAN vlan.
App server config:
! interface needs to exist for VACL and can be shut
interface vlan501
 shutdown
vlan 501
 remote-span
monitor session 1 source interface frontendserver
monitor session 1 destination remote vlan 501
! filter out traffic to/from NAS server
ip access-list extended app_2_nas
 permit tcp host NAS host APP
 permit tcp host APP host NAS
! monitor all other tcp traffic
ip access-list extended all_TCP
 permit tcp any any
! define VACL
vlan access-map RSPAN-501-ACL 10
 match ip address app_2_nas
 action drop
vlan access-map RSPAN-501-ACL 20
 match ip address all_TCP
 action forward
! apply VACL
vlan filter RSPAN-501-ACL vlan-list 501
No VACL filters needed on NAS port.
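For completeness, here is a sketch of what the other two ends of the RSPAN might look like, assuming the same Catalyst IOS syntax as the App server config. The names nasserver and analyzerport are placeholders in the same spirit as frontendserver, and the session numbers are arbitrary.

```
! NAS switch: plain RSPAN source, no VACL
vlan 501
 remote-span
monitor session 1 source interface nasserver both
monitor session 1 destination remote vlan 501

! switch where the monitoring device is connected
vlan 501
 remote-span
monitor session 1 source remote vlan 501
monitor session 1 destination interface analyzerport
```

VLAN 501 also has to be carried on the trunks between these switches, or the mirrored traffic never reaches the destination port.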
There are definitely a lot of different ways and places to configure the VACL, but this way works and keeps it simple. You could certainly SPAN on all ports, which would require usage of VACLs in other places.
For those trying to use the VACL feature who don’t need RSPAN, there is a feature called VACL capture.
See the links below for CATOS and IOS examples. It’s basically the same thing, but you don’t run into the limit on how many spans can be configured on a switch. You are actually applying filters to production vlans, so make sure you get the ACLs right!
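A rough sketch of VACL capture in Catalyst 6500 IOS syntax follows. The vlan number, the ACL and map names, and the interface are invented for illustration, and APP is a placeholder for the server address. The exact commands vary by platform and software release, so check them against the documentation for your switch.

```
! traffic we want copied to the capture port
ip access-list extended capture_app
 permit tcp any host APP
 permit tcp host APP any
! catch-all so everything else still forwards
ip access-list extended all_ip
 permit ip any any
vlan access-map CAPTURE-MAP 10
 match ip address capture_app
 action forward capture
vlan access-map CAPTURE-MAP 20
 match ip address all_ip
 action forward
! applied to a production vlan, so a wrong ACL here drops real traffic
vlan filter CAPTURE-MAP vlan-list 100
! port where the monitoring device is connected
interface GigabitEthernet1/1
 switchport
 switchport capture
 switchport capture allowed vlan 100
```

The second map entry matters: without a forward entry for the remaining IP traffic, anything not matched by capture_app would be dropped on the production vlan.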
February 10, 2009 at 9:49 am
You should definitely take a look at the ACE Live 4100 that was recently announced from OPNET.
ACE Live is an end-to-end solution that spans monitoring, measurement, and detection of application performance violations, providing visibility of all transactions and users across the enterprise. The new ACE Live 4100 introduces Targeted Forensic Archiving, an intelligent, efficient mode of data capture and storage that facilitates root cause analysis of application performance problems with lower cost of ownership than traditional modes of packet data storage. The ACE Live 4100 is able to automatically store all packets, even in extreme burst conditions, capturing packets to disk at high speeds, and including support for 10G interfaces.
Please contact Matt MacArthur (mmacarthur@opnet.com) to learn more about it!
February 10, 2009 at 11:32 am
Hi, we are also getting our eval in for the opnet products as well. We will definitely be comparing them both!
February 11, 2009 at 10:50 pm
Ted - you just got sales guyed - How many combos you got on your SA? A true network admin rolls 2 mil combos - jk - Get a NV and discover the entire Internet
February 12, 2009 at 4:31 pm
heh, Brings back classic memories… Give Darran an SNMP poller and the first thing he does is try to discover every device on the internet. Then finding out he’s the reason why our wan link is congested. When I asked him why he is discovering the whole internet, he says, “dude, bra… why can’t I discover the internet?”
Darran- 1, Wan link- 0
Good times!