RADIUS Accounting Audit

Over the summer, I deployed ClearPass at UPenn to take over authentication services for our wireless network. This was a major move for us, and it puts UPenn in a great position to use more of what ClearPass has to offer as we start to tackle more segmentation and network security projects.

One of our InfoSec team's requirements is being able to attribute network activity to the users on our network. This is accomplished by correlating logs from our DHCP servers, NAT boxes, and RADIUS accounting through Splunk. ClearPass already collected RADIUS accounting data from our controllers, so exporting it to syslog was fairly easy. I set this up, sent the logs off to Splunk, and moved on to other projects.

A couple of months later, the InfoSec team came to me about a number of attributions not working. Doing some initial troubleshooting, I discovered there were missing entries in the ClearPass database for users actively on the network, but I didn't know to what extent...so in I went...

Method of Attack

After doing some brainstorming on what the possible problems could be, I narrowed the problem down to three different potential areas:

  1. Issues between the controller and ClearPass

    Maybe the controller isn't sending the right information? Maybe the ClearPass box isn't acknowledging the Accounting-Request packets?

  2. Issues getting the RADIUS accounting packets into the database

    Maybe there is a load issue? Maybe packets are getting to the ClearPass boxes but never making it to the database?

  3. Issues getting the database entries exported to syslog

    Maybe the export to syslog is missing some entries? Maybe the syslog aggregator is dropping some syslog entries it's receiving from the ClearPass box?

Each point in the chain was dependent on the previous point. So, I started systematically tackling each point, proving whether or not there was an issue before moving on to the next.

Validating Packet Flow

The first step was to confirm the controllers were sending the correct accounting traffic, and that the ClearPass box was getting the traffic.

Most of my analysis was based on packet captures from our Aruba controllers. A packet capture of the control plane can be very easily done with the following commands:

packet-capture destination local-filesystem
packet-capture controlpath udp 1813
(wait a few minutes)
packet-capture copy-to-flash controlpath-pcap
no packet-capture controlpath udp 1813
copy flash: controlpath-pcap.tar.gz <destination>

It's always important not to leave packet captures running unattended on network electronics. I once caused a firewall outage because of that.

To do my initial analysis, I just wanted to verify the correct traffic for my own wireless device. Wireshark makes it super easy to filter the traffic based on different RADIUS attributes. I used this filter to find my phone's wireless accounting data:

((radius.code == 4) && (radius.Calling_Station_Id == "94E61D083CBC"))

This returned all of the RADIUS Accounting-Request packets associated with my wireless device. I was able to ensure I was getting a session start and stop for the duration I ran the packet capture.
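The same start/stop check can be scripted once the attributes are out of the capture. This is a minimal sketch, not part of the original analysis: it assumes each Accounting-Request has already been turned into a dict with "Acct-Session-Id" and "Acct-Status-Type" keys (a hypothetical layout), and the function name is made up for illustration.

```python
from collections import defaultdict


def sessions_missing_start_or_stop(records):
    """Return {session_id: [missing status types]} for incomplete sessions.

    `records` is an iterable of dicts, one per Accounting-Request, each with
    "Acct-Session-Id" and "Acct-Status-Type" keys (assumed layout).
    """
    seen = defaultdict(set)
    for record in records:
        seen[record["Acct-Session-Id"]].add(record["Acct-Status-Type"])

    required = {"Start", "Stop"}
    return {
        session_id: sorted(required - types)
        for session_id, types in seen.items()
        if not required <= types
    }


# Example data: session A1 is complete, session B2 never sent a Stop.
records = [
    {"Acct-Session-Id": "A1", "Acct-Status-Type": "Start"},
    {"Acct-Session-Id": "A1", "Acct-Status-Type": "Stop"},
    {"Acct-Session-Id": "B2", "Acct-Status-Type": "Start"},
]
print(sessions_missing_start_or_stop(records))  # {'B2': ['Stop']}
```

A session that began before the capture started, or was still active when it ended, will legitimately show up as incomplete, so the output needs to be read with the capture window in mind.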

Great, the controller is sending the correct accounting data for one device, but what about all the devices? This is a much more complex problem, so I moved forward on the assumption that if one device is working, then all devices are working. This was an assumption I could come back and tackle later if needed.

The last part of validating the packet flow was to ensure the ClearPass box was sending an Accounting-Response for every Accounting-Request from the controllers. This could be done programmatically, but Wireshark provides a tool for this analysis. Under Statistics -> Service Response Time -> RADIUS, Wireshark displays an analysis of the traffic in the PCAP. One of the statistics is the number of unanswered requests.

Screenshot of Wireshark Analysis

With only a 2% non-response rate, I concluded ClearPass was correctly responding to the controllers. With the timing of the packet capture, it was completely possible some of the responses were not captured.
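For anyone who prefers the programmatic route, here is a rough sketch of how unanswered requests could be counted. It is not how Wireshark computes its statistic and not code from the original analysis. It assumes the capture has already been reduced to simple records (the `RadiusRecord` type is hypothetical) and matches a response to a request by the RADIUS Identifier plus the reversed address and port pair.

```python
from typing import Iterable, NamedTuple, Tuple

ACCOUNTING_REQUEST = 4   # RADIUS code, RFC 2866
ACCOUNTING_RESPONSE = 5  # RADIUS code, RFC 2866


class RadiusRecord(NamedTuple):
    """Hypothetical per-packet record, listed in capture order."""
    code: int
    identifier: int
    src: str
    sport: int
    dst: str
    dport: int


def count_unanswered(records: Iterable[RadiusRecord]) -> Tuple[int, int]:
    """Return (total_requests, unanswered_requests)."""
    pending = set()  # requests still waiting for a response
    total = 0
    for r in records:
        if r.code == ACCOUNTING_REQUEST:
            key = (r.src, r.sport, r.dst, r.dport, r.identifier)
            if key not in pending:  # count a retransmission only once
                total += 1
                pending.add(key)
        elif r.code == ACCOUNTING_RESPONSE:
            # The response travels the opposite way with the same Identifier.
            pending.discard((r.dst, r.dport, r.src, r.sport, r.identifier))
    return total, len(pending)


# Example: two requests from a controller, only the first is answered.
sample = [
    RadiusRecord(4, 1, "192.0.2.10", 50000, "192.0.2.20", 1813),
    RadiusRecord(5, 1, "192.0.2.20", 1813, "192.0.2.10", 50000),
    RadiusRecord(4, 2, "192.0.2.10", 50000, "192.0.2.20", 1813),
]
print(count_unanswered(sample))  # (2, 1)
```

Requests sent in the last moments of a capture will show up as unanswered here too, which is the same timing caveat that applies to the Wireshark number.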

At this point, I concluded there were no issues with the communications between the controllers and the ClearPass box, so I moved on to my next theory: ClearPass having issues getting the accounting packets into the database.

Validating The Database Entries

I went through a couple of different ideas about how to tackle this, but I decided all I needed to prove was that a RADIUS Accounting-Request packet comes into the ClearPass server and gets written to the database.

Accessing the Database in ClearPass

ClearPass provides the ability to access the underlying Postgres database, which was the key to my analysis. Using pgAdmin (https://www.pgadmin.org/), I was able to access all four of my ClearPass boxes and pull the RADIUS accounting logs from the database.

Looking through the table structure, I found the tips_radius_accounting_log table, which contained the logs I needed. I used this SQL query to get the entries I needed:

SELECT * FROM tips_radius_accounting_log
   WHERE (timestamp >= '<!--START TIME-->' AND timestamp <= '<!--END TIME-->') AND
      nas_ip_address = '<!--NAS IP ADDRESS-->';

This returns all of the accounting entries the ClearPass box knows about for that time window and NAS. I got the start and end times from the first and last packets in the packet capture. I also took the NAS IP address from inside one of the Accounting-Request packets, just to ensure I had the correct one.

I ran this query on all of the ClearPass subscribers and exported the results in CSV format, then concatenated the files together for later processing. I had to run it on every ClearPass box because we load balance across all of our subscribers and I wanted to make sure I got all of the possible entries.
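Concatenating the exports is simple enough to do by hand, but a small script keeps the header row from being repeated in the middle of the file. The sketch below assumes every export has the same column header; the function name and the file names in the usage comment are made up for illustration.

```python
import csv


def merge_csv_exports(paths, output_path):
    """Concatenate CSV exports sharing one header; return rows written."""
    writer = None
    rows = 0
    with open(output_path, "w", newline="") as out:
        for path in paths:
            with open(path, newline="") as src:
                reader = csv.DictReader(src)
                if reader.fieldnames is None:
                    continue  # empty file, nothing to copy
                if writer is None:
                    # First non-empty file defines the header.
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                elif list(reader.fieldnames) != list(writer.fieldnames):
                    raise ValueError(f"{path}: header differs from first file")
                for row in reader:
                    writer.writerow(row)
                    rows += 1
    return rows


# Usage (hypothetical file names, one export per subscriber):
# merge_csv_exports(["cppm1.csv", "cppm2.csv"], "cppm_all.csv")
```

Reading the merged file back with `csv.DictReader` gives one dict per database row, keyed by column name such as `acct_session_id`, which is a convenient shape for the comparison step.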

Getting the data out of the packet captures

There were a couple of different ways I thought about doing the comparison between the data in the packet capture and the data in the database. The manually intensive process was to go through the packet capture packet by packet and look it up in the database. Now, I don't have weeks to complete this task, so I went for the programmatic method.

I hadn't done much PCAP programming in Python before this project, but there's no time like the present to learn. I dove into Scapy to extract the information from the packets.

Scapy provides a very nice interface to be able to easily grab the RADIUS attribute data from the packets. I wrote a simple algorithm to extract the data from the packet, and build an array of the packets for further analysis.

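from scapy.layers.inet import IP
from scapy.layers.radius import Radius
from scapy.utils import PcapReader

# PCAP_FILE, REJECT_ADDRESSES, PACKET_TYPE, _CONVERT, get_attribute_name()
# and transform() are assumed to be defined elsewhere in the script.
packets = []
for i in PcapReader(PCAP_FILE):
    packet = {}
    packet["src_ip"] = i[IP].src
    packet["dst_ip"] = i[IP].dst
    # Skip RADIUS traffic to or from servers outside this audit
    if packet["src_ip"] in REJECT_ADDRESSES or packet["dst_ip"] in REJECT_ADDRESSES:
        continue
    try:
        packet["type"] = PACKET_TYPE[i[Radius].code]
    except Exception:
        continue  # no RADIUS layer, or a code missing from PACKET_TYPE
    # Accounting-Responses carry nothing needed for the comparison
    if packet["type"] == "Accounting-Response":
        continue
    for attribute in i[Radius].attributes:
        key = get_attribute_name(attribute)
        # Byte values whose string form contains an "x" (typically "\x"
        # escapes) are stored as hex, and as integers for keys in _CONVERT
        if isinstance(attribute.value, bytes) and "x" in str(attribute.value):
            packet[key] = attribute.value.hex()
            if key in _CONVERT:
                packet[key] = int(packet[key], 16)
        else:
            packet[key] = attribute.value
        packet[key] = transform(key, packet[key])

    packets.append(packet)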

As you can see in the code snippet above, I'm ignoring the Accounting-Response packets, as these don't have any bearing on my analysis. Along with that, I'm also ignoring RADIUS packets from other servers which aren't a part of this work.
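The snippet relies on a few lookups that aren't shown in this post. As a guess at what they might look like (the real versions live in the repo and may differ), here is a hypothetical `PACKET_TYPE` table and a stand-in `transform()`. Only the numeric codes are taken from the RADIUS RFCs; everything else is an assumption for illustration.

```python
# RADIUS packet codes (RFC 2865 and RFC 2866)
PACKET_TYPE = {
    1: "Access-Request",
    2: "Access-Accept",
    3: "Access-Reject",
    4: "Accounting-Request",
    5: "Accounting-Response",
    11: "Access-Challenge",
}

# Acct-Status-Type values (RFC 2866)
ACCT_STATUS_TYPE = {
    1: "Start",
    2: "Stop",
    3: "Interim-Update",
    7: "Accounting-On",
    8: "Accounting-Off",
}


def transform(key, value):
    """Hypothetical stand-in that makes attribute values human readable."""
    if key == "Acct-Status-Type" and isinstance(value, int):
        # Unknown numbers are passed through unchanged.
        return ACCT_STATUS_TYPE.get(value, value)
    if isinstance(value, bytes):
        # Text attributes such as User-Name arrive as bytes.
        return value.decode("utf-8", errors="replace")
    return value


print(transform("Acct-Status-Type", 1))    # Start
print(transform("User-Name", b"alice"))    # alice
```

Mapping the status type to names like "Start" and "Stop" makes the later filtering and comparison steps easier to read than working with raw integers.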

Comparing the database to the packets

With the information extracted from the packets and the database, it was now time to compare the data entries. Looking through the two sets of data, there were three fields which made every entry unique: Acct-Session-Id, User-Name, Acct-Status-Type.

I extracted this information from both the packets and the database entries, and formed a tuple for doing the actual comparison:

# Build the packet hashed array (Start and Stop records only)
hashed_packets = []
for i in packets:
    try:
        if 'Acct-Status-Type' in i.keys():
            if str(i['Acct-Status-Type']) in ['Start', 'Stop']:
                hashed_packets.append(
                    (
                        str(i["Acct-Session-Id"]),
                        str(i["User-Name"]),
                        str(i["Acct-Status-Type"]),
                    )
                )
    except Exception as e:
        print(e)
        continue
# Build the CPPM database hashed array
hashed_cppm = []
for i in cppm:
    try:
        hashed_cppm.append(
            (
                str(i["acct_session_id"]),
                str(i["user_name"]),
                str(i["acct_status_type"]),
            )
        )
    except Exception:
        print("FAILED: {}".format(i))

With the two hashed arrays, I ran two different comparisons. The first was to make sure all of the database entries were in the packet array, and the second was the reverse:

i = []
o = []
for entry in hashed_cppm:
    if entry in hashed_packets:
        i.append(entry)
    else:
        o.append(entry)
print("Number of CPPM in Packets: "+str(len(i)))
print("Number of CPPM not in Packets: "+str(len(o)))
i = []
o = []
for entry in hashed_packets:
    if entry in hashed_cppm:
        i.append(entry)
    else:
        o.append(entry)
print("Number of Packets in CPPM: "+str(len(i)))
print("Number of Packets not in CPPM: "+str(len(o)))

Sadly, the numbers were not that promising when I ran the script:

Number of CPPM in Packets: 1182
Number of CPPM not in Packets: 59
Number of Packets in CPPM: 1203
Number of Packets not in CPPM: 6620

The results of the script show that a large number of accounting entries were not being written to the database: 6620 of the Start and Stop packets in the capture had no matching database entry. Off to TAC I went...
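On larger captures, checking membership against a list gets slow, since every lookup scans the whole list. A possible variant, sketched here with made-up sample tuples rather than real data, converts the reference side to a set and also tallies the missing entries by status type. The function name is hypothetical and this is not the code from the repo.

```python
from collections import Counter


def split_by_membership(entries, reference):
    """Split entries into (found, missing) by membership in reference."""
    reference_set = set(reference)  # constant-time lookups
    found = [e for e in entries if e in reference_set]
    missing = [e for e in entries if e not in reference_set]
    return found, missing


# Sample (Acct-Session-Id, User-Name, Acct-Status-Type) tuples
hashed_packets = [
    ("S1", "alice", "Start"),
    ("S1", "alice", "Stop"),
    ("S2", "bob", "Start"),
]
hashed_cppm = [("S1", "alice", "Start")]

found, missing = split_by_membership(hashed_packets, hashed_cppm)
print("Number of Packets in CPPM: " + str(len(found)))        # 1
print("Number of Packets not in CPPM: " + str(len(missing)))  # 2

# Which kind of record goes missing most often?
missing_by_status = Counter(status for _, _, status in missing)
```

Because the entries themselves are still walked as a list, duplicates are counted the same way as in the list-based loops, so the totals stay comparable. The per-status tally is the sort of detail that could be handy to include in a support case.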

After Thoughts

While the problem we have is a pain in the rear end, doing this analysis work was pretty fun and interesting. I have stashed the code in a git repo (radius-accounting-audit) for future use and for anyone else who might be interested in doing a similar analysis.