Analyzing Cowrie honeypot results


Recently I decided to spin up a honeypot for no better reason than “because I can”. After some admittedly quick searching, I found the Cowrie SSH and Telnet honeypot.
According to Cowrie’s GitHub README:

Cowrie is a medium interaction SSH and Telnet honeypot designed to log brute force attacks and the shell interaction performed by the attacker.

Cowrie is developed by Michel Oosterhof.


So I set it up and left it running over a 48 hour period.

And now it’s time to see what we caught. We could use Splunk or some other visualization tool, but in the spirit of learning, I’ll attempt to use Python wherever possible to analyze the data.

Initial analysis.

The first task is reading the JSON-formatted log file into Python so we can begin analyzing what has happened.
This stumped me for a while because the log file (cowrie.json) is not a single valid JSON document when you read the whole file in at once.
This is because each event is logged as its own root element. However, each root element is on its own line, so we can simply read the file line by line and parse each line separately.
The following Python code will do the trick. But first we will use the cat command for something I’ve rarely used it for, and that is to actually concatenate all of our .json logs into one file for Python to read:

$ cat cowrie.json cowrie.json.2018-05-17 cowrie.json.2018-05-18 > combined.json

import json

data = []
with open("./Logs/cowrie.json") as f:
    for line in f:
        # each line is a complete JSON document describing one event
        data.append(json.loads(line))

We now have a list of dictionaries in our data variable, which we can prove like so:

>>> print type(data)
<type 'list'>
>>> print type(data[1])
<type 'dict'>

We can index into it like any other list of dictionaries:

>>> print data[1]["message"]
Remote SSH version: SSH-2.0-Go
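
As an alternative to concatenating the files with cat first, the same line-by-line idea can be wrapped in a small helper that reads several log files and skips lines that fail to parse. This is only a sketch: `load_events` is a hypothetical helper name, not part of Cowrie, and skipping bad lines silently is an assumption about what you want.

```python
import json


def load_events(paths):
    """Read newline-delimited JSON from several files into one list of dicts."""
    events = []
    for path in paths:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                try:
                    events.append(json.loads(line))
                except ValueError:
                    # A truncated or corrupt line: skip it rather than abort.
                    continue
    return events
```

Calling `load_events(["cowrie.json", "cowrie.json.2018-05-17", "cowrie.json.2018-05-18"])` should give the same kind of list as reading combined.json, assuming those files sit in the current directory.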

OK, now that we have access to all of our log data, let’s grab some simple statistics about our honeypot’s attackers.
Warning! The following code is messy and could definitely be written better. But I’m new to Python, so we’re stuck with the basics.
First off, we write four functions like so:

def remove_duplicates(seq, idfun=None):
   # order-preserving de-duplication
   if idfun is None:
       def idfun(x): return x
   seen = {}
   result = []
   for item in seq:
       marker = idfun(item)
       if marker in seen: continue
       seen[marker] = 1
       result.append(item)
   return result

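from urllib2 import urlopen  # Python 2; in Python 3 this lives in urllib.request

def get_ip_location(ip):

	# '' stands in for the base URL of the geolocation API
	url = ''+ip+'/json'
	response = urlopen(url)
	data = json.load(response)

	country = data['country']
	city = data['city']

	return country, city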

def get_lists(json_data, key):
	full_list = ""
	for x in range(0, len(json_data)):
		# not every event carries every key (a password, for example, only shows up on login events)
		if key in json_data[x]:
			full_list += str(json_data[x][key]) + "\n"
	deduped_list = remove_duplicates(full_list.split())
	#if (key == "src_ip"):
	#	deduped_list.sort(key=lambda s: map(int, s.split('.')))

	return full_list, deduped_list

def get_top_ten(full_list, deduped_list):
	count_dict = {}
	for item in deduped_list:
	    # full_list is one newline-joined string, so count() matches substrings:
	    # "1" is also counted inside "111"
	    count_dict[item] = full_list.count(item)
	top_ten = sorted(count_dict, key=count_dict.get, reverse=True)[:10]
	return top_ten, count_dict

There’s nothing insane about them, but here’s a quick breakdown of each function:
# remove_duplicates – Takes a list (called from get_lists) and returns another list with all duplicate entries stripped out, keeping the original order.
# get_ip_location – Takes an IP address as an argument, connects to the API and returns the country and city of origin for that IP.
# get_lists – Takes our JSON log and a dictionary key to search for and builds two results. One is a newline-separated string of every occurrence of that key in the JSON log, and the other is a duplicate-free list of those values.
# get_top_ten – Takes the two results from get_lists and uses them to build a dictionary mapping each value to how often it occurs in the JSON log, and a list containing just the top ten by frequency.
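
One thing to keep in mind is that get_top_ten counts with `str.count` on one big string, and `str.count` counts substring matches, so a short value such as "1" is also counted every time it appears inside "111". A sketch of counting exact values instead, using `collections.Counter` from the standard library, could look like this. `top_values` is a hypothetical helper name and the sample events are made up purely for illustration.

```python
from collections import Counter


def top_values(events, key, n=10):
    """Return the n most common values of `key` as (value, count) pairs."""
    counts = Counter(event[key] for event in events if key in event)
    return counts.most_common(n)


# Tiny made-up sample, only to show the difference from substring counting.
sample = [
    {"username": "root", "password": "1"},
    {"username": "root", "password": "111"},
    {"username": "admin", "password": "111"},
    {"eventid": "no credentials in this event"},
]
print(top_values(sample, "password"))  # [('111', 2), ('1', 1)]
print("1\n111\n111\n".count("1"))      # 7, because "1" also matches inside "111"
```

With exact counting, the figures for very short usernames and passwords would likely come out lower than with the substring approach.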

Great. So let’s write a few lines that use these functions to pull back information about the top ten IPs, usernames and passwords!

## Get a list of all IP addresses that connected to the honeypot
## and a deduplicated list too, then calculate the top ten IP addresses
## and how often they connected
ip_list, deduped_ip_list = get_lists(data, "src_ip")
top_ten_ips, ip_freq_dict = get_top_ten(ip_list, deduped_ip_list)

## Get a list of all passwords used and a list of deduplicated passwords
## Then use those to calculate the top ten passwords and how often they were used
password_list, deduped_password_list = get_lists(data, "password")
top_ten_passwords, pass_freq_dict = get_top_ten(password_list, deduped_password_list)

## Get a list of all usernames that were tried 
## Figure out the top 10 users by connection attempts
user_list, deduped_user_list = get_lists(data, "username")
top_ten_users, user_freq_dict = get_top_ten(user_list, deduped_user_list)

And now we just need to print out our results in a semi-pretty way…

## Print out all the sweet sweet data
print("\n\n\t\tHONEYPOT ANALYSIS\n\n")
print("Total unique attacker IPS: {}").format(len(deduped_ip_list))
print("Top 10 attackers by IP:\n")
for x in top_ten_ips:
	print("\t{:15}\t{}").format(x, ip_freq_dict[x]),
	country, city = get_ip_location(x)
	print '\t\t{}\t{}'.format(country, city)
print("\n\tMost common usernames\tMost common passwords:\n")
for x in range(0,10):
	print("\t{:5}\t{}").format(top_ten_users[x], user_freq_dict[top_ten_users[x]]),
	print("\t{:8}\t{}").format(top_ten_passwords[x], pass_freq_dict[top_ten_passwords[x]])
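
The print lines above rely on Python 2 syntax (`print(...).format(...)` and a trailing comma to stay on the same line), and the `range(0,10)` loop assumes there are at least ten usernames and ten passwords. A rough Python 3 sketch of the username and password part of the report is shown below. `format_report` is a hypothetical helper, and it assumes its inputs are lists of (value, count) pairs rather than the separate list and dictionary used above.

```python
def format_report(unique_ip_count, top_users, top_passwords):
    """Build the report text from lists of (value, count) pairs."""
    lines = [
        "Total unique attacker IPs: {}".format(unique_ip_count),
        "",
        "\tMost common usernames\tMost common passwords:",
        "",
    ]
    # zip() stops at the shorter list, so fewer than ten entries is not an error.
    for (user, user_count), (password, pass_count) in zip(top_users, top_passwords):
        lines.append(
            "\t{:8}\t{}\t{:8}\t{}".format(user, user_count, password, pass_count)
        )
    return "\n".join(lines)


# Made-up numbers, just to exercise the function.
print(format_report(2, [("root", 3), ("admin", 1)], [("123", 2)]))
```

Returning a string instead of printing directly keeps the formatting easy to test and works the same way on Python 2 and 3.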

Which gives us the following results!

$ python 


Total unique attacker IPS: 594
Top 10 attackers by IP:

	IP		Connections	Country	City
			30220		US	Buffalo
			9444		CA	Montreal
			9215		RU	St Petersburg
			7169		FR	Roubaix
			6633		IE	Macroom
			4884		IE	Macroom
			4389		IE	Macroom
			4361		IE	Macroom
			4311		RU	St Petersburg
			3985		GB	Coity

	Most common usernames:	Most common passwords:

	er      	7870 	1       	60653
	se      	7682 	111     	14792
	user    	7579 	1111    	14774
	a       	2142 	111111  	7391
	root    	823 	1111111 	7377
	min     	374 	11111111	7374
	adm     	357 	a       	1763
	admin   	349 	12      	1169
	f       	330 	123     	1143
	es      	214 	1234    	817
