In my younger years I had all sorts of collections: books, bottle caps, rocks, coins, action figures, movies, etc. Now, most of those collections sit at the back of my parents' shed, although I do still have a tendency to hoard movies.
When I first began developing my malware zoo I was mostly populating it with:
- Samples pulled down from the wild using Rag Picker.
- Samples from my honeypots (you can find my “instant honeypot” Vagrant script here).
- Samples handed to me by staff in my company.
The issues with each were:
- Rag Picker hasn’t been updated in about a year. Many of the sources are no longer updated or no longer online, and it’s unnecessarily complicated.
- Most of the samples my honeypot captures are months old, although it does get the occasional unknown.
- I want more than the two samples I get from staff most days.
I quite liked the approach that Rag Picker had, but felt there was plenty of room for improvement. So, I began redeveloping it.
A few days of development led to the initial release of ph0neutria. Named after the aggressive Brazilian Wandering Spider, ph0neutria makes a few improvements on what Rag Picker presently offers:
- It limits the scope of crawling to only what’s defined in reputable and frequently updated lists.
- It offers only a single, reliable and well-organised storage mechanism: Viper Framework.
- It does not attempt to do work that can instead be done by Viper.
At present, ph0neutria sources lists from MalShare, Malc0de and VX Vault. Malc0de and VX Vault provide lists only, whereas the MalShare API can also supply the binaries themselves when they have been removed from their original source. As the MalShare API only permits 1000 sample requests per day (although on most days no more than this is offered), I’ve added a few configuration options that reduce reliance on the API, and thus the load on it:
- Attempt to retrieve files from the wild first.
- Retrieve files only from the wild.
- Disable MalShare and only retrieve files from the lists obtained from Malc0de and VX Vault.
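As a rough illustration of how these three options could interact, here is a minimal Python sketch. The field names, the `fetch_order` helper and the source labels are all hypothetical and are not taken from ph0neutria's actual configuration. The only figure from the text is the 1000 requests per day limit.

```python
from dataclasses import dataclass
from typing import List

MALSHARE_DAILY_LIMIT = 1000  # per-day request cap mentioned above


@dataclass
class RetrievalConfig:
    # Hypothetical option names, not ph0neutria's real settings.
    wild_first: bool = True        # try the original URL before the API
    wild_only: bool = False        # never fetch binaries through the API
    malshare_enabled: bool = True  # False: rely on Malc0de and VX Vault only


def fetch_order(cfg: RetrievalConfig, api_calls_today: int) -> List[str]:
    """Return the sources to try, in order, for a single sample."""
    api_allowed = (
        cfg.malshare_enabled
        and not cfg.wild_only
        and api_calls_today < MALSHARE_DAILY_LIMIT
    )
    if not api_allowed:
        return ["wild"]
    if cfg.wild_first:
        # Only fall back to the API when the original host no longer serves the file.
        return ["wild", "malshare_api"]
    return ["malshare_api", "wild"]
```

With the defaults, `fetch_order(RetrievalConfig(), 0)` tries the wild first and keeps the API as a fallback. Setting `wild_only=True` or `malshare_enabled=False` removes the API from the order entirely.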
The workflow is fairly simple:
- Obtain the lists.
  - MD5 hash list: used to retrieve files from the MalShare API. Before a file is requested, a check ensures that there’s no file in Viper with the same hash.
  - URL lists: used to retrieve files from the wild. As Viper currently converts all tags to lowercase, a URL stored as a tag can’t be matched reliably against the original, so a hash of the URL is also stored as a tag and later used to check whether a file from a specific URL has already been loaded into Viper.
- Pull down the files to a temporary location.
- If required, generate an MD5 checksum of the files and ensure they aren’t already in Viper.
- Load new samples into Viper with the following tags:
  - MD5 checksum of file.
  - Source domain.
  - Source URL.
  - MD5 checksum of URL.
- Interact with the files in Viper:
  - Download to disk.
  - Send them to Cuckoo.
  - Send them to VirusTotal.
  - Extract strings, metadata, etc.
  - Parse for shellcode patterns.
  - Scan with Yara rules.
It really is an awesome framework.
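To make the tagging and de-duplication steps in the workflow a little more concrete, here is a small Python sketch using only the standard library. The function names and the `known_tags` set are hypothetical stand-ins. How ph0neutria actually queries Viper for existing tags is not shown here.

```python
import hashlib
from typing import List, Set
from urllib.parse import urlparse


def md5_hex(data: bytes) -> str:
    """Lowercase hex MD5 digest; unaffected by Viper lowercasing tags."""
    return hashlib.md5(data).hexdigest()


def build_tags(sample: bytes, url: str) -> List[str]:
    """Return the four tags from the workflow for one downloaded sample."""
    return [
        md5_hex(sample),               # MD5 checksum of file
        urlparse(url).hostname or "",  # source domain
        url,                           # source URL
        md5_hex(url.encode("utf-8")),  # MD5 checksum of URL
    ]


def url_already_loaded(url: str, known_tags: Set[str]) -> bool:
    """Check the URL hash against tags already stored (hypothetical set)."""
    return md5_hex(url.encode("utf-8")) in known_tags


def file_already_stored(sample: bytes, known_tags: Set[str]) -> bool:
    """Check the file hash against tags already stored (hypothetical set)."""
    return md5_hex(sample) in known_tags
```

The point of hashing the URL is visible in the checks: even if the stored tags are lowercased, the hex digest of the original, case-sensitive URL still matches, whereas the lowercased URL string itself might not.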