Wednesday, February 28, 2007

A Crawler-based study of Spyware on the Web

This paper examines the threat of spyware from an Internet-wide perspective. The researchers studied the Web with a crawler, examining both executables and conventional Web pages for malicious objects.

To identify spyware-infected executables on the Web, they first determined whether a Web object contained executable software, then downloaded, installed and executed that software on a virtual machine, and finally analyzed whether the installation or execution caused a spyware infection.
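As a rough illustration only (not the authors' actual tooling), the crawl-and-analyze flow could be sketched in Python as follows; the helper names and the virtual-machine object are hypothetical stand-ins for the paper's VM harness.

    # Hypothetical sketch of the crawl-and-analyze flow described above;
    # the helpers and the `vm` object are illustrative, not from the paper.
    import urllib.request

    EXECUTABLE_TYPES = {"application/octet-stream", "application/x-msdownload"}

    def fetch(url):
        """Download a Web object, returning its content type and raw bytes."""
        with urllib.request.urlopen(url) as resp:
            return resp.headers.get_content_type(), resp.read()

    def is_executable(content_type, data):
        """Very rough check: a suspicious content type or a Windows PE 'MZ' header."""
        return content_type in EXECUTABLE_TYPES or data[:2] == b"MZ"

    def analyze(url, vm):
        """Install the object in a clean VM and check for spyware side effects.

        `vm` stands in for the study's virtual-machine harness and is assumed
        to expose snapshot, install/run and inspection operations.
        """
        content_type, data = fetch(url)
        if not is_executable(content_type, data):
            return None
        vm.restore_clean_snapshot()
        vm.install_and_run(data)
        return vm.detect_spyware_changes()  # e.g. new processes, registry keys, browser add-ons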

They also discuss the types of spyware found, such as adware, keyloggers, Trojan downloaders, browser hijackers and dialers.

Certain defense mechanisms against spyware, such as signature-based tools and blacklisting, were discussed in detail.
Signature-based anti-spyware tools are among the most common defenses. By comparing a database of spyware signatures against the files and processes running on a client computer, such a tool can detect when the computer is infected with known spyware programs.
Blacklisting: To keep spyware in check, blacklists are maintained of URLs or domains suspected of hosting spyware, which makes it easy for a firewall or proxy to block clients from accessing the listed sites.
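As a small illustration of the blacklisting idea, a proxy or firewall might check each requested URL against a domain blacklist roughly like this (the domains and the helper function are made up for the example):

    # Illustrative blacklist check at a proxy; the listed domains are invented.
    from urllib.parse import urlparse

    BLACKLISTED_DOMAINS = {"spyware-example.com", "bad-downloads.example.net"}

    def should_block(url):
        """Return True if the requested URL's host is a blacklisted domain or one of its subdomains."""
        host = urlparse(url).hostname or ""
        return any(host == d or host.endswith("." + d) for d in BLACKLISTED_DOMAINS)

    print(should_block("http://downloads.spyware-example.com/toolbar.exe"))  # True
    print(should_block("http://example.org/index.html"))                     # False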


The paper then goes on to explain drive-by download attacks: their infrastructure, where they originate, the infections they cause and their effect on the Firefox browser.
· A drive-by download is a program that is automatically downloaded to the user's computer, often without the user's consent or knowledge. It may, for example, replace the user's home page and change browser settings.
· It occurs when a victim visits a Web page with malicious content.
· They examined URLs from eight different Web categories and calculated the fraction of URLs and domains that were infectious in each. They found no drive-by download attacks on either “kids” or “news” sites, whereas the “pirate” sites saw the most attacks.


Glossary
· A web crawler is a program or automated script which browses the WWW in a methodical, automated manner. It is a type of bot or software agent. As the crawler visits URLs, it identifies all the hyperlinks on each page and adds them to the list of URLs to visit.
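A minimal sketch of that crawling loop in Python might look like the following; it assumes the requests and BeautifulSoup libraries and is not the crawler used in the study.

    # Toy breadth-first crawler: visit pages, collect hyperlinks, queue them.
    from collections import deque
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    def crawl(seed_url, max_pages=10):
        """Visit pages breadth-first, adding newly found hyperlinks to the to-visit queue."""
        frontier, seen, visited = deque([seed_url]), {seed_url}, 0
        while frontier and visited < max_pages:
            url = frontier.popleft()
            try:
                page = requests.get(url, timeout=5).text
            except requests.RequestException:
                continue
            visited += 1
            for a in BeautifulSoup(page, "html.parser").find_all("a", href=True):
                link = urljoin(url, a["href"])
                if link not in seen:        # avoid queuing the same URL twice
                    seen.add(link)
                    frontier.append(link)   # add the hyperlink to the list of URLs to visit
        return seen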

Honeypot Systems

A Virtual Honeypot Framework - Niels Provos

Let us understand the concept of a honeypot in real life. Imagine a pot of honey lying uncovered. What do you expect will happen? Naturally, bees will be attracted to it.

Here, the honey-filled pot is the dummy computer used to lure the 'bees', which can be anything malicious, from worms to spam mail. We want to study how these 'bees' are attracted to the honey to learn more about them and their behaviour. The more we know about something, the better our defences against it.

So the honeypot is not exactly a new invention; it has been widely used in computer security for a while now. Then you may ask, what is the point of this paper?

Well, the problem with physical honeypots is that they are expensive and time-consuming to set up and maintain. Hence the need for virtual honeypots - Honeyd.

Honeyd creates virtual honeypot networks. A virtual honeypot is simulated by another machine that responds to network traffic sent to it. The system simulates TCP and UDP services, and it simulates only the network-stack portion of the operating system rather than a full machine.
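To get a feel for the "simulated service" idea, here is a toy Python listener that answers TCP connections with a canned banner and logs who connected. Honeyd itself works at the packet level rather than over ordinary sockets, so this only conveys the flavour; the banner text and port number are made up.

    # Toy fake service: accept connections, log the peer, send a canned banner.
    import socket

    FAKE_BANNER = b"220 mail.example.com ESMTP ready\r\n"  # invented banner

    def fake_service(port=2525):
        """Accept connections and reply with a banner, logging every contact."""
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("0.0.0.0", port))
        srv.listen(5)
        while True:
            conn, addr = srv.accept()
            print("connection from", addr)  # on a honeypot, any contact is worth recording
            conn.sendall(FAKE_BANNER)
            conn.close()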

Design and Implementation:

i) Receiving Network Data: The architecture is arranged so that traffic for all the virtual honeypots is directed to the Honeyd host. Honeyd replies only to those network packets whose destination IP address belongs to one of the simulated honeypots.

ii) Architecture: It consists of several components - a configuration database, a central packet dispatcher, protocol handlers, a personality engine and an optional routing component. A rough sketch of the dispatch idea appears after this list.

iii) Personality Engine: Adversaries keep adapting and getting cleverer as detection techniques improve; a common fingerprinting tool like Xprobe or Nmap can probe a honeypot and establish that it is a dummy. We therefore need to simulate the network behaviour of a given operating system. The personality engine changes the options and other parameters of the TCP header so that replies match the behaviour of that operating system's network stack.

iv) Routing Topology: We need to simulate routing topologies to deceive adversaries. Honeyd creates a virtual routing topology, which can be split across machines using GRE tunnelling; this also allows load balancing.
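As promised above, here is a rough Python sketch of the dispatch idea: a configuration database keyed by virtual IP address, and a dispatcher that picks a protocol handler only for packets addressed to a configured honeypot. The data structures, personalities and handler names are illustrative, not Honeyd's own.

    # Illustrative dispatcher: look up the destination IP in the configuration
    # database and choose a protocol handler for the destination port.
    CONFIG_DB = {
        "10.0.0.5": {"personality": "Windows XP SP1", "tcp": {80: "fake_http", 139: "fake_netbios"}},
        "10.0.0.6": {"personality": "Linux 2.4.20",   "tcp": {22: "fake_ssh"}},
    }

    def dispatch(packet):
        """Ignore traffic for unknown IPs; otherwise return the handler for the port."""
        host = CONFIG_DB.get(packet["dst_ip"])
        if host is None:
            return None                             # not one of the virtual honeypots
        # The personality engine would also adjust TCP options here so that
        # replies match the fingerprint of host["personality"].
        return host["tcp"].get(packet["dst_port"])  # protocol handler, if any

    print(dispatch({"dst_ip": "10.0.0.5", "dst_port": 80}))   # 'fake_http'
    print(dispatch({"dst_ip": "10.0.0.9", "dst_port": 80}))   # None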

Potential applications for Honeyd include network decoys, detecting and countering worms, and spam prevention.

In conclusion, a regular PC can simulate thousands of honeypots using Honeyd, greatly reducing both the expense and the time involved.

Understanding the Network Level Behaviour of Spammers

A paper by Anirudh Ramachandran and Nick Feamster.

This paper studies the network-level behaviour of spammers, including the IP address ranges that send the most spam, common spamming modes, the persistence of each spamming host over time, and the characteristics of botnets that send spam. It is very descriptive in nature and gives good background on previous related work in this area.

The researchers studied and analyzed a 17-month trace of over 10 million spam messages collected from the Internet by creating what they call a "spam sinkhole". They then correlated this data with the results of IP-based blacklist lookups, routing information and botnet "command and control" traces.

Towards the goal of developing techniques for designing more robust network-level spam filters, the researchers attempt to characterize the network-level behaviour of spammers as observed at the "spam sinkhole" they created for this purpose, with complete logs of all spam received from August 2004 through December 2005.

The paper goes on to survey previous work in this field and explains spamming methods and the corresponding mitigation techniques. Reading these parts proves extremely useful, as one gets good insight into the topic being dealt with.

The paper then explains how data is collected by the "sinkhole". The researchers configured the sinkhole to accept all mail, regardless of the username it was destined for, and to gather network-level properties of the mail relay from which each spam message is received. The following information is collected about the mail relay once spam is received:

1. The IP address of the relay that established a connection to the sinkhole when the spam was received.

2. A traceroute to that IP address, to estimate the network location of the mail relay.

3. A passive "p0f" TCP fingerprint, based on the properties of the TCP stack, to determine the operating system of the mail relay.

4. The result of DNS blacklist (DNSBL) lookups for that mail relay's IP address at eight different blacklists. A small sketch of this kind of lookup follows this list.
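As an illustration only (not the authors' code), a per-relay record with DNS blacklist lookups might be assembled like this. The blacklist names below are common public lists given purely as examples, not necessarily the eight used in the paper, and the traceroute and p0f steps are left out.

    # Sketch of recording network-level properties of an incoming mail relay.
    import socket

    DNSBLS = ["zen.spamhaus.org", "bl.spamcop.net"]   # example lists only

    def dnsbl_listed(relay_ip, dnsbl):
        """Standard DNSBL check: reverse the IPv4 octets and look the name up."""
        query = ".".join(reversed(relay_ip.split("."))) + "." + dnsbl
        try:
            socket.gethostbyname(query)
            return True               # any A record means the relay is listed
        except socket.gaierror:
            return False              # NXDOMAIN: not listed

    def record_relay(relay_ip):
        """Collect a per-relay record along the lines of the list above."""
        return {
            "ip": relay_ip,
            "blacklisted_on": [bl for bl in DNSBLS if dnsbl_listed(relay_ip, bl)],
            # a traceroute and a passive p0f fingerprint would be added here
        }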

The paper then goes on to present the collected data in various formats and to derive a set of conclusions, which are listed as follows:

1. The vast majority of received spam arrives from a few concentrated portions of IP address space.

2. Most received spam is sent by Windows hosts. Most bots that send spam are active for a single period of less than two minutes.

3. A small set of spammers continually uses short-lived route announcements to remain untraceable.

Tuesday, February 13, 2007

What if a computer lies?

Merin George, Nishank Modi and Ankur Bhamri
(under guidance of Prof. L. Subramanian)

Welcome to our blog! Please register with blogspot so I can add you to the list of people who can contribute here.

Cheers.