Abstract— According to
AV vendors, malicious software has been growing exponentially in recent years. One of
the main reasons for these high volumes is that, in order to evade detection,
malware authors have started using polymorphic and metamorphic techniques. As a
result, traditional signature-based approaches to malware detection are
insufficient against new malware, and the categorization of malware samples has
become essential for understanding malware behavior and for fighting back against
cybercriminals. During the last decade, solutions that fight malicious
software have begun using machine learning approaches. Unfortunately, there are
few open-source datasets available to the academic community. One of the
biggest datasets available was released last year in a competition hosted on
Kaggle, with data provided by Microsoft for the Big Data Innovators Gathering.
This paper presents two novel and scalable approaches using LeNet-like
Convolutional Neural Networks (CNNs) to assign malware to its corresponding family.


Keywords— polymorphism,
convolution, activation, Drebin

Introduction

The interconnectivity, accessibility and open nature of the IT industry have proved to be a
boon for both developers and users, but they come with threats as well. The
most significant one is the spread of malware. Malware, short for
malicious software, is any software application that can infiltrate a system
and access or damage resources without the owner’s consent. It is a
generic term that covers viruses, worms, Trojan horses, spyware, etc.

Adware –
Malware that automatically shows advertisements to the user.

Virus – Software that can harm a computer by automatically creating
copies of itself. Viruses can be sent through e-mail.

Worm –
Worms are sent over networks. They tend to self-replicate and
spread independently, whereas viruses spread only when the user
takes part in the activity.

– Programs that bypass login credentials without being detected
by the owner. One or more programs can be installed into the system for future

The potential harm that malware can cause requires anti-malware
authors to stay a step ahead of malware authors. This paper describes the use
of a LeNet-like convolutional neural network for malware detection. Malware
detection is the task of distinguishing a malicious application from a benign
one. Because there are many categories of malware, malware classification is
also important.
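To make the term "LeNet-like" concrete, the sketch below runs one convolution, activation and pooling stage in plain Python. It is only an illustration under stated assumptions: treating a file's bytes as a grayscale grid is an assumption not made in this section, the helper names (`bytes_to_grid`, `conv2d_valid`, `relu`, `max_pool2x2`) are hypothetical, and the kernel is hand-picked rather than learned. A LeNet-like network stacks a few such stages and ends with fully connected layers.

```python
def bytes_to_grid(data: bytes, width: int):
    """Arrange bytes row by row into a 2-D grid scaled to [0, 1].

    Assumption: the file is viewed as a grayscale image; the last row
    is zero-padded so every row has `width` values.
    """
    padded = data + bytes((-len(data)) % width)
    return [[padded[r * width + c] / 255.0 for c in range(width)]
            for r in range(len(padded) // width)]


def conv2d_valid(grid, kernel):
    """Slide the kernel over the grid without padding ('valid' mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(grid) - kh + 1
    out_w = len(grid[0]) - kw + 1
    return [[sum(grid[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Activation: replace negative responses with zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 window."""
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


# Toy input: 36 bytes laid out as a 6x6 grid (not a real binary).
grid = bytes_to_grid(bytes(range(36)), width=6)

# Hand-picked kernel that responds to left-to-right intensity changes.
kernel = [[-1.0, 0.0, 1.0]] * 3

feature_map = relu(conv2d_valid(grid, kernel))   # 4x4 feature map
pooled = max_pool2x2(feature_map)                # 2x2 after pooling
print(len(feature_map), len(pooled))
```

In a trained network the kernel values are learned from labeled samples, and many kernels run in parallel to produce several feature maps per stage.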

Challenges in malware detection

At present, malware is detected mainly by signature-based methods, a process
that antivirus vendors have used for many years. A malware signature is an
algorithm or hash that identifies a specific type of malware. Once the
malware is identified, it is easy to identify its family, but attackers use
polymorphic and metamorphic engines to stay a step ahead of anti-virus
developers. The lack of open-source malware datasets poses a great challenge,
since the success of a machine learning algorithm largely depends on the size
of the dataset used. New malware is injected into systems with every tick of
the clock. Malware detection suffers from a problem akin to that of virus
detection in biological systems: files look different but actually belong
to the same family. Malware authors use polymorphism to modify the same
binary file so that its variants look completely different, which makes
traditional techniques insufficient. Another challenge is the large
number of files that need to be investigated for proper detection, which
demands high computational efficiency.
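The weakness of signature matching against polymorphism can be illustrated with a toy example. The sketch below assumes the simplest possible signature, a SHA-256 hash of the whole file; real antivirus signatures are more elaborate, and the byte strings and the helper names `sha256_signature` and `is_known_malware` are hypothetical placeholders, not real malware or a real product's API.

```python
import hashlib


def sha256_signature(data: bytes) -> str:
    """Return the SHA-256 hex digest, used here as a toy signature."""
    return hashlib.sha256(data).hexdigest()


def is_known_malware(data: bytes, signature_db: set) -> bool:
    """Flag a file only if its exact hash is in the signature database."""
    return sha256_signature(data) in signature_db


# Hypothetical sample bytes (placeholders, not real malware).
original = b"\x90\x90\xeb\xfe" + b"PAYLOAD"

# Polymorphic-style variant: same payload with one junk byte inserted.
variant = b"\x90\x90\x90\xeb\xfe" + b"PAYLOAD"

# The vendor has only seen the original sample.
signature_db = {sha256_signature(original)}

original_detected = is_known_malware(original, signature_db)
variant_detected = is_known_malware(variant, signature_db)
print(original_detected, variant_detected)  # True False
```

A single inserted byte is enough to change the hash, so the variant goes undetected even though its payload is unchanged. This is the gap that learned classifiers, which generalize over family-level patterns, aim to close.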

