In recent years, the growth of AI and ML has been unprecedented. Recent studies predict rapid adoption of AI across all businesses and estimate that the AI market will reach approximately $13 trillion by 2030.
To define the terms, artificial intelligence refers to a machine’s ability to simulate human intelligence. Machine learning is an application of AI that builds mathematical models from training data, rather than relying on explicit programming, and leverages these models to make predictions when new data is supplied.
For instance, you can train a computer-vision ML model using hundreds of image samples that have been labeled by humans, so that it is able to automatically classify objects in new images.
Though the principles of AI and ML have been around for decades, it is only in the past couple of years that they have garnered widespread attention. The recent surge in popularity can be attributed to two main factors: the growth of cloud computing and the availability of big data. Cloud computing has made it practical to run AI/ML algorithms at scale, and big data has enhanced their effectiveness, enabling them to outperform humans in a number of applications.
The reality of AI in cyber security
Cyber security is one of the most promising areas for AI/ML. Theoretically, if the machine is able to access all your data, both good and bad, then it can be trained to detect any anomaly or malware as soon as it surfaces. Practically, there are three core requirements for this to work.
- Availability of massive amounts of data – Your model’s effectiveness depends on the volume of benign and malware data that you have.
- Data pipeline – You must have data engineers and scientists who are able to create a data pipeline for continuously processing the samples and to design effective models.
- Providing insights – You must have security domain specialists who are able to categorize what is good or bad and to explain why that is the case.
Most organizations that boast of AI/ML-powered security solutions are lacking in one or more of these requirements.
Getting into the process
A fundamental principle of security is defense in depth. Defense in depth means having a number of security layers and not depending on one single technology such as AI/ML. The recent hype around new AI/ML-enabled endpoint security products touts that they can do it all. However, if you want to shield a user from cyber fraud, you must ensure that all user-accessed content is scanned and that their system is updated regularly. Scanning each and every file before allowing a download requires the ability to intercept SSL-encrypted communications between the destination server and the user’s client. Otherwise, the scanner will be blind to the content. As scanning every file can consume a lot of time, it can introduce latency and degrade the user experience. Rapidly blocking known threats and permitting content that is already whitelisted is an effective way to balance user experience with security.
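The fast path described above, where known threats are blocked and whitelisted content is allowed before anything slower runs, can be sketched roughly as follows. This is a minimal illustration and not any vendor's implementation: the hash sets, the `triage` function and the `placeholder_deep_scan` stub are all hypothetical names, and a real deployment would draw its lists from threat-intelligence feeds.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


# Hypothetical lists for illustration only; real lists would come from
# threat-intelligence feeds and an organisation's own whitelist.
KNOWN_BAD = {sha256_hex(b"sample-known-malware")}
WHITELISTED = {sha256_hex(b"sample-trusted-installer")}


def triage(content: bytes, deep_scan) -> str:
    """Give a fast verdict for known content, otherwise defer to a slower layer."""
    digest = sha256_hex(content)
    if digest in KNOWN_BAD:
        return "block"          # known threat: block immediately
    if digest in WHITELISTED:
        return "allow"          # already whitelisted: no added latency
    return deep_scan(content)   # unknown content: hand over to the next layer


def placeholder_deep_scan(content: bytes) -> str:
    """Stand-in for a slower layer such as sandboxing or an AI/ML model."""
    return "needs-analysis"


print(triage(b"sample-known-malware", placeholder_deep_scan))
print(triage(b"sample-trusted-installer", placeholder_deep_scan))
print(triage(b"never-seen-before", placeholder_deep_scan))
```

Only the third file reaches the slower layer, which is the point of the design: the expensive analysis is reserved for content that no list can settle.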
After known threat intelligence has been applied and no conclusion has been reached, we enter the space of zero-day threats, which are essentially unknown threats. These zero-day threats do not have recognizable signatures, so sandboxing is used to scan them. Sandboxing consists of installing a suspicious file in a virtual machine sandbox that imitates the end user’s system and then determining whether the file is good or bad based on the behaviour observed. This process can take several minutes, and today’s users want quick results and loathe waiting. With a properly trained AI/ML model, results for such files can be obtained in milliseconds. Most new attacks use exploit kits, and they might borrow exfiltration and delivery techniques from previous attacks. AI/ML models can be trained to identify these polymorphic variants.
One critical consideration when using AI/ML to detect malware is the ability to offer a reasonable explanation of why a particular sample was classified as malicious. For instance, if a customer demands an explanation of why a sample was blocked, the answer cannot be along the lines of ‘AI/ML said so’. It is important to have a security domain expert who is able to understand which behaviours/attributes were triggered and who is also able to analyze false positives/negatives. This is required not only to understand why a certain prediction was made, but also to continuously improve the accuracy of the model’s predictions.
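To make the idea of an explainable verdict concrete, here is a small sketch that assumes a simple linear (logistic) scoring model, where each triggered behaviour contributes a fixed weight to the score. The feature names, weights and bias are invented for illustration and do not come from a trained model; real products may use very different models and explanation methods.

```python
import math

# Hypothetical behavioural features with hand-picked weights (illustration only).
WEIGHTS = {
    "writes_to_startup_folder": 2.0,
    "disables_security_tools": 2.5,
    "contacts_unknown_domain": 1.5,
    "has_valid_signature": -3.0,
}
BIAS = -2.0


def malicious_probability(features: dict) -> float:
    """Logistic score: the weighted sum of features squashed into 0..1."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def explain(features: dict) -> list:
    """List the triggered features and their contributions, largest first."""
    contributions = [
        (name, WEIGHTS[name] * value)
        for name, value in features.items()
        if value
    ]
    return sorted(contributions, key=lambda item: abs(item[1]), reverse=True)


sample = {
    "writes_to_startup_folder": 1,
    "disables_security_tools": 1,
    "contacts_unknown_domain": 0,
    "has_valid_signature": 0,
}

print(round(malicious_probability(sample), 3))
for name, contribution in explain(sample):
    print(f"{name}: {contribution:+.1f}")
```

With output like this, an analyst can tell the customer which behaviours drove the verdict, and can also spot a weight that looks wrong when investigating a false positive.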
Training AI/ML models
When we talk about the training of AI/ML models, the debate is about whether to use supervised or unsupervised learning. Supervised learning uses labeled data to build a prediction model. For malware, this means that human specialists take each sample in a data set, label it as good or bad, and perform feature engineering to find out which malware features are relevant to the prediction model before training. In unsupervised learning, the model finds patterns and structure in data that is not categorized or labeled. The proponents of unsupervised learning claim that this type of learning is free from feature-selection bias and is not limited by the confines of human classification. The usefulness of unsupervised learning, however, remains to be validated.
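The difference between the two approaches can be shown with a toy example. The sketch below assumes each sample has already been reduced to two numeric features (the values are made up), and uses a nearest-centroid classifier for the supervised case and a basic k-means loop for the unsupervised case. Both were chosen for brevity and are not what a production malware model would necessarily use.

```python
from math import dist  # available in Python 3.8+


def centroid(points):
    """Average a list of equal-length feature tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def train_supervised(samples, labels):
    """Supervised: human-provided labels decide which samples form each class."""
    grouped = {}
    for point, label in zip(samples, labels):
        grouped.setdefault(label, []).append(point)
    return {label: centroid(points) for label, points in grouped.items()}


def predict(model, point):
    """Assign the label whose class centroid is closest to the point."""
    return min(model, key=lambda label: dist(model[label], point))


def cluster_unsupervised(samples, k=2, rounds=10):
    """Unsupervised: basic k-means that groups samples without any labels."""
    centers = list(samples[:k])  # naive initialisation from the first k samples
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for point in samples:
            nearest = min(range(k), key=lambda i: dist(centers[i], point))
            groups[nearest].append(point)
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return [min(range(k), key=lambda i: dist(centers[i], p)) for p in samples]


# Made-up feature vectors; the labels exist only for the supervised case.
samples = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9)]
labels = ["good", "bad", "good", "bad"]

model = train_supervised(samples, labels)
print(predict(model, (0.15, 0.1)))
print(cluster_unsupervised(samples))
```

Note that the unsupervised result is only a set of cluster numbers. Something or someone still has to decide which cluster means "bad", which is one reason security domain expertise remains necessary.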
The best security areas where AI/ML can help
There are some kinds of security challenges that suit AI/ML more than others. Consider phishing detection, which has significant visual components. Using images, logos and other ‘look and feel’ elements, an adversary can make a fake website resemble its legitimate counterpart. Advancements in AI/ML vision algorithms have made it possible to identify fake websites that are designed to cheat unsuspecting users.
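As a rough illustration of the visual side of phishing detection, the sketch below compares a page's logo with a known brand logo and flags pages that look the same but are served from a different domain. It deliberately swaps in a simple average-hash comparison rather than a learned vision model, so it is a stand-in for the idea and not the technique itself. The tiny 4x4 grayscale grids, the domain names and the `looks_like_spoof` helper are all hypothetical.

```python
def average_hash(pixels):
    """Return 1 for each pixel brighter than the image mean, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(a, b):
    """Count the positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def looks_like_spoof(page_pixels, page_domain, brand_pixels, brand_domain,
                     max_distance=2):
    """Flag a page that resembles the brand visually but uses another domain."""
    distance = hamming_distance(average_hash(page_pixels),
                                average_hash(brand_pixels))
    return distance <= max_distance and page_domain != brand_domain


# Made-up 4x4 grayscale "logos" (0 = black, 255 = white).
BRAND_LOGO = [[200, 200, 10, 10], [200, 200, 10, 10],
              [10, 10, 200, 200], [10, 10, 200, 200]]
LOOKALIKE = [[190, 210, 20, 5], [205, 195, 15, 12],
             [8, 30, 180, 220], [12, 9, 210, 190]]
UNRELATED = [[10, 200, 10, 200], [200, 10, 200, 10],
             [10, 200, 10, 200], [200, 10, 200, 10]]

print(looks_like_spoof(LOOKALIKE, "bank-login.example", BRAND_LOGO, "bank.example"))
print(looks_like_spoof(UNRELATED, "shop.example", BRAND_LOGO, "bank.example"))
```

A trained vision model would be far more robust to resizing, recolouring and partial changes than this hash, but the decision logic is similar: visual resemblance combined with a mismatched origin is the suspicious signal.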
Additionally, AI/ML algorithms can be employed to detect anomalous user behavior by learning what constitutes a user’s normal behavior and then flagging any notable deviation from that norm.
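A very small version of this idea is a statistical baseline: record a user's normal activity, then flag values that sit far outside it. The sketch below assumes a single metric (daily download volume in MB, with made-up numbers) and a z-score threshold of 3. Real systems would track many signals and typically use richer models.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarise normal behaviour as the mean and sample standard deviation."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu  # no variation seen before: any change is notable
    return abs(value - mu) / sigma > threshold


# Hypothetical daily download volumes (MB) for one user over a week.
history = [120, 135, 110, 128, 140, 125, 118]
baseline = build_baseline(history)

print(is_anomalous(130, baseline))
print(is_anomalous(900, baseline))
```

The threshold controls the trade-off between missed anomalies and false alarms, which again calls for someone who understands both the data and the security context.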
When an AI/ML model is trained effectively under the guidance of data scientists and cyber security experts, it can prove to be a valuable addition to the cyber security defense-in-depth armoury. However, today we are still miles from declaring AI/ML the ultimate remedy for curbing all cyber fraud.