Cyber Threat Intelligence information gathering without vendors

Deividas was one of the presenters at our SOCshare Community Meetup #3 in June this year. Here’s a brief overview of the insights and recommendations he shared.

Hello, I’m Deividas and I lead and represent a dynamic Cyber Threat Intelligence Research team at Nord Security, specifically for the NordVPN Threat Protection product. My Cyber Threat Intelligence Research team is part of the Threat Intelligence team, which includes Python engineers and data scientists who build advanced machine learning models to protect our customers using Threat Protection within the NordVPN app. Our mission as Cyber Threat Intelligence Researchers is to protect our customers from the ever-evolving landscape of cyber threats, specifically phishing, malware, fraudulent websites and vulnerable applications.

In pursuit of this mission, my team and I are constantly exploring innovative ways to collect, analyse and respond to cyber threat intelligence. Traditionally, organisations have relied heavily on third-party vendors for their CTI needs. While we have strategic partners, we also believe in the power of self-reliance and open source tools (along with our own) to gather meaningful intelligence. This approach allows us to reduce vendor dependency and tailor our intelligence gathering to our specific needs.

What is Cyber Threat Intelligence (CTI)?

Cyber Threat Intelligence is the collection and analysis of information about potential threats and threat actors targeting an organisation or, in our case, our customers. This intelligence enables organisations to anticipate, prevent and respond to cyber threats. In our case, it ensures that our users are protected from hidden dangers that they may not be aware of, providing them with peace of mind and security.

1. Sources of CTI

In order to effectively gather Cyber Threat Intelligence (CTI), it’s important to tap into a variety of sources. Each source offers unique insights that, when combined, can provide a comprehensive understanding of the threat landscape in which you find yourself. Here are the key sources of CTI that my team and I use, and that you could use as well:

1.1 Open Source Intelligence (OSINT)

Open Source Intelligence refers to the process of collecting data from publicly available sources. These can include:

a) News articles and blog posts

News articles and blog posts often contain actionable Indicators of Compromise (IOCs) and Indicators of Attack (IOAs, such as attack vectors). You can use them to block malicious URLs, domains, IPs and file hashes that could target your organisation.
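As a rough illustration of pulling IOCs out of an article, here is a minimal sketch that uses only regular expressions. The function names (refang, extract_iocs) and the patterns are my own assumptions for this example, not part of any library; the patterns are deliberately simple and will produce some false positives, so treat the output as candidates to verify.

```python
import re

# Deliberately simple patterns (assumptions for this sketch, not a full IOC grammar).
URL_RE = re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE)
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")
MD5_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")


def refang(text):
    """Undo common 'defanging' (hxxp, [.]) that reports use to make IOCs unclickable."""
    text = re.sub(r"hxxp", "http", text, flags=re.IGNORECASE)
    return text.replace("[.]", ".").replace("(.)", ".").replace("[:]", ":")


def extract_iocs(text):
    """Return de-duplicated, sorted IOC candidates found in a piece of text."""
    clean = refang(text)
    return {
        "urls": sorted(set(URL_RE.findall(clean))),
        "ipv4": sorted(set(IPV4_RE.findall(clean))),
        "sha256": sorted(set(SHA256_RE.findall(clean))),
        "md5": sorted(set(MD5_RE.findall(clean))),
    }


if __name__ == "__main__":
    article = "Payload hosted at hxxps://bad-example[.]com/a.exe, C2 at 203.0.113[.]7 today."
    print(extract_iocs(article))
```

Refanging first means the same patterns work on both plain and defanged reports. A URL at the very end of a sentence may pick up trailing punctuation, so a clean-up step is worth adding before blocking anything.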

b) Social media platforms

Monitoring platforms such as X (formerly Twitter), LinkedIn and Reddit can reveal emerging threats and trends in real time.

c) Public forums

Sites such as Stack Overflow and GitHub issues can provide insight into vulnerabilities and exploits being discussed by the community.

1.2 Dark web intelligence

The Dark Web is a hidden part of the Internet where a lot of illicit activity takes place. By monitoring dark web forums and marketplaces, we can get early warnings of:

a) Data breaches

Information about compromised accounts, databases and personal data. It could tell you if your competitor has recently been breached, who you should be looking out for, or if a colleague has been phished and entered their credentials where they shouldn’t have.

b) Malware sales

Details of the latest malware and exploit kits being sold or discussed. For example, RedLine InfoStealer is currently being sold at a relatively low price, which makes a rise in RedLine InfoStealer infections more likely.

c) Threat actor communication

Interactions between cybercriminals planning or executing attacks. As a dedicated CTI professional, you may even be able to infiltrate these groups to gather valuable intelligence.

1.3 Network and endpoint logs

Internal network and endpoint logs are a goldmine for identifying potential threats. Here’s why:

a) Firewall logs: Show attempts to access or attack your network. From there you can try to pivot through a potential attacker’s entire infrastructure.

b) IDS/IPS logs: Record suspicious activity and potential intrusions.

c) Endpoint security logs: Capture information from antivirus software, endpoint detection and response (EDR) solutions, and other endpoint protection tools.

Using these logs, you can pivot to potentially malicious URLs, domains, IPs and hashes, and block them on your end before they hit you.
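To make the firewall log idea concrete, here is a minimal sketch that counts denied connections per source IP so that repeat offenders can be investigated or blocked. The log line format, the function name top_denied_sources and the min_hits threshold are all assumptions made for this example; real firewalls log in their own formats, so the pattern has to be adapted.

```python
import re
from collections import Counter

# Hypothetical log line format (an assumption; adapt the pattern to your firewall):
# 2024-06-01T12:00:00Z DENY TCP 198.51.100.23:51512 -> 10.0.0.5:22
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<action>ALLOW|DENY)\s+(?P<proto>\S+)\s+"
    r"(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<src_port>\d+)\s+->\s+"
    r"(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<dst_port>\d+)$"
)


def top_denied_sources(lines, min_hits=3):
    """Count DENY events per source IP; return (ip, count) pairs with count >= min_hits."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Lines that do not fit the assumed format are silently ignored.
        if match and match.group("action") == "DENY":
            hits[match.group("src_ip")] += 1
    return [(ip, count) for ip, count in hits.most_common() if count >= min_hits]


if __name__ == "__main__":
    sample = [
        "2024-06-01T12:00:00Z DENY TCP 198.51.100.23:51512 -> 10.0.0.5:22",
        "2024-06-01T12:00:01Z DENY TCP 198.51.100.23:51513 -> 10.0.0.5:23",
        "2024-06-01T12:00:02Z DENY TCP 198.51.100.23:51514 -> 10.0.0.5:3389",
        "2024-06-01T12:00:03Z ALLOW TCP 192.0.2.1:40000 -> 10.0.0.5:443",
    ]
    print(top_denied_sources(sample))
```

The resulting IPs are only starting points: checking them against reputation sources before blocking helps avoid cutting off legitimate traffic.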

1.4 Honeypots

A honeypot is a decoy system designed to attract cyber attackers. Running and monitoring honeypots can provide insight into:
a) Attack techniques: Understand how attackers might exploit vulnerabilities.
b) Malware samples: Collect malware samples for analysis, research and blocking. If you have a dedicated reverse engineer, this is especially valuable. If not, simply blocking the hashes can be very effective.
c) Behavioural patterns: Analyse the tactics and behaviour patterns of attackers.
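
For the "simply blocking the hashes" option, here is a minimal sketch that computes SHA-256 hashes for every file a honeypot has captured. The function names and the idea of a single samples directory are assumptions for this example; how the samples get there depends on the honeypot you run.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large samples do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_hash_blocklist(sample_dir):
    """Return a sorted list of unique SHA-256 hashes for all files under sample_dir."""
    hashes = {sha256_of_file(p) for p in Path(sample_dir).rglob("*") if p.is_file()}
    return sorted(hashes)


if __name__ == "__main__":
    # "captured_samples" is a hypothetical directory name.
    for file_hash in build_hash_blocklist("captured_samples"):
        print(file_hash)
```

The files are only read, never executed, but captured samples should still be handled on an isolated analysis machine.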

2. Collaboration and sharing

Collaboration with other organisations and participation in information sharing communities could enhance CTI efforts (I am really grateful that we have Lithuanian information sharing meetups).

3. Tools and Techniques for gathering CTI

Once you have identified potential sources of intelligence, various tools and techniques can be used to gather and analyse this information. Here are my preferred methods:

  • Web Scraping: Extracting data from websites using Python with BeautifulSoup or Scrapy, combined with proxies.
  • APIs: Using APIs from threat intelligence platforms to automate data collection (there are many free APIs available for gathering CTI data, e.g. AlienVault OTX).
  • Social Media Monitoring: Using tools like TweetDeck (now X Pro) or custom Python scripts to track relevant keywords or hashtags. For example, if your competitors are being targeted by the Threat Actor (TA) BlackCat, you can use relevant keywords and hashtags associated with TA BlackCat to find IOCs, attack vectors, and more.
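
The keyword tracking idea from the last bullet can be sketched as a small filter over posts that have already been collected. The watchlist, the function names and the sample posts below are assumptions for illustration; fetching the posts themselves depends on the platform and its terms of use, so it is left out.

```python
import re

# Hypothetical watchlist: keywords and hashtags tied to an actor of interest.
WATCHLIST = ["BlackCat", "ALPHV", "#ransomware"]


def build_matcher(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole token."""
    escaped = "|".join(re.escape(keyword) for keyword in keywords)
    return re.compile(rf"(?<!\w)(?:{escaped})(?!\w)", re.IGNORECASE)


def filter_posts(posts, keywords):
    """Return (post, matched_keywords) pairs for posts that mention any keyword."""
    matcher = build_matcher(keywords)
    results = []
    for post in posts:
        found = sorted({match.lower() for match in matcher.findall(post)})
        if found:
            results.append((post, found))
    return results


if __name__ == "__main__":
    collected = [
        "New #Ransomware wave: ALPHV affiliates hit healthcare",
        "Our cat is black and very friendly",
    ]
    for post, keywords in filter_posts(collected, WATCHLIST):
        print(keywords, "->", post)
```

Matching whole tokens keeps unrelated words that merely contain a keyword from triggering an alert.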

Example of Gathering CTI with Python

Let’s take a look at a practical example of how to gather CTI using Python. For this example I’ll use web scraping to collect data from a “Fake Security Blog News Website”.

Requirements:

  • Python 3.x
  • Libraries: requests, beautifulsoup4, pandas

This is just an example script (nothing advanced). Each script needs to be adapted to the structure of the blog it targets.

import requests
from bs4 import BeautifulSoup
import pandas as pd


def fetch_blog_posts(url):
    """Fetch a blog index page and return a list of post dictionaries."""
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')
        posts = []

        # Assumes each post sits in an <article> with <h2>, <time>, <p> and <a> inside
        for item in soup.find_all('article'):
            title = item.find('h2')
            date = item.find('time')
            summary = item.find('p')
            link = item.find('a')

            # Skip articles that are missing any of the expected elements
            if any(element is None for element in (title, date, summary, link)):
                continue

            posts.append({
                'title': title.text.strip(),
                'date': date.get('datetime'),
                'summary': summary.text.strip(),
                'link': link.get('href')
            })

        return posts
    else:
        print(f"Failed to retrieve content from {url}")
        return []


# URL of the cyber threat blog to scrape
blog_url = 'https://fakesecurityblognewswebsite.com'
posts = fetch_blog_posts(blog_url)

# Save the data to a CSV file
if posts:
    df = pd.DataFrame(posts)
    df.to_csv('cyber_threat_blog_posts.csv', index=False)
    print("Data saved to cyber_threat_blog_posts.csv")
else:
    print("No data to save.")

Explanation

  • Fetching blog posts: The fetch_blog_posts function takes a URL as input, makes an HTTP request to that URL, and parses the HTML content using BeautifulSoup.
  • Data extraction: It extracts the title, date, summary and links for each blog post.
  • Data storage: The extracted data is stored in a list of dictionaries, converted to a Pandas DataFrame and saved as a CSV file.

You can modify this script to follow each post’s URL, parse the HTML content of that page and look for keywords such as BlackCat or ALPHV to identify IOCs and attack vectors. If a post mentions neither BlackCat nor ALPHV, just skip it. The IOCs you find can then be checked automatically for a maliciousness score via VirusTotal, MalwareBazaar, AbuseIPDB or other sources that provide verdicts on IOCs.
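
As a hedged sketch of the verdict check, the example below queries the VirusTotal v3 domain endpoint and turns the last_analysis_stats counters into a block or no-block decision. The function names and the threshold of 3 flagged engines are assumptions of mine, not a VirusTotal recommendation, and you need your own API key. It uses only the standard library so it runs without extra dependencies.

```python
import json
import urllib.request

VT_DOMAIN_URL = "https://www.virustotal.com/api/v3/domains/{domain}"


def fetch_domain_report(domain, api_key, timeout=15):
    """Fetch the raw VirusTotal report for a domain (requires a valid API key)."""
    request = urllib.request.Request(
        VT_DOMAIN_URL.format(domain=domain),
        headers={"x-apikey": api_key},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def verdict_from_report(report, threshold=3):
    """Sum malicious and suspicious engine counts and compare against a threshold.

    The threshold is an arbitrary assumption; tune it to your own tolerance
    for false positives.
    """
    stats = report.get("data", {}).get("attributes", {}).get("last_analysis_stats", {})
    flagged = stats.get("malicious", 0) + stats.get("suspicious", 0)
    return {"flagged": flagged, "block": flagged >= threshold}


if __name__ == "__main__":
    # Offline demonstration with a hand-made report in the expected shape.
    fake_report = {
        "data": {"attributes": {"last_analysis_stats": {"malicious": 5, "suspicious": 1}}}
    }
    print(verdict_from_report(fake_report))
```

Keeping the network call separate from the scoring logic makes it easy to swap in another verdict source, or to test the decision rule without spending API quota.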

In conclusion, collecting CTI without relying on vendors is feasible. It requires a lot of information, know-how and hands-on people (developers, a research team and so on), but in the end it is beneficial for many organisations. By using open source tools and resources, security teams can gain valuable insight while maintaining control over their CTI processes. The provided Python script is just a starting point; numerous other tools and techniques can be integrated to build a comprehensive CTI framework. As cyber threats evolve, so must our methods for gathering and analysing information to ensure we stay ahead of potential adversaries.

Feel free to adapt and extend the Python script to suit your specific needs, and explore various sources and tools to create a comprehensive CTI strategy.

SOCshare is part-funded by the European Union. The views and opinions expressed are those of the authors alone and do not necessarily reflect the views and opinions of the European Union or the European Cyber Security Centre of Excellence. Neither the European Union nor the European Cyber Security Centre of Excellence can be held responsible for them.
