The OSINT Newsletter - Issue #99
Offline OSINT: Local Search Tools and Methods
👋 Welcome to the 99th issue of The OSINT Newsletter. This issue contains OSINT news, community posts, tactics, techniques, and tools to help you become a better investigator. Here's an overview of what's in this issue:
How to search large datasets locally
Command-line search methods
Pro tools for processing structured data
…and everything you need to know about analysing large files.
💪 If you missed the last newsletter, here's a link to catch up.
⚡ Collecting Information from Local Sources in an OSINT Investigation
🎙️ If you prefer to listen, here's a link to the podcast instead.
Let's get started. ⬇️
Offline OSINT: Local Search Tools and Methods
Not all OSINT happens on the internet. Sometimes the most valuable insights come from something you've already downloaded, and every OSINT investigator has heaps of exported spreadsheets and datasets on file to work with. But when you're archiving everything, it's easy for your collection of documents - or even the size of the datasets themselves - to get huge.
But processing data with the wrong tools can be a real drag. If you've ever tried to open a 3GB CSV file in Excel, you already know the pain. Standard office tools simply weren't built for investigative-scale datasets - and that's where local device search tools come in.
Letâs get into local search.
What is Local Search?
OSINT investigators often end up working with big datasets. Breach dumps, scrapes, exports and archives can mount up, with a single file easily containing millions of rows. A rookie investigator will usually try to open these with traditional spreadsheet software (think Microsoft Excel), only to find it crashes instantly or slows to a stop. Searching through a dataset that way is even more of a struggle: possible, but extremely painful.
Local device search tools are made to solve this problem. They scan the files directly, without loading everything into memory and making themselves sluggish. Instead of manually scrolling through data, you can extract exactly what you need in seconds - like pulling from a digital library catalog, rather than searching shelf-by-shelf.
Searching vs. Processing: How to Handle Large Files
The tools we're about to talk about are all naturals at searching big files. But what if you want to do more than just search? Then you need processing power. If you want to:
Extract all email domains from a breach file
Identify the most common usernames in a dataset
Count how many times a specific organisation appears
Separate valid data from corrupted rows
then search alone won't cut it. Luckily, command-line processing tools excel at these tasks because they're designed for automation and scale. Many investigators even combine the tools we're about to discuss; mixing and matching methods and modules lets you build data-processing pipelines that perfectly fit your needs.
For example, you might search for a keyword with grep, then use awk to count the matches. If it sounds like we're talking nonsense… let's learn what the grep we're on about.
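For the curious, that grep-then-awk pipeline looks like this - a sketch with made-up data, using only standard options of both tools:

```shell
# hypothetical three-line dataset standing in for a real export
printf 'jdoe@example.com\nasmith@example.org\njdoe@example.com\n' > sample.txt

# grep pulls the matching lines; awk counts how many it received
grep 'jdoe' sample.txt | awk 'END {print NR}'   # → 2
```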
grep: The Text Search Tool
grep (short for global regular expression print) is one of the most popular local device search tools in the OSINT community. It's a Unix command-line search that runs locally on your device; grep scans text files for matching patterns and returns every line containing your query.
It's fast, simple, and extremely powerful when working with large text-based datasets. The perfect way to surface those pesky data points when they're swamped. Use grep to search files for:
Email addresses
Phone numbers
Domain names
Usernames
Keywords related to your investigation
For example, if you wanted to search a breach file for a particular email address, grep could scan millions of rows for it almost instantly.
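A minimal sketch of that lookup, using a hypothetical two-line stand-in for a breach dump (the flags are standard grep options):

```shell
# tiny stand-in for a breach dump (made-up data)
printf 'alice@example.com:hunter2\nbob@example.org:letmein\n' > breach.txt

# -F treats the query as a fixed string (no regex surprises),
# -n prefixes each hit with its line number
grep -Fn 'alice@example.com' breach.txt   # → 1:alice@example.com:hunter2
```

On a real multi-gigabyte dump the command is identical; grep streams through the file line by line, so memory use stays flat.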
On top of this, grep can also do pattern matching. This means you can search for entire categories of data, too, as well as exact words; any email address ending in a particular domain, for instance. Because it reads line-by-line rather than loading files fully, grep can comfortably handle big datasets that would blow up normal apps.
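For instance, matching every address on a given domain rather than one exact string - a sketch with hypothetical data:

```shell
printf 'alice@example.com:pw1\nbob@example.org:pw2\ncarol@example.org:pw3\n' > breach.txt

# -E enables extended regexes; -c counts matching lines instead of
# printing them. The pattern matches any address at example.org.
grep -Ec '[[:alnum:]._%+-]+@example\.org' breach.txt   # → 2
```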
csvkit: Making Sense of Spreadsheets
Most OSINT datasets are stored as CSV files. CSV stands for "comma-separated values," and it's one of the most common formats for structured data exports. Breach databases, scraped content, and research datasets are frequently distributed this way. Usually, CSV means spreadsheets; but even programs that don't seem like spreadsheet apps will often offer CSV as an output file type.
But CSV files grow big, fast. To deal with this, you need a tool specially designed for CSVs - one that works without opening them and overloading your machine. csvkit is such a tool; it works from the command line to search, filter, and analyse spreadsheets without ever opening them. Instead of scrolling through millions of rows, you can:
View column headers instantly
Filter rows based on conditions
Extract specific columns
Convert files into other (more manageable) formats
For example, if a sheet has three columns full of usernames, IPs, and emails, csvkit allows you to isolate just the column you need and ignore the rest. This makes it much easier to work through each data point methodically without getting distracted.
More Tools for Local Data
Beyond grep and csvkit, several other lower-case-named tools are popular in pro OSINT workflows. They might have a disregard for grammar rules, but they're great at handling big datasets - searching, processing, analysing, and more.
ripgrep: ripgrep is designed to make grep commands even quicker and easier through small changes, like automatically ignoring irrelevant files such as binary data. If you have a whole folder of datasets, ripgrep will whip through the entire directory structure - stat.
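A sketch of that directory-wide sweep, assuming ripgrep (the rg binary) is installed and using a hypothetical dumps/ folder:

```shell
mkdir -p dumps
printf 'target@example.com:pw\n' > dumps/a.txt
printf 'other@example.net:pw\n' > dumps/b.txt

# rg recurses through the directory by default;
# -l lists only the files that contain a match
rg -l 'target@example.com' dumps   # → dumps/a.txt
```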
awk: like grep and sed, awk is a command-line filter. More general than grep, it's often used for processing structured data - and can handle transformations its cousins can't.
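A sketch of the "most common usernames" task from earlier, run through awk on made-up colon-separated data:

```shell
printf 'jdoe:pw1\njdoe:pw2\nasmith:pw3\n' > logins.txt

# -F: splits each line on colons; count[$1]++ tallies column 1,
# and the END block prints the totals, sorted most-common first
awk -F: '{count[$1]++} END {for (u in count) print count[u], u}' logins.txt | sort -rn
```

Expected output: `2 jdoe` followed by `1 asmith`.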
jq: described as "sed for JSON data". Sometimes, datasets are stored in JSON format rather than CSV, making them much more difficult to read manually. jq can search and pull out specific fields from JSON data, turning messy machine-readable files into human-readable intel.
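A sketch with a hypothetical JSON dump, assuming jq is installed:

```shell
printf '[{"user":"jdoe","ip":"10.0.0.1"},{"user":"asmith","ip":"10.0.0.2"}]' > dump.json

# .[] iterates the array, .user selects one field from each object;
# -r prints raw strings instead of quoted JSON
jq -r '.[].user' dump.json
```

Expected output: `jdoe` then `asmith`, one per line.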
SQLite: when a dataset gets super big, it's sometimes easier to import it into a lightweight database than leave it standalone. SQLite lets you do this. Plus, it's already the most used database engine in the world.
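A sketch of importing a hypothetical CSV into SQLite and querying it with the sqlite3 command-line shell:

```shell
printf 'user,domain\njdoe,example.com\nasmith,example.com\nbob,example.org\n' > logins.csv
rm -f logins.db

# .import into a brand-new table takes the first CSV row as column
# names; after that you have the full power of SQL over the data
sqlite3 logins.db <<'SQL'
.mode csv
.import logins.csv logins
SELECT domain, COUNT(*) FROM logins GROUP BY domain ORDER BY COUNT(*) DESC;
SQL
```

Expected output: `example.com,2` then `example.org,1`.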
Example: Local Search in Action
this time, imagine you are a professional osint analyst, working with a dataset containing millions of logins. but something seems wrong. immediately, you realise - all the data appears in lowercase.
somebody has stolen all the capital letters, and the issue is spreading. you need to find out when, and how.
step one: search
first you need to confirm that the capitals have gone. using grep, you scan the dataset for a username you know should be capitalised. here, every instance appears in lowercase - confirming the capitals aren't where they should be.
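a sketch of the check, with a hypothetical logins.txt (grep's -i flag ignores case):

```shell
printf 'jsmith:pw1\njdoe:pw2\n' > logins.txt

# a case-sensitive search for the capitalised form comes up empty...
grep -c 'JSmith' logins.txt || true   # → 0
# ...but ignoring case finds it, so the letters survived - just
# not their capitals
grep -ci 'jsmith' logins.txt          # → 1
```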
step two: process
next, you process the data for evidence. you use awk to analyse patterns across the dataset - counting the occurrences of that de-capitalised username, and identifying other entries that should have been capitalised. you begin to question the thief's motives.
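the counting step, sketched with made-up data:

```shell
printf 'jsmith:pw1\njsmith:pw2\njdoe:pw3\n' > logins.txt

# tally the rows carrying the de-capitalised username
# (n+0 prints 0 rather than blank if nothing matched)
awk -F: '$1 == "jsmith" {n++} END {print n+0}' logins.txt   # → 2
```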
step three: structured analysis
you isolate each column with csvkit, and work through each methodically: usernames, email addresses, dates, checking each for formatting issues. the loss has occurred consistently across all fields. seeing the scale of the crime disturbs you.
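the column-by-column check, sketched here with plain cut in case csvkit isn't to hand (csvcut -c username would do the same isolation), on a hypothetical dataset.csv:

```shell
printf 'username,email,date\njsmith,jsmith@example.com,2024-01-01\n' > dataset.csv

# isolate column 1, skip the header row, then count rows that still
# contain any capital letter - here, none survive
cut -d, -f1 dataset.csv | tail -n +2 | grep -c '[A-Z]' || true   # → 0
```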
step four: check other formats
finally, you run jq on an older version of your dataset. these files still contain capital letters - meaning the dataset was just corrupted during the csv export.
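checking the older json copy, sketched with hypothetical data:

```shell
printf '[{"user":"JSmith"},{"user":"JDoe"}]' > old_dump.json

# pull the usernames and count the lines that still hold a capital
jq -r '.[].user' old_dump.json | grep -c '[A-Z]'   # → 2
```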
as for the issue spreading… you need a new keyboard.
Key Takeaways
So, now you know the basics of local search. You should be able to:
Search: Use commands to find specific data points
Process: Execute more complex commands to make your life easier
Analyse: Work with tools to identify patterns and pivot
type: ignore automatic capitalisation and write in lower case
See you next time, investigators!
🚩 New CTF Challenge Live - The Hacktivist (2 Parts)
A new CTF challenge has been posted on our CTF website. This week's challenge focuses on identifying the hacker username of a threat actor, the date of their first post announcing the start of a cyberattack, and the country in which the account is actually operated, using only open source intelligence techniques.
Start competing in our Capture the Flag (CTF)
💪 If you missed the last CTF, here's a link to catch up.
Last week's CTF featured a challenge titled "Trace The IP". Here is the solution:
Using IP Lookup | Find Your Public IP Address Location and searching for 151.202.95.130, we could see that the IP was linked to several cities: Tuckahoe, Bronxville, New York, Eastchester, Yonkers. Formatting them in alphabetical order gave us: Bronxville, Eastchester, New York, Tuckahoe, Yonkers.
Looking at the ISP we could see that it was Verizon Business.
✅ That's it for the free version of The OSINT Newsletter. Consider upgrading to a paid subscription to support this publication and independent research.
By upgrading to paid, youâll get access to the following:
📚 All paid posts in the archive. Go back and see what you've missed!
🙏 If you don't have a paid subscription already, don't worry. There's a 7-day free trial. If you like what you're reading, upgrade your subscription. If you can't, I totally understand. Be on the lookout for promotions throughout the year.
🚨 The OSINT Newsletter offers a free premium subscription to all members of law enforcement. To upgrade your subscription, please reach out to LEA@osint.news from your official law enforcement email address.



