The @trickest Inventory project is an interesting resource. It has a massive set of hostnames, live services, spidered URLs, and cloud data organized by Bug Bounty program. That is far more data than I have any interest in storing. In fact, the only thing I am interested in is the hostnames resource. Here is a quick and dirty way to pull the hostnames.txt file from every program without checking out the entire project.
First, pull the current project git data without checking out any files:
git clone --no-checkout \
--depth 1 \
--single-branch --branch=main \
https://github.com/trickest/inventory.git
Above, we are cloning the project without checking out any files (--no-checkout). We are also pulling only the latest commit (--depth 1) and only the main branch (--single-branch --branch=main).
Note that even this shallow, no-checkout clone of main takes up 336 MB 😲
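To see what these flags do without touching the network, here is a minimal sketch that runs the same kind of clone against a throwaway local repository. The repository contents, the demo identity, and the example-program directory are made up for illustration; only the git flags come from the command above.

```shell
# Build a throwaway repo with two commits (all names here are hypothetical).
tmp=$(mktemp -d)
git init -q "$tmp/src"
git -C "$tmp/src" symbolic-ref HEAD refs/heads/main
mkdir "$tmp/src/example-program"
echo "a.example.com" > "$tmp/src/example-program/hostnames.txt"
git -C "$tmp/src" add .
git -C "$tmp/src" -c user.name=demo -c user.email=demo@example.com commit -q -m "first"
echo "b.example.com" >> "$tmp/src/example-program/hostnames.txt"
git -C "$tmp/src" -c user.name=demo -c user.email=demo@example.com commit -q -am "second"

# Same flags as above; file:// is needed for --depth to apply to a local clone.
git clone -q --no-checkout --depth 1 --single-branch --branch=main \
  "file://$tmp/src" "$tmp/clone"

# The working tree holds nothing but .git, yet the tree listing is available.
ls -A "$tmp/clone"
git -C "$tmp/clone" rev-list --count HEAD
git -C "$tmp/clone" ls-tree --full-name --name-only -r HEAD
```

The clone directory should contain only .git, the commit count should be 1 even though the source has two commits, and ls-tree should still list example-program/hostnames.txt. That listing is what the download step relies on.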
Finally, we are going to download every hostnames.txt file. This is done by listing the HEAD tree, grep'ing for the filename, URL-encoding &, and then downloading each file.
cd inventory  # git ls-tree has to run inside the clone
git ls-tree --full-name --name-only -r HEAD | \
grep '/hostnames\.txt$' | \
sed -e "s/&/%26/g" | \
xargs -I {} sh -c 'curl -o "$(echo "$1" | cut -d/ -f1)_hostnames.txt" "https://raw.githubusercontent.com/trickest/inventory/main/$1"' _ {}
At this point you should have a directory full of the relevant files.
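To preview what the pipeline will fetch before downloading anything, a small dry-run helper can print the output filename and URL for each path. This is only a sketch: plan_downloads is a hypothetical name, and the sample path is made up. It mirrors the steps above by encoding & first and then taking the first path component as the program name.

```shell
# Hypothetical helper: read repo paths on stdin and print
# "<output file> <url>" for each one, without downloading anything.
plan_downloads() {
  base="https://raw.githubusercontent.com/trickest/inventory/main"
  while IFS= read -r path; do
    # Encode every & in the path, as the sed step in the pipeline does.
    encoded=$(printf '%s\n' "$path" | sed -e 's/&/%26/g')
    # The first path component is the program directory.
    program=${encoded%%/*}
    printf '%s_hostnames.txt %s/%s\n' "$program" "$base" "$encoded"
  done
}

# Made-up sample path; in practice, pipe in the git ls-tree | grep output.
printf '%s\n' 'A&B/hostnames.txt' | plan_downloads
```

Because the encoding happens before the name is cut out, a program directory containing & ends up with %26 in its local filename as well.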
If you don't want to pipe into a subshell (yolo), you can use wget instead and drop the -o subshell, but you will be left with every file named hostnames.txt.X, where X is the counter wget appends to avoid overwriting an existing file:
git ls-tree --full-name --name-only -r HEAD | \
grep '/hostnames\.txt$' | \
sed -e "s/&/%26/g" | \
xargs -I {} wget "https://raw.githubusercontent.com/trickest/inventory/main/{}"