The goal of this project was to develop a passive Google dork script to collect potentially vulnerable web pages and applications on the Internet. There are two parts: ghdb_scraper.py, which retrieves Google Dorks, and pagodo.py, which leverages the information gathered by ghdb_scraper.py.
Prerequisites:
- Git
- Python 3 and pip
How to install and use
git clone https://github.com/opsdisk/pagodo.git
cd pagodo
pip install -r requirements.txt
chmod +x *.py
python3 pagodo.py -h
Google is blocking me
If you start getting HTTP 429 (Too Many Requests) errors, Google has rightfully detected you as a bot and will block your IP for a set period of time. The workaround is to use proxychains with a bank of proxies and round-robin the lookups across them.
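proxychains does the rotation at the socket level, so pagodo itself needs no changes. To make the round-robin idea concrete, here is a minimal Python sketch of handing successive lookups to a fixed pool of proxies in order. The class name `RoundRobinProxies` and the proxy URLs are hypothetical and are not part of pagodo or proxychains.

```python
import itertools
from typing import Iterable, Iterator


class RoundRobinProxies:
    """Hand out proxies in strict rotation: 1st, 2nd, ..., last, then back to the 1st."""

    def __init__(self, proxies: Iterable[str]) -> None:
        pool = list(proxies)
        if not pool:
            raise ValueError("at least one proxy is required")
        # itertools.cycle repeats the pool forever in its original order.
        self._cycle: Iterator[str] = itertools.cycle(pool)

    def next_proxy(self) -> str:
        """Return the proxy that the next lookup should go through."""
        return next(self._cycle)


if __name__ == "__main__":
    # Hypothetical local SOCKS listeners, one per SSH dynamic tunnel.
    rotator = RoundRobinProxies(["socks4://127.0.0.1:9050", "socks4://127.0.0.1:9051"])
    for lookup in range(4):
        print(f"lookup {lookup} -> {rotator.next_proxy()}")
```

Four lookups alternate between ports 9050, 9051, 9050, 9051, so each consecutive request leaves from a different proxy.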
apt install proxychains4 -y
Edit the /etc/proxychains4.conf configuration file to round-robin the lookups through different proxy servers. In the example below, two dynamic SOCKS proxies have been set up on different local listening ports (9050 and 9051). Don’t know how to use SSH and dynamic SOCKS proxies? Do yourself a favor and pick up a copy of the Cyber Plumber’s Handbook and interactive lab to learn all about Secure Shell (SSH) tunneling, port redirection, and bending traffic like a boss.
round_robin_chain
chain_len = 1
proxy_dns
remote_dns_subnet 224
tcp_read_time_out 15000
tcp_connect_time_out 8000
[ProxyList]
socks4 127.0.0.1 9050
socks4 127.0.0.1 9051
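If you run more than two tunnels, it can help to generate the SSH commands and the matching config instead of editing them by hand. The sketch below is a hypothetical helper, not something shipped with pagodo: it builds one `ssh -N -D` command per local port and a proxychains4 config with one `[ProxyList]` entry per port. It assumes proxychains-ng, which spells the round-robin mode `round_robin_chain`, and the SSH hosts are placeholders.

```python
from typing import List, Sequence


def build_proxychains_conf(ports: Sequence[int], host: str = "127.0.0.1") -> str:
    """Render a proxychains4 config that sends each connection through the next SOCKS4 listener."""
    lines = [
        # proxychains-ng spells the round-robin mode "round_robin_chain".
        "round_robin_chain",
        # One proxy per connection, so each lookup takes a single hop.
        "chain_len = 1",
        "proxy_dns",
        "remote_dns_subnet 224",
        "tcp_read_time_out 15000",
        "tcp_connect_time_out 8000",
        "[ProxyList]",
    ]
    lines += [f"socks4 {host} {port}" for port in ports]
    return "\n".join(lines) + "\n"


def ssh_dynamic_forward_cmd(port: int, destination: str) -> List[str]:
    """Build an OpenSSH command that opens a SOCKS listener on 127.0.0.1:<port>.

    -N keeps the session open without running a remote command;
    -D [bind_address:]port enables dynamic (SOCKS) port forwarding.
    """
    return ["ssh", "-N", "-D", f"127.0.0.1:{port}", destination]


if __name__ == "__main__":
    # Hypothetical SSH hosts; substitute servers you control.
    tunnels = {9050: "user@proxy-a.example.com", 9051: "user@proxy-b.example.com"}
    for port, destination in tunnels.items():
        print(" ".join(ssh_dynamic_forward_cmd(port, destination)))
    print(build_proxychains_conf(list(tunnels)))
```

Writing the generated config to its own file and pointing proxychains4 at it with `-f` leaves the stock /etc/proxychains4.conf untouched.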
Throw proxychains4 in front of the Python command and each lookup will go through a different proxy (and thus originate from a different IP). You could even lower the -e delay time, because the lookups are spread across different proxy boxes.
proxychains4 python3 pagodo.py -g ALL_dorks.txt -s -e 17.0 -l 700 -j 1.1
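The -e and -j flags control pacing between searches. pagodo's exact formula is not described here, so this sketch assumes one plausible reading: each pause is drawn uniformly between the -e delay and the delay multiplied by the -j jitter factor. `pause_seconds` and `run_lookups` are hypothetical helpers, and `search` stands in for whatever actually performs a lookup.

```python
import random
import time
from typing import Callable, Optional, Sequence


def pause_seconds(delay: float, jitter: float, rng: Optional[random.Random] = None) -> float:
    """Pick a randomized pause between two searches.

    ASSUMPTION (not taken from pagodo's source): the pause is drawn uniformly
    from [delay, delay * jitter], so jitter must be at least 1.0.
    """
    if delay < 0 or jitter < 1.0:
        raise ValueError("delay must be >= 0 and jitter must be >= 1.0")
    rng = rng if rng is not None else random.Random()
    return rng.uniform(delay, delay * jitter)


def run_lookups(
    dorks: Sequence[str],
    delay: float,
    jitter: float,
    search: Callable[[str], None],
) -> None:
    """Run a caller-supplied `search` for each dork, sleeping between searches."""
    for i, dork in enumerate(dorks):
        search(dork)
        if i < len(dorks) - 1:  # nothing to wait for after the last dork
            time.sleep(pause_seconds(delay, jitter))


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs print the same pauses
    # Values from the example command: -e 17.0 -j 1.1
    print([round(pause_seconds(17.0, 1.1, rng), 2) for _ in range(3)])
```

Under that assumption, the command above (-e 17.0 -j 1.1) would wait somewhere between 17 and about 18.7 seconds between lookups; the randomness keeps the requests from arriving on a fixed, easily fingerprinted schedule.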
To start off, pagodo.py needs a list of all the current Google dorks. A date-stamped file containing all the Google dorks, plus a file for each individual dork category, is also provided in the repo. Fortunately, the entire database can be pulled back with a single GET request using ghdb_scraper.py. You can dump all dorks to one file, the individual dork categories to separate dork files, or the entire JSON blob if you want more contextual data about each dork.
To retrieve all dorks:
python3 ghdb_scraper.py -j -s
To retrieve all dorks and write them to individual categories:
python3 ghdb_scraper.py -i
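ghdb_scraper.py handles the request and the file writing itself. As a rough illustration of the three output modes (one file with every dork, one file per category, and the raw JSON), the sketch below assumes the dorks have already been parsed into a list of dicts with hypothetical "dork" and "category" keys. The real GHDB field names and the file names ghdb_scraper.py writes may differ, and the sample entries are made up.

```python
import json
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def group_by_category(entries: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Bucket dork strings by category, keeping their original order in each bucket."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        # Hypothetical keys: "category" and "dork".
        groups[entry["category"]].append(entry["dork"])
    return dict(groups)


def category_filename(category: str) -> str:
    """Map a category name to a file name, e.g. 'Error Messages' -> 'error_messages.dorks'."""
    slug = re.sub(r"[^a-z0-9]+", "_", category.lower()).strip("_")
    return f"{slug}.dorks"


def write_outputs(entries: List[Dict[str, str]], out_dir: Path) -> None:
    """Write the raw JSON, one file with every dork, and one file per category."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "all_dorks.json").write_text(json.dumps(entries, indent=2), encoding="utf-8")
    (out_dir / "all_dorks.txt").write_text(
        "\n".join(entry["dork"] for entry in entries) + "\n", encoding="utf-8"
    )
    for category, dorks in group_by_category(entries).items():
        (out_dir / category_filename(category)).write_text("\n".join(dorks) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Made-up sample entries in the assumed shape.
    sample = [
        {"dork": 'intitle:"index of" "backup"', "category": "Sensitive Directories"},
        {"dork": '"Warning: mysql_connect()"', "category": "Error Messages"},
    ]
    for category, dorks in group_by_category(sample).items():
        print(category_filename(category), dorks)
```

Keeping the grouping separate from the file writing means the same parsed entries can feed any of the three outputs without re-requesting the database.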