Recently I had an interesting idea related to data analysis. To bring it to life, I needed certain data in a database that could be used in various analyses.
At first glance, scraping the necessary data from a social site and storing it in the database looked like an easy task. However, after some activity on our side, the site blocked the IP address our requests were coming from.
Eventually, after trying various methods, I was able to solve the problem. In this article, I describe one of the more interesting solutions that helped me in that situation.
Our goal is a NodeJS script, deployed to Heroku, that makes many HTTP requests to a defined list of URLs. It also needs to be able to change its own IP address when necessary.
The mechanism is based solely on one Heroku feature: Heroku changes the IP address of a dyno on each restart.
Let's start with a small NodeJS script that makes requests to a defined list of URLs and logs the responses. The script also needs a proper error handling mechanism that catches all errors related to rate limiting.
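The listing for this first version is not reproduced here. A minimal sketch, reconstructed from the complete script at the end of the article, could look like the following. The names `SKETCH_URLS`, `simulateRateLimit`, `isRateLimitError` and `crawl`, as well as the `httpGet` parameter, are assumptions introduced for illustration; in the article's setup `httpGet` would be `axios.get`.

```javascript
// Hypothetical URL list: each URL responds with the caller's IP address.
const SKETCH_URLS = [1, 2, 3, 4, 5, 6, 7].map(
  (id) => `https://api.myip.com/?id=${id}`
);

// Demo only: simulates a rate limit error on the fifth URL.
function simulateRateLimit(url) {
  if (url.includes('id=5')) {
    const err = new Error('Rate limit exceeded');
    err.status = 429;
    throw err;
  }
}

// True for the simulated error and for axios-style errors, which carry
// the HTTP status on err.response.status.
function isRateLimitError(err) {
  const status = err.status || (err.response && err.response.status);
  return status === 429;
}

// Requests every URL in order and logs the IP address it reports.
// Resolves to 'done', 'rate-limited' or 'failed'.
async function crawl(urls, httpGet) {
  try {
    for (const url of urls) {
      const response = await httpGet(url);
      console.log(`URL: ${url} (${response.data.ip})`);
      simulateRateLimit(url);
    }
    return 'done';
  } catch (err) {
    console.log(err.message);
    return isRateLimitError(err) ? 'rate-limited' : 'failed';
  }
}
```

With axios installed, `crawl(SKETCH_URLS, (url) => axios.get(url))` would perform the real requests.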
In the script above, after a certain number of requests, we artificially raise a rate limit error. We do this only to show how the IP rotation mechanism works, so it should not be included in real applications.
After running the script you will get something like this:
As the script logs show, every URL in the list responds with the IP address of the current machine.
Now let's get back to the IP rotation mechanism.
As already mentioned, Heroku has a feature (at the time of writing this article) where it changes the IP address of a dyno whenever the dyno is restarted.
So, by leveraging that feature, each time we need to change the server's IP address, we simply restart it.
Heroku provides a REST API that lets you automate some processes in the Heroku infrastructure. In our case, we need an endpoint that lets us restart a dyno just by making an HTTP request. Here is that endpoint:
DELETE /apps/{APP_NAME}/dynos/{DYNO_ID_OR_NAME}
Where:
APP_NAME is the name of the application created in Heroku. You can find it in the Heroku dashboard.
DYNO_ID_OR_NAME identifies the dyno. We use its name, which consists of the process type defined in the Procfile plus an index. In our case it will be "worker.1".
Besides those two, we also need an API key to authenticate against the Heroku API. You can find it in your Heroku account settings.
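To make the request concrete, here is a sketch of what it could look like without a client library. The base URL `https://api.heroku.com`, the `Accept: application/vnd.heroku+json; version=3` header and bearer-token authorization follow the Heroku Platform API conventions; the function name `buildRestartRequest` and the sample values in the usage note are assumptions made for illustration.

```javascript
// Describes the HTTP request that restarts one dyno.
// Nothing is sent here; the caller passes the result to an HTTP client.
function buildRestartRequest(appName, dynoName, apiKey) {
  const app = encodeURIComponent(appName);
  const dyno = encodeURIComponent(dynoName);
  return {
    method: 'DELETE',
    url: `https://api.heroku.com/apps/${app}/dynos/${dyno}`,
    headers: {
      // Selects version 3 of the Heroku Platform API.
      Accept: 'application/vnd.heroku+json; version=3',
      Authorization: `Bearer ${apiKey}`,
    },
  };
}
```

The returned object has the shape that axios accepts as a request config, so `axios(buildRestartRequest('your-app-name', 'worker.1', apiKey))` would send it.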
Now let's make the following changes in our NodeJS script:
1. Define a method named restartMe() and call it each time an error with status code 429 occurs.
2. Inside restartMe(), use the heroku-client package to make an HTTP request to the Heroku endpoint mentioned above. As a result, the server will restart.
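The two changes can be sketched in isolation as follows. The `client` argument stands in for a `heroku-client` instance (only its `delete(path)` method is used), and `makeRestartMe` and `restartOnRateLimit` are hypothetical names chosen for this sketch, not part of the original script.

```javascript
// Builds restartMe() for one dyno. `client` needs a delete(path) method,
// which heroku-client instances provide.
function makeRestartMe(client, appName, dynoName) {
  return async function restartMe() {
    console.log('Restarting myself..');
    await client.delete(`/apps/${appName}/dynos/${dynoName}`);
  };
}

// Runs `task` and requests a restart if it fails with status 429.
// Resolves to true when a restart was requested, false when the task
// succeeded; any other error is rethrown to the caller.
async function restartOnRateLimit(task, restartMe) {
  try {
    await task();
    return false;
  } catch (err) {
    const status = err.status || (err.response && err.response.status);
    if (status !== 429) throw err;
    await restartMe();
    return true;
  }
}
```

Passing the client in as an argument keeps the restart logic testable without touching a real Heroku application.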
Before deploying the script to Heroku, we need to create a file containing the command that starts our script. The file must be named Procfile, as Heroku accepts a configuration file only under that name.
worker: node ip-script.js
We also need to set the necessary environment variables (API_KEY, APP_NAME and DYNO_NAME) so that restartMe() can successfully restart the server. This can be done in the Heroku application's dashboard.
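Because a missing variable would otherwise only surface when the first restart is attempted, it may help to validate the configuration at startup. The sketch below assumes the three variable names used by the complete script (`API_KEY`, `APP_NAME`, `DYNO_NAME`); the `readConfig` helper and `REQUIRED_VARS` list are hypothetical.

```javascript
// Environment variables the restart mechanism depends on.
const REQUIRED_VARS = ['API_KEY', 'APP_NAME', 'DYNO_NAME'];

// Returns the configuration, or throws an error naming every
// variable that is missing or empty.
function readConfig(env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    apiKey: env.API_KEY,
    appName: env.APP_NAME,
    dynoName: env.DYNO_NAME,
  };
}
```

Calling `readConfig(process.env)` at the top of the script would make the dyno fail fast with a readable message instead of failing later inside the restart call.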
Now let's deploy the script to Heroku using the commands below.
$ heroku login
$ cd your-project-folder
$ git init
$ heroku git:remote -a your-app-name
$ git add .
$ git commit -am "Initial commit"
$ git push heroku master
Here is the result of the script. As you can see, each time an error with status code 429 occurs, the dyno is restarted and its IP address changes.
The entire script:
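const axios = require('axios');
const Heroku = require('heroku-client');

// API_KEY, APP_NAME and DYNO_NAME are environment variables set in Heroku.
const heroku = new Heroku({ token: process.env.API_KEY });

const URLS = [
  'https://api.myip.com/?id=1',
  'https://api.myip.com/?id=2',
  'https://api.myip.com/?id=3',
  'https://api.myip.com/?id=4',
  'https://api.myip.com/?id=5',
  'https://api.myip.com/?id=6',
  'https://api.myip.com/?id=7',
];

// Demo only: simulates a rate limit error on the fifth URL.
function throwRateLimitErrorIfNeeded(url) {
  if (url.includes('id=5')) {
    const err = new Error('Rate limit exceeded');
    err.status = 429;
    throw err;
  }
}

async function main() {
  try {
    for (const url of URLS) {
      const response = await axios.get(url);
      console.log(`URL: ${url} (${response.data.ip})`);
      throwRateLimitErrorIfNeeded(url);
    }
  } catch (err) {
    console.log(err.message);
    // The simulated error sets err.status; real axios errors carry the
    // HTTP status on err.response.status.
    const status = err.status || (err.response && err.response.status);
    if (status === 429) {
      await restartMe();
    }
  }
}

// Restarts this dyno through the Heroku API, which gives it a new IP address.
// DYNO_NAME holds the process type from the Procfile ("worker");
// ".1" selects its first dyno.
async function restartMe() {
  console.log('Restarting myself..');
  await heroku.delete(`/apps/${process.env.APP_NAME}/dynos/${process.env.DYNO_NAME}.1`);
}

main();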