Serverless Instagram Crawler

Sneha
2 min read · Apr 18, 2022

An Instagram hashtag crawler built with AWS Lambda & DynamoDB.

Let’s start with some of the basic terms we will be using in this article.

What is cloud computing?

Cloud computing is the on-demand delivery of IT resources over the Internet with pay-as-you-go pricing. Instead of buying, owning, and maintaining physical data centers and servers, you can access technology services, such as computing power, storage, and databases, on an as-needed basis from a cloud provider like Amazon Web Services (AWS).

What is AWS Lambda?

AWS Lambda is an event-driven, serverless computing platform provided by Amazon as a part of Amazon Web Services. It is a computing service that runs code in response to events and automatically manages the computing resources required by that code. It was introduced in November 2014.

What is Amazon DynamoDB?

Amazon DynamoDB is a fully managed proprietary NoSQL database service that supports key–value and document data structures and is offered by Amazon.com as part of the Amazon Web Services portfolio. DynamoDB exposes a similar data model to and derives its name from Dynamo, but has a different underlying implementation.

What is Serverless?

Serverless means running applications without provisioning or managing servers yourself. With the Serverless Framework, you can deploy many familiar use cases instantly: REST APIs in Node.js, Python, Go, or Java, GraphQL APIs, scheduled tasks, Express.js applications, and front-end applications.

Now, let’s get started with the serverless Instagram crawler :)

This Instagram hashtag crawler will crawl posts for the configured hashtag on Instagram and store the results in DynamoDB.

Step 1: Configure

Run the configuration step before deploying:

yarn run config

Running config saves your settings to a .config.json file.
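
For reference, the generated .config.json might look something like this (the key names match the serverless.yml references in the next step; the values here are only placeholders):

{
  "hashTag": "travel",
  "count": 30,
  "dynamoDB": "instagram-crawler-posts"
}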

Step 2: Serverless

The serverless.yml reads the environment variables from the .config.json file:

provider:
  environment:
    HASH_TAG: ${file(./.config.json):hashTag}
    COUNT: ${file(./.config.json):count}
    DYNAMODB_TABLE: ${file(./.config.json):dynamoDB}
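
Inside the function, these values then show up on process.env. A minimal TypeScript sketch of reading them (variable names are mine; the keys come from the serverless.yml above):

const hashTag = process.env.HASH_TAG;           // hashtag to crawl
const count = Number(process.env.COUNT);        // number of posts to fetch
const tableName = process.env.DYNAMODB_TABLE;   // target DynamoDB table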

Also, the Lambda function has a schedule and a timeout set, like this.

The cron expression below means the function runs once a day at 12:00 UTC, and the timeout gives it up to 180 seconds per run.

functions:
  crawling:
    ..
    ..
    timeout: 180
    events:
      - schedule: cron(0 12 * * ? *)
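
For context, here is a minimal TypeScript sketch of what the scheduled crawling handler could look like, assuming the AWS SDK v2 DocumentClient; fetchPostsByHashtag is a hypothetical stand-in for the actual crawling logic, which is not shown in this article:

// src/handler.ts — a sketch, not the project's actual implementation
import { ScheduledHandler } from "aws-lambda";
import { DynamoDB } from "aws-sdk";

const db = new DynamoDB.DocumentClient();

// Hypothetical helper standing in for the crawler's HTTP/scraping logic.
async function fetchPostsByHashtag(
  tag: string,
  count: number
): Promise<{ id: string; caption: string }[]> {
  // A real implementation would fetch the hashtag page and parse post data.
  return [];
}

export const crawling: ScheduledHandler = async () => {
  const hashTag = process.env.HASH_TAG!;
  const count = Number(process.env.COUNT);
  const table = process.env.DYNAMODB_TABLE!;

  const posts = await fetchPostsByHashtag(hashTag, count);

  // Store each crawled post in DynamoDB.
  for (const post of posts) {
    await db
      .put({
        TableName: table,
        Item: { id: post.id, hashTag, caption: post.caption, crawledAt: Date.now() },
      })
      .promise();
  }
};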

Step 3: Scripts

yarn run config   # generate .config.json
yarn run test     # run the tests
sls deploy        # deploy the service with the Serverless Framework

Step 4: Directory Structure

.
├── dist/   # compiled output (.js)
├── src/    # source (.ts)
└── test/   # tests (.js)
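
Since the TypeScript in src/ is compiled into dist/, a tsconfig.json along these lines would match the layout (a guess based on the structure above, not taken from the repository):

{
  "compilerOptions": {
    "target": "es2018",
    "module": "commonjs",
    "outDir": "dist",
    "strict": true
  },
  "include": ["src/**/*"]
}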

References:

Cloud Computing: https://aws.amazon.com/what-is-cloud-computing/

Amazon DynamoDB: https://g.co/kgs/buoiPg

AWS Lambda: https://g.co/kgs/RqLS9w

Serverless: https://www.serverless.com/
