Haikal Hilmi
Multi-million records per day

Social Media Data Pipeline

A unified pipeline behind the Twitter, Instagram, and YouTube scrapers, built for massive volume.

Reliable 24/7 processing at scale, monitored with Grafana and Prometheus.
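A "unified" pipeline implies that records from the three platforms are mapped onto one shape before they are stored. The sketch below shows that idea under stated assumptions. The `Record` fields and the raw payload keys (`id`, `user`, `pk`, `videoId`, and so on) are hypothetical illustrations, not the scrapers' real output formats.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Record:
    # Hypothetical unified schema shared by all sources.
    source: str
    record_id: str
    author: str
    text: str
    created_at: str


def normalize(source: str, raw: dict[str, Any]) -> Record:
    """Map a raw scraper payload onto the unified Record shape.

    The raw key names are assumptions chosen for illustration only.
    """
    if source == "twitter":
        return Record("twitter", str(raw["id"]), raw["user"], raw["text"], raw["created_at"])
    if source == "instagram":
        # Captions can be absent, so default to an empty string.
        return Record("instagram", str(raw["pk"]), raw["username"], raw.get("caption", ""), raw["taken_at"])
    if source == "youtube":
        return Record("youtube", raw["videoId"], raw["channelTitle"], raw["title"], raw["publishedAt"])
    raise ValueError(f"unknown source: {source}")
```

Normalizing early means every downstream stage (queue consumers, storage, dashboards) only has to understand one record type.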

How it works

  • Sources: scrapers
  • Queue: RabbitMQ
  • Process: Docker workers
  • Store: Elasticsearch
  • Monitor: Grafana + Prometheus
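The five stages can be sketched in a single process to show how data moves between them. This is a minimal sketch, not the production code. A `queue.Queue` stands in for RabbitMQ, a plain dict stands in for the Elasticsearch index, and a `Counter` stands in for Prometheus metrics. `scrape`, `publish`, and `drain` are hypothetical helpers.

```python
import json
import queue
from collections import Counter


def scrape(source: str, n: int) -> list[dict]:
    # Hypothetical stand-in for a scraper: emits n fake records.
    return [{"source": source, "id": f"{source}-{i}", "text": f"post {i}"} for i in range(n)]


def publish(q: queue.Queue, records: list[dict]) -> None:
    # Messages travel through the queue as JSON bytes, as a broker would carry them.
    for r in records:
        q.put(json.dumps(r).encode("utf-8"))


def drain(q: queue.Queue, store: dict, metrics: Counter) -> None:
    """Consume every queued message, store valid ones, and count outcomes."""
    while True:
        try:
            body = q.get_nowait()
        except queue.Empty:
            return
        try:
            doc = json.loads(body)
            # Keyed by id, so a redelivered message overwrites instead of duplicating.
            store[doc["id"]] = doc
            metrics["processed_total"] += 1
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing id, or a non-object payload.
            metrics["failed_total"] += 1
        finally:
            q.task_done()


if __name__ == "__main__":
    q: queue.Queue = queue.Queue()
    store: dict = {}
    metrics: Counter = Counter()
    publish(q, scrape("twitter", 3) + scrape("youtube", 2))
    q.put(b"not json")
    drain(q, store, metrics)
    print(len(store), metrics["processed_total"], metrics["failed_total"])  # 5 5 1
```

Writing by record id makes the store step idempotent, which matters when a queue redelivers a message after a worker crash. The `_total` suffix follows the Prometheus naming convention for counters.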

Problem

Data volume runs to millions of records per day and must be processed continuously.
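To put "multi-million records per day" into per-second terms, assume a hypothetical N = 5 million records per day (the page gives no exact figure):

```latex
\frac{5 \times 10^{6}\ \text{records/day}}{86\,400\ \text{s/day}} \approx 57.9\ \text{records/s}
```

That is only an average. Bursty sources can push peak rates well above it, which is one reason to buffer input rather than process it synchronously.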

Solution

A high-performance computing (HPC) pipeline built around a message queue, with monitoring.
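The queue is what lets processing keep pace with continuous input: producers and workers are decoupled, and throughput scales by adding workers. A minimal sketch with threads, assuming `run_pipeline` and `worker` as hypothetical names and `str.upper()` as a placeholder for real processing. In the described system the workers are Docker containers consuming from RabbitMQ; threads here only illustrate the same producer/consumer pattern.

```python
import queue
import threading

SENTINEL = None  # Signals a worker to stop.


def worker(q: queue.Queue, results: list, lock: threading.Lock) -> None:
    while True:
        item = q.get()
        try:
            if item is SENTINEL:
                return
            processed = item.upper()  # Placeholder for real processing.
            with lock:
                results.append(processed)
        finally:
            q.task_done()


def run_pipeline(items: list[str], n_workers: int = 4, maxsize: int = 100) -> list[str]:
    # Bounded queue: put() blocks when workers fall behind (backpressure).
    q: queue.Queue = queue.Queue(maxsize=maxsize)
    results: list[str] = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, results, lock)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in items:
        q.put(item)
    # One sentinel per worker; FIFO order guarantees all items are taken first.
    for _ in threads:
        q.put(SENTINEL)
    for t in threads:
        t.join()
    return results  # Order depends on thread scheduling.
```

Bounding the queue keeps memory flat when input spikes: producers slow down instead of the process growing without limit.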

Tech stack

  • RabbitMQ
  • Docker
  • Elasticsearch
  • Grafana
  • Prometheus
  • Airflow
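At multi-million records per day, writing to Elasticsearch one document per request is wasteful; the Bulk API (`POST /_bulk`, `Content-Type: application/x-ndjson`) accepts many operations in one newline-delimited JSON body. The helper below builds such a body. The index name and the `id` field are assumptions for illustration; the official Python client's `helpers.bulk` produces the same format.

```python
import json
from typing import Iterable


def bulk_body(index: str, docs: Iterable[dict], id_field: str = "id") -> str:
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Each document becomes an action line followed by its source line.
    Using the record id as _id makes re-sends overwrite, not duplicate.
    """
    lines: list[str] = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc[id_field])}}))
        lines.append(json.dumps(doc))
    # The bulk API requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)
```

For example, `bulk_body("social-posts", [{"id": 1, "text": "hi"}])` yields an action line for `_id` `"1"` followed by the document itself.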