Tuesday, August 9, 2016

Data Workflow Management in Big Data Analytics

Workflow Management in Big Data Analytics

So now you have a big, powerful analytics cluster of 500+ nodes, and suddenly lots of teams around your organization are ready to hit it with heavy jobs.

You need a way to schedule and manage these jobs in the data pipeline, and that is where data workflow management tools like Airflow and Node-RED come into the picture.
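To make the scheduling idea concrete, here is a minimal sketch of what a workflow manager does at its core: run each job only after the jobs it depends on have finished. It uses only the Python standard library (`graphlib`, Python 3.9+), not Airflow's or Node-RED's own API. The job names and the `run_pipeline` helper are hypothetical, made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each job maps to the set of jobs it depends on.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "train_model": {"clean"},
    "report": {"aggregate", "train_model"},
}


def run_pipeline(dependencies, run_job):
    """Run every job exactly once, after all of its dependencies."""
    executed = []
    # static_order() yields jobs in a dependency-respecting order and
    # raises graphlib.CycleError if the jobs depend on each other in a loop.
    for job in TopologicalSorter(dependencies).static_order():
        run_job(job)
        executed.append(job)
    return executed


# Stand-in for real work: a real scheduler would submit the job to the cluster.
executed = run_pipeline(pipeline, lambda job: print("running", job))
```

A real tool adds what this sketch leaves out: time-based triggers, retries, running independent jobs in parallel, and a UI for monitoring.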

Airflow

Node-RED


Friday, August 5, 2016

Messaging Queue Systems - Kafka, Mesos, RabbitMQ, ZeroMQ, Apache ActiveMQ, OpenMP


  1. Kafka
    • Getting started - http://blog.antlypls.com/blog/2015/10/05/getting-started-with-spark-streaming-using-docker/
  2. Mesos (a cluster resource manager rather than a message queue)
  3. RabbitMQ
  4. ZeroMQ
  5. Apache ActiveMQ
  6. OpenMP (a shared-memory parallel programming API rather than a message queue)
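
As a rough illustration of the pattern these messaging systems build on, here is a minimal in-process producer/consumer sketch using Python's standard `queue` and `threading` modules. It does not use the API of Kafka, RabbitMQ, ZeroMQ, or ActiveMQ. The message names and the `None` end-of-stream sentinel are assumptions made for the example.

```python
import queue
import threading


def producer(q, messages):
    """Put each message on the queue, then signal that no more will come."""
    for message in messages:
        q.put(message)  # blocks while the queue is full
    q.put(None)  # hypothetical end-of-stream sentinel


def consumer(q, received):
    """Take messages off the queue until the sentinel arrives."""
    while True:
        message = q.get()  # blocks while the queue is empty
        if message is None:
            break
        received.append(message)


# A small bounded queue decouples the two sides: the producer can run
# ahead of the consumer, but only by maxsize messages.
q = queue.Queue(maxsize=2)
messages = [f"event-{i}" for i in range(5)]
received = []

consumer_thread = threading.Thread(target=consumer, args=(q, received))
producer_thread = threading.Thread(target=producer, args=(q, messages))
consumer_thread.start()
producer_thread.start()
producer_thread.join()
consumer_thread.join()
```

The systems listed above apply the same idea across machines, adding features such as persistence, delivery guarantees, and multiple consumers, depending on the system.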