March 5, 2020

Orange, Presto, Pentaho, Luigi, Scriptella - ETL, Analytics, Business Intelligence

This page is a work in progress.

This is a note on installing and testing DBs, Big Data, ETL, analytics, and business intelligence software.

Environments, DBs

The list below shows where each DB and piece of software is installed:
  • Win10 – MySQL DB, Orange, PySpark
  • CentOS7 – Oracle 18c XE
  • Ubuntu18, headless – MariaDB, PostgreSQL, Hadoop, HBase, Hive, Spark/PySpark, Mahout, PrestoDB, Pentaho, Luigi, Scriptella
  • Ubuntu18 – DB2

Installation Reference


Play Time...


Orange - Analytics, data mining

If you are behind a firewall and/or proxy and conda install doesn't work, download and install Orange directly:

Otherwise, use the Anaconda install steps:
> conda config --add channels conda-forge
> conda install orange3

To run:
> activate <conda environment with Orange>
> orange-canvas

Orange is very promising, with great features and a nice GUI, but it is not mature enough yet. For example, it only supports PostgreSQL as a database.

Spark - Analytics

PrestoDB

Presto is a distributed SQL query engine that can connect to multiple DBs of different types, such as Hadoop, RDBMSs, and NoSQL stores.
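Presto addresses tables as catalog.schema.table, where each catalog is backed by a connector (Hive, MySQL, PostgreSQL, and so on). This lets one query join data that lives in different systems. The sketch below does not use Presto. As a stand-in, it uses Python's built-in sqlite3 and ATTACH DATABASE so that a single SQL statement joins tables from two separate database files. The file names, table names, data, and the Presto catalog names in the comment are all hypothetical.

```python
import os
import sqlite3
import tempfile


def _make_db(path, ddl, insert_sql, rows):
    """Create a standalone SQLite file with one table and some rows."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
        conn.commit()
    finally:
        conn.close()


def federated_join(dir_path):
    """Join customers (one DB file) with orders (another DB file) in one query."""
    crm_path = os.path.join(dir_path, "crm.db")      # hypothetical CRM database
    sales_path = os.path.join(dir_path, "sales.db")  # hypothetical sales database
    _make_db(crm_path,
             "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
             "INSERT INTO customers VALUES (?, ?)",
             [(1, "Alice"), (2, "Bob")])
    _make_db(sales_path,
             "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
             "INSERT INTO orders VALUES (?, ?)",
             [(1, 10.0), (1, 5.5), (2, 7.25)])

    # Attach both files to one connection; the schema names play the role that
    # catalog names play in Presto. A rough Presto equivalent (catalog names
    # are hypothetical) would be:
    #   SELECT c.name, SUM(o.amount)
    #   FROM postgresql.public.customers c JOIN hive.default.orders o ...
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("ATTACH DATABASE ? AS crm", (crm_path,))
        conn.execute("ATTACH DATABASE ? AS sales", (sales_path,))
        return conn.execute(
            """
            SELECT c.name, SUM(o.amount) AS total
            FROM crm.customers AS c
            JOIN sales.orders AS o ON o.customer_id = c.id
            GROUP BY c.name
            ORDER BY c.name
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(federated_join(d))  # [('Alice', 15.5), ('Bob', 7.25)]
```

With Presto, the attach step is replaced by catalog configuration on the server, and the engine distributes the query across workers. The single-query, many-sources idea is the same.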

Pentaho - ETL, Analytics, Report

Pentaho consists of multiple packages: ETL, Analytics, and Business Intelligence.

Flowable - Java based

Seems pretty good.

Luigi - Python based ETL

Developed by Spotify.  Looks pretty promising.
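Luigi models a pipeline as tasks that declare requires(), output(), and run(). A task whose output already exists is treated as complete and is not re-run. The following is not Luigi code. It is a simplified, stdlib-only sketch of that dependency model, so the idea can be tried without installing anything. The Task base class, build() function, and task names are hypothetical. Real Luigi tasks subclass luigi.Task and are started with luigi.build() or the luigi command.

```python
import os
import tempfile


class Task:
    """Minimal imitation of a Luigi-style task (hypothetical, not luigi.Task)."""

    def requires(self):
        return []  # upstream tasks this one depends on

    def output(self):
        raise NotImplementedError  # path of the file this task produces

    def run(self):
        raise NotImplementedError  # do the work and write self.output()

    def complete(self):
        # Like Luigi's default: a task counts as done if its output exists.
        return os.path.exists(self.output())


def build(task, log):
    """Skip complete tasks; otherwise build dependencies first, then run."""
    name = type(task).__name__
    if task.complete():
        log.append(f"skip {name}")
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(f"run {name}")


class Extract(Task):
    def __init__(self, workdir):
        self.workdir = workdir

    def output(self):
        return os.path.join(self.workdir, "raw.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("alice\nbob\n")


class Transform(Task):
    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return [Extract(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "upper.txt")

    def run(self):
        src = self.requires()[0].output()
        with open(src) as f_in, open(self.output(), "w") as f_out:
            f_out.write(f_in.read().upper())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        first, second = [], []
        build(Transform(d), first)   # ['run Extract', 'run Transform']
        build(Transform(d), second)  # ['skip Transform']: output already exists
        print(first, second)
```

Because completeness is defined by the presence of output files, re-running a pipeline after a failure only redoes the missing steps. This is the main appeal of the Luigi style.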

Python Based Tools

The tools above are all similar: Python code-based ETL.
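At its smallest, "ETL as Python code" can look like the stdlib-only sketch below. It extracts rows from CSV, transforms them (trims and normalizes names, converts types, drops bad rows), and loads them into a database. It does not use any of the tools above. The CSV content, the sales table, and the cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical input; in practice this would come from a file or an API.
RAW_CSV = """id,name,amount
1, alice ,10.50
2,BOB,n/a
3,carol,7
"""


def extract(text):
    """Extract: parse CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim/title-case names, convert types, drop non-numeric amounts."""
    out = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        out.append((int(row["id"]), row["name"].strip().title(), amount))
    return out


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    print(conn.execute("SELECT * FROM sales ORDER BY id").fetchall())
    # [(1, 'Alice', 10.5), (3, 'Carol', 7.0)]
    conn.close()
```

Keeping extract, transform, and load as separate functions makes each step testable on its own. It also makes it easy to swap the source or target later.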


A very different concept: it uses Python, but from the shell with pipes. It feels like IFTTT for ETL, running in the shell on the local machine.
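The general idea of using Python as one stage of a shell pipeline can be sketched with a plain stdlib filter. The filter reads lines on stdin and writes matching lines to stdout. This is not the tool described above. The script name filter_amounts.py and the "name,amount" input layout are hypothetical.

```python
import csv
import sys


def keep_large(lines, min_amount):
    """Yield 'name,amount' lines whose amount is >= min_amount.

    Input is assumed (hypothetically) to be headerless 'name,amount' CSV lines.
    """
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # ignore malformed lines
        name, amount = row
        try:
            if float(amount) >= min_amount:
                yield f"{name},{amount}"
        except ValueError:
            continue  # ignore non-numeric amounts


# Act as a filter only when a threshold argument is given, so importing or
# running the file without arguments never blocks waiting on stdin.
if __name__ == "__main__" and len(sys.argv) == 2:
    for line in keep_large(sys.stdin, float(sys.argv[1])):
        print(line)
```

In a shell, such a stage slots between other commands, for example `cat sales.csv | python filter_amounts.py 10 | sort > big_sales.csv`.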

Scriptella - XML based ETL in Java

Written in Java.
If you're familiar with the Spring Framework, Spring Batch is another option.

Worth Mentioning

DB Browser

Free, simple, and works.

Other Tools

Other Lists
