March 5, 2020

Orange, Presto, Pentaho, Luigi, Scriptella - ETL, Analytics, Business Intelligence


This is a note on installing and testing DBs, Big Data, ETL, analytics, and business intelligence software.

Environments, DBs

The list below shows where each DB and piece of software is installed:
  • Win10 – MySQL DB, Orange, PySpark
  • CentOS7 – Oracle 18c XE
  • Ubuntu18, headless – MariaDB, PostgreSQL, Hadoop, HBase, Hive, Spark/PySpark, Mahout, PrestoDB, Pentaho, Luigi, Scriptella
  • Ubuntu18 – DB2

Installation Reference


Play Time...


Orange - Analytics, data mining

If you are behind a firewall and/or proxy and conda install doesn't work, download and install:

Otherwise, use the Anaconda install steps:
> conda config --add channels conda-forge
> conda install orange3

To run:
> activate <conda environment with Orange>
> orange-canvas

Orange is very promising, with great features and a great GUI, but it is not mature enough yet. For example, it only supports the PostgreSQL DB.

Spark - Analytics

PrestoDB

Presto is a distributed SQL query engine that connects to multiple databases of different types, such as Hadoop, RDBMSs, and NoSQL stores.

Pentaho - ETL, Analytics, Report

Pentaho consists of multiple packages covering ETL, analytics, and business intelligence.

Flowable - Java based

It seems pretty good.

Luigi - Python based ETL

Developed by Spotify. It looks pretty promising.

Python Based Tools

The tools above are all similar: Python code-based ETL.
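
To illustrate what Python code-based ETL means in general, here is a minimal stdlib-only sketch. It is not the API of any of the tools above. The sample data and the function names extract, transform, and load are hypothetical, and an in-memory SQLite DB stands in for the target database.

```python
import csv
import io
import sqlite3
from typing import Iterable, Iterator

# Hypothetical source data; a real pipeline would read a file or a DB table.
SOURCE = "name,amount\nalice,10\nbob,-5\ncarol,20\n"


def extract(text: str) -> Iterator[dict]:
    """Yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows: Iterable[dict]) -> Iterator[tuple]:
    """Cast types, drop negative amounts, and title-case names."""
    for row in rows:
        amount = int(row["amount"])
        if amount >= 0:
            yield row["name"].title(), amount


def load(rows: Iterable[tuple], conn: sqlite3.Connection) -> int:
    """Insert rows into the target table and return its row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Generators pass rows from stage to stage one at a time.
    print(load(transform(extract(SOURCE)), conn))  # prints 2
```

Because each stage is a generator, stages can be swapped or added without changing the others, which is the main appeal of code-based ETL over GUI tools.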


A very different concept: it uses Python, but through the shell with pipes. It feels like IFTTT for ETL, run from the shell on a local machine.
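
The pipe idea can be illustrated with a generic sketch (not that tool's interface): a Python filter that reads CSV lines on stdin and writes transformed lines to stdout, so it can be chained with other shell commands. The script name upper_names.py, the file sales.csv, and the column layout are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def upper_names(lines: Iterable[str]) -> Iterator[str]:
    """Filter stage: upper-case the first CSV column and pass the rest through."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # drop blank lines, like grep -v '^$'
        name, sep, rest = line.partition(",")
        yield name.upper() + sep + rest + "\n"


if __name__ == "__main__":
    if sys.stdin.isatty():
        # Nothing was piped in: print a usage hint instead of waiting on the keyboard.
        print("usage: cat sales.csv | python upper_names.py | sort", file=sys.stderr)
    else:
        # Stream line by line so the stage also works on large inputs.
        sys.stdout.writelines(upper_names(sys.stdin))
```

Each stage does one small job, and the shell handles the wiring between stages.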

Scriptella - XML based ETL in Java

Written in Java. If you're familiar with the Spring Framework, Spring Batch is another option.

Worth Mentioning

DB Browser

Free, simple, and works.

Other Tools

Other Lists
