This is a note on installing and testing DBs, Big Data, ETL, analytics, and business intelligence software.
Environments, DBs
The list below shows where the DBs and software are installed:
- Win10 – MySQL DB, Orange, PySpark
- CentOS7 – Oracle 18c XE
- Ubuntu18, headless – MariaDB, Postgresql, Hadoop, HBase, Hive, Spark/PySpark, Mahout, PrestoDB, Pentaho, Luigi, Scriptella
- Ubuntu18 – DB2
Installation Reference
- Postgresql, https://www.digitalocean.com/community/tutorials/how-to-install-and-use-postgresql-on-ubuntu-18-04
- MariaDB, https://www.itzgeek.com/how-tos/linux/ubuntu-how-tos/install-mariadb-on-ubuntu-16-04.html
- Oracle, https://blog.keithkim.com/2019/10/installing-oracle-on-centos.html
- DB2, https://blog.keithkim.com/2020/02/note-install-db2-express-c-on-ubuntu-18.html
Play Time...
Orange - Analytics, data mining
https://orange.biolab.si/
If you are behind a firewall and/or proxy and conda install doesn't work, download the installer and install it manually:
https://download.biolab.si/download/files/
Otherwise, use the Anaconda install steps:
> conda config --add channels conda-forge
> conda install orange3
To run:
> activate <conda environment with Orange>
> orange-canvas
Orange is very promising, with great features and a good GUI, but it is not mature enough yet. For example, the only DB it supports is Postgresql.
Spark - Analytics
https://spark.apache.org/
PrestoDB
https://prestodb.io/
- Presto is a distributed SQL query engine that connects to multiple DBs of different types, such as Hadoop, RDBMS, NoSQL
Pentaho - ETL, Analytics, Report
https://www.pentaho.com/ - Pentaho consists of multiple packages: ETL, Analytics, Business Intelligence.
Flowable
https://flowable.com - Java based. Seems pretty good.
Luigi - Python based ETL
https://github.com/spotify/luigi
Developed by Spotify. Looks pretty promising.
Python Based Tools
- Bubbles - http://bubbles.databrewery.org
Programming based ETL. Haven't tested it much, but is it better than Pandas (https://pandas.pydata.org)?
- Bonobo - https://www.bonobo-project.org
- Pygrametl - https://chrthomsen.github.io/pygrametl/
- PETL - https://pypi.org/project/petl/
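As a baseline for comparing the libraries above, here is a minimal sketch of "programming based ETL" written with only the Python standard library. It does not use any of the listed tools. The sample data, the `sales` table, and the function names are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical input: one padded name and one row with a missing amount.
SAMPLE_CSV = "id,name,amount\n1, alice ,10.5\n2,bob,\n3,carol,7\n"


def extract(text):
    """Extract: parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, tidy names, convert types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete rows
        cleaned.append(
            (int(row["id"]), row["name"].strip().title(), float(row["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned tuples into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text=SAMPLE_CSV):
    """Run all three steps against an in-memory database and return it."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn


if __name__ == "__main__":
    db = run_etl()
    for record in db.execute("SELECT id, name, amount FROM sales ORDER BY id"):
        print(record)
```

The libraries in the list mostly give you nicer building blocks for these same three steps, so a small script like this is a useful yardstick when deciding whether a tool is worth adopting.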
Singer
https://www.singer.io
A very different concept: the programs are written in Python, but they are run from the shell and connected with pipes. Feels like IFTTT for ETL in the shell, running on the local machine.
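To make the pipe idea concrete, here is a rough sketch of it in plain Python. In Singer, a "tap" writes JSON messages (SCHEMA, RECORD, STATE) one per line to stdout, and a "target" reads them from stdin. The code below does not use the Singer libraries; `run_tap`, `run_target`, the `users` stream, and the sample rows are hypothetical, and an in-memory buffer stands in for the shell pipe.

```python
import io
import json


def run_tap(rows, stream, out):
    """Hypothetical tap: emit a SCHEMA line, one RECORD line per row, then STATE."""
    schema = {
        "type": "SCHEMA",
        "stream": stream,
        "schema": {
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
            }
        },
        "key_properties": ["id"],
    }
    out.write(json.dumps(schema) + "\n")
    for row in rows:
        message = {"type": "RECORD", "stream": stream, "record": row}
        out.write(json.dumps(message) + "\n")
    if rows:
        # STATE is a bookmark so that a later run can resume from here.
        state = {"type": "STATE", "value": {"last_id": rows[-1]["id"]}}
        out.write(json.dumps(state) + "\n")


def run_target(lines):
    """Hypothetical target: collect records per stream and keep the last STATE."""
    loaded, state = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message["type"] == "RECORD":
            loaded.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
    return loaded, state


def demo():
    """Connect tap and target through a buffer instead of a real shell pipe."""
    pipe = io.StringIO()
    run_tap([{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}], "users", pipe)
    pipe.seek(0)
    return run_target(pipe)


if __name__ == "__main__":
    print(demo())
```

With real taps and targets the two halves are separate executables, so the same flow is just one program piped into another on the command line.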
Scriptella - XML based ETL in Java
Written in Java, https://scriptella.org
If you're familiar with Spring Framework, Spring Batch is another option - https://mkyong.com/tutorials/spring-batch-tutorial/
Worth Mentioning
- Apache Nifi - https://nifi.apache.org
I wasn't too impressed. It still feels like a very early version.
- Apache Airflow - https://airflow.apache.org
Python based, doesn't run on Windows. Tools should be cross-platform.
DB Browser
- SQL Workbench, http://www.sql-workbench.net/
Other Tools
- Silk - linked data integration framework, http://silkframework.org
- OpenSemantic - https://opensemanticsearch.org/etl
Other Lists
- Java based - https://blog.panoply.io/18-etl-tools-that-do-more-with-java
- Python based - https://www.xplenty.com/blog/python-etl-2019-a-list-and-comparison-of-the-top-python-etl-tools/