Environment: Win10 64-bit
Prerequisites: Java 8 or later, Anaconda
- Create an Anaconda environment for PySpark with Python 3.7; mine is called "develop": conda create -n develop python=3.7
- conda activate develop (or your environment name)
- conda install pyspark jupyter
- Get 'winutils.exe' from https://github.com/steveloughran/winutils/tree/master/hadoop-2.8.1 (it lives in the bin subfolder there)
- Store winutils.exe in C:\opt\spark-hadoop\bin (or anywhere you like, as long as it ends up in a bin subfolder of the directory HADOOP_HOME will point at)
- Set an environment variable pointing at the parent of that bin folder: set HADOOP_HOME=C:\opt\spark-hadoop. Note that set only lasts for the current console; use setx or the System Properties dialog to make it permanent, or set it from Python as sketched after this list.
- Type "pyspark" to get the PySpark (Python) console; it is spark-shell that gives you Scala. For Jupyter, just type "jupyter notebook". A quick smoke test for either follows below.
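
Since set does not persist across consoles, HADOOP_HOME can also be set from Python itself: the JVM that PySpark launches inherits the notebook process's environment. A minimal sketch, assuming winutils.exe sits under C:\opt\spark-hadoop\bin as above; it must run before the first SparkSession is created:

    import os

    # Must happen before the first SparkSession: the JVM PySpark launches
    # inherits this process's environment, and Hadoop looks for
    # %HADOOP_HOME%\bin\winutils.exe there.
    os.environ["HADOOP_HOME"] = r"C:\opt\spark-hadoop"

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.master("local[*]").getOrCreate()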
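To confirm the whole setup end to end, here is a small smoke test; a sketch, assuming the "develop" environment is active. It runs as-is in a Jupyter notebook; in the pyspark console a session already exists, so getOrCreate() simply returns it:

    from pyspark.sql import SparkSession

    # Local session using all available cores; the app name is just a
    # label for this sketch.
    spark = (SparkSession.builder
             .master("local[*]")
             .appName("win10-smoke-test")
             .getOrCreate())

    # A tiny DataFrame exercises the JVM and the winutils.exe plumbing.
    df = spark.createDataFrame([(1, "spark"), (2, "hadoop")], ["id", "name"])
    df.show()

    spark.stop()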