Set Up Apache Spark on Windows


Download Spark and unzip it, e.g.

SPARK_HOME=D:\udu\hk\spark-1.5.1

Spark needs the Hadoop jars. Download the Hadoop binaries built for Windows (Hadoop 2.6.0) from

    http://www.barik.net/archive/2015/01/19/172716/

Unzip Hadoop at some location, e.g.

HADOOP_HOME=D:\udu\hk\hadoop-2.6.0
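
Both locations can then be set as environment variables for the current Command Prompt session, for example:

          set SPARK_HOME=D:\udu\hk\spark-1.5.1
          set HADOOP_HOME=D:\udu\hk\hadoop-2.6.0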

If your JAVA_HOME or HADOOP_HOME path contains space characters (' '), you will need to convert the path to its short (8.3) form:

  • Create a batch script with the following contents

@ECHO OFF
echo %~s1

  • Run the above batch script, passing the JAVA_HOME path as its argument, to get the short path for JAVA_HOME (usage sketched below)
  • Run it again, passing the HADOOP_HOME path as its argument, to get the short path for HADOOP_HOME

set JAVA_HOME=<short path obtained from the above command>
set HADOOP_HOME=<short path obtained from the above command>
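
A quick usage sketch, assuming the two-line script above is saved as shortpath.cmd and that the JDK lives under a path with spaces (the JDK path below is only an illustration):

          rem Pass the long path as the first argument; %~s1 echoes its short (8.3) form
          shortpath.cmd "C:\Program Files\Java\jdk1.8.0_66"
          rem Copy the printed value (it will look like C:\PROGRA~1\Java\...) into:
          rem   set JAVA_HOME=<printed short path>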

Run the following command and copy the classpath it prints; you will need it in the next step:

         %HADOOP_HOME%\bin\hadoop classpath

Under %SPARK_HOME%\conf, create a file named "spark-env.cmd" like the one below:

@echo off
set HADOOP_HOME=D:\udu\hk\hadoop-2.6.0
set PATH=%HADOOP_HOME%\bin;%PATH%
set SPARK_DIST_CLASSPATH=<classpath copied from the previous step>
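
If you would rather not paste the classpath by hand, spark-env.cmd can derive it when it is loaded. This is only a sketch, assuming the hadoop command from the previous step runs cleanly from this HADOOP_HOME:

@echo off
rem spark-env.cmd -- fills SPARK_DIST_CLASSPATH from `hadoop classpath` at load time
set HADOOP_HOME=D:\udu\hk\hadoop-2.6.0
set PATH=%HADOOP_HOME%\bin;%PATH%
for /f "delims=" %%i in ('%HADOOP_HOME%\bin\hadoop classpath') do set SPARK_DIST_CLASSPATH=%%i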

On the Command Prompt:

          cd %SPARK_HOME%\bin
          set SPARK_CONF_DIR=%SPARK_HOME%\conf
          load-spark-env.cmd
          rem To start the Spark shell:
          spark-shell.cmd
          rem To submit a Spark job:
          spark-submit.cmd
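
To sanity-check that spark-env.cmd is actually being picked up before launching anything, here is a quick sketch (run in a fresh Command Prompt; it assumes the conf file created above):

          cd %SPARK_HOME%\bin
          set SPARK_CONF_DIR=%SPARK_HOME%\conf
          rem load-spark-env.cmd runs conf\spark-env.cmd and leaves its variables in this session
          call load-spark-env.cmd
          rem If the setup worked, this should print the Hadoop jar paths
          echo %SPARK_DIST_CLASSPATH%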

Refer to the page below to create a Spark word-count example:
          http://www.robertomarchetto.com/spark_java_maven_example

To run the Spark word-count job (written in Java) from the above URL:

          spark-submit --class org.sparkexample.WordCount --master local[2] <your_spark_job_jar> <any additional parameters needed by your job>
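
For instance, assuming the jar built from the example project is D:\udu\hk\wordcount\target\spark-wordcount-1.0.jar and that the job takes an input text file path as its argument (both the jar name and the argument list here are hypothetical; check the example project for the exact ones):

          cd %SPARK_HOME%\bin
          spark-submit.cmd --class org.sparkexample.WordCount --master local[2] D:\udu\hk\wordcount\target\spark-wordcount-1.0.jar D:\udu\hk\data\input.txt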



References:

http://stackoverflow.com/questions/30906412/noclassdeffounderror-com-apache-hadoop-fs-fsdatainputstream-when-execute-spark-s

https://blogs.perficient.com/multi-shoring/blog/2015/05/07/setup-local-standalone-spark-node/

http://nishutayaltech.blogspot.com/2015/04/how-to-run-apache-spark-on-windows7-in.html