Using Hive on Spark

Running Hive on Spark rather than on MapReduce, its default execution engine, can be a faster alternative. The easiest way to do this is to set the Hive execution engine to Spark in your Hive session.

hive --hiveconf mapreduce.job.queuename=<your_queue>

Then, at the hive> prompt:

set hive.execution.engine=spark;
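
Equivalently, since the Hive CLI accepts repeated --hiveconf flags, you can select the engine at launch time so that every statement in the session runs on Spark from the start (a small sketch; <your_queue> remains a placeholder for your YARN queue name):

hive --hiveconf mapreduce.job.queuename=<your_queue> --hiveconf hive.execution.engine=spark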

Next, you’ll want to set the number of executor instances, the number of cores per executor, and the memory per executor. For guidance on finding the ideal values for your cluster, take a look at this documentation. Tuning these settings is essential to getting the job to run quickly. For example:

set spark.executor.instances=15;
set spark.executor.cores=4;
set spark.executor.memory=3g;
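
As a rough sanity check on these example values: 15 executors with 4 cores each gives the job 15 × 4 = 60 parallel task slots, and it will ask YARN for roughly 15 × 3 GB = 45 GB of executor heap (plus the per-executor memory overhead that Spark on YARN adds on top), so make sure your queue actually has that much capacity.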

Then, run a query just like you would for any Hive job (such as the examples earlier in this guide), and it should run faster on Spark.
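
Putting the pieces together, a complete session might look like the sketch below (the table sample_table and its category column are hypothetical; substitute your own queue and query):

hive --hiveconf mapreduce.job.queuename=<your_queue>

set hive.execution.engine=spark;
set spark.executor.instances=15;
set spark.executor.cores=4;
set spark.executor.memory=3g;

SELECT category, COUNT(*) AS cnt
FROM sample_table
GROUP BY category;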

For more information on how to use Hive on Spark and its advantages, check out this post.
