Flux HPC Blog: Querying data with SparkSQL

By | Data, General Interest, HPC, News

SparkSQL lets you query data in a SQL-like language while taking advantage of the speed of Spark, a fast, general-purpose data processing engine that runs on Hadoop. I wanted to test it on a Walmart dataset of weekly sales figures for the company’s stores. I put the CSV into our cluster’s HDFS (in /var/walmart), making it accessible to all Flux Hadoop users.
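To give a rough idea of what such a query might look like, here is a minimal PySpark sketch. It rests on several assumptions that are not stated above: that the CSV has a header row, that it has columns named `Store` and `Weekly_Sales`, and that a Spark 2.x or later session is available. The view name `sales` and both helper functions are made up for illustration. Only the /var/walmart location comes from the post.

```python
def build_sales_query(table="sales", store_col="Store",
                      sales_col="Weekly_Sales", limit=10):
    """Build a SQL string ranking stores by average weekly sales.

    The table and column names are assumptions; adjust them to match
    the actual header of the CSV.
    """
    return (
        f"SELECT {store_col}, AVG({sales_col}) AS avg_weekly_sales "
        f"FROM {table} "
        f"GROUP BY {store_col} "
        f"ORDER BY avg_weekly_sales DESC "
        f"LIMIT {int(limit)}"
    )


def top_stores(spark, path="/var/walmart", limit=10):
    """Load the CSV data at `path` and run the query through SparkSQL.

    `spark` is an existing SparkSession. On a Hadoop cluster a bare
    path like this one normally resolves against HDFS.
    """
    # Assumes a header row; inferSchema lets Spark guess numeric types.
    df = spark.read.csv(path, header=True, inferSchema=True)
    # Register the DataFrame under a name that SQL statements can use.
    df.createOrReplaceTempView("sales")
    return spark.sql(build_sales_query(limit=limit))
```

In a `pyspark` shell, where a session named `spark` is already defined, calling `top_stores(spark).show()` would print the result. Keeping the SQL string in its own function makes the query easy to inspect or change without touching the loading code.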

U-M telecast of XSEDE Big Data workshop

XSEDE and the Pittsburgh Supercomputing Center are presenting a one-day Big Data workshop focusing on topics such as Hadoop and Spark. U-M is one of several sites around the country hosting a telecast of the session. Registration is required because space is limited.

Schedule:

11:00 Welcome
11:25 Intro to Big Data
11:45 Hadoop
12:15 Hadoop (continued)
1:00 Lunch break
2:00 Exercises
2:45 Spark
3:45 Exercises
4:15 A Big Big Data Platform
5:00 Adjourn