Hello World
One nice thing about Spark is that, under the hood, it's basically just a SQL dialect: whether you write raw SQL or use the DataFrame API, your queries compile down to the same execution plans.
The entry point to Spark is the SparkSession object. You use it to tell your Spark cluster what you want to do, whether that's reading data, writing data, or executing a query.
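Creating one is a one-liner, though the builder can also take configuration up front. Here's a minimal sketch; the app name and the shuffle-partitions value are illustrative choices, not settings Spark requires:
Python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hello-world")                       # name shown in the Spark UI (illustrative)
    .config("spark.sql.shuffle.partitions", "8")  # example tuning knob, not required
    .getOrCreate()                                # reuses an existing session if one is running
)
For a hello world, though, the defaults are all we need: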
Python
from pyspark.sql import SparkSession
# 1) Create a SparkSession
spark = SparkSession.builder.getOrCreate()
# 2) You can use `sql()` to run raw SQL queries
df = spark.sql("SELECT 'Hello World' AS column_1")
# 3) You can use `show()` to print your DataFrame
df.show()
# +-----------+
# |   column_1|
# +-----------+
# |Hello World|
# +-----------+
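The SQL string isn't special, either: the same one-row DataFrame can be built with the DataFrame API, and Spark compiles it to the same kind of plan. A minimal sketch, reusing the spark session from above:
Python
from pyspark.sql.functions import lit

# Build the same single-row DataFrame without writing any SQL text
df = spark.range(1).select(lit("Hello World").alias("column_1"))
df.show()
# +-----------+
# |   column_1|
# +-----------+
# |Hello World|
# +-----------+
If you're curious, calling explain() on either DataFrame prints the physical plan Spark actually runs, so you can see for yourself that the two versions end up in the same place.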