SparkSession
The SparkSession is the entry point to any PySpark application. It provides methods to read data, execute SQL, manage the Spark environment, and create DataFrames.
SparkSession unifies the older SQLContext and HiveContext into a single object; the underlying SparkContext still exists and is available as spark.sparkContext.
Creating a SparkSession
from pyspark.sql import SparkSession
spark = (
    SparkSession.builder
    .appName("MyApp")
    .config("spark.some.config.option", "value")
    .getOrCreate()
)

Common Configuration Options
- appName: name of your application (shown in Spark UI).
- master: `"local[*]"` for local mode (use all cores), or cluster URL.
- spark.sql.shuffle.partitions: number of partitions for shuffles (default 200).
Creating a Session for Local Development
spark = SparkSession.builder.appName("local_test").master("local[*]").getOrCreate()

Stopping the Session
Always stop the session at the end of your script to free resources:
spark.stop()

Two Minute Drill
- SparkSession is the main entry point for PySpark.
- Use `.builder` to configure and `.getOrCreate()` to create.
- `master("local[*]")` runs Spark locally.
- Call `spark.stop()` to release resources.
Need more clarification?
Drop us an email at career@quipoinfotech.com
