SparkSession
The SparkSession is the entry point to any PySpark application. It provides methods to read data, execute SQL, manage the Spark environment, and create DataFrames.
SparkSession unifies the older SQLContext and HiveContext into a single object; the underlying SparkContext still exists and is available as spark.sparkContext.
Creating a SparkSession
from pyspark.sql import SparkSession
spark = (
    SparkSession.builder
    .appName("MyApp")
    .config("spark.some.config.option", "value")
    .getOrCreate()
)

Common Configuration Options
- appName: name of your application (shown in Spark UI).
- master: `"local[*]"` for local mode (use all cores), or cluster URL.
- spark.sql.shuffle.partitions: number of partitions for shuffles (default 200).
Creating a Session for Local Development
spark = SparkSession.builder.appName("local_test").master("local[*]").getOrCreate()

Stopping the Session
Always stop the session at the end of your script to free resources:
spark.stop()

Two Minute Drill
- SparkSession is the main entry point for PySpark.
- Use `.builder` to configure and `.getOrCreate()` to create.
- `master("local[*]")` runs Spark locally.
- Call `spark.stop()` to release resources.
Need more clarification?
Drop us an email at career@quipoinfotech.com
