Installing PySpark
PySpark can be installed on your local machine for learning and experimentation. This chapter covers installation on Windows, macOS, and Linux, along with setting up a virtual environment.
Prerequisites
- Python 3.8 or later.
- Java 8 or later (Spark runs on the JVM).
- 8+ GB RAM recommended for local testing.
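Before proceeding, you can confirm the first two prerequisites from Python itself. The sketch below uses only the standard library; checking that a `java` executable is on the PATH is a quick proxy for a JDK install, not a full version check.

```python
import shutil
import sys

# Check that the interpreter meets the 3.8 minimum.
python_ok = sys.version_info >= (3, 8)

# Check that a `java` executable is on the PATH.
java_ok = shutil.which("java") is not None

print(f"Python >= 3.8: {python_ok}")
print(f"java on PATH:  {java_ok}")
```

If either check prints False, revisit the corresponding prerequisite before installing PySpark.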
Installing Java
Download and install OpenJDK from adoptium.net. Verify:
java -version
Create and Activate Virtual Environment
python -m venv pyspark-env
source pyspark-env/bin/activate # Mac/Linux
# Windows">
pyspark-env\Scripts\activate # Windows
Install PySpark
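Before running pip, it is worth confirming that the virtual environment is actually active, so the package does not land in your system-wide Python. One way to check (a small sketch using only the standard library; the helper name is our own) compares `sys.prefix` against `sys.base_prefix`:

```python
import sys

def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix

print(f"Virtual environment active: {in_virtualenv()}")
```

If this prints False, re-run the activation command for your platform and try again.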
pip install pyspark
This installs Spark and PySpark together.
Verify Installation
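As a quick first check, you can confirm the package resolved without starting Spark at all. This sketch uses the standard library's `importlib.util.find_spec`, which returns None when a package cannot be located:

```python
import importlib.util

# find_spec returns None when the package cannot be located on sys.path.
installed = importlib.util.find_spec("pyspark") is not None
print(f"pyspark importable: {installed}")
```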
Run Python and import PySpark:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Test").getOrCreate()
print(spark.version)
If no errors, PySpark is ready.
Two Minute Drill
- Install Java (OpenJDK) before PySpark.
- Use a virtual environment.
- Install PySpark with `pip install pyspark`.
- Test with `SparkSession.builder.getOrCreate()`.
Need more clarification?
Drop us an email at career@quipoinfotech.com
