Adding and Transforming Columns
Creating new columns and transforming existing ones is essential for data preparation. PySpark provides `withColumn()` and a rich set of built‑in functions.
Adding a Constant Column
from pyspark.sql.functions import lit
df = df.withColumn("constant", lit(100))

Adding a Column from an Expression
from pyspark.sql.functions import upper

df = df.withColumn("double_age", df["age"] * 2)
df = df.withColumn("name_upper", upper(df["name"]))

Renaming a Column
df = df.withColumnRenamed("old_name", "new_name")

Dropping a Column
df = df.drop("unwanted_column")

Common Transformation Functions
- `upper()`, `lower()` – case conversion.
- `substring(col, pos, len)` – extract part of a string.
- `round()`, `ceil()`, `floor()` – numeric rounding.
- `when().otherwise()` – conditional logic (like SQL CASE).
Two Minute Drill
- `withColumn()` creates or replaces a column.
- `lit()` creates a constant value column.
- `withColumnRenamed()` renames a column.
- `drop()` removes a column.
Need more clarification?
Drop us an email at career@quipoinfotech.com
