

Adding and Transforming Columns

Creating new columns and transforming existing ones is essential for data preparation. PySpark provides `withColumn()` and a rich set of built‑in functions.

Adding a Constant Column

from pyspark.sql.functions import lit
df = df.withColumn("constant", lit(100))

Adding a Column from an Expression

from pyspark.sql.functions import upper
df = df.withColumn("double_age", df["age"] * 2)
df = df.withColumn("name_upper", upper(df["name"]))

Renaming a Column

df = df.withColumnRenamed("old_name", "new_name")

Dropping a Column

df = df.drop("unwanted_column")

Common Transformation Functions

  • `upper()`, `lower()` – case conversion.
  • `substring(col, pos, len)` – extract part of a string (`pos` is 1‑based).
  • `round()`, `ceil()`, `floor()` – numeric rounding.
  • `when().otherwise()` – conditional logic (like SQL CASE).


Two Minute Drill
  • `withColumn()` creates or replaces a column.
  • `lit()` creates a constant value column.
  • `withColumnRenamed()` renames a column.
  • `drop()` removes a column.

Need more clarification?

Drop us an email at career@quipoinfotech.com