Basic DataFrame Ops
Once you have a DataFrame, you need to explore it. PySpark provides several methods to inspect data, understand its structure, and get basic statistics.
Show Data
df.show(5) # first 5 rows
df.show(10, truncate=False) # more rows, full text
Print Schema (Columns and Types)
df.printSchema()
Descriptive Statistics
df.describe().show()
Computes count, mean, stddev, min, max for numeric columns.
Column Names and Count
df.columns # list of column names
df.count() # number of rows
Selecting and Filtering (Preview)
df.select("col1", "col2").show()
df.filter(df["age"] > 30).show()
These will be covered in depth in Module 2.
Two Minute Drill
- `show()` displays rows; `printSchema()` shows column types.
- `describe()` gives summary statistics.
- `columns` (an attribute) and `count()` report basic metadata.
- `select()` and `filter()` are basic transformations.
Need more clarification?
Drop us an email at career@quipoinfotech.com
