Basic DataFrame Ops
Once you have a DataFrame, you need to explore it. PySpark provides several methods to inspect data, understand its structure, and get basic statistics.
Show Data
df.show(5) # first 5 rows
df.show(10, truncate=False) # more rows, full text
Print Schema (Columns and Types)
df.printSchema()
Descriptive Statistics
df.describe().show()
Computes count, mean, stddev, min, max for numeric columns.
Column Names and Count
df.columns # list of column names
df.count() # number of rows
Selecting and Filtering (Preview)
df.select("col1", "col2").show()
df.filter(df["age"] > 30).show()
These will be covered in depth in Module 2.
Two Minute Drill
- `show()` displays rows; `printSchema()` shows column types.
- `describe()` gives summary statistics.
- `columns` (an attribute) and `count()` report basic metadata.
- `select()` and `filter()` are basic transformations.
Need more clarification?
Drop us an email at career@quipoinfotech.com
