
Running SQL Queries

Once you have registered a temporary view, you can run SQL queries using `spark.sql()`. This method returns a new DataFrame.

Basic SQL Query

df.createOrReplaceTempView("sales")
result = spark.sql("SELECT product, SUM(amount) FROM sales GROUP BY product")
result.show()

Using Filters and Joins in SQL

df1.createOrReplaceTempView("customers")
df2.createOrReplaceTempView("orders")
result = spark.sql("""
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON c.id = o.cust_id
WHERE o.amount > 100
""")
result.show()

SQL Functions (Aggregate, Window, etc.)

result = spark.sql("""
    SELECT department,
           AVG(salary) as avg_salary,
           RANK() OVER (ORDER BY AVG(salary) DESC) as rank
    FROM employees
    GROUP BY department
""")
result.show()

Performance Note

SQL queries and DataFrame operations are compiled to the same logical plan and optimized by the same engine (Catalyst), so an equivalent query performs the same regardless of which API you use.


Two Minute Drill
  • `spark.sql()` executes SQL queries and returns a DataFrame.
  • Use triple quotes for multi‑line SQL.
  • You can join, filter, and aggregate using familiar SQL syntax.
  • SQL and DataFrame APIs are interchangeable.

Need more clarification?

Drop us an email at career@quipoinfotech.com