Pandas
Pandas is a high-level data manipulation library built on NumPy. It provides two main data structures: `Series` (a 1-D labeled array) and `DataFrame` (a 2-D labeled table whose columns are Series). Together they make it easy to work with structured, tabular data such as the contents of CSV files or database tables.
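To make the two structures concrete, here is a minimal sketch; the names and values are made up for illustration. It shows that a `Series` pairs values with an index, and that selecting one column of a `DataFrame` gives back a `Series`.

```python
import pandas as pd

# A Series is a 1-D labeled array: values plus an index
ages = pd.Series([25, 30, 35], index=["Alice", "Bob", "Charlie"], name="Age")

# A DataFrame is a 2-D table; each column is a Series sharing one row index
df = pd.DataFrame(
    {"Age": [25, 30, 35], "City": ["NYC", "LA", "Chicago"]},
    index=["Alice", "Bob", "Charlie"],
)

print(ages["Bob"])       # label-based lookup on a Series -> 30
print(type(df["Age"]))   # selecting a single column returns a Series
print(df.shape)          # (rows, columns) -> (3, 2)
```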
Installation
`pip install pandas`
Creating a DataFrame
Build a DataFrame from a dictionary that maps column names to lists of values, or read one from a CSV file.
```python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["NYC", "LA", "Chicago"],
}
df = pd.DataFrame(data)
print(df)

# Read from CSV
# df = pd.read_csv("file.csv")
```
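Because no `file.csv` ships with this tutorial, the sketch below assumes a small invented CSV held in memory and wraps it in `io.StringIO`, which `read_csv` accepts in place of a file path. It also writes the frame back out with `to_csv` to show a round trip.

```python
import io
import pandas as pd

# Invented CSV content standing in for a file on disk
csv_text = """Name,Age,City
Alice,25,NYC
Bob,30,LA
Charlie,35,Chicago
"""

# read_csv accepts any file-like object, not only a path
df = pd.read_csv(io.StringIO(csv_text))

# Column types are inferred: Age becomes an integer column
print(df.dtypes)

# to_csv with no path returns a string; index=False skips the row index
round_trip = df.to_csv(index=False)
```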
Inspecting Data
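`df.head()` shows the first rows (five by default), `df.info()` prints each column's dtype and non-null count, and `df.describe()` computes summary statistics such as count, mean, and standard deviation for the numeric columns.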
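A quick sketch of these three calls on the sample data from earlier. Note that `info()` prints its report directly and returns `None`, while `describe()` returns a new DataFrame you can index into.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["NYC", "LA", "Chicago"],
})

print(df.head(2))      # first n rows (n defaults to 5)
df.info()              # prints dtypes, non-null counts, memory usage
stats = df.describe()  # numeric columns only by default (here: Age)
print(stats)
print(stats.loc["mean", "Age"])  # 30.0
```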
Selecting Data
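Select columns by name with `[]`, filter rows with a boolean condition (boolean indexing), or pick rows and columns together by label with `loc`.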
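```python
ages = df["Age"]                        # one column, returned as a Series
young = df[df["Age"] < 30]              # rows where Age is below 30
subset = df.loc[0:1, ["Name", "City"]]  # row labels 0 through 1 (inclusive), two columns
```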
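`loc` and `iloc` look similar but slice differently. The sketch below uses the same sample data and assumes the default integer row labels 0, 1, 2, which is why the two selections happen to match. It also shows how to combine boolean conditions.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["NYC", "LA", "Chicago"],
})

# loc is label-based: the slice end label is INCLUDED (rows 0 and 1)
by_label = df.loc[0:1, ["Name", "City"]]

# iloc is position-based: the slice end is EXCLUDED, like Python lists (rows 0 and 1)
by_position = df.iloc[0:2, [0, 2]]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
older_in_la = df[(df["Age"] > 25) & (df["City"] == "LA")]
```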
Data Cleaning
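Common cleaning steps are handling missing values, removing duplicate rows, and renaming columns. `dropna` and `fillna` are alternatives: once `dropna` has removed every row with a missing value, there is nothing left for `fillna` to fill.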
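```python
# Either drop rows that contain missing values...
df = df.dropna()
# ...or fill them instead (e.g., with 0)
# df = df.fillna(0)
df = df.drop_duplicates()                     # remove exact duplicate rows
df = df.rename(columns={"Name": "FullName"})  # rename a column
```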
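To see what each cleaning call does, here is a sketch on a small invented frame containing one missing value and one duplicated row. Filling with the column mean is just one possible strategy, chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Invented messy data: Dana's Age is missing and Bob's row appears twice
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Dana"],
    "Age": [25, 30, 30, np.nan],
})

print(df.isna().sum())  # count of missing values per column

dropped = df.dropna()                            # removes Dana's row -> 3 rows
filled = df.fillna({"Age": df["Age"].mean()})    # fills Dana's Age with the mean
deduped = df.drop_duplicates()                   # removes the second Bob row -> 3 rows
```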
Grouping and Aggregation
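`groupby` splits the rows into groups by one or more key columns, applies an aggregation (such as `mean`, `sum`, or `count`) to each group, and combines the results.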
```python
# Average age per city; the result is a Series indexed by City
grouped = df.groupby("City")["Age"].mean()
```
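With the three-row sample data every city appears once, so the per-group mean is not very interesting. The sketch below assumes a slightly larger invented dataset and also shows named aggregation with `agg`, which computes several summaries in one pass.

```python
import pandas as pd

# Invented data with repeated cities so each group has more than one row
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, 35, 40],
    "City": ["NYC", "LA", "NYC", "LA"],
})

# Split by City, apply mean to Age, combine into a Series indexed by City
mean_age = df.groupby("City")["Age"].mean()

# Named aggregation: output_column=(input_column, aggregation)
summary = df.groupby("City").agg(
    avg_age=("Age", "mean"),
    oldest=("Age", "max"),
    people=("Name", "count"),
)
print(summary)
```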
Two Minute Drill
- Pandas provides DataFrames for tabular data.
- Read data from CSV files, Excel workbooks, and databases (`read_csv`, `read_excel`, `read_sql`).
- Select rows/columns with `[]`, `loc`, `iloc`.
- Handle missing data with `dropna`, `fillna`.
- Use `groupby` for aggregation, `pivot_table` for summaries.
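The drill mentions `pivot_table`, which the section does not demonstrate. A minimal sketch using invented sales figures: it reshapes long records into a City-by-Year grid, summing `Revenue` where several rows fall into the same cell.

```python
import pandas as pd

# Invented sales records, for illustration only
sales = pd.DataFrame({
    "City": ["NYC", "NYC", "LA", "LA", "NYC"],
    "Year": [2023, 2024, 2023, 2024, 2024],
    "Revenue": [100, 150, 80, 120, 50],
})

# Rows = City, columns = Year, cells = summed Revenue (0 where no data)
table = sales.pivot_table(
    index="City", columns="Year", values="Revenue",
    aggfunc="sum", fill_value=0,
)
print(table)
```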
Need more clarification?
Drop us an email at career@quipoinfotech.com
