python-for-ai / Data Filtering
interview

Q1. Scenario: From a DataFrame of employees with columns 'department', 'salary', select all rows where department is 'IT' and salary > 70000.
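filtered = df[(df['department'] == 'IT') & (df['salary'] > 70000)]. This is boolean indexing: each comparison produces a boolean Series, and & (element-wise AND) or | (element-wise OR) combines them. The parentheses are required because & and | bind more tightly than comparison operators such as == and >. As an alternative, use df.query('department == "IT" and salary > 70000').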
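A minimal sketch of both forms, using a small made-up employee table (the names and salaries are illustrative assumptions, not data from the question):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dara"],
    "department": ["IT", "HR", "IT", "IT"],
    "salary": [85000, 72000, 65000, 70000],
})

# Boolean indexing: each condition is wrapped in parentheses, combined with &.
filtered = df[(df["department"] == "IT") & (df["salary"] > 70000)]

# The same filter written as a query string.
via_query = df.query('department == "IT" and salary > 70000')

print(filtered)
```

Dara is excluded because the comparison is strict: a salary of exactly 70000 does not satisfy salary > 70000.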

Q2. Scenario: Filter rows where the 'product' column contains the substring 'phone' using str.contains.
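mask = df['product'].str.contains('phone', na=False); filtered = df[mask]. na=False treats missing values (NaN) as non-matches, so the mask contains only True/False and is safe to index with. Pass case=False for a case-insensitive match. By default the pattern is interpreted as a regular expression; pass regex=False to match a literal substring.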
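The sketch below shows how na=False and case=False change the result. The product names are invented for illustration:

```python
import pandas as pd

# Hypothetical product names, including one missing value.
df = pd.DataFrame(
    {"product": ["iPhone 15", "phone case", "Laptop", None, "Headphones"]}
)

# Case-sensitive match; the missing value is treated as a non-match.
mask = df["product"].str.contains("phone", na=False)
case_sensitive = df[mask]

# Case-insensitive match also picks up "iPhone 15".
mask_ci = df["product"].str.contains("phone", case=False, na=False)
case_insensitive = df[mask_ci]

print(case_sensitive)
print(case_insensitive)
```

Note that "Headphones" matches in both cases, since str.contains looks for the substring anywhere in the value rather than for a whole word.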

Q3. Scenario: Select rows 10 through 20 (inclusive) using .iloc and also rows with index labels 'row5' to 'row10' using .loc.
df.iloc[10:21] selects positions 10 through 20, because the end of an iloc slice is exclusive. df.loc['row5':'row10'] includes both endpoint labels. Use iloc for integer positions and loc for index labels.
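To see the difference in endpoint handling, here is a small sketch. It assumes an index of labels 'row0' through 'row29' stored in that order, which the question does not specify:

```python
import pandas as pd

# Assumed layout: 30 rows labelled row0..row29, in that order.
df = pd.DataFrame(
    {"value": range(30)},
    index=[f"row{i}" for i in range(30)],
)

# Position-based: the end is exclusive, so 10:21 yields positions 10..20.
by_position = df.iloc[10:21]

# Label-based: both 'row5' and 'row10' are included.
by_label = df.loc["row5":"row10"]

print(len(by_position), len(by_label))
```

A label slice follows the order of the rows in the index, not the alphabetical order of the labels, so 'row5':'row10' here returns the six rows stored between those two labels.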

Q4. Scenario: From a DataFrame with columns A, B, C select only columns A and C. Also select all columns except B.
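df[['A', 'C']] selects columns A and C; df.drop('B', axis=1) (equivalently df.drop(columns='B')) returns every column except B. Both return new DataFrames and leave df unchanged. To change the original, reassign the result (df = df.drop(columns='B')) or pass inplace=True to drop; inplace applies only to drop, not to the bracket selection. df.loc[:, df.columns != 'B'] also excludes B.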
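A short sketch of the three selections, on a toy DataFrame with made-up values, confirming that the original is left untouched:

```python
import pandas as pd

# Toy data for illustration.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# Keep only A and C (note the list inside the brackets).
only_a_c = df[["A", "C"]]

# Drop B by name; returns a new DataFrame.
without_b = df.drop(columns="B")

# Same result using a boolean mask over the column labels.
without_b_mask = df.loc[:, df.columns != "B"]

print(list(df.columns))  # the original still has all three columns
```

The mask form is handy when the columns to exclude are computed rather than known by name.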

Q5. Scenario: Filter rows where the 'score' column is between 60 and 80 inclusive. Use between() method.
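filtered = df[df['score'].between(60, 80, inclusive='both')]. inclusive='both' is the default, so between(60, 80) gives the same result; the string form of inclusive requires pandas 1.3 or later. The equivalent explicit form is df[(df.score >= 60) & (df.score <= 80)].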
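The following sketch compares between() with the explicit comparison on boundary values. The scores are invented, and it assumes pandas 1.3 or later for the string values of inclusive:

```python
import pandas as pd

# Hypothetical scores chosen to sit on and around the boundaries.
df = pd.DataFrame({"score": [59, 60, 75, 80, 81]})

# Inclusive on both ends: keeps 60 and 80.
filtered = df[df["score"].between(60, 80, inclusive="both")]

# Equivalent explicit comparison.
explicit = df[(df["score"] >= 60) & (df["score"] <= 80)]

# Excluding both endpoints keeps only values strictly inside the range.
exclusive = df[df["score"].between(60, 80, inclusive="neither")]

print(filtered)
```

The other accepted values, 'left' and 'right', include only the lower or only the upper bound respectively.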