What is pandas?

Definition and Scope

Pandas is an open-source Python library primarily used for data manipulation and analysis. It provides data structures and functions designed to make data cleaning and analysis straightforward and efficient. The library is built on top of NumPy and integrates well with other data-centric Python libraries such as matplotlib and scikit-learn.

Key Components

  1. Data Structures:
    • Series: A one-dimensional labeled array capable of holding any data type (e.g., integers, strings, floats).
    • DataFrame: A two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns). This is the most commonly used object in pandas.
  2. Functions and Methods:
    • Data Manipulation: Functions like merge(), concat(), pivot(), and melt() allow complex data restructuring.
    • Data Cleaning: Methods such as dropna() for handling missing values and fillna() for filling missing data.
    • Data Analysis: Functions like groupby(), apply(), and statistical methods (e.g., mean(), std()) enable in-depth data exploration.
  3. Indexing and Slicing:
    • Pandas provides powerful tools for accessing data through labels and positions, which simplifies selecting and modifying data.
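
To make these components concrete, here is a minimal sketch; the customer names, regions, and amounts are purely illustrative.

  import pandas as pd

  # Series: one-dimensional labeled array
  ages = pd.Series([34, 28, 45], index=["ana", "ben", "caro"], name="age")

  # DataFrame: two-dimensional labeled table
  df = pd.DataFrame({
      "customer": ["ana", "ben", "caro", "ana"],
      "amount": [120.0, 80.5, None, 60.0],
      "region": ["east", "west", "east", "east"],
  })

  # Data cleaning: fill the missing amount with 0
  df["amount"] = df["amount"].fillna(0)

  # Data analysis: total amount per region
  totals = df.groupby("region")["amount"].sum()

  # Indexing: label-based selection of the first row's amount
  first_amount = df.loc[0, "amount"]
  print(totals)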

Applications in Business

Pandas is widely used in business for tasks such as:

  • Data Cleaning and Preprocessing: Preparing raw data for analysis by removing inconsistencies and filling in gaps.
  • Financial Analysis: Managing time-series data for tasks such as stock price analysis and portfolio risk management.
  • Customer Data Analysis: Aggregating and analyzing large customer datasets to identify trends, segment customers, and track key performance metrics.
  • Reporting and Visualization: Creating summarized data tables and visualizing trends with plots, in combination with visualization libraries such as Matplotlib.
  • Predictive Analytics: Preprocessing data for machine learning models to forecast business metrics or customer behavior.

 

What is SQL?

Definition and Scope

SQL (Structured Query Language) is the standard language for defining, querying, and manipulating data in relational databases. It is used to query, insert, update, and delete data, as well as to manage database structures. SQL operates on relational databases, which store data in tables linked by relationships, and it is the primary means of interacting with most database management systems (DBMS), such as MySQL, PostgreSQL, Oracle, and SQL Server.

Key Components

  1. SQL Statements:
    • Data Query Language (DQL):
      • SELECT: Used to retrieve data from a database.
    • Data Definition Language (DDL):
      • CREATE, ALTER, DROP: Used to define and modify database structures (e.g., tables, schemas).
    • Data Manipulation Language (DML):
      • INSERT, UPDATE, DELETE: Used to modify and manage data within tables.
    • Data Control Language (DCL):
      • GRANT, REVOKE: Used to control access permissions for users.
    • Transaction Control Language (TCL):
      • COMMIT, ROLLBACK: Used to manage changes made during a transaction.
  2. Clauses:
    • SQL queries often use various clauses, such as:
      • WHERE: Filters records based on specified conditions.
      • ORDER BY: Sorts records.
      • GROUP BY: Groups records based on specific columns, useful for aggregation.
      • HAVING: Filters groups after aggregation.
      • JOIN: Combines rows from two or more tables based on a related column.
  3. Indexes and Keys:
    • Primary Key: A column or set of columns used to uniquely identify a record in a table.
    • Foreign Key: A column that creates a relationship between two tables by referencing a primary key in another table.
    • Index: A data structure used to speed up the retrieval of rows from a database table.
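
The statement families above can be sketched with Python's built-in sqlite3 module, which runs SQL against an in-memory database. The orders table and its columns are made up for illustration; DCL is omitted because SQLite has no GRANT/REVOKE.

  import sqlite3

  conn = sqlite3.connect(":memory:")   # throwaway in-memory database
  cur = conn.cursor()

  # DDL: define a table
  cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

  # DML: insert and update rows
  cur.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)",
                  [("east", 120.0), ("west", 80.5), ("east", 60.0)])
  cur.execute("UPDATE orders SET amount = amount * 1.1 WHERE region = 'west'")

  # DQL: query with WHERE, GROUP BY, HAVING, and ORDER BY
  cur.execute("""
      SELECT region, SUM(amount) AS total
      FROM orders
      WHERE amount > 50
      GROUP BY region
      HAVING SUM(amount) > 100
      ORDER BY total DESC
  """)
  print(cur.fetchall())

  # TCL: make the changes permanent
  conn.commit()
  conn.close()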

Applications in Business

SQL plays a crucial role in business for several key functions:

  1. Data Management:
    • Storing and Retrieving Data: SQL is used to manage business-critical data such as customer records, sales data, and inventory details in relational databases.
  2. Reporting and Analysis:
    • SQL helps businesses generate detailed reports by querying large datasets for insights on sales performance, customer behavior, and operational efficiency.
  3. Customer Relationship Management (CRM):
    • It is used to manage and query customer data, track interactions, and derive insights for better customer service and marketing strategies.
  4. Business Intelligence:
    • SQL is vital for gathering and transforming data to be analyzed in BI tools, helping companies make informed decisions.
  5. Financial Operations:
    • Financial departments use SQL to query and update accounting data, track transactions, and generate balance sheets, profit & loss statements, etc.
  6. E-commerce:
    • SQL is used for inventory management, order tracking, and processing payments by querying and updating product, customer, and transaction databases.
  7. Data Security:
    • SQL is essential in controlling access to sensitive business data through user roles and permissions (e.g., using GRANT and REVOKE).

 

 

Pandas vs. SQL at a Glance

Feature | Pandas | SQL
------- | ------ | ---
Definition | A Python library for data manipulation and analysis. | A query language for managing and manipulating relational databases.
Primary Use | Data analysis, manipulation, cleaning, and transformation within Python programs. | Managing and querying structured data in relational databases.
Data Structure | Works with in-memory data structures such as DataFrame and Series. | Works with tables in a relational database.
Data Location | Operates on data loaded into memory (local). | Operates on data stored in a database server (remote or local).
Complexity of Queries | Suited to complex data transformations expressed in Python code. | Uses declarative queries (SQL syntax) for data retrieval and manipulation.
Integration with Python | Fully integrates with Python and supports analysis workflows with libraries like NumPy, Matplotlib, and scikit-learn. | Accessed from Python via libraries such as sqlite3, SQLAlchemy, or pandas itself.
Performance | Limited by memory for large datasets; slows on large data unless tools such as Dask add parallelism. | Optimized for querying large datasets and handles bigger data volumes more efficiently.
Data Handling | Best suited to small and medium datasets that fit in memory. | Designed for querying large datasets in databases that do not fit in memory.
Operations | Supports filtering, grouping, merging, reshaping, and more. | Focuses on querying, inserting, updating, and deleting data.
Ease of Use | Pythonic interface, flexible and powerful for analysts familiar with Python. | Standardized syntax, widely known among database administrators and developers.
Transaction Management | Not natively designed for handling transactions. | Supports transaction control through COMMIT, ROLLBACK, and SAVEPOINT.
Concurrency | Single-user, in-memory operation; limited concurrency. | Supports multiple users with robust concurrency control.
Data Type Flexibility | Handles mixed data types (strings, numbers, dates, etc.) flexibly. | Uses fixed column types defined in the database schema (e.g., INT, VARCHAR).
Applications | Data analysis, machine learning, reporting, and scientific computing. | Business applications, reporting, data management, CRM systems, and financial systems.
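
One practical consequence of this comparison is where the work happens: SQL aggregates inside the database, while pandas operates on whatever has been pulled into memory. A rough sketch, assuming a hypothetical sales table in SQLite:

  import sqlite3
  import pandas as pd

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
      CREATE TABLE sales (region TEXT, amount REAL);
      INSERT INTO sales VALUES ('east', 120.0), ('west', 80.5), ('east', 60.0);
  """)

  # SQL side: the database aggregates, only the small result travels to Python
  summary_sql = pd.read_sql(
      "SELECT region, SUM(amount) AS total FROM sales GROUP BY region", conn)

  # pandas side: the full table is loaded into memory, then aggregated locally
  df = pd.read_sql("SELECT * FROM sales", conn)
  summary_pd = df.groupby("region", as_index=False)["amount"].sum()

  conn.close()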

Skill Sets and Knowledge Areas

The pandas Skillset

1. Data Structures in pandas

  • Series: Understanding how to work with one-dimensional arrays of data (indexed data).
  • DataFrame: Mastery of two-dimensional tables with labeled axes (rows and columns), including how to create, access, and modify them.
  • MultiIndex: Working with hierarchical indexes for handling complex data structures (multiple levels of indexing).

2. Data Loading and Exporting

  • Reading Data: Importing data from various file formats such as CSV (read_csv()), Excel (read_excel()), JSON (read_json()), SQL (read_sql()), and more.
  • Writing Data: Exporting data to formats like CSV (to_csv()), Excel (to_excel()), and SQL databases (to_sql()).
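
A minimal round-trip sketch, assuming a writable working directory and an illustrative file name:

  import pandas as pd

  df = pd.DataFrame({"product": ["A", "B"], "units": [10, 25]})

  # Writing: export to CSV without the row index
  df.to_csv("sales_snapshot.csv", index=False)

  # Reading: import the same file back into a DataFrame
  df_back = pd.read_csv("sales_snapshot.csv")
  print(df_back)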

3. Data Inspection and Exploration

  • Viewing Data: Using methods like .head(), .tail(), .info(), and .describe() to quickly inspect and summarize datasets.
  • Data Types: Checking column data types with the .dtypes attribute and converting them with .astype().
  • Shape and Size: Using .shape, .size, and .columns to understand the size and structure of the data.
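
For example, a quick inspection pass over a small illustrative DataFrame might look like this:

  import pandas as pd

  df = pd.DataFrame({
      "customer": ["ana", "ben", "caro"],
      "amount": [120.0, 80.5, 60.0],
  })

  print(df.head(2))         # first rows
  df.info()                 # column names, non-null counts, dtypes, memory usage
  print(df.describe())      # summary statistics for numeric columns
  print(df.dtypes)          # data type of each column
  print(df.shape, df.size)  # (rows, columns) and total number of cells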

4. Data Cleaning and Transformation

  • Handling Missing Data: Using methods like .isnull(), .dropna(), .fillna() to detect, drop, or impute missing values.
  • Removing Duplicates: Using .drop_duplicates() to eliminate redundant data.
  • Renaming Columns: Renaming columns using .rename() to make the dataset more readable.
  • Data Transformation: Applying transformations using .apply(), .map(), and .applymap() (renamed DataFrame.map() in recent pandas releases) for row-, column-, or element-wise operations; a short sketch follows this list.
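
A short sketch of a typical cleaning pass; the column names and the imputation choice (filling with the mean) are illustrative, not prescriptive.

  import numpy as np
  import pandas as pd

  df = pd.DataFrame({
      "Name": ["ana", "ben", "ben", None],
      "Amt": [120.0, np.nan, np.nan, 60.0],
  })

  df = df.dropna(subset=["Name"])                  # drop rows with a missing name
  df["Amt"] = df["Amt"].fillna(df["Amt"].mean())   # impute missing amounts
  df = df.drop_duplicates()                        # remove exact duplicate rows
  df = df.rename(columns={"Name": "name", "Amt": "amount"})  # readable names
  df["name"] = df["name"].map(str.upper)           # element-wise transformation
  print(df)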

5. Indexing, Selection, and Filtering

  • Selecting Data: Accessing data with .loc[], .iloc[], .at[], .iat[] for label-based or position-based indexing.
  • Boolean Indexing: Filtering rows using boolean conditions, e.g., df[df['age'] > 30].
  • Setting Index: Using .set_index() and .reset_index() to manipulate the row index.
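
For instance, with a small DataFrame indexed by made-up customer names:

  import pandas as pd

  df = pd.DataFrame(
      {"age": [34, 28, 45], "city": ["NYC", "LA", "NYC"]},
      index=["ana", "ben", "caro"],
  )

  print(df.loc["ben", "city"])   # label-based: single value
  print(df.iloc[0])              # position-based: first row
  print(df.at["caro", "age"])    # fast scalar access by label
  print(df[df["age"] > 30])      # boolean indexing: rows where age > 30

  df2 = df.reset_index()         # move the index back into a regular column
  df2 = df2.set_index("city")    # use another column as the index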

6. Merging and Joining Data

  • Merging DataFrames: Using .merge() to combine datasets based on common columns (SQL-like joins).
  • Concatenating DataFrames: Combining datasets along rows or columns using .concat().
  • Appending DataFrames: DataFrame.append() is deprecated (and removed in pandas 2.0); use pd.concat() to add the rows of one DataFrame to another, as shown below.
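
A compact sketch of a join and a concatenation, using illustrative customers and orders tables:

  import pandas as pd

  customers = pd.DataFrame({"cust_id": [1, 2, 3], "name": ["ana", "ben", "caro"]})
  orders = pd.DataFrame({"cust_id": [1, 1, 3], "amount": [120.0, 60.0, 80.5]})

  # SQL-like left join on the shared key column
  joined = customers.merge(orders, on="cust_id", how="left")

  # Concatenation: stack two DataFrames with the same columns (replaces .append())
  more_orders = pd.DataFrame({"cust_id": [2], "amount": [45.0]})
  all_orders = pd.concat([orders, more_orders], ignore_index=True)
  print(joined)
  print(all_orders)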

7. Grouping and Aggregating Data

  • GroupBy: Aggregating data using .groupby() to perform operations like sum, mean, count, etc., across groups.
  • Pivoting: Using .pivot_table() for reshaping data (creating a pivot table).
  • Aggregations: Performing complex aggregations using .agg().
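
A small sketch of grouping, multi-function aggregation, and pivoting on illustrative sales data:

  import pandas as pd

  df = pd.DataFrame({
      "region": ["east", "west", "east", "west"],
      "product": ["A", "A", "B", "B"],
      "amount": [120.0, 80.5, 60.0, 45.0],
  })

  # GroupBy with a single aggregate
  totals = df.groupby("region")["amount"].sum()

  # Multiple aggregations at once with .agg()
  stats = df.groupby("region")["amount"].agg(["sum", "mean", "count"])

  # Pivot table: regions as rows, products as columns, summed amounts as values
  pivot = df.pivot_table(index="region", columns="product", values="amount", aggfunc="sum")
  print(totals, stats, pivot, sep="\n\n")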

8. Data Sorting and Ranking

  • Sorting: Sorting data using .sort_values() or .sort_index().
  • Ranking: Ranking data using .rank().

9. Date and Time Manipulation

  • DateTime Objects: Working with dates and times using pd.to_datetime().
  • Resampling: Changing data frequency (e.g., from daily to monthly) using .resample().
  • Time-based Indexing: Setting time-based indexes and using methods like .shift() and .rolling() for time series data.
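
A brief time-series sketch on made-up daily sales; frequency aliases such as "MS" (month start) can vary slightly between pandas versions.

  import pandas as pd

  # Hypothetical daily sales; dates parsed from strings with pd.to_datetime()
  dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03",
                          "2024-02-01", "2024-02-02"])
  sales = pd.Series([10, 12, 9, 20, 18], index=dates, name="units")

  monthly = sales.resample("MS").sum()       # daily -> monthly totals
  change = sales - sales.shift(1)            # period-over-period change
  rolling = sales.rolling(window=3).mean()   # 3-point moving average
  print(monthly, change, rolling, sep="\n\n")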

10. Data Visualization

  • Basic Plotting: Using .plot() for quick visualizations, often integrated with matplotlib and seaborn for more detailed graphs.
  • Histograms, Boxplots, and More: Creating various types of plots (e.g., .hist(), .boxplot()).

11. Performance Optimization

  • Vectorization: Avoiding for-loops by using vectorized operations in pandas for faster performance.
  • Memory Management: Optimizing memory usage using appropriate data types (e.g., category type for categorical data) and .astype().
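
A rough sketch of both ideas on synthetic data; the exact savings will vary, but converting a repeated-string column to category typically shrinks it substantially.

  import numpy as np
  import pandas as pd

  n = 1_000_000
  df = pd.DataFrame({
      "region": np.random.choice(["east", "west", "north", "south"], size=n),
      "amount": np.random.rand(n) * 100,
  })

  # Vectorized: one array operation instead of a Python-level loop
  df["amount_with_tax"] = df["amount"] * 1.08

  # Memory: store the repeated string column as 'category'
  before = df["region"].memory_usage(deep=True)
  df["region"] = df["region"].astype("category")
  after = df["region"].memory_usage(deep=True)
  print(f"region column: {before:,} bytes -> {after:,} bytes")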

12. Advanced Features

  • Window Functions: Using .rolling() for moving averages and other window-based operations.
  • Pivot and Melt: Reshaping data using .pivot() and .melt() for long-to-wide and wide-to-long format transformations.
  • Crosstab: Creating cross-tabulations using pd.crosstab().
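
A short reshaping sketch on illustrative quarterly data:

  import pandas as pd

  df = pd.DataFrame({
      "region": ["east", "east", "west", "west"],
      "quarter": ["Q1", "Q2", "Q1", "Q2"],
      "amount": [100, 120, 90, 95],
  })

  # Long -> wide: one row per region, one column per quarter
  wide = df.pivot(index="region", columns="quarter", values="amount")

  # Wide -> long: back to one observation per row
  long = wide.reset_index().melt(id_vars="region", value_name="amount")

  # Cross-tabulation: counts of region/quarter combinations
  counts = pd.crosstab(df["region"], df["quarter"])
  print(wide, long, counts, sep="\n\n")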

13. Error Handling

  • Handling Errors: Debugging pandas operations by catching exceptions (e.g., try-except blocks) and handling common errors like KeyErrors or TypeErrors.

14. Integration with Other Tools

  • Working with SQL: Importing data from SQL databases and writing pandas DataFrames back to SQL using pd.read_sql() and DataFrame.to_sql().
  • Machine Learning: Preparing data for machine learning models (e.g., using pandas for feature engineering and data preprocessing before feeding data into scikit-learn).

The SQL Skillset

A strong SQL skillset involves a comprehensive understanding of the language and its application to various database management tasks. Below is a detailed list of key skills that are important for mastering SQL:

1. Basic SQL Operations

  • Data Retrieval: Writing simple SELECT statements to query data from one or more tables.
  • Filtering Data: Using the WHERE clause to filter rows based on conditions.
  • Sorting Data: Sorting results with ORDER BY (ascending and descending).
  • Limiting Results: Using LIMIT (or TOP in some DBMS) to control the number of returned rows.
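
These basics can be sketched with Python's sqlite3 module and a hypothetical employees table:

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
      CREATE TABLE employees (name TEXT, department TEXT, salary REAL);
      INSERT INTO employees VALUES
          ('ana', 'finance', 72000), ('ben', 'it', 65000), ('caro', 'it', 81000);
  """)

  # Retrieval, filtering, sorting, and limiting in a single query
  rows = conn.execute("""
      SELECT name, salary
      FROM employees
      WHERE department = 'it'
      ORDER BY salary DESC
      LIMIT 5
  """).fetchall()
  print(rows)
  conn.close()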

2. Joins and Relationships

  • Inner Join: Using INNER JOIN to retrieve data from two or more tables based on matching keys.
  • Outer Joins: Understanding and using LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN to fetch non-matching rows as well.
  • Cross Join: Using CROSS JOIN to create the Cartesian product of two tables.
  • Self Join: Joining a table with itself to compare rows within the same table.
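
A minimal join sketch, again using sqlite3 and made-up customers/orders tables; only INNER and LEFT joins are shown, since RIGHT and FULL OUTER JOIN require a fairly recent SQLite release.

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
      CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
      CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
      INSERT INTO customers VALUES (1, 'ana'), (2, 'ben'), (3, 'caro');
      INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 60.0), (12, 3, 80.5);
  """)

  # INNER JOIN: only customers who have orders
  inner = conn.execute("""
      SELECT c.name, o.amount
      FROM customers c
      INNER JOIN orders o ON o.cust_id = c.cust_id
  """).fetchall()

  # LEFT JOIN: every customer, with NULL amounts for those without orders
  left = conn.execute("""
      SELECT c.name, o.amount
      FROM customers c
      LEFT JOIN orders o ON o.cust_id = c.cust_id
  """).fetchall()
  print(inner, left, sep="\n")
  conn.close()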

3. Grouping and Aggregation

  • Group By: Using GROUP BY to group rows and perform aggregate functions on them (e.g., SUM(), AVG(), COUNT()).
  • Having: Using HAVING to filter groups after applying aggregate functions.
  • Aggregate Functions: Using built-in functions like SUM(), AVG(), MIN(), MAX(), COUNT() to summarize data.
  • Distinct: Using DISTINCT to remove duplicates from the results.
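
A grouping and aggregation sketch over an illustrative sales table:

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
      CREATE TABLE sales (region TEXT, amount REAL);
      INSERT INTO sales VALUES
          ('east', 120.0), ('east', 60.0), ('west', 80.5), ('west', 45.0);
  """)

  # GROUP BY with aggregates, then HAVING to keep only large regions
  rows = conn.execute("""
      SELECT region,
             COUNT(*)    AS orders,
             SUM(amount) AS total,
             AVG(amount) AS average
      FROM sales
      GROUP BY region
      HAVING SUM(amount) > 150
  """).fetchall()

  # DISTINCT: unique region names
  regions = conn.execute("SELECT DISTINCT region FROM sales").fetchall()
  print(rows, regions, sep="\n")
  conn.close()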

4. Data Manipulation

  • Inserting Data: Using INSERT INTO to add new records into a table.
  • Updating Data: Using UPDATE to modify existing records based on specific conditions.
  • Deleting Data: Using DELETE to remove rows from a table.
  • Bulk Operations: Inserting or updating multiple rows at once using INSERT INTO with multiple values or UPDATE with CASE statements.

5. Subqueries

  • Simple Subqueries: Writing subqueries within SELECT, FROM, and WHERE clauses.
  • Correlated Subqueries: Using subqueries that reference columns from the outer query.
  • Exists and In: Using EXISTS and IN to check for the presence of records in subqueries.
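
A sketch of a simple IN subquery and a correlated EXISTS subquery, with made-up customers and orders tables:

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
      CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
      CREATE TABLE orders (cust_id INTEGER, amount REAL);
      INSERT INTO customers VALUES (1, 'ana'), (2, 'ben'), (3, 'caro');
      INSERT INTO orders VALUES (1, 120.0), (1, 60.0), (3, 80.5);
  """)

  # IN with a simple subquery: customers who placed at least one order
  with_orders = conn.execute("""
      SELECT name FROM customers
      WHERE cust_id IN (SELECT cust_id FROM orders)
  """).fetchall()

  # Correlated subquery with EXISTS: customers with an order over 100
  big_spenders = conn.execute("""
      SELECT name FROM customers c
      WHERE EXISTS (SELECT 1 FROM orders o
                    WHERE o.cust_id = c.cust_id AND o.amount > 100)
  """).fetchall()
  print(with_orders, big_spenders, sep="\n")
  conn.close()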

6. Data Types and Constraints

  • Data Types: Understanding and working with different SQL data types such as INT, VARCHAR, DATE, FLOAT, BOOLEAN, and custom types.
  • Constraints: Using PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, and NOT NULL constraints to enforce data integrity.
  • Default Values: Assigning default values to columns when data is not provided.

7. Normalization and Data Modeling

  • Normalization: Understanding and applying normalization principles (1NF, 2NF, 3NF, etc.) to design efficient and non-redundant database schemas.
  • Foreign Keys: Defining relationships between tables using foreign keys to ensure referential integrity.
  • Indexing: Creating indexes (CREATE INDEX) on columns to speed up data retrieval, and understanding their impact on performance.
  • Views: Creating and using VIEWs to simplify complex queries and abstract underlying table structures.

8. Transactions and Concurrency

  • Transaction Control: Using BEGIN TRANSACTION, COMMIT, ROLLBACK, and SAVEPOINT to manage transactions and ensure data integrity.
  • ACID Properties: Understanding the concepts of Atomicity, Consistency, Isolation, and Durability in transactions.
  • Locking and Isolation Levels: Managing database concurrency and isolation levels to control simultaneous access (e.g., READ COMMITTED, SERIALIZABLE).
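
A transfer between two illustrative accounts shows the commit/rollback pattern; with Python's sqlite3 module, commit() and rollback() issue the underlying COMMIT and ROLLBACK.

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE accounts (name TEXT, balance REAL)")
  conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                   [("ana", 500.0), ("ben", 200.0)])
  conn.commit()

  try:
      # Both updates succeed or neither does (atomicity)
      conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'ana'")
      conn.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'ben'")
      conn.commit()      # COMMIT: make the transfer permanent
  except sqlite3.Error:
      conn.rollback()    # ROLLBACK: undo the partial transfer on any failure

  print(conn.execute("SELECT * FROM accounts").fetchall())
  conn.close()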

9. Stored Procedures, Functions, and Triggers

  • Stored Procedures: Writing reusable stored procedures to execute a sequence of SQL queries.
  • User-Defined Functions: Creating functions to encapsulate logic and return a value.
  • Triggers: Setting up triggers that automatically run actions when certain events (e.g., INSERT, UPDATE, DELETE) occur in the database.

10. Performance Tuning

  • Query Optimization: Writing efficient queries by avoiding unnecessary columns, using proper joins, and understanding query execution plans.
  • Indexes: Creating and managing indexes to speed up query execution for frequently accessed columns.
  • Explain Plan: Analyzing the execution plan (EXPLAIN) to understand query performance and identify bottlenecks.
  • Partitioning: Using table partitioning to divide large tables into smaller, manageable pieces for performance improvement.

11. Security and Permissions

  • Access Control: Using GRANT and REVOKE to manage user privileges and control who can perform operations on the database.
  • Roles and Users: Creating and managing roles and users with different levels of access (e.g., read-only or admin).
  • Data Encryption: Understanding and implementing encryption for sensitive data, either at rest or during transmission.

12. Backup and Recovery

  • Backup Strategies: Implementing regular backup strategies using BACKUP and restoring data from backups using RESTORE.
  • Point-in-Time Recovery: Using transaction logs to recover data up to a specific point in time.

13. Advanced SQL Features

  • Window Functions: Using window functions like ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE() for advanced analytics.
  • Recursive Queries: Writing recursive queries using WITH and Common Table Expressions (CTEs) for hierarchical data (e.g., organizational charts or bill-of-materials).
  • Full-Text Search: Using full-text search capabilities to search large text-based data fields for keywords or phrases.
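
A sketch of a window function and a recursive CTE; both need a reasonably recent SQLite (3.25+ for window functions), and the sales data is made up.

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
      CREATE TABLE sales (region TEXT, amount REAL);
      INSERT INTO sales VALUES
          ('east', 120.0), ('east', 60.0), ('west', 80.5), ('west', 45.0);
  """)

  # Window function: rank rows by amount within each region
  ranked = conn.execute("""
      SELECT region, amount,
             RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
      FROM sales
  """).fetchall()

  # Recursive CTE: generate the numbers 1 through 5
  numbers = conn.execute("""
      WITH RECURSIVE counter(n) AS (
          SELECT 1
          UNION ALL
          SELECT n + 1 FROM counter WHERE n < 5
      )
      SELECT n FROM counter
  """).fetchall()
  print(ranked, numbers, sep="\n")
  conn.close()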

14. SQL for Data Integration

  • ETL Processes: Using SQL to integrate, transform, and load data from different sources into a data warehouse or operational database.
  • Data Migration: Moving data between databases or systems using INSERT INTO, SELECT INTO, or custom ETL scripts.

Overlapping skills

1. Data Selection and Filtering

  • pandas: Use .loc[], .iloc[], and boolean indexing to filter and select specific rows or columns from a DataFrame.
  • SQL: Use SELECT statements with the WHERE clause to filter records based on specific conditions.

2. Grouping and Aggregation

  • pandas: Use .groupby() to group data by certain columns and apply aggregation functions like sum(), mean(), count(), etc.
  • SQL: Use GROUP BY to group data by columns and apply aggregate functions like SUM(), AVG(), COUNT(), etc.

3. Sorting and Ordering

  • pandas: Use .sort_values() to sort data by one or more columns.
  • SQL: Use ORDER BY to sort query results by one or more columns.

4. Joining/Merging Data

  • pandas: Use .merge() to join two DataFrames based on common columns (similar to SQL joins).
  • SQL: Use INNER JOIN, LEFT JOIN, RIGHT JOIN, or FULL OUTER JOIN to combine data from two or more tables based on common columns.

5. Handling Missing Data

  • pandas: Use .isnull(), .dropna(), and .fillna() to detect and handle missing values in data.
  • SQL: Use IS NULL or IS NOT NULL to filter or check for missing values (NULLs) in a database.

6. Data Transformation

  • pandas: Use .apply(), .map(), and .applymap() for transforming data in columns or rows.
  • SQL: Use SQL functions like UPPER(), LOWER(), CONCAT(), and CAST() to transform data while querying.

7. Column Operations and Calculations

  • pandas: Perform column-wise calculations directly on DataFrames (e.g., df['new_column'] = df['col1'] + df['col2']).
  • SQL: Use arithmetic operations and expressions in SELECT statements to calculate values based on columns (e.g., SELECT col1 + col2 AS new_column FROM table).

8. Renaming Columns

  • pandas: Use .rename() to rename columns in a DataFrame.
  • SQL: Use AS to create aliases for columns in a query result (e.g., SELECT col1 AS new_col FROM table).

9. Filtering with Conditions

  • pandas: Use boolean indexing or .query() to filter rows based on conditions (e.g., df[df['age'] > 30]).
  • SQL: Use WHERE with conditional expressions (e.g., SELECT * FROM table WHERE age > 30).

10. Combining Multiple Datasets

  • pandas: Use .concat() to concatenate multiple DataFrames along rows or columns.
  • SQL: Use UNION or UNION ALL to combine rows from multiple SELECT statements.

11. Aggregation with Grouping

  • pandas: Use .groupby() with aggregation methods (sum(), mean(), count()) to summarize data.
  • SQL: Use GROUP BY with aggregate functions (SUM(), AVG(), COUNT()) to summarize grouped data.

12. Filtering Unique Values

  • pandas: Use .drop_duplicates() to remove duplicate rows from a DataFrame.
  • SQL: Use DISTINCT to return unique rows from a SELECT query.

13. Handling String Data

  • pandas: Use string methods (e.g., .str.contains(), .str.split(), .str.lower()) to manipulate text data in DataFrame columns.
  • SQL: Use string functions (e.g., CONCAT(), SUBSTRING(), LIKE, UPPER(), LOWER()) for text data manipulation.

14. Data Export and Import

  • pandas: Use .to_csv(), .to_sql(), .to_excel(), etc., for exporting data to different formats.
  • SQL: Use INSERT INTO, SELECT INTO or COPY to import/export data between databases and external files.

15. Indexing

  • pandas: Use .set_index() to set a DataFrame’s index for faster label-based lookups and clearer organization.
  • SQL: Create and manage indexes on database columns to optimize query performance (CREATE INDEX).
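
To illustrate the overlap, here is the same filter-and-aggregate step expressed both ways, on a tiny made-up dataset:

  import sqlite3
  import pandas as pd

  data = {"region": ["east", "east", "west"], "amount": [120.0, 60.0, 80.5]}

  # pandas: boolean filter, then groupby aggregation
  df = pd.DataFrame(data)
  pd_result = df[df["amount"] > 50].groupby("region", as_index=False)["amount"].sum()

  # SQL: the same logic as a declarative query
  conn = sqlite3.connect(":memory:")
  df.to_sql("sales", conn, index=False)
  sql_result = pd.read_sql(
      "SELECT region, SUM(amount) AS amount FROM sales WHERE amount > 50 GROUP BY region",
      conn,
  )
  conn.close()
  print(pd_result, sql_result, sep="\n")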

Job Roles, Responsibilities and Salaries

Pandas

1. Data Analyst

Responsibilities:

  • Collecting, processing, and cleaning large datasets.
  • Using pandas to perform data analysis, including data manipulation, merging, and summarizing.
  • Creating reports and visualizations to communicate insights using tools like Matplotlib or Seaborn.

Salaries:

  • Entry-Level: $50,000 – $70,000 per year.
  • Mid-Level: $70,000 – $90,000 per year.
  • Senior-Level: $90,000 – $110,000+ per year.

2. Data Scientist

Responsibilities:

  • Developing and deploying predictive models and using machine learning frameworks.
  • Data wrangling and feature engineering using pandas to prepare data for analysis.
  • Collaborating with stakeholders to design data-driven solutions.

Salaries:

  • Entry-Level: $80,000 – $100,000 per year.
  • Mid-Level: $100,000 – $130,000 per year.
  • Senior-Level: $130,000 – $160,000+ per year.

3. Machine Learning Engineer

Responsibilities:

  • Preparing large datasets for model training and validation using pandas.
  • Implementing machine learning algorithms and optimization routines.
  • Managing data pipelines and integrating data workflows with scalable solutions.

Salaries:

  • Entry-Level: $90,000 – $110,000 per year.
  • Mid-Level: $110,000 – $140,000 per year.
  • Senior-Level: $140,000 – $180,000+ per year.

4. Business Intelligence (BI) Developer

Responsibilities:

  • Using pandas to preprocess data and feed it into dashboards or BI tools.
  • Supporting data integration tasks and building ETL pipelines.
  • Developing scripts to extract and clean data before presenting it to decision-makers.

Salaries:

  • Entry-Level: $65,000 – $85,000 per year.
  • Mid-Level: $85,000 – $105,000 per year.
  • Senior-Level: $105,000 – $130,000+ per year.

5. Data Engineer

Responsibilities:

  • Building data pipelines and ensuring data consistency and quality using pandas and other tools.
  • Designing and optimizing databases for data storage and retrieval.
  • Collaborating with Data Scientists to provide them with clean, structured data.

Salaries:

  • Entry-Level: $80,000 – $100,000 per year.
  • Mid-Level: $100,000 – $130,000 per year.
  • Senior-Level: $130,000 – $160,000+ per year.

6. Financial Analyst / Quantitative Analyst

Responsibilities:

  • Using pandas to process financial data, perform quantitative analyses, and generate financial reports.
  • Automating data processing workflows and performing statistical computations.
  • Creating models to forecast market trends and assess risks.

Salaries:

  • Entry-Level: $60,000 – $80,000 per year.
  • Mid-Level: $80,000 – $110,000 per year.
  • Senior-Level: $110,000 – $140,000+ per year.

Job Roles, Responsibilities and Salaries

SQL

1. Database Administrator (DBA)

Responsibilities:

  • Managing and maintaining database systems for availability, performance, and security.
  • Implementing backup and recovery strategies.
  • Monitoring database performance and tuning SQL queries for efficiency.
  • Managing user access and permissions.

Salaries:

  • Entry-Level: $70,000 – $90,000 per year.
  • Mid-Level: $90,000 – $110,000 per year.
  • Senior-Level: $110,000 – $140,000+ per year.

2. Data Analyst

Responsibilities:

  • Writing complex SQL queries to extract, manipulate, and analyze data.
  • Creating reports and dashboards to support business decision-making.
  • Collaborating with teams to understand data needs and provide insights.

Salaries:

  • Entry-Level: $50,000 – $70,000 per year.
  • Mid-Level: $70,000 – $90,000 per year.
  • Senior-Level: $90,000 – $110,000+ per year.

3. Business Intelligence (BI) Developer

Responsibilities:

  • Using SQL to build and maintain data models, data warehouses, and OLAP cubes.
  • Developing ETL (Extract, Transform, Load) processes to integrate data from various sources.
  • Designing and generating dashboards and reports using BI tools (e.g., Power BI, Tableau).

Salaries:

  • Entry-Level: $65,000 – $85,000 per year.
  • Mid-Level: $85,000 – $110,000 per year.
  • Senior-Level: $110,000 – $140,000+ per year.

4. SQL Developer

Responsibilities:

  • Writing, optimizing, and maintaining complex SQL queries and stored procedures.
  • Designing and developing database schemas and structures.
  • Collaborating with front-end developers and data analysts for data access needs.
  • Ensuring database code follows best practices and security guidelines.

Salaries:

  • Entry-Level: $70,000 – $90,000 per year.
  • Mid-Level: $90,000 – $110,000 per year.
  • Senior-Level: $110,000 – $130,000+ per year.

5. Data Engineer

Responsibilities:

  • Designing and developing robust data pipelines to support data flows.
  • Using SQL to perform data cleansing and transformation tasks.
  • Collaborating with data analysts and scientists to supply structured, optimized data.

Salaries:

  • Entry-Level: $80,000 – $100,000 per year.
  • Mid-Level: $100,000 – $130,000 per year.
  • Senior-Level: $130,000 – $160,000+ per year.

6. ETL Developer

Responsibilities:

  • Designing and developing ETL processes to move and transform data between systems.
  • Writing SQL scripts for data extraction and transformation.
  • Ensuring data integrity and quality during data migration and processing.

Salaries:

  • Entry-Level: $70,000 – $90,000 per year.
  • Mid-Level: $90,000 – $110,000 per year.
  • Senior-Level: $110,000 – $130,000+ per year.

7. Application Developer

Responsibilities:

  • Integrating SQL queries within application code to interact with databases.
  • Collaborating with DBAs to ensure efficient data retrieval and storage.
  • Developing and maintaining database-driven applications using languages like C#, Java, or Python.

Salaries:

  • Entry-Level: $70,000 – $90,000 per year.
  • Mid-Level: $90,000 – $110,000 per year.
  • Senior-Level: $110,000 – $140,000+ per year.

8. Data Scientist

Responsibilities:

  • Extracting and preprocessing data using SQL for analysis and modeling.
  • Integrating SQL data extraction into machine learning workflows.
  • Collaborating with data engineers to access and use relevant datasets.

Salaries:

  • Entry-Level: $80,000 – $100,000 per year.
  • Mid-Level: $100,000 – $130,000 per year.
  • Senior-Level: $130,000 – $160,000+ per year.
