Pandas is an open-source library that is made mainly for working with relational or labeled data both easily and intuitively. It provides various data structures and operations for manipulating numerical data and time series. This library is built on the top of the NumPy library. Pandas is fast and it has high-performance & productivity for users.
Table of Content
- Getting Started
- Why Pandas is used for Data Science
Pandas was initially developed by Wes McKinney in 2008 while he was working at AQR Capital Management. He convinced the AQR to allow him to open source the Pandas. Another AQR employee, Chang She, joined as the second major contributor to the library in 2012. Over the time many versions of pandas have been released. The latest version of the pandas is 1.0.1
- Fast and efficient for manipulating and analyzing data.
- Data from different file objects can be loaded.
- Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
- Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
- Data set merging and joining.
- Flexible reshaping and pivoting of data sets
- Provides time-series functionality.
- Powerful group by functionality for performing split-apply-combine operations on data sets.
After the pandas has been installed into the system, you need to import the library. This module is generally imported as –
import pandas as pd
Here, pd is referred to as an alias to the Pandas. However, it is not necessary to import the library using alias, it just helps in writing less amount of code everytime a method or property is called.