At the core of NumPy is the ndarray (n-dimensional array) object. An ndarray is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. The number of dimensions is the rank of the array (available as the ndim attribute), and the shape of an array is a tuple of integers giving the size of the array along each dimension.
import numpy as np
# Create a 1-D array
arr_1d = np.array([1, 2, 3, 4, 5])
print("1-D Array:", arr_1d)
# Create a 2-D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2-D Array:", arr_2d)
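The tuple indexing and single-type rule described above can be checked directly. The following is a minimal sketch; the names grid and mixed are illustrative and do not come from the original example.

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns
grid = np.array([[1, 2, 3], [4, 5, 6]])

# Index with a tuple of non-negative integers: (row, column)
element = grid[1, 2]
print("Element at row 1, column 2:", element)

# All elements share one type: mixing integers with a float
# makes NumPy store every element as a float
mixed = np.array([1, 2, 3.5])
print("dtype of mixed array:", mixed.dtype)
```

Because every element has the same type, NumPy can store the data in one contiguous block, which is part of what makes array operations fast.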
NumPy arrays have several useful attributes, such as shape, dtype, and ndim.
print("Shape of 2-D Array:", arr_2d.shape)
print("Data type of 2-D Array:", arr_2d.dtype)
print("Number of dimensions of 2-D Array:", arr_2d.ndim)
NumPy provides several functions for creating arrays, such as zeros, ones, and arange.
# Create an array of zeros
zeros_arr = np.zeros((3, 3))
print("Array of zeros:", zeros_arr)
# Create an array of ones
ones_arr = np.ones((2, 4))
print("Array of ones:", ones_arr)
# Create an array using arange
arange_arr = np.arange(0, 10, 2)
print("Array using arange:", arange_arr)
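Two details of these creation functions are easy to miss: zeros and ones produce floating-point arrays unless a dtype is given, and arange excludes its stop value. A short sketch, with illustrative variable names:

```python
import numpy as np

# zeros (like ones) defaults to a floating-point dtype
zeros_default = np.zeros((2, 2))
print("Default dtype:", zeros_default.dtype)

# Pass dtype to request integers instead
zeros_int = np.zeros((2, 2), dtype=int)
print("Integer dtype:", zeros_int.dtype)

# arange(start, stop, step) stops before reaching stop
evens = np.arange(0, 10, 2)
print("Values from arange:", evens.tolist())
```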
You can perform mathematical operations on NumPy arrays, such as addition, subtraction, multiplication, and division. The arithmetic operators work element-wise on arrays of compatible shapes.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Addition
add_result = a + b
print("Addition result:", add_result)
# Multiplication
mul_result = a * b
print("Multiplication result:", mul_result)
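Subtraction and division follow the same element-wise pattern. The sketch below also compares the + operator with an explicit Python loop to show that both give the same result; the names sub_result, div_result, and loop_sum are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Subtraction and division are applied element by element
sub_result = b - a
div_result = b / a  # true division always yields floats
print("Subtraction result:", sub_result)
print("Division result:", div_result)

# The same addition written as an explicit loop, for comparison only
loop_sum = np.empty_like(a)
for i in range(len(a)):
    loop_sum[i] = a[i] + b[i]

same = np.array_equal(loop_sum, a + b)
print("Loop matches a + b:", same)
```

The operator form is shorter and runs in compiled code, so it is generally preferable to the loop.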
While NumPy is great for numerical computations, it lacks some features for handling structured data. Pandas fills this gap by providing data structures like Series and DataFrame that are more suitable for working with tabular data.
A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). It can be thought of as a column in a table.
import pandas as pd
# Create a Series
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print("Series:", s)
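The "labeled" part becomes clearer when the Series is given an explicit index. A minimal sketch, assuming made-up temperature values and day labels:

```python
import pandas as pd

# Hypothetical data: the index labels replace the default 0, 1, 2, ...
temps = pd.Series([21.5, 19.0, 23.2], index=["mon", "tue", "wed"])

# Access by label
by_label = temps["tue"]
print("Value for 'tue':", by_label)

# Access by integer position
by_position = temps.iloc[0]
print("First value:", by_position)
```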
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or a SQL table.
# Create a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print("DataFrame:", df)
You can create a DataFrame from various data sources, such as lists, dictionaries, and NumPy arrays.
# Create a DataFrame from a NumPy array
arr = np.array([[1, 2], [3, 4]])
df_from_np = pd.DataFrame(arr, columns=['col1', 'col2'])
print("DataFrame from NumPy array:", df_from_np)
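Lists work in a similar way. The sketch below builds the same small table from a list of rows and from a list of dictionaries; the sample values are illustrative.

```python
import pandas as pd

# From a list of rows: column names must be supplied
rows = [["Alice", 25], ["Bob", 30]]
df_from_lists = pd.DataFrame(rows, columns=["Name", "Age"])
print(df_from_lists)

# From a list of dictionaries: keys become column names
records = [{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]
df_from_records = pd.DataFrame(records)
print(df_from_records)
```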
Pandas provides powerful methods for selecting and filtering data in a DataFrame.
# Select a column
ages = df['Age']
print("Ages column:", ages)
# Filter rows based on a condition
filtered_df = df[df['Age'] > 28]
print("Filtered DataFrame:", filtered_df)
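Selection can also combine several conditions or go through the loc (label-based) and iloc (position-based) indexers. The following sketch rebuilds the small example table so that it runs on its own; the result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                   "Age": [25, 30, 35]})

# Combine conditions with & (and) or | (or); wrap each one in parentheses
in_range = df[(df["Age"] > 26) & (df["Age"] < 35)]
print(in_range)

# loc: pick rows with a boolean mask and a column by its label
names_over_28 = df.loc[df["Age"] > 28, "Name"]
print(names_over_28)

# iloc: pick by integer position (row 0, column 1)
first_row_age = df.iloc[0, 1]
print(first_row_age)
```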
You can perform various operations on a DataFrame, such as adding columns, deleting columns, and sorting.
# Add a new column
df['Country'] = ['USA', 'Canada', 'UK']
print("DataFrame with new column:", df)
# Sort the DataFrame by age
sorted_df = df.sort_values(by='Age')
print("Sorted DataFrame:", sorted_df)
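Deleting a column and sorting in descending order can be sketched as follows. The table is rebuilt here so the snippet is self-contained, and the result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                   "Age": [25, 30, 35],
                   "Country": ["USA", "Canada", "UK"]})

# drop returns a new DataFrame; the original keeps its columns
without_country = df.drop(columns=["Country"])
print(without_country)

# Sort from oldest to youngest
oldest_first = df.sort_values(by="Age", ascending=False)
print(oldest_first)
```

Both calls return new objects, so the result has to be assigned to a name (or back to df) for the change to persist.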
Prefer vectorized operations to explicit Python loops. For example, instead of using a for loop to add two arrays, use the + operator.
Use the Pandas indexers (loc and iloc) to access and manipulate data in a DataFrame efficiently.
NumPy and Pandas are essential libraries in the Python data science ecosystem. NumPy provides the foundation for numerical computing with its powerful array objects and operations, while Pandas builds on top of NumPy to offer advanced data analysis and manipulation capabilities for structured data. By understanding the fundamental concepts, usage methods, and best practices of both libraries, you can efficiently handle and analyze various types of data in your projects.