NumPy Cheatsheet: A Comprehensive Guide
NumPy (Numerical Python) is a fundamental library for scientific computing in Python. It provides a powerful N-dimensional array object along with a large collection of high-level mathematical functions that operate on these arrays. This cheatsheet covers NumPy's key concepts, usage, and best practices to help you get productive with the library quickly.
Fundamental Concepts#
N-Dimensional Arrays#
The core of NumPy is the ndarray (N-dimensional array) object. An ndarray is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.
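The "all of the same type" rule is easy to see in practice. The following is a minimal sketch, using only `np.array` and the `dtype` attribute; the variable names `mixed` and `grid` are illustrative and not part of the original examples.

```python
import numpy as np

# Mixing ints and a float: NumPy picks one common type for every element.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64
print(mixed)         # the integers are stored as floats

# Indexing a 2-D array takes one integer per dimension (a tuple of indices).
grid = np.array([[10, 20], [30, 40]])
print(grid[(1, 0)])  # same element as grid[1, 0]
```

Because every element shares one type, NumPy can store the data in a single compact block of memory, which is a large part of why array operations are fast.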
import numpy as np
# Create a 1-D array
arr_1d = np.array([1, 2, 3, 4, 5])
print("1-D array:", arr_1d)
# Create a 2-D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2-D array:\n", arr_2d)
Shape and Dimensions#
The shape of an array is a tuple that gives the size of each dimension of the array. The ndim attribute gives the number of dimensions of the array.
print("Shape of 1-D array:", arr_1d.shape)
print("Dimensions of 1-D array:", arr_1d.ndim)
print("Shape of 2-D array:", arr_2d.shape)
print("Dimensions of 2-D array:", arr_2d.ndim)
Data Types#
NumPy arrays can hold different data types, such as integers or floating-point numbers. If you do not specify one, NumPy infers it from the input; you can also set it explicitly with the dtype argument when creating an array.
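Alongside the creation-time example in this section, it can help to see how an existing array is converted to another type. This is a small sketch using `astype` and `itemsize`; the array name `prices` is made up for illustration.

```python
import numpy as np

prices = np.array([1.9, 2.5, 3.1])        # inferred as float64
as_int = prices.astype(np.int32)          # new array; fractions are truncated toward zero
print(prices.dtype, as_int.dtype)
print(as_int)                             # [1 2 3]
print(prices.itemsize, as_int.itemsize)   # bytes per element: 8 and 4
```

Note that `astype` returns a new array and leaves `prices` unchanged, and that converting floats to integers discards the fractional part rather than rounding.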
int_arr = np.array([1, 2, 3], dtype=np.int32)
float_arr = np.array([1.1, 2.2, 3.3], dtype=np.float64)
print("Integer array data type:", int_arr.dtype)
print("Floating-point array data type:", float_arr.dtype)
Usage Methods#
Array Creation#
There are multiple ways to create NumPy arrays.
From Python Lists#
list_data = [1, 2, 3]
arr_from_list = np.array(list_data)
print("Array from list:", arr_from_list)
Using Built-in Functions#
# Create an array of zeros
zeros_arr = np.zeros((2, 3))
print("Array of zeros:\n", zeros_arr)
# Create an array of ones
ones_arr = np.ones((3, 2))
print("Array of ones:\n", ones_arr)
# Create an array with a range of values
range_arr = np.arange(0, 10, 2)
print("Array with a range:", range_arr)
Array Indexing and Slicing#
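Indexing and slicing work much as they do for Python lists, but extend to multiple dimensions: you supply one index or slice per axis, separated by commas. Unlike list slices, basic NumPy slices are views onto the original data rather than copies.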
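One difference from Python lists is worth seeing in code before the basic example: a basic slice of a NumPy array shares memory with the original. The sketch below illustrates this with a hypothetical array named `grid`; the names are illustrative only.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

view = grid[0:2, 1:3]          # basic slice: shares memory with grid
view[0, 0] = 99                # also changes grid[0, 1]
print(grid[0, 1])              # 99

independent = grid[0:2, 1:3].copy()   # explicit copy: safe to modify
independent[0, 0] = -1
print(grid[0, 1])              # still 99
```

If you need to modify a slice without touching the source array, calling `.copy()` as above is the usual approach.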
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Access an element
element = arr[1, 2]
print("Element at position (1, 2):", element)
# Slice a sub-array
sub_arr = arr[0:2, 1:3]
print("Sub-array:\n", sub_arr)
Array Operations#
NumPy arrays support a wide range of mathematical operations.
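Addition and multiplication are covered in this section's example; the sketch below adds a few other common operations for reference. The arrays `a` and `b` here are separate illustrative values, chosen so the results are easy to check by hand.

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
b = np.array([2.0, 2.0, 2.0])

print(a - b)        # element-wise subtraction
print(a / b)        # element-wise division
print(a * 10)       # the scalar is applied to every element
print(np.sqrt(a))   # universal function applied element-wise
print(a @ b)        # dot product of two 1-D arrays: 28.0
```

Note that `*` is always element-wise; use `@` (or `np.dot`) when you want a dot product or matrix multiplication.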
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Element-wise addition
add_result = a + b
print("Element-wise addition:", add_result)
# Element-wise multiplication
mul_result = a * b
print("Element-wise multiplication:", mul_result)
Common Practices#
Reshaping Arrays#
You can change the shape of an array without changing its data, as long as the total number of elements stays the same.
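Two details of `reshape` are handy to know in addition to the basic call: `-1` lets NumPy infer one dimension, and the result usually shares memory with the original. This is a minimal sketch with illustrative names (`flat`, `grid`).

```python
import numpy as np

flat = np.arange(12)
grid = flat.reshape(3, -1)     # -1 asks NumPy to infer the missing dimension (4 here)
print(grid.shape)              # (3, 4)

# For a contiguous array like this one, reshape returns a view onto the same data.
print(np.shares_memory(flat, grid))   # True

# The element count must match, otherwise NumPy raises ValueError.
try:
    flat.reshape(5, 5)
except ValueError as err:
    print("Cannot reshape:", err)
```

Because the reshaped array is a view in this case, writing to `grid` would also change `flat`.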
arr = np.arange(9)
reshaped_arr = arr.reshape((3, 3))
print("Reshaped array:\n", reshaped_arr)
Aggregation Functions#
NumPy provides many aggregation functions such as sum, mean, max, and min.
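Aggregations can run over the whole array or along a single axis. The sketch below shows the `axis` argument on a small 2-D array; the name `scores` is illustrative.

```python
import numpy as np

scores = np.array([[1, 2, 3], [4, 5, 6]])

print(scores.sum(axis=0))    # collapse rows, one sum per column: 5, 7, 9
print(scores.sum(axis=1))    # collapse columns, one sum per row: 6, 15
print(scores.min())          # smallest value overall: 1
print(scores.mean(axis=1))   # per-row mean: 2.0, 5.0
```

A useful way to remember it: `axis=0` removes the row dimension from the result, and `axis=1` removes the column dimension.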
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Sum of all elements:", np.sum(arr))
print("Mean of all elements:", np.mean(arr))
print("Maximum value in the array:", np.max(arr))
print("Minimum value in the array:", np.min(arr))
Boolean Indexing#
You can use boolean arrays to index and select elements from an array.
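Boolean masks can also be combined and used for assignment. The following sketch, with illustrative names, shows a compound condition, a masked assignment, and `np.where`.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])

# Combine conditions with & (and) and | (or); each comparison needs parentheses.
mid_range = values[(values > 2) & (values < 6)]
print(mid_range)                 # [3 4 5]

# A boolean mask can also be the target of an assignment.
clipped = values.copy()
clipped[clipped > 4] = 4
print(clipped)                   # [1 2 3 4 4 4]

# np.where picks between two values per element based on the mask.
labels = np.where(values % 2 == 0, "even", "odd")
print(labels)
```

Python's `and` and `or` keywords do not work element-wise on arrays, which is why `&` and `|` are used here.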
arr = np.array([1, 2, 3, 4, 5])
bool_index = arr > 3
print("Boolean index:", bool_index)
print("Elements greater than 3:", arr[bool_index])
Best Practices#
Memory Efficiency#
When working with large datasets, it's important to manage memory efficiently. Avoid creating unnecessary copies of arrays.
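To check whether an operation allocated a new array, `np.shares_memory` is a convenient tool. The sketch below contrasts an expression that creates a new array with a ufunc call that writes into an existing buffer through its `out` argument; the variable names are illustrative.

```python
import numpy as np

data = np.arange(5, dtype=np.float64)
original = data                 # second name for the same array, no copy

doubled = data * 2              # allocates a new array for the result
np.multiply(data, 2, out=data)  # writes the result into data's existing buffer

print(original is data)                 # True: still the same object
print(np.shares_memory(doubled, data))  # False: doubled is a separate allocation
print(data)
```

In-place updates are only safe when no other part of the program still needs the old values, so this is a trade-off rather than a rule.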
# In-place operators update the existing array instead of allocating a new one
arr = np.array([1, 2, 3])
arr += 1  # This modifies the original array in-place
print("Modified array:", arr)
Vectorization#
Vectorization means applying an operation to an entire array at once instead of looping over its elements in Python. Because the loop then runs inside NumPy's compiled code, it is typically much faster than an explicit Python loop.
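When comparing the two approaches, it is worth confirming that they give the same result and timing them with `timeit`, which repeats the call for a steadier measurement. This is a sketch under those assumptions; the helper functions `double_with_loop` and `double_vectorized` are hypothetical names introduced for illustration.

```python
import timeit
import numpy as np

values = np.arange(100_000)

def double_with_loop(arr):
    """Double each element one at a time in a Python loop."""
    result = np.empty_like(arr)
    for i in range(len(arr)):
        result[i] = arr[i] * 2
    return result

def double_vectorized(arr):
    """Double every element with a single array expression."""
    return arr * 2

# Both approaches produce the same result.
assert np.array_equal(double_with_loop(values), double_vectorized(values))

# Timings differ; the exact numbers depend on the machine.
loop_time = timeit.timeit(lambda: double_with_loop(values), number=3)
vec_time = timeit.timeit(lambda: double_vectorized(values), number=3)
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```

No expected timings are given here because they vary widely between systems.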
import time
# Using a loop
arr = np.arange(1000000)
start_time = time.perf_counter()
for i in range(len(arr)):
    arr[i] = arr[i] * 2
end_time = time.perf_counter()
print("Time taken with loop:", end_time - start_time)
# Using vectorization
arr = np.arange(1000000)
start_time = time.perf_counter()
arr = arr * 2
end_time = time.perf_counter()
print("Time taken with vectorization:", end_time - start_time)
Code Readability#
Use meaningful variable names and comments to make your NumPy code more understandable.
# Create an array representing the heights of students
student_heights = np.array([170, 175, 168, 182])
# Calculate the average height
average_height = np.mean(student_heights)
print("Average student height:", average_height)
Conclusion#
NumPy is an indispensable library for scientific computing in Python. With its powerful ndarray object, efficient array operations, and a rich set of functions, it enables users to perform complex numerical tasks with ease. By mastering the fundamental concepts, usage methods, common practices, and best practices outlined in this cheatsheet, you can write efficient and effective NumPy code. Whether you are dealing with data analysis, machine learning, or any other numerical task, NumPy will be a reliable tool in your Python toolkit.