The core data structure in NumPy is the ndarray (n-dimensional array). It is a homogeneous data structure, meaning all elements in the array must be of the same data type. For example, an array can contain only integers or only floating-point numbers.
import numpy as np
# Create a 1-D array
one_d_array = np.array([1, 2, 3, 4, 5])
print("1-D Array:", one_d_array)
# Create a 2-D array
two_d_array = np.array([[1, 2, 3], [4, 5, 6]])
print("2-D Array:", two_d_array)
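The homogeneity rule is easiest to see when the inputs are mixed. The following is a minimal sketch, assuming NumPy's default type promotion; the variable names are illustrative and not part of the original example.

```python
import numpy as np

# Mixing integers and a float: NumPy promotes every element to a float type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# A dtype can also be requested explicitly at creation time
as_float = np.array([1, 2, 3], dtype=np.float64)
print(as_float.dtype)  # float64

# astype returns a converted copy; converting to an integer type truncates fractions
as_int = mixed.astype(np.int64)
print(as_int)  # [1 2 3]
```

Because the array holds a single type, a lone float in the input is enough to turn the whole array into floats.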
NumPy arrays have several important attributes, such as shape, dtype, and ndim.
shape: Returns a tuple representing the dimensions of the array.
dtype: Returns the data type of the elements in the array.
ndim: Returns the number of dimensions of the array.
print("Shape of 2-D array:", two_d_array.shape)
print("Data type of 2-D array:", two_d_array.dtype)
print("Number of dimensions of 2-D array:", two_d_array.ndim)
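As a further illustration, the same attributes can be inspected on a 3-D array. This is a small sketch; the array name cube is hypothetical, and the size attribute (total element count) is an addition not covered in the text above.

```python
import numpy as np

# np.zeros creates an array of the given shape; its default dtype is float64
cube = np.zeros((2, 3, 4))

print(cube.shape)  # (2, 3, 4)
print(cube.ndim)   # 3
print(cube.dtype)  # float64
print(cube.size)   # 24

# ndim is always the length of the shape tuple
print(cube.ndim == len(cube.shape))  # True
```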
One of the most common use cases of NumPy in data science is data preprocessing. This includes tasks such as data cleaning, normalization, and reshaping.
# Generate some sample data
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
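# Standardize the data (z-score) using the mean and standard deviation of all elements
normalized_data = (data - np.mean(data)) / np.std(data)
print("Normalized data:", normalized_data)
# Flatten the 3x3 array into a 1-D array of 9 elements
reshaped_data = data.reshape((9,))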
print("Reshaped data:", reshaped_data)
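Data cleaning is mentioned above but not shown. The sketch below assumes missing values are encoded as NaN and fills them with the column mean, which is only one possible strategy. It then standardizes each column separately rather than over the whole array. The sample values and variable names are hypothetical.

```python
import numpy as np

# Hypothetical sample with one missing value encoded as NaN
raw = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, 6.0]])

# Column means that ignore NaN entries
col_means = np.nanmean(raw, axis=0)

# Replace each NaN with the mean of its column (col_means broadcasts across rows)
cleaned = np.where(np.isnan(raw), col_means, raw)

# Standardize each column separately instead of using one global mean and std
standardized = (cleaned - cleaned.mean(axis=0)) / cleaned.std(axis=0)

# Passing -1 lets reshape infer the length of the flattened array
flat = standardized.reshape(-1)
print(flat.shape)  # (6,)
```

Per-column statistics are usually what is wanted when each column is a different feature with its own scale.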
NumPy provides a wide range of mathematical functions for computations on arrays, including element-wise arithmetic, trigonometric functions, and linear algebra operations.
# Perform element-wise addition
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = a + b
print("Element-wise addition:", result)
# Calculate the dot product
dot_product = np.dot(a, b)
print("Dot product:", dot_product)
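The trigonometric and linear algebra functions mentioned above might be used as in the following sketch. The matrix A and the vector rhs are made-up values chosen only so the system has a simple solution.

```python
import numpy as np

# Trigonometric functions are applied element-wise; angles are in radians
angles = np.array([0.0, np.pi / 2, np.pi])
sines = np.sin(angles)  # approximately [0, 1, 0], up to floating-point error

# Solve the linear system A @ x = rhs
A = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([3.0, 5.0])
x = np.linalg.solve(A, rhs)

# Multiplying back should recover the right-hand side
print(np.allclose(A @ x, rhs))  # True
```

For solving a system, np.linalg.solve is generally preferable to computing an explicit inverse and multiplying by it.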
In machine learning, NumPy is used extensively for handling and manipulating data. For example, it is used to represent feature matrices and target vectors.
# Generate a simple feature matrix and target vector
X = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([7, 8, 9])
# Calculate the mean of each feature (column) across all samples
mean_X = np.mean(X, axis=0)
print("Mean of feature matrix:", mean_X)
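To connect the feature matrix and the target vector, one possible next step is to center the features and fit linear weights by least squares. This is a sketch under the assumption of a linear model without an intercept term; it is not a method prescribed by the text above, and the names X_centered, w, and predictions are illustrative.

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([7, 8, 9])

# Center each feature: broadcasting subtracts the per-column mean from every row
X_centered = X - X.mean(axis=0)

# Least-squares fit of weights w so that X @ w approximates y (no intercept term)
w, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

# Predictions of the fitted linear model on the training samples
predictions = X @ w
print(np.allclose(predictions, y))  # True for this particular data
```

For this small data set the fit happens to be exact; with real data, the predictions would generally only approximate the targets.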
NumPy is an essential library in data science due to its versatility and efficiency. It provides a powerful set of tools for handling and manipulating numerical data, making it a cornerstone for many data-related tasks. By understanding the core concepts, typical usage scenarios, common pitfalls, and best practices, data scientists can effectively leverage NumPy in real-world applications.