NumPy’s core object is the ndarray (n-dimensional array). An ndarray is a homogeneous multi-dimensional array of fixed-size items. All elements in an ndarray must be of the same data type (e.g., integers, floating-point numbers).
import numpy as np
# Create a 1-D array
arr_1d = np.array([1, 2, 3, 4, 5])
print("1-D Array:", arr_1d)
# Create a 2-D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2-D Array:\n", arr_2d)
In this code, we first import the NumPy library. Then we create a one-dimensional array and a two-dimensional array using the np.array() function.
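To make the homogeneity rule concrete, here is a minimal sketch (the variable name `mixed` is illustrative, not from the original) of how np.array() settles on a single data type when the inputs are mixed, along with the shape and dimension count of the 2-D array created above.

```python
import numpy as np

# Mixing ints and a float: NumPy upcasts every element to one common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Same 2-D array as above: 2 rows, 3 columns, 2 dimensions.
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d.shape, arr_2d.ndim)  # (2, 3) 2
```

Because the whole array shares one dtype, the integers 1 and 3 are stored as floats here rather than keeping their original type.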
Pandas has two primary data structures: Series and DataFrame. A Series is a one-dimensional labeled array capable of holding any data type. A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.
import numpy as np  # needed for np.nan below
import pandas as pd
# Create a Series
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print("Series:\n", s)
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print("DataFrame:\n", df)
Here, we import the Pandas library. We create a Series with some values including a missing value (np.nan). Then we create a DataFrame from a dictionary where the keys become column names and the values become column data.
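The "columns of potentially different types" point can be checked directly. The sketch below rebuilds the same Series and DataFrame and inspects their dtypes; the exact dtype reported for the text column may vary between Pandas versions, so it is printed rather than asserted.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])
# np.nan is a float, so the whole Series is stored as float64.
print(s.dtype)  # float64

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})
# Each column carries its own dtype: text for Name, integer for Age.
print(df.dtypes)
print(df.columns.tolist())  # ['Name', 'Age']
```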
import numpy as np
# Generate two random matrices
A = np.random.rand(3, 3)
B = np.random.rand(3, 3)
# Matrix multiplication
C = np.dot(A, B)
print("Matrix multiplication result:\n", C)
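As a small aside on the example above, this sketch contrasts three operations that are easy to confuse: np.dot, the `@` operator, and the element-wise `*`. The seeded generator is an assumption added here only so the run is repeatable; the original uses np.random.rand.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen for repeatability
A = rng.random((3, 3))
B = rng.random((3, 3))

C_dot = np.dot(A, B)  # matrix product, as in the example above
C_at = A @ B          # the @ operator computes the same matrix product
C_elem = A * B        # element-wise product, NOT matrix multiplication

print(np.allclose(C_dot, C_at))  # True
print(C_elem.shape)              # (3, 3)
```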
Pandas is well suited to data cleaning, for example filling missing values in a DataFrame.
import pandas as pd
import numpy as np
data = {'col1': [1, np.nan, 3], 'col2': [4, 5, np.nan]}
df = pd.DataFrame(data)
df_filled = df.fillna(0)
print("DataFrame after filling missing values:\n", df_filled)
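Filling with 0 is only one option. The sketch below, using the same small DataFrame, shows two common alternatives: dropping incomplete rows and filling each column with its own mean. Which one is appropriate depends on the data, so treat this as an illustration rather than a recommendation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, np.nan, 3], 'col2': [4, 5, np.nan]})

# Count missing values per column before deciding how to handle them.
print(df.isna().sum())

# Option 1: drop every row that contains at least one missing value.
df_dropped = df.dropna()
print(df_dropped)

# Option 2: fill each column's gaps with that column's mean
# (col1 mean is 2.0, col2 mean is 4.5).
df_mean = df.fillna(df.mean())
print(df_mean)
```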
Pandas can also group and aggregate data, and plot the result, using DataFrame methods.
import pandas as pd
import matplotlib.pyplot as plt
data = {'Name': ['Alice', 'Bob', 'Charlie', 'Alice'],
'Score': [85, 90, 78, 92]}
df = pd.DataFrame(data)
grouped = df.groupby('Name')['Score'].mean()
grouped.plot(kind='bar')
plt.show()
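The grouping step can compute several statistics at once. This sketch reuses the same data and applies agg() with a list of aggregation names; it skips the plot so it runs without a display.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Alice'],
                   'Score': [85, 90, 78, 92]})

# One row per name, one column per aggregation.
summary = df.groupby('Name')['Score'].agg(['mean', 'max', 'count'])
print(summary)

# Alice appears twice, so her mean is (85 + 92) / 2.
print(summary.loc['Alice', 'mean'])  # 88.5
```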
import numpy as np
arr = np.array([1, 2, 3])
try:
    arr[0] = 'a'
except ValueError as e:
    print("Error:", e)
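A quieter variant of the same pitfall is worth knowing: assigning a float into an integer array raises no error at all. The sketch below shows the value being truncated, plus one possible workaround (converting to a float array first).

```python
import numpy as np

arr = np.array([1, 2, 3])  # integer dtype inferred from the inputs
arr[0] = 2.7               # no exception: the fractional part is discarded
print(arr)                 # [2 2 3]

# Converting to a float array first keeps the assigned value intact.
arr_f = arr.astype(float)
arr_f[0] = 2.7
print(arr_f[0])            # 2.7
```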
Take care when performing operations on DataFrames with different indexes.
import numpy as np
# Using vectorization
arr = np.array([1, 2, 3])
result = arr * 2
print("Vectorized operation result:", result)
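For comparison, here is a sketch of the explicit loop that the vectorized expression replaces. The array size is an arbitrary choice for illustration; both versions produce the same result, but the vectorized one avoids per-element Python overhead.

```python
import numpy as np

arr = np.arange(1_000)

# Loop version: one Python-level multiplication per element.
looped = np.empty_like(arr)
for i in range(arr.size):
    looped[i] = arr[i] * 2

# Vectorized version: a single expression over the whole array.
vectorized = arr * 2

print(np.array_equal(looped, vectorized))  # True
```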
Pandas offers label-based indexing (loc) and integer-based indexing (iloc).
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print("Using loc to access data:\n", df.loc[1, 'Name'])
print("Using iloc to access data:\n", df.iloc[1, 0])
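With the default index, labels and positions coincide, which hides the difference between the two accessors. The sketch below uses a hypothetical non-default index (10, 20, 30) so that loc and iloc take visibly different arguments, and also shows that label slices include the end label while positional slices exclude the end position.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical labels, chosen for illustration
)

print(df.loc[20, 'Name'])  # row with label 20 -> Bob
print(df.iloc[1, 0])       # row at position 1, column at position 0 -> Bob

# Slicing: loc includes the end label, iloc excludes the end position.
by_label = df.loc[10:20]
by_pos = df.iloc[0:1]
print(len(by_label), len(by_pos))  # 2 1
```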
In summary, NumPy and Pandas are both essential libraries in the Python data science ecosystem, but they serve different purposes. NumPy is ideal for numerical computations and memory-efficient storage of homogeneous data. Pandas, on the other hand, shines in data cleaning, preprocessing, and analysis of heterogeneous data. By understanding their core concepts, typical usage scenarios, common pitfalls, and best practices, you can choose the right library for your specific data analysis tasks and make your code more efficient and effective.