numpy.load function, which allows you to quickly and efficiently load data stored in .npy or .npz files. These file formats are designed specifically for storing NumPy arrays: they load quickly, store data compactly, and preserve array metadata such as shape and data type. This blog post covers the fundamental concepts, usage methods, common practices, and best practices of numpy.load.

.npy and .npz File Formats

.npy: A binary file format for storing a single NumPy array. It preserves the array's shape, data type, and other metadata. When you save an array using numpy.save, it is stored in the .npy format.
.npz: A zip archive that can store multiple NumPy arrays, each saved internally as a .npy file. numpy.savez writes the archive uncompressed, while numpy.savez_compressed compresses its contents. Each array in the .npz file is accessed by a unique key.

numpy.load Function

The numpy.load function loads data from .npy or .npz files. Its basic syntax is:
import numpy as np
data = np.load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII')
file: The path to the .npy or .npz file. A file-like object or a pathlib.Path is also accepted.
mmap_mode: Memory-mapping mode ('r+', 'r', 'w+', or 'c'). If specified, the array stays on disk and only the parts you access are read into memory, which is useful for large files.
allow_pickle: Whether to allow loading pickled object arrays. Set this to True only if you trust the source of the file, as unpickling can execute arbitrary code.
fix_imports: If True, numpy.load tries to map old Python 2 names to new Python 3 names when unpickling.
encoding: The encoding used to unpickle Python 2 strings.

Loading a .npy File

import numpy as np
# Create a sample array
arr = np.array([1, 2, 3, 4, 5])
# Save the array to a .npy file
np.save('example.npy', arr)
# Load the array from the .npy file
loaded_arr = np.load('example.npy')
print(loaded_arr)
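To check the claim that the .npy format preserves shape and data type, here is a minimal round-trip sketch. The file name roundtrip.npy and the use of a temporary directory are illustrative choices, not part of the example above.

```python
import os
import tempfile

import numpy as np

# A 2-D float32 array, so both shape and dtype differ from the defaults
original = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip.npy")  # hypothetical file name
    np.save(path, original)
    restored = np.load(path)

print(restored.shape)                      # (2, 3)
print(restored.dtype)                      # float32
print(np.array_equal(original, restored))  # True
```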

Loading a .npz File

import numpy as np
# Create multiple sample arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
# Save the arrays to a .npz file
np.savez('example.npz', array1=arr1, array2=arr2)
# Load the arrays from the .npz file
loaded_data = np.load('example.npz')
# Access individual arrays using their keys
print(loaded_data['array1'])
print(loaded_data['array2'])
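Loading a .npz file returns an archive object that keeps the file open, so it can be worth closing it explicitly. The sketch below assumes the same two arrays as above and uses a with-statement together with the archive's files attribute to list the stored keys; the temporary directory is only there to keep the example self-contained.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.npz")
    np.savez(path, array1=np.array([1, 2, 3]), array2=np.array([4, 5, 6]))

    # The with-statement closes the underlying zip file on exit
    with np.load(path) as npz:
        keys = sorted(npz.files)                  # names of the stored arrays
        arrays = {key: npz[key] for key in keys}  # read each array while the file is open

print(keys)              # ['array1', 'array2']
print(arrays["array2"])  # [4 5 6]
```

Copying the arrays into a plain dict before the block ends means they remain usable after the archive is closed.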

Memory-Mapping a Large File

import numpy as np
# Create a large sample array
large_arr = np.random.rand(1000000)
# Save the array to a .npy file
np.save('large_array.npy', large_arr)
# Load the array using memory-mapping
mmap_arr = np.load('large_array.npy', mmap_mode='r')
# Access a part of the array without loading the whole array into memory
print(mmap_arr[:10])
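As a rough illustration of what mmap_mode='r' returns, the sketch below checks that the result is a numpy.memmap, that it is read-only, and that a slice can be copied into an ordinary in-memory array. The array contents and the temporary directory are assumptions made for the example.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "large_array.npy")
    np.save(path, np.arange(1_000_000, dtype=np.float64))

    mmap_arr = np.load(path, mmap_mode="r")
    is_memmap = isinstance(mmap_arr, np.memmap)
    is_writeable = mmap_arr.flags.writeable
    # np.array copies just this slice into a regular in-memory array
    first_ten = np.array(mmap_arr[:10])
    del mmap_arr  # drop the mapping before the temporary file is removed

print(is_memmap)     # True
print(is_writeable)  # False
```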
When loading files, it's important to handle potential errors. For example, if the file does not exist, numpy.load will raise a FileNotFoundError.
import numpy as np
try:
    data = np.load('nonexistent_file.npy')
except FileNotFoundError:
    print("The file does not exist.")
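A missing file is not the only way loading can fail; a file that exists but is not a valid NumPy file also raises an error. The following is a sketch of one way to handle both cases. The helper name load_or_none is hypothetical, and the set of exceptions caught for unreadable files is an assumption that may need adjusting for your NumPy version.

```python
import os
import tempfile

import numpy as np


def load_or_none(path):
    """Hypothetical helper: return the loaded data, or None if loading fails."""
    try:
        return np.load(path, allow_pickle=False)
    except FileNotFoundError:
        print(f"File not found: {path}")
    except (ValueError, EOFError, OSError) as exc:
        # Assumption: unreadable or non-NumPy files surface as one of these
        print(f"Could not load {path}: {exc}")
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    missing = load_or_none(os.path.join(tmp_dir, "nonexistent_file.npy"))

    # A file with the right extension but the wrong contents
    bad_path = os.path.join(tmp_dir, "not_an_array.npy")
    with open(bad_path, "wb") as fh:
        fh.write(b"this is not a NumPy file")
    invalid = load_or_none(bad_path)
```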
Before loading a file, you can check its extension to determine whether it is a .npy or .npz file.
import numpy as np
import os
file_path = 'example.npy'
file_ext = os.path.splitext(file_path)[1]
if file_ext == '.npy':
    data = np.load(file_path)
    print("Loaded a .npy file.")
elif file_ext == '.npz':
    data = np.load(file_path)
    print("Loaded a .npz file.")
else:
    print("Unsupported file format.")
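An extension can be misleading, so an alternative is to inspect what numpy.load actually returns: a single array for a .npy file, or an archive object for a .npz file. The sketch below takes that approach. The helper load_as_dict and the key name "data" are hypothetical, chosen only for illustration.

```python
import os
import tempfile

import numpy as np


def load_as_dict(path):
    """Hypothetical helper: always return a dict mapping names to arrays."""
    loaded = np.load(path)
    if isinstance(loaded, np.ndarray):
        # A .npy file yields a single array; "data" is an arbitrary key chosen here
        return {"data": loaded}
    # Otherwise np.load returned an archive; copy out its arrays and close it
    with loaded:
        return {key: loaded[key] for key in loaded.files}


with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "example.npy")
    npz_path = os.path.join(tmp_dir, "example.npz")
    np.save(npy_path, np.array([1, 2, 3]))
    np.savez(npz_path, array1=np.array([4, 5, 6]))

    from_npy = load_as_dict(npy_path)
    from_npz = load_as_dict(npz_path)

print(sorted(from_npy))  # ['data']
print(sorted(from_npz))  # ['array1']
```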
As mentioned earlier, the allow_pickle parameter should be used with caution. Only set it to True if you trust the source of the file, as pickled objects can execute arbitrary code.
import numpy as np
# Do not use allow_pickle=True for untrusted files
try:
    data = np.load('untrusted_file.npy', allow_pickle=False)
except ValueError:
    print("The file may contain pickled objects. Use allow_pickle=True with caution.")
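To see this behavior end to end, the sketch below saves an object array (which NumPy can only store by pickling), confirms that loading it with allow_pickle=False is refused, and then loads it with allow_pickle=True. The file name and array contents are made up for the example, and enabling pickle is reasonable here only because the script wrote the file itself.

```python
import os
import tempfile

import numpy as np

# An object array can only be stored by pickling its elements
obj_arr = np.array([{"a": 1}, "text"], dtype=object)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "objects.npy")  # hypothetical file name
    np.save(path, obj_arr)

    try:
        np.load(path, allow_pickle=False)
        refused = False
    except ValueError:
        refused = True

    # Acceptable here only because this script created the file
    trusted = np.load(path, allow_pickle=True)

print(refused)     # True
print(trusted[0])  # {'a': 1}
```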
For large files, use memory-mapping (mmap_mode) to avoid loading the entire file into memory. This can significantly reduce memory usage, especially when working with limited resources.
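One common way to benefit from memory-mapping is to process the array in chunks, so that only a small window is held in memory at a time. The following is a sketch of that idea; the chunk size, file name, and array contents are illustrative assumptions.

```python
import os
import tempfile

import numpy as np

chunk_size = 100_000  # illustrative value; tune to the memory you can spare

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "large_array.npy")
    np.save(path, np.ones(1_000_000))

    mmap_arr = np.load(path, mmap_mode="r")
    total = 0.0
    for start in range(0, mmap_arr.shape[0], chunk_size):
        # Each slice is read from disk on demand rather than all at once
        total += float(mmap_arr[start:start + chunk_size].sum())
    mean = total / mmap_arr.shape[0]
    del mmap_arr  # drop the mapping before the temporary file is removed

print(mean)  # 1.0
```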

Using Meaningful Keys for .npz Files

When saving multiple arrays to a .npz file, use meaningful keys so the arrays are easy to identify and access later.
import numpy as np
# Create sample arrays with different meanings
training_data = np.random.rand(100, 10)
testing_data = np.random.rand(20, 10)
# Save the arrays to a .npz file with meaningful keys
np.savez('data.npz', training=training_data, testing=testing_data)
# Load the arrays and access them using meaningful keys
loaded_data = np.load('data.npz')
print(loaded_data['training'])
print(loaded_data['testing'])
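For contrast, arrays passed to numpy.savez positionally are stored under generic names such as arr_0, which is why explicit keys are easier to work with. A small sketch, assuming the same kind of training and testing arrays as above:

```python
import os
import tempfile

import numpy as np

training_data = np.random.rand(100, 10)
testing_data = np.random.rand(20, 10)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.npz")
    # The first array is positional, the second is given an explicit key
    np.savez(path, training_data, testing=testing_data)

    with np.load(path) as npz:
        keys = sorted(npz.files)

print(keys)  # ['arr_0', 'testing']
```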
The numpy.load function is a powerful tool for loading NumPy arrays stored in .npy and .npz files. By understanding the fundamental concepts, usage methods, common practices, and best practices, you can efficiently load and work with data in your Python projects. Remember to handle errors, consider security implications, and manage memory effectively to make the most of this function.