NumPy in Python
The Beginner's Guide
The foundational library for numerical computing in Python. Learn arrays, operations, and the real-world patterns that power the entire data science ecosystem.
If you are working with data, maths, or machine learning in Python, you will eventually need NumPy, the foundational library for numerical computing. Nearly every data and AI library in Python is built on top of it.
This guide walks you through what NumPy is, why it is so widely used, and how to create arrays, run operations on them, and apply them to real problems. Every example is short and runnable, so the fastest way to learn is to type them out yourself.
What Is NumPy?
NumPy (short for Numerical Python) is a Python library built for working with numbers at scale. At its heart is a single powerful object: the n-dimensional array, which lets you hold and process large amounts of numerical data far faster than ordinary Python.
What NumPy is used for
- Handling large multidimensional arrays and matrices
- Performing high-speed mathematical operations on whole datasets at once
- Serving as the foundation for libraries like Pandas, SciPy, TensorFlow, and scikit-learn
Key features
- Fast array operations written in optimised C under the hood
- Broadcasting, which lets operations combine arrays of different shapes without writing loops
- Rich maths: trigonometry, linear algebra, and statistics built in
- Easy integration with C, C++, and Fortran code
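Broadcasting is easier to see than to describe. The sketch below uses made-up values and hypothetical names (data, scale) to show a one-dimensional array being stretched across every row of a two-dimensional one:

```python
import numpy as np

# Three rows of two measurements each (values are made up for illustration)
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# One scale factor per column; shape (2,) is stretched across all 3 rows
scale = np.array([10.0, 0.5])

scaled = data * scale
print(scaled)
# [[10.  5.]
#  [20. 10.]
#  [30. 15.]]
```

No loop appears anywhere: NumPy lines up the trailing dimensions and repeats the smaller array as needed.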
Installation
Install NumPy with pip from your terminal:
pip install numpy
Then import it at the top of your script. By long-standing convention it is imported as np, and you will see that everywhere:
import numpy as np
Stick with the np alias. Nearly every tutorial, answer, and codebase uses it, so following the convention makes your code instantly familiar to other Python developers.
NumPy Arrays vs Python Lists
You might wonder why you need NumPy arrays when Python already has lists. They look similar at first glance:
list1 = [1, 2, 3]
array1 = np.array([1, 2, 3])
The difference shows up the moment you start doing real work. NumPy arrays are:
- More memory efficient, storing numbers in a compact, fixed type
- Far faster for calculations, because operations run in optimised C, not Python
- Built for maths, supporting matrix operations and broadcasting out of the box
For example, array1 * 2 doubles every element with no loop. This "vectorised" style is both faster and cleaner.
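A quick way to feel the difference is to apply the same operator to both types. This small comparison reuses list1 and array1 from above; doubled_list is a name introduced here for illustration:

```python
import numpy as np

list1 = [1, 2, 3]
array1 = np.array([1, 2, 3])

# The same operator means different things for the two types
print(list1 * 2)    # [1, 2, 3, 1, 2, 3]  (the list is repeated)
print(array1 * 2)   # [2 4 6]             (each element is doubled)

# The plain-Python equivalent needs an explicit loop or comprehension
doubled_list = [value * 2 for value in list1]
print(doubled_list)  # [2, 4, 6]
```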
Creating NumPy Arrays
There are many ways to create an array depending on what you need. These are the ones you will reach for most often:
import numpy as np

# From a Python list
a = np.array([1, 2, 3])

# A 2D array (rows and columns)
b = np.array([[1, 2], [3, 4]])

# Filled with zeros, given a shape
zeros = np.zeros((2, 3))

# Filled with ones
ones = np.ones((3, 3))

# Identity matrix (1s on the diagonal)
identity = np.eye(4)

# A range of values: start, stop, step
range_array = np.arange(0, 10, 2)  # [0 2 4 6 8]

# Random values between 0 and 1
random_array = np.random.rand(2, 3)
np.zeros((2, 3)) takes the shape as a tuple: 2 rows by 3 columns. Getting the shape right is half of working with NumPy.
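A common slip is to pass the dimensions as separate arguments. The following sketch shows what happens in that case; the names grid, evens, and shape_error_raised are hypothetical, chosen only for this example:

```python
import numpy as np

grid = np.zeros((2, 3))        # shape passed as one tuple
print(grid.shape)              # (2, 3)

evens = np.arange(0, 10, 2)    # the stop value 10 is excluded
print(evens.shape)             # (5,)

shape_error_raised = False
try:
    np.zeros(2, 3)             # 3 is read as the dtype argument, not a dimension
except TypeError as error:
    shape_error_raised = True
    print("TypeError:", error)
```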
Array Attributes
Every array carries information about itself. These attributes are the first thing to check when something does not behave as expected:
print(a.shape)  # dimensions, e.g. (3,) or (2, 3)
print(b.ndim)   # number of dimensions
print(a.dtype)  # data type of the elements
print(a.size)   # total number of elements
| Attribute | Tells you |
|---|---|
| .shape | The size along each dimension, as a tuple |
| .ndim | How many dimensions the array has |
| .dtype | The type of the elements, such as int64 or float64 |
| .size | The total count of elements |
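To see all four attributes on one array, here is a small illustration with a hypothetical 2-by-3 array of floats named grid:

```python
import numpy as np

grid = np.array([[1.5, 2.0, 3.5],
                 [4.0, 5.5, 6.0]])

print(grid.shape)  # (2, 3)
print(grid.ndim)   # 2
print(grid.dtype)  # float64
print(grid.size)   # 6

# A common sanity check: size equals the product of the shape entries
assert grid.size == grid.shape[0] * grid.shape[1]
```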
Array Operations
Element-wise arithmetic
Arithmetic operators work on the whole array at once, matching up elements by position. No loops required:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(x + y)   # [5 7 9]
print(x * y)   # [ 4 10 18]
print(x ** 2)  # [1 4 9]
Matrix multiplication
Element-wise * is not the same as a true matrix product. For that, use np.dot (or the @ operator):
a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 0], [1, 3]])

print(np.dot(a, b))  # the matrix product
print(a @ b)         # same thing, shorter syntax
a * b multiplies matching elements. a @ b (or np.dot) does real matrix multiplication. They give different results, so be sure you know which one you want.
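Putting the two products side by side makes the difference concrete. This sketch reuses the a and b matrices from the example above; elementwise and matrix_product are names added here for illustration:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 0], [1, 3]])

elementwise = a * b      # multiplies matching positions
matrix_product = a @ b   # rows of a combined with columns of b

print(elementwise)
# [[ 2  0]
#  [ 3 12]]

print(matrix_product)
# [[ 4  6]
#  [10 12]]
```

The top-left entry shows the contrast: element-wise gives 1 * 2 = 2, while the matrix product gives 1 * 2 + 2 * 1 = 4.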
Indexing and Slicing
Accessing elements works like Python lists, but extends naturally into multiple dimensions.
a = np.array([10, 20, 30, 40])

print(a[0])    # 10       first element
print(a[1:3])  # [20 30]  a slice
b = np.array([[1, 2], [3, 4], [5, 6]])

print(b[1, 0])  # 3        row 1, column 0 (b[1][0] also works)
print(b[:, 1])  # [2 4 6]  every row, column 1
In b[:, 1], the colon says "all rows" and the 1 says "column 1". This row-and-column selection is one of the most useful things NumPy offers over plain lists.
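The same row-and-column notation covers a few more everyday cases. A short sketch using the b array from above; first_two_rows is a name introduced here:

```python
import numpy as np

b = np.array([[1, 2], [3, 4], [5, 6]])

print(b[0])        # [1 2]  the whole first row
print(b[-1, -1])   # 6      last row, last column

first_two_rows = b[:2, :]   # rows 0 and 1, every column
print(first_two_rows)
# [[1 2]
#  [3 4]]
```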
Reshaping and Iterating
You can loop over an array, but more often you will want to change its shape without changing its data.
a = np.array([[1, 2, 3], [4, 5, 6]])

# Iterating goes row by row
for row in a:
    print(row)

# Reshape to 3 rows, 2 columns (same 6 values)
reshaped = a.reshape(3, 2)

# Flatten back down to a single 1D array
flat = a.flatten()
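It helps to print the results and confirm what each call produces. The sketch below repeats the reshape and flatten calls and adds one extra, reshape(-1, 2), which is not in the example above but is a standard NumPy shorthand:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

reshaped = a.reshape(3, 2)
print(reshaped)
# [[1 2]
#  [3 4]
#  [5 6]]

# -1 asks NumPy to work out that dimension from the total size
inferred = a.reshape(-1, 2)
print(inferred.shape)   # (3, 2)

flat = a.flatten()
print(flat)             # [1 2 3 4 5 6]

# reshape and flatten do not modify a in place
print(a.shape)          # (2, 3)
```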
Useful Functions and Statistics
NumPy ships with fast functions for the calculations you run constantly. They operate on the whole array at once:
a = np.array([1, 2, 3, 4, 5])

print(np.sum(a))             # 15
print(np.mean(a))            # 3.0
print(np.std(a))             # standard deviation
print(np.min(a), np.max(a))  # 1 5
| Function | Returns |
|---|---|
| np.sum | The total of all elements |
| np.mean | The average |
| np.std | The standard deviation (spread of the data) |
| np.min / np.max | The smallest and largest values |
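On 2D data you often want one result per row or column instead of a single number. The sketch below goes slightly beyond the examples above by using the axis and ddof parameters, both standard NumPy arguments; the array table is made up for illustration:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(np.sum(table))           # 21
print(np.sum(table, axis=0))   # [5 7 9]  column totals
print(np.mean(table, axis=1))  # [2. 5.]  row averages

a = np.array([1, 2, 3, 4, 5])
population_std = np.std(a)         # divides by n (ddof=0, the default)
sample_std = np.std(a, ddof=1)     # divides by n - 1
print(round(population_std, 4))    # 1.4142
print(round(sample_std, 4))        # 1.5811
```

Note that np.std computes the population standard deviation unless told otherwise, which can differ from what a statistics textbook or another tool reports for a sample.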
Boolean Masking and Filtering
This is one of NumPy's most powerful features. You can filter an array using a condition, and NumPy returns only the elements that match:
a = np.array([1, 2, 3, 4, 5])

# Keep only values greater than 2
print(a[a > 2])  # [3 4 5]
The condition a > 2 produces an array of True and False values, and using it inside the brackets keeps only the elements where the condition is true. This replaces what would be a loop and an if statement in plain Python.
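Printing the intermediate mask shows the mechanism described above. This sketch also combines two conditions with &, which is standard NumPy but not covered in the example; mask and middle are names introduced here:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

mask = a > 2
print(mask)        # [False False  True  True  True]
print(a[mask])     # [3 4 5]

# Combine conditions with & (and) or | (or); each needs its own parentheses
middle = a[(a > 1) & (a < 5)]
print(middle)      # [2 3 4]

# True counts as 1, so summing a mask counts the matches
print(mask.sum())  # 3
```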
Linear Algebra and Random Tools
Random and statistical tools
NumPy's random module generates data for simulations, testing, and machine learning. Set a seed first if you want the same "random" numbers every run:
np.random.seed(42)  # makes results reproducible

# Uniform floats in [0, 1)
print(np.random.rand(3, 2))

# Random integers from 1 to 9
print(np.random.randint(1, 10, size=(2, 3)))

# Samples from a normal distribution
print(np.random.normal(0, 1, size=5))
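The same three draws can also be written with np.random.default_rng, the Generator interface that the NumPy documentation recommends for new code. This is an alternative to the np.random.seed style shown above, not a replacement for it, and the variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)   # a seeded, self-contained generator

uniform = rng.random((3, 2))                  # uniform floats in [0, 1)
integers = rng.integers(1, 10, size=(2, 3))   # integers from 1 to 9
normal = rng.normal(0, 1, size=5)             # normal distribution samples

# A second generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(42)
same_draw = np.array_equal(uniform, rng_again.random((3, 2)))
print(same_draw)   # True
```

Because the generator is an object and not global state, seeding it does not affect random numbers drawn elsewhere in a program.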
Linear algebra
The numpy.linalg module handles the matrix maths behind machine learning and scientific computing:
from numpy.linalg import inv, eig, det

matrix = np.array([[2, 1], [3, 4]])

print(inv(matrix))  # inverse
print(det(matrix))  # determinant
print(eig(matrix))  # eigenvalues and eigenvectors
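One way to check these results is to verify the identities they should satisfy. The sketch below does that for the same matrix and adds np.linalg.solve, which is not used above, on a made-up right-hand side (rhs):

```python
import numpy as np
from numpy.linalg import inv, eig, det, solve

matrix = np.array([[2, 1], [3, 4]])

# A matrix times its inverse gives the identity (up to rounding)
print(np.allclose(matrix @ inv(matrix), np.eye(2)))   # True

# The determinant is 2*4 - 1*3 = 5
print(np.isclose(det(matrix), 5.0))                   # True

# eig returns the eigenvalues and a matrix of eigenvectors
eigenvalues, eigenvectors = eig(matrix)
print(np.allclose(sorted(eigenvalues), [1.0, 5.0]))   # True

# Solve 2x + y = 4 and 3x + 4y = 11 in one call
rhs = np.array([4, 11])
solution = solve(matrix, rhs)
print(np.allclose(solution, [1.0, 2.0]))              # True
```

The comparisons use np.allclose and np.isclose because floating-point results are rarely exactly equal to the values worked out by hand.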
When to Use NumPy
| Task | NumPy? | Why |
|---|---|---|
| Basic maths | Yes | Fast and easy on whole arrays |
| Large datasets | Yes | Efficient memory use |
| Scientific computing | Yes | Linear algebra, statistics, and more |
| Machine learning prep | Yes | Feature vectors and normalisation |
| Working with Pandas | Yes | Interoperable with DataFrames |
Real-world use cases
Data preprocessing
Cleaning and normalising data before analysis or training.
Image processing
Images are just arrays of pixel values, perfect for NumPy.
Numerical simulations
Physics, finance, and modelling that run on heavy maths.
Matrix algebra
Solving systems of equations and applying transformations.
AI and ML pipelines
Feature vectors feeding into models of every kind.
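Two of these use cases fit in a few lines each. The following is only a sketch with made-up data: features stands in for a small dataset with one feature per column, and image for a tiny 2-by-2 greyscale picture.

```python
import numpy as np

# Data preprocessing: standardise each column to mean 0 and std 1
features = np.array([[1.0, 200.0],
                     [2.0, 400.0],
                     [3.0, 600.0]])
column_mean = features.mean(axis=0)
column_std = features.std(axis=0)
standardised = (features - column_mean) / column_std

print(np.allclose(standardised.mean(axis=0), 0.0))  # True
print(np.allclose(standardised.std(axis=0), 1.0))   # True

# Image processing: invert a greyscale image stored as 8-bit pixel values
image = np.array([[0, 64],
                  [128, 255]], dtype=np.uint8)
inverted = 255 - image
print(inverted)
# [[255 191]
#  [127   0]]
```

Both steps rely on the broadcasting and element-wise arithmetic covered earlier, applied to a whole array at once.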
Try It Yourself
Put it all together with this short exercise. Type it out, run it, then tweak the numbers and see what changes:
import numpy as np

# Create an array of 10 random integers from 1 to 100 (inclusive)
arr = np.random.randint(1, 101, size=10)

# Print the mean and standard deviation
print("Mean:", np.mean(arr))
print("Std Dev:", np.std(arr))

# Sort and print the array
print("Sorted:", np.sort(arr))
Final Thoughts
NumPy is essential for anyone working in Python on data analysis, AI, machine learning, or scientific computation. It is not just fast; it also gives you clean, readable code, and it quietly powers most of the data science ecosystem. Master it and everything built on top of it becomes far easier.
