
NumPy in Python
The Beginner's Guide

The foundational library for numerical computing in Python. Learn arrays, operations, and the real-world patterns that power the entire data science ecosystem.

Beginner friendly · 11 sections · ~25 min read

If you work with data, maths, or machine learning in Python, you will eventually need NumPy. Much of Python's data and AI stack, including Pandas, SciPy, and scikit-learn, is built on top of it.

This guide walks you through what NumPy is, why it is so widely used, and how to create arrays, run operations on them, and apply them to real problems. Every example is short and runnable, so the fastest way to learn is to type them out yourself.

What you'll learn

01 What is NumPy?
02 Installation
03 Arrays vs Lists
04 Creating Arrays
05 Array Attributes
06 Array Operations
07 Indexing and Slicing
08 Reshaping and Iterating
09 Useful Functions and Stats
10 Boolean Masking
11 Linear Algebra and Random

What Is NumPy?

NumPy (short for Numerical Python) is a Python library built for working with numbers at scale. At its heart is a single powerful object: the n-dimensional array, which lets you hold and process large amounts of numerical data far faster than ordinary Python.

What NumPy is used for

Handling large multidimensional arrays and matrices
Performing high-speed mathematical operations on whole datasets at once
Serving as the foundation for libraries like Pandas, SciPy, TensorFlow, and scikit-learn

Key features

Fast array operations written in optimised C under the hood
Vectorised operations that act on entire arrays without explicit loops, plus broadcasting, which lets arrays of different but compatible shapes be combined
Rich maths: trigonometry, linear algebra, and statistics built in
Easy integration with C, C++, and Fortran code
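
As a rough illustration of broadcasting from the list above, here is a minimal sketch. The names table and offsets and their values are made up for the example; the point is only that a 1D array can be combined with a 2D array when their shapes are compatible.

```python
import numpy as np

# Hypothetical data: 2 rows of 3 measurements
table = np.array([[1, 2, 3],
                  [4, 5, 6]])        # shape (2, 3)
offsets = np.array([10, 20, 30])     # shape (3,)

# The 1D array is stretched across both rows automatically
shifted = table + offsets
print(shifted)
# [[11 22 33]
#  [14 25 36]]

# A single number broadcasts to every element
scaled = table * 10
print(scaled)
# [[10 20 30]
#  [40 50 60]]
```

If the trailing dimensions did not match (say, offsets had 4 elements), NumPy would raise an error instead of guessing.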

Why it matters

Almost the entire Python data and machine learning ecosystem sits on top of NumPy. Learn it well and Pandas, scikit-learn, and the rest become much easier to pick up.

Installation

Install NumPy with pip from your terminal:

terminal

pip install numpy

Then import it at the top of your script. By long-standing convention it is imported as np, and you will see that everywhere:

import numpy as np

Tip

Stick with the np alias. Nearly every tutorial, answer, and codebase uses it, so following the convention makes your code instantly familiar to other Python developers.

NumPy Arrays vs Python Lists

You might wonder why you need NumPy arrays when Python already has lists. They look similar at first glance:

list1  = [1, 2, 3]
array1 = np.array([1, 2, 3])

The difference shows up the moment you start doing real work. NumPy arrays are:

More memory efficient, storing numbers in a compact, fixed type
Far faster for calculations, because operations run in optimised C, not Python
Built for maths, supporting matrix operations and broadcasting out of the box

The mental shift

With a list you loop over items one at a time. With a NumPy array you operate on the whole thing at once: array1 * 2 doubles every element with no loop. This "vectorised" style is both faster and cleaner.
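
To make the shift concrete, here is a small side-by-side sketch. The prices list is invented sample data; everything else is standard Python and NumPy.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]          # hypothetical sample data

# Plain Python: visit each item in turn
doubled_list = [p * 2 for p in prices]

# NumPy: one expression covers every element
price_array = np.array(prices)
doubled_array = price_array * 2

print(doubled_list)    # [20.0, 40.0, 60.0]
print(doubled_array)   # [20. 40. 60.]

# The same operator means something different for a list: it repeats it
repeated = prices * 2
print(repeated)        # [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
```

That last line is a common surprise: multiplying a list by 2 repeats it, while multiplying an array by 2 doubles each value.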

Creating NumPy Arrays

There are many ways to create an array depending on what you need. These are the ones you will reach for most often:

import numpy as np

# From a Python list
a = np.array([1, 2, 3])

# A 2D array (rows and columns)
b = np.array([[1, 2], [3, 4]])

# Filled with zeros, given a shape
zeros = np.zeros((2, 3))

# Filled with ones
ones = np.ones((3, 3))

# Identity matrix (1s on the diagonal)
identity = np.eye(4)

# A range of values: start, stop, step
range_array = np.arange(0, 10, 2)   # [0 2 4 6 8]

# Random values between 0 and 1
random_array = np.random.rand(2, 3)

Shapes are tuples

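Notice that np.zeros((2, 3)) takes the shape as a tuple: 2 rows by 3 columns. np.random.rand is the exception in the list above: it takes the dimensions as separate arguments, as in rand(2, 3). Getting the shape right is half of working with NumPy.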
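
A quick way to build intuition is to create a few arrays and print their shapes. This is only a sketch; the variable names are arbitrary.

```python
import numpy as np

grid = np.zeros((2, 3))      # 2 rows, 3 columns
print(grid.shape)            # (2, 3)

vector = np.zeros(3)         # a single integer gives a 1D array
print(vector.shape)          # (3,)

cube = np.ones((2, 3, 4))    # tuples extend to any number of dimensions
print(cube.shape)            # (2, 3, 4)
print(cube.size)             # 24
```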

Array Attributes

Every array carries information about itself. These attributes are the first thing to check when something does not behave as expected:

# a and b are the 1D and 2D arrays created in the previous section
print(a.shape)   # (3,)   size along each dimension
print(b.ndim)    # 2      number of dimensions
print(a.dtype)   # int64 on most platforms
print(a.size)    # 3      total number of elements

Attribute	Tells you
`.shape`	The size along each dimension, as a tuple
`.ndim`	How many dimensions the array has
`.dtype`	The type of the elements, such as `int64` or `float64`
`.size`	The total count of elements
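
The dtype is the attribute most likely to surprise you, so here is a short sketch of how NumPy chooses it and how to change it. The array names are illustrative only.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.5, 3.0])
mixed = np.array([1, 2.5, 3])        # integers and a float together

print(ints.dtype.kind)     # i   (an integer type; the exact width depends on the platform)
print(floats.dtype)        # float64
print(mixed.dtype)         # float64   (the integers are upcast to match)

# Convert explicitly with astype; the fractional part is dropped
truncated = floats.astype(int)
print(truncated)           # [1 2 3]
```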

Array Operations

Element-wise arithmetic

Arithmetic operators work on the whole array at once, matching up elements by position. No loops required:

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(x + y)    # [5 7 9]
print(x * y)    # [ 4 10 18]
print(x ** 2)   # [1 4 9]

Matrix multiplication

Element-wise * is not the same as a true matrix product. For that, use np.dot (or the @ operator):

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 0], [1, 3]])

print(np.dot(a, b))   # the matrix product
print(a @ b)          # same thing, shorter syntax

Common mix-up

a * b multiplies matching elements. a @ b (or np.dot) does real matrix multiplication. They give different results, so be sure you know which one you want.
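
Running both on the same pair of matrices shows the difference. This sketch reuses the values of a and b from the example above and works the numbers out explicitly.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 0], [1, 3]])

elementwise = a * b    # multiply values in matching positions
print(elementwise)
# [[ 2  0]
#  [ 3 12]]

product = a @ b        # rows of a combined with columns of b
print(product)
# [[ 4  6]
#  [10 12]]

# Top-left entry of the matrix product, by hand: 1*2 + 2*1 = 4
```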

Indexing and Slicing

Accessing elements works like Python lists, but extends naturally into multiple dimensions.

a = np.array([10, 20, 30, 40])

print(a[0])     # 10   first element
print(a[1:3])   # [20 30]   a slice

b = np.array([[1, 2], [3, 4], [5, 6]])

print(b[1, 0])   # 3    row 1, column 0 (same result as b[1][0])
print(b[:, 1])   # [2 4 6]   every row, column 1

The colon means "everything"

In b[:, 1], the colon says "all rows" and the 1 says "column 1". This row-and-column selection is one of the most useful things NumPy offers over plain lists.
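
A few more indexing patterns are worth trying, along with one behaviour that differs from lists: a NumPy slice is a view onto the original data, not a copy. The sketch below uses the same b as above; first_column and safe are just illustrative names.

```python
import numpy as np

b = np.array([[1, 2], [3, 4], [5, 6]])

print(b[-1])        # [5 6]    negative indices count from the end
print(b[0:2, :])    # first two rows, all columns
# [[1 2]
#  [3 4]]

# Slices are views: writing through one changes the original array
first_column = b[:, 0]
first_column[0] = 99
print(b[0, 0])      # 99

# Call .copy() when an independent array is needed
safe = b[:, 1].copy()
safe[0] = -1
print(b[0, 1])      # 2   (unchanged)
```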

Reshaping and Iterating

You can loop over an array, but more often you will want to change its shape without changing its data.

a = np.array([[1, 2, 3], [4, 5, 6]])

# Iterating goes row by row
for row in a:
    print(row)

# Reshape to 3 rows, 2 columns (same 6 values)
reshaped = a.reshape(3, 2)

# Flatten to a single 1D array (flatten returns a copy)
flat = a.flatten()

Reshape rules

The new shape must hold the same number of elements. A 2 by 3 array (6 values) can become 3 by 2 or 6 by 1, but not 2 by 2. NumPy raises a ValueError if the totals do not match.
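
Here is a small sketch of that rule in action. It also shows -1, which asks NumPy to work out one dimension for you from the total.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# -1 means "whatever size makes the total fit"
print(a.reshape(3, -1).shape)   # (3, 2)
print(a.reshape(-1).shape)      # (6,)

# A shape with the wrong total raises ValueError
try:
    a.reshape(2, 2)
except ValueError as err:
    print("Cannot reshape:", err)
```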

Useful Functions and Statistics

NumPy ships with fast functions for the calculations you run constantly. They operate on the whole array at once:

a = np.array([1, 2, 3, 4, 5])

print(np.sum(a))    # 15
print(np.mean(a))   # 3.0
print(np.std(a))    # 1.4142...  population standard deviation (ddof=0)
print(np.min(a), np.max(a))  # 1 5

Function	Returns
`np.sum`	The total of all elements
`np.mean`	The average
`np.std`	The standard deviation (spread of the data)
`np.min` / `np.max`	The smallest and largest values
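
On a 2D array these functions also accept an axis argument, which is how you get per-column or per-row results. A minimal sketch, using invented scores:

```python
import numpy as np

# Hypothetical scores: 2 students (rows) x 3 tests (columns)
scores = np.array([[70, 80, 90],
                   [50, 60, 70]])

print(np.mean(scores))           # 70.0   every element
print(np.mean(scores, axis=0))   # [60. 70. 80.]   per column (per test)
print(np.mean(scores, axis=1))   # [80. 60.]       per row (per student)
print(np.max(scores, axis=1))    # [90 70]
```

A handy way to remember it: the axis you name is the one that gets collapsed.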

Boolean Masking and Filtering

This is one of NumPy's most powerful features. You can filter an array using a condition, and NumPy returns only the elements that match:

a = np.array([1, 2, 3, 4, 5])

# Keep only values greater than 2
print(a[a > 2])    # [3 4 5]

The condition a > 2 produces an array of True and False values, and using it inside the brackets keeps only the elements where the condition is true. This replaces what would be a loop and an if statement in plain Python.

Why it is everywhere

Boolean masking is the backbone of data cleaning. Filtering rows, removing outliers, and selecting samples all use this exact pattern, including in Pandas later on.
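
As a sketch of what that looks like in a cleaning step, the example below uses made-up sensor readings where -1 marks a failed reading and anything of 100 or more is treated as an outlier. Both conventions are assumptions for the illustration.

```python
import numpy as np

# Hypothetical sensor readings; -1 marks a failed reading
readings = np.array([12, -1, 15, 14, 250, 13, -1])

# A mask is just an array of booleans, one per element
mask = readings > 0
positive = readings[mask]
print(positive)          # [ 12  15  14 250  13]

# Combine conditions with & (and) or | (or); the parentheses are required
valid = readings[(readings > 0) & (readings < 100)]
print(valid)             # [12 15 14 13]

# A mask can also select elements to overwrite
cleaned = readings.copy()
cleaned[cleaned < 0] = 0
print(cleaned)           # [ 12   0  15  14 250  13   0]
```

Python's and / or keywords do not work element by element on arrays, which is why & and | are used here.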

Linear Algebra and Random Tools

Random and statistical tools

NumPy's random module generates data for simulations, testing, and machine learning. Set a seed first if you want the same "random" numbers every run:

np.random.seed(42)   # makes results reproducible

# Uniform floats in [0, 1)
print(np.random.rand(3, 2))

# Random integers from 1 to 9
print(np.random.randint(1, 10, size=(2, 3)))

# Samples from a normal distribution
print(np.random.normal(0, 1, size=5))
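
The functions above belong to NumPy's older random interface, which still works. The NumPy documentation recommends the Generator interface for new code, so here is a rough equivalent as a sketch. The seed and the variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)      # a seeded, self-contained generator

uniform = rng.random((3, 2))                  # floats in [0, 1)
integers = rng.integers(1, 10, size=(2, 3))   # integers from 1 to 9
normal = rng.normal(0, 1, size=5)             # mean 0, standard deviation 1

# The same seed reproduces the same sequence
again = np.random.default_rng(42)
print(np.array_equal(uniform, again.random((3, 2))))   # True
```

Because the state lives in the rng object and not in a global seed, separate parts of a program can draw random numbers without affecting each other.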

Linear algebra

The numpy.linalg module handles the matrix maths behind machine learning and scientific computing:

from numpy.linalg import inv, eig, det

matrix = np.array([[2, 1], [3, 4]])

print(inv(matrix))   # inverse
print(det(matrix))   # determinant
print(eig(matrix))   # eigenvalues and eigenvectors
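
To check that these results make sense, you can verify them against each other. The sketch below uses the same matrix; the right-hand side rhs is an invented example chosen so the answer comes out to whole numbers.

```python
import numpy as np

matrix = np.array([[2, 1], [3, 4]])

# A matrix times its inverse gives the identity (up to rounding)
product = matrix @ np.linalg.inv(matrix)
print(np.allclose(product, np.eye(2)))   # True

# eig returns two things: the eigenvalues and a matrix of eigenvectors
values, vectors = np.linalg.eig(matrix)

# Solve the system  2x + y = 5,  3x + 4y = 10
rhs = np.array([5, 10])
solution = np.linalg.solve(matrix, rhs)
print(np.allclose(solution, [2, 1]))     # True
```

For solving equations, np.linalg.solve is generally preferred over computing the inverse and multiplying, since it does less work and is usually more accurate.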

When to Use NumPy

Task	NumPy?	Why
Basic maths	Yes	Fast and easy on whole arrays
Large datasets	Yes	Efficient memory use
Scientific computing	Yes	Linear algebra, statistics, and more
Machine learning prep	Yes	Feature vectors and normalisation
Working with Pandas	Yes	Interoperable with DataFrames

Real-world use cases

Data preprocessing

Cleaning and normalising data before analysis or training.

Image processing

Images are just arrays of pixel values, perfect for NumPy.

Numerical simulations

Physics, finance, and modelling that run on heavy maths.

Matrix algebra

Solving systems of equations and transformations.

AI and ML pipelines

Feature vectors feeding into models of every kind.
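
As one concrete instance of the preprocessing use case, here is a minimal sketch of standardising features so each column has mean 0 and standard deviation 1. The feature matrix is invented, and this is just one common normalisation scheme, not the only one.

```python
import numpy as np

# Hypothetical feature matrix: 4 samples (rows) x 2 features (columns)
features = np.array([[1.0, 200.0],
                     [2.0, 400.0],
                     [3.0, 600.0],
                     [4.0, 800.0]])

# Standardise each column: subtract its mean, divide by its standard deviation
means = features.mean(axis=0)
stds = features.std(axis=0)
standardised = (features - means) / stds     # broadcasting handles the shapes

print(np.allclose(standardised.mean(axis=0), 0))   # True
print(np.allclose(standardised.std(axis=0), 1))    # True
```

Real data would also need a guard for columns whose standard deviation is zero, which this sketch leaves out.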

Try It Yourself

Put it all together with this short exercise. Type it out, run it, then tweak the numbers and see what changes:

exercise.py

import numpy as np

# Create an array of 10 random integers between 1 and 100 (the upper bound 101 is excluded)
arr = np.random.randint(1, 101, size=10)

# Print the mean and standard deviation
print("Mean:", np.mean(arr))
print("Std Dev:", np.std(arr))

# Sort and print the array
print("Sorted:", np.sort(arr))

Make it your own

Try changing the range, the size, or filtering with a boolean mask before sorting. Experimenting is the fastest way to make these patterns stick.

Final Thoughts

NumPy is essential for anyone working in Python on data analysis, AI, machine learning, or scientific computation. It is not just fast; it also leads to clean, readable code, and it quietly powers most of the data science ecosystem. Master it and everything built on top of it becomes far easier.

Where does NumPy fit?

NumPy is one stage of a much bigger journey. The complete Python roadmap shows you exactly what to learn before and after it, from the basics all the way to deploying real applications.

