Python · Data Science

NumPy in Python
The Beginner's Guide

The foundational library for numerical computing in Python. Learn arrays, operations, and the real-world patterns that power the entire data science ecosystem.

Beginner friendly · 13 sections · ~25 min read

If you are working with data, maths, or machine learning in Python, you will eventually need NumPy, the foundational library for numerical computing. Nearly every data and AI library in Python is built on top of it.

This guide walks you through what NumPy is, why it is so widely used, and how to create arrays, run operations on them, and apply them to real problems. Every example is short and runnable, so the fastest way to learn is to type them out yourself.

01

What Is NumPy?

NumPy (short for Numerical Python) is a Python library built for working with numbers at scale. At its heart is a single powerful object: the n-dimensional array, which lets you hold and process large amounts of numerical data far faster than ordinary Python.

What NumPy is used for

  • Handling large multidimensional arrays and matrices
  • Performing high-speed mathematical operations on whole datasets at once
  • Serving as the foundation for libraries like Pandas, SciPy, TensorFlow, and scikit-learn

Key features

  • Fast array operations written in optimised C under the hood
  • Broadcasting, which combines arrays of different but compatible shapes without writing loops
  • Rich maths: trigonometry, linear algebra, and statistics built in
  • Easy integration with C, C++, and Fortran code
Why it matters
Almost the entire Python data and machine learning ecosystem sits on top of NumPy. Learn it well and Pandas, scikit-learn, and the rest become much easier to pick up.
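
As a rough illustration of the broadcasting feature listed above, the sketch below adds a 1D array to every row of a 2D array. The names grid and offsets are made up for this example.

```python
import numpy as np

# A 2x3 array and a 1D array of length 3 (hypothetical example data)
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
offsets = np.array([10, 20, 30])

# Broadcasting stretches `offsets` across both rows of `grid`
shifted = grid + offsets
print(shifted)
# [[11 22 33]
#  [14 25 36]]

# A scalar is broadcast to every element
doubled = grid * 2
print(doubled)
# [[ 2  4  6]
#  [ 8 10 12]]
```

The shapes only need to be compatible, not identical: here the trailing dimension (3) matches, so NumPy can line the arrays up without a loop.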

02

Installation

Install NumPy with pip from your terminal:

terminal
pip install numpy

Then import it at the top of your script. By long-standing convention it is imported as np, and you will see that everywhere:

import numpy as np
Tip
Stick with the np alias. Every tutorial, answer, and codebase uses it, so following the convention makes your code instantly familiar to other Python developers.
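
A quick way to confirm the install worked is to import the library and print its version. This is only a sanity check, and the version string you see will depend on what pip installed.

```python
import numpy as np

# Prints the installed NumPy version as a string
print(np.__version__)
```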

03

NumPy Arrays vs Python Lists

You might wonder why you need NumPy arrays when Python already has lists. They look similar at first glance:

list1  = [1, 2, 3]
array1 = np.array([1, 2, 3])

The difference shows up the moment you start doing real work. NumPy arrays are:

  • More memory efficient, storing numbers in a compact, fixed type
  • Far faster for calculations, because operations run in optimised C, not Python
  • Built for maths, supporting matrix operations and broadcasting out of the box
The mental shift
With a list you loop over items one at a time. With a NumPy array you operate on the whole thing at once: array1 * 2 doubles every element with no loop. This "vectorised" style is both faster and cleaner.
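
To make the contrast concrete, here is a small sketch that doubles the same values both ways. The variable names doubled_list and doubled_array are introduced only for this example.

```python
import numpy as np

list1 = [1, 2, 3]
array1 = np.array([1, 2, 3])

# Plain Python: build a new list one element at a time
doubled_list = [item * 2 for item in list1]

# NumPy: one expression applied to the whole array
doubled_array = array1 * 2

print(doubled_list)    # [2, 4, 6]
print(doubled_array)   # [2 4 6]

# Careful: multiplying a list by 2 repeats it instead of doubling the values
print(list1 * 2)       # [1, 2, 3, 1, 2, 3]
```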

04

Creating NumPy Arrays

There are many ways to create an array depending on what you need. These are the ones you will reach for most often:

import numpy as np

# From a Python list
a = np.array([1, 2, 3])

# A 2D array (rows and columns)
b = np.array([[1, 2], [3, 4]])

# Filled with zeros, given a shape
zeros = np.zeros((2, 3))

# Filled with ones
ones = np.ones((3, 3))

# Identity matrix (1s on the diagonal)
identity = np.eye(4)

# A range of values: start, stop, step
range_array = np.arange(0, 10, 2)   # [0 2 4 6 8]

# Random values between 0 and 1
random_array = np.random.rand(2, 3)
Shapes are tuples
Notice that zeros((2, 3)) takes the shape as a tuple: 2 rows by 3 columns. Getting the shape right is half of working with NumPy.
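
One way to build intuition for shapes is to print them right after creating each array. The sketch below repeats a few of the constructors above and adds np.full, which is not covered in this guide but takes a shape tuple in the same way.

```python
import numpy as np

zeros = np.zeros((2, 3))
ones = np.ones((3, 3))
range_array = np.arange(0, 10, 2)

print(zeros.shape)         # (2, 3)
print(ones.shape)          # (3, 3)
print(range_array.shape)   # (5,)

# np.full takes a shape tuple and a fill value
sevens = np.full((2, 2), 7)
print(sevens)
# [[7 7]
#  [7 7]]
```

Note the trailing comma in (5,): a 1D array still reports its shape as a tuple, just with one entry.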

05

Array Attributes

Every array carries information about itself. These attributes are the first thing to check when something does not behave as expected:

print(a.shape)   # dimensions, e.g. (3,) or (2, 3)
print(b.ndim)    # number of dimensions
print(a.dtype)   # data type of the elements
print(a.size)    # total number of elements
Attribute   Tells you
.shape      The size along each dimension, as a tuple
.ndim       How many dimensions the array has
.dtype      The type of the elements, such as int64 or float64
.size       The total count of elements
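
The attributes are easier to remember once you see them side by side on a 1D and a 2D array. This sketch also uses astype to convert the element type; a_float is a name chosen just for the example.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1.5, 2.5], [3.5, 4.5]])

print(a.shape, a.ndim, a.size)   # (3,) 1 3
print(b.shape, b.ndim, b.size)   # (2, 2) 2 4
print(b.dtype)                   # float64

# astype returns a converted copy; the original array is unchanged
a_float = a.astype(np.float64)
print(a_float)                   # [1. 2. 3.]
print(a_float.dtype)             # float64
```

The integer dtype of a is left unprinted on purpose, because the default integer size can differ between platforms and NumPy versions.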

06

Array Operations

Element-wise arithmetic

Arithmetic operators work on the whole array at once, matching up elements by position. No loops required:

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

print(x + y)    # [5 7 9]
print(x * y)    # [ 4 10 18]
print(x ** 2)   # [1 4 9]

Matrix multiplication

Element-wise * is not the same as a true matrix product. For that, use np.dot (or the @ operator):

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 0], [1, 3]])

print(np.dot(a, b))   # the matrix product
print(a @ b)          # same thing, shorter syntax
Common mix-up
a * b multiplies matching elements. a @ b (or np.dot) does real matrix multiplication. They give different results, so be sure you know which one you want.
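
Running both operations on the same pair of matrices shows the difference. The vector ones at the end is a hypothetical addition to show that @ also handles matrix-vector products.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 0], [1, 3]])

# Element-wise: each entry is a[i, j] * b[i, j]
print(a * b)
# [[ 2  0]
#  [ 3 12]]

# Matrix product: rows of a combined with columns of b
print(a @ b)
# [[ 4  6]
#  [10 12]]

# @ also works between a matrix and a vector
ones = np.array([1, 1])
print(a @ ones)   # [3 7]
```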

07

Indexing and Slicing

Accessing elements works like Python lists, but extends naturally into multiple dimensions.

1D
a = np.array([10, 20, 30, 40])

print(a[0])     # 10   first element
print(a[1:3])   # [20 30]   a slice
2D
b = np.array([[1, 2], [3, 4], [5, 6]])

print(b[1, 0])   # 3    row 1, column 0
print(b[:, 1])   # [2 4 6]   every row, column 1
The colon means "everything"
In b[:, 1], the colon says "all rows" and the 1 says "column 1". This row-and-column selection is one of the most useful things NumPy offers over plain lists.
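
A few more indexing patterns, sketched on the same 2D array. One detail worth knowing early is that a slice is a view onto the original data, so writing through it changes the original; first_col and safe are names made up for this example.

```python
import numpy as np

b = np.array([[1, 2], [3, 4], [5, 6]])

print(b[1, 0])     # 3          row 1, column 0 in one index
print(b[-1])       # [5 6]      last row
print(b[0:2, :])   # first two rows, every column
# [[1 2]
#  [3 4]]

# A slice is a view: modifying it modifies the original array
first_col = b[:, 0]
first_col[0] = 99
print(b[0, 0])     # 99

# Use .copy() when you need an independent array
safe = b[:, 1].copy()
safe[0] = -1
print(b[0, 1])     # 2   original is untouched
```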

08

Reshaping and Iterating

You can loop over an array, but more often you will want to change its shape without changing its data.

a = np.array([[1, 2, 3], [4, 5, 6]])

# Iterating goes row by row
for row in a:
    print(row)

# Reshape to 3 rows, 2 columns (same 6 values)
reshaped = a.reshape(3, 2)

# Flatten into a single 1D array (returns a copy)
flat = a.flatten()
Reshape rules
The new shape must hold the same number of elements. A 2 by 3 array (6 values) can become 3 by 2 or 6 by 1, but not 2 by 2. NumPy will raise an error if the totals do not match.
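
The sketch below illustrates the rule, including what happens when the totals do not match. It also uses -1 as a dimension, which asks NumPy to work out that size itself; the failed flag is only there to make the outcome easy to inspect.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# -1 means "infer this dimension from the total number of elements"
column = a.reshape(-1, 1)
print(column.shape)   # (6, 1)

auto = a.reshape(3, -1)
print(auto.shape)     # (3, 2)

# flatten always gives a 1D copy
flat = a.flatten()
print(flat)           # [1 2 3 4 5 6]

# 6 values cannot fit a 2 by 2 shape, so NumPy raises ValueError
failed = False
try:
    a.reshape(2, 2)
except ValueError as err:
    failed = True
    print("Reshape failed:", err)
```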

09

Useful Functions and Statistics

NumPy ships with fast functions for the calculations you run constantly. They operate on the whole array at once:

a = np.array([1, 2, 3, 4, 5])

print(np.sum(a))    # 15
print(np.mean(a))   # 3.0
print(np.std(a))    # about 1.414 (population standard deviation)
print(np.min(a), np.max(a))  # 1 5
Function          Returns
np.sum            The total of all elements
np.mean           The average
np.std            The standard deviation (spread of the data)
np.min / np.max   The smallest and largest values
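
On 2D data these functions accept an axis argument, which this guide does not cover in detail. The sketch below shows it on a small made-up scores array, along with the ddof argument of np.std for the sample standard deviation.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(np.sum(scores))            # 21         everything
print(np.sum(scores, axis=0))    # [5 7 9]    down each column
print(np.sum(scores, axis=1))    # [ 6 15]    across each row
print(np.mean(scores, axis=1))   # [2. 5.]

# np.std divides by N by default; ddof=1 divides by N - 1 instead
a = np.array([1, 2, 3, 4, 5])
population_std = np.std(a)
sample_std = np.std(a, ddof=1)
print(sample_std > population_std)   # True
```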

10

Boolean Masking and Filtering

This is one of NumPy's most powerful features. You can filter an array using a condition, and NumPy returns only the elements that match:

a = np.array([1, 2, 3, 4, 5])

# Keep only values greater than 2
print(a[a > 2])    # [3 4 5]

The condition a > 2 produces an array of True and False values, and using it inside the brackets keeps only the elements where the condition is true. This replaces what would be a loop and an if statement in plain Python.

Why it is everywhere
Boolean masking is the backbone of data cleaning. Filtering rows, removing outliers, and selecting samples all use this exact pattern, including in Pandas later on.
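
Masks can also be inspected and combined. The sketch below is one way to do that; note that NumPy uses & and | with parentheses around each condition, not the Python keywords and / or.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# The mask itself is just an array of booleans
mask = a > 2
print(mask)                    # [False False  True  True  True]

# Combine conditions with & (and) and | (or)
print(a[(a > 1) & (a < 5)])    # [2 3 4]
print(a[(a < 2) | (a > 4)])    # [1 5]

# np.where keeps matching values and substitutes 0 elsewhere
print(np.where(a > 2, a, 0))   # [0 0 3 4 5]

# True counts as 1, so summing a mask counts the matches
print(np.sum(a > 2))           # 3
```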

11

Linear Algebra and Random Tools

Random and statistical tools

NumPy's random module generates data for simulations, testing, and machine learning. Set a seed first if you want the same "random" numbers every run:

np.random.seed(42)   # makes results reproducible

# Uniform floats in [0, 1)
print(np.random.rand(3, 2))

# Random integers from 1 to 9
print(np.random.randint(1, 10, size=(2, 3)))

# Samples from a normal distribution
print(np.random.normal(0, 1, size=5))
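
Recent NumPy versions also offer a Generator object, created with np.random.default_rng, as an alternative to the module-level functions above. The sketch below mirrors the three calls using that interface; the seed 42 and the variable names are arbitrary choices for the example.

```python
import numpy as np

# A seeded generator keeps its random state local instead of global
rng = np.random.default_rng(42)

uniform = rng.random((3, 2))                  # uniform floats in [0, 1)
integers = rng.integers(1, 10, size=(2, 3))   # integers from 1 to 9
normal = rng.normal(0, 1, size=5)             # normal distribution samples

print(uniform.shape, integers.shape, normal.shape)   # (3, 2) (2, 3) (5,)

# Two generators with the same seed produce the same numbers
first = np.random.default_rng(42).random(3)
repeat = np.random.default_rng(42).random(3)
print(np.array_equal(first, repeat))   # True
```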

Linear algebra

The numpy.linalg module handles the matrix maths behind machine learning and scientific computing:

from numpy.linalg import inv, eig, det

matrix = np.array([[2, 1], [3, 4]])

print(inv(matrix))   # inverse
print(det(matrix))   # determinant
print(eig(matrix))   # eigenvalues and eigenvectors
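
A useful habit is to check linear algebra results rather than trust them blindly. The sketch below verifies the inverse, determinant, and eigenvalues of the same matrix, then uses np.linalg.solve on a small system; the right-hand side [5, 10] is made up for the example.

```python
import numpy as np

matrix = np.array([[2, 1], [3, 4]])

# The inverse times the original should be (very close to) the identity
inverse = np.linalg.inv(matrix)
print(np.allclose(inverse @ matrix, np.eye(2)))   # True

# Determinant: 2*4 - 1*3 = 5 (up to floating point rounding)
determinant = np.linalg.det(matrix)
print(np.isclose(determinant, 5.0))               # True

# eig returns two things: the eigenvalues and the eigenvectors
values, vectors = np.linalg.eig(matrix)
print(np.allclose(np.sort(values), [1.0, 5.0]))   # True

# Solve 2x + y = 5 and 3x + 4y = 10 for x and y
solution = np.linalg.solve(matrix, np.array([5, 10]))
print(np.allclose(solution, [2.0, 1.0]))          # True
```

The comparisons use np.allclose and np.isclose because floating point results are rarely exact to the last digit.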

12

When to Use NumPy

Task                    NumPy?   Why
Basic maths             Yes      Fast and easy on whole arrays
Large datasets          Yes      Efficient memory use
Scientific computing    Yes      Linear algebra, statistics, and more
Machine learning prep   Yes      Feature vectors and normalisation
Working with Pandas     Yes      Interoperable with DataFrames

Real-world use cases

Data preprocessing

Cleaning and normalising data before analysis or training.

Image processing

Images are just arrays of pixel values, perfect for NumPy.

Numerical simulations

Physics, finance, and modelling that run on heavy maths.

Matrix algebra

Solving systems of equations and transformations.

AI and ML pipelines

Feature vectors feeding into models of every kind.
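
As one small example of the data preprocessing use case, the sketch below standardises each column of a feature matrix to zero mean and unit spread. The features array is invented data, and this is only one of several common normalisation schemes.

```python
import numpy as np

# Three samples, two features on very different scales (made-up data)
features = np.array([[1.0, 200.0],
                     [2.0, 400.0],
                     [3.0, 600.0]])

# Per-column statistics: axis=0 runs down the rows
mean = features.mean(axis=0)
std = features.std(axis=0)

# Broadcasting subtracts and divides column by column
standardised = (features - mean) / std

print(np.allclose(standardised.mean(axis=0), 0.0))   # True
print(np.allclose(standardised.std(axis=0), 1.0))    # True
```

With real data you would also need to handle columns whose standard deviation is zero, which this sketch does not attempt.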


13

Try It Yourself

Put it all together with this short exercise. Type it out, run it, then tweak the numbers and see what changes:

exercise.py
import numpy as np

# Create an array of 10 random integers from 1 to 100
arr = np.random.randint(1, 101, size=10)

# Print the mean and standard deviation
print("Mean:", np.mean(arr))
print("Std Dev:", np.std(arr))

# Sort and print the array
print("Sorted:", np.sort(arr))
Make it your own
Try changing the range, the size, or filtering with a boolean mask before sorting. Experimenting is the fastest way to make these patterns stick.
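
One possible variation, sketched below, filters with a boolean mask before sorting. The seed value 0 and the name above_average are arbitrary choices for this example, not part of the original exercise.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed so the run is repeatable
arr = np.random.randint(1, 101, size=10)

# Keep only the values above the mean, then sort them
above_average = arr[arr > np.mean(arr)]
sorted_values = np.sort(above_average)

print("All values:", arr)
print("Above average, sorted:", sorted_values)
```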

Final Thoughts

NumPy is essential for anyone working in Python on data analysis, AI, machine learning, or scientific computation. It is not only fast; it also leads to clean, readable code, and it quietly powers most of the data science ecosystem. Master it and everything built on top of it becomes far easier.

Where does NumPy fit?
NumPy is one stage of a much bigger journey. The complete Python roadmap shows you exactly what to learn before and after it, from the basics all the way to deploying real applications.