The `numpy` module

What is the `numpy` module?

A module is a collection of related functions and other objects. One of the most useful modules in Python is called numpy (numerical Python); it contains many functions for numerical programming. Technically it is an extension to the core Python functionality we’ve been focussing on so far: it is not part of the standard library, but it is included in many scientific Python distributions such as Anaconda.

The numpy module builds on the core functionality and adds features including:

It is performant, meaning its core routines are well optimised
It offers additional numerical computing tools
It adds a new object called an n-dimensional array

Numpy arrays vs lists

One thing we can use the numpy module for is to create a new object called a numpy array. This is another data structure, in addition to the built-in Python types we’ve been learning about, and is similar to a list.

| Property | Numpy arrays | Lists |
|---|---|---|
| Origin | Python extension (but often comes as standard) | Part of Python built-in functionality |
| Ordered | Yes | Yes |
| Mutable | Yes | Yes |
| Flexibility | Less flexible: one data type per array | Very flexible: all types in any list |
| Element-wise operations | Implicit | Explicit |
| Speed | Generally quicker (optimised) | Generally slower |
| Memory | More memory efficient | Less memory efficient |

In practice, list objects are highly flexible in both content and shape, whereas numpy.array objects are much stricter: every item must have the same type, and they work best when they have a consistent shape (e.g. a 2x3 grid).
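As a small illustration of this difference (the names `mixed_list` and `arr` are made up for this sketch), a list keeps each item's own type, while an array converts its items to one common type:

```python
import numpy as np

# A list can hold items of different types side by side
mixed_list = [1, 2.5, "three"]
item_types = [type(item).__name__ for item in mixed_list]
print(item_types)

# An array stores a single type, so the integer 1 is converted to a float
arr = np.array([1, 2.5])
print(arr.dtype)
```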

Numpy arrays

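numpy.array objects (strictly, objects of type numpy.ndarray created by the numpy.array function) are mutable, ordered containers, but every element must have the same type and the array has an n-dimensional shape.

To use the numpy module we first need to import it, using the statement `import numpy as np`.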

The `as` part of this import statement gives us a shorthand to use in the code when we want to access numpy, in this case `np`. This is the convention most often used for the numpy module. More generally, import statements are the way we access additional Python modules such as numpy or matplotlib.

One way to create a numpy.array is from a list:
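A minimal sketch, using the example values and the names `list1` and `arr1` that recur through the rest of this section:

```python
import numpy as np

# Start from an ordinary Python list of floats
list1 = [1., 1., 2., 3., 5., 8.]

# Convert the list into a numpy array
arr1 = np.array(list1)
print(arr1)
```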

Here we need the `np.` prefix before the function name to tell Python to look the function up in the numpy module.

We can also index and slice numpy.arrays in a similar way to other sequence objects such as lists:
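For example, assuming the same `arr1` as before (the names `first` and `middle` are only for illustration):

```python
import numpy as np

arr1 = np.array([1., 1., 2., 3., 5., 8.])

# Indexing returns a single element (positions start at 0)
first = arr1[0]
print(first)

# Slicing returns a sub-array: from position 2 up to, but not including, the last element
middle = arr1[2:-1]
print(middle)
```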

A numpy.array also has additional properties (attributes): dtype, which tells us the type of the elements contained in the array, and shape, which tells us the dimensions of the array.
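A short sketch of reading these two attributes from the example array:

```python
import numpy as np

arr1 = np.array([1., 1., 2., 3., 5., 8.])

# dtype describes the type of every element in the array
print(arr1.dtype)   # float64

# shape is a tuple with one entry per dimension; a 1D array of 6 items gives (6,)
print(arr1.shape)   # (6,)
```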

Element-wise operations

The numpy module itself also provides some additional tools and syntax to complete simple operations more succinctly. For instance, we’ve shown before one way to act on every item in a list using a for loop:
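As a reminder, such a loop might look like this, multiplying each item of `list1` by 4:

```python
list1 = [1., 1., 2., 3., 5., 8.]

# Build up a new list one item at a time
list2 = []
for item in list1:
    list2.append(item * 4)

print(list2)  # [4.0, 4.0, 8.0, 12.0, 20.0, 32.0]
```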

There is actually a shorthand for creating a new list from a very simple for loop, called a list comprehension.
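The same multiplication written as a list comprehension fits on one line:

```python
list1 = [1., 1., 2., 3., 5., 8.]

# The expression comes first, followed by the loop it is evaluated in
list2 = [item * 4 for item in list1]

print(list2)  # [4.0, 4.0, 8.0, 12.0, 20.0, 32.0]
```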

But this is still more complex than using a numpy.array, where the same operation can be performed using an operator directly on the whole array:
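With the example array, the whole operation reduces to a single multiplication:

```python
import numpy as np

arr1 = np.array([1., 1., 2., 3., 5., 8.])

# The operator is applied to every element; no explicit loop is needed
arr2 = arr1 * 4
print(arr2)
```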

Operation speed

For large numbers of elements, the time difference between operations using lists and numpy.arrays can start to be measurable. We can quickly check this by using the time module to time each operation.

Comparing the two operations, we typically see that the list version takes longer than the numpy.array version, although the exact ratio is highly variable from run to run and between machines:
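A rough sketch of such a comparison follows. It uses `time.perf_counter` for the timestamps, which is this sketch's choice rather than anything the text prescribes, and a single run like this is only indicative:

```python
import time
import numpy as np

num_range = 100_000

# Time the list comprehension
start = time.perf_counter()
list_out = [item * 4 for item in range(num_range)]
list_time = time.perf_counter() - start

# Time the equivalent array operation
start = time.perf_counter()
arr_out = np.arange(num_range) * 4
arr_time = time.perf_counter() - start

print(f"List: {list_time:.6f} s, array: {arr_time:.6f} s for {num_range:,} numbers")
```

For more reliable numbers, the standard library's `timeit` module repeats each measurement many times.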

You may recall that when we first introduced list and dict objects, we also mentioned other Python objects that were similar but differed in functionality (tuple and set objects). In Python, as in many languages, there are often several tools that can complete a task, and it’s up to you to choose the right one for the job. Overall, list objects may be more appropriate when you need to store a collection of strings or when you don’t know the number of elements in advance (appending to a list is faster than appending to a numpy.array because of the way the data is stored in memory). numpy.array objects are more appropriate for numerical operations, particularly when performance is a factor.
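To illustrate the point about appending (the names `values`, `arr` and `longer` are hypothetical): a list grows in place, whereas `np.append` builds and returns a new array each time it is called.

```python
import numpy as np

# Appending to a list modifies the existing list
values = [1., 1., 2.]
values.append(3.)
print(values)

# np.append leaves the original array untouched and returns a longer copy
arr = np.array([1., 1., 2.])
longer = np.append(arr, 3.)
print(arr)
print(longer)
```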

Working with `numpy`

To use the numpy module we always need to start with an import statement. In this case we import the numpy module and use the shorthand np, with `import numpy as np`.

We’ve seen that we can apply operators directly to a numpy.array:
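For instance, several operators can be chained in one expression. The array `arr1` is recreated here so the sketch runs on its own, and `result` is an illustrative name:

```python
import numpy as np

arr1 = np.array([1., 1., 2., 3., 5., 8.])

# Each element is multiplied by 3, divided by 2, then has 5 added
result = arr1 * 3 / 2 + 5
print(result)
```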

Similarly you can use additional functions provided by the numpy module to do something to each element in the array. For example you can apply a square root:
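A brief sketch with `np.sqrt` (the name `roots` is illustrative):

```python
import numpy as np

arr1 = np.array([1., 1., 2., 3., 5., 8.])

# np.sqrt returns a new array holding the square root of each element
roots = np.sqrt(arr1)
print(roots)
```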

Or perform a reduction, which combines all the elements into a single value, such as calculating the mean:
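For the example array this could be written as follows (`mean_value` is an illustrative name):

```python
import numpy as np

arr1 = np.array([1., 1., 2., 3., 5., 8.])

# The six elements sum to 20, so the mean is 20 / 6
mean_value = np.mean(arr1)
print(mean_value)
```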

We can also apply mathematical operations over the whole array. For instance, the np.cos function applies the cosine function element-wise; its documentation can be viewed with help(np.cos) (or np.cos? in a Jupyter notebook).

The help states that this function expects an array-like object, with the angles given in radians. We can write this as:
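A minimal sketch, treating the values in `arr1` as angles in radians (`cosines` is an illustrative name):

```python
import numpy as np

arr1 = np.array([1., 1., 2., 3., 5., 8.])

# The cosine is evaluated separately for each element
cosines = np.cos(arr1)
print(cosines)
```

If the angles were in degrees instead, they could first be converted with `np.deg2rad`.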

If we look at arr1 we can see that it has not been modified by these operations. Each operation returns a new array, which you can either assign back to the original variable name or store in a new variable:
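The following sketch stores the result in a new variable, `arr2`, and shows that `arr1` keeps its original values:

```python
import numpy as np

arr1 = np.array([1., 1., 2., 3., 5., 8.])

# The result is a separate array; arr1 itself is left as it was
arr2 = arr1 * 3 / 2 + 5

print(arr1)
print(arr2)
```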

Element-wise operations on 1D arrays

Element-wise operations in numpy allow you to perform arithmetic or mathematical functions on each corresponding element of arrays. For example, if you have two arrays of the same length, arr1 and arr2, you can add them directly: arr1 + arr2. This will produce a new array where each element is the sum of the elements at the same position in the original arrays. Similarly, you can use other operators (-, *, /) or numpy functions (np.sqrt(arr1), np.cos(arr1)) to apply operations to each element individually. The arrays must have compatible shapes for these operations.
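As a sketch of these operations, the values chosen for `arr2` below are purely illustrative (each is twice the matching element of `arr1`), which keeps the results easy to check by eye:

```python
import numpy as np

arr1 = np.array([1., 1., 2., 3., 5., 8.])
arr2 = np.array([2., 2., 4., 6., 10., 16.])  # illustrative values

# Each result pairs up elements at the same position in the two arrays
added = arr1 + arr2
subtracted = arr1 - arr2
multiplied = arr1 * arr2
divided = arr1 / arr2

print("Element-wise addition:", added)
print("Element-wise subtraction:", subtracted)
print("Element-wise multiplication:", multiplied)
print("Element-wise division:", divided)
```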

When 1D arrays have different lengths, you need to be careful about the operations you perform. Element-wise operations such as arr1 + arr3 or arr1 * arr3 require the arrays to have the same length or compatible shapes (for example, a single value or a length-1 array is applied to every element). If arr3 has a different, incompatible length from arr1, numpy will raise a ValueError due to the shape mismatch.
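The text does not define `arr3`, so the sketch below assumes a hypothetical three-element array and catches the error so that the message can be inspected:

```python
import numpy as np

arr1 = np.array([1., 1., 2., 3., 5., 8.])
arr3 = np.array([1., 2., 3.])  # hypothetical array, shorter than arr1

try:
    result = arr1 + arr3
except ValueError as err:
    # Shapes (6,) and (3,) cannot be combined element by element
    result = None
    print("ValueError:", err)
```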

Basic operations on 1D arrays

Summing all elements in a 1D numpy array can be done with np.sum(arr1).

For cumulative summing, use np.cumsum(arr1), which returns an array where each element is the sum of all elements up to and including that position.

Sorting is performed with np.sort(arr1), which returns a sorted copy of the array.

To concatenate two arrays, use np.concatenate([arr1, arr2]). This joins the arrays end-to-end, creating a new array containing all elements from both arrays in order. Concatenation is useful for combining datasets or extending arrays.

To find unique elements, use np.unique(arr1), which returns a sorted array of the distinct values in arr1. These operations are implemented in optimised numpy code and are commonly used in data analysis.
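The operations above can be sketched together as follows. The arrays `values` and `more` are illustrative; `values` is deliberately unsorted and contains repeats so that the effect of each function is visible:

```python
import numpy as np

values = np.array([3., 1., 2., 3., 1.])  # illustrative, unsorted with repeats
more = np.array([5., 8.])                # illustrative second array

# Sum of all elements
total_sum = np.sum(values)
print("Sum:", total_sum)

# Running total: each entry includes the element at that position
cumulative_sum = np.cumsum(values)
print("Cumulative sum:", cumulative_sum)

# Sorted copy; values itself keeps its original order
sorted_arr = np.sort(values)
print("Sorted:", sorted_arr)

# Join the two arrays end-to-end
combined = np.concatenate([values, more])
print("Combined:", combined)

# Distinct values, returned in sorted order
unique_elements = np.unique(combined)
print("Unique:", unique_elements)
```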

