In the last article, we introduced the NumPy package and discussed its advantages, including the performance it offers. In this article, we discuss how to create a NumPy array.
Let’s first visualize what a multi-dimensional array looks like. In the picture below, we have an array that contains 4 numbers, so this is a 1-D array of size 4.
We can use 1-D arrays to represent things like time-series data. For instance, the temperature recorded every hour forms a 1-D array.
If we expand it by one dimension, it becomes a 2-D array. In the case below, there are 3 rows and 4 columns.
Suppose we have spatial data where we collect a number for every point in space. For instance, the temperature at each point across a room can be stored as a 2-dimensional array.
If we expand this into another dimension, we get a 3-dimensional array. Consider the two arrays in the image below as being stacked together; the stacking introduces another dimension.
One example would be measuring temperature across multiple floors: the first array in the above image could be the spatial distribution of the temperature on floor 1, and the other array could be the spatial distribution of the temperature on floor 2.
We humans can only visualize 3 dimensions, yet we often work with higher-dimensional data. To picture a 4D array, we continue the same process inductively: take multiple 3D arrays like the one in the above image and stack them along a new 4th axis. We can repeat this to add a 5th, 6th, 7th dimension and so on.
Let’s draw the 3D array differently.
We often talk about dimensions. In the above case the array has 3 dimensions, and we would like to number them.
We begin with the dimension we started from in our 1D array: a single row with multiple columns. Dimension 2 runs along that first axis, across the different columns (so we are counting dimensions backward).
Dimension 1 runs along the rows. This was the next thing we added: multiple rows for a given set of columns.
Dimension 0, which comes first in the ordering, is the last dimension we added: the two 2D matrices stacked one behind the other.
So, just to reiterate, we number the three dimensions backward relative to the order in which we added them.
Size of the array
Let’s take the above case. Along Dimension 0 we have two possible locations (2 matrices stacked up), so the size along Dimension 0 is 2. Along Dimension 1 we have 3 rows, so the size along this dimension is 3. Dimension 2 has 4 columns, so the size along this dimension is 4. The overall size of this 3-dimensional array is ‘2 X 3 X 4’.
Again, the dimension that was added last is listed first, followed by the one added before it, and finally the one that was added first.
This is also called the ‘shape’ of the NumPy array. If we have an array named ‘arr’, then ‘arr.shape’ returns the tuple (2, 3, 4).
We can make this idea concrete by taking some points within the array (it holds 24 values) and finding their coordinates:
Let’s take the number 14 in the array. It can be accessed by specifying a combination of three indices (here too we write the index for Dimension 0 first, then Dimension 1, then Dimension 2).
Along Dimension 0, the value 14 is on the back plane (the front plane is at index 0 and the back plane is at index 1). Along Dimension 1, there are three rows and 14 is in the first row, so its index is 0 (indexing starts at 0). Along Dimension 2, there are 4 columns and 14 is in the second column, which has index 1. So, if this 3D array is named ‘A’, the value 14 can be accessed as A[1, 0, 1].
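The lookup described above can be tried out with a small sketch. It assumes the pictured array holds the numbers 1 to 24, filled in row by row and plane by plane; that layout is an assumption about the figure, and `np.arange` with `reshape` is used here only as a convenient way to build such an array.

```python
import numpy as np

# Assumption: the array in the figure holds 1..24 in row-major order.
A = np.arange(1, 25).reshape(2, 3, 4)

print(A.shape)     # (2, 3, 4): 2 planes, 3 rows, 4 columns
print(A[1, 0, 1])  # 14: back plane, first row, second column
```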
We can access the values in a similar way:
Let’s talk about indexing slices of this 3D array. If we want to access all 12 numbers in the first plane (index 0 along Dimension 0), we can write A[0, :, :].
All the points in the first plane are at index 0 along Dimension 0, and we range over all possible values of Dimension 1 and Dimension 2, meaning all 3 rows and all 4 columns. This is expressed by putting a colon in the position of each of those dimensions.
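Here is a short sketch of this kind of slicing. As before, the array contents (1 to 24 in row-major order) are an assumption made for illustration.

```python
import numpy as np

# Assumption: a 2 x 3 x 4 array holding 1..24 in row-major order.
A = np.arange(1, 25).reshape(2, 3, 4)

# Fix Dimension 0 at index 0, keep every row and every column.
first_plane = A[0, :, :]
print(first_plane.shape)  # (3, 4)
print(first_plane)

# Fix Dimension 1 at index 1: the second row of every plane.
second_rows = A[:, 1, :]
print(second_rows.shape)  # (2, 4)
```

Fixing a dimension with a single integer removes it from the result, which is why the outputs here are 2D.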
Another example would be:
Instead of taking an entire slice, we can also refer to just part of the data by using partial index ranges.
With this, we can refer to any arbitrary slice of numbers within an n-dimensional array using this index notation. The result of such slicing is itself a smaller n-d array. In the case below, the output has size ‘2 X 1 X 2’, i.e. 2 along Dimension 0, 1 along Dimension 1, and 2 along Dimension 2.
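One way to obtain a ‘2 X 1 X 2’ result is sketched below. The particular ranges chosen here are hypothetical and may not match the slice in the original figure; the array contents are likewise assumed to be 1 to 24.

```python
import numpy as np

# Assumption: a 2 x 3 x 4 array holding 1..24 in row-major order.
A = np.arange(1, 25).reshape(2, 3, 4)

# Both planes, only the first row (kept as a range), columns 1 and 2.
part = A[:, 0:1, 1:3]
print(part.shape)  # (2, 1, 2)
print(part)
```

Writing `0:1` instead of `0` keeps Dimension 1 in the result with size 1, so the output stays 3-dimensional.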
Creating np arrays
arange(n): this function returns the integers from 0 up to ‘n-1’.
As is clear from the above snippet, the printed representation of a NumPy array is similar to a list. Its type is ‘numpy.ndarray’, where ‘nd’ stands for ‘n-dimensional’.
The other way to create this array is to create a list first and then convert it to a NumPy array.
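A minimal sketch of both approaches follows; the value n = 5 and the variable names are chosen only for illustration.

```python
import numpy as np

# Integers 0..4 generated directly by NumPy.
a = np.arange(5)
print(a)        # [0 1 2 3 4]
print(type(a))  # <class 'numpy.ndarray'>

# The same values, built from a Python list first.
values = [0, 1, 2, 3, 4]
b = np.array(values)
print(np.array_equal(a, b))  # True
```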
A NumPy array has a number of attributes. For example, the ‘dtype’ attribute gives the data type of the individual elements of the array. In the below snippet, ‘dtype’ is ‘int64’, which means the numbers are stored as integers that take 64 bits (8 bytes) each.
The ‘ndim’ attribute tells us the number of dimensions. In our case, the array has a single dimension.
The ‘shape’ attribute returns the shape of the array as a tuple. In this case, we have only 1 dimension with 5 elements along it, so the output is ‘(5,)’.
The ‘size’ attribute gives the total number of elements in the array. For a one-dimensional array, ‘size’ equals the single entry of ‘shape’.
The ‘itemsize’ attribute gives the number of bytes required to store one item of the array (in our case the item data type is int64, which requires 8 bytes). This matters because it gives an idea of how much memory our data takes as we move to high-dimensional arrays.
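The attributes above can be inspected together, as in this sketch. The dtype is requested explicitly here because the default integer type can differ between platforms; that explicit `dtype=np.int64` is an addition for reproducibility, not something the original snippet necessarily used.

```python
import numpy as np

# Request int64 explicitly so the output is the same on every platform.
arr = np.array([0, 1, 2, 3, 4], dtype=np.int64)

print(arr.dtype)     # int64
print(arr.ndim)      # 1
print(arr.shape)     # (5,)
print(arr.size)      # 5
print(arr.itemsize)  # 8 bytes per element

# Total memory used by the data: size * itemsize.
print(arr.nbytes)    # 40
```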
If we take the initial list and change the first entry from 0 to 0.0, NumPy prints every entry with a decimal point, and the ‘dtype’ attribute tells us that NumPy treats these values as ‘float64’.
The data type of the array is inferred from the values given at the time of creation.
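A small sketch of this type inference is shown below; the list values mirror the earlier example with only the first entry changed.

```python
import numpy as np

# A single float in the input is enough to make the whole array float64.
arr = np.array([0.0, 1, 2, 3, 4])

print(arr)        # [0. 1. 2. 3. 4.]
print(arr.dtype)  # float64
```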
Let’s create a 2-dimensional array by passing in a list of lists.
If we check the number of dimensions using ‘ndim’, we expect a value of 2.
The ‘shape’ attribute gives (2, 3), as we have 2 rows and 3 columns.
The ‘size’ attribute gives 6, which is the total number of elements this 2D array contains.
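As a sketch, a 2 x 3 array can be built from a list of two lists. The actual numbers used in the original snippet are not known, so the values below are placeholders.

```python
import numpy as np

# Hypothetical values: each inner list becomes one row.
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(arr2d.ndim)   # 2
print(arr2d.shape)  # (2, 3)
print(arr2d.size)   # 6
```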
Let’s create a 3D array:
So, we have two planes, and each plane is itself a ‘2 X 3’ array, so overall we have a ‘2 X 2 X 3’ array.
The ‘shape’ attribute gives (2, 2, 3) as the output.
‘ndim’ gives 3, as we have 3 dimensions in this case.
‘size’ gives 12, as we have a total of 12 items in this case.
In a similar way, we can create 4D and 5D arrays as well.
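The same pattern extends to three dimensions by nesting lists one level deeper. The values here are again placeholders chosen for illustration.

```python
import numpy as np

# Hypothetical values: two planes, each a 2 x 3 list of lists.
arr3d = np.array([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
])

print(arr3d.shape)  # (2, 2, 3)
print(arr3d.ndim)   # 3
print(arr3d.size)   # 12
```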
Other ways of creating arrays
Let’s say we need an array with a value of 1 at every point. We can create it easily using the ‘np.ones(shape)’ function. The shape is typically a tuple, and the function creates an array of ones with that shape.
Let’s say we want a matrix that contains the value 1729 everywhere. We can create the ones matrix and multiply it by 1729 (the scalar is broadcast to all the elements).
Similarly, ‘np.zeros(shape)’ creates an array that contains all 0's.
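These constant-valued arrays can be sketched as follows. The shape (2, 3) is an arbitrary choice for the example.

```python
import numpy as np

shape = (2, 3)  # arbitrary example shape

ones = np.ones(shape)    # every entry is 1.0 (float64 by default)
zeros = np.zeros(shape)  # every entry is 0.0

# The scalar 1729 is broadcast across every element of the array.
filled = 1729 * ones
print(filled)
```

NumPy also offers `np.full(shape, 1729)`, which produces a constant array in a single call.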
Another important kind of NumPy array is one filled with random values (usually used when we want to simulate something).
randn(d0, d1, ...): gives us numbers sampled from a normal distribution with mean 0 and variance 1. The dimensions are passed as separate arguments rather than as a tuple. The ’n’ in the function name denotes that it samples from a normal distribution.
We can change the mean and the spread with simple arithmetic on this array. To get a different standard deviation, multiply the array by that standard deviation (the variance becomes its square); to get a different mean, then add that mean to the array.
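The scaling and shifting described above can be sketched like this. The target mean of 5, the standard deviation of 2, the sample count, and the seed are all arbitrary values picked for the example.

```python
import numpy as np

np.random.seed(0)  # fixed seed so repeated runs give the same numbers

# Standard normal samples: mean 0, variance 1.
samples = np.random.randn(100_000)

# Scale by the desired standard deviation, then shift by the desired mean.
mu, sigma = 5.0, 2.0
shifted = sigma * samples + mu

print(shifted.mean())  # close to 5
print(shifted.std())   # close to 2
```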
rand(d0, d1, ...): generates numbers sampled uniformly from the interval [0, 1), so every value in that range is equally likely.
randint(low, high, size): gives random integers from the lower limit (inclusive) up to the upper limit (exclusive); the number of integers to draw is specified using the size argument.
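A brief sketch of both functions follows; the shapes and limits are arbitrary example values.

```python
import numpy as np

np.random.seed(0)  # fixed seed for repeatable output

# A 2 x 3 array of uniform samples from [0, 1).
uniform = np.random.rand(2, 3)
print(uniform)

# Ten integers from 1 to 6 (the upper limit 7 is excluded), like dice rolls.
rolls = np.random.randint(1, 7, size=10)
print(rolls)
```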
arange(start, end, step): generates values from the start value up to the end value (exclusive), with consecutive entries separated by the step size.
linspace(start, end, number of points): creates evenly spaced points between two extremes (here end is inclusive). For example, in the below case, we want 10 equally spaced numbers between 7 and 70.
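The difference between the two can be seen in this sketch. The `arange` arguments are arbitrary, while the `linspace` call uses the 7 to 70 example from the text.

```python
import numpy as np

# Start at 2, stop before 11, step by 3.
stepped = np.arange(2, 11, 3)
print(stepped)  # [2 5 8]

# 10 evenly spaced points from 7 to 70, both ends included.
spaced = np.linspace(7, 70, 10)
print(spaced)   # 7, 14, 21, ..., 70 as floats
```

With `arange` we choose the step and the number of points follows; with `linspace` we choose the number of points and the step follows (here (70 - 7) / 9 = 7).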
In all the functions we discussed, the data points were numbers. We can have NumPy arrays of other types as well, for example booleans and strings.
When the data type is a string, NumPy shows the dtype as ‘U3’ (often printed as ‘<U3’): ‘U’ stands for Unicode and 3 is the maximum string length. NumPy infers this type and uses it to store the strings efficiently as fixed-width entries in memory.
We can also convert a NumPy array of strings to other data types. For example, in the below case, we start with an array of strings and convert it into an array of floats.
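Here is a sketch of non-numeric arrays and of the string-to-float conversion. The specific values are hypothetical; the strings were chosen to be 3 characters long so that the inferred dtype matches the ‘U3’ discussed above.

```python
import numpy as np

# Boolean array: dtype is inferred as bool.
flags = np.array([True, False, True])
print(flags.dtype)  # bool

# String array: the longest string has 3 characters, so the dtype is U3.
strs = np.array(["1.5", "2.0", "3.7"])
print(strs.dtype)   # <U3 on most machines

# astype returns a new array with each string parsed as a float.
floats = strs.astype(float)
print(floats)       # [1.5 2.  3.7]
print(floats.dtype) # float64
```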
In this article, we discussed various ways of creating NumPy arrays and explored some of their commonly used attributes. We saw how to create arrays of ones and zeros, how a scalar is broadcast when it is multiplied with an array, and how to generate random values, random integers, and equally spaced values in a given range, as well as arrays of booleans and strings.