Datasets used for data analysis are not always clean or structured like database tables. Tree-like datasets such as **JSON** are a common example: they are nested, can have more than one level, and different levels can hold different sets of elements. JSON allows arrays as well as dictionaries. Python has fast, optimized libraries such as numpy and pandas for working with arrays and structured datasets. But numpy arrays are fixed-length rectangular tables that cannot represent variable-length arrays. We cannot have a numeric numpy array where the first row has 2 elements, the second has 4, and so on.

The most common way to represent tree-like data is JSON, but operations on JSON are not fast and optimized like numpy array operations. The **awkward array** library was designed to work with tree-like data structures at the speed of numpy-like operations. It was originally designed for high-energy particle physics, which has complicated tree-like data structures that cannot be flattened into numpy arrays. Particle physics datasets are huge, and the people working with them needed a library that handles such datasets the way numpy handles arrays. **Awkward array** lets us work with tree-like nested data structures as if we were working with numpy arrays, and code written with it is fast in the same way numpy code is.

As a part of this tutorial, we'll explain how to create and work with **awkward arrays** using simple examples. We'll try to cover as much of the library's API as possible. Below we have highlighted the important sections of the tutorial to give an overview of the material covered.

- Awkward Array Creation
- Awkward Array Indexing/Slicing
- Awkward Array Attributes
- Normal Array Operations
- Simple Statistics on Awkward Arrays

We can install **awkward array** using the **pip** command below.

**pip install awkward**

We have imported the **awkward array** library and printed the version that we'll be using in our tutorial.

In [1]:

```
import awkward as ak
print("Awkward Array Version : {}".format(ak.__version__))
```

In this section, we'll explain how we can create **Awkward arrays**. There are various ways to create **awkward** arrays. We'll try to explain the majority of them. The arrays that we create in this section will be used in upcoming sections where we explain indexing and other features.

Below we have created normal JSON-like data using python constructs. We have lists inside of a list. There are 4 lists inside of the outer list: three of them hold one or more dictionaries and one is empty. The keys of the dictionaries are **x** and **y**. The value of key **x** is a list of variable length. The value of key **y** is another dictionary with key **z**, whose value is again a list of variable length. We can notice that the data does not strictly follow any structure. The lists have different numbers of elements, and the keys are present in some dictionaries and missing from others. If we have a very large dataset with such a structure and we have to loop through it to compute some statistics, it'll take a lot of time. With **Awkward array** we can work on it like a numpy array, with fewer lines of code and faster performance. We'll be converting the below data structure to an **Awkward array** next.

In [2]:

```
arr = [
[ {"x": [1,2,3], "y": {"z": [4,5,6] }},
{"x": [7,8,9,10], "y": {"z": [11,12] }}],
[ {"x": [13,14,15,16,17,18]},
{"y": {"z": [19,] }}],
[],
[ {"x": [20,21,22,23], "y": {"z": [24,25,26,27,28] }},
{"x": [29,30,31,32,33,34,35,36],},
{"y": {"z": [37,38] }}
]
]
```

The most common way to create an **Awkward array** is by using the **Array()** constructor available from **awkward**. This constructor accepts many kinds of input, such as numpy arrays, python lists, python dictionaries, iterators, and strings, and creates an **Awkward array** from it.

Below we have created an awkward array using **Array()** constructor by giving it the data structure we had created in the previous cell. We have also printed the array for display purposes.

In [3]:

```
ak_arr = ak.Array(arr)
ak_arr
```

Out[3]:

The **awkward array** has an important attribute named **type** which returns type information about the underlying array. The type information below says that the array has 4 elements, each of which is a variable-length list. Each entry of those lists is a dictionary with keys **x** and **y**. The value of key **x** is a variable-length list and the value of key **y** is a dictionary with key **z**. Keys that are missing from some entries are marked as optional in the type. The type also specifies the type of the innermost elements, which is **int64**.

Please **NOTE** that **awkward array** also lets us mix different data types, which is another advantage of using it.

In [4]:

```
ak_arr.type
```

Out[4]:

We can also create an **awkward array** using the **from_iter()** function available from the library. It accepts a python iterable as input. Below we have given our data structure from earlier as input, and the function creates an **awkward array** from it.

In [5]:

```
ak_arr2 = ak.from_iter(iter(arr))
ak_arr2
```

Out[5]:

In [6]:

```
ak_arr2.type
```

Out[6]:

The **awkward array** library provides functions like NumPy's to create arrays. The **ones_like()** function takes another **awkward array** as input and returns a new **awkward array** where all elements are replaced with 1.

Below we have given our **awkward array** from previous cells as input to a method and we can notice that all integer elements are replaced with 1.

In [7]:

```
ak.ones_like(ak_arr)
```

Out[7]:

The **zeros_like()** function works exactly like **ones_like()**, with the only difference being that it replaces all elements with zeros.

In [8]:

```
ak.zeros_like(ak_arr)
```

Out[8]:

The **full_like()** method works like **ones_like()** and **zeros_like()** but replaces input array elements with the element specified as second argument of the method call.

Below we have replaced all array values with 100.

In [9]:

```
ak.full_like(ak_arr, 100)
```

Out[9]:

The **from_json()** function takes JSON-formatted content as a string and converts it to an **awkward array**. How a file name is handled depends on the library version: Awkward Array 1.x reads a string that names an existing file as a file, whereas Awkward Array 2.x always parses a plain string as JSON content and expects a file to be passed as a **pathlib.Path** object.

Below we have loaded a **GeoJSON** file which has information about US states. We have downloaded the file and kept it in a **datasets** folder. The **GeoJSON** dataset is a good example where individual elements do not follow any structure. The Polygons and Multi Polygons representing different states can have a different number of elements. The US states **GeoJSON** dataset can be easily downloaded from the internet with a simple google search.

We can notice from the output that the result is of type **Record**. The reason is that the file holds one dictionary where the actual data is kept under the **features** key. When we access the **features** key in the next cell, we can notice that it prints an **awkward array**.

In [10]:

```
us_states = ak.from_json("datasets/us-states.json")
us_states
```

Out[10]:

In [11]:

```
us_states["features"]
```

Out[11]:

Below we have first loaded the contents of the **GeoJSON** file as a string and then created an **awkward array** from it using the **from_json()** function. The result is the same as when we give the file name as input.

In [12]:

```
# Read the whole file into a string and close the file afterwards.
with open("datasets/us-states.json") as f:
    file_contents = f.read()
us_states = ak.from_json(file_contents)
us_states
```

Out[12]:

In [13]:

```
us_states["features"]
```

Out[13]:

The **from_numpy()** method lets us create **awkward array** from the numpy array.

Below we have created a simple **awkward array** from a random data numpy array.

In [14]:

```
import numpy as np
ak.from_numpy(np.random.rand(10,10))
```

Out[14]:

In this section, we have created another **awkward array** using **GeoJSON** dataset that we'll be using in upcoming sections of our tutorial for explanation purposes.

Below we have first loaded the US states **GeoJSON** dataset as a python dictionary. We have then read information about the US states population from a CSV file (US States Population 2018). We have populated the **GeoJSON** data with state population using a simple loop.

We have then printed the first and last few characters of our final **GeoJSON** dataset for displaying the contents of it.

In [15]:

```
import json

import numpy as np
import pandas as pd

# Load the GeoJSON file as a plain python dictionary.
with open("datasets/us-states.json") as f:
    us_states = json.load(f)
print("Data Keys : {}\n".format(us_states.keys()))
us_states = us_states["features"]

# Map each state name to its 2018 population.
df = pd.read_csv("datasets/State Populations.csv")
state_to_population = dict(zip(df["State"], df["2018 Population"]))

# Attach the population to each feature (None if the state is not in the CSV).
for feature in us_states:
    state_name = feature["properties"]["name"]
    feature["properties"]["population"] = state_to_population.get(state_name, None)

print("Data Overview (Start): ")
print(json.dumps(us_states, indent=4)[:500])
print("\nData Overview (End): ")
print(json.dumps(us_states, indent=4)[-500:])
```

**GeoJSON** datasets include information about the geometry they represent, and the geometries can be of different types such as Polygon, MultiPolygon, LineString, etc. In our case, as we have loaded US states **GeoJSON** data, the state boundaries are represented using either **Polygon** or **MultiPolygon**.

Below we have retrieved the first element of our dataset and then printed information about it. The first element is Polygon. We have printed information about coordinates of Polygon as well as property name which holds state name.

In the next cell below, we have retrieved the second element of our dataset and printed information about it which is MultiPolygon.

Please notice the difference between the shape of **Polygon** and **MultiPolygon**.

In [16]:

```
print("Shape Type : {}".format(us_states[0]["geometry"]["type"]))
polygon = us_states[0]["geometry"]["coordinates"]
print("State Name : {}\n".format(us_states[0]["properties"]["name"]))
print("Polygon Shape : {}".format(np.array(polygon).shape))
print("Polygon Data : ")
print(polygon)
```

In [17]:

```
print("Shape Type : {}".format(us_states[1]["geometry"]["type"]))
multi_polygon = us_states[1]["geometry"]["coordinates"]
print("State Name : {}\n".format(us_states[1]["properties"]["name"]))
print("Number of Polygons : {}\n".format(len(multi_polygon)))
print("Polygon 0 Shape : {}".format(np.array(multi_polygon[0]).shape))
print("Polygon 0 Data : {}".format(multi_polygon[0]))
print("\nPolygon 1 Shape : {}".format(np.array(multi_polygon[1]).shape))
print("Polygon 1 Data : {}".format(multi_polygon[1]))
print("\nLast Polygon Shape : {}".format(np.array(multi_polygon[-1]).shape))
print("Last Polygon Data : {}".format(multi_polygon[-1]))
```

At last, we have created an **awkward array** from our **GeoJSON** data that we loaded and displayed information about in previous cells. We'll be using this **awkward array** in our upcoming sections to explain a few examples.

In [18]:

```
ak_us_states = ak.Array(us_states)
ak_us_states
```

Out[18]:

Below we have printed type information of our **awkward array** created from **GeoJSON** dataset.

- We can notice that it shows data has 50 entries each of type dictionary.
- The dictionary has keys named **type, id, properties, and geometry**.
- The value of key **properties** is another dictionary with keys **name** and **population**.
- The value of key **geometry** is another dictionary with keys **type** and **coordinates**.
- The **coordinates** key has variable-length data.

Please feel free to look above where we printed the first and last few characters of the dataset to match it with the data type printed below.

In [19]:

```
ak_us_states.type
```

Out[19]:

In this section, we'll explain with examples how we can perform indexing through our **awkward array** just like we do with numpy arrays. We'll be using two major arrays we created earlier for example purposes.

In this example, we'll be performing indexing on our first array which we had created at the beginning of the tutorial.

Below we have accessed the 0th element of our awkward array using simple integer indexing.

In [20]:

```
ak_arr[0]
```

Out[20]:

Whenever we index an **awkward array** and the elements at the next level are dictionaries, we can give a dictionary key inside the square brackets to access the contents stored under that key.

Below we have first accessed the 0th element of the awkward array which will bring an array that will have 2 elements inside of it. Both elements are dictionaries with keys **x** and **y**. We have then provided the second dimension as **x** to index the dictionary. This will bring **x** values of both dictionaries.

In [21]:

```
ak_arr[0, "x"]
```

Out[21]:

**Awkward array** lets us access the fields of the dictionaries in the array by indexing the array directly with a dictionary key. The dictionaries need not be at the first level of the array; they can be 2-3 levels down as well.

Below we have accessed all **x** values of our array by simply calling awkward array with key **x**. This will return value inside of array with key **x** for all elements. It'll also follow the level structure. The dictionary with key **x** was inside of elements of the main array hence we can notice in output there are 2 brackets before elements of **x**.

In [22]:

```
ak_arr["x"]
```

Out[22]:

In the next cell, we have explained how we can access all **x** values with numpy like array indexing. This will give the same result as our previous cell.

In [23]:

```
x = ak_arr[:, :, "x"]
print(x)
```

In the below cell, we have explained how we can get the 0th element of the lists that are present in all **x** keys. We have treated the first two dimensions as a numpy array and given **:** to select all elements. All elements will be dictionaries so we have given string **'x'** to select all **x** values which are lists and then we gave index **0** to select 0th element from all **x** lists.

In [24]:

```
x = ak_arr[:, :, "x", 0]
print(x)
```

In the next cell, we have explained how we can get the 0th element from all lists inside of our main list.

In [25]:

```
x = ak_arr[:, :, 0]
print(x)
```

In the below cell, we have tried to retrieve the first two elements of each list present in the main array with the key **x**.

In [26]:

```
x = ak_arr[:, :, "x", 0:2]
print(x)
```

In the next cell, we have first taken the 0th element from the list which will be a list of two dictionaries, then we have selected the first dictionary from it and retrieved **'y'** key from it which is again a dictionary.

In [27]:

```
x = ak_arr[0, 0, "y"]
print(x)
```

In the next cell, we have taken the 0th element of our array which is a list of two dictionaries, then we have selected both dictionaries using **:** operator, and at last, we have retrieved element **'y'** from both dictionaries.

In [28]:

```
x = ak_arr[0, :, "y"]
print(x)
```

In the next cell, we have retrieved the **'z'** values stored under the **'y'** key for both dictionaries of the 0th element.

In [29]:

```
x = ak_arr[0, :, "y", "z"]
print(x)
```

In the next cell, we have first retrieved all **'z'** key values using the same indexing as our previous cell and then retrieved the element at index 1 (the second element) from each list of **'z'** values.

In [30]:

```
x = ak_arr[0, :, "y", "z", 1]
print(x)
```

In the next cell, we have selected all elements from the first dimension which will be all arrays inside of our main array, followed by all elements of those arrays which will be all dictionaries. We have then retrieved **'y'** key value for all dictionaries, followed by all **'z'** key values for all those **'y'** values because the values of **'y'** key is again a dictionary. Then we have retrieved the 0th value from all **'z'** key values.

In [31]:

```
x = ak_arr[:, :, "y", "z", 0]
print(x)
```

Our next cell retrieved the last element of all **'z'** key arrays. It uses almost the same indexing as our previous cell with only one difference at last.

In [32]:

```
x = ak_arr[:, :, "y", "z", -1]
print(x)
```

In the next cell, we have explained how we can access the fields of the dictionaries in our **awkward array** by treating the keys as attributes of the array object. We have retrieved all **'x'** values. This is equivalent to the code **ak_arr['x']**.

In [33]:

```
ak_arr.x
```

Out[33]:

In the next cell, we have retrieved all **'x'** values from the first element (index 0) of the array.

In [34]:

```
ak_arr.x[0]
```

Out[34]:

We can also retrieve **'y'** key values because **'y'** key is in the same dictionary and at the same level as **'x'** key.

In [35]:

```
ak_arr.y
```

Out[35]:

Below we have retrieved the value of the **'z'** key by treating it as an attribute.

In [36]:

```
ak_arr.y.z
```

Out[36]:

In the next cell, we have further created a few more simple examples of indexing.

In [37]:

```
ak_arr.y.z[0], ak_arr.y.z[1], ak_arr.y.z[2], ak_arr.y.z[3]
```

Out[37]:

In this section, we'll explain indexing on our **awkward array** which we created by loading contents from the **GeoJSON** file.

In the below cell, we have first retrieved all elements (using **':'** to select all values in the first dimension) of our **awkward array** loaded from **GeoJSON**, which is a list of dictionaries, and then taken the **'id'** key from all dictionaries. This returns an array of all state codes of the United States.

In [38]:

```
ak_us_states[:, "id"]
```

Out[38]:

In the below cell, we have first retrieved all elements of our **awkward array**, we have then retrieved **'properties'** key values for all of them. The values of **'properties'** key are again dictionaries hence we have retrieved **'name'** key value from them which will return us the name of US states.

In [39]:

```
x = ak_us_states[:, "properties", "name"]
print(x)
```

In the next cell, we have retrieved **'type'** key values from **'geometry'** key values for all dictionaries of our array. This will help us see the geometry type of all states. We can notice that the geometry type for Alabama is Polygon whereas the geometry type for Alaska is Multi-Polygon.

In [40]:

```
x = ak_us_states[:, "geometry", "type"]
print(x)
```

In the below cell, we have retrieved values of **'coordinates'** key which is inside of **'geometry'** dictionary for all dictionaries of our **awkward array**.

In [41]:

```
x = ak_us_states[:, "geometry", "coordinates"]
print(x)
```

In the below cell, we have retrieved the 0th value of all **'coordinates'** keys which are inside of **'geometry'** key for all dictionaries of our **awkward array**.

In [42]:

```
x = ak_us_states[:, "geometry", "coordinates", 0]
print(x)
```

In the below cell, we have taken the 0th element of our **awkward array** which is the dictionary. We have then retrieved **'coordinates'** key value inside of **'geometry'** key of that dictionary. Then we have taken the 0th element of that value. We know that values that are stored inside of **'coordinates'** key are multi-dimensional arrays storing Polygon or Multi-Polygon geometry data. For our first dictionary, the geometry is a Polygon of 3-dimensional shape. We have taken the 0th element from this 3-dimensional array which will be another 2-dimensional array as we can see in the output.

In [43]:

```
x = ak_us_states[0, "geometry", "coordinates", 0]
print(x)
```

In the below cell, we have retrieved the first value of the 3-dimensional array present inside of the value of **'coordinates'** key. We have given three times 0 inside of brackets to follow 3 dimensions.

In [44]:

```
x = ak_us_states[0, "geometry", "coordinates", 0, 0, 0]
print(x)
```

In the below cell, we have tried to retrieve the 0th element for all 3-dimensional or 4-dimensional arrays present inside of **'coordinates'** key. We have given first dimension indexing as **':'** to select all dictionaries from our array. We can notice that sometimes our first element is a single float and sometimes, it's an array of two floats. The reason behind this is that for Polygon shapes, the coordinates are represented using a 3-dimensional array and for Multi-Polygon shapes, the coordinates are represented using a 4-dimensional array.

In [45]:

```
x = ak_us_states[:, "geometry", "coordinates", 0, 0, 0]
print(x)
```

In the next two cells, we have explained how we can retrieve information by treating the dictionary key as an attribute of the **awkward array**.

In [46]:

```
ak_us_states.geometry.coordinates
```

Out[46]:

In [47]:

```
ak_us_states.geometry.coordinates[0][0][0]
```

Out[47]:

In this section, we'll introduce a few useful properties or attributes available for our **awkward array** object which can be useful when working with them.

We can retrieve the fields/keys of our first dictionary inside of our **awkward array** using **fields** attribute.

Below we have printed **fields** information for both arrays which we created earlier. We can notice that for the first array it only includes fields **'x'** and **'y'**, it does not include key **'z'** which is inside of key **'y'**. The same is the case with our second **awkward array** of GeoJSON data.

In [48]:

```
ak_arr.fields
```

Out[48]:

In [49]:

```
ak_us_states.fields
```

Out[49]:

The **nbytes** attribute returns the number of bytes required to store an array in memory.

In [50]:

```
ak_arr.nbytes
```

Out[50]:

In [51]:

```
ak_us_states.nbytes
```

Out[51]:

The **ndim** attribute returns the number of dimensions of our **awkward array**.

In [52]:

```
ak_arr.ndim
```

Out[52]:

In [53]:

```
ak_us_states.ndim
```

Out[53]:

The type attribute as we had discussed earlier returns the type of our **awkward array**.

In [54]:

```
ak_arr.type
```

Out[54]:

In [55]:

```
ak_us_states.type
```

Out[55]:

In the next cell, we have explained that we can treat keys of our first dictionary inside of **awkward array** as attributes of an array.

In [56]:

```
ak_arr.x
```

Out[56]:

In [57]:

```
ak_arr.y
```

Out[57]:

In [58]:

```
ak_us_states.geometry
```

Out[58]:

In [59]:

```
ak_us_states.id
```

Out[59]:

In this section, we'll explain commonly performed operations on arrays like filtering entries, flattening arrays, combining arrays, checking for nulls, counting non-zero elements, conditional operations, etc. We'll be using the arrays which we created earlier to explain various functions in this section.

**Awkward array** lets us filter our arrays based on conditions, just like we filter rows of a pandas data frame.

Below we have created a condition that takes all entries of our array and then takes **'x'** key for each entry. It then takes the 0th element of all values of key **'x'** and compares it with value 20. It returns True if the value is greater than or equal to 20.

In the next cell, we have filtered our main array based on the condition that we created below. We have also printed the result of filtering our main array.

In [60]:

```
ak_arr[:, "x", :, 0] >= 20
```

Out[60]:

In [61]:

```
x = ak_arr[ak_arr[:, "x", :, 0] >= 20]
print(x)
print(x["x"])
```

In the below cell, we have created another condition where we check for the last element of the value of key **'z'** of our **awkward array**. We then filter our main array based on the result of this condition.

In [62]:

```
ak_arr[:, "y", "z", :, -1] >= 20
```

Out[62]:

In [70]:

```
x = ak_arr[ak_arr[:, "y", "z", :, -1] >= 20]
print(x)
print(x["x"])
```

Below we have explained another example of filtering an **awkward array**. This time we have filtered our **GeoJSON** array. We have created a condition that checks for the Polygon geometry type of each entry of our array. We filter our main **awkward array** to keep only entries where the geometry type is Polygon. We have then also printed the count of entries which have Polygon geometry and the count of total elements of our array for comparison. We can notice from the results that there are 43 Polygon geometries from our total 50 geometries. We have also printed state IDs to compare which states have Polygon geometry and which have Multi-Polygon geometry.

In [89]:

```
x = ak_us_states[ak_us_states["geometry", "type"] == "Polygon"]
print("Number of States with Polygon Geometry : {}".format(len(x["id"])))
print("Number of States with Polygon and MultiPolygon Geometry : {}".format(len(ak_us_states["id"])))
print()
print(x["id"])
print(ak_us_states["id"])
```

In the below cell, we have divided entries of our **GeoJSON awkward array** into two categories based on a condition. The first condition checks for entries where the population is greater than **1M** and the second checks for entries where the population is less than or equal to **1M**. We have then printed the count of states in each category.

In [91]:

```
x = ak_us_states[ak_us_states["properties", "population"] > 1e6]
y = ak_us_states[ak_us_states["properties", "population"] <= 1e6]
print("Number of States with Population Greater Than 1M : {}".format(len(x["id"])))
print("Number of States with Population Less Than 1M : {}".format(len(y["id"])))
print()
print(x["properties", "population"])
print(y["properties", "population"])
```

In the below cell we have again filtered our main array to keep only entries where geometry is Polygon. We have then taken the longitudes and latitudes of all elements into separate arrays. GeoJSON stores each position as **[longitude, latitude]**, so index 0 holds the longitude and index 1 holds the latitude.

In [124]:

```
x = ak_us_states[ak_us_states["geometry", "type"] == "Polygon"]
# GeoJSON positions are ordered [longitude, latitude].
longitudes = x["geometry", "coordinates", :, :, :, 0]
latitudes = x["geometry", "coordinates", :, :, :, 1]
print("Polygon State Latitudes : {}".format(latitudes))
print("Polygon State Longitudes : {}".format(longitudes))
```

We can create a copy of any **awkward array** by passing it to the **copy()** function.

In [190]:

```
ak_arr2 = ak.copy(ak_arr)
ak_arr2
```

Out[190]:

We can count the elements in an array using the **count()** function, which counts the values that are not missing. To count only the non-zero entries, use the **count_nonzero()** function.

In [82]:

```
ak.count(ak_arr)
```

Out[82]:

In [81]:

```
ak.count(ak_arr["x"]), ak.count(ak_arr["y"]), ak.count(ak_arr["y"]["z"])
```

Out[81]:

In [88]:

```
ak.count(ak_us_states), ak.count(ak_us_states["properties"]["population"])
```

Out[88]:

In [77]:

```
ak.count_nonzero(ak_arr["x"])
```

Out[77]:

We noticed in many of our examples that whenever we retrieve arrays from our main array using indexing or conditions, it maintains the structure of the main array the majority of the time. This can create arrays with multiple levels. We can flatten such arrays with **flatten()** function. We have explained below how we can use it.

In [91]:

```
x = ak.flatten(ak_arr["x"])
print(x)
x = ak.flatten(x)
print(x)
```

The **ravel()** function also flattens an array, but it removes all levels of nesting at once and returns a one-dimensional array, whereas **flatten()** removes only one level per call by default.

In [177]:

```
x = ak.ravel(ak_arr["x"])
print(x)
```

In [249]:

```
ak.ravel(ak_arr[:, "x", :, 0])
```

Out[249]:

We can concatenate more than one **awkward array** using **concatenate()** function. Below we have explained with simple examples how we can concatenate arrays.

In [63]:

```
print(ak_arr["x"])
print(ak_arr["y"])
print(ak_arr["y"]["z"])
```

In [179]:

```
x = ak.concatenate((ak_arr["x"], ak_arr["y"]))
print(x)
```

In [180]:

```
x = ak.concatenate((ak_arr["x"], ak_arr["y"]["z"]))
print(x)
```

We can retrieve the first element of each list in our **awkward array** using the **firsts()** function. An empty list gives **None**. Below we have retrieved the first elements from our awkward array. We have first printed the original array and then all the first elements for comparison.

In [207]:

```
x = ak.firsts(ak_arr)
print("All Elements : ")
for elem in ak_arr:
    print(elem)
print("\nFirst Elements")
for elem in x:
    print(elem)
```

We can check for missing values (**None**) in our **awkward array** using the **is_none()** function. The **axis** argument selects the nesting level at which the check is made.

In [208]:

```
ak.is_none(ak_arr)
```

Out[208]:

In [334]:

```
print(ak_arr["x"])
print(ak.is_none(ak_arr["x"], axis=1))
```

In [281]:

```
print(ak_arr["x"])
print(ak_arr["y", "z"])
```

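The **zip()** function works much like the python built-in: it combines the matching elements of several arrays into an array of tuples, or into an array of records when the arrays are given as a dictionary.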

In [289]:

```
print("First Array : ", ak_arr["x", 0,0])
print("Second Array : ", ak_arr["y", "z", 0,0])
print("Zipped Array : ", ak.zip((ak_arr["x", 0, 0], ak_arr["y", "z", 0, 0])))
```

We can also perform the conditional operation on the array using **where()** function just like we do with numpy array. Below we have created three arrays that are the same as our first array but elements of them are replaced with ones, zeros, and 100. We'll be using these arrays to explain **where()** function.

In [294]:

```
ak_arr_ones = ak.ones_like(ak_arr)
ak_arr_zeros = ak.zeros_like(ak_arr)
ak_arr_hundred = ak.full_like(ak_arr, 100)
```

Below we have created a simple condition which checks for each entry inside of key **'x'** of our array and returns True if the value is greater than or equal to 20 else returns False. We'll be using this condition inside of **where()** function. We have also printed the output of the condition to make comparison easy.

In [308]:

```
condition = ak_arr["x"] >= 20
for elem in condition:
    print(elem)
```

Below we have given the condition which we created in the previous cell as the first input to our **where()** function followed by **'x'** values of the original array and **'x'** values of zeros array.

We have then printed the elements of the resulting array. We can notice that entries greater than or equal to 20 were kept and all other entries were set to zero.

In [309]:

```
x = ak.where(condition, ak_arr["x"], ak_arr_zeros["x"])
for elem in x:
    print(elem)
```

Below we have created another condition which checks whether the **'z'** key values are greater than or equal to 10.

Then in the cell after that, we have replaced all entries in the array which are less than 10 with 100.

In [310]:

```
condition = ak_arr[:, "y", "z"] >= 10
for elem in condition:
    print(elem)
```

In [311]:

```
x = ak.where(condition, ak_arr[:,"y","z"], ak_arr_hundred[:,"y","z"])
for elem in x:
    print(elem)
```

In this section, we'll explain how we can perform simple statistics like mean, variance, standard deviation, addition, etc on our **awkward array** entries.

We can retrieve the minimum element of an **awkward array** just like in numpy using the **min()** function. We can also retrieve the minimum along a particular axis by providing the **axis** value.

Below we have first displayed the array, then its overall minimum, followed by the minimum of each sub-list (**axis=1**).

In [353]:

```
print("Original Array : ", ak_arr[:, "x",:, 0])
print("Minimum Element : ", ak.min(ak_arr[:, "x",:, 0]))
print("Minimum Elements (Axis=1) : ", ak.min(ak_arr[:, "x",:, 0], axis=1))
```

We can also retrieve an index of the minimum element just like numpy using **argmin()** function.

In [379]:

```
print("Original Array : ", ak_arr[:, "x",:, 0])
print("Minimum Element Index : ", ak.argmin(ak_arr[:, "x",:, 0]))
print("Minimum Elements Index (Axis=1) : ", ak.argmin(ak_arr[:, "x",:, 0], axis=1))
```

Just like minimum, we can retrieve maximum elements using **max()** method and index of maximum elements using **argmax()** method.

In [354]:

```
print("Original Array : ", ak_arr[:, "x",:, 0])
print("Maximum Element : ", ak.max(ak_arr[:, "x",:, 0]))
print("Maximum Elements (Axis=1) : ", ak.max(ak_arr[:, "x",:, 0], axis=1))
```

In [380]:

```
print("Original Array : ", ak_arr[:, "x",:, 0])
print("Maximum Element Index : ", ak.argmax(ak_arr[:, "x",:, 0]))
print("Maximum Elements Index (Axis=1) : ", ak.argmax(ak_arr[:, "x",:, 0], axis=1))
```

We can calculate the mean of an **awkward array** using **mean()** function.

In [368]:

```
print("Original Array : ", ak_arr[:, "x",:, 0])
print("Mean : ", ak.mean(ak_arr[:, "x",:, 0]))
print("Mean (Axis=1) : ", ak.mean(ak_arr[:, "x",:, 0], axis=1))
```

We can add elements of an array using **sum()** function. Just like other functions, we can perform addition at a particular axis as well.

In [369]:

```
print("Original Array : ", ak_arr[:, "x",:, 0])
print("Sum : ", ak.sum(ak_arr[:, "x",:, 0]))
print("Sum (Axis=1) : ", ak.sum(ak_arr[:, "x",:, 0], axis=1))
```

The **std()** function can be used to calculate standard deviation.

In [381]:

```
print("Original Array : ", ak_arr[:, "x",:, 0])
print("Standard Deviation : ", ak.std(ak_arr[:, "x",:, 0]))
print("Standard Deviation (Axis=1) : ", ak.std(ak_arr[:, "x",:, 0], axis=1))
```

The **var()** function can be used to calculate the variance of the array.

In [382]:

```
print("Original Array : ", ak_arr[:, "x",:, 0])
print("Variance : ", ak.var(ak_arr[:, "x",:, 0]))
print("Variance (Axis=1) : ", ak.var(ak_arr[:, "x",:, 0], axis=1))
```

We can sort elements of the array using the **sort()** function. By default it sorts the values within the innermost lists (the last axis). We can sort elements in descending order by setting the **ascending** argument to **False**.

In [387]:

```
print("Original Array : ", ak_arr[:, "x",:, 0])
print("Sorted Array Descending : ", ak.sort(ak_arr[:, "x",:, 0], ascending=False))
```

This ends our small tutorial explaining how we can create **awkward array** to work with tree-like data structures using numpy-like idioms. Please feel free to let us know your views in the comments section.

Sunny Solanki
