Updated On : Nov-11,2021 Time Investment : ~30 mins

xarray: Simple Guide to Labeled N-Dimensional Array (DataArray)

Xarray is a Python library that lets us create N-dimensional arrays just like NumPy, but it also lets us name each dimension of the array. Apart from dimension names, it lets us attach coordinate values (labels) to each dimension and record attributes (metadata) with the array. Any operation that we perform on a NumPy array using integer indexing can also be performed on an xarray array, and most of them can also be expressed using dimension names and coordinate labels instead of axis positions. This makes code written with xarray more intuitive, because we refer to dimensions by name rather than by integer position. The concepts of dimensions, coordinates, and attributes will become clearer as we work through the examples below.

Xarray provides two important data structures to store data.

  • DataArray - It's a data structure that is used to represent a labeled N-dimensional array.
  • Dataset - It's a dict-like container of DataArray objects, which is used to represent a multi-dimensional dataset holding several variables. The DataArray objects are aligned along shared dimensions.
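To make the difference concrete, below is a minimal sketch of a Dataset holding two DataArrays that share one dimension. The variable names (temperature, humidity, time) are hypothetical and only serve the illustration; indexing the Dataset by a variable name gives back a plain DataArray.

```python
import numpy as np
import xarray as xr

# Two hypothetical DataArrays that share the "time" dimension
temperature = xr.DataArray(np.array([21.5, 22.0, 19.8]), dims=["time"], name="temperature")
humidity = xr.DataArray(np.array([0.40, 0.42, 0.55]), dims=["time"], name="humidity")

# A Dataset behaves like a dict of DataArrays aligned on shared dimensions
ds = xr.Dataset({"temperature": temperature, "humidity": humidity})

print(dict(ds.sizes))                            # {'time': 3}
print(list(ds.data_vars))                        # ['temperature', 'humidity']
print(isinstance(ds["humidity"], xr.DataArray))  # True
```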

In this tutorial, we'll discuss only the DataArray data structure. We'll explain with simple examples how to create DataArrays, index them, perform normal array operations, and compute simple statistics. If you came here to learn about the Dataset data structure, please feel free to check our separate Dataset tutorial, where we have covered it in detail with examples.

Below we have highlighted important sections of the tutorial to give an overview of the material covered.

Important Sections of Tutorial

  1. DataArray Creation
    • Creation From Numpy Array
    • DataArray with Attributes
    • Creation From Pandas Series
    • Creation From DataFrame
  2. Indexing DataArray
    • Numpy Like Integer Indexing
    • Pandas Like Indexing using .loc Property
    • Integer Indexing using isel() Function
    • Indexing Based on Dimension Data using sel() Function
  3. Normal Array Operations
  4. Simple Statistics

We have imported all necessary libraries at the beginning of our tutorial.

import xarray as xr

print("Xarray Version : {}".format(xr.__version__))
Xarray Version : 0.20.1
import numpy as np

print("Numpy Version : {}".format(np.__version__))
Numpy Version : 1.20.3
import pandas as pd

print("Pandas Version : {}".format(pd.__version__))
Pandas Version : 1.3.4

1. DataArray Creation

In this section, we'll explain various ways of creating an xarray DataArray object using the methods available from xarray.

Creation From Numpy Array

The first and simplest way to create a DataArray is by using the DataArray() constructor available from xarray. We can provide a numpy array, python list, pandas series, or pandas dataframe to this constructor to create a DataArray object. Below we have highlighted the signature of the DataArray() constructor for reference purposes.


  • DataArray(data, coords=None, dims=None, name=None, attrs=None) - This constructor takes a numpy array, python list, pandas series, or pandas dataframe as input and creates an instance of DataArray. All other parameters are optional. As coords comes before dims in the positional order, it's safest to pass them as keyword arguments.
    • The dims parameter accepts a list of strings naming each dimension of the array. For a 1D array we provide a list with one name, for a 2D array a list with 2 names, for a 3D array a list with 3 names, and so on.
    • The coords parameter accepts a dictionary specifying the coordinate values (labels) of each dimension, which can then be used when indexing the array. Each key of the dictionary is a dimension name, and its value is a list with as many entries as that dimension's length. E.g. for a 2D array of shape 3x5, we can provide a dictionary with 2 keys where one maps to a list of 3 values and the other to a list of 5 values.
    • The attrs parameter accepts a dictionary of attributes (metadata) that we want to attach to the array to describe it.
    • The name parameter accepts a string specifying the name of the array.
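The sketch below passes every parameter listed above as a keyword argument and then shows the two consistency rules described for dims and coords being enforced. The names row, col, and example are hypothetical; both mismatches raise a ValueError.

```python
import numpy as np
import xarray as xr

data = np.arange(15).reshape(3, 5)

arr = xr.DataArray(
    data,
    dims=["row", "col"],                        # one name per axis: 2D -> 2 names
    coords={"row": [10, 20, 30],                # 3 labels for the length-3 axis
            "col": ["a", "b", "c", "d", "e"]},  # 5 labels for the length-5 axis
    attrs={"units": "counts"},
    name="example",
)
print(arr.dims, arr.name, arr.attrs)  # ('row', 'col') example {'units': 'counts'}

# A dims list whose length differs from the number of axes is rejected
try:
    xr.DataArray(data, dims=["row"])
except ValueError as err:
    print("ValueError:", err)

# So is a coordinate list whose length differs from its dimension's size
try:
    xr.DataArray(data, dims=["row", "col"], coords={"row": [10, 20]})
except ValueError as err:
    print("ValueError:", err)
```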

Below we have created our first xarray DataArray from a random numpy array of shape (5,). As it is a 1D array, we have given the dims parameter a single name: we have named the only dimension of our array index.

arr = xr.DataArray(data=np.random.rand(5), dims=["index"])

arr
<xarray.DataArray (index: 5)>
array([0.97636211, 0.76268531, 0.53293316, 0.38971404, 0.84243048])
Dimensions without coordinates: index
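If we leave out the dims parameter, xarray generates placeholder names dim_0, dim_1, and so on for each axis. A minimal sketch, with index and columns as the names we choose afterwards; rename() with a dictionary maps old dimension names to new ones.

```python
import numpy as np
import xarray as xr

# With no dims argument, xarray generates dim_0, dim_1, ... for each axis
unnamed = xr.DataArray(np.zeros((2, 3)))
print(unnamed.dims)   # ('dim_0', 'dim_1')

# rename() replaces the placeholder names with meaningful ones afterwards
named = unnamed.rename({"dim_0": "index", "dim_1": "columns"})
print(named.dims)     # ('index', 'columns')
```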

Below we have created another example where we create a 2D DataArray of shape 4x5 from a numpy array of random numbers. This time we have specified two dimension names as the array is 2D.

    arr = xr.DataArray(data=np.random.rand(4,5), dims=["index", "columns"])
    
    arr
    
    <xarray.DataArray (index: 4, columns: 5)>
    array([[0.48648517, 0.12542794, 0.4972441 , 0.69972002, 0.32564098],
           [0.94822908, 0.87763739, 0.20857022, 0.9199263 , 0.88037042],
           [0.62336462, 0.11829816, 0.27168636, 0.77116992, 0.77662334],
           [0.76880574, 0.53286298, 0.06375732, 0.38386554, 0.04482307]])
    Dimensions without coordinates: index, columns

We can access the underlying data of our array anytime using the data attribute of the DataArray object.

      arr.data
      
      array([[0.48648517, 0.12542794, 0.4972441 , 0.69972002, 0.32564098],
             [0.94822908, 0.87763739, 0.20857022, 0.9199263 , 0.88037042],
             [0.62336462, 0.11829816, 0.27168636, 0.77116992, 0.77662334],
             [0.76880574, 0.53286298, 0.06375732, 0.38386554, 0.04482307]])

Other array attributes like dtype, shape, size, ndim, and nbytes that are available for numpy arrays are also available for DataArray. The nbytes attribute returns the total number of bytes taken by the array, which is 160 (20*8) in this case (20 float64 elements of 8 bytes each). DataArray additionally provides sizes, which maps each dimension name to its length.

      arr.dtype
      
      dtype('float64')
      arr.nbytes
      
      160
      arr.ndim
      
      2
      arr.shape
      
      (4, 5)
      arr.size
      
      20
      arr.sizes
      
      Frozen({'index': 4, 'columns': 5})
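The nbytes arithmetic above and the name-based sizes lookup can be checked directly. A short sketch on a fresh random array of the same shape; get_axis_num() translates a dimension name into its integer axis position.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.random.rand(4, 5), dims=["index", "columns"])

# nbytes is simply the element count times the bytes per element
print(arr.size, arr.dtype.itemsize, arr.nbytes)   # 20 8 160

# sizes is keyed by dimension name, so we don't need to remember axis order
print(arr.sizes["columns"], arr.shape[arr.get_axis_num("columns")])   # 5 5
```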

Below we have created another example explaining how we can create a 3D DataArray.

      arr = xr.DataArray(data=np.random.rand(2,3,4), dims=["index", "columns", "items"])
      
      arr
      
      <xarray.DataArray (index: 2, columns: 3, items: 4)>
      array([[[0.47468654, 0.30231721, 0.65516318, 0.92652759],
              [0.4320954 , 0.97064867, 0.63535385, 0.6786689 ],
              [0.85087508, 0.40156857, 0.83255594, 0.67374223]],
      
             [[0.78386596, 0.63289745, 0.78499957, 0.62841028],
              [0.21529929, 0.03341366, 0.12401273, 0.79578469],
              [0.68887276, 0.63861678, 0.19319422, 0.83450311]]])
      Dimensions without coordinates: index, columns, items

In all our previous examples, we only specified the dimension names of the DataArray but did not specify coordinates for those dimensions. Now, we'll explain how we can include coordinates for the dimensions of an array.

Below we have created a 2D DataArray from a random numpy array. We have specified the coordinates of our array by providing a dictionary to the coords parameter. The array has two dimensions (index, columns): index is the first dimension of size 4 and columns is the second dimension of size 5. We have provided a simple python list of 4 integers for the index dimension and a list of 5 strings for the columns dimension. Apart from specifying coordinates, we have also given the array a name using the name parameter.

When we define an array with coordinate values like this, we can access sub-arrays and elements of the array by using these values for indexing. We'll explain how to use them for indexing in an upcoming section of the tutorial.

        arr1 = xr.DataArray(data=np.random.rand(4,5), dims=['index','columns'],
                            coords={"index": [0,1,2,3], "columns": list("ABCDE")},
                            name="Array1"
                           )
        
        arr1
        
        <xarray.DataArray 'Array1' (index: 4, columns: 5)>
        array([[0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ],
               [0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
               [0.93084546, 0.24244413, 0.82591196, 0.81989938, 0.68520558],
               [0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
        Coordinates:
          * index    (index) int64 0 1 2 3
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
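Once coordinates are attached, each one can be retrieved on its own. The sketch below recreates an array like arr1 (with random data) and shows three ways to reach its labels: indexing the array by a dimension name returns the coordinate as a 1D DataArray, coords lists all coordinates, and indexes exposes them as pandas Index objects, which is what label-based lookups rely on.

```python
import numpy as np
import xarray as xr

arr1 = xr.DataArray(np.random.rand(4, 5), dims=["index", "columns"],
                    coords={"index": [0, 1, 2, 3], "columns": list("ABCDE")},
                    name="Array1")

# Each coordinate is itself a 1D DataArray, reachable by its name
print(arr1["columns"].values)   # ['A' 'B' 'C' 'D' 'E']

# All coordinate names attached to the array
print(list(arr1.coords))

# The same labels as a pandas Index, used for label-based lookups
print(arr1.indexes["index"])
```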

Below we have created another DataArray from a random numpy array. This time we have specified the index dimension values as a list of strings, unlike the previous example where they were a list of integers.

We'll use these arrays in the indexing section to explain different ways of indexing with these coordinate values.

        arr2 = xr.DataArray(data=np.random.rand(4,5),
                            dims=['index','columns'],
                            coords={"index": ['0','1','2','3'], "columns": list("ABCDE")},
                            name="Array2"
                           )
        
        arr2
        
        <xarray.DataArray 'Array2' (index: 4, columns: 5)>
        array([[0.07511355, 0.60393655, 0.74898288, 0.25581543, 0.79114767],
               [0.73421379, 0.7067142 , 0.24650569, 0.2074986 , 0.41164924],
               [0.50616351, 0.64518492, 0.66608194, 0.16975831, 0.67817385],
               [0.36282616, 0.63435477, 0.56852942, 0.81083044, 0.46026918]])
        Coordinates:
          * index    (index) <U1 '0' '1' '2' '3'
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have created another DataArray of shape 4x5 whose data is a random numpy array. This time we have specified the index dimension values as a list of dates. We have used the pandas date_range() function to create 4 consecutive daily dates starting from 2021-01-01.

        arr3 = xr.DataArray(data=np.random.rand(4,5),
                            dims=['index','columns'],
                            coords={"index": pd.date_range(start="2021-01-01", freq="D", periods=4),
                                    "columns": list("ABCDE")},
                            name="Array3"
                           )
        
        arr3
        
        <xarray.DataArray 'Array3' (index: 4, columns: 5)>
        array([[0.39792208, 0.79787484, 0.94760726, 0.01103115, 0.34796905],
               [0.21345645, 0.89753226, 0.00395103, 0.66829528, 0.11539251],
               [0.94518946, 0.21601817, 0.05817   , 0.49979745, 0.89442209],
               [0.00257528, 0.57121823, 0.67385832, 0.87298376, 0.36179141]])
        Coordinates:
          * index    (index) datetime64[ns] 2021-01-01 2021-01-02 2021-01-03 2021-01-04
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
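A datetime coordinate like the one in arr3 carries more than labels: its date components are available through the .dt accessor. A minimal sketch, recreating an array like arr3 with random data; 2021-01-01 was a Friday, and dayofweek counts Monday as 0.

```python
import numpy as np
import pandas as pd
import xarray as xr

arr3 = xr.DataArray(np.random.rand(4, 5), dims=["index", "columns"],
                    coords={"index": pd.date_range(start="2021-01-01", freq="D", periods=4),
                            "columns": list("ABCDE")},
                    name="Array3")

# Datetime coordinates expose pandas-style components through .dt
print(arr3["index"].dt.day.values)        # [1 2 3 4]
print(arr3["index"].dt.dayofweek.values)  # [4 5 6 0]  (Fri, Sat, Sun, Mon)
```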

DataArray with Attributes

In this section, we have explained how we can create an array with attributes.

We have created a DataArray of shape 4x5 from a random numpy array. We have specified dimensions and coordinates as before. Apart from that, we have provided a dictionary to the attrs parameter describing our data. We can describe our data, dimensions, and coordinates in this dictionary.

        arr = xr.DataArray(
                            data=np.random.rand(4,5),
                            dims=['index','columns'],
                            coords={"index": ['0','1','2','3'], "columns": list("ABCDE")},
                            attrs={"index": "X-Dimension of Data",
                                   "columns": "Y-Dimension of Data",
                                   "info": "Pandas DataFrame",
                                   "long_name": "Random Data",
                                   "units": "Unknown"
                                  },
                            name="Array"
                          )
        
        arr
        
        <xarray.DataArray 'Array' (index: 4, columns: 5)>
        array([[0.38733228, 0.23109638, 0.66964265, 0.6708009 , 0.95829975],
               [0.7713564 , 0.1166787 , 0.6483082 , 0.75409353, 0.76900532],
               [0.54839661, 0.72950701, 0.65034097, 0.92334631, 0.70863973],
               [0.0655017 , 0.56941354, 0.59030199, 0.5371372 , 0.45977435]])
        Coordinates:
          * index    (index) <U1 '0' '1' '2' '3'
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
        Attributes:
            index:      X-Dimension of Data
            columns:    Y-Dimension of Data
            info:       Pandas DataFrame
            long_name:  Random Data
            units:      Unknown

We can access the attributes of our DataArray anytime using the attrs attribute, which is a regular python dictionary.

        arr.attrs
        
        {'index': 'X-Dimension of Data',
         'columns': 'Y-Dimension of Data',
         'info': 'Pandas DataFrame',
         'long_name': 'Random Data',
         'units': 'Unknown'}
        arr.attrs["index"]
        
        'X-Dimension of Data'
        arr.attrs["long_name"]
        
        'Random Data'
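Because attrs is an ordinary dictionary, metadata can be changed after creation. The sketch below edits one entry in place and then uses assign_attrs(), which returns a new DataArray with the extra attribute added; the source key is a hypothetical example.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.random.rand(4, 5), dims=["index", "columns"],
                   attrs={"long_name": "Random Data", "units": "Unknown"},
                   name="Array")

# attrs is a plain dict, so it can be edited in place
arr.attrs["units"] = "dimensionless"

# assign_attrs() returns a new DataArray carrying the extra metadata
tagged = arr.assign_attrs(source="np.random.rand")
print(tagged.attrs)
```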

Creation From Pandas Series

In this section, we have explained how we can create a DataArray from a pandas series.

Below we have first created a pandas series with index and data.

        ser = pd.Series([1,2,3,4], index=list("ABCD"),name="col")
        
        ser
        
        A    1
        B    2
        C    3
        D    4
        Name: col, dtype: int64

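We can create a DataArray by just giving the pandas series as input. The index values of the series become the coordinate values, and the series name ('col') becomes the array name. Since we did not provide a dimension name, xarray has named the dimension dim_0.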

        arr_ser = xr.DataArray(ser)
        
        arr_ser
        
        <xarray.DataArray 'col' (dim_0: 4)>
        array([1, 2, 3, 4])
        Coordinates:
          * dim_0    (dim_0) object 'A' 'B' 'C' 'D'
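Another option is the from_series() class method, which takes the dimension name from the index name, and to_series() converts the array back. A minimal sketch; the index name letter is a hypothetical choice.

```python
import pandas as pd
import xarray as xr

ser = pd.Series([1, 2, 3, 4], index=list("ABCD"), name="col")
ser.index.name = "letter"   # hypothetical dimension name

# from_series() uses the index name as the dimension name
arr_ser = xr.DataArray.from_series(ser)
print(arr_ser.dims, arr_ser.name)   # ('letter',) col

# to_series() converts back; the coordinate becomes the index again
back = arr_ser.to_series()
print(back)
```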

Creation From DataFrame

In this section, we have explained how we can create a DataArray from a pandas dataframe.

Below we have created a pandas dataframe with random data. We have also provided the dataframe's index values and column names.

        df = pd.DataFrame(np.random.rand(4,5), index=[0,1,2,3], columns=list("ABCDE"))
        
        df
        
                  A         B         C         D         E
        0  0.236578  0.285889  0.370095  0.357964  0.162042
        1  0.324387  0.495267  0.203329  0.352109  0.566172
        2  0.163010  0.381800  0.082297  0.831716  0.842050
        3  0.559487  0.871914  0.340260  0.459081  0.346937

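We can create a DataArray from a pandas dataframe directly. The index values and column names of the dataframe become the coordinate values of the first and second dimensions. As we did not provide dimension names, xarray has named them dim_0 and dim_1.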

        arr_df = xr.DataArray(df)
        
        arr_df
        
        <xarray.DataArray (dim_0: 4, dim_1: 5)>
        array([[0.23657771, 0.28588863, 0.37009544, 0.35796388, 0.16204199],
               [0.32438665, 0.49526733, 0.20332903, 0.35210868, 0.56617198],
               [0.16300996, 0.38179992, 0.08229747, 0.83171561, 0.8420505 ],
               [0.55948712, 0.87191389, 0.34025972, 0.45908091, 0.34693702]])
        Coordinates:
          * dim_0    (dim_0) int64 0 1 2 3
          * dim_1    (dim_1) object 'A' 'B' 'C' 'D' 'E'
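The generated dim_0/dim_1 names can be avoided by passing dims together with the dataframe, and to_pandas() turns a 2D DataArray back into a dataframe. A short sketch using a random dataframe like the one above.

```python
import numpy as np
import pandas as pd
import xarray as xr

df = pd.DataFrame(np.random.rand(4, 5), index=[0, 1, 2, 3], columns=list("ABCDE"))

# Index and columns still supply the coordinates; dims only sets the names
arr_df = xr.DataArray(df, dims=["index", "columns"])
print(arr_df.dims)   # ('index', 'columns')

# to_pandas() returns a DataFrame for a 2D DataArray
df_back = arr_df.to_pandas()
print(type(df_back).__name__, df_back.shape)   # DataFrame (4, 5)
```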

2. Indexing DataArray

In this section, we'll explain how we can perform indexing operations on an xarray DataArray. We can do normal numpy-style indexing using integers as well as indexing using the coordinate values that we specified when creating arrays. We'll perform indexing on the arrays that we created in the array creation section earlier.

Numpy Like Integer Indexing

In this section, we have performed normal numpy-like integer indexing on our xarray DataArray.

Below we have accessed the first row (position 0 along the first dimension) of the 2D array arr1 that we created earlier. Notice that the selected index value is kept as a scalar coordinate.

        arr1[0]
        
        <xarray.DataArray 'Array1' (columns: 5)>
        array([0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ])
        Coordinates:
            index    int64 0
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have accessed all elements of the first dimension and the 0th element of the second dimension. This is like accessing the first column of a 2D array.

        arr1[:, 0]
        
        <xarray.DataArray 'Array1' (index: 4)>
        array([0.57868507, 0.46849765, 0.93084546, 0.24271528])
        Coordinates:
          * index    (index) int64 0 1 2 3
            columns  <U1 'A'

Below we have accessed the 0th and 1st rows of our data.

        arr1[[0,1]]
        
        <xarray.DataArray 'Array1' (index: 2, columns: 5)>
        array([[0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ],
               [0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703]])
        Coordinates:
          * index    (index) int64 0 1
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have accessed the 0th and 1st columns of our 2D array.

        arr1[:,[0,1]]
        
        <xarray.DataArray 'Array1' (index: 4, columns: 2)>
        array([[0.57868507, 0.78605464],
               [0.46849765, 0.07263884],
               [0.93084546, 0.24244413],
               [0.24271528, 0.62774479]])
        Coordinates:
          * index    (index) int64 0 1 2 3
          * columns  (columns) <U1 'A' 'B'

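Below we have accessed a 2D sub-array of shape 2x2 from our original 4x5 array. Note that this differs from numpy: when given several lists, xarray applies each list to its own dimension independently (orthogonal or outer indexing), so we get rows 1-2 crossed with columns 0-1, whereas numpy would pair the lists element by element and return only 2 values.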

        arr1[[1,2],[0,1]]
        
        <xarray.DataArray 'Array1' (index: 2, columns: 2)>
        array([[0.46849765, 0.07263884],
               [0.93084546, 0.24244413]])
        Coordinates:
          * index    (index) int64 1 2
          * columns  (columns) <U1 'A' 'B'
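The difference between numpy's pairwise indexing and xarray's orthogonal indexing is easiest to see on integer data. The sketch below compares both and then shows how to get numpy-style pointwise selection in xarray: wrap each indexer in a DataArray sharing a new dimension (named points here, an arbitrary choice) and pass them to isel().

```python
import numpy as np
import xarray as xr

values = np.arange(20).reshape(4, 5)
arr = xr.DataArray(values, dims=["index", "columns"])

# NumPy pairs the lists element by element: picks (1, 0) and (2, 1)
print(values[[1, 2], [0, 1]])        # [ 5 11]

# xarray applies each list to its own dimension (outer indexing)
print(arr[[1, 2], [0, 1]].shape)     # (2, 2)

# Pointwise selection in xarray: indexers share a new "points" dimension
points = arr.isel(index=xr.DataArray([1, 2], dims="points"),
                  columns=xr.DataArray([0, 1], dims="points"))
print(points.values)                 # [ 5 11]
```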

Pandas Like Indexing using .loc Property

The xarray DataArray provides a loc property which we can use to index arrays just as we do with a pandas dataframe. The loc property lets us specify the coordinate values (labels) that we provided when we created the array. The coordinate values can be of any type (string, date, time, etc.), not only integers.

Below we have accessed the row whose index label is 0 in the DataArray arr1 that we created earlier. As the index labels of arr1 happen to be 0-3, this gives the same result as the integer indexing arr1[0].

        arr1.loc[0]
        
        <xarray.DataArray 'Array1' (columns: 5)>
        array([0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ])
        Coordinates:
            index    int64 0
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have accessed a sub-array using the loc property. It is the intersection of the row labelled 0 and the columns labelled A and B. We have used string values for indexing the second dimension this time.

        arr1.loc[0, ["A","B"]]
        
        <xarray.DataArray 'Array1' (columns: 2)>
        array([0.57868507, 0.78605464])
        Coordinates:
            index    int64 0
          * columns  (columns) <U1 'A' 'B'
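Two more loc behaviours are worth knowing. As in pandas, label slices include both endpoints, and loc also accepts a dictionary keyed by dimension name, so the order of dimensions doesn't matter. A sketch on an array shaped like arr1 but filled with integers so the results are predictable.

```python
import numpy as np
import xarray as xr

arr1 = xr.DataArray(np.arange(20).reshape(4, 5), dims=["index", "columns"],
                    coords={"index": [0, 1, 2, 3], "columns": list("ABCDE")})

# Label slices in .loc include BOTH endpoints, as in pandas
print(arr1.loc[1:2, "B":"D"].shape)                          # (2, 3)

# A dict keyed by dimension name avoids relying on axis order
print(arr1.loc[dict(columns=["A", "B"], index=0)].values)    # [0 1]
```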

Below we have accessed the row whose index label is '0' in the DataArray arr2 that we created earlier using the loc property. As the index coordinates of arr2 are strings, we have used a string value to access it.

        arr2.loc['0']
        
        <xarray.DataArray 'Array2' (columns: 5)>
        array([0.07511355, 0.60393655, 0.74898288, 0.25581543, 0.79114767])
        Coordinates:
            index    <U1 '0'
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have accessed another sub-array from arr2 using string values for all indices inside the loc property.

        arr2.loc['0', ["A","B","C"]]
        
        <xarray.DataArray 'Array2' (columns: 3)>
        array([0.07511355, 0.60393655, 0.74898288])
        Coordinates:
            index    <U1 '0'
          * columns  (columns) <U1 'A' 'B' 'C'

Below we have accessed a sub-array from arr3, whose first dimension coordinates are dates. We have specified the date value as a string.

        arr3.loc["2021-1-1", ['A','B']]
        
        <xarray.DataArray 'Array3' (columns: 2)>
        array([0.39792208, 0.79787484])
        Coordinates:
            index    datetime64[ns] 2021-01-01
          * columns  (columns) <U1 'A' 'B'

Below we have created another example where we access a sub-array from our array with the date dimension. This time we have specified a list of dates as strings to access the sub-array.

        arr3.loc[["2021-1-1","2021-1-3"], ['A','B']]
        
        <xarray.DataArray 'Array3' (index: 2, columns: 2)>
        array([[0.39792208, 0.79787484],
               [0.94518946, 0.21601817]])
        Coordinates:
          * index    (index) datetime64[ns] 2021-01-01 2021-01-03
          * columns  (columns) <U1 'A' 'B'

In this example, we have accessed a sub-array from our date dimension array by providing the date dimension coordinates as a collection of dates. We have created 3 dates using the date_range() function and used them to filter the first dimension values.

        three_days = pd.date_range(start="2021-1-1",periods=3)
        
        arr3.loc[three_days, ["A","B","C"]]
        
        <xarray.DataArray 'Array3' (index: 3, columns: 3)>
        array([[0.39792208, 0.79787484, 0.94760726],
               [0.21345645, 0.89753226, 0.00395103],
               [0.94518946, 0.21601817, 0.05817   ]])
        Coordinates:
          * index    (index) datetime64[ns] 2021-01-01 2021-01-02 2021-01-03
          * columns  (columns) <U1 'A' 'B' 'C'

        Integer Indexing using isel() Function

        The DataArray isel() method selects a sub-array by integer position along named dimensions. It works like NumPy integer indexing, except that each position is attached to a dimension name rather than to an axis number.

        We can pass dimension names and positions to isel() either as keyword arguments or as a single dictionary. The examples below show both forms.

        Below we retrieve position 0 of the 'index' dimension using keyword-argument syntax. Because a single integer (not a list) is given, the 'index' dimension is dropped from the result and kept only as a scalar coordinate.

        arr1.isel(index=0)
        
        <xarray.DataArray 'Array1' (columns: 5)>
        array([0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ])
        Coordinates:
            index    int64 0
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

        Below we repeat the previous example, this time passing the dimension name and its integer position as a dictionary. The result is identical.

        arr1.isel({'index':0})
        
        <xarray.DataArray 'Array1' (columns: 5)>
        array([0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ])
        Coordinates:
            index    int64 0
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

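isel() also accepts negative positions and Python slice objects. The sketch below uses a hypothetical demo array shaped like arr1 (values 0 to 19, made up for illustration) to show negative indexing, a positional slice (which, unlike a label slice, excludes its stop position), and that keyword order does not affect the result.

```python
import numpy as np
import xarray as xr

# Hypothetical array with the same dimension names and labels as arr1.
demo = xr.DataArray(
    np.arange(20).reshape(4, 5),
    dims=("index", "columns"),
    coords={"index": [0, 1, 2, 3], "columns": list("ABCDE")},
)

# Negative positions count from the end, as in NumPy.
last_row = demo.isel(index=-1)

# A positional slice excludes its stop position; unlisted dimensions stay whole.
first_two_cols = demo.isel(columns=slice(0, 2))

# Keyword order does not matter; the result keeps the array's own dimension order.
block = demo.isel(columns=[1, 3], index=slice(1, 3))

print(last_row.values, first_two_cols.shape, block.dims)
```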
        Below we retrieve a 2x4 sub-array with isel() by passing two positions for the 'index' dimension and four positions for the 'columns' dimension.

        arr1.isel(index=[0,1], columns=[0,1,2,3])
        
        <xarray.DataArray 'Array1' (index: 2, columns: 4)>
        array([[0.57868507, 0.78605464, 0.90389917, 0.85013705],
               [0.46849765, 0.07263884, 0.20157703, 0.99471873]])
        Coordinates:
          * index    (index) int64 0 1
          * columns  (columns) <U1 'A' 'B' 'C' 'D'

        Below we repeat the previous example, passing the positions as a dictionary.

        arr1.isel({'index':[0,1], 'columns':[0,1,2,3]})
        
        <xarray.DataArray 'Array1' (index: 2, columns: 4)>
        array([[0.57868507, 0.78605464, 0.90389917, 0.85013705],
               [0.46849765, 0.07263884, 0.20157703, 0.99471873]])
        Coordinates:
          * index    (index) int64 0 1
          * columns  (columns) <U1 'A' 'B' 'C' 'D'

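One difference from NumPy matters when lists are passed for several dimensions at once: xarray applies each list to its own dimension independently (orthogonal indexing), which is why the example above returns a full 2x4 block. The sketch below, on a hypothetical 4x5 demo array, contrasts this with NumPy's point-wise fancy indexing and shows point-wise selection in xarray, done by passing indexers that share a new dimension (named points here, an arbitrary choice).

```python
import numpy as np
import xarray as xr

demo = xr.DataArray(np.arange(20).reshape(4, 5), dims=("index", "columns"))

# xarray: each list indexes its own dimension -> a 2x2 block (orthogonal indexing).
block = demo.isel(index=[0, 1], columns=[0, 1])

# NumPy: the two lists are paired element by element -> 2 values.
pairs_np = demo.values[[0, 1], [0, 1]]

# xarray point-wise selection: indexers that share a new dimension ("points").
rows = xr.DataArray([0, 1], dims="points")
cols = xr.DataArray([0, 1], dims="points")
pairs_xr = demo.isel(index=rows, columns=cols)

print(block.shape, pairs_np, pairs_xr.dims)
```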
        Indexing Based on Dimension Data using sel() Function

        The DataArray sel() method works like isel(), but it selects by coordinate label (the actual values stored in each dimension's coordinate) instead of by integer position. As with isel(), we can pass the labels either as keyword arguments or as a dictionary.

        Below we retrieve a 3x5 sub-array from arr1 using sel(). The 'index' coordinate of arr1 holds integers, so we pass integer labels.

        arr1.sel(index=[0,1,2])
        
        <xarray.DataArray 'Array1' (index: 3, columns: 5)>
        array([[0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ],
               [0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
               [0.93084546, 0.24244413, 0.82591196, 0.81989938, 0.68520558]])
        Coordinates:
          * index    (index) int64 0 1 2
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

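When the requested label is not present exactly, sel() can still find a nearby one through its method argument. A minimal sketch, assuming a hypothetical 1-D array whose labels are 10, 20 and 30 (these inexact methods expect sorted labels):

```python
import xarray as xr

# Hypothetical 1-D array whose labels are 10, 20, 30 (sorted, as the methods require).
demo = xr.DataArray([1.0, 2.0, 3.0], dims="index", coords={"index": [10, 20, 30]})

exact = demo.sel(index=20)                      # the label must exist
near = demo.sel(index=24, method="nearest")     # 24 is closest to label 20
pad = demo.sel(index=[12, 29], method="ffill")  # last label <= each request: 10, 20

print(float(exact), float(near), pad.values)
```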
        Below we retrieve a 3x5 sub-array from arr2 using sel(). This time we pass the labels as a list of strings, because arr2 stores its 'index' coordinate as strings (dtype <U1), not integers.

        arr2.sel(index=['0','1','2'])
        
        <xarray.DataArray 'Array2' (index: 3, columns: 5)>
        array([[0.07511355, 0.60393655, 0.74898288, 0.25581543, 0.79114767],
               [0.73421379, 0.7067142 , 0.24650569, 0.2074986 , 0.41164924],
               [0.50616351, 0.64518492, 0.66608194, 0.16975831, 0.67817385]])
        Coordinates:
          * index    (index) <U1 '0' '1' '2'
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

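Because sel() matches labels exactly, the type of the label must match the coordinate's dtype. The sketch below, using a hypothetical array with string labels like arr2, shows that the string '1' is found while the integer 1 raises a KeyError.

```python
import numpy as np
import xarray as xr

# Hypothetical array whose "index" labels are the strings '0'..'3', like arr2.
demo = xr.DataArray(
    np.arange(8.0).reshape(4, 2),
    dims=("index", "columns"),
    coords={"index": ["0", "1", "2", "3"], "columns": ["A", "B"]},
)

row = demo.sel(index="1")  # the string label '1' exists

try:
    demo.sel(index=1)      # the integer 1 is not one of the labels
    int_label_found = True
except KeyError:
    int_label_found = False

print(row.values, int_label_found)
```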
        Below we retrieve a 3x3 sub-array from arr2 with sel(), passing lists of string labels for both dimensions.

        arr2.sel(index=['0','1','2'], columns=['A','C','E'])
        
        <xarray.DataArray 'Array2' (index: 3, columns: 3)>
        array([[0.07511355, 0.74898288, 0.79114767],
               [0.73421379, 0.24650569, 0.41164924],
               [0.50616351, 0.66608194, 0.67817385]])
        Coordinates:
          * index    (index) <U1 '0' '1' '2'
          * columns  (columns) <U1 'A' 'C' 'E'

        The next example applies sel() to arr3, whose 'index' dimension holds dates. We pass the dates as strings.

        arr3.sel(index=["2021-1-1","2021-1-2", "2021-1-3"], columns=['A','B'])
        
        <xarray.DataArray 'Array3' (index: 3, columns: 2)>
        array([[0.39792208, 0.79787484],
               [0.21345645, 0.89753226],
               [0.94518946, 0.21601817]])
        Coordinates:
          * index    (index) datetime64[ns] 2021-01-01 2021-01-02 2021-01-03
          * columns  (columns) <U1 'A' 'B'

        One more sel() example: we build a list of 3 consecutive dates with the pandas date_range() function and pass it to the 'index' dimension of arr3. For the 'columns' dimension, we pass a list of 3 strings.

        three_days = pd.date_range(start="2021-1-1",periods=3)
        
        arr3.sel(index=three_days, columns=['A','B', 'C'])
        
        <xarray.DataArray 'Array3' (index: 3, columns: 3)>
        array([[0.39792208, 0.79787484, 0.94760726],
               [0.21345645, 0.89753226, 0.00395103],
               [0.94518946, 0.21601817, 0.05817   ]])
        Coordinates:
          * index    (index) datetime64[ns] 2021-01-01 2021-01-02 2021-01-03
          * columns  (columns) <U1 'A' 'B' 'C'

        3. Normal Array Operations

        In this section, we cover common array operations such as transposing, arithmetic with scalars and with other arrays, finding the positions of extreme values, checking for null elements, conditional selection, dot products, and dropping labels, each with a simple example.

        Transpose

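        We can retrieve the transpose of an array through its T attribute or by calling its transpose() method. With no arguments, both reverse the order of the dimensions, and each coordinate stays attached to its dimension.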

        arr1_transpose = arr1.T # arr1.transpose() works same
        
        arr1_transpose
        
        <xarray.DataArray 'Array1' (columns: 5, index: 4)>
        array([[0.57868507, 0.46849765, 0.93084546, 0.24271528],
               [0.78605464, 0.07263884, 0.24244413, 0.62774479],
               [0.90389917, 0.20157703, 0.82591196, 0.66185214],
               [0.85013705, 0.99471873, 0.81989938, 0.41166893],
               [0.5950187 , 0.93488703, 0.68520558, 0.50476117]])
        Coordinates:
          * index    (index) int64 0 1 2 3
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

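transpose() can also take dimension names to set the new order explicitly, which is often clearer than reversing everything with .T. A sketch on a hypothetical 3-D array (the dimension names time, lat and lon are only illustrative); the ... placeholder stands for all remaining dimensions in their current order.

```python
import numpy as np
import xarray as xr

# Hypothetical 3-D array; the dimension names are only illustrative.
demo = xr.DataArray(np.zeros((2, 3, 4)), dims=("time", "lat", "lon"))

reversed_dims = demo.T.dims                       # same as demo.transpose().dims
reordered = demo.transpose("lat", "time", "lon")  # explicit new order
lon_first = demo.transpose("lon", ...)            # ... keeps the rest in order

print(reversed_dims, reordered.shape, lon_first.dims)
```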
        Arithmetic with a scalar (multiplication, addition, subtraction, and so on) is applied element-wise, and the dimensions and coordinates are preserved.

        arr1 * 10
        
        <xarray.DataArray 'Array1' (index: 4, columns: 5)>
        array([[5.78685073, 7.8605464 , 9.03899168, 8.50137048, 5.95018697],
               [4.68497649, 0.72638835, 2.01577033, 9.94718734, 9.34887027],
               [9.30845463, 2.42444133, 8.25911956, 8.19899378, 6.85205578],
               [2.42715276, 6.27744792, 6.61852136, 4.11668931, 5.04761172]])
        Coordinates:
          * index    (index) int64 0 1 2 3
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

        Arithmetic between two DataArrays pairs elements by dimension name and coordinate label, not by position. By default, xarray aligns the operands with an inner join on their coordinates, so only labels present in both arrays appear in the result; if no labels overlap, the result is empty rather than an error.

        Below we first add arr1 to itself, which trivially has matching coordinates.

        arr1 + arr1
        
        <xarray.DataArray 'Array1' (index: 4, columns: 5)>
        array([[1.15737015, 1.57210928, 1.80779834, 1.7002741 , 1.19003739],
               [0.9369953 , 0.14527767, 0.40315407, 1.98943747, 1.86977405],
               [1.86169093, 0.48488827, 1.65182391, 1.63979876, 1.37041116],
               [0.48543055, 1.25548958, 1.32370427, 0.82333786, 1.00952234]])
        Coordinates:
          * index    (index) int64 0 1 2 3
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
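
        Below we add arr and arr2. Both use the string labels '0' to '3' along 'index' and 'A' to 'E' along 'columns', so every element has a partner. The result carries no name because the two operands have different names ('Array' and 'Array2').

        arr + arr2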
        
        <xarray.DataArray (index: 4, columns: 5)>
        array([[0.46244583, 0.83503292, 1.41862553, 0.92661634, 1.74944742],
               [1.50557018, 0.8233929 , 0.89481389, 0.96159213, 1.18065456],
               [1.05456012, 1.37469192, 1.31642291, 1.09310462, 1.38681358],
               [0.42832786, 1.20376832, 1.1588314 , 1.34796764, 0.92004354]])
        Coordinates:
          * index    (index) <U1 '0' '1' '2' '3'
          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

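To see label-based alignment at work, the sketch below adds two small hypothetical 1-D arrays whose 'index' labels only partly overlap, and then an array whose labels are strings. With xarray's default inner join, only shared labels appear in the result. Integer labels never equal string labels, so the second sum is empty; adding arr1 (integer labels) to arr2 (string labels) would likewise give an empty array.

```python
import xarray as xr

a = xr.DataArray([1, 2, 3], dims="index", coords={"index": [0, 1, 2]})
b = xr.DataArray([10, 20, 30], dims="index", coords={"index": [1, 2, 3]})

# Elements are paired by label; only labels 1 and 2 occur in both arrays.
total = a + b

# Integer labels never equal string labels, so no element finds a partner.
c = xr.DataArray([1, 2, 3], dims="index", coords={"index": ["0", "1", "2"]})
empty = a + c

print(total.values, empty.sizes["index"])
```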
        argmax()

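        The argmax() method returns the position of the maximum element. Called without a dimension, it looks at the array flattened row by row, so the result is a single flat position.

        Below we retrieve the flat position of the maximum element of arr1. Position 8 corresponds to row 1, column 3 of the 4x5 array.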

        max_index = arr1.argmax()
        
        max_index
        
        <xarray.DataArray 'Array1' ()>
        array(8)

          The item() method returns the single element of a one-element array as a plain Python scalar, so max_index.item() gives the integer 8.

          item() also accepts a flat position, so below we pass that position back to arr1.item() to retrieve the maximum element itself.

          arr1.item(max_index.item())
          
          0.9947187341846935

          For arrays with more than one dimension, item() also accepts a tuple with one position per dimension.

          arr1.item((0,0))
          
          0.5786850732755588

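To go the other way, from a flat position such as the one argmax() returned to one position per dimension, NumPy's unravel_index() can be used. A minimal sketch on a hypothetical 2x3 array; it calls np.argmax() on the underlying NumPy values to get the flat position.

```python
import numpy as np
import xarray as xr

demo = xr.DataArray([[3, 9, 1], [7, 2, 8]], dims=("index", "columns"))

flat = int(np.argmax(demo.values))              # flat (row-major) position: 1
row, col = np.unravel_index(flat, demo.shape)   # one position per dimension: (0, 1)
max_value = demo.isel(index=row, columns=col).item()

print(flat, (int(row), int(col)), max_value)
```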
          As mentioned earlier, most operations available on a NumPy array are also available on an xarray DataArray. The key difference is that a DataArray lets us choose the dimension to operate along by name (dim=...) as well as by axis number (axis=...), whereas NumPy only accepts axis numbers.

          Below we find, for each column, the position of the maximum value along the 'index' dimension. The result has one entry per 'columns' label.

          max_indices = arr1.argmax(dim='index', skipna=True)
          
          max_indices
          
          <xarray.DataArray 'Array1' (columns: 5)>
          array([2, 0, 0, 1, 1])
          Coordinates:
            * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

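argmax() can also search over several dimensions at once. When dim is given a list, it returns a dictionary mapping each dimension name to the position of the overall maximum, and that dictionary can be passed straight to isel(). A sketch on a hypothetical 2x3 array, assuming a reasonably recent xarray release that supports this form:

```python
import xarray as xr

demo = xr.DataArray([[3, 9, 1], [7, 2, 8]], dims=("index", "columns"))

# With a list of dimensions, argmax() returns {dimension name: position}.
where_max = demo.argmax(dim=["index", "columns"])

# The dictionary can be passed directly to isel() to fetch the maximum.
max_value = demo.isel(where_max).item()

print({name: int(pos) for name, pos in where_max.items()}, max_value)
```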
          idxmax()

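          The idxmax() method is similar to argmax(), but it returns the coordinate label of the maximum rather than its integer position. For arr1 the 'index' labels are 0 to 3, so labels and positions coincide here. The values come back as floats, which is consistent with idxmax() reserving NaN (its default fill value) for slices that are entirely NaN. Note also that the result is named after the coordinate ('index').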

          max_indices = arr1.idxmax(dim='index',skipna=True)
          
          max_indices
          
          <xarray.DataArray 'index' (columns: 5)>
          array([2., 0., 0., 1., 1.])
          Coordinates:
            * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

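The difference between positions and labels is easier to see when the labels are not 0, 1, 2 and so on. In this hypothetical 1-D example, the maximum sits at position 1, which carries the label 20.

```python
import xarray as xr

# Hypothetical 1-D array whose labels differ from its positions.
demo = xr.DataArray([0.2, 0.9, 0.5], dims="index", coords={"index": [10, 20, 30]})

position = demo.argmax(dim="index")  # position of the maximum: 1
label = demo.idxmax(dim="index")     # label at that position: 20 (may display as 20.0)

print(int(position), float(label))
```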
          argmin()

          The argmin() method returns the position of the minimum value.

          Below we find, for each row, the position of the minimum value along the 'columns' dimension.

          The companion idxmin() method returns the coordinate label of the minimum instead, in the same way that idxmax() relates to argmax().

          min_indices = arr1.argmin(dim='columns')
          
          min_indices
          
          <xarray.DataArray 'Array1' (index: 4)>
          array([0, 1, 1, 0])
          Coordinates:
            * index    (index) int64 0 1 2 3

          isnull()

          The isnull() method detects missing values (NaN or None). It returns a boolean array of the same shape as the original, with True where a value is missing. arr1 has no missing values, so every entry below is False.

          arr1.isnull()
          
          <xarray.DataArray 'Array1' (index: 4, columns: 5)>
          array([[False, False, False, False, False],
                 [False, False, False, False, False],
                 [False, False, False, False, False],
                 [False, False, False, False, False]])
          Coordinates:
            * index    (index) int64 0 1 2 3
            * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

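To see isnull() actually flag something, the sketch below uses a hypothetical 2x2 array with two NaN entries. It also shows the complementary notnull(), counting missing values per row with sum(), and replacing them with fillna().

```python
import numpy as np
import xarray as xr

# Hypothetical 2x2 array with two missing values.
demo = xr.DataArray([[1.0, np.nan], [np.nan, 4.0]], dims=("index", "columns"))

mask = demo.isnull()                          # True where a value is missing
missing_per_row = mask.sum(dim="columns")     # booleans sum as 0/1
present = int(demo.notnull().sum())           # number of non-missing values
filled = demo.fillna(0.0)                     # missing values replaced by 0.0

print(mask.values, missing_per_row.values, present, filled.values)
```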
          where()

          The where() method performs conditional selection. Its first argument is a condition: wherever the condition is True, the original value is kept, and wherever it is False, the value is taken from the second argument (other). If other is omitted, NaN is used.

          Below we print two of our earlier arrays for reference, since we will use them to test where().

          arr, arr2
          
          (<xarray.DataArray 'Array' (index: 4, columns: 5)>
           array([[0.38733228, 0.23109638, 0.66964265, 0.6708009 , 0.95829975],
                  [0.7713564 , 0.1166787 , 0.6483082 , 0.75409353, 0.76900532],
                  [0.54839661, 0.72950701, 0.65034097, 0.92334631, 0.70863973],
                  [0.0655017 , 0.56941354, 0.59030199, 0.5371372 , 0.45977435]])
           Coordinates:
             * index    (index) <U1 '0' '1' '2' '3'
             * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
           Attributes:
               index:      X-Dimension of Data
               columns:    Y-Dimension of Data
               info:       Pandas DataFrame
               long_name:  Random Data
               units:      Unknown,
           <xarray.DataArray 'Array2' (index: 4, columns: 5)>
           array([[0.07511355, 0.60393655, 0.74898288, 0.25581543, 0.79114767],
                  [0.73421379, 0.7067142 , 0.24650569, 0.2074986 , 0.41164924],
                  [0.50616351, 0.64518492, 0.66608194, 0.16975831, 0.67817385],
                  [0.36282616, 0.63435477, 0.56852942, 0.81083044, 0.46026918]])
           Coordinates:
             * index    (index) <U1 '0' '1' '2' '3'
             * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E')

          Below we call where() on arr with the condition arr > 0.5: wherever a value of arr is greater than 0.5 it is kept; otherwise the value at the same label in arr2 is used. Both arrays share the same coordinates, so every element has a counterpart.

          arr.where(arr > 0.5, arr2)
          
          <xarray.DataArray 'Array' (index: 4, columns: 5)>
          array([[0.07511355, 0.60393655, 0.66964265, 0.6708009 , 0.95829975],
                 [0.7713564 , 0.7067142 , 0.6483082 , 0.75409353, 0.76900532],
                 [0.54839661, 0.72950701, 0.65034097, 0.92334631, 0.70863973],
                 [0.36282616, 0.56941354, 0.59030199, 0.5371372 , 0.46026918]])
          Coordinates:
            * index    (index) <U1 '0' '1' '2' '3'
            * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
          Attributes:
              index:      X-Dimension of Data
              columns:    Y-Dimension of Data
              info:       Pandas DataFrame
              long_name:  Random Data
              units:      Unknown

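          Below is another example, in which the condition comes from a different array (arr2 > 0.5). Because the replacement value is arr itself, every element comes from arr whether the condition is True or False, so the result is identical to arr. The example mainly shows that the condition does not have to be computed from the array being filtered.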

          arr.where(arr2 > 0.5, arr)
          
          <xarray.DataArray 'Array' (index: 4, columns: 5)>
          array([[0.38733228, 0.23109638, 0.66964265, 0.6708009 , 0.95829975],
                 [0.7713564 , 0.1166787 , 0.6483082 , 0.75409353, 0.76900532],
                 [0.54839661, 0.72950701, 0.65034097, 0.92334631, 0.70863973],
                 [0.0655017 , 0.56941354, 0.59030199, 0.5371372 , 0.45977435]])
          Coordinates:
            * index    (index) <U1 '0' '1' '2' '3'
            * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
          Attributes:
              index:      X-Dimension of Data
              columns:    Y-Dimension of Data
              info:       Pandas DataFrame
              long_name:  Random Data
              units:      Unknown

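Two variations on where() are handy in practice. Leaving out the second argument replaces the unselected values with NaN, and drop=True removes labels where the condition is False. The top-level function xr.where(cond, x, y) chooses between two values (or arrays) without starting from an existing array. A sketch on a hypothetical 1-D array:

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D array with string labels.
demo = xr.DataArray([0.2, 0.7, 0.4, 0.9], dims="index", coords={"index": list("abcd")})

masked = demo.where(demo > 0.5)                # other omitted -> NaN where False
trimmed = demo.where(demo > 0.5, drop=True)    # labels failing the condition removed
binary = xr.where(demo > 0.5, 1.0, 0.0)        # function form: choose between two values

print(masked.values, trimmed["index"].values, binary.values)
```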
          dot()

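          The xr.dot() function computes a dot product of DataArrays: it multiplies them element-wise, matching dimensions by name, and sums over the dimensions listed in dims. DataArray objects also provide a dot() method.

          Below we compute the dot product of arr and arr2 over the 'columns' dimension, which both arrays share. The 'index' dimension is kept, so we get one value per row.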

          xr.dot(arr, arr2, dims=["columns"])
          
          <xarray.DataArray (index: 4)>
          array([1.59997017, 1.28164447, 1.81875229, 1.36772713])
          Coordinates:
            * index    (index) <U1 '0' '1' '2' '3'

          Below we compute the dot product over the 'index' dimension instead, which leaves one value per column.

          xr.dot(arr, arr2, dims=["index"])
          
          <xarray.DataArray (columns: 5)>
          array([0.89677849, 1.05390316, 1.43014696, 0.92034748, 1.76691796])
          Coordinates:
            * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

          Below we compute the dot product without specifying any dimension. xr.dot() then sums over all dimensions the arrays have in common, so the result is a scalar.

          xr.dot(arr,arr2)
          
          <xarray.DataArray ()>
          array(6.06809405)

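Conceptually, a dot product over a named dimension is an element-wise product followed by a sum over that dimension. The sketch below, on two hypothetical 2x3 arrays, checks this by hand and confirms that xr.dot() without dimension names sums over every dimension the arrays share.

```python
import numpy as np
import xarray as xr

# Two hypothetical 2x3 arrays with the same dimension names.
a = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("index", "columns"))
b = xr.DataArray(np.ones((2, 3)), dims=("index", "columns"))

# Dot product over "columns" done by hand: multiply element-wise, then sum that dimension.
row_sums = (a * b).sum(dim="columns")

# With no dimension given, xr.dot() sums over every shared dimension -> a 0-d result.
total = xr.dot(a, b)

print(row_sums.values, float(total))
```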
            drop()

            The drop() method removes the entries of a dimension that carry the given coordinate labels. It takes two arguments: labels, a list of coordinate labels, and dim, the name of the dimension to drop them from. In recent xarray releases, dropping labels with drop() is discouraged in favour of drop_sel(), which takes the dimension name as a keyword (for example, arr1.drop_sel(index=[0, 1])).

            Below we drop the entries of the 'index' dimension whose coordinate labels are 0 and 1.

            arr1.drop(labels=[0,1], dim="index")
            
            <xarray.DataArray 'Array1' (index: 2, columns: 5)>
            array([[0.93084546, 0.24244413, 0.82591196, 0.81989938, 0.68520558],
                   [0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
            Coordinates:
              * index    (index) int64 2 3
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

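Since drop_sel() is the encouraged spelling for label-based removal, here is a minimal sketch of the same operation with it, on a hypothetical 4x2 array. drop_sel() takes dimension names as keywords, so labels can be dropped from several dimensions in one call.

```python
import numpy as np
import xarray as xr

# Hypothetical 4x2 array with integer "index" labels, like arr1.
demo = xr.DataArray(
    np.arange(8).reshape(4, 2),
    dims=("index", "columns"),
    coords={"index": [0, 1, 2, 3], "columns": ["A", "B"]},
)

# Same effect as demo.drop(labels=[0, 1], dim="index").
kept_rows = demo.drop_sel(index=[0, 1])

# Labels can be dropped from several dimensions in one call.
kept_block = demo.drop_sel(index=[3], columns=["B"])

print(kept_rows["index"].values, kept_block.shape)
```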
            The next example uses drop() on arr2, whose coordinate labels are strings rather than integers, so the labels to drop must also be given as strings.

            arr2.drop(labels=['0','1'], dim="index")
            
            <xarray.DataArray 'Array2' (index: 2, columns: 5)>
            array([[0.50616351, 0.64518492, 0.66608194, 0.16975831, 0.67817385],
                   [0.36282616, 0.63435477, 0.56852942, 0.81083044, 0.46026918]])
            Coordinates:
              * index    (index) <U1 '2' '3'
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

            In the next example, we use drop() along the 'columns' dimension to remove the entries labelled 'D' and 'E'.

            arr1.drop(labels=["D","E"], dim="columns")
            
            <xarray.DataArray 'Array1' (index: 4, columns: 3)>
            array([[0.57868507, 0.78605464, 0.90389917],
                   [0.46849765, 0.07263884, 0.20157703],
                   [0.93084546, 0.24244413, 0.82591196],
                   [0.24271528, 0.62774479, 0.66185214]])
            Coordinates:
              * index    (index) int64 0 1 2 3
              * columns  (columns) <U1 'A' 'B' 'C'
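Both drops above can also be written with drop_sel(), which newer xarray releases recommend over label-based drop(). Below is a minimal sketch on a small hypothetical array `demo`, shaped like arr1 but holding the values 0 to 19 instead of random data. It uses the keyword-argument form of drop_sel().

```python
import numpy as np
import xarray as xr

# Hypothetical 4x5 array shaped like arr1 (integer 'index' labels,
# single-letter 'columns' labels); the values are just 0..19.
demo = xr.DataArray(
    np.arange(20.0).reshape(4, 5),
    dims=("index", "columns"),
    coords={"index": [0, 1, 2, 3], "columns": list("ABCDE")},
    name="Demo",
)

# Equivalent to demo.drop(labels=[0, 1], dim="index").
rows_dropped = demo.drop_sel(index=[0, 1])

# Equivalent to demo.drop(labels=["D", "E"], dim="columns").
cols_dropped = demo.drop_sel(columns=["D", "E"])

print(rows_dropped["index"].values)    # [2 3]
print(cols_dropped["columns"].values)  # ['A' 'B' 'C']
```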

            drop_isel()

            The drop_isel() method is the positional counterpart of drop(). Instead of coordinate labels, which can be of any data type, it takes integer positions along a dimension.

            Like isel(), drop_isel() accepts the positions either as a dictionary that maps dimension names to positions or as keyword arguments (e.g. arr1.drop_isel(index=0)).

            Below we have dropped the entry at position 0 along the 'index' dimension. Because arr1's 'index' labels are 0 to 3, position 0 is also label 0.

            arr1.drop_isel({"index":0})
            
            <xarray.DataArray 'Array1' (index: 3, columns: 5)>
            array([[0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
                   [0.93084546, 0.24244413, 0.82591196, 0.81989938, 0.68520558],
                   [0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
            Coordinates:
              * index    (index) int64 1 2 3
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

            Below we have dropped the entries at positions 0 and 1 along the 'index' dimension.

            arr1.drop_isel({"index":[0,1]})
            
            <xarray.DataArray 'Array1' (index: 2, columns: 5)>
            array([[0.93084546, 0.24244413, 0.82591196, 0.81989938, 0.68520558],
                   [0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
            Coordinates:
              * index    (index) int64 2 3
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

            The next example uses drop_isel() to drop entries along both dimensions at once: positions 0 and 1 along 'index', and positions 2, 3 and 4 (columns 'C', 'D' and 'E') along 'columns'.

            arr1.drop_isel({"index":[0,1], "columns": [2,3,4]})
            
            <xarray.DataArray 'Array1' (index: 2, columns: 2)>
            array([[0.93084546, 0.24244413],
                   [0.24271528, 0.62774479]])
            Coordinates:
              * index    (index) int64 2 3
              * columns  (columns) <U1 'A' 'B'
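Because drop_isel() works with positions, its result matches label-based dropping only when the labels happen to equal the positions, as they do for arr1. The sketch below uses a hypothetical array whose labels (10, 20, 30) differ from its positions. It also compares the keyword and dictionary forms of drop_isel() with the label-based drop_sel().

```python
import numpy as np
import xarray as xr

# Hypothetical 3x2 array whose 'index' labels are not 0, 1, 2.
demo = xr.DataArray(
    np.arange(6.0).reshape(3, 2),
    dims=("index", "columns"),
    coords={"index": [10, 20, 30], "columns": ["A", "B"]},
)

# Position 0 along 'index' is the row labelled 10.
by_position = demo.drop_isel(index=0)    # keyword form
by_dict = demo.drop_isel({"index": 0})   # dictionary form
by_label = demo.drop_sel(index=10)       # label-based equivalent

print(by_position["index"].values)  # [20 30]
print(by_position.equals(by_dict) and by_position.equals(by_label))  # True
```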

            drop_sel()

            The drop_sel() method works like drop_isel(), except that it takes coordinate labels instead of integer positions.

            Below we have dropped the entries labelled 0 and 1 along 'index' and the entries labelled 'C', 'D' and 'E' along 'columns'. The result matches the drop_isel() example above because arr1's 'index' labels coincide with their positions.

            arr1.drop_sel({"index":[0,1], "columns": ["C","D","E"]})
            
            <xarray.DataArray 'Array1' (index: 2, columns: 2)>
            array([[0.93084546, 0.24244413],
                   [0.24271528, 0.62774479]])
            Coordinates:
              * index    (index) int64 2 3
              * columns  (columns) <U1 'A' 'B'

            The next example applies drop_sel() across both dimensions of arr2, whose 'index' labels are strings.

            arr2.drop_sel({"index":['0','1'], "columns": ["C","D","E"]})
            
            <xarray.DataArray 'Array2' (index: 2, columns: 2)>
            array([[0.50616351, 0.64518492],
                   [0.36282616, 0.63435477]])
            Coordinates:
              * index    (index) <U1 '2' '3'
              * columns  (columns) <U1 'A' 'B'
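By default, drop_sel() raises an error if a requested label does not exist along the dimension. Passing errors="ignore" skips missing labels instead. Below is a small sketch on a hypothetical array, where 'Z' is a deliberately missing label.

```python
import numpy as np
import xarray as xr

demo = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("index", "columns"),
    coords={"index": ["0", "1"], "columns": ["A", "B", "C"]},
)

# 'Z' is not a column label; with errors="ignore" only 'C' is dropped.
trimmed = demo.drop_sel(columns=["C", "Z"], errors="ignore")
print(trimmed["columns"].values)  # ['A' 'B']

# With the default errors="raise", the missing label raises a KeyError.
try:
    demo.drop_sel(columns=["C", "Z"])
except KeyError as exc:
    print("KeyError:", exc)
```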

            copy()

            Calling the copy() method on a DataArray creates a copy of it. By default (deep=True), the data are copied into new memory, so modifying the copy does not affect the original array.

            arr1_copy = arr1.copy()
            
            arr1_copy
            
            <xarray.DataArray 'Array1' (index: 4, columns: 5)>
            array([[0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ],
                   [0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
                   [0.93084546, 0.24244413, 0.82591196, 0.81989938, 0.68520558],
                   [0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
            Coordinates:
              * index    (index) int64 0 1 2 3
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
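The difference between a deep and a shallow copy is easiest to see by writing to each. With deep=False, xarray creates a new DataArray object that shares its data with the original, so writes through the shallow copy also show up in the original. Below is a minimal sketch on a hypothetical 2x2 array of zeros.

```python
import numpy as np
import xarray as xr

original = xr.DataArray(np.zeros((2, 2)), dims=("index", "columns"))

deep = original.copy()               # deep=True is the default
shallow = original.copy(deep=False)  # new object, same underlying data

deep[0, 0] = 1.0     # only changes the deep copy
shallow[0, 1] = 2.0  # also changes `original`, since the data are shared

print(original.values)
# [[0. 2.]
#  [0. 0.]]
```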

            dropna(dim,how='any')

            The dropna() method drops the entries along a dimension that contain missing (NaN) values. It takes the dimension name as its first argument and the how argument as its second. The how argument sets the rule that decides which entries are dropped:

            • 'any' - The default. Drops an entry of the dimension if even a single one of its values is NaN.
            • 'all' - Drops an entry of the dimension only if all of its values are NaN.

            Below we have set two entries of arr1_copy, the copy created above, to NaN.

            arr1_copy[0,3] = np.nan
            
            arr1_copy[2,4] = np.nan
            
            arr1_copy
            
            <xarray.DataArray 'Array1' (index: 4, columns: 5)>
            array([[0.57868507, 0.78605464, 0.90389917,        nan, 0.5950187 ],
                   [0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
                   [0.93084546, 0.24244413, 0.82591196, 0.81989938,        nan],
                   [0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
            Coordinates:
              * index    (index) int64 0 1 2 3
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

            Below we have called dropna() along the 'index' dimension. With the default how='any', every row that contains at least one NaN is dropped, so rows 0 and 2 are removed.

            arr1_copy.dropna(dim="index")
            
            <xarray.DataArray 'Array1' (index: 2, columns: 5)>
            array([[0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
                   [0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
            Coordinates:
              * index    (index) int64 1 3
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

            Below we have called dropna() along the 'columns' dimension. Columns 'D' and 'E' are dropped because each contains one NaN.

            arr1_copy.dropna(dim="columns")
            
            <xarray.DataArray 'Array1' (index: 4, columns: 3)>
            array([[0.57868507, 0.78605464, 0.90389917],
                   [0.46849765, 0.07263884, 0.20157703],
                   [0.93084546, 0.24244413, 0.82591196],
                   [0.24271528, 0.62774479, 0.66185214]])
            Coordinates:
              * index    (index) int64 0 1 2 3
              * columns  (columns) <U1 'A' 'B' 'C'
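dropna() also accepts how="all" and a thresh argument, which keeps only the entries that have at least that many non-NaN values. Below is a sketch on a hypothetical 3x3 array in which row 1 is entirely NaN and row 0 has a single NaN.

```python
import numpy as np
import xarray as xr

demo = xr.DataArray(
    [[1.0, np.nan, 3.0],
     [np.nan, np.nan, np.nan],
     [4.0, 5.0, 6.0]],
    dims=("index", "columns"),
    coords={"index": [0, 1, 2], "columns": ["A", "B", "C"]},
)

any_nan = demo.dropna(dim="index")                 # how="any": drops rows 0 and 1
all_nan = demo.dropna(dim="index", how="all")      # drops only the all-NaN row 1
at_least_two = demo.dropna(dim="index", thresh=2)  # keeps rows with >= 2 non-NaN values

print(any_nan["index"].values)       # [2]
print(all_nan["index"].values)       # [0 2]
print(at_least_two["index"].values)  # [0 2]
```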

            fillna(value)

            We can use the fillna() method to fill NaN values in the array. Given a single value, it puts that value in place of every NaN and returns a new array.

            arr1_copy.fillna(value=9.99999)
            
            <xarray.DataArray 'Array1' (index: 4, columns: 5)>
            array([[0.57868507, 0.78605464, 0.90389917, 9.99999   , 0.5950187 ],
                   [0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
                   [0.93084546, 0.24244413, 0.82591196, 0.81989938, 9.99999   ],
                   [0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
            Coordinates:
              * index    (index) int64 0 1 2 3
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
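fillna() leaves the array it is called on unchanged. Besides a scalar, its value argument can also be a DataArray, which xarray aligns with the array before filling. A common pattern is to fill each column's gaps with that column's mean. Below is a sketch on a hypothetical 2x2 array.

```python
import numpy as np
import xarray as xr

demo = xr.DataArray(
    [[1.0, np.nan],
     [3.0, 4.0]],
    dims=("index", "columns"),
    coords={"index": [0, 1], "columns": ["A", "B"]},
)

# Column means along 'index'; NaNs are skipped by default for float data.
col_means = demo.mean(dim="index")  # A -> 2.0, B -> 4.0

# The 1-D column means are broadcast along 'index' when filling.
filled = demo.fillna(col_means)
print(filled.values)
# [[1. 4.]
#  [3. 4.]]
print(int(demo.isnull().sum()))  # 1: the original still has its NaN
```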

            drop_duplicates(dim)

            The drop_duplicates() method removes the entries along a dimension whose coordinate labels are duplicated, keeping the first occurrence of each label. It compares labels only, not the data values stored under them.

            Below we first create a copy of arr1. We then overwrite column 'C' (position 2) with the data of column 'A' (position 0), so the 1st and 3rd columns hold the same data, as the printed array shows.

            arr1_copy = arr1.copy()
            
            arr1_copy[:, 2] = arr1_copy[:, 0]
            
            arr1_copy
            
            <xarray.DataArray 'Array1' (index: 4, columns: 5)>
            array([[0.57868507, 0.78605464, 0.57868507, 0.85013705, 0.5950187 ],
                   [0.46849765, 0.07263884, 0.46849765, 0.99471873, 0.93488703],
                   [0.93084546, 0.24244413, 0.93084546, 0.81989938, 0.68520558],
                   [0.24271528, 0.62774479, 0.24271528, 0.41166893, 0.50476117]])
            Coordinates:
              * index    (index) int64 0 1 2 3
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
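
            Calling drop_duplicates() along 'columns' leaves the array unchanged. The labels 'A' to 'E' are all distinct and the method does not compare data values, so columns 'A' and 'C' both remain.

            arr1_copy.drop_duplicates(dim='columns')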
            
            <xarray.DataArray 'Array1' (index: 4, columns: 5)>
            array([[0.57868507, 0.78605464, 0.57868507, 0.85013705, 0.5950187 ],
                   [0.46849765, 0.07263884, 0.46849765, 0.99471873, 0.93488703],
                   [0.93084546, 0.24244413, 0.93084546, 0.81989938, 0.68520558],
                   [0.24271528, 0.62774479, 0.24271528, 0.41166893, 0.50476117]])
            Coordinates:
              * index    (index) int64 0 1 2 3
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
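The sketch below makes the label-versus-data distinction concrete. Part 1 applies drop_duplicates() to a hypothetical array with a repeated 'index' label. Part 2 removes columns whose data are identical, which is what the example above tried to do. It uses a swapped-in NumPy technique, np.unique() with axis and return_index, followed by isel(). That is an illustration, not an xarray method.

```python
import numpy as np
import xarray as xr

# 1) Duplicate *labels*: the label 0 appears twice along 'index'.
dup_labels = xr.DataArray(
    [10.0, 20.0, 30.0], dims="index", coords={"index": [0, 0, 1]}
)
unique_labels = dup_labels.drop_duplicates(dim="index")  # keeps the first 0
print(unique_labels.values)  # [10. 30.]

# 2) Duplicate *data*: column 'C' is a copy of column 'A'.
dup_data = xr.DataArray(
    [[1.0, 2.0, 1.0],
     [3.0, 4.0, 3.0]],
    dims=("index", "columns"),
    coords={"index": [0, 1], "columns": ["A", "B", "C"]},
)
# np.unique along the 'columns' axis finds the distinct columns;
# return_index gives the position of each one's first occurrence.
# Sorting those positions keeps the original column order.
axis = dup_data.get_axis_num("columns")
_, first_pos = np.unique(dup_data.values, axis=axis, return_index=True)
unique_cols = dup_data.isel(columns=np.sort(first_pos))
print(unique_cols["columns"].values)  # ['A' 'B']
```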

            clip(min,max)

            The clip() method restricts the values of an array to the range given by a minimum and a maximum. It takes the minimum as its first argument and the maximum as its second. Every value below the minimum is replaced by the minimum, and every value above the maximum is replaced by the maximum.

            Below we have restricted the values of arr1 to the range [0.3, 0.6] using clip().

            arr1.clip(min=0.3, max=0.6)
            
            <xarray.DataArray 'Array1' (index: 4, columns: 5)>
            array([[0.57868507, 0.6       , 0.6       , 0.6       , 0.5950187 ],
                   [0.46849765, 0.3       , 0.3       , 0.6       , 0.6       ],
                   [0.6       , 0.3       , 0.6       , 0.6       , 0.6       ],
                   [0.3       , 0.6       , 0.6       , 0.41166893, 0.50476117]])
            Coordinates:
              * index    (index) int64 0 1 2 3
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
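Either bound of clip() can be omitted, in which case that side is left unbounded. The coordinates are carried through unchanged. Below is a short sketch on a hypothetical 1-D array.

```python
import xarray as xr

demo = xr.DataArray([-1.0, 0.5, 2.0], dims="x", coords={"x": ["a", "b", "c"]})

both = demo.clip(min=0.0, max=1.0)  # values 0.0, 0.5, 1.0
lower_only = demo.clip(min=0.0)     # values 0.0, 0.5, 2.0: no upper bound

print(both.values, lower_only.values)
print(both["x"].values)  # ['a' 'b' 'c']
```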

            concat(objs, dim)

            We can combine arrays along a dimension using the xr.concat() function. It takes a sequence (list or tuple) of arrays as its first argument and a dimension name as its second, and it joins the arrays along that dimension.

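            Below we have combined arr and arr1 along the 'index' dimension. arr has string 'index' labels ('0' to '3') while arr1 has integer labels, so the combined 'index' coordinate has object dtype and mixes both types. The result keeps the name ('Array') and the attributes of the first array, arr.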

            xr.concat((arr,arr1), dim="index")
            
            <xarray.DataArray 'Array' (index: 8, columns: 5)>
            array([[0.38733228, 0.23109638, 0.66964265, 0.6708009 , 0.95829975],
                   [0.7713564 , 0.1166787 , 0.6483082 , 0.75409353, 0.76900532],
                   [0.54839661, 0.72950701, 0.65034097, 0.92334631, 0.70863973],
                   [0.0655017 , 0.56941354, 0.59030199, 0.5371372 , 0.45977435],
                   [0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ],
                   [0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
                   [0.93084546, 0.24244413, 0.82591196, 0.81989938, 0.68520558],
                   [0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
            Coordinates:
              * index    (index) object '0' '1' '2' '3' 0 1 2 3
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
            Attributes:
                index:      X-Dimension of Data
                columns:    Y-Dimension of Data
                info:       Pandas DataFrame
                long_name:  Random Data
                units:      Unknown

            Below we have combined arr and arr2 along the 'columns' dimension. Both arrays use the labels 'A' to 'E', so each column label appears twice in the result.

            xr.concat((arr,arr2), dim="columns")
            
            <xarray.DataArray 'Array' (index: 4, columns: 10)>
            array([[0.38733228, 0.23109638, 0.66964265, 0.6708009 , 0.95829975,
                    0.07511355, 0.60393655, 0.74898288, 0.25581543, 0.79114767],
                   [0.7713564 , 0.1166787 , 0.6483082 , 0.75409353, 0.76900532,
                    0.73421379, 0.7067142 , 0.24650569, 0.2074986 , 0.41164924],
                   [0.54839661, 0.72950701, 0.65034097, 0.92334631, 0.70863973,
                    0.50616351, 0.64518492, 0.66608194, 0.16975831, 0.67817385],
                   [0.0655017 , 0.56941354, 0.59030199, 0.5371372 , 0.45977435,
                    0.36282616, 0.63435477, 0.56852942, 0.81083044, 0.46026918]])
            Coordinates:
              * index    (index) <U1 '0' '1' '2' '3'
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E' 'A' 'B' 'C' 'D' 'E'
            Attributes:
                index:      X-Dimension of Data
                columns:    Y-Dimension of Data
                info:       Pandas DataFrame
                long_name:  Random Data
                units:      Unknown
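xr.concat() can also stack arrays along a dimension that none of them has yet. A new dimension name adds a new leading axis. Passing a named pandas Index as dim also supplies coordinate labels for that new dimension. Below is a sketch with two hypothetical 1-D arrays; the run labels "first" and "second" are made up for illustration.

```python
import pandas as pd
import xarray as xr

a = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([4.0, 5.0, 6.0], dims="x", coords={"x": [0, 1, 2]})

# A new dimension name creates a leading 'run' axis of length 2.
stacked = xr.concat([a, b], dim="run")
print(stacked.dims, stacked.shape)  # ('run', 'x') (2, 3)

# A named pandas Index also attaches labels to the new dimension.
labelled = xr.concat([a, b], dim=pd.Index(["first", "second"], name="run"))
print(labelled.sel(run="second").values)  # [4. 5. 6.]
```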

            round()

            The round() method rounds each float value of the array. By default, it rounds to 0 decimal places, and the result keeps the float dtype.

            arr1.round()
            
            <xarray.DataArray 'Array1' (index: 4, columns: 5)>
            array([[1., 1., 1., 1., 1.],
                   [0., 0., 0., 1., 1.],
                   [1., 0., 1., 1., 1.],
                   [0., 1., 1., 0., 1.]])
            Coordinates:
              * index    (index) int64 0 1 2 3
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
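round() also accepts a number of decimals as its first argument. It follows NumPy's rounding rules, so values exactly halfway between two integers are rounded to the nearest even number. Below is a small sketch with made-up values.

```python
import xarray as xr

demo = xr.DataArray([0.12345, 0.6789, 2.5, 3.5], dims="x")

two_places = demo.round(2)  # values 0.12, 0.68, 2.5, 3.5
nearest = demo.round()      # values 0.0, 1.0, 2.0, 4.0: 2.5 -> 2.0, 3.5 -> 4.0

print(two_places.values)
print(nearest.values)
```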

            4. Simple Statistics

            In this section, we'll explain how we can perform simple statistics like sum, mean, variance, standard deviation, cumulative sum, cumulative product, etc.

            sum(dim=None)

            The sum() method calculates the sum along one or more dimensions. If no dimension is given, it sums all elements of the array.

            Below we first calculate the sum of all elements of the array, and then, in the next cell, the sum along the 'index' dimension.

            arr1.sum()
            
            <xarray.DataArray 'Array1' ()>
            array(12.33916272)

            arr1.sum(dim="index")
            
            <xarray.DataArray 'Array1' (columns: 5)>
            array([2.22074346, 1.7288824 , 2.59324029, 3.07642409, 2.71987247])
            Coordinates:
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
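The dim argument of sum() also accepts a list of dimension names. For floating-point data, NaNs are skipped by default (skipna). The min_count argument sets how many valid values are needed before a number is returned instead of NaN. Below is a sketch on a hypothetical array whose column 'B' is entirely NaN.

```python
import numpy as np
import xarray as xr

demo = xr.DataArray(
    [[1.0, np.nan],
     [3.0, np.nan]],
    dims=("index", "columns"),
    coords={"columns": ["A", "B"]},
)

total = demo.sum(dim=["index", "columns"])     # same as demo.sum(): 4.0
per_col = demo.sum(dim="index")                # A -> 4.0, B -> 0.0 (NaNs skipped)
strict = demo.sum(dim="index", min_count=1)    # B -> nan: no valid values
no_skip = demo.sum(dim="index", skipna=False)  # A -> 4.0, B -> nan

print(float(total), per_col.values, strict.values, no_skip.values)
```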

            min(dim=None)

            The min() method returns the minimum values along the given dimensions.

            Below we first retrieve the minimum value of the whole array, and then, in the next cell, the minimum values along the 'columns' dimension.

            arr1.min()
            
            <xarray.DataArray 'Array1' ()>
            array(0.07263884)

            arr1.min(dim="columns")
            
            <xarray.DataArray 'Array1' (index: 4)>
            array([0.57868507, 0.07263884, 0.24244413, 0.24271528])
            Coordinates:
              * index    (index) int64 0 1 2 3
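Like sum(), min() skips NaNs by default for floating-point data. Passing skipna=False lets a NaN propagate into the result. Below is a sketch on a hypothetical 2x3 array.

```python
import numpy as np
import xarray as xr

demo = xr.DataArray(
    [[0.4, np.nan, 0.1],
     [0.7, 0.2, 0.9]],
    dims=("index", "columns"),
)

skipped = demo.min(dim="columns")                   # NaN ignored: 0.1, 0.2
propagated = demo.min(dim="columns", skipna=False)  # NaN wins in row 0: nan, 0.2

print(skipped.values, propagated.values)
```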

            max(dim=None)

            The max() method works exactly like min() but returns maximum values instead.

            arr1.max()
            
            <xarray.DataArray 'Array1' ()>
            array(0.99471873)

            arr1.max(dim="index")
            
            <xarray.DataArray 'Array1' (columns: 5)>
            array([0.93084546, 0.78605464, 0.90389917, 0.99471873, 0.93488703])
            Coordinates:
              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
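max() returns the maximum values themselves. To find where the maximum sits, DataArray.idxmax() returns the coordinate label of the maximum along a dimension; idxmin() is its counterpart for minima. Below is a sketch on a hypothetical array with labelled columns.

```python
import xarray as xr

demo = xr.DataArray(
    [[0.4, 0.9, 0.1],
     [0.7, 0.2, 0.3]],
    dims=("index", "columns"),
    coords={"index": [0, 1], "columns": ["A", "B", "C"]},
)

row_max = demo.max(dim="columns")        # values 0.9, 0.7
where_max = demo.idxmax(dim="columns")   # labels 'B', 'A'

print(row_max.values, where_max.values)
```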

            std(dim=None)

            The std() method calculates the standard deviation along the given dimensions of an array. Below we first compute it over the whole array and then along the 'columns' dimension.

            arr1.std()
            
            <xarray.DataArray 'Array1' ()>
            array(0.26712418)
                    arr1.std(dim="columns")
                    
                    <xarray.DataArray 'Array1' (index: 4)>
                    array([0.13275403, 0.37433162, 0.24211231, 0.15232193])
                    Coordinates:
                      * index    (index) int64 0 1 2 3
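One detail worth knowing is that std() computes the population standard deviation by default (ddof=0, dividing by n), just like NumPy; passing ddof=1 gives the sample standard deviation (dividing by n - 1). Below is a minimal sketch on a small hypothetical array (not the tutorial's arr1) that cross-checks both against NumPy.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x4 array for illustration only
data = np.array([[2.0, 4.0, 4.0, 6.0],
                 [1.0, 1.0, 1.0, 1.0]])
da = xr.DataArray(data, dims=("index", "columns"))

pop_std = da.std(dim="columns")              # ddof=0 by default (population std)
sample_std = da.std(dim="columns", ddof=1)   # sample std (divides by n - 1)

# Same results as NumPy along the matching axis
assert np.allclose(pop_std.values, data.std(axis=1))
assert np.allclose(sample_std.values, data.std(axis=1, ddof=1))
# Row 0 -> sqrt(8 / 4) = sqrt(2) with ddof=0; row 1 is constant -> 0
```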

                    var(dim=None)

                    The var() method calculates the variance of the array, either over all elements or along the given dimension(s).

                    arr1.var()
                    
                    <xarray.DataArray 'Array1' ()>
                    array(0.07135533)
                      arr1.var(dim="index")
                      
                      <xarray.DataArray 'Array1' (columns: 5)>
                      array([0.06170627, 0.0821856 , 0.0741555 , 0.04695209, 0.02573124])
                      Coordinates:
                        * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
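To make the definition concrete, below is a minimal sketch on a small hypothetical array (not the tutorial's arr1) that recomputes the variance by hand as the mean of squared deviations from the mean. Subtracting the per-column mean works because xarray broadcasts by dimension name; the sketch also checks that std() is the square root of var().

```python
import numpy as np
import xarray as xr

# Hypothetical 3x2 array for illustration only
da = xr.DataArray(np.array([[1.0, 10.0],
                            [2.0, 20.0],
                            [3.0, 60.0]]),
                  dims=("index", "columns"),
                  coords={"columns": ["A", "B"]})

var_by_col = da.var(dim="index")

# Variance by hand: mean of squared deviations from the column mean.
# da.mean(dim="index") has only the 'columns' dimension, so the
# subtraction broadcasts it across 'index' by name.
manual = ((da - da.mean(dim="index")) ** 2).mean(dim="index")

assert np.allclose(var_by_col.values, manual.values)
# The standard deviation is the square root of the variance
assert np.allclose(da.std(dim="index").values, np.sqrt(var_by_col.values))
```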

                      median(dim=None)

                      The median() method finds the median of the array, either over all elements or along the given dimension(s).

                      arr1.median()
                      
                      <xarray.DataArray 'Array1' ()>
                      array(0.64479846)
                        arr1.median(dim="index")
                        
                        <xarray.DataArray 'Array1' (columns: 5)>
                        array([0.52359136, 0.43509446, 0.74388205, 0.83501821, 0.64011214])
                        Coordinates:
                          * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
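Below is a minimal sketch on small hypothetical 1-D arrays (not the tutorial's arr1) showing two behaviors of median(): with an even number of values it averages the two middle values, and for float data it skips NaNs by default (skipna=True).

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D array with an even number of elements
da = xr.DataArray([7.0, 1.0, 3.0, 10.0], dims="index")

# Sorted values are 1, 3, 7, 10 -> median is the mean of 3 and 7
med = da.median()
assert med.item() == 5.0

# NaNs are skipped by default for float data ...
da_nan = xr.DataArray([7.0, np.nan, 3.0], dims="index")
assert da_nan.median().item() == 5.0
# ... but propagate when skipna=False
assert np.isnan(da_nan.median(skipna=False).item())
```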

                        count(dim=None)

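                        The count() method counts the number of non-missing (non-NaN) values of the array, either over all elements or along the given dimension(s). Our array has no missing values, so the counts below equal the number of elements.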

                        arr1.count()
                        
                        <xarray.DataArray 'Array1' ()>
                        array(20)
                          arr1.count(dim="index")
                          
                          <xarray.DataArray 'Array1' (columns: 5)>
                          array([4, 4, 4, 4, 4])
                          Coordinates:
                            * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
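The difference between count() and the total number of elements only shows up when data is missing. Below is a minimal sketch on a small hypothetical array with two NaNs (not the tutorial's arr1).

```python
import numpy as np
import xarray as xr

# Hypothetical 3x2 array with two missing values
da = xr.DataArray(np.array([[1.0, np.nan],
                            [2.0, 5.0],
                            [np.nan, 6.0]]),
                  dims=("index", "columns"),
                  coords={"columns": ["A", "B"]})

total = da.count()               # non-NaN values over the whole array
per_col = da.count(dim="index")  # non-NaN values per column

assert total.item() == 4
assert per_col.values.tolist() == [2, 2]
# da.size, in contrast, counts every element including NaNs
assert da.size == 6
```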

                          cumprod(dim=None)

                          The cumprod() method calculates the cumulative product along a dimension of the array.

                          Below we have first calculated the cumulative product along the 'index' dimension of the array, and then, in the next cell, along the 'columns' dimension.

                          arr1.cumprod(dim='index')
                          
                          <xarray.DataArray 'Array1' (index: 4, columns: 5)>
                          array([[0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ],
                                 [0.2711126 , 0.05709809, 0.18220531, 0.84564725, 0.55627526],
                                 [0.25236393, 0.0138431 , 0.15048555, 0.69334565, 0.38116291],
                                 [0.06125258, 0.00868993, 0.09959918, 0.28542886, 0.19239624]])
                          Coordinates:
                            * index    (index) int64 0 1 2 3
                            * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
                          arr1.cumprod(dim='columns')
                          
                          <xarray.DataArray 'Array1' (index: 4, columns: 5)>
                          array([[0.57868507, 0.45487809, 0.41116392, 0.34954568, 0.20798622],
                                 [0.46849765, 0.03403112, 0.00685989, 0.00682366, 0.00637935],
                                 [0.93084546, 0.22567802, 0.18639018, 0.15282119, 0.10471393],
                                 [0.24271528, 0.15236325, 0.10084194, 0.04151349, 0.0209544 ]])
                          Coordinates:
                            * index    (index) int64 0 1 2 3
                            * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
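Unlike the reductions above, cumulative operations keep the shape and dimension names of the array. Below is a minimal sketch on a small hypothetical array (not the tutorial's arr1) that checks cumprod() along each named dimension against NumPy's cumprod() along the matching axis.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x3 array for illustration only
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])
da = xr.DataArray(data, dims=("index", "columns"))

down_index = da.cumprod(dim="index")     # running product down each column
across_cols = da.cumprod(dim="columns")  # running product along each row

# Shape and dimension names are preserved
assert down_index.dims == ("index", "columns")
assert down_index.shape == data.shape
# 'index' is axis 0 and 'columns' is axis 1 of the underlying NumPy array
assert np.allclose(down_index.values, np.cumprod(data, axis=0))
assert np.allclose(across_cols.values, np.cumprod(data, axis=1))
```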

                          cumsum(dim=None)

                          The cumsum() method calculates the cumulative sum along a dimension of the array and works exactly like cumprod().

                          arr1.cumsum(dim='index')
                          
                          <xarray.DataArray 'Array1' (index: 4, columns: 5)>
                          array([[0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ],
                                 [1.04718272, 0.85869348, 1.1054762 , 1.84485578, 1.52990572],
                                 [1.97802818, 1.10113761, 1.93138816, 2.66475516, 2.2151113 ],
                                 [2.22074346, 1.7288824 , 2.59324029, 3.07642409, 2.71987247]])
                          Coordinates:
                            * index    (index) int64 0 1 2 3
                            * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
                          arr1.cumsum(dim='columns')
                          
                          <xarray.DataArray 'Array1' (index: 4, columns: 5)>
                          array([[0.57868507, 1.36473971, 2.26863888, 3.11877593, 3.71379463],
                                 [0.46849765, 0.54113648, 0.74271352, 1.73743225, 2.67231928],
                                 [0.93084546, 1.1732896 , 1.99920155, 2.81910093, 3.50430651],
                                 [0.24271528, 0.87046007, 1.5323122 , 1.94398113, 2.44874231]])
                          Coordinates:
                            * index    (index) int64 0 1 2 3
                            * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
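A quick sanity check for cumsum() is that the last entry along the summed dimension equals the plain sum() along that dimension. Below is a minimal sketch of that check on a small hypothetical array (not the tutorial's arr1).

```python
import numpy as np
import xarray as xr

# Hypothetical 3x2 array for illustration only
da = xr.DataArray(np.array([[1.0, 2.0],
                            [3.0, 4.0],
                            [5.0, 6.0]]),
                  dims=("index", "columns"),
                  coords={"index": [0, 1, 2], "columns": ["A", "B"]})

running = da.cumsum(dim="index")

# Each row holds the sum of all rows up to and including it
assert running.values.tolist() == [[1.0, 2.0], [4.0, 6.0], [9.0, 12.0]]

# The last entry along 'index' equals the plain sum along 'index'
last_row = running.isel(index=-1)
assert np.allclose(last_row.values, da.sum(dim="index").values)
```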

                          corr()

                          The xr.corr() function computes the Pearson correlation coefficient between two DataArray objects, either over all of their elements or along a given dimension.

                          Below we have first calculated the correlation between two arrays with the same dimensions over all of their elements. Then we have calculated the correlation along the 'index' and 'columns' dimensions respectively. In these two cases, xr.corr() takes the matching 1D slices of the two 2D arrays along the given dimension and returns one correlation coefficient per slice.

                          xr.corr(arr, arr2)
                          
                          <xarray.DataArray ()>
                          array(0.0211588)
                            xr.corr(arr, arr2, dim="index")
                            
                            <xarray.DataArray (columns: 5)>
                            array([ 0.62264745, -0.33522625,  0.16205577, -0.8278302 ,  0.64970906])
                            Coordinates:
                              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
                            xr.corr(arr, arr2, dim="columns")
                            
                            <xarray.DataArray (index: 4)>
                            array([ 0.43851828, -0.46300051, -0.67088344,  0.71856875])
                            Coordinates:
                              * index    (index) <U1 '0' '1' '2' '3'
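To see that xr.corr() computes the usual Pearson coefficient, below is a minimal sketch that cross-checks it against NumPy's np.corrcoef() on two hypothetical random arrays (not the tutorial's arr and arr2), both per column and over all elements.

```python
import numpy as np
import xarray as xr

# Two hypothetical random 4x5 arrays with the same dimensions
rng = np.random.default_rng(0)
a = xr.DataArray(rng.random((4, 5)), dims=("index", "columns"))
b = xr.DataArray(rng.random((4, 5)), dims=("index", "columns"))

# Correlation along 'index': one coefficient per column
per_col = xr.corr(a, b, dim="index")

# Cross-check column 0 against NumPy's Pearson correlation matrix
expected = np.corrcoef(a.values[:, 0], b.values[:, 0])[0, 1]
assert np.isclose(per_col.values[0], expected)

# With no dim, all elements are treated as one pair of long vectors
overall = xr.corr(a, b)
assert np.isclose(overall.item(),
                  np.corrcoef(a.values.ravel(), b.values.ravel())[0, 1])
```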

                            rolling()

                            The rolling() method lets us perform rolling (moving) window computations on xarray DataArray objects. It takes the dimension along which to roll and the window size as input; we can provide them either as a dictionary or as keyword arguments of the method. After creating the rolling object, we can calculate aggregates like mean, standard deviation, sum, variance, etc. over each window of data.

                            Below we have applied a rolling window of size 2 along the 'index' dimension of our array and then taken the average of each window. The first row is all NaN because, by default, a window produces a value only once it is full.

                            If you want to know how to perform moving window functions in pandas then please feel free to check our tutorial on the same where we cover the topic in detail.

                            rolling_mean = arr3.rolling({"index": 2}).mean()
                            
                            rolling_mean
                            
                            <xarray.DataArray 'Array3' (index: 4, columns: 5)>
                            array([[       nan,        nan,        nan,        nan,        nan],
                                   [0.30568927, 0.84770355, 0.47577914, 0.33966321, 0.23168078],
                                   [0.57932296, 0.55677522, 0.03106052, 0.58404636, 0.5049073 ],
                                   [0.47388237, 0.3936182 , 0.36601416, 0.6863906 , 0.62810675]])
                            Coordinates:
                              * index    (index) datetime64[ns] 2021-01-01 2021-01-02 2021-01-03 2021-01-04
                              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

                            Below we have recreated the same example as in our previous cell, but this time by passing the dimension name and window size as a keyword argument of the method.

                            rolling_mean = arr3.rolling(index=2).mean()
                            
                            rolling_mean
                            
                            <xarray.DataArray 'Array3' (index: 4, columns: 5)>
                            array([[       nan,        nan,        nan,        nan,        nan],
                                   [0.30568927, 0.84770355, 0.47577914, 0.33966321, 0.23168078],
                                   [0.57932296, 0.55677522, 0.03106052, 0.58404636, 0.5049073 ],
                                   [0.47388237, 0.3936182 , 0.36601416, 0.6863906 , 0.62810675]])
                            Coordinates:
                              * index    (index) datetime64[ns] 2021-01-01 2021-01-02 2021-01-03 2021-01-04
                              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

                            Below we have created another example where we apply a rolling window of size 3 along the 'columns' dimension of our array and then take the standard deviation of each window. The first two columns are NaN because their windows are not yet full.

                            rolling_std = arr3.rolling({"columns": 3}).std()
                            
                            rolling_std
                            
                            <xarray.DataArray 'Array3' (index: 4, columns: 5)>
                            array([[       nan,        nan, 0.23202869, 0.41078754, 0.38733677],
                                   [       nan,        nan, 0.38156689, 0.37894448, 0.29049268],
                                   [       nan,        nan, 0.38635193, 0.18272065, 0.34157819],
                                   [       nan,        nan, 0.29524203, 0.12527674, 0.21038439]])
                            Coordinates:
                              * index    (index) datetime64[ns] 2021-01-01 2021-01-02 2021-01-03 2021-01-04
                              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
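The leading NaNs in the rolling results come from the min_periods parameter of rolling(), which defaults to the window size. Below is a minimal sketch on a small hypothetical 1-D array (not the tutorial's arr3) showing that min_periods=1 lets partially filled windows produce a value.

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D array for illustration only
da = xr.DataArray([1.0, 2.0, 3.0, 4.0], dims="index")

# Default: a window must be full, so the first (window - 1) results are NaN
default_mean = da.rolling(index=2).mean()

# min_periods=1: the first window holds only one value and still yields a result
partial_mean = da.rolling(index=2, min_periods=1).mean()

assert np.isnan(default_mean.values[0])
assert np.allclose(default_mean.values[1:], [1.5, 2.5, 3.5])
assert np.allclose(partial_mean.values, [1.0, 1.5, 2.5, 3.5])
```

Choosing min_periods is a trade-off: smaller values avoid NaNs at the edge, but those edge values are computed from fewer samples than the rest.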

                            resample()

                            The resample() method is useful when a dimension holds datetime values and we want to resample it to a frequency different from the current one. Resampling can be of two types.

                            1. Up Sampling - We increase the sampling frequency, i.e. move to shorter time steps. E.g. - daily to 6 hourly.
                            2. Down Sampling - We decrease the sampling frequency, i.e. move to longer time steps. E.g. - daily to monthly.

                            The resample() method takes a dimension name and a new frequency as input to resample an xarray DataArray. We can provide the dimension and frequency either as a dictionary or as keyword arguments of the method.

                            If you are interested in learning about resampling using pandas then please feel free to check our tutorial where we discuss resampling in detail.

                            Down Sampling

                            Below we have taken one of our arrays, which has an 'index' dimension with daily datetime coordinates, and resampled it from daily frequency to 2-day frequency, i.e. we have downsampled it. The resample() call returns a DataArrayResample object that groups the entries into 2-day bins. We have then called mean() on it so that each new entry holds the average of the entries in its bin.

                            two_day_sampled = arr3.resample({"index": "2D"})

                            two_day_sampled

                            DataArrayResample, grouped over '__resample_dim__'
                            2 groups with labels 2021-01-01, 2021-01-03.
                            for dt, darray in two_day_sampled:
                                print(dt, darray.shape, darray.dims)

                            2021-01-01T00:00:00.000000000 (2, 5) ('index', 'columns')
                            2021-01-03T00:00:00.000000000 (2, 5) ('index', 'columns')

                            two_day_sampled_mean = two_day_sampled.mean()

                            two_day_sampled_mean

                            <xarray.DataArray 'Array3' (index: 2, columns: 5)>
                            array([[0.30568927, 0.84770355, 0.47577914, 0.33966321, 0.23168078],
                                   [0.47388237, 0.3936182 , 0.36601416, 0.6863906 , 0.62810675]])
                            Coordinates:
                              * index    (index) datetime64[ns] 2021-01-01 2021-01-03
                              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

                            Below we have recreated our previous example by providing the dimension name and frequency as a keyword argument.

                            two_day_sampled_mean = arr3.resample(index="2D").mean()

                            two_day_sampled_mean

                            <xarray.DataArray 'Array3' (index: 2, columns: 5)>
                            array([[0.30568927, 0.84770355, 0.47577914, 0.33966321, 0.23168078],
                                   [0.47388237, 0.3936182 , 0.36601416, 0.6863906 , 0.62810675]])
                            Coordinates:
                              * index    (index) datetime64[ns] 2021-01-01 2021-01-03
                              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

                            Up Sampling

                            In this section, we have upsampled our DataArray from daily frequency to 12 hourly frequency. Upsampling introduces new entries (here, at 12:00 on each day) for which we have no data, because our data has only one entry per day, not one every 12 hours. After aggregating with mean(), these new entries hold NaNs. We can fill the NaNs by calling xarray functions like ffill(), bfill(), fillna(), etc.

                            After upsampling, we have taken the average of the resampled entries. We have also displayed the 'index' dimension data for verification purposes.

                            twelve_hour_sampled_mean = arr3.resample({"index": "12H"}).mean()
                            
                            twelve_hour_sampled_mean
                            
                            <xarray.DataArray 'Array3' (index: 7, columns: 5)>
                            array([[0.39792208, 0.79787484, 0.94760726, 0.01103115, 0.34796905],
                                   [       nan,        nan,        nan,        nan,        nan],
                                   [0.21345645, 0.89753226, 0.00395103, 0.66829528, 0.11539251],
                                   [       nan,        nan,        nan,        nan,        nan],
                                   [0.94518946, 0.21601817, 0.05817   , 0.49979745, 0.89442209],
                                   [       nan,        nan,        nan,        nan,        nan],
                                   [0.00257528, 0.57121823, 0.67385832, 0.87298376, 0.36179141]])
                            Coordinates:
                              * index    (index) datetime64[ns] 2021-01-01 ... 2021-01-04
                              * columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
                            twelve_hour_sampled_mean["index"]
                            
                            <xarray.DataArray 'index' (index: 7)>
                            array(['2021-01-01T00:00:00.000000000', '2021-01-01T12:00:00.000000000',
                                   '2021-01-02T00:00:00.000000000', '2021-01-02T12:00:00.000000000',
                                   '2021-01-03T00:00:00.000000000', '2021-01-03T12:00:00.000000000',
                                   '2021-01-04T00:00:00.000000000'], dtype='datetime64[ns]')
                            Coordinates:
                              * index    (index) datetime64[ns] 2021-01-01 ... 2021-01-04
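As a rough sketch of how an output like the one above can arise, the snippet below builds a small DataArray with hypothetical daily data (random values, dimensions named index and columns, name 'Array3'). It upsamples the DataArray to 12-hour bins with resample() and then reads the 'index' coordinate back out by name. The data is made up, and the '12h' frequency string is an assumption: newer pandas versions prefer lowercase '12h', while older ones also accept '12H'. The tutorial's own twelve_hour_sampled_mean may have been built differently.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily data: 4 days x 5 columns of random values in [0, 1)
rng = np.random.default_rng(0)
daily = xr.DataArray(
    rng.random((4, 5)),
    dims=("index", "columns"),
    coords={
        "index": pd.date_range("2021-01-01", periods=4, freq="D"),
        "columns": list("ABCDE"),
    },
    name="Array3",
)

# Upsample to 12-hour bins; the 12:00 bins contain no samples, so their mean is NaN
twelve_hour_sampled_mean = daily.resample(index="12h").mean()

# Indexing with a coordinate name returns that coordinate as a 1-D DataArray
timestamps = twelve_hour_sampled_mean["index"]
print(timestamps.values)

# Flag the rows where every column is NaN (the empty 12-hour bins)
empty_rows = twelve_hour_sampled_mean.isnull().all(dim="columns")
print(timestamps[empty_rows.values].values)
```

Because the coordinate comes back as an ordinary DataArray, it can be combined with other array results, for example with a boolean mask to list only the bin labels that ended up empty.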

                            This ends our small tutorial explaining how to use xarray's DataArray data structure to hold and manipulate labeled data. Please feel free to share your views in the comments section.


                            Sunny Solanki

                            Comfortable Learning through Video Tutorials?

                            If you are more comfortable learning through video tutorials, then we recommend that you subscribe to our YouTube channel.

                            Stuck Somewhere? Need Help with Coding? Have Doubts About the Topic/Code?

                            When going through coding examples, it's quite common to have doubts and errors.

                            If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.

                            You can even send us an email if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.

                            Want to Share Your Views? Have Any Suggestions?

                            If you want to

                            • provide some suggestions on the topic
                            • share your views
                            • suggest details to include in the tutorial
                            • suggest some new topics on which we should create tutorials/blogs
                            Please feel free to contact us at coderzcolumn07@gmail.com. We appreciate and value your feedback.

