xarray: Simple Guide to Labeled N-Dimensional Array (DataArray)¶

Updated On: Nov-11, 2021

Xarray is a Python library that lets us create N-dimensional arrays just like numpy, but it also lets us name the dimensions of the array. Apart from naming dimensions, it lets us specify coordinate data for each dimension and record attributes alongside the array. All the operations that we perform on a numpy array using integer indexing can be performed on an xarray array as well, and they can also be performed using dimension names. Code written using xarray becomes more intuitive because we refer to dimensions by name instead of by integer position. The concepts of dimensions, coordinates, and attributes will become clearer when we explain arrays with examples below.
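
As a rough illustration of why dimension names can make code easier to read, the sketch below computes the same reduction once with a numpy axis number and once with an xarray dimension name. The dimension names "row" and "col" are made up for this example.

```python
import numpy as np
import xarray as xr

# Hypothetical 3x4 array; the dimension names "row" and "col" are assumptions.
values = np.arange(12.0).reshape(3, 4)
labeled = xr.DataArray(values, dims=["row", "col"])

# numpy: the reader has to remember that axis=1 runs across the columns.
numpy_row_means = values.mean(axis=1)

# xarray: the same reduction, addressed by dimension name.
xarray_row_means = labeled.mean(dim="col")

print(numpy_row_means)
print(xarray_row_means.values)
```

Both calls average over the second axis, so the results should agree; only the way the axis is referred to differs.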

Xarray provides two important data structures to store data.

• DataArray - It's a data structure that is used to represent a labeled N-dimensional array.
• Dataset - It's a dict-like container that holds multiple DataArray objects. The DataArray objects are aligned across shared dimensions.
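
To make the relationship between the two structures concrete, here is a minimal sketch in which two DataArray objects that share dimensions are placed in a Dataset and one of them is read back by key. The variable names and the dimension names "time" and "city" are hypothetical.

```python
import numpy as np
import xarray as xr

# Two hypothetical variables that share the same dimensions.
temperature = xr.DataArray(np.zeros((2, 3)), dims=["time", "city"])
humidity = xr.DataArray(np.ones((2, 3)), dims=["time", "city"])

# A Dataset behaves like a dictionary of DataArray objects.
ds = xr.Dataset({"temperature": temperature, "humidity": humidity})

# Looking up a key gives back a DataArray.
one_variable = ds["temperature"]
print(type(one_variable).__name__)
print(dict(ds.sizes))
```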

As a part of this tutorial, we'll be discussing only the DataArray data structure. We'll explain with simple examples how to create DataArrays and how to perform indexing, normal array operations, and simple statistics on them. If you have come to learn about the Dataset data structure, please feel free to check our separate tutorial where we have covered it in detail with examples.

Below we have highlighted important sections of the tutorial to give an overview of the material covered.

Important Sections of Tutorial¶

1. DataArray Creation
• Creation From Numpy Array
• DataArray with Attributes
• Creation From Pandas Series
• Creation From DataFrame
2. Indexing DataArray
• Numpy Like Integer Indexing
• Pandas Like Indexing using .loc Property
• Integer Indexing using isel() Function
• Indexing Based on Dimension Data using sel() Function
3. Normal Array Operations
4. Simple Statistics

We have imported all necessary libraries at the beginning of our tutorial.

In :
import xarray as xr

print("Xarray Version : {}".format(xr.__version__))
Xarray Version : 0.20.1
In :
import numpy as np

print("Numpy Version : {}".format(np.__version__))
Numpy Version : 1.20.3
In :
import pandas as pd

print("Pandas Version : {}".format(pd.__version__))
Pandas Version : 1.3.4

1. DataArray Creation ¶

In this section, we'll explain various ways of creating an xarray DataArray object. We'll explore the different inputs that xarray accepts to create arrays.

Creation From Numpy Array¶

The first and simplest way to create a DataArray is by using the DataArray() constructor available from xarray. We can provide a numpy array, a python list, a pandas series, or a pandas dataframe to this constructor to create a DataArray object. Below we have highlighted the signature of the DataArray() constructor for reference purposes.

• DataArray(data, coords=None, dims=None, name=None, attrs=None) - This constructor takes as input a numpy array, python list, pandas series, or pandas dataframe and creates an instance of DataArray. All other parameters are optional, and the examples in this tutorial pass them by keyword.
• The dims parameter accepts a list of strings that name each dimension of the array. For a 1D array we need to provide a list with one name, for a 2D array a list with 2 names, for a 3D array a list with 3 names, and so on.
• The coords parameter accepts a dictionary specifying the coordinate values of each dimension, which are used when indexing the array. Each key of the dictionary is the name of a dimension and each value is a list whose length equals the size of that dimension. E.g. for a 2D array of shape 3x5, we can provide a dictionary with 2 entries where one holds a list of 3 values and the other holds a list of 5 values.
• The attrs parameter accepts a dictionary of attributes that we want to attach to the array to describe it.
• The name parameter accepts a string specifying the name of the array.
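
The parameters described above can be combined in a single call, as in the following sketch. The labels, attribute values, and array name are invented for illustration, and the random generator is seeded only so that reruns give the same numbers. The second half shows the length rule for coords: a coordinate list whose length does not match the size of its dimension should be rejected with a ValueError.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(seed=0)  # seeded so that reruns give the same numbers

# All constructor parameters passed by keyword (values are illustrative).
arr = xr.DataArray(
    data=rng.random((3, 5)),
    dims=["index", "columns"],
    coords={"index": [10, 20, 30], "columns": list("ABCDE")},
    attrs={"units": "unknown"},
    name="example",
)
print(arr.dims, arr.shape, arr.name)

# A coordinate whose length disagrees with its dimension is rejected.
try:
    xr.DataArray(
        data=rng.random((3, 5)),
        dims=["index", "columns"],
        coords={"index": [10, 20]},  # 2 labels for a dimension of size 3
    )
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```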

Below we have created our first xarray DataArray using a random numpy array of shape (5,). As it is a 1D array, we have given the dims parameter a list with a single name. We have named the single dimension of our array 'index'.

In :
arr = xr.DataArray(data=np.random.rand(5), dims=["index"])

arr
Out:
<xarray.DataArray (index: 5)>
array([0.97636211, 0.76268531, 0.53293316, 0.38971404, 0.84243048])
Dimensions without coordinates: index

Below we have created another example where we have created a 2D DataArray of shape 4x5 using a numpy array of random numbers. We have specified two dimension names this time as we have a 2D array.

In :
arr = xr.DataArray(data=np.random.rand(4,5), dims=["index", "columns"])

arr

Out:
<xarray.DataArray (index: 4, columns: 5)>
array([[0.48648517, 0.12542794, 0.4972441 , 0.69972002, 0.32564098],
[0.94822908, 0.87763739, 0.20857022, 0.9199263 , 0.88037042],
[0.62336462, 0.11829816, 0.27168636, 0.77116992, 0.77662334],
[0.76880574, 0.53286298, 0.06375732, 0.38386554, 0.04482307]])
Dimensions without coordinates: index, columns

We can access the underlying data of our array anytime using the data attribute of the DataArray object.

In :
arr.data

Out:
array([[0.48648517, 0.12542794, 0.4972441 , 0.69972002, 0.32564098],
[0.94822908, 0.87763739, 0.20857022, 0.9199263 , 0.88037042],
[0.62336462, 0.11829816, 0.27168636, 0.77116992, 0.77662334],
[0.76880574, 0.53286298, 0.06375732, 0.38386554, 0.04482307]])

Other array attributes like dtype, shape, size, ndim, and nbytes which are available for a numpy array are also available for a DataArray. The nbytes attribute returns the total number of bytes taken by the array, which is 160 (20*8) in this case (20 float elements, each of size 8 bytes). The sizes attribute is specific to xarray and maps each dimension name to its length.

In :
arr.dtype

Out:
dtype('float64')
In :
arr.nbytes

Out:
160
In :
arr.ndim

Out:
2
In :
arr.shape

Out:
(4, 5)
In :
arr.size

Out:
20
In :
arr.sizes

Out:
Frozen({'index': 4, 'columns': 5})
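
The arithmetic behind these attributes can be checked directly. The sketch below uses an array of zeros in place of the random data, since only the shape and dtype matter here.

```python
import numpy as np
import xarray as xr

# Zeros stand in for the random data; only shape and dtype matter here.
arr = xr.DataArray(np.zeros((4, 5)), dims=["index", "columns"])

# size is the product of the shape; nbytes is size times bytes per element.
expected_size = int(np.prod(arr.shape))
expected_nbytes = expected_size * arr.dtype.itemsize

print(arr.size == expected_size)
print(arr.nbytes == expected_nbytes)
print(dict(arr.sizes))
```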

Below we have created another example explaining how we can create a 3D DataArray.

In :
arr = xr.DataArray(data=np.random.rand(2,3,4), dims=["index", "columns", "items"])

arr

Out:
<xarray.DataArray (index: 2, columns: 3, items: 4)>
array([[[0.47468654, 0.30231721, 0.65516318, 0.92652759],
[0.4320954 , 0.97064867, 0.63535385, 0.6786689 ],
[0.85087508, 0.40156857, 0.83255594, 0.67374223]],

[[0.78386596, 0.63289745, 0.78499957, 0.62841028],
[0.21529929, 0.03341366, 0.12401273, 0.79578469],
[0.68887276, 0.63861678, 0.19319422, 0.83450311]]])
Dimensions without coordinates: index, columns, items

In all our previous examples, we only specified the dimension names of the DataArray but we did not specify coordinates for those dimensions. Now, we'll explain how we can include coordinates for the dimensions of an array.

Below we have created a 2D DataArray using a random numpy array. We have specified the coordinates of our array by providing a dictionary to the coords parameter. We have defined two dimensions of data (index, columns). The index represents the first dimension of size 4 and columns represents the second dimension of size 5. We have provided a python list of 4 integers for the index dimension and a list of 5 strings for the columns dimension. Apart from specifying coordinates, we have also specified the name of the array using the name parameter.

When we define an array with coordinate values like this, we can access sub-arrays and elements of the array using these values for indexing. We'll be explaining how we can use these values to perform indexing in the upcoming section of the tutorial.

In :
arr1 = xr.DataArray(data=np.random.rand(4,5), dims=['index','columns'],
                    coords={"index": [0,1,2,3], "columns": list("ABCDE")},
                    name="Array1"
                   )

arr1

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ],
[0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
[0.93084546, 0.24244413, 0.82591196, 0.81989938, 0.68520558],
[0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have created another DataArray using a random numpy array. This time we have specified index dimension values as a list of strings, unlike our previous examples where values were a list of integers.

We'll be using these arrays during the indexing section to explain indexing in different ways using these coordinate values.

In :
arr2 = xr.DataArray(data=np.random.rand(4,5),
                    dims=['index','columns'],
                    coords={"index": ['0','1','2','3'], "columns": list("ABCDE")},
                    name="Array2"
                   )

arr2

Out:
<xarray.DataArray 'Array2' (index: 4, columns: 5)>
array([[0.07511355, 0.60393655, 0.74898288, 0.25581543, 0.79114767],
[0.73421379, 0.7067142 , 0.24650569, 0.2074986 , 0.41164924],
[0.50616351, 0.64518492, 0.66608194, 0.16975831, 0.67817385],
[0.36282616, 0.63435477, 0.56852942, 0.81083044, 0.46026918]])
Coordinates:
* index    (index) <U1 '0' '1' '2' '3'
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have created another DataArray of shape 4x5 whose data is a random numpy array. This time we have specified the index dimension values as a sequence of dates. We have used the pandas date_range() function to create 4 daily dates starting from 2021-01-01.

In :
arr3 = xr.DataArray(data=np.random.rand(4,5),
                    dims=['index','columns'],
                    coords={"index": pd.date_range(start="2021-01-01", freq="D", periods=4),
                            "columns": list("ABCDE")},
                    name="Array3"
                   )

arr3

Out:
<xarray.DataArray 'Array3' (index: 4, columns: 5)>
array([[0.39792208, 0.79787484, 0.94760726, 0.01103115, 0.34796905],
[0.21345645, 0.89753226, 0.00395103, 0.66829528, 0.11539251],
[0.94518946, 0.21601817, 0.05817   , 0.49979745, 0.89442209],
[0.00257528, 0.57121823, 0.67385832, 0.87298376, 0.36179141]])
Coordinates:
* index    (index) datetime64[ns] 2021-01-01 2021-01-02 2021-01-03 2021-01-04
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
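
Once coordinates are attached, they can be read back through the coords attribute. The following sketch rebuilds a smaller date-indexed array (zeros instead of random data, and only two columns, both chosen for brevity) and inspects its coordinates.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A smaller stand-in for the date-indexed array, filled with zeros.
dates = pd.date_range(start="2021-01-01", freq="D", periods=4)
arr = xr.DataArray(
    np.zeros((4, 2)),
    dims=["index", "columns"],
    coords={"index": dates, "columns": ["A", "B"]},
)

# Each coordinate is itself a small labeled array.
index_coord = arr.coords["index"]
print(index_coord.dtype)
print(index_coord.values[0])
print(arr.coords["columns"].values.tolist())
```

The date coordinate is stored with a numpy datetime64 dtype, which is what later allows dates to be selected using date strings.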

DataArray with Attributes¶

In this section, we have explained how we can create an array with attributes.

We have created a DataArray of shape 4x5 using a random numpy array. We have specified dimensions and coordinates as we have done so far. Apart from that, we have provided a dictionary to the attrs parameter describing our data. We can describe our data, dimensions, and coordinates in this dictionary.

In :
arr = xr.DataArray(
    data=np.random.rand(4,5),
    dims=['index','columns'],
    coords={"index": ['0','1','2','3'], "columns": list("ABCDE")},
    attrs={"index": "X-Dimension of Data",
           "columns": "Y-Dimension of Data",
           "info": "Pandas DataFrame",
           "long_name": "Random Data",
           "units": "Unknown"
          },
    name="Array"
)

arr

Out:
<xarray.DataArray 'Array' (index: 4, columns: 5)>
array([[0.38733228, 0.23109638, 0.66964265, 0.6708009 , 0.95829975],
[0.7713564 , 0.1166787 , 0.6483082 , 0.75409353, 0.76900532],
[0.54839661, 0.72950701, 0.65034097, 0.92334631, 0.70863973],
[0.0655017 , 0.56941354, 0.59030199, 0.5371372 , 0.45977435]])
Coordinates:
* index    (index) <U1 '0' '1' '2' '3'
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
Attributes:
index:      X-Dimension of Data
columns:    Y-Dimension of Data
info:       Pandas DataFrame
long_name:  Random Data
units:      Unknown

We can access attributes of our DataArray using attrs attribute anytime.

In :
arr.attrs

Out:
{'index': 'X-Dimension of Data',
'columns': 'Y-Dimension of Data',
'info': 'Pandas DataFrame',
'long_name': 'Random Data',
'units': 'Unknown'}
In :
arr.attrs["index"]

Out:
'X-Dimension of Data'
In :
arr.attrs["long_name"]

Out:
'Random Data'
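
Because attrs is an ordinary dictionary, attributes can also be changed or added after the array has been created. A minimal sketch, with made-up attribute values:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.zeros(3), dims=["index"], attrs={"units": "Unknown"})

# Overwrite an existing attribute and add a new one (values are illustrative).
arr.attrs["units"] = "metres"
arr.attrs.update({"source": "hypothetical sensor"})

print(arr.attrs)
```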

Creation From Pandas Series¶

In this section, we have explained how we can create a DataArray from a pandas series.

Below we have first created a pandas series with index and data.

In :
ser = pd.Series([1,2,3,4], index=list("ABCD"),name="col")

ser

Out:
A    1
B    2
C    3
D    4
Name: col, dtype: int64

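We can create a DataArray by just giving a pandas series as input. The coordinate values are taken from the index of the series and the array name is taken from the name of the series. Because the index of our series has no name, the dimension gets the default name dim_0.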

In :
arr_ser = xr.DataArray(ser)

arr_ser

Out:
<xarray.DataArray 'col' (dim_0: 4)>
array([1, 2, 3, 4])
Coordinates:
* dim_0    (dim_0) object 'A' 'B' 'C' 'D'
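
If a more descriptive dimension name than dim_0 is wanted, one option is to name the index of the series before the conversion. The sketch below assumes that xarray picks up the index name as the dimension name, which is how it treats named pandas indexes; the name "letters" is invented for this example.

```python
import pandas as pd
import xarray as xr

ser = pd.Series([1, 2, 3, 4], index=list("ABCD"), name="col")

# Hypothetical index name; it is expected to become the dimension name.
ser.index.name = "letters"

arr_named = xr.DataArray(ser)
print(arr_named.dims)
print(arr_named.name)
```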

Creation From DataFrame¶

In this section, we have explained how we can create DataArray from pandas dataframe.

Below we have created pandas dataframe with random data. We have also provided dataframe index values and column names.

In :
df = pd.DataFrame(np.random.rand(4,5), index=[0,1,2,3], columns=list("ABCDE"))

df

Out:
A B C D E
0 0.236578 0.285889 0.370095 0.357964 0.162042
1 0.324387 0.495267 0.203329 0.352109 0.566172
2 0.163010 0.381800 0.082297 0.831716 0.842050
3 0.559487 0.871914 0.340260 0.459081 0.346937

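We can create a DataArray from a pandas dataframe directly. The coordinate values are taken from the index and column labels of the dataframe. Because neither the index nor the columns of our dataframe have a name, the dimensions get the default names dim_0 and dim_1.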

In :
arr_df = xr.DataArray(df)

arr_df

Out:
<xarray.DataArray (dim_0: 4, dim_1: 5)>
array([[0.23657771, 0.28588863, 0.37009544, 0.35796388, 0.16204199],
[0.32438665, 0.49526733, 0.20332903, 0.35210868, 0.56617198],
[0.16300996, 0.38179992, 0.08229747, 0.83171561, 0.8420505 ],
[0.55948712, 0.87191389, 0.34025972, 0.45908091, 0.34693702]])
Coordinates:
* dim_0    (dim_0) int64 0 1 2 3
* dim_1    (dim_1) object 'A' 'B' 'C' 'D' 'E'
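
The default names dim_0 and dim_1 can be replaced after the conversion. The sketch below uses the rename() method with a dictionary mapping old names to new ones; the new names are our own choice, and zeros are used in place of random data.

```python
import numpy as np
import pandas as pd
import xarray as xr

df = pd.DataFrame(np.zeros((4, 5)), index=[0, 1, 2, 3], columns=list("ABCDE"))
arr_df = xr.DataArray(df)

# rename() returns a new array; the original keeps its default names.
renamed = arr_df.rename({"dim_0": "index", "dim_1": "columns"})

print(arr_df.dims)
print(renamed.dims)
```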

2. Indexing DataArray ¶

In this section, we'll explain how we can perform indexing operations on xarray DataArray. We can do normal numpy indexing using integers as well as indexing using coordinate values that we specified when creating arrays. We'll be performing indexing on arrays that we created during the array creation section earlier.

Numpy Like Integer Indexing¶

In this section, we have performed normal numpy-like integer indexing on our xarray DataArray.

Below we have accessed the 0th element of the first dimension (the first row) of the 2D array arr1 which we created earlier.

In :
arr1[0]

Out:
<xarray.DataArray 'Array1' (columns: 5)>
array([0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ])
Coordinates:
index    int64 0
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have accessed all elements of the first dimension and the 0th element of the second dimension. This is like accessing the first column of a 2D array.

In :
arr1[:, 0]

Out:
<xarray.DataArray 'Array1' (index: 4)>
array([0.57868507, 0.46849765, 0.93084546, 0.24271528])
Coordinates:
* index    (index) int64 0 1 2 3
columns  <U1 'A'

Below we have accessed the 0th and 1st row of our data.

In :
arr1[[0,1]]

Out:
<xarray.DataArray 'Array1' (index: 2, columns: 5)>
array([[0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ],
[0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703]])
Coordinates:
* index    (index) int64 0 1
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have accessed the 0th and 1st column of our 2D array.

In :
arr1[:,[0,1]]

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 2)>
array([[0.57868507, 0.78605464],
[0.46849765, 0.07263884],
[0.93084546, 0.24244413],
[0.24271528, 0.62774479]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B'

Below we have accessed a 2D sub-array of shape 2x2 from our original 4x5 array by selecting rows 1 and 2 and columns 0 and 1.

In :
arr1[[1,2],[0,1]]

Out:
<xarray.DataArray 'Array1' (index: 2, columns: 2)>
array([[0.46849765, 0.07263884],
[0.93084546, 0.24244413]])
Coordinates:
* index    (index) int64 1 2
* columns  (columns) <U1 'A' 'B'
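
One difference from numpy is worth noting for the last example. When a list is given for each dimension, xarray selects every combination of the listed rows and columns, whereas plain numpy pairs the two lists element by element. The sketch below contrasts the two on a small array of consecutive integers, which stands in for the random data.

```python
import numpy as np
import xarray as xr

data = np.arange(20).reshape(4, 5)
arr = xr.DataArray(data, dims=["index", "columns"])

# xarray: rows 1 and 2 crossed with columns 0 and 1, giving a 2x2 block.
outer = arr[[1, 2], [0, 1]]

# numpy: the lists are paired up, giving elements (1, 0) and (2, 1) only.
paired = data[[1, 2], [0, 1]]

# The numpy equivalent of the xarray result needs np.ix_.
numpy_outer = data[np.ix_([1, 2], [0, 1])]

print(outer.shape)
print(paired.shape)
print(np.array_equal(outer.values, numpy_outer))
```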

Pandas Like Indexing using .loc Property¶

The xarray DataArray provides a loc property which we can use to index arrays as we do with a pandas dataframe. The loc property lets us index with the coordinate values that we provided when we created the array. The coordinate values can be of any type (string, date, time, etc.), not only integer.

Below we have accessed the first element of the first dimension of our DataArray arr1 which we created earlier. As the 'index' coordinates of arr1 are integers, the coordinate value 0 selects it.

In :
arr1.loc[0]

Out:
<xarray.DataArray 'Array1' (columns: 5)>
array([0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ])
Coordinates:
index    int64 0
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have accessed a sub-array by using the loc property. We have selected the coordinate value 0 of the first dimension and the coordinate values 'A' and 'B' of the second dimension. We have used string values for indexing the second dimension this time.

In :
arr1.loc[0, ["A","B"]]

Out:
<xarray.DataArray 'Array1' (columns: 2)>
array([0.57868507, 0.78605464])
Coordinates:
index    int64 0
* columns  (columns) <U1 'A' 'B'

Below we have accessed the first element of the first dimension of arr2, which we created earlier, using the loc property. As the 'index' coordinates of arr2 are strings, we have used a string value to access it.

In :
arr2.loc['0']

Out:
<xarray.DataArray 'Array2' (columns: 5)>
array([0.07511355, 0.60393655, 0.74898288, 0.25581543, 0.79114767])
Coordinates:
index    <U1 '0'
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have accessed another sub-array from our original DataArray using all indices as string values inside of loc property.

In :
arr2.loc['0', ["A","B","C"]]

Out:
<xarray.DataArray 'Array2' (columns: 3)>
array([0.07511355, 0.60393655, 0.74898288])
Coordinates:
index    <U1 '0'
* columns  (columns) <U1 'A' 'B' 'C'

Below we have accessed the sub-array from our array where we had first dimension coordinates specified as date values. We have specified the date value as a string.

In :
arr3.loc["2021-1-1", ['A','B']]

Out:
<xarray.DataArray 'Array3' (columns: 2)>
array([0.39792208, 0.79787484])
Coordinates:
index    datetime64[ns] 2021-01-01
* columns  (columns) <U1 'A' 'B'

Below we have created another example where we are accessing a sub-array from our array with the date dimension. We have specified a list of dates as strings this time to access the sub-array.

In :
arr3.loc[["2021-1-1","2021-1-3"], ['A','B']]

Out:
<xarray.DataArray 'Array3' (index: 2, columns: 2)>
array([[0.39792208, 0.79787484],
[0.94518946, 0.21601817]])
Coordinates:
* index    (index) datetime64[ns] 2021-01-01 2021-01-03
* columns  (columns) <U1 'A' 'B'

In this example, we have accessed sub-array from our date dimension array by providing date dimension coordinates as a list of dates. We have created a list of 3 dates using date_range() function and provided it to filter first dimension values.

In :
three_days = pd.date_range(start="2021-1-1",periods=3)

arr3.loc[three_days, ["A","B","C"]]

Out:
<xarray.DataArray 'Array3' (index: 3, columns: 3)>
array([[0.39792208, 0.79787484, 0.94760726],
[0.21345645, 0.89753226, 0.00395103],
[0.94518946, 0.21601817, 0.05817   ]])
Coordinates:
* index    (index) datetime64[ns] 2021-01-01 2021-01-02 2021-01-03
* columns  (columns) <U1 'A' 'B' 'C'
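
The loc property also accepts slices of coordinate values. As with pandas, a label-based slice includes both of its end points, unlike an integer slice. The sketch below illustrates this on a stand-in array of consecutive integers with the same kind of coordinates as the date-indexed array above.

```python
import numpy as np
import pandas as pd
import xarray as xr

arr = xr.DataArray(
    np.arange(20).reshape(4, 5),
    dims=["index", "columns"],
    coords={
        "index": pd.date_range(start="2021-01-01", freq="D", periods=4),
        "columns": list("ABCDE"),
    },
)

# Label slices are inclusive: two dates and the columns B, C, and D.
by_label = arr.loc["2021-01-02":"2021-01-03", "B":"D"]

# The same block by integer position, where the stop value is excluded.
by_position = arr[1:3, 1:4]

print(by_label.shape)
print(by_label.equals(by_position))
```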

Integer Indexing using isel() Function¶

The xarray DataArray has a method named isel() which lets us specify integer positions per dimension name and access the sub-array of the original array at those positions.

In order to perform indexing using the isel() method, we can provide dimension names and their positions either as a dictionary or as keyword arguments of the method. We'll explain with examples below how we can use this method to perform indexing.

Below we have retrieved the 0th element of the 'index' dimension of the array using the isel() method. We have provided the position for the dimension as a keyword argument of the method.

In :
arr1.isel(index=0)

Out:
<xarray.DataArray 'Array1' (columns: 5)>
array([0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ])
Coordinates:
index    int64 0
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have recreated our previous example by providing the integer position for the dimension as a dictionary. This has the same effect as the previous cell.

In :
arr1.isel({'index':0})

Out:
<xarray.DataArray 'Array1' (columns: 5)>
array([0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ])
Coordinates:
index    int64 0
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have retrieved a 2D array of shape 2x4 using the isel() method. We have provided two integer positions for the 'index' dimension and four integer positions for the 'columns' dimension.

In :
arr1.isel(index=[0,1], columns=[0,1,2,3])

Out:
<xarray.DataArray 'Array1' (index: 2, columns: 4)>
array([[0.57868507, 0.78605464, 0.90389917, 0.85013705],
[0.46849765, 0.07263884, 0.20157703, 0.99471873]])
Coordinates:
* index    (index) int64 0 1
* columns  (columns) <U1 'A' 'B' 'C' 'D'

Below we have recreated our previous example by providing the integer positions as a dictionary.

In :
arr1.isel({'index':[0,1], 'columns':[0,1,2,3]})

Out:
<xarray.DataArray 'Array1' (index: 2, columns: 4)>
array([[0.57868507, 0.78605464, 0.90389917, 0.85013705],
[0.46849765, 0.07263884, 0.20157703, 0.99471873]])
Coordinates:
* index    (index) int64 0 1
* columns  (columns) <U1 'A' 'B' 'C' 'D'
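
Besides single integers and lists, isel() can also be given negative positions and slice objects, which behave as they do in numpy. The sketch below shows both on a stand-in array of consecutive integers and also checks that the keyword form and the dictionary form return identical results.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(20).reshape(4, 5),
    dims=["index", "columns"],
    coords={"index": [0, 1, 2, 3], "columns": list("ABCDE")},
)

# A negative position counts from the end of the dimension.
last_row = arr.isel(index=-1)

# A slice selects a range of positions; the stop position is excluded.
first_two_columns = arr.isel(columns=slice(0, 2))

# Keyword and dictionary forms are interchangeable.
same_result = arr.isel(index=0).identical(arr.isel({"index": 0}))

print(last_row.values)
print(first_two_columns.shape)
print(same_result)
```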

Indexing Based on Dimension Data using sel() Function¶

The xarray DataArray provides a method named sel() which works like isel() but it can accept the actual value of coordinates to access sub-arrays rather than integer indexing. We can provide values as either dictionary or as if they are parameters of the method.

Below we have retrieved a sub-array of shape 3x5 from our original array using sel() method. The 'index' dimension has coordinate values as integers hence we have provided them as integers.

In :
arr1.sel(index=[0,1,2])

Out:
<xarray.DataArray 'Array1' (index: 3, columns: 5)>
array([[0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ],
[0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
[0.93084546, 0.24244413, 0.82591196, 0.81989938, 0.68520558]])
Coordinates:
* index    (index) int64 0 1 2
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have accessed a sub-array of shape 3x5 from arr2 using the sel() method. This time we have provided coordinate values as a list of strings because arr2 has its 'index' dimension values stored as strings.

In :
arr2.sel(index=['0','1','2'])

Out:
<xarray.DataArray 'Array2' (index: 3, columns: 5)>
array([[0.07511355, 0.60393655, 0.74898288, 0.25581543, 0.79114767],
[0.73421379, 0.7067142 , 0.24650569, 0.2074986 , 0.41164924],
[0.50616351, 0.64518492, 0.66608194, 0.16975831, 0.67817385]])
Coordinates:
* index    (index) <U1 '0' '1' '2'
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have accessed another 3x3 array from our original array using sel() method. We have provided coordinate values for both dimensions as a list of strings.

In :
arr2.sel(index=['0','1','2'], columns=['A','C','E'])

Out:
<xarray.DataArray 'Array2' (index: 3, columns: 3)>
array([[0.07511355, 0.74898288, 0.79114767],
[0.73421379, 0.24650569, 0.41164924],
[0.50616351, 0.66608194, 0.67817385]])
Coordinates:
* index    (index) <U1 '0' '1' '2'
* columns  (columns) <U1 'A' 'C' 'E'

Below we have created another example demonstrating the use of the sel() method. This time the 'index' dimension holds dates, and we select entries from it by passing the dates as strings.

In :
arr3.sel(index=["2021-1-1","2021-1-2", "2021-1-3"], columns=['A','B'])

Out:
<xarray.DataArray 'Array3' (index: 3, columns: 2)>
array([[0.39792208, 0.79787484],
[0.21345645, 0.89753226],
[0.94518946, 0.21601817]])
Coordinates:
* index    (index) datetime64[ns] 2021-01-01 2021-01-02 2021-01-03
* columns  (columns) <U1 'A' 'B'

Below we have created one more example demonstrating the use of the sel() method. We have created a list of dates using the pandas date_range() function and provided it as the selection for the 'index' dimension of the array. For the 'columns' dimension, we have provided a list of 3 strings.

In :
three_days = pd.date_range(start="2021-1-1",periods=3)

arr3.sel(index=three_days, columns=['A','B', 'C'])

Out:
<xarray.DataArray 'Array3' (index: 3, columns: 3)>
array([[0.39792208, 0.79787484, 0.94760726],
[0.21345645, 0.89753226, 0.00395103],
[0.94518946, 0.21601817, 0.05817   ]])
Coordinates:
* index    (index) datetime64[ns] 2021-01-01 2021-01-02 2021-01-03
* columns  (columns) <U1 'A' 'B' 'C'
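As a small self-contained sketch of the date-based selection above (the array name demo and its values are made up for illustration), the snippet below selects the same three days once with explicit dates and once with a label slice. Label slices passed to sel() include both end points.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical 5x2 array whose 'index' dimension holds dates.
dates = pd.date_range(start="2021-01-01", periods=5)
demo = xr.DataArray(
    np.arange(10.0).reshape(5, 2),
    dims=("index", "columns"),
    coords={"index": dates, "columns": ["A", "B"]},
    name="Demo",
)

# Select the first three days by passing the dates themselves.
by_labels = demo.sel(index=dates[:3], columns=["A"])

# Select the same days with a label slice (both end points are included).
by_slice = demo.sel(index=slice("2021-01-01", "2021-01-03"), columns=["A"])

print(by_labels.shape, by_slice.shape)
```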

3. Normal Array Operations

In this section, we'll explain some commonly performed array operations such as addition, multiplication by a scalar, transpose, dot product, and null-element checks. We'll explain as many of these operations as possible with simple examples.

Transpose

We can retrieve the transpose of an array using the T attribute or by calling the transpose() method on it.

In :
arr1_transpose = arr1.T # arr1.transpose() works same

arr1_transpose

Out:
<xarray.DataArray 'Array1' (columns: 5, index: 4)>
array([[0.57868507, 0.46849765, 0.93084546, 0.24271528],
[0.78605464, 0.07263884, 0.24244413, 0.62774479],
[0.90389917, 0.20157703, 0.82591196, 0.66185214],
[0.85013705, 0.99471873, 0.81989938, 0.41166893],
[0.5950187 , 0.93488703, 0.68520558, 0.50476117]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
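The sketch below, on a made-up 2x3 array named demo, checks that T and transpose() agree. It also passes dimension names to transpose() to state the desired order explicitly, which is an addition to the example above.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x3 array used only for this illustration.
demo = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("index", "columns"),
    coords={"index": [0, 1], "columns": ["A", "B", "C"]},
)

flipped = demo.T                               # reverses the dimension order
by_name = demo.transpose("columns", "index")   # order given by dimension name

print(flipped.dims, flipped.shape)
```

Because elements are looked up by label, flipped.sel(columns="B", index=1) returns the same element as demo.sel(index=1, columns="B").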

We can easily multiply, add, subtract, and perform many other operations between an array and a scalar.

In :
arr1 * 10

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[5.78685073, 7.8605464 , 9.03899168, 8.50137048, 5.95018697],
[4.68497649, 0.72638835, 2.01577033, 9.94718734, 9.34887027],
[9.30845463, 2.42444133, 8.25911956, 8.19899378, 6.85205578],
[2.42715276, 6.27744792, 6.61852136, 4.11668931, 5.04761172]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

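When we add two arrays, xarray matches them by dimension name and coordinate values rather than by position. By default, only the coordinate values present in both arrays are kept in the result, so arrays whose coordinate values do not overlap give an empty result.

That's the reason below we first add our first array to itself to demonstrate array addition, because its integer 'index' coordinate values differ from those of our other arrays.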

In :
arr1 + arr1

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[1.15737015, 1.57210928, 1.80779834, 1.7002741 , 1.19003739],
[0.9369953 , 0.14527767, 0.40315407, 1.98943747, 1.86977405],
[1.86169093, 0.48488827, 1.65182391, 1.63979876, 1.37041116],
[0.48543055, 1.25548958, 1.32370427, 0.82333786, 1.00952234]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

The arrays arr and arr2 share the same dimension names and coordinate values, so we can add them directly.

In :
arr + arr2

Out:
<xarray.DataArray (index: 4, columns: 5)>
array([[0.46244583, 0.83503292, 1.41862553, 0.92661634, 1.74944742],
[1.50557018, 0.8233929 , 0.89481389, 0.96159213, 1.18065456],
[1.05456012, 1.37469192, 1.31642291, 1.09310462, 1.38681358],
[0.42832786, 1.20376832, 1.1588314 , 1.34796764, 0.92004354]])
Coordinates:
* index    (index) <U1 '0' '1' '2' '3'
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
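To make the role of coordinate values in addition concrete, here is a minimal sketch with two made-up one-dimensional arrays (left and right are hypothetical names) whose 'index' values only partly overlap. With xarray's default settings, only the shared labels should survive.

```python
import xarray as xr

# Two hypothetical arrays whose 'index' labels overlap only on 1 and 2.
left = xr.DataArray([1.0, 2.0, 3.0], dims="index", coords={"index": [0, 1, 2]})
right = xr.DataArray([10.0, 20.0, 30.0], dims="index", coords={"index": [1, 2, 3]})

# Values are paired by label, not by position.
total = left + right

# Scalar arithmetic applies to every element and keeps all labels.
scaled = left * 10

print(total["index"].values, total.values)
```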

argmax()

We can retrieve the index of the maximum element in an array using the argmax() method.

Below we have retrieved the index of the maximum element of one of our arrays. When no dimension is given, the result is the position of that element in the flattened array.

In :
max_index = arr1.argmax()

max_index

Out:
<xarray.DataArray 'Array1' ()>
array(8)

We can call the item() method on an array with one element to get that element as a plain Python value.

We can also pass an index to item() to retrieve the element at that position. Below we are retrieving the maximum element by passing the flat index returned by argmax() to item().

In :
arr1.item(max_index.item())

Out:
0.9947187341846935

The item() method can also accept a tuple of indices for arrays with more than one dimension to extract the individual element.

In :
arr1.item((0,0))

Out:
0.5786850732755588

As we said earlier, the majority of operations that we perform on a numpy array can be performed on an xarray DataArray as well. The major difference is that DataArray lets us perform those operations using either a dimension name or an axis index, whereas a numpy array only supports the axis index.

Below we have retrieved the indices of the maximum values along the 'index' dimension of an array.

In :
max_indices = arr1.argmax(dim='index', skipna=True)

max_indices

Out:
<xarray.DataArray 'Array1' (columns: 5)>
array([2, 0, 0, 1, 1])
Coordinates:
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
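The following sketch, on a made-up 2x3 array named demo, shows two ways to use these positions: turning a flat position into a (row, column) pair with NumPy's unravel_index(), and feeding the per-column positions back into isel() to recover the maxima. The flat position is taken from the underlying NumPy array here, which is a choice made for this illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical array used only for this illustration.
demo = xr.DataArray(
    np.array([[0.2, 0.9, 0.1],
              [0.7, 0.3, 0.8]]),
    dims=("index", "columns"),
    coords={"index": [0, 1], "columns": ["A", "B", "C"]},
)

# Flat position of the overall maximum, taken from the underlying NumPy array.
flat_position = int(demo.values.argmax())
row, col = np.unravel_index(flat_position, demo.shape)

# Position of the maximum along 'index', one entry per column.
per_column = demo.argmax(dim="index")

# Using those positions with isel() recovers the maxima themselves.
column_maxima = demo.isel(index=per_column)

print(flat_position, (int(row), int(col)), per_column.values)
```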

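idxmax()

The idxmax() method is similar to argmax(), but it returns the coordinate values (labels) of the maximum elements instead of their integer positions. For arr1 the 'index' coordinate values are 0 to 3, so they match the positions returned by argmax(); in the output below they are returned as floats.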

In :
max_indices = arr1.idxmax(dim='index',skipna=True)

max_indices

Out:
<xarray.DataArray 'index' (columns: 5)>
array([2., 0., 0., 1., 1.])
Coordinates:
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
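The difference between the two methods is easier to see when the coordinate values are not simply 0, 1, 2. The sketch below uses a made-up array named demo whose 'index' labels are 10, 20 and 30.

```python
import numpy as np
import xarray as xr

# Hypothetical array whose 'index' labels differ from the integer positions.
demo = xr.DataArray(
    np.array([[0.2, 0.9],
              [0.7, 0.3],
              [0.5, 0.1]]),
    dims=("index", "columns"),
    coords={"index": [10, 20, 30], "columns": ["A", "B"]},
)

positions = demo.argmax(dim="index")  # integer positions along 'index'
labels = demo.idxmax(dim="index")     # coordinate values along 'index'

print(positions.values, labels.values)
```

Here column 'A' has its maximum at position 1, which carries the label 20, and column 'B' has its maximum at position 0, which carries the label 10.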

argmin()

The argmin() method can be used to retrieve the indices of minimum values. There is an idxmin() method as well, which returns coordinate values instead of integer positions.

Below we have retrieved the indices of minimum values along the 'columns' dimension.

In :
min_indices = arr1.argmin(dim='columns')

min_indices

Out:
<xarray.DataArray 'Array1' (index: 4)>
array([0, 1, 1, 0])
Coordinates:
* index    (index) int64 0 1 2 3

isnull()

The isnull() method detects NaN/None values in an array. It returns a boolean array of the same shape as the original array, where True marks a missing value.

In :
arr1.isnull()

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[False, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
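Since arr1 has no missing values, the output above is all False. The sketch below uses a made-up array named demo that does contain a NaN and, as one possible follow-up, sums the boolean mask to count missing values per row.

```python
import numpy as np
import xarray as xr

# Hypothetical array with one missing value.
demo = xr.DataArray(
    np.array([[1.0, np.nan],
              [3.0, 4.0]]),
    dims=("index", "columns"),
    coords={"index": [0, 1], "columns": ["A", "B"]},
)

mask = demo.isnull()

# True counts as 1, so summing the mask counts the missing values.
missing_per_row = mask.sum(dim="columns")

print(mask.values)
print(missing_per_row.values)
```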

where()

The where() method lets us perform a conditional operation on an array. Its first argument is a condition, and its second argument provides the values to use wherever the condition evaluates to False. Values of the original array are kept wherever the condition is True.

Below we have printed two of our earlier arrays as a reference, as we'll be testing the where() method on them.

In :
arr, arr2

Out:
(<xarray.DataArray 'Array' (index: 4, columns: 5)>
array([[0.38733228, 0.23109638, 0.66964265, 0.6708009 , 0.95829975],
[0.7713564 , 0.1166787 , 0.6483082 , 0.75409353, 0.76900532],
[0.54839661, 0.72950701, 0.65034097, 0.92334631, 0.70863973],
[0.0655017 , 0.56941354, 0.59030199, 0.5371372 , 0.45977435]])
Coordinates:
* index    (index) <U1 '0' '1' '2' '3'
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
Attributes:
index:      X-Dimension of Data
columns:    Y-Dimension of Data
info:       Pandas DataFrame
long_name:  Random Data
units:      Unknown,
<xarray.DataArray 'Array2' (index: 4, columns: 5)>
array([[0.07511355, 0.60393655, 0.74898288, 0.25581543, 0.79114767],
[0.73421379, 0.7067142 , 0.24650569, 0.2074986 , 0.41164924],
[0.50616351, 0.64518492, 0.66608194, 0.16975831, 0.67817385],
[0.36282616, 0.63435477, 0.56852942, 0.81083044, 0.46026918]])
Coordinates:
* index    (index) <U1 '0' '1' '2' '3'
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E')

Below we have called the where() method on arr with the condition that the value is greater than 0.5. Wherever the value of arr is greater than 0.5, the result takes the value from arr; otherwise it takes the value from arr2.

In :
arr.where(arr > 0.5, arr2)

Out:
<xarray.DataArray 'Array' (index: 4, columns: 5)>
array([[0.07511355, 0.60393655, 0.66964265, 0.6708009 , 0.95829975],
[0.7713564 , 0.7067142 , 0.6483082 , 0.75409353, 0.76900532],
[0.54839661, 0.72950701, 0.65034097, 0.92334631, 0.70863973],
[0.36282616, 0.56941354, 0.59030199, 0.5371372 , 0.46026918]])
Coordinates:
* index    (index) <U1 '0' '1' '2' '3'
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
Attributes:
index:      X-Dimension of Data
columns:    Y-Dimension of Data
info:       Pandas DataFrame
long_name:  Random Data
units:      Unknown

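Below we have explained the usage of the where() method with another example. Here the condition is evaluated on arr2, but the replacement values are taken from arr itself, so the result is identical to arr.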

In :
arr.where(arr2 > 0.5, arr)

Out:
<xarray.DataArray 'Array' (index: 4, columns: 5)>
array([[0.38733228, 0.23109638, 0.66964265, 0.6708009 , 0.95829975],
[0.7713564 , 0.1166787 , 0.6483082 , 0.75409353, 0.76900532],
[0.54839661, 0.72950701, 0.65034097, 0.92334631, 0.70863973],
[0.0655017 , 0.56941354, 0.59030199, 0.5371372 , 0.45977435]])
Coordinates:
* index    (index) <U1 '0' '1' '2' '3'
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
Attributes:
index:      X-Dimension of Data
columns:    Y-Dimension of Data
info:       Pandas DataFrame
long_name:  Random Data
units:      Unknown
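As a compact sketch of the two ways where() is typically called, the snippet below uses made-up one-dimensional arrays (demo and fallback are hypothetical names). Without a second argument, positions where the condition is False become NaN; with one, they take the replacement value.

```python
import numpy as np
import xarray as xr

# Hypothetical arrays used only for this illustration.
demo = xr.DataArray([0.2, 0.7, 0.4, 0.9], dims="index",
                    coords={"index": [0, 1, 2, 3]})
fallback = xr.DataArray([10.0, 20.0, 30.0, 40.0], dims="index",
                        coords={"index": [0, 1, 2, 3]})

# No replacement given: values failing the condition become NaN.
masked = demo.where(demo > 0.5)

# Replacement given: values failing the condition come from 'fallback'.
replaced = demo.where(demo > 0.5, fallback)

print(masked.values)
print(replaced.values)
```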

dot()

We can compute the dot product of two arrays using the dot() function, which multiplies the arrays element-wise and sums the products over the given dimensions. The dimensions are specified by name.

Below we have computed the dot product of two arrays over the 'columns' dimension present in both.

In :
xr.dot(arr, arr2, dims=["columns"])

Out:
<xarray.DataArray (index: 4)>
array([1.59997017, 1.28164447, 1.81875229, 1.36772713])
Coordinates:
* index    (index) <U1 '0' '1' '2' '3'

Below we have computed the dot product of the two arrays over the 'index' dimension present in both.

In :
xr.dot(arr, arr2, dims=["index"])

Out:
<xarray.DataArray (columns: 5)>
array([0.89677849, 1.05390316, 1.43014696, 0.92034748, 1.76691796])
Coordinates:
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

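Below we have computed the dot product without specifying any dimension name. In this case the sum runs over all dimensions shared by the two arrays, which gives a single number.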

In :
xr.dot(arr,arr2)

Out:
<xarray.DataArray ()>
array(6.06809405)
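One way to check what these dot products compute is to write them as an element-wise product followed by sum() over a named dimension. The sketch below does this on two made-up 2x2 arrays (a and b are hypothetical names) and calls xr.dot() only in its form without a dimension argument.

```python
import numpy as np
import xarray as xr

coords = {"index": [0, 1], "columns": ["A", "B"]}

# Hypothetical arrays used only for this illustration.
a = xr.DataArray(np.array([[1.0, 2.0], [3.0, 4.0]]),
                 dims=("index", "columns"), coords=coords)
b = xr.DataArray(np.array([[5.0, 6.0], [7.0, 8.0]]),
                 dims=("index", "columns"), coords=coords)

# Multiply element-wise, then sum over one named dimension.
per_row = (a * b).sum(dim="columns")   # one value per 'index' entry
per_col = (a * b).sum(dim="index")     # one value per 'columns' entry

# Without a dimension argument, all shared dimensions are summed over.
total = xr.dot(a, b)

print(per_row.values, per_col.values, float(total))
```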

drop()

The drop() method can be used to drop entries of an array based on a dimension and its coordinate values. It accepts a list of coordinate values (labels) and a dimension name (dim), and drops the entries of that dimension which have the specified coordinate values. The xarray documentation encourages using drop_sel(), covered below, for this kind of label-based dropping.

Below we have dropped the entries of the 'index' dimension that have coordinate values [0,1].

In :
arr1.drop(labels=[0,1], dim="index")

Out:
<xarray.DataArray 'Array1' (index: 2, columns: 5)>
array([[0.93084546, 0.24244413, 0.82591196, 0.81989938, 0.68520558],
[0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
Coordinates:
* index    (index) int64 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have created another example demonstrating the use of drop() method in cases where coordinate values are not integers.

In :
arr2.drop(labels=['0','1'], dim="index")

Out:
<xarray.DataArray 'Array2' (index: 2, columns: 5)>
array([[0.50616351, 0.64518492, 0.66608194, 0.16975831, 0.67817385],
[0.36282616, 0.63435477, 0.56852942, 0.81083044, 0.46026918]])
Coordinates:
* index    (index) <U1 '2' '3'
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have created another example demonstrating the use of drop() method. This time we are dropping values across 'columns' dimension of our array.

In :
arr1.drop(labels=["D","E"], dim="columns")

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 3)>
array([[0.57868507, 0.78605464, 0.90389917],
[0.46849765, 0.07263884, 0.20157703],
[0.93084546, 0.24244413, 0.82591196],
[0.24271528, 0.62774479, 0.66185214]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C'

drop_isel()

The drop_isel() method works like the drop() method, but it lets us specify integer positions along a dimension instead of the original coordinate values, which can be of any data type.

Like the isel() method, drop_isel() lets us specify the positions for each dimension either as a dictionary or as keyword arguments of the method.

Below we have dropped the entries at position 0 of the 'index' dimension.

In :
arr1.drop_isel({"index":0})

Out:
<xarray.DataArray 'Array1' (index: 3, columns: 5)>
array([[0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
[0.93084546, 0.24244413, 0.82591196, 0.81989938, 0.68520558],
[0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
Coordinates:
* index    (index) int64 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have dropped the entries at positions 0 and 1 of the 'index' dimension.

In :
arr1.drop_isel({"index":[0,1]})

Out:
<xarray.DataArray 'Array1' (index: 2, columns: 5)>
array([[0.93084546, 0.24244413, 0.82591196, 0.81989938, 0.68520558],
[0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
Coordinates:
* index    (index) int64 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have created another example demonstrating the use of drop_isel() method to drop values across multiple dimensions of the array.

In :
arr1.drop_isel({"index":[0,1], "columns": [2,3,4]})

Out:
<xarray.DataArray 'Array1' (index: 2, columns: 2)>
array([[0.93084546, 0.24244413],
[0.24271528, 0.62774479]])
Coordinates:
* index    (index) int64 2 3
* columns  (columns) <U1 'A' 'B'

drop_sel()

The drop_sel() method works exactly like drop_isel(), with the only difference being that it accepts the original coordinate values of a dimension instead of integer positions.

Below we have dropped the entries whose coordinate values are [0,1] for dimension 'index' and ["C","D","E"] for dimension 'columns'.

In :
arr1.drop_sel({"index":[0,1], "columns": ["C","D","E"]})

Out:
<xarray.DataArray 'Array1' (index: 2, columns: 2)>
array([[0.93084546, 0.24244413],
[0.24271528, 0.62774479]])
Coordinates:
* index    (index) int64 2 3
* columns  (columns) <U1 'A' 'B'

Below we have created another example demonstrating the use of drop_sel() method across multiple dimensions.

In :
arr2.drop_sel({"index":['0','1'], "columns": ["C","D","E"]})

Out:
<xarray.DataArray 'Array2' (index: 2, columns: 2)>
array([[0.50616351, 0.64518492],
[0.36282616, 0.63435477]])
Coordinates:
* index    (index) <U1 '2' '3'
* columns  (columns) <U1 'A' 'B'
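To contrast the label-based and position-based variants side by side, here is a minimal sketch on a made-up array named demo whose 'index' labels are letters. Both calls are written with keyword arguments and are intended to remove the same entries.

```python
import numpy as np
import xarray as xr

# Hypothetical 4x3 array with string labels on both dimensions.
demo = xr.DataArray(
    np.arange(12).reshape(4, 3),
    dims=("index", "columns"),
    coords={"index": ["w", "x", "y", "z"], "columns": ["A", "B", "C"]},
)

# Drop by coordinate value (label).
by_label = demo.drop_sel(index=["w", "x"], columns=["C"])

# Drop by integer position along each dimension.
by_position = demo.drop_isel(index=[0, 1], columns=[2])

print(by_label.values)
print(by_position.values)
```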

copy()

We can call the copy() method on an xarray DataArray to create a copy of it. By default this creates a new array with its own memory, so any modification to the new array won't be reflected in the original array from which it was copied.

In :
arr1_copy = arr1.copy()

arr1_copy

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ],
[0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
[0.93084546, 0.24244413, 0.82591196, 0.81989938, 0.68520558],
[0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
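The independence of the copy can be verified with a short sketch. The arrays original and duplicate below are made up for this illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x2 array of zeros.
original = xr.DataArray(np.zeros((2, 2)), dims=("index", "columns"))

# copy() makes a deep copy by default, so the data is duplicated.
duplicate = original.copy()
duplicate[0, 0] = 1.0

print(float(original[0, 0]), float(duplicate[0, 0]))
```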

dropna(dim,how='any')

The dropna() method lets us drop entries along a dimension of an array when they contain NaN values. It accepts the dimension name as the first parameter and the how parameter as the second, which decides when an entry is dropped. There are two options for how.

• 'any' - This is the default value. It'll drop entries of the dimension where even a single value is NaN.
• 'all' - It'll drop entries of the dimension where all values are NaN.

Below we have set a few entries to NaN in the array which we created by copying one of our existing arrays.

In :
arr1_copy[0,3] = np.nan

arr1_copy[2,4] = np.nan

arr1_copy

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[0.57868507, 0.78605464, 0.90389917,        nan, 0.5950187 ],
[0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
[0.93084546, 0.24244413, 0.82591196, 0.81989938,        nan],
[0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have called the dropna() method to drop entries along the 'index' dimension. It'll drop every entry of 'index' where even a single value is NaN, which here removes the entries with coordinate values 0 and 2.

In :
arr1_copy.dropna(dim="index")

Out:
<xarray.DataArray 'Array1' (index: 2, columns: 5)>
array([[0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
[0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
Coordinates:
* index    (index) int64 1 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have called the dropna() method to drop entries along the 'columns' dimension, which removes columns 'D' and 'E' because each of them holds a NaN.

In :
arr1_copy.dropna(dim="columns")

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 3)>
array([[0.57868507, 0.78605464, 0.90389917],
[0.46849765, 0.07263884, 0.20157703],
[0.93084546, 0.24244413, 0.82591196],
[0.24271528, 0.62774479, 0.66185214]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C'
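The examples above only use the default how='any'. The sketch below, on a made-up array named demo, compares it with how='all' so the difference between the two options is visible.

```python
import numpy as np
import xarray as xr

# Hypothetical array: row 0 has one NaN, row 1 is entirely NaN, row 2 is complete.
demo = xr.DataArray(
    np.array([[1.0, np.nan, 3.0],
              [np.nan, np.nan, np.nan],
              [7.0, 8.0, 9.0]]),
    dims=("index", "columns"),
    coords={"index": [0, 1, 2], "columns": ["A", "B", "C"]},
)

# 'any': a row is dropped as soon as it contains a single NaN.
any_dropped = demo.dropna(dim="index", how="any")

# 'all': a row is dropped only when every value in it is NaN.
all_dropped = demo.dropna(dim="index", how="all")

print(any_dropped["index"].values, all_dropped["index"].values)
```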

fillna(value)

We can use the fillna() method to fill NaN values in an array. Below we have passed a single value which replaces all NaNs. The method returns a new array and leaves the original unchanged.

In :
arr1_copy.fillna(value=9.99999)

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[0.57868507, 0.78605464, 0.90389917, 9.99999   , 0.5950187 ],
[0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
[0.93084546, 0.24244413, 0.82591196, 0.81989938, 9.99999   ],
[0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
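Besides a single number, fillna() can also be given another DataArray. The sketch below, on a made-up array named demo, fills with a scalar and then with per-column means; filling with the column mean is just one possible choice made for this illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical array with one missing value in column 'B'.
demo = xr.DataArray(
    np.array([[1.0, np.nan],
              [3.0, 4.0]]),
    dims=("index", "columns"),
    coords={"index": [0, 1], "columns": ["A", "B"]},
)

# Replace every NaN with the same scalar.
filled_scalar = demo.fillna(0.0)

# Replace each NaN with the mean of its column (NaNs are skipped by mean()).
column_means = demo.mean(dim="index")
filled_means = demo.fillna(column_means)

print(filled_scalar.values)
print(filled_means.values)
```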

drop_duplicates(dim)

The drop_duplicates() method lets us drop entries of a dimension whose coordinate values are duplicated. We need to provide the name of the dimension along which we want to drop duplicates. Note that it compares the coordinate values of the dimension, not the data stored in the array.

Below we have first created a copy of one of our existing arrays and then copied the data of one column into another to create duplicate data. We can notice from the array printed below that the 1st and 3rd columns have the same data.

In :
arr1_copy = arr1.copy()

arr1_copy[:, 2] = arr1_copy[:, 0]

arr1_copy

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[0.57868507, 0.78605464, 0.57868507, 0.85013705, 0.5950187 ],
[0.46849765, 0.07263884, 0.46849765, 0.99471873, 0.93488703],
[0.93084546, 0.24244413, 0.93084546, 0.81989938, 0.68520558],
[0.24271528, 0.62774479, 0.24271528, 0.41166893, 0.50476117]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

Below we have called drop_duplicates() on the 'columns' dimension. All five columns are kept because the coordinate values 'A' to 'E' are unique, even though columns 'A' and 'C' hold the same data.

In :
arr1_copy.drop_duplicates(dim='columns')

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[0.57868507, 0.78605464, 0.57868507, 0.85013705, 0.5950187 ],
[0.46849765, 0.07263884, 0.46849765, 0.99471873, 0.93488703],
[0.93084546, 0.24244413, 0.93084546, 0.81989938, 0.68520558],
[0.24271528, 0.62774479, 0.24271528, 0.41166893, 0.50476117]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
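For comparison, here is a sketch where the coordinate values themselves repeat. The array demo is made up and has the label 'A' twice along 'columns'; with the default settings the first occurrence should be the one that is kept.

```python
import numpy as np
import xarray as xr

# Hypothetical array whose 'columns' labels contain 'A' twice.
demo = xr.DataArray(
    np.arange(8).reshape(2, 4),
    dims=("index", "columns"),
    coords={"index": [0, 1], "columns": ["A", "B", "A", "C"]},
)

# Entries with a repeated label are dropped; the first 'A' is kept.
deduped = demo.drop_duplicates(dim="columns")

print(deduped["columns"].values)
print(deduped.values)
```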

clip(min,max)

The clip() method lets us restrict the values of an array to the range between the minimum and maximum values specified by us. It accepts two values as input, the minimum and the maximum. It then replaces all values less than the minimum with the minimum and all values greater than the maximum with the maximum.

Below we have restricted the values of our array to the range [0.3,0.6] using the clip() method.

In :
arr1.clip(min=0.3, max=0.6)

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[0.57868507, 0.6       , 0.6       , 0.6       , 0.5950187 ],
[0.46849765, 0.3       , 0.3       , 0.6       , 0.6       ],
[0.6       , 0.3       , 0.6       , 0.6       , 0.6       ],
[0.3       , 0.6       , 0.6       , 0.41166893, 0.50476117]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
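The clipping rule described above can be checked on a tiny array where the result is easy to verify by hand. The sketch below is only an illustration: the array named demo and its values are made up for this example and are not part of the tutorial's data.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x3 array used only for illustration.
demo = xr.DataArray(
    np.array([[0.1, 0.45, 0.9], [0.3, 0.6, 0.75]]),
    dims=("index", "columns"),
    coords={"index": [0, 1], "columns": ["A", "B", "C"]},
    name="Demo",
)

# Values below 0.3 become 0.3 and values above 0.6 become 0.6.
clipped = demo.clip(min=0.3, max=0.6)

# Passing only one bound leaves the other side unrestricted.
floor_only = demo.clip(min=0.3)

print(clipped.values)
print(floor_only.values)
```

Dimension names and coordinates are carried over to the result, so the clipped array can still be indexed by label.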

concat(objs, dim)¶

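We can combine arrays along a dimension using the xr.concat() function. It accepts a sequence of arrays as the first parameter and a dimension name as the second parameter. It then joins the arrays along that dimension.

Below we have combined two arrays along the 'index' dimension. Note that arr has string index labels while arr1 has integer ones, so the combined 'index' coordinate ends up with object dtype.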

In :
xr.concat((arr,arr1), dim="index")

Out:
<xarray.DataArray 'Array' (index: 8, columns: 5)>
array([[0.38733228, 0.23109638, 0.66964265, 0.6708009 , 0.95829975],
[0.7713564 , 0.1166787 , 0.6483082 , 0.75409353, 0.76900532],
[0.54839661, 0.72950701, 0.65034097, 0.92334631, 0.70863973],
[0.0655017 , 0.56941354, 0.59030199, 0.5371372 , 0.45977435],
[0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ],
[0.46849765, 0.07263884, 0.20157703, 0.99471873, 0.93488703],
[0.93084546, 0.24244413, 0.82591196, 0.81989938, 0.68520558],
[0.24271528, 0.62774479, 0.66185214, 0.41166893, 0.50476117]])
Coordinates:
* index    (index) object '0' '1' '2' '3' 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
Attributes:
index:      X-Dimension of Data
columns:    Y-Dimension of Data
info:       Pandas DataFrame
long_name:  Random Data
units:      Unknown

Below we have combined two arrays across 'columns' dimension.

In :
xr.concat((arr,arr2), dim="columns")

Out:
<xarray.DataArray 'Array' (index: 4, columns: 10)>
array([[0.38733228, 0.23109638, 0.66964265, 0.6708009 , 0.95829975,
0.07511355, 0.60393655, 0.74898288, 0.25581543, 0.79114767],
[0.7713564 , 0.1166787 , 0.6483082 , 0.75409353, 0.76900532,
0.73421379, 0.7067142 , 0.24650569, 0.2074986 , 0.41164924],
[0.54839661, 0.72950701, 0.65034097, 0.92334631, 0.70863973,
0.50616351, 0.64518492, 0.66608194, 0.16975831, 0.67817385],
[0.0655017 , 0.56941354, 0.59030199, 0.5371372 , 0.45977435,
0.36282616, 0.63435477, 0.56852942, 0.81083044, 0.46026918]])
Coordinates:
* index    (index) <U1 '0' '1' '2' '3'
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E' 'A' 'B' 'C' 'D' 'E'
Attributes:
index:      X-Dimension of Data
columns:    Y-Dimension of Data
info:       Pandas DataFrame
long_name:  Random Data
units:      Unknown
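In the 'index' example above, the combined coordinate mixes string labels ('0' to '3') with integer labels (0 to 3). One possible way to avoid that is to convert the labels to a common type before concatenating. Below is a minimal sketch of that idea, assuming two small made-up arrays named first and second. These names and values are hypothetical and only stand in for arr and arr1.

```python
import numpy as np
import xarray as xr

# Hypothetical arrays: 'first' has string index labels, 'second' has integer ones.
first = xr.DataArray(
    np.arange(4.0).reshape(2, 2),
    dims=("index", "columns"),
    coords={"index": ["0", "1"], "columns": ["A", "B"]},
)
second = xr.DataArray(
    np.arange(4.0, 8.0).reshape(2, 2),
    dims=("index", "columns"),
    coords={"index": [0, 1], "columns": ["A", "B"]},
)

# Convert the integer labels to strings so both arrays use the same label type.
second = second.assign_coords(
    index=[str(label) for label in second["index"].values]
)

# Join the two arrays along the existing 'index' dimension.
combined = xr.concat([first, second], dim="index")

print(combined.sizes)
print(combined["index"].values)
```

Whether duplicate labels such as '0' appearing twice are acceptable depends on how the combined array will be indexed later, so relabeling may still be needed in practice.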

round()¶

The round() method rounds the float values of an array. Called without arguments, it rounds each value to the nearest whole number, as shown below.

In :
arr1.round()

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[1., 1., 1., 1., 1.],
[0., 0., 0., 1., 1.],
[1., 0., 1., 1., 1.],
[0., 1., 1., 0., 1.]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
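Rounding to whole numbers throws away most of the information in values between 0 and 1. As far as we know, round() forwards its arguments to NumPy's rounding routine, so a number of decimals can be passed as well. The sketch below assumes that behaviour and uses a small made-up array named demo.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x2 array used only for illustration.
demo = xr.DataArray(
    np.array([[0.57868507, 0.78605464], [0.46849765, 0.07263884]]),
    dims=("index", "columns"),
)

# Default: round to the nearest whole number.
nearest_int = demo.round()

# Assumed to behave like numpy.round(values, 2): keep two decimals.
two_decimals = demo.round(2)

print(nearest_int.values)
print(two_decimals.values)
```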

4. Simple Statistics ¶

In this section, we'll explain how we can compute simple statistics like sum, minimum, maximum, standard deviation, variance, median, count, cumulative sum, cumulative product, and correlation.

sum(dim=None)¶

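The sum() method calculates the sum along one or more dimensions. The dimension that we name is the one that gets summed over and therefore disappears from the result. If we don't provide a dimension then it'll calculate the sum of all elements of the array.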

Below we have first calculated the sum of all elements of the array. Then in the next cell, we have calculated the sum across 'index' dimension.

In :
arr1.sum()

Out:
<xarray.DataArray 'Array1' ()>
array(12.33916272)
In :
arr1.sum(dim="index")

Out:
<xarray.DataArray 'Array1' (columns: 5)>
array([2.22074346, 1.7288824 , 2.59324029, 3.07642409, 2.71987247])
Coordinates:
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
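It can be easy to mix up which dimension survives a reduction. The sketch below uses a small made-up array named demo (not part of the tutorial's data) to show that the dimension passed to sum() is the one that is collapsed.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x3 array: [[0, 1, 2], [3, 4, 5]].
demo = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("index", "columns"),
    coords={"index": [0, 1], "columns": ["A", "B", "C"]},
)

total = demo.sum()                    # no dimension: sum of every element
per_column = demo.sum(dim="index")    # collapses 'index', keeps 'columns'
per_row = demo.sum(dim="columns")     # collapses 'columns', keeps 'index'

print(float(total))
print(per_column.values)
print(per_row.values)
```

Summing over dim="index" corresponds to axis=0 in NumPy for this array, but the name keeps working even if the dimensions are later transposed.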

min(dim=None)¶

The min() method returns the minimum values along the given dimension, or the minimum of the whole array if no dimension is given.

Below we have first retrieved the minimum value of the whole array. Then in the next cell, we have retrieved minimum values across 'columns' dimension of the array.

In :
arr1.min()

Out:
<xarray.DataArray 'Array1' ()>
array(0.07263884)
In :
arr1.min(dim="columns")

Out:
<xarray.DataArray 'Array1' (index: 4)>
array([0.57868507, 0.07263884, 0.24244413, 0.24271528])
Coordinates:
* index    (index) int64 0 1 2 3

max(dim=None)¶

The max() method works exactly like min() but returns maximum values instead.

In :
arr1.max()

Out:
<xarray.DataArray 'Array1' ()>
array(0.99471873)
In :
arr1.max(dim="index")

Out:
<xarray.DataArray 'Array1' (columns: 5)>
array([0.93084546, 0.78605464, 0.90389917, 0.99471873, 0.93488703])
Coordinates:
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

std(dim=None)¶

The std() method helps us calculate the standard deviation along different dimensions of an array. Below we have first calculated it for the whole array and then along the 'columns' dimension.

In :
arr1.std()

Out:
<xarray.DataArray 'Array1' ()>
array(0.26712418)
In :
arr1.std(dim="columns")

Out:
<xarray.DataArray 'Array1' (index: 4)>
array([0.13275403, 0.37433162, 0.24211231, 0.15232193])
Coordinates:
* index    (index) int64 0 1 2 3

var(dim=None)¶

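The var() method helps us calculate the variance along different dimensions of an array. The variance is the square of the standard deviation, which we can confirm from the outputs: 0.26712418 squared is approximately 0.07135533.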

In :
arr1.var()

Out:
<xarray.DataArray 'Array1' ()>
array(0.07135533)
In :
arr1.var(dim="index")

Out:
<xarray.DataArray 'Array1' (columns: 5)>
array([0.06170627, 0.0821856 , 0.0741555 , 0.04695209, 0.02573124])
Coordinates:
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
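The outputs above are consistent with population statistics (dividing by N). To our understanding, std() and var() accept a ddof argument like their NumPy counterparts, with ddof=0 as the default. The sketch below assumes that and uses a made-up array named demo to compare the population and sample versions.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x4 array used only for illustration.
demo = xr.DataArray(
    np.array([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]),
    dims=("index", "columns"),
)

# Default (assumed ddof=0): population standard deviation of each row.
population_std = demo.std(dim="columns")

# ddof=1 divides by N - 1 instead of N (sample standard deviation).
sample_std = demo.std(dim="columns", ddof=1)

# Variance along the same dimension.
population_var = demo.var(dim="columns")

print(population_std.values)
print(sample_std.values)
print(population_var.values)
```

For small samples the two versions differ noticeably, so it is worth stating which one a reported figure uses.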

median(dim=None)¶

The median() function helps us find the median across different dimensions of the array.

In :
arr1.median()

Out:
<xarray.DataArray 'Array1' ()>
array(0.64479846)
In :
arr1.median(dim="index")

Out:
<xarray.DataArray 'Array1' (columns: 5)>
array([0.52359136, 0.43509446, 0.74388205, 0.83501821, 0.64011214])
Coordinates:
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

count(dim=None)¶

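The count() method counts the number of non-missing (non-NaN) elements along different dimensions of the array. Our array has no missing values, so the counts below simply match the sizes of the dimensions.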

In :
arr1.count()

Out:
<xarray.DataArray 'Array1' ()>
array(20)
In :
arr1.count(dim="index")

Out:
<xarray.DataArray 'Array1' (columns: 5)>
array([4, 4, 4, 4, 4])
Coordinates:
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
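The counts above equal the dimension sizes because arr1 contains no missing values. The sketch below uses a made-up array named demo with a few NaN entries to show how count() differs from the total number of elements.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x3 array with two missing values.
demo = xr.DataArray(
    np.array([[1.0, np.nan, 3.0], [4.0, 5.0, np.nan]]),
    dims=("index", "columns"),
    coords={"columns": ["A", "B", "C"]},
)

n_elements = demo.size                # total number of elements, NaN included
n_valid = demo.count()                # number of non-NaN elements
valid_per_column = demo.count(dim="index")

print(n_elements)
print(int(n_valid))
print(valid_per_column.values)
```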

cumprod(dim=None)¶

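The cumprod() method helps us calculate the cumulative product along different dimensions of the array. The result has the same shape as the input, and each element holds the product of all elements up to and including that position along the chosen dimension.

Below we have first calculated the cumulative product along the 'index' dimension of the array and then in the next cell, we have calculated the cumulative product along the 'columns' dimension of the array.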

In :
arr1.cumprod(dim='index')

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ],
[0.2711126 , 0.05709809, 0.18220531, 0.84564725, 0.55627526],
[0.25236393, 0.0138431 , 0.15048555, 0.69334565, 0.38116291],
[0.06125258, 0.00868993, 0.09959918, 0.28542886, 0.19239624]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
In :
arr1.cumprod(dim='columns')

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[0.57868507, 0.45487809, 0.41116392, 0.34954568, 0.20798622],
[0.46849765, 0.03403112, 0.00685989, 0.00682366, 0.00637935],
[0.93084546, 0.22567802, 0.18639018, 0.15282119, 0.10471393],
[0.24271528, 0.15236325, 0.10084194, 0.04151349, 0.0209544 ]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'

cumsum(dim=None)¶

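The cumsum() method helps us find the cumulative sum along different dimensions of the array and is used exactly like the cumprod() method. Note that the last row of the cumulative sum along 'index' matches the result of sum(dim="index") calculated earlier.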

In :
arr1.cumsum(dim='index')

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[0.57868507, 0.78605464, 0.90389917, 0.85013705, 0.5950187 ],
[1.04718272, 0.85869348, 1.1054762 , 1.84485578, 1.52990572],
[1.97802818, 1.10113761, 1.93138816, 2.66475516, 2.2151113 ],
[2.22074346, 1.7288824 , 2.59324029, 3.07642409, 2.71987247]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
In :
arr1.cumsum(dim='columns')

Out:
<xarray.DataArray 'Array1' (index: 4, columns: 5)>
array([[0.57868507, 1.36473971, 2.26863888, 3.11877593, 3.71379463],
[0.46849765, 0.54113648, 0.74271352, 1.73743225, 2.67231928],
[0.93084546, 1.1732896 , 1.99920155, 2.81910093, 3.50430651],
[0.24271528, 0.87046007, 1.5323122 , 1.94398113, 2.44874231]])
Coordinates:
* index    (index) int64 0 1 2 3
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
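The cumulative operations are easier to follow on whole numbers. The sketch below uses a small made-up array named demo to show cumsum() and cumprod() side by side, and checks that the last entry of a cumulative sum along a dimension equals the plain sum along that dimension.

```python
import numpy as np
import xarray as xr

# Hypothetical 2x3 array: [[1, 2, 3], [4, 5, 6]].
demo = xr.DataArray(
    np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    dims=("index", "columns"),
)

# Running total down each column (along 'index').
running_sum = demo.cumsum(dim="index")

# Running product across each row (along 'columns').
running_prod = demo.cumprod(dim="columns")

# The last position along 'index' holds the full column sums.
last_row = running_sum.isel(index=-1)

print(running_sum.values)
print(running_prod.values)
print(last_row.values)
```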

corr()¶

The xr.corr() function helps us find the Pearson correlation coefficient between two arrays, optionally along a given dimension.

Below we have first calculated the correlation between two arrays of the same shape using all of their elements. Then we have calculated the correlation along the 'index' dimension and the 'columns' dimension respectively. When a dimension is given, the function takes the matching 1D slices of both 2D arrays along that dimension and computes one correlation per slice, so the named dimension disappears from the result.

In :
xr.corr(arr, arr2)

Out:
<xarray.DataArray ()>
array(0.0211588)
In :
xr.corr(arr, arr2, dim="index")

Out:
<xarray.DataArray (columns: 5)>
array([ 0.62264745, -0.33522625,  0.16205577, -0.8278302 ,  0.64970906])
Coordinates:
* columns  (columns) <U1 'A' 'B' 'C' 'D' 'E'
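The per-dimension correlation can be cross-checked against NumPy. The sketch below is an illustration only: it builds two made-up random arrays named a and b (not the tutorial's arr and arr2) and compares xr.corr() with numpy.corrcoef() computed column by column.

```python
import numpy as np
import xarray as xr

# Hypothetical random data with a fixed seed so the run is repeatable.
rng = np.random.default_rng(0)
coords = {"index": [0, 1, 2, 3], "columns": ["A", "B", "C"]}
a = xr.DataArray(rng.random((4, 3)), dims=("index", "columns"), coords=coords)
b = xr.DataArray(rng.random((4, 3)), dims=("index", "columns"), coords=coords)

# One coefficient per column: each pair of columns is correlated along 'index'.
per_column = xr.corr(a, b, dim="index")

# The same numbers computed by hand with NumPy, one column at a time.
expected = np.array(
    [np.corrcoef(a.values[:, j], b.values[:, j])[0, 1] for j in range(3)]
)

# Without a dimension, all elements of both arrays are used.
overall = xr.corr(a, b)
expected_overall = np.corrcoef(a.values.ravel(), b.values.ravel())[0, 1]

print(per_column.values)
print(float(overall))
```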
In :
xr.corr(arr, arr2, dim="columns")

Out: