Usage Examples#
import numpy as np
import pyproj
import xarray as xr
import xproj
xr.set_options(display_expand_indexes=True);
Example dataset#
We’ll use an Xarray tutorial dataset as example.
ds = xr.tutorial.load_dataset("air_temperature")
ds
<xarray.Dataset> Size: 31MB
Dimensions: (lat: 25, time: 2920, lon: 53)
Coordinates:
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
air (time, lat, lon) float64 31MB 241.2 242.5 243.5 ... 296.2 295.7
Attributes:
Conventions: COARDS
title: 4x daily NMC reanalysis (1948)
description: Data is from NMC initialized reanalysis\n(4x/day). These a...
platform: Model
references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...This example dataset has no explicit coordinate reference system (CRS) specified. It represents a subset of reanalysis data on a global latitude-longitude (gaussian) grid.
Setting the CRS#
Using the dataset loaded above in GIS applications may require setting a CRS first.
This can be done using xarray.Dataset.proj.assign_crs() that creates a new
spatial reference coordinate (if it doesn’t exist yet), i.e., a scalar
coordinate with a CRSIndex.
Here we specify the common WGS84 global geodetic CRS (EPSG code: 4326).
Note
The name of the spatial reference coordinate is arbitrary. spatial_ref is
chosen here as it is a common name (it is used by default in
rioxarray and
odc-geo).
ds_wgs84 = ds.proj.assign_crs(spatial_ref="epsg:4326")
ds_wgs84
<xarray.Dataset> Size: 31MB
Dimensions: (lat: 25, time: 2920, lon: 53)
Coordinates:
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 ... 325.0 327.5 330.0
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
* spatial_ref int64 8B 0
Data variables:
air (time, lat, lon) float64 31MB 241.2 242.5 243.5 ... 296.2 295.7
Indexes:
spatial_ref CRSIndex (crs=EPSG:4326)
Attributes:
Conventions: COARDS
title: 4x daily NMC reanalysis (1948)
description: Data is from NMC initialized reanalysis\n(4x/day). These a...
platform: Model
references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...Getting the CRS#
The easiest way to access the CRS is via the xarray.Dataset.proj.crs
property, which returns a
pyproj.CRS object.
ds_wgs84.proj.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
Alternatively, it is possible to explicitly pass the name of the spatial reference coordinate to the .proj accessor:
ds_wgs84.proj("spatial_ref").crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
Writing CRS information#
Before saving the dataset to a given format, it may be useful to write the CRS
information as coordinate metadata (attributes) so it could be later loaded by
other tools like GDAL that understand this metadata. This can be done with
xarray.Dataset.proj.write_crs_info():
ds_info = ds_wgs84.proj.write_crs_info()
ds_info.spatial_ref
<xarray.DataArray 'spatial_ref' ()> Size: 8B
array(0)
Coordinates:
* spatial_ref int64 8B 0
Indexes:
spatial_ref CRSIndex (crs=EPSG:4326)
Attributes:
crs_wkt: GEOGCRS["WGS 84",ENSEMBLE["World Geodetic System 1984 ensemble"...Conversely, the attributes of the spatial reference coordinates can be cleared
via xarray.Dataset.proj.clear_crs_info().
CRS-aware alignment#
One of the main motivations of associating a CRSIndex with a
spatial reference coordinate is to compare or align xarray objects based
on their CRS.
To illustrate how it works, let’s clone the example dataset and assume that the latitude and longitude coordinate values are relative to the WGS72 CRS (EPSG code: 4322):
ds_wgs72 = ds_wgs84.proj.assign_crs(spatial_ref="epsg:4322", allow_override=True)
Note the nice error message below when trying to combine the two datasets with
different CRS (only works if the spatial reference coordinates have the same
name). The error is raised by Xarray itself and shows the repr() of the
respective CRSIndex objects as detailed information.
ds_wgs84 + ds_wgs72
---------------------------------------------------------------------------
AlignmentError Traceback (most recent call last)
Cell In[8], line 1
----> 1 ds_wgs84 + ds_wgs72
File ~/checkouts/readthedocs.org/user_builds/xproj/conda/stable/lib/python3.11/site-packages/xarray/core/_typed_ops.py:205, in DatasetOpsMixin.__add__(self, other)
204 def __add__(self, other: DsCompatible) -> Self | DataTree:
--> 205 return self._binary_op(other, operator.add)
File ~/checkouts/readthedocs.org/user_builds/xproj/conda/stable/lib/python3.11/site-packages/xarray/core/dataset.py:7573, in Dataset._binary_op(self, other, f, reflexive, join)
7571 align_type = OPTIONS["arithmetic_join"] if join is None else join
7572 if isinstance(other, DataArray | Dataset):
-> 7573 self, other = align(self, other, join=align_type, copy=False)
7574 g = f if not reflexive else lambda x, y: f(y, x)
7575 ds = self._calculate_binary_op(g, other, join=align_type)
File ~/checkouts/readthedocs.org/user_builds/xproj/conda/stable/lib/python3.11/site-packages/xarray/structure/alignment.py:944, in align(join, copy, indexes, exclude, fill_value, *objects)
748 """
749 Given any number of Dataset and/or DataArray objects, returns new
750 objects with aligned indexes and dimension sizes.
(...) 934
935 """
936 aligner = Aligner(
937 objects,
938 join=join,
(...) 942 fill_value=fill_value,
943 )
--> 944 aligner.align()
945 return aligner.results
File ~/checkouts/readthedocs.org/user_builds/xproj/conda/stable/lib/python3.11/site-packages/xarray/structure/alignment.py:636, in Aligner.align(self)
634 self.find_matching_indexes()
635 self.find_matching_unindexed_dims()
--> 636 self.align_indexes()
637 self.assert_unindexed_dim_sizes_equal()
639 if self.join == "override":
File ~/checkouts/readthedocs.org/user_builds/xproj/conda/stable/lib/python3.11/site-packages/xarray/structure/alignment.py:462, in Aligner.align_indexes(self)
455 raise AlignmentError(
456 "cannot align objects with join='exact' where "
457 "index/labels/sizes are not equal along "
458 "these coordinates (dimensions): "
459 + ", ".join(f"{name!r} {dims!r}" for name, dims in key[0])
460 )
461 joiner = self._get_index_joiner(index_cls)
--> 462 joined_index = joiner(matching_indexes)
463 if self.join == "left":
464 joined_index_vars = matching_index_vars[0]
File ~/checkouts/readthedocs.org/user_builds/xproj/conda/stable/lib/python3.11/site-packages/xproj/index.py:96, in CRSIndex.join(self, other, how)
92 def join(self, other: CRSIndex, how: str = "inner") -> CRSIndex:
93 # If this method is called during Xarray alignment, it means that the
94 # equality check failed. Instead of a NotImplementedError we raise a
95 # ValueError with a nice error message.
---> 96 raise AlignmentError(
97 "Objects to align do not have the same CRS\n"
98 f"first index:\n{self!r}\n\nsecond index:\n{other!r}"
99 )
AlignmentError: Objects to align do not have the same CRS
first index:
CRSIndex
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
second index:
CRSIndex
<Geographic 2D CRS: EPSG:4322>
Name: WGS 72
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1972
- Ellipsoid: WGS 72
- Prime Meridian: Greenwich
Since it relies on Xarray indexes, CRS-based auto-alignment also works
seamlessly for many operations such as xarray.concat():
xr.concat([ds_wgs84.isel(lat=[0, 1]), ds_wgs84.isel(lat=[-1])], "lat")
<xarray.Dataset> Size: 4MB
Dimensions: (time: 2920, lat: 3, lon: 53)
Coordinates:
* lat (lat) float32 12B 75.0 72.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 ... 325.0 327.5 330.0
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
* spatial_ref int64 8B 0
Data variables:
air (time, lat, lon) float64 4MB 241.2 242.5 243.5 ... 296.2 295.7
Indexes:
spatial_ref CRSIndex (crs=EPSG:4326)
Attributes:
Conventions: COARDS
title: 4x daily NMC reanalysis (1948)
description: Data is from NMC initialized reanalysis\n(4x/day). These a...
platform: Model
references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...It is also possible to combine heterogeneous geospatial Datasets (e.g., raster, vector, grid, mesh) as long as they all have an indexed spatial reference coordinate with the same name:
# lat-lon trajectory
ds_traj = xr.Dataset(
coords={
"latitude": ("traj", [73, 80, 76]),
"longitude": ("traj", [220, 260, 270]),
},
)
ds_traj_wgs84 = ds_traj.proj.assign_crs(spatial_ref="epsg:4326")
# merge the lat-lon grid with the vector data cube
xr.merge([ds_wgs84, ds_traj_wgs84])
<xarray.Dataset> Size: 31MB
Dimensions: (lat: 25, time: 2920, lon: 53, traj: 3)
Coordinates:
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 ... 325.0 327.5 330.0
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
* spatial_ref int64 8B 0
latitude (traj) int64 24B 73 80 76
longitude (traj) int64 24B 220 260 270
Dimensions without coordinates: traj
Data variables:
air (time, lat, lon) float64 31MB 241.2 242.5 243.5 ... 296.2 295.7
Indexes:
spatial_ref CRSIndex (crs=EPSG:4326)
Attributes:
Conventions: COARDS
title: 4x daily NMC reanalysis (1948)
description: Data is from NMC initialized reanalysis\n(4x/day). These a...
platform: Model
references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...Clearing the CRS#
Just drop the spatial reference coordinate:
ds_no_crs = ds_wgs84.drop_vars("spatial_ref")
ds_no_crs
<xarray.Dataset> Size: 31MB
Dimensions: (lat: 25, time: 2920, lon: 53)
Coordinates:
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
air (time, lat, lon) float64 31MB 241.2 242.5 243.5 ... 296.2 295.7
Attributes:
Conventions: COARDS
title: 4x daily NMC reanalysis (1948)
description: Data is from NMC initialized reanalysis\n(4x/day). These a...
platform: Model
references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...Note that it is possible to combine datasets with and without a defined CRS. The resulting dataset will have the common CRS found among all datasets.
ds_wgs84 + ds_no_crs
<xarray.Dataset> Size: 31MB
Dimensions: (lat: 25, lon: 53, time: 2920)
Coordinates:
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 ... 325.0 327.5 330.0
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:00:00
* spatial_ref int64 8B 0
Data variables:
air (time, lat, lon) float64 31MB 482.4 485.0 487.0 ... 592.4 591.4
Indexes:
spatial_ref CRSIndex (crs=EPSG:4326)Multi-CRS Dataset or DataArray#
Xproj supports Dataset or DataArray objects with multiple spatial reference coordinates.
Warning
This feature is experimental and may be removed in the future.
One possible use case is having a dataset with both geographic and projected coordinates, e.g.,
projected_crs = pyproj.CRS.from_epsg(4087)
p = pyproj.Proj(projected_crs)
lonlon, latlat = np.meshgrid(ds.lon, ds.lat)
xx, yy = p(lonlon - 360, latlat)
ds_both = (
ds_wgs84
.rename_vars(spatial_ref="geographic_ref")
.assign_coords(x=(("lat", "lon"), xx), y=(("lat", "lon"), yy))
.proj.assign_crs(projected_ref=projected_crs)
)
ds_both
<xarray.Dataset> Size: 31MB
Dimensions: (lat: 25, time: 2920, lon: 53)
Coordinates:
* lat (lat) float32 100B 75.0 72.5 70.0 67.5 ... 20.0 17.5 15.0
* lon (lon) float32 212B 200.0 202.5 205.0 ... 325.0 327.5 330.0
* time (time) datetime64[ns] 23kB 2013-01-01 ... 2014-12-31T18:0...
* geographic_ref int64 8B 0
x (lat, lon) float64 11kB -1.781e+07 -1.753e+07 ... -3.34e+06
y (lat, lon) float64 11kB 8.349e+06 8.349e+06 ... 1.67e+06
* projected_ref int64 8B 0
Data variables:
air (time, lat, lon) float64 31MB 241.2 242.5 ... 296.2 295.7
Indexes:
geographic_ref CRSIndex (crs=EPSG:4326)
projected_ref CRSIndex (crs=EPSG:4087)
Attributes:
Conventions: COARDS
title: 4x daily NMC reanalysis (1948)
description: Data is from NMC initialized reanalysis\n(4x/day). These a...
platform: Model
references: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...Trying to access the (unique) CRS of the dataset raises an error:
ds_both.proj.crs
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[14], line 1
----> 1 ds_both.proj.crs
File ~/checkouts/readthedocs.org/user_builds/xproj/conda/stable/lib/python3.11/site-packages/xproj/accessor.py:258, in ProjAccessor.crs(self)
256 self._crs = next(iter(all_crs.values()))
257 else:
--> 258 raise ValueError(
259 "found multiple CRS in Dataset or DataArray:\n"
260 + "\n".join(f"{name}: {crs.to_string()}" for name, crs in all_crs.items())
261 )
263 return self._crs
ValueError: found multiple CRS in Dataset or DataArray:
geographic_ref: EPSG:4326
projected_ref: EPSG:4087
To access either the geographic or the projected CRS, the name of the
corresponding spatial reference coordinate must be passed explicitly to the
.proj accessor:
ds_both.proj("projected_ref").crs
<Projected CRS: EPSG:4087>
Name: WGS 84 / World Equidistant Cylindrical
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Coordinate Operation:
- name: World Equidistant Cylindrical
- method: Equidistant Cylindrical
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich