IAGOS evaluation dataset

The IAGOS evaluation dataset included in ContrailBench provides a gridded dataset of PCR observations based on in-situ measurements from the IAGOS program. This dataset is used to calculate forecast hit rates in ContrailBench benchmarks. See preprocess_iagos.py for details on how this data is prepared from IAGOS measurements.

The IAGOS evaluation dataset is available alongside other evaluation datasets in a public cloud bucket (gs://contrailbench-public-data). This notebook shows:

- How to load IAGOS evaluation data using Pandas
- How to interpret the contents of the IAGOS evaluation dataset
- How to use the IAGOS evaluation dataset to compute ContrailBench hit rate metrics

[1]:

import datetime
import os

import aiohttp
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

Loading IAGOS evaluation data

IAGOS evaluation data is sharded by time (one file per hour) and altitude (one file per flight level). Coverage for each ContrailBench release cycle is as follows:

Release	Times	Altitudes	GCS path
v1	Jan-Dec 2024	FL270-440	`gs://contrailbench-public-data/v1/iagos`

Because gridded IAGOS data is relatively sparse, the dataset is stored as parquet files with one row for each grid cell with non-zero flight distance. These files can be read directly from GCS with Pandas.

Warning: be very careful about timezone-naive datetimes when using ContrailBench evaluation datasets. Python’s datetime module assumes that naive datetimes represent local time, which affects the POSIX timestamp returned by datetime.timestamp().
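The pitfall above can be illustrated with a small sketch. It assumes the shard naming scheme used in the loading cell in this notebook (POSIX timestamp, underscore, flight level, `.pq`); the helper `shard_name` is hypothetical and not part of ContrailBench. It rejects naive datetimes so that a local-time interpretation cannot silently select the wrong file.

```python
import datetime


def shard_name(time: datetime.datetime, flight_level: int) -> str:
    """Build a shard file name (hypothetical helper, not a ContrailBench API)."""
    # A naive datetime would be interpreted as local time by .timestamp(),
    # so refuse it rather than risk pointing at the wrong hour.
    if time.tzinfo is None or time.utcoffset() is None:
        raise ValueError("time must be timezone-aware (e.g. tzinfo=UTC)")
    return f"{int(time.timestamp())}_{flight_level}.pq"


# Timezone-aware: the result is the same on every machine.
aware = datetime.datetime(2024, 6, 1, 1, tzinfo=datetime.timezone.utc)
print(shard_name(aware, 350))  # 1717203600_350.pq

# Naive: .timestamp() depends on the machine's local timezone.
naive = datetime.datetime(2024, 6, 1, 1)
try:
    shard_name(naive, 350)
except ValueError as exc:
    print(f"rejected: {exc}")

# If a naive value is known to be in UTC, attach the timezone explicitly.
fixed = naive.replace(tzinfo=datetime.timezone.utc)
print(shard_name(fixed, 350) == shard_name(aware, 350))  # True
```

Raising on naive input is one possible design choice; converting with an explicit `replace(tzinfo=...)` at the call site keeps the assumption about the source timezone visible.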

[2]:

gcs_path = "gs://contrailbench-public-data/v1/iagos"
time = datetime.datetime(2024, 6, 1, 1, tzinfo=datetime.UTC)
flight_level = 350

df = pd.read_parquet(f"{gcs_path}/{int(time.timestamp())}_{flight_level}.pq")
df

[2]:

	longitude	latitude	pcr_distance	total_distance
0	-54.50	-32.50	8446.716965	8446.716965
1	-54.25	-32.50	7402.788859	7402.788859
2	-54.25	-32.25	22223.452812	22223.452812
3	-54.00	-32.25	23174.986600	23174.986600
4	-54.00	-32.00	7395.536809	7395.536809
5	-53.75	-32.00	32273.638929	32273.638929
6	-53.75	-31.75	2062.059471	2062.059471
7	-53.50	-31.75	32962.845584	32962.845584
8	-53.50	-31.50	4116.337698	4116.337698
9	-53.25	-31.50	31893.240717	31893.240717
10	-53.25	-31.25	6142.800760	6142.800760
11	-53.00	-31.25	29757.554541	29757.554541
12	-53.00	-31.00	8182.882624	8182.882624
13	-52.75	-31.00	25642.348014	26649.458313
14	-52.75	-30.75	12335.472745	12335.472745
15	-52.50	-30.75	22753.039993	23798.840797
16	-52.50	-30.50	14501.361623	14501.361623
17	-52.25	-30.50	20736.577725	20736.577725
18	-52.25	-30.25	17630.198242	17630.198242
19	-52.00	-30.25	17644.237722	17644.237722
20	-52.00	-30.00	20785.444459	20785.444459
21	-51.75	-30.00	13499.483442	13499.483442
22	-51.75	-29.75	2069.563041	16627.123659
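As a rough illustration of how the two distance columns relate, the sketch below computes the share of flown distance that was inside a PCR for a few rows copied from the table above. It assumes, based only on the column names, that pcr_distance is the part of total_distance flown in PCR conditions; the helper `pcr_fraction` is hypothetical.

```python
# Rows copied from the table above:
# (longitude, latitude, pcr_distance, total_distance)
rows = [
    (-53.00, -31.00, 8182.882624, 8182.882624),
    (-52.75, -31.00, 25642.348014, 26649.458313),
    (-51.75, -29.75, 2069.563041, 16627.123659),
]


def pcr_fraction(pcr_distance: float, total_distance: float) -> float:
    """Share of the distance flown in a grid cell that was inside a PCR.

    Assumes pcr_distance <= total_distance and total_distance > 0, which
    holds for every row stored in the sparse dataset shown above.
    """
    return pcr_distance / total_distance


fractions = [pcr_fraction(pcr, total) for _, _, pcr, total in rows]
for (lon, lat, _, _), frac in zip(rows, fractions):
    print(f"({lon:7.2f}, {lat:7.2f}): {frac:.3f}")
```

Note that the hit rate calculation later in this notebook does not use this fraction: it treats any grid cell with non-zero pcr_distance as an observed PCR.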

Use in ContrailBench metrics

The IAGOS evaluation dataset is used in PCR benchmarks to calculate forecast hit rates, a proxy for the effectiveness of avoidance.

This notebook uses the Contrails.org forecast for an example calculation. See the Contrails.org example notebook for details about accessing and preprocessing the Contrails.org forecast.

[5]:

url = "https://api.contrails.org/v1/grids"
params = {
    "aircraft_class": "default",
    "flight_level": str(flight_level),
    "time": time.strftime("%Y-%m-%dT%H"),
    "units": "ef_per_m",
}
headers = {"x-api-key": os.environ["CONTRAILS_API_KEY"]}

async with (
    aiohttp.ClientSession(raise_for_status=True) as session,
    session.get(url, params=params, headers=headers) as resp,
):
    content = await resp.read()

with open("forecast.nc", "wb") as f:
    f.write(content)
ds = xr.open_dataset("forecast.nc")

ds["pcr"] = ds["ef_per_m"] != 0
processed = ds[["pcr"]]

Forecast hit rates can be calculated with xarray's vectorized (pointwise) indexing: passing DataArray indexers that share a dimension to sel() selects one forecast value per observed grid cell. This notebook computes hit rates treating all grid cells with non-zero pcr_distance as observed PCRs and weighting grid cells by area, consistent with ContrailBench benchmarks.

[6]:

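# Keep only grid cells where IAGOS observed a PCR (non-zero pcr_distance)
observed = df.loc[df["pcr_distance"] > 0]
target_lon = xr.DataArray(observed["longitude"], dims="row")
target_lat = xr.DataArray(observed["latitude"], dims="row")
# On a regular lat-lon grid, cell area is proportional to cos(latitude)
area = xr.DataArray(np.cos(np.deg2rad(observed["latitude"])), dims="row")

# Pointwise selection: one forecast value per observed grid cell
pcr = processed["pcr"].sel(longitude=target_lon, latitude=target_lat)
fcst_area = area.where(pcr).sum().item()  # observed PCR area the forecast also flags
tot_area = area.sum().item()  # total observed PCR area

print(f"Forecast hit rate: {fcst_area / tot_area:.3f}")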

Forecast hit rate: 0.697
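To make the area weighting concrete, here is a minimal dependency-free sketch of the same calculation on made-up inputs. The function name `area_weighted_hit_rate` and the example values are hypothetical; the only assumption carried over from the cell above is that each observed PCR cell is weighted by the cosine of its latitude.

```python
import math


def area_weighted_hit_rate(latitudes, forecast_hits):
    """Fraction of observed-PCR cell area that the forecast also marks as PCR.

    latitudes: latitude (degrees) of each observed-PCR grid cell.
    forecast_hits: True where the forecast marks the same cell as PCR.
    """
    # On a regular lat-lon grid, cell area is proportional to cos(latitude).
    weights = [math.cos(math.radians(lat)) for lat in latitudes]
    hit_area = sum(w for w, hit in zip(weights, forecast_hits) if hit)
    return hit_area / sum(weights)


# Made-up example: three observed-PCR cells, the forecast catches the first two.
lats = [0.0, 60.0, 60.0]
hits = [True, True, False]
rate = area_weighted_hit_rate(lats, hits)
print(f"Forecast hit rate: {rate:.3f}")  # Forecast hit rate: 0.750
```

An unweighted count would give 2/3 here; the weighted value is higher because the equatorial cell, which the forecast catches, covers twice the area of each cell at 60 degrees latitude.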
