General example

Choose and download data

Select a combination of measurements, models and auxiliaries, and fetch one day of data.

Resample to one measurement every 10 seconds (sampling_step="PT10S") and keep only the Northern Hemisphere (Latitude between 0 and 90).

[1]:
from viresclient import SwarmRequest
import datetime as dt

request = SwarmRequest()

request.set_collection("SW_OPER_MAGA_LR_1B")

request.set_products(measurements=["F","B_NEC"],
                     models=["MCO_SHA_2C", "MMA_SHA_2C-Primary", "MMA_SHA_2C-Secondary"],
                     auxiliaries=["QDLat", "QDLon"],
                     residuals=False,
                     sampling_step="PT10S")

request.set_range_filter(parameter="Latitude",
                         minimum=0,
                         maximum=90)

data = request.get_between(start_time=dt.datetime(2016,1,1),
                           end_time=dt.datetime(2016,1,2))
[1/1] Processing:  100%|██████████|  [ Elapsed: 00:01, Remaining: 00:00 ]
      Downloading: 100%|██████████|  [ Elapsed: 00:00, Remaining: 00:00 ] (0.766MB)
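
The sampling_step value "PT10S" is an ISO 8601 duration string. As a rough illustration of what it encodes, the sketch below parses such strings with the standard library only. The helper parse_iso_duration is hypothetical (it is not part of viresclient) and handles only hours, minutes and seconds.

```python
import re
from datetime import timedelta


def parse_iso_duration(text):
    """Parse a simple ISO 8601 time duration such as 'PT10S' or 'PT1M30S'.

    Hypothetical helper for illustration; only H, M and S fields are supported.
    """
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


step = parse_iso_duration("PT10S")
# Number of samples in one full day at this sampling step
samples_per_day = int(timedelta(days=1) / step)
print(step, samples_per_day)
```

A full day at a 10 second step is 8640 samples. The request here returns 4256 rows, roughly half of that, which is consistent with the Latitude filter keeping only the Northern Hemisphere part of each orbit.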

Save the data directly

You can save the downloaded data directly to a file, then open it with other software.

[2]:
data.to_file('testfile.cdf')
Data written to testfile.cdf

Convert to a pandas DataFrame

pandas offers many useful features for working with 2D labelled time series data.

[3]:
df = data.as_dataframe()
df.head()
[3]:
Spacecraft Latitude Longitude Radius F F_MCO_SHA_2C F_MMA_SHA_2C-Primary F_MMA_SHA_2C-Secondary B_NEC B_NEC_MCO_SHA_2C B_NEC_MMA_SHA_2C-Primary B_NEC_MMA_SHA_2C-Secondary QDLat QDLon
2016-01-01 00:28:00 A 0.197610 -102.681841 6828390.60 24630.6312 24795.742996 127.134287 33.517576 [23432.3208, 2928.6274000000003, 7000.6948] [23604.765693069177, 2930.2688321265728, 7004.... [-125.78890509740265, -5.302191425378982, 17.6... [-32.5031562033331, -1.386212193226803, -8.065... 8.425159 -30.812750
2016-01-01 00:28:10 A 0.838158 -102.693946 6828288.87 24808.1073 24971.853392 127.197315 33.718966 [23451.1801, 2899.534, 7554.4035] [23623.327008268625, 2901.392633804825, 7557.3... [-125.60166979171441, -5.252627485291866, 19.3... [-32.494418615582255, -1.3566476193699097, -8.... 9.068963 -30.902256
2016-01-01 00:28:20 A 1.478724 -102.706041 6828186.32 24993.3709 25155.643730 127.257710 33.932493 [23465.812100000003, 2871.8025000000002, 8109.... [23637.6657342565, 2873.1207904780367, 8112.48... [-125.38835978948227, -5.203014637815361, 21.1... [-32.47777846302375, -1.3273711242288924, -9.7... 9.713511 -30.990784
2016-01-01 00:28:30 A 2.119308 -102.718118 6828082.97 25186.2207 25346.943757 127.315462 34.157878 [23476.3596, 2844.1066, 8666.6296] [23647.710917671076, 2845.447078036548, 8669.2... [-125.14900320901313, -5.153357531132542, 22.8... [-32.45318157347463, -1.2983807863969918, -10.... 10.358768 -31.078363
2016-01-01 00:28:40 A 2.759911 -102.730169 6827978.83 25386.4755 25545.573072 127.370566 34.394821 [23482.5073, 2817.0387, 9225.1501] [23653.3949273992, 2818.3663458182045, 9227.67... [-124.88363654458195, -5.103660714780139, 24.5... [-32.42057710492788, -1.2696744882122784, -11.... 11.004701 -31.165012

NB: Some of the columns contain vectors (e.g. B_NEC). pandas stores each of these cells as a separate object, so df["B_NEC"].values is an array of arrays rather than a 2D array, which is awkward to work with. It is better to “expand” these columns into one column per component using the expand keyword argument:

[4]:
df = data.as_dataframe(expand=True)
df.head()
[4]:
Spacecraft Latitude Longitude Radius F F_MCO_SHA_2C F_MMA_SHA_2C-Primary F_MMA_SHA_2C-Secondary B_NEC_N B_NEC_E ... B_NEC_MCO_SHA_2C_E B_NEC_MCO_SHA_2C_C B_NEC_MMA_SHA_2C-Primary_N B_NEC_MMA_SHA_2C-Primary_E B_NEC_MMA_SHA_2C-Primary_C B_NEC_MMA_SHA_2C-Secondary_N B_NEC_MMA_SHA_2C-Secondary_E B_NEC_MMA_SHA_2C-Secondary_C QDLat QDLon
2016-01-01 00:28:00 A 0.197610 -102.681841 6828390.60 24630.6312 24795.742996 127.134287 33.517576 23432.3208 2928.6274 ... 2930.268832 7004.101076 -125.788905 -5.302191 17.668192 -32.503156 -1.386212 -8.065428 8.425159 -30.812750
2016-01-01 00:28:10 A 0.838158 -102.693946 6828288.87 24808.1073 24971.853392 127.197315 33.718966 23451.1801 2899.5340 ... 2901.392634 7557.367509 -125.601670 -5.252627 19.385239 -32.494419 -1.356648 -8.901737 9.068963 -30.902256
2016-01-01 00:28:20 A 1.478724 -102.706041 6828186.32 24993.3709 25155.643730 127.257710 33.932493 23465.8121 2871.8025 ... 2873.120790 8112.480942 -125.388360 -5.203015 21.100059 -32.477778 -1.327371 -9.738896 9.713511 -30.990784
2016-01-01 00:28:30 A 2.119308 -102.718118 6828082.97 25186.2207 25346.943757 127.315462 34.157878 23476.3596 2844.1066 ... 2845.447078 8669.299689 -125.149003 -5.153358 22.812207 -32.453182 -1.298381 -10.576666 10.358768 -31.078363
2016-01-01 00:28:40 A 2.759911 -102.730169 6827978.83 25386.4755 25545.573072 127.370566 34.394821 23482.5073 2817.0387 ... 2818.366346 9227.677018 -124.883637 -5.103661 24.521235 -32.420577 -1.269674 -11.414808 11.004701 -31.165012

5 rows × 22 columns
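
To see why the unexpanded form is awkward, here is a small sketch on a synthetic DataFrame. The two rows are copied from the B_NEC column shown earlier, and df_demo is a stand-in rather than data fetched from the server. It only imitates the idea behind expand=True by hand; it is not how viresclient implements it.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: each cell of the B_NEC column holds a 3-element vector
index = pd.to_datetime(["2016-01-01 00:28:00", "2016-01-01 00:28:10"])
df_demo = pd.DataFrame(
    {"B_NEC": [np.array([23432.3208, 2928.6274, 7000.6948]),
               np.array([23451.1801, 2899.5340, 7554.4035])]},
    index=index,
)

# The column is a 1D object array whose elements are themselves arrays
print(df_demo["B_NEC"].values.dtype, df_demo["B_NEC"].values.shape)

# Stack the per-row vectors into a proper 2D float array ...
b_nec = np.stack(df_demo["B_NEC"].values)

# ... and give each component its own column, as in the expanded output
expanded = pd.DataFrame(
    b_nec, index=df_demo.index, columns=["B_NEC_N", "B_NEC_E", "B_NEC_C"]
)
print(expanded)
```

With one scalar column per component, ordinary column selection and arithmetic work without any unpacking.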

Extract numpy arrays

The underlying numpy arrays are accessible through the .values attribute of a column, or of a selection of columns:

[5]:
df['B_NEC_N'].values
[5]:
array([23432.3208, 23451.1801, 23465.8121, ..., 19454.4269, 19271.3803,
       19084.199 ])
[6]:
df[['B_NEC_N', 'B_NEC_E', 'B_NEC_C']].values
[6]:
array([[23432.3208,  2928.6274,  7000.6948],
       [23451.1801,  2899.534 ,  7554.4035],
       [23465.8121,  2871.8025,  8109.7535],
       ...,
       [19454.4269,   824.8021, 32624.9088],
       [19271.3803,   808.9134, 33094.2485],
       [19084.199 ,   792.9848, 33559.0953]])
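
Once the components are plain numpy arrays, ordinary numpy operations apply. As a sketch, the block below computes the vector magnitude for the first two rows (values copied from the outputs above) and compares it with the scalar intensity F. With the real DataFrame, B would be df[['B_NEC_N', 'B_NEC_E', 'B_NEC_C']].values and F would be df['F'].values. The two quantities are expected to be close but not identical.

```python
import numpy as np

# First two rows of B_NEC (N, E, C components) and F, copied from the outputs above (nT)
B = np.array([[23432.3208, 2928.6274, 7000.6948],
              [23451.1801, 2899.5340, 7554.4035]])
F = np.array([24630.6312, 24808.1073])

# Euclidean norm along the component axis gives one magnitude per row
B_magnitude = np.linalg.norm(B, axis=1)

# Difference between the vector magnitude and the scalar intensity
print(B_magnitude - F)
```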

pandas DataFrames are poorly suited to higher-dimensional data, which is the motivation for using xarray: http://xarray.pydata.org/en/stable/faq.html#why-is-pandas-not-enough

Convert to an xarray Dataset

xarray extends the power of pandas to N-dimensional data but is more complex to work with.

[7]:
ds = data.as_xarray()
ds
[7]:
<xarray.Dataset>
Dimensions:                     (Timestamp: 4256, dim: 3)
Coordinates:
  * Timestamp                   (Timestamp) datetime64[ns] 2016-01-01T00:28:00 ... 2016-01-01T23:59:50
Dimensions without coordinates: dim
Data variables:
    Spacecraft                  (Timestamp) <U1 'A' 'A' 'A' 'A' ... 'A' 'A' 'A'
    Latitude                    (Timestamp) float64 0.1976 0.8382 ... 30.46 31.1
    Longitude                   (Timestamp) float64 -102.7 -102.7 ... -95.37
    Radius                      (Timestamp) float64 6.828e+06 ... 6.823e+06
    F                           (Timestamp) float64 2.463e+04 ... 3.861e+04
    F_MCO_SHA_2C                (Timestamp) float64 2.48e+04 ... 3.861e+04
    F_MMA_SHA_2C-Primary        (Timestamp) float64 127.1 127.2 ... 43.23 43.24
    F_MMA_SHA_2C-Secondary      (Timestamp) float64 33.52 33.72 ... 4.576 4.604
    B_NEC                       (Timestamp, dim) float64 2.343e+04 ... 3.356e+04
    B_NEC_MCO_SHA_2C            (Timestamp, dim) float64 2.36e+04 ... 3.354e+04
    B_NEC_MMA_SHA_2C-Primary    (Timestamp, dim) float64 -125.8 -5.302 ... 28.31
    B_NEC_MMA_SHA_2C-Secondary  (Timestamp, dim) float64 -32.5 -1.386 ... -4.19
    QDLat                       (Timestamp) float64 8.425 9.069 ... 40.18 40.81
    QDLon                       (Timestamp) float64 -30.81 -30.9 ... -24.81
Attributes:
    Sources:         ['SW_OPER_MAGA_LR_1B_20160101T000000_20160101T235959_050...
    MagneticModels:  ['MCO_SHA_2C = MCO_SHA_2C(max_degree=18,min_degree=1)', ...
    RangeFilters:    ['Latitude:0,90']

ds now contains an xarray Dataset which stores all the data variables with an associated “coordinate” of Timestamp. The dataset itself comprises DataArray objects:

[8]:
ds["B_NEC"]
[8]:
<xarray.DataArray 'B_NEC' (Timestamp: 4256, dim: 3)>
array([[23432.3208,  2928.6274,  7000.6948],
       [23451.1801,  2899.534 ,  7554.4035],
       [23465.8121,  2871.8025,  8109.7535],
       ...,
       [19454.4269,   824.8021, 32624.9088],
       [19271.3803,   808.9134, 33094.2485],
       [19084.199 ,   792.9848, 33559.0953]])
Coordinates:
  * Timestamp  (Timestamp) datetime64[ns] 2016-01-01T00:28:00 ... 2016-01-01T23:59:50
Dimensions without coordinates: dim

To extract numpy arrays:

[9]:
ds["B_NEC"].values
[9]:
array([[23432.3208,  2928.6274,  7000.6948],
       [23451.1801,  2899.534 ,  7554.4035],
       [23465.8121,  2871.8025,  8109.7535],
       ...,
       [19454.4269,   824.8021, 32624.9088],
       [19271.3803,   808.9134, 33094.2485],
       [19084.199 ,   792.9848, 33559.0953]])
[10]:
X,Y,Z = (ds["B_NEC"][:,i].values for i in (0,1,2))
X,Y,Z
[10]:
(array([23432.3208, 23451.1801, 23465.8121, ..., 19454.4269, 19271.3803,
        19084.199 ]),
 array([2928.6274, 2899.534 , 2871.8025, ...,  824.8021,  808.9134,
         792.9848]),
 array([ 7000.6948,  7554.4035,  8109.7535, ..., 32624.9088, 33094.2485,
        33559.0953]))
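
As an alternative to positional indexing, xarray can select by dimension name and by coordinate label. The sketch below builds a small synthetic Dataset with the same layout as ds (ds_demo is a stand-in using the first three rows shown above, not the downloaded data) and uses isel and sel on it.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in with the same dimension names as the downloaded Dataset
times = np.array(
    ["2016-01-01T00:28:00", "2016-01-01T00:28:10", "2016-01-01T00:28:20"],
    dtype="datetime64[ns]",
)
b_nec = np.array([[23432.3208, 2928.6274, 7000.6948],
                  [23451.1801, 2899.5340, 7554.4035],
                  [23465.8121, 2871.8025, 8109.7535]])
ds_demo = xr.Dataset(
    {"B_NEC": (("Timestamp", "dim"), b_nec)},
    coords={"Timestamp": times},
)

# Select the third (C) component by dimension name rather than by axis position
Z = ds_demo["B_NEC"].isel({"dim": 2})

# Select a time window by label; both ends of the slice are included
first_two = ds_demo.sel(Timestamp=slice("2016-01-01T00:28:00", "2016-01-01T00:28:10"))

print(Z.values)
print(first_two.sizes["Timestamp"])
```

Selecting by name keeps the code readable when the order of dimensions is not obvious from context.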

Work on xarray objects directly

Calculate the custom residual B_{res} = B_{obs} - B_{MCO} - B_{MMA}, where B_{MMA} is the sum of the MMA_SHA_2C-Primary and MMA_SHA_2C-Secondary model values, and plot the Z component against time. NB: This residual can also be calculated directly on the server.

[11]:
B_res = (ds["B_NEC"]
         - ds["B_NEC_MCO_SHA_2C"]
         - ds["B_NEC_MMA_SHA_2C-Primary"]
         - ds["B_NEC_MMA_SHA_2C-Secondary"])
B_res
[11]:
<xarray.DataArray (Timestamp: 4256, dim: 3)>
array([[-14.152832,   5.046971, -13.00904 ],
       [-14.05082 ,   4.750641, -13.44751 ],
       [-13.987496,   5.212095, -14.088606],
       ...,
       [ -0.065038,  -2.56282 ,  -3.695835],
       [  0.583051,  -3.247   ,  -4.246418],
       [  1.325767,  -4.054801,  -4.315355]])
Coordinates:
  * Timestamp  (Timestamp) datetime64[ns] 2016-01-01T00:28:00 ... 2016-01-01T23:59:50
Dimensions without coordinates: dim
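
As a sanity check, the same arithmetic can be reproduced by hand for the first row using plain numpy. The values below are copied (rounded to six decimals) from the DataFrame outputs earlier on this page, so the result should agree with the first row of B_res to within rounding.

```python
import numpy as np

# First row (2016-01-01 00:28:00), N/E/C components in nT, copied from the tables above
B_obs = np.array([23432.3208, 2928.6274, 7000.6948])
B_mco = np.array([23604.765693, 2930.268832, 7004.101076])
B_mma_primary = np.array([-125.788905, -5.302191, 17.668192])
B_mma_secondary = np.array([-32.503156, -1.386212, -8.065428])

# Observation minus core field minus both magnetospheric contributions
residual = B_obs - B_mco - B_mma_primary - B_mma_secondary
print(residual)
```
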
[ ]:
%matplotlib inline
B_res[:,2].plot(x='Timestamp');