Paleodata-model comparison with Pyleoclim?

Hi LinkedEarth team,

Unfortunately, I won’t be able to attend PaleoHack 4 due to a scheduling conflict, but I’m hoping to get some help with the data-model comparison portion of my project using Pyleoclim. I thought this would be the best place to ask my questions, and I’m still open to collaboration if anyone is interested!

At the moment, my main issue is figuring out my workflow since the language of Series/MultipleSeries/EnsembleSeries in Pyleoclim is a little unfamiliar to me. Apologies in advance for the infodump, but I want to provide as much detail as possible to get the most relevant help.

To give you some context about my project:

  1. I have 36 real speleothem δ18O datasets, each accompanied by Bchron age ensembles, in LiPD format. My goal is to compare them with 72 speleothem δ18O pseudoproxy time series obtained from the 36 nearest co-located model grid sites over two 100-year-long iCESM experiments: a control simulation and a freshwater “hosing” simulation.
  2. The pseudoproxies were generated using the speleothem PSM in PRYSM 1.0. For each pseudoproxy site, I created two sets of time series based on the ‘Adv-Disp’ and ‘Well-Mixed’ aquifer recharge models from the PSM. In total, this resulted in four pseudoproxy time series per site: two from the control simulation with each recharge model and two from the “hosing” simulation with each recharge model.
  3. I introduced Gaussian noise to each pseudoproxy time series by generating 1000 realizations of noisy annual data for each 100-year simulation to emulate climate-related noise. So, for each time series, I now have a list of 100 numpy arrays with 1000 values each.
  4. To account for uncertainties stemming from aquifer recharge model choice, I created a combined time series for each site by merging the median ‘Adv-Disp’ and ‘Well-Mixed’ time series with their respective noisy data.

I could use some guidance and assistance with the following:

  1. EnsembleSeries from pseudoproxy “ensemble” arrays?:
    I know how to create a basic Series in Pyleoclim, but I’m not sure how to create an EnsembleSeries, which might be useful for handling my various pseudoproxy datasets. Can you help me understand how to create an EnsembleSeries, please?
  2. Handling LiPD age ensemble data efficiently?:
    Currently, loading and working with age ensembles in Pyleoclim is a very time- and memory-intensive process, given the large amount of data I have. Do you have any tips or techniques to handle LiPD age ensembles more efficiently? I’d love to know any specific approaches that can speed things up.
  3. Mapping a chronEnsemble to a pseudoproxy “ensemble”?:
    Does Pyleoclim have a feature to map a chronEnsemble to pseudoproxy data? This would be really helpful for comparing my real speleothem datasets to my model simulations. If it’s possible, I’d appreciate some guidance on how to do it effectively.

I would really appreciate any help, tips, or insights you can offer on these issues, and I’m open to suggestions or collaborations that can help me improve my project. Thank you in advance for any assistance you can offer!

Hi @AndreaLemur , yes, you came to the right place!

Kudos for managing to use the original PRYSM code! I believe you’ll find PRYSM-API to be a bit of an improvement: https://zenodo.org/record/3257964#.ZFsWaC-B1Ms.

Re: your questions

  1. please look at this tutorial.. Basically: you write a loop which appends each of your series to a list, then enclose the list into a pyleo.EnsembleSeries object. This unlocks the door to ensemble methods, which are still relatively primitive, so we welcome requests!

  2. glad you asked! @alexkjames just made a new PyleoTutorial about this today, leveraging the brand-new (and still actively developing) pyLipd package. It should be available in HTML form by tomorrow. I believe pyLipd makes working with ensembles a lot easier than the old LiPD utilities. If you need help, I’m sure Alex would be willing to jump on Zoom and help you out.

  3. I believe the answer is to cast them both as EnsembleSeries ; hopefully, the new tutorial posted above provides a map to doing that, but we can try and grease the wheels a little.

It’s a shame you can’t make it to PaleoHack 4, as this is exactly the sort of thing we’ll be doing there!

Julien

Hi @jeg, thanks for the info! It’s been helpful.

I ran into a hurdle just now trying to compare my model output to a proxy time series: their very different time scales. Each of my two simulations (control and hosing) is 100 model years long, but my proxy data sets are 1000s of years long. Obviously, that raised this error:

ValueError: At least one series has no common time interval with others. Please check the time axis of the series.

To solve this issue, I tried to bin the proxy time series into 100-year windows (to do a rolling comparison over) like so:

speleo_ts = speleo_list[0] # list of LipdSeries; e.g., 'ABC-1.Anjohibe.Duan.2021' = speleo_list[0]
speleo_ts_binned = speleo_ts.bin(x=speleo_ts.time,y=speleo_ts.value,bin_size=100)

But that caused this error:

Traceback (most recent call last):

  Cell In[325], line 1
    speleo_ts_binned = speleo_ts.bin(x=speleo_ts.time,y=speleo_ts.value,bin_size=100)

  File ~\miniconda3\envs\spyder-env\Lib\site-packages\pyleoclim\core\series.py:4065 in bin
    res_dict = tsutils.bin(self.time,self.value,**kwargs)

TypeError: bin() got multiple values for argument 'x'

‘speleo_ts.time’ is a 1-D array of 650 dates, which seems to be what the function is asking for, unless I’m badly misunderstanding it.

I really wish I could attend PaleoHack4 to ask these questions as they come up, but Graduate School graduation deadlines wait for no one. :slightly_smiling_face:

@AndreaLemur good to know you are making progress!
We’re now straying a bit from the purpose of this forum, which is not debugging. However, I think the issue here is that you are confusing the utils-level bin function and the API-level function. speleo_ts_binned = speleo_ts.bin(bin_size=100) should do the trick, as long as the units are indeed in years. If not, convert_time_unit() will be your friend. Don’t forget to check out our newly updated documentation! https://pyleoclim-util.readthedocs.io.

Good luck with your upcoming graduation deadlines, and see you on the other side, hopefully!