PaleoHack 4: Bring Your Own Project

The LinkedEarth leadership team (D. Khider, J. Emile-Geay and N. McKay) invite you to a new edition of PaleoHack, to build capacity in Python-based data science in the paleoclimate community.

Whereas previous PaleoHack editions walked participants through pre-made tutorials, this “Bring Your Own Problem” edition is aimed at early-career researchers with a science problem to solve. The workshop will feature a mix of lectures, team-building activities, and solo and group coding time, to accelerate your research progress towards a particular science goal (ideally, one or more publications). A particular interest of ours is to foster collaboration between paleoclimate modelers and observationalists, using the LiPD ecosystem and the LinkedEarth Research Hub.

Requirements for participation:

● Intermediate scientific Python proficiency (working knowledge of numpy, pandas and matplotlib a must; xarray as needed), demonstrated either through participation in a previous PaleoHack edition, an independent training certificate, or a submitted code example.
● An interesting and manageable science problem, i.e. something on which reasonable progress might be made over 4 days. Examples include science questions heretofore limited by access to data or the ability to analyze them quantitatively.
● Desire to collaborate with other ECRs (graduate students past their qualifying exam, all the way to junior faculty/staff).

We anticipate hosting 10-15 participants in sunny Marina Del Rey, CA, at the University of Southern California’s Information Sciences Institute. Travel, lodging, and food costs for participants at US-based institutions will be partially or fully funded by the National Science Foundation (grant AGS 2002518), with priority given to early-career researchers, women, and underrepresented minorities. International applicants will be considered, but cannot be financially supported by NSF.

If interested, head to PaleoHack 4, Bring Your Own Project | PaleoHack and register by March 12th, 2023.

Hi Julien,

I’m interested in applying, but curious if the lectures/exercises will be focused mainly on using Pyleoclim to answer specific questions or the scientific Python stack more broadly. If the former, is Pyleoclim still better suited to things like data-model comparison, or can it handle paleo-DA and/or machine learning processes like SOM? I will certainly have a lot of data-model comparison to do, but I’m really interested in the other stuff, and want to tailor my application appropriately.

Good to know of your interest @AndreaLemur .
No exercises here, and very minimal lectures, more like tutorials to give people pointers to various resources they may not be aware of. Rather, the exercise is getting your research done: you bring your own project and we help you figure out solutions to it in the Python software ecosystem. Pyleoclim may or may not be a piece of it, though we like to think that wherever timeseries are involved, Pyleoclim can be useful. What sorts of science questions do you want to investigate, and what are the technical hurdles right now?

Thanks for the clarification @jeg .

I am broadly interested in investigating the tropical hydroclimate response to the 8.2ka Event, within age and paleodata uncertainties, by comparing my synthesis of 76 published proxy datasets with output from a new iCESM1.2 hosing simulation. More specifically: (i) Where and when did hydroclimate anomalies occur in the tropics in response to the 8.2ka Event? (ii) Were these anomalies synchronous events across a regionally homogeneous hemispheric dipole, or was the hydroclimate response more complex? (iii) How does the hydroclimate variability in poorly-constrained regions like the Caribbean/Mesoamerica and the Maritime Continent compare to that of other regions with well-defined monsoon regimes like South America and East Asia?

I’m already pretty well acquainted with the LiPDverse, since I use geoChronR and actR regularly, but I would like to rope Pyleoclim (or other tools) into my data-model comparison workflow. In that regard, the technical hurdles I am facing are all related to analyzing large amounts of data in a rigorous way, and not necessarily knowing what tools are best suited to the job. I think that a cloud-based approach to analysis would help make quick(er) work of that, and I’d love the opportunity to collaborate with other ECRs on this and similar projects.
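For readers less familiar with the workflow described above: the core hurdle of proxy-model comparison often reduces to putting irregularly sampled proxy records and evenly sampled model output on a common time axis before computing any statistic. A minimal numpy sketch with synthetic data (all names and numbers here are hypothetical, purely for illustration; a real workflow would use Pyleoclim, geoChronR, or xarray, and would propagate age uncertainties):

```python
import numpy as np

# Synthetic "proxy" record: irregularly sampled ages (yr BP) around the 8.2ka Event,
# built from a smooth signal plus measurement noise (all values invented for illustration)
rng = np.random.default_rng(42)
proxy_age = np.sort(rng.uniform(7800, 8600, 40))   # irregular sampling in time
proxy_val = np.sin((proxy_age - 8200) / 80.0) + 0.1 * rng.standard_normal(40)

# Synthetic "model" output: evenly sampled decadal means over the same window
model_age = np.arange(7800, 8601, 10.0)
model_val = np.sin((model_age - 8200) / 80.0)

# Interpolate the model series onto the proxy's (irregular) time axis,
# then compare the two on that shared axis
model_on_proxy = np.interp(proxy_age, model_age, model_val)
r = np.corrcoef(proxy_val, model_on_proxy)[0, 1]
print(f"proxy-model correlation: {r:.2f}")
```

With many records, the same interpolate-then-compare step is just repeated per site, which is exactly the kind of embarrassingly parallel task that a cloud- or HPC-based workflow speeds up.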

That sounds perfectly in theme! Indeed, we’ll show you ways of accessing cloud-hosted GCM output, as long as it is publicly-archived. Will this be the case for this CESM1.2 hosing simulation?

By June, the Ammonyte package might also be added to the pile of tools for detecting abrupt changes in proxy records, and its author @alexkjames is one of the workshop instructors.

Great! I’ll put an app together.

I plan to store the iCESM output in my NCAR Campaign Storage allotment and/or my lab’s HPC cluster. I’m not sure if that meets the “publicly archived” requirement, however, so I’ll look into alternatives if necessary.

Ammonyte definitely looks promising! I look forward to learning more about its development.


Deadline has been extended to March 19th!