Earth Science

Mesh independent flow direction modeling using HexWatershed 3.0

2024-05-15T00:00:00+00:00

After attending the CSDMS annual meeting in 2022, I have continued collaborating with the CSDMS community. Special thanks to everyone who provided feedback on our HexWatershed model. This year, the annual meeting organizers contacted me again to ask whether I would be interested in attending the meeting and giving another tutorial. A lot of progress has been made over the past two years through DOE’s ICoM and InteRFACE projects. HexWatershed has also received many updates and improvements, including to the model structure and I/O. So I said yes and started to organize new materials based on our recent work.

In this year’s tutorial, after several rounds of discussion with the team, I adopted a new workflow to break the barrier between us as model developers and the users: a web browser-based Jupyter notebook. https://github.com/changliao1025/hexwatershed_tutorial

Several design choices in this workflow are worth noting:

Use mybinder (https://mybinder.org/) to host the entire environment, so users don’t need to install anything to run the code.
Separate the Python and C++ code. This is not an ideal solution, but it currently works best for me as a developer to isolate the different environments.
Adopt the latest Python package structure in PyEarth, PyFlowline, and HexWatershed. Each is maintained as a separate Python package, which makes CI/CD much easier.

Besides, in this workshop, I used a different mesh type, the DGGRID mesh, which is very popular in the GIS DGGS field but not well recognized in the geophysics field.

My collaborator Matthew G Cooper provided a lot of help in this workshop with testing and design, especially from a user’s perspective. For example, we changed the model configuration module. I also developed a new model input checking mechanism inspired by his confusion about the model domain boundary input. (Remember the Linux bash profile/bashrc nightmare?)

We had a relatively small group of attendees but still received lots of questions and suggestions, which are gradually being integrated back into our model.

What prevents us from learning new programming skills

2023-11-11T00:00:00+00:00

A couple of days ago, I ran into a Twitter post: https://x.com/VincentS/status/1722674693616910450?s=20

Note that I recently turned to Bluesky for social networking (https://bsky.app/profile/changliao.bsky.social).

If you can open the Twitter post with the figure, you might wonder what the intention of this post is. There are a few comments on the post as well. Here are some quick takeaways based on my understanding. First, some developers don’t actually write code but might provide suggestions to other developers. Other developers take the time to actually write the code. One comment also pointed out that some GitHub activities are fake because they are not programming activities but spell checks. This reminds me that some people also buy GitHub stars.

I am from an academic background, so this reflects some reality. Most senior modeling researchers do not code or don’t even have a GitHub account, yet they still claim they are modelers. In contrast, a PhD/postdoc or early career scientist may still participate in programming activities. Thus, their GitHub profile may resemble the figure’s lower part.

So, what prevents senior modelers from coding? My experience and observation provide me with several explanations:

Senior modelers don’t have time for programming. Some are busy with proposals and team building and often have to hand the heavy lifting to early-career researchers. GitHub is also a relatively new technology, and many modelers came to fame before it was born.
Senior modelers lack coding skills. This is also possible because many modelers have a mathematical or equation-based background and less experience in computer programming. I have also seen peers use Excel for modeling, which differs from the standard practice.
Senior modelers stopped learning new skills. This might be a deeper problem that most of us ignore.

I will skip reasons 1 and 2 because reason 3 feels more personal. I received most of my programming training during my undergraduate years, from 2005 to 2009. Most of my C/C++ knowledge was taught in classes. I also had some courses taught in MATLAB on image processing for remote sensing datasets. I taught myself IDL (to replace MATLAB) and C# during my master’s program between 2009 and 2012. I then picked up C++ again during my PhD from 2012 to 2017. After my PhD, I taught myself Python to replace IDL.

I use Python and C++ daily, but I still feel I need to improve my programming skills. Why? Because I still need to catch up with the latest C++ and Python features. For an Earth scientist, once you find a solution for a task, you are likely to stick with it for a long time. This is what we call a habit. I have seen peers use MATLAB and NCL and refuse to switch to Python even though they know NCL will no longer be supported.

As scientists, we must focus on the science, not the process or the solution.

On the other hand, advances in high-performance computing (HPC) often hide our limits in programming skills. I have seen peers write poorly performing code and run it on HPC. No one will question the code if it runs quickly on HPC. If the code takes a long time to run on HPC, most modelers will consider it computationally expensive. Most of us will not ask whether that is because the code was poorly written. That is also why we need FAIR, so peers can help each other improve the code.

Most organizations need a mechanism to train scientists to become better modelers, but it ultimately depends on personal career development. Since academia often only rewards publications, only some scientists will invest time in programming. To stay in the game, the others will instead use more expensive computers (which requires larger project funding) or hire early-career researchers to meet the computational demands. Once an early-career researcher becomes senior and gains access to more resources, they will do the same.

Issue in land river coupling in E3SM

2023-03-31T00:00:00+00:00

Recently when I was testing some land river coupling in E3SM, I found some longstanding issues.

When the coupler needs to send fluxes or states from one to another, to conserve mass, the process sometimes needs to consider the area associated with it.

For example, if the flux is runoff, which is expressed in mm/day, then the coupler calculates the mass as flux × area. However, within a grid cell, only part of the area is covered by land, so the area is calculated as dArea_grid × dFraction_land.

However, in earlier development, this land fraction was often set to 1.0 because lakes and rivers are small at 1.0 degree resolution (~100 km). This decision makes the river area 0.0.
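
The area bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not the E3SM coupler code: the function name, the cell area, and the 0.98 land fraction are all made-up values, and the river/lake share of the cell is assumed to be simply one minus the land fraction.

```python
def runoff_volume_m3_per_day(runoff_mm_per_day, grid_area_m2, land_fraction):
    """Convert a depth flux (mm/day) into a volume flux (m3/day).

    The area used is the part of the grid cell covered by land,
    i.e. dArea_grid x dFraction_land in the notation of the text.
    """
    land_area_m2 = grid_area_m2 * land_fraction
    return runoff_mm_per_day / 1000.0 * land_area_m2


grid_area_m2 = 1.0e10  # hypothetical cell, roughly 100 km x 100 km
runoff = 2.0           # hypothetical runoff in mm/day

# Case 1: land fraction fixed at 1.0, so nothing is left for rivers and lakes.
volume_all_land = runoff_volume_m3_per_day(runoff, grid_area_m2, 1.0)
river_area_all_land = grid_area_m2 * (1.0 - 1.0)

# Case 2: a hypothetical land fraction of 0.98 leaves 2% of the cell for water.
volume_partial = runoff_volume_m3_per_day(runoff, grid_area_m2, 0.98)
river_area_partial = grid_area_m2 * (1.0 - 0.98)

print(volume_all_land, river_area_all_land)
print(volume_partial, river_area_partial)
```

With the fraction fixed at 1.0, the river area in this sketch is exactly zero, so any calculation that later needs that area has nothing to work with.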

The problem comes when we want to transfer flux from river back to land.

Again, a design decision made at an early stage can cause problems at a later stage. This is another example of technical debt.

How to couple land and river models using an MPAS mesh

2023-03-24T00:00:00+00:00

The E3SM river component MOSART can be run on an MPAS mesh. However, MOSART requires forcing data such as surface runoff from a land model.

If the land model is not turned on, we can still run MOSART with external forcing data, which often does not use the MPAS mesh.

This article explains how to run a coupled lnd-rof-(atm) simulation with the rof on the MPAS mesh.

We need to carry out several steps, but not necessarily in the following order:

Generate the MPAS mesh-based MOSART parameters and the domain file.
Generate the envelope lnd domain using the MOSART domain file, and use this domain file as the atmosphere domain as well (somehow datm is still needed for some bad reason).
Generate the mapping between these two domain files.
Since land and river are no longer on the same grid, create a new compset/grid to reflect this.
Update the coupler so the dlnd variable can be accepted. In general, dlnd uses the mapping file to convert the stream files, which are then passed to the coupler through l2x.
Update the coupler so the rof can accept the incoming variable through x2r.
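
To make the role of the mapping file more concrete, here is a minimal Python sketch of applying a mapping between two grids. It assumes the mapping is a set of sparse weights stored as (destination index, source index, weight) triplets. The function name, the 0-based indices, and all numbers are hypothetical, and this is not how the coupler itself is implemented.

```python
def apply_mapping(rows, cols, weights, src_values, n_dst):
    """Remap values from a source grid to a destination grid.

    Each triplet (rows[k], cols[k], weights[k]) says that destination cell
    rows[k] receives weights[k] times the value of source cell cols[k].
    Indices are 0-based here for simplicity.
    """
    dst_values = [0.0] * n_dst
    for r, c, w in zip(rows, cols, weights):
        dst_values[r] += w * src_values[c]
    return dst_values


# Hypothetical example: 3 land cells mapped onto 2 river cells.
rows = [0, 0, 1, 1]             # destination (river) cell of each weight
cols = [0, 1, 1, 2]             # source (land) cell of each weight
weights = [0.6, 0.4, 0.5, 0.5]  # weights of each destination cell sum to 1

lnd_runoff = [1.0, 2.0, 4.0]    # runoff on the land cells, in mm/day
rof_runoff = apply_mapping(rows, cols, weights, lnd_runoff, n_dst=2)
print(rof_runoff)
```

Because the two grids differ, every river cell receives a weighted mix of the land cells it overlaps, which is why the mapping has to be generated from both domain files.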

A review on remap

2023-03-09T00:00:00+00:00

https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/872579110/Running+E3SM+on+New+Grids

https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/1043235115/Special+Considerations+for+FV+Physics+Grids

https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/178848194/Transition+to+TempestRemap+for+Atmosphere+grids

https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/84541856/Creating+mapping+and+domain+files

https://github.com/ClimateGlobalChange/tempestremap

Where should hydrology go

2023-03-02T00:00:00+00:00

This post is a reflection following up a recent article. https://blogs.egu.eu/divisions/hs/2023/03/01/where-should-hydrology-go/

From a computational hydrologist’s perspective, one limitation in hydrology is how to connect the water cycle with both natural and anthropogenic processes in the Earth system model.

It is generally easy to focus on one process or term, such as runoff or ET. However, it is challenging to link ET with runoff in different landscapes.

In the Earth system model framework, we need to consider all the water cycle processes. For example, how does water flow from land to river, then to lake or ocean? And how does ET come from land or lake into the atmosphere?

The first challenge in ESM is how to represent land, river, and lake appropriately so that they can communicate. For example, the Antarctic and Greenland are considered masses of glaciers, but many other hydrologic processes on them are ignored.

The second challenge is how to consider the vegetation and animal feedback with the water cycle. This is also important for the carbon cycle.

The last challenge is how to consider the human factor, including agriculture and dam operation.

There is also a dependency relationship between these challenges. For example, without improving the representation of the natural system, there will be large uncertainty in the human factor.

In ESM, we need to consider all three challenges together to have a better understanding of the water cycle.

Thoughts on research software

2023-02-28T00:00:00+00:00

This post is a reflection following up a recent article. https://www.nature.com/articles/s41559-023-02008-w

In my experience in Earth science, research software development is always under-appreciated. Software development has never received enough credit, let alone been published in a high-impact journal.

Looking back at 2015-2017, when I started to use GitHub, there were barriers that prevented me from practicing open source better. I had several papers for which I didn’t share all the code and data.

But now I have contributed to multiple open source projects. With platforms like GitHub, Zenodo, and Overleaf, sharing resources is becoming easier and easier.

But at the same time, I still see lots of papers that do not make their data and code publicly available, especially when the conclusions drawn are also questionable.

My current practices to promote open science:

Only read and recommend papers that share both data and code;
Only cite papers that share both data and code;

As the old saying goes, “talk is cheap, show me the code”: we cannot trust research that cannot be reproduced.

Personally, I think if you can’t even share your work with your family with excitement, how can you convince yourself that your research is meaningful?

The domain file in ESM

2023-02-08T00:00:00+00:00

In order to set up an MPAS mesh-based MOSART/ELM simulation, I need to prepare a domain file.

After some effort, I was not able to find any documentation describing the so-called domain file.

However, I found quite a bit of documentation on how to generate this domain file, such as: https://www2.cesm.ucar.edu/models/cesm1.2/clm/models/lnd/clm/doc/UsersGuide/x11812.html

Without official documentation of the format, the only way to understand the structure of the domain file is through existing files and possibly code.

In general, the domain file stores the information of mesh cells, including cell center, vertices, and area.

The cell center is either a 1D (unstructured) or 2D (structured) array.

As a result, the vertices can be a 2D (unstructured) or 3D (structured) array. In practice, the vertices array often uses the (nj, ni, nv) structure to store the data.

For an unstructured mesh, we can set nj or ni to 1.
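
The layout described above can be sketched in plain Python. This is an illustration of the (nj, ni, nv) structure only, not a reader or writer for real domain files. The variable names xc, yc, xv, yv, and area follow what I see in existing files, but treat them as assumptions, and padding short cells by repeating the last vertex is a hypothetical choice made here to keep the array rectangular.

```python
def pad_vertices(cell_vertices, nv):
    """Repeat the last vertex so that every cell has exactly nv vertices."""
    return cell_vertices + [cell_vertices[-1]] * (nv - len(cell_vertices))


def build_domain(centers, vertices, areas):
    """Store an unstructured mesh with nj = 1, so ni is the number of cells."""
    ni = len(centers)
    nv = max(len(v) for v in vertices)
    padded = [pad_vertices(v, nv) for v in vertices]
    return {
        "dims": {"nj": 1, "ni": ni, "nv": nv},
        "xc": [[lon for lon, _ in centers]],                    # (nj, ni)
        "yc": [[lat for _, lat in centers]],                    # (nj, ni)
        "area": [list(areas)],                                  # (nj, ni)
        "xv": [[[lon for lon, _ in cell] for cell in padded]],  # (nj, ni, nv)
        "yv": [[[lat for _, lat in cell] for cell in padded]],  # (nj, ni, nv)
    }


def shape(nested):
    """Return the shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Hypothetical mesh with two cells: a hexagon and a pentagon (lon, lat pairs).
centers = [(0.0, 0.0), (2.0, 0.0)]
vertices = [
    [(1.0, 0.0), (0.5, 0.9), (-0.5, 0.9), (-1.0, 0.0), (-0.5, -0.9), (0.5, -0.9)],
    [(3.0, 0.0), (2.3, 1.0), (1.2, 0.6), (1.2, -0.6), (2.3, -1.0)],
]
domain = build_domain(centers, vertices, areas=[2.3, 2.4])

print(domain["dims"], shape(domain["xc"]), shape(domain["xv"]))
```

Setting nj to 1 lets the same variables describe both structured and unstructured meshes, at the cost of padding cells that have fewer vertices than the maximum.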

Different types of domain files

ELM surface data

gen_domain creates a domain file for datm from a mapping file. The domain file is then used by both DATM and CLM to define the grid and land mask.

Stream file

Differences between global and local domain files

ATM_DOMAIN_FILE and ATM_DOMAIN_PATH

    
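ATM_DOMAIN_FILE
  type: char
  default: UNSET
  group: run_domain
  file: env_run.xml
  description: atm domain file

ATM_DOMAIN_PATH
  type: char
  default: $DIN_LOC_ROOT/share/domains
  group: run_domain
  file: env_run.xml
  description: path of atm domain file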
    

Leap year and technical debt

2023-02-02T00:00:00+00:00

In my work, I need to convert E3SM model output into a different format. The output file is in netCDF format, and I found an interesting design issue in the model.

The model runs at one time step, but the output can be at a different time step. For example, the model can run at a 3-hour time step while the output is daily or monthly.

These are controlled by several namelist variables. However, the model cannot handle the different number of days in different months, which is also relevant to leap years.

As a result, some of the output time series have 360 days (12 * 30), some 365 days, and some 366 days. In my opinion, this is a typical case of technical debt, a concept I learned about recently.

This type of design makes postprocessing and exchange with other workflows extremely difficult. For example, the time variable within the netCDF file is used to index the time series, but this variable may start from 0 or 1, and its length also varies.

Ideally, we should use the exact number of days throughout the whole model so they are consistent in all processes.
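
As a small illustration, the three series lengths mentioned above match what the CF conventions call the 360_day, noleap, and standard calendars. Whether the model output is actually labeled this way is an assumption on my part. The helper functions below are hypothetical and only use the Python standard library.

```python
import calendar


def days_in_year(year, cal="standard"):
    """Number of days in a year under a given calendar name."""
    if cal == "360_day":
        return 360  # 12 months of 30 days
    if cal == "noleap":
        return 365  # February always has 28 days
    if cal == "standard":
        return 366 if calendar.isleap(year) else 365
    raise ValueError(f"unknown calendar: {cal}")


def days_in_month(year, month, cal="standard"):
    """Number of days in a month under a given calendar name."""
    if cal == "360_day":
        return 30
    if cal == "noleap" and month == 2:
        return 28
    if cal in ("noleap", "standard"):
        return calendar.monthrange(year, month)[1]
    raise ValueError(f"unknown calendar: {cal}")


for cal in ("360_day", "noleap", "standard"):
    print(cal, days_in_year(2020, cal), days_in_month(2020, 2, cal))
```

A postprocessing script can use a check like this to guess which calendar an output series follows from its length, but that is exactly the kind of guessing that a consistent calendar throughout the model would make unnecessary.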

Now, with this issue, a lot of guesswork is needed because the output is simply unusable.

Reference: https://en.wikipedia.org/wiki/Technical_debt

Mesh independent vs Topological relationship

2023-01-23T00:00:00+00:00

PyFlowline is mesh independent, and it uses topological relationship to model river networks. But what are the relationships between these two features? This is also the question I asked myself when presenting the model to the team members.

For example, one may ask “Which feature is more important?” or “Can I turn off the topological relationship feature?”

To understand their relationships, we also need to consider HexWatershed.

On the one hand, without topological relationships, river networks become a binary mask, which means we can no longer produce a conceptual river network using PyFlowline. However, HexWatershed is still able to produce it after watershed delineation. From this perspective, the topological relationship must be on for PyFlowline, but not for HexWatershed.

Then what makes the models mesh independent? Both models were designed so that they do not rely on a 2D index, which also means some traditional methods can be extended to be mesh independent if the 2D index structure assumption is abandoned.
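
To illustrate what dropping the 2D index means, here is a minimal Python sketch in which each cell is identified by an ID and carries its own list of neighbors. This is not the HexWatershed implementation. The function name and the data are hypothetical, and for simplicity the flow direction points to the lowest neighbor rather than along the steepest slope, which would also need the distance between cell centers.

```python
def flow_direction(elevation, neighbors):
    """Pick a downstream cell for every cell without using (row, column) indices.

    elevation: dict mapping cell ID to elevation
    neighbors: dict mapping cell ID to a list of neighboring cell IDs
    Returns a dict mapping cell ID to its downstream cell ID, or None for a
    cell that has no lower neighbor (an outlet or a local depression).
    """
    downstream = {}
    for cell, elev in elevation.items():
        lowest = min(neighbors[cell], key=lambda n: elevation[n], default=None)
        if lowest is not None and elevation[lowest] < elev:
            downstream[cell] = lowest
        else:
            downstream[cell] = None
    return downstream


# Hypothetical mesh: four cells with different numbers of neighbors.
elevation = {1: 10.0, 2: 8.0, 3: 9.0, 4: 5.0}
neighbors = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}

print(flow_direction(elevation, neighbors))
```

Because the function only needs a neighbor list, the same code works whether the cells are squares, hexagons, or MPAS polygons.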

Feature	Without it	Relation to the other feature
Mesh independent	Cannot couple river and other hydrologic features	Does not concern topological relationship, but it helps capture details
Topological relationship	Cannot assist stream burning	Supports unstructured mesh by default

For PyFlowline alone, the topological relationship may be more important because it is how the model captures the river network. However, with mesh independence, it is possible to use a refined mesh near rivers to capture river features. To this extent, mesh independence enhances the model.

For HexWatershed, as long as the river networks are available, the topological relationship only improves the river bed slope. Thus mesh independence may be more important.