# 2023

## March

• Where should hydrology go

This post is a reflection following up a recent article. https://blogs.egu.eu/divisions/hs/2023/03/01/where-should-hydrology-go/

## February

• Thoughts on research software

This post is a reflection following up a recent article. https://www.nature.com/articles/s41559-023-02008-w

• The domain file in ESM

In order to set up a MPAS mesh-based MOSART/ELM simulation, I need to prepare a domain file.

• Leap year and technical debt

In my work, I need to convert an E3SM model output into a different format. The output file is in the netCDF and I found some interesting design issue in the model.

# 2022

## July

• Break the model

Recently I have broken a few models I wrote previously into pieces, which gave me lots of thoughts on how I, as a modeler, work to improve our modeling skills.

## January

• Calculating the area in a 3D space

In the HexWatershed model, we invented a new metric to evaluate the closeness of the modeled river network with real ones. This concept involves the calculation of ares from two sets of vector-based flowlines. The calculation, however, is not trival.

• A review of HexWatershed and its future application

Almost 4 years ago, I decided to write a model that can be used for the 3D land surface model routing component. At that time, they were already plenty of global routing models, except that all of them are for coarse spatial resolution. I need to create a model that is for high spatial resolution. However, this is an impossible task back then because I cannot run existing tools/models directly. The reason is that (1) existing regional scale model cannot be run at global scale; (2) global scale data usually has a different spatial reference. All of these lead me to develope the unstructured mesh-based flow direction model.

# 2021

## April

• HexWatershed Python package

After recent discussion, it is clear that a Python package should be developed to make HexWatershed available to boarder audience. This decision means that there is much ongoing effort to re-structure the data flow and related algorithms behind HexWatershed. Below I will introduce some steps/efforts we made or will make in the next phrase of development.

• Reducing the dependency of HexWatershed

When HexWatershed was developed in earlier days, I gradually added support for stream burning and a few other features.

# 2020

## August

• Stream burning on hexagonal mesh

This is a follow up study of HexWatershed. In the latest development, we decided to add the so called “stream burning” capability into the HexWatershed model.

## July

• Streaming burning on MPAS mesh

To generate a stream network that is realistic with the NHD flow line, we need to recondition the DEM, or the so-called “stream burning” into the DEM.

## April

• Depression filling on a sphere

One of my interests and projects is about flow routing on a global scale or a sphere. This work has several major challenges awaiting to be resolved. Here I will explain a few of them.

## January

• The life span of a project

Recently I have been working on a project that I need to prepare some maps. I had to go back to use some code I wrote almost 5 years ago. Then I realized that this happened to me many times.

• A workflow for distributed parallel data analysis on HPC with checkpoint

A typical task we do nowadays is to submit a job to the cluster to run some data analysis. But there are some limitations we can do as I know, to some extend.

• A review on litter decomposition modeling

Litter decomposition involves series of processes.

• A review on dissolved organic carbon modeling

Researchgate and Mendeley recommend some nice papers to me based on my own publications. Here are some quick review on the DOC modeling I read today.

• Paper discussion Streamflow in the Columbia River Basin

Streamflow in the Columbia River Basin: Quantifying Changes Over the Period 1951‐2008 and Determining the Drivers of Those Changes (https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2018WR024256)

• The impact of ice content on hydraulic conductivity

In most aquifer test, the hydraulic conductivity is a function of the material, but it does not explicitly consider the impact of ice content on K. When soil or any material is partially or fully frozen, its actual K will decrease significantly. Here I want to explore how to model this impact in soil hydrology model. Special attention will be paid to the different impacts on vertical and horizontal K, respectively.

# 2019

## November

• Learning in a diverse environment

Asking for help sometimes can save your life. But how, where, when ask for help is always a challenge.

• Actions that might help fighting climate change

As an Earth science researcher, I talk a lot about climate change. But most of time I was basically sending message stating that climate change is real and bad. I didn’t spend much time thinking and explaining what should we do to battle climate change.

## September

• Lessons I have learnt during E3SM development

I have been involved with the E3SM development since I joined PNNL as a postdoc. Over the course of time, I have learnt a lot from the E3SM model. I also found many issues within the model, which reflects lots of similar struggles in the lifespan of software engineering.

• Rebuild the topology for billions of points on a sphere

One of my projects requires an index system for a DGGS alike system with topology information. Unlike a 2D plane coordinate system which can be easily managed by a 2D matrix, plane coordinate system does work well on a sphere. One biggest issue is that we cannot locate the neighbors of each point.

• Publication graphic generation workflow

Preparing graphics for journal publication is an important step before we submit the manuscript. And sometimes this process is not as smooth as we expected. There are several problems we often encounter:

• A revisit of spatial discretization

Discretization by definition from Wikipedia: In applied mathematics, discretization is the process of transferring continuous functions, models, variables, and equations into discrete counterparts. This process is usually carried out as a first step toward making them suitable for numerical evaluation and implementation on digital computers. Now we add “spatial” to the term. The first intuitive definition is the discretization of functions in the spatial domain. There are two elements in this description: functions and spatial domain. For functions, we often refer to integral or ODEs/PDEs in numerical simulations. If these functions involve with gradient information, then they depend on spatial domain, which is how gradient is calculated. For spatial domain, we often refer to mesh or grid. And mesh can generally be classified into structured and unstructured grid. In practice, we have spent great effects on both aspects of the spatial discretization: mesh and corresponding function solvers. In Earth science, lots of functions involve with gradient. For example, gas exchange depends on pressure gradient or concentration gradient. As a result, Our spatial discretization is always associated with mesh grid. Many studies have developed methods to solve equations using the finite difference method (FDM), the finite volume method (FVM) and the finite element method (FEM). However, many of them have not considered the effects of mesh grid on these solvers. For example, most models use array to represent spatial domain. In this case, water flow at any location can only flow to 4 directions. However, in reality, water can flow to any direction because Earth has no discretization. The problem is how can we represent a direction if it is 30 degree? We need a better way to represent the spatial domain decomposition. We can improve the resolution. We can also use unstructured grid such as TIN. Ideally, we need a mesh that matches with reality. In my opinion, adaptive hexagon grid mesh is closest.

## February

• Tips for ArcSWAT issue

I have to use ArcSWAT to prepare some SWAT model simulation inputs. And the experience wasn’t exactly good (maybe we need a different tool to do this the right way).

# 2018

## October

• The current state of ocean water storage

In the hydrological cycle, ocean water has the largest storage without doubt. However, our estimates of the ocean water storage have great uncertainty. I have done some preliminary research into this field and I’d like to share my understanding of the current state of these estimates. I also post a related question in the ReseachGate website (Link).
I will start with a few publications and reports. The first one is the USGS website: https://water.usgs.gov/edu/earthwherewater.html, which states the ocean storage to be around 96.5% of the global storage. The numbers are from “World fresh water resources” by Peter H Gleick 1993.

## May

• Hole and Boundary

We sometimes do not understand well enough of what we are dealing with in our daily life. I was lucky enough to attend a seminar by Achille C. Varzi a few years ago, which inspired me on this topic: looking at hole and boundary from a different perspective.

## February

• Another thought on Earth System Model challenge

I recent had several discussions with several friends on various topics. Such as whether improving one component with an Earth System Model (ESM) necessarily improve the overall performance.

## January

• How to write a computational climate model with confidence

I have covered many aspects of climate model related topics so far and I have also discussed quite a few examples in model development. I think it is time to warp up a post to serve as a guidance for whose who may be interested in this field.

# 2017

## April

• High performance computing: data management

When you leave a position, such as graduation from university, you usually have to backup lots of data, which is usually considered as one critical step in data management.

• Ecosystem modeling: uncertainty quantification

I recently read a few news articles of skeptical attitudes towards climate change, and I decided to write something about it. I am not trying to promote climate change, but instead I want to point out there is great uncertainty in our current Earth system modeling.

• Ecosystem modeling: model evaluation of current implementation in ECO3D 1.0

As I am preparing my defense, the first version of ECOSYSTEM 1.0 in my thesis was finally completed. Looking forward to the next chapter of my life, I thought it is time to evaluate the ECOSYSTEM 1.0 to see what it is capable of and what still needs to be improved or expanded in the near future.

## January

• Scientific writing: high quality scalable vector graphics

I recently found a new way to prepare publication-ready high quality vector graphic using Google Drive. Here is the result:

• High Performance Computing: Download and prepare data in a batch mode

Over the time, I need to manipulate a lot of data on a Linux cluster. Some of these manipulations actually read/write data, whereas some are essentially file system operations, such as downloading the files. Here I present a list of similar operations suitable for HPC using pbs job approach whenever possible. I do not attempt to include all possible methods but only the ones that I find useful and easy to prepare in seconds. Download The most efficient way to download MODIS alike data using HPC. wget -r –no-parent -R “index.html” –retr-symlinks -A “.nc” ftp-url wget -r –no-parent -R “index.html” -A “MOD17A2.A2000.hdf” -A “MOD17A2.A2000.xml” http-url wget -r –no-parent -R “index.html” -A “MOD17.hdf” -A “MOD17.xml” http-url You can basically setup filter for file type, year and granule id. A live example: ///========================================================== #!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l naccesspolicy=singleuser
#PBS -l walltime=40:00:00
#PBS -q standby
cd \$PBS_O_WORKDIR wget -r –no-parent -R “index.html” –retr-symlinks -A “.tar” ftp://somwhere ///==========================================================

# 2016

## October

• High Performance Computing: ParaFly

In my recent posts I shared some first hand experience of parallel computing using OpenMP. While OpenMP is supported by many programming languages, there are still a few does not. So here I am sharing another approach to create a parallel computing job.

• High Performance Computing: OpenMP and Anusplin

This is a live example from the project I am working right now. Good luck if you are following! In order to speed my Anusplin program, I decided to use OpenMP. Anusplin is a package for climate data preparation.

• Ecosystem modeling: how to deal with the time-variant datasets?

Ecosystem modeling usually requires a variety of data to run a complete simulation. These data usually include both time-variant and time-invariant datasets. For example, we usually consider Digital Elevation Model (DEM) as time-invariant data because surface topography is relatively stable for a given period of time unless extreme events such as earthquake occur.

## June

• Numerical simulation: unit system

Unit is always one of the most important factors in science. Most of the time, a number without unit is useless. For numerical simulation, a good design of unit system can be critical. Recently, I have written a short script/program to prepare climate data for my groundwater/surface water hydrology simulation. I downloaded some data from the National Climatic Data Center(NCDC) separately, and I did NOT notice the changes in unit system. In the end, I have to do everything over again.

• Groundwater hydrology modeling: Why it is so difficult to simulate the groundwater dynamics in mountain area

First, in mountain area, there is generally less groundwater observations, which makes it’s difficult to define the boundary condition. Without observations, the initial head can only be assumed or estimated using a steady state simulation. Second, groundwater recharge in mountain area is very low. This is because some mountain surface has no soil or a shallow soil layer. And the permeability of rock fractures is very low. As a result, infiltration from the surface/subsurface to groundwater systems is extremely low. Third, the hydraulic priorities of the fractured and unfractured rock material are also very low, which means the groundwater system is very sensitive. Beside, topography effects from mountain area usually require high spatial resolution grid cell in the numerical simulation.

• Ecosystem modeling: challenges in water-carbon cycling in cold region numerical simulation

Numerical simulation of water and carbon cycling in cold region usually encounter some unique challenges which must be taken into consideration in climate change study.

• Ecosystem modeling: a question of chicken or eggs?

Ecosystem modeling generally include three conceptual components: inputs, algorithm and outputs. For example, we usually need precipitation to estimate surface runoff. On the other hand, sometimes some outputs are also considered as inputs for other models. For example, vegetation dynamics also change the surface albedo and therefore the incoming radiation. In another word, the feedback among different processes are often too complex that we generally have to ignore some feedback processes.

## April

• Surface water hydrology modeling: scaling issue in Precipitation Runoff Modeling System(PRMS)

Grid-based HRU network for stream routing in PRMS could be convenient for several reasons. Apparently the ideal data structure as matrix could be one of them.But it is not always the case if the scaling issue is not well handled. Below is an example how the scale issue could be a potential problem for the simulation framework. First, I would like to introduce the CRT. Cascade Routing Tool (CRT) is a computer application for watershed models that include the coupled Groundwater and Surface-water FLOW model GSFLOW and the Precipitation-Runoff Modeling System (PRMS). More information could be found at (http://water.usgs.gov/ogw/CRT/) The example run result is listed as:

• Groundwater hydrology modeling: relation between steady state and transient simulation in MODFLOW simulation

Standard groundwater modeling usually uses the three-dimensional Richard’s equation to describe the groundwater flow. Solution to the Richard equation usually requires numerical methods such as finite-difference equation used in USGS MODFLOW. To solute the array of equations within MODFLOW, an initial head is required regardless of steady state (SS) or transient (TR) simulation. As the MODFLOW manual states that in a steady state simulation, the storage terms are ignored since storage of each grid cell is constant with time. One of the common problems to run the MODFLOW is that we don’t have the initial head data for the simulation. Therefore it is usually suggested that a SS simulation could be run at first to provide the head for the TR simulation. This is because SS is independent with initial head, but TR isn’t. However, whether placing the SS simulation ahead of TR simulations is doubtful in many practices! Here are the explanations: First, the SS flow equation should only be simulated when the system is in equilibrium state.By definition, the steady state flow equation means for each grid cell, the flow and storage don’t change with time. In most natural scenarios, these requirements are rarely met. However, the hydrologic system may be very close to equilibrium under certain circumstances. For instance, during winter time, the water flow is ignorable if most hydrologic processes are impaired due to the snow cover and frozen ground.

• Surface water hydrology modeling: deal with different types of precipitation

In surface water hydrology, precipitation is one of the most important components. However, within most hydrology models, it is unclear of how precipitation is actually represented. For example, in a typical water cycle illustration from Wiki, precipitation is described as

## March

• Scientific writing: how to prepare a good image figure

“A picture is worth a thousand words”, from Wikipedia, unveils the importance of figures in writing. There are many types of figures or pictures, and in the field of Earth Science, image figure is one of the most important types. By image figure, I mean a figure composed of matrix, such as https://www.ncdc.noaa.gov/sotc/service/global/map-blended-mntp/201601.gif Link of the image.

## February

• IDL programming: control variables in IDL across routines

IDL, unlike C++, has its own approach and principle when handling variables across routines. These features are often very different with the variable scope within C++. I will try to answer several key questions and prove them using simple demos. Question 1: Will the value of a variable be changed after it’s passed into a function and changed inside? Demo: