2024

    September

    • How to calculate the buffer zone on the global scale

      It is relatively simple to calulate the buffer zone at the regional scale because we can use a map projection. However, if we need to calculate the buffer zone at the global scale, there isn’t a way so far.

      The real challeage is that at the global scale, we cannot use a single map projection. In order for an algorithm to work on the global scale, it must accept either (1) latitude-longitude based system, or (2) 3D x-y-z coordinate system. Currently, there isn’t a model or tool that accept either of them.

      To resolve this issue, I built a customized GDAL from the source code and this model uses the latitude-longitude based system to calculate the proximity distance. To run the tool, the following steps are needed:

      • Clone the git repository from: git@github.com:changliao1025/gdal.git
      • Check out the changliao branch using VS Code or Git command git checkout changliao
      • Create a Python environment using conda If you are on a HPC, you first need to load the correct module such as: module load conda/Miniconda3-py311_23.11.0-2.lua You also need to load the cmake: module load cmake/3.24.3 Then you can create a conda environment using: “conda create –name gdal cxx-compiler gcc python zlib libarchive shapelib proj expat xerces-c numpy swig netcdf4 libpng ” This environment will meet most of the requirement for the build, but if you receive any error in later steps, you may need to add additional python packages during this step.
      • Activate this conda environment using conda activate gdal
      • Set up cmake configuration using the conda environment. This is critical because many gdal API will not work for system wide compilers export CC=~/.conda/envs/gdal/bin/gcc export CXX=~/.conda/envs/gdal/bin/g++ export CMAKE_PREFIX_PATH=~/.conda/envs/gdal export LD_LIBRARY_PATH=~/.conda/envs/gdal/lib:$LD_LIBRARY_PATH
      • Build the gdal following the instruction from: https://gdal.org/development/building_from_source.html
      • Change directory to the download git repository
      • mkdir build
      • cd build
      • cmake -DCMAKE_INSTALL_PREFIX=~/.conda/envs/gdal -DGDAL_PYTHON_INSTALL_PREFIX=~/.conda/envs/gdal -DBUILD_JAVA_BINDINGS:BOOL=OFF -DBUILD_SHARED_LIBS=ON -DBUILD_CSHARP_BINDINGS:BOOL=OFF .. > out.txt 2>&1
      • cmake –build .
      • cmake –build . –target install
      • Once you see this line: – Installing: /global/homes/l/liao313/.conda/envs/gdal/lib/pkgconfig/gdal.pc That means you have successfully built gdal within a conda environment, and you can call gdal python API as usual.

      This still does not fully solve the problem because the left and right edges are not connected.

    May

    • Mesh independent flow direction modeling using HexWatershed 3.0

      After attending the CSDMS annual meeting in 2022, I have continued collaborating with the CSDMS community. Special thanks to all the people who provided feedback on our HexWatershed model. This year, the annual meeting organization contacted me again to ask whether I would be interested in attending the meeting and giving another tutorial. Lots of progress has been made over the past two years, coming from both DOE’s ICoM and InteRFACE projects. HexWatershed also received lots of updates and improvements, including the model structure and I/O/ So I said yes and started to organize the new materials based on our recent work.

      In this year’s tutorial, after several round of discussion with the team, I adopted a new workflow to break the barrier between us as model developers and users. That is the web browser-based jupyter notebook. https://github.com/changliao1025/hexwatershed_tutorial

      There are several significant designs in this workflow:

      • use mybinder (https://mybinder.org/) to host the entire environment, so users don’t need to install anything to run all code
      • separate Python and C++ code: this is not an ideal solution, but it works best currently for me as a developer to isolate different environments
      • Adopt the latest Python package structure in PyEarth, PyFlowline, and HexWatershed; each package is maintained in a different Python package, making it much easier for CI/CD.

      Besides, in this workshop, I used a different mesh type, the DGGRID mesh, which is very popular in the GIS DGGS field but not well recognized in the geophysics field.

      My collaborator Matthew G Cooper provided lot of help in this workshop as he helped with the testing and design, especially from a user’s perspetive. For example, we changed the model configuration module. I also developed a new model input checking machnism inspired by his confusion about the model domain boundary input. (remember the linux bash profile/bashrc nightmare?)

      We had a relatively minor group of attendees but still received lots of questions and suggestions, which are gradually integrated back into our model right now.

    2023

    November

    March

    February

    • Thoughts on research software

      This post is a reflection following up a recent article. https://www.nature.com/articles/s41559-023-02008-w

    • The domain file in ESM

      In order to set up a MPAS mesh-based MOSART/ELM simulation, I need to prepare a domain file.

    • Leap year and technical debt

      In my work, I need to convert an E3SM model output into a different format. The output file is in the netCDF and I found some interesting design issue in the model.

    January

    2022

    July

    • Break the model

      Recently I have broken a few models I wrote previously into pieces, which gave me lots of thoughts on how I, as a modeler, work to improve our modeling skills.

    April

    January

    • Calculating the area in a 3D space

      In the HexWatershed model, we invented a new metric to evaluate the closeness of the modeled river network with real ones. This concept involves the calculation of ares from two sets of vector-based flowlines. The calculation, however, is not trival.

    • A review of HexWatershed and its future application

      Almost 4 years ago, I decided to write a model that can be used for the 3D land surface model routing component. At that time, they were already plenty of global routing models, except that all of them are for coarse spatial resolution. I need to create a model that is for high spatial resolution. However, this is an impossible task back then because I cannot run existing tools/models directly. The reason is that (1) existing regional scale model cannot be run at global scale; (2) global scale data usually has a different spatial reference. All of these lead me to develope the unstructured mesh-based flow direction model.

    2021

    September

    May

    April

    • HexWatershed Python package

      After recent discussion, it is clear that a Python package should be developed to make HexWatershed available to boarder audience. This decision means that there is much ongoing effort to re-structure the data flow and related algorithms behind HexWatershed. Below I will introduce some steps/efforts we made or will make in the next phrase of development.

    • Reducing the dependency of HexWatershed

      When HexWatershed was developed in earlier days, I gradually added support for stream burning and a few other features.

    January

    2020

    October

    September

    August

    • Stream burning on hexagonal mesh

      This is a follow up study of HexWatershed. In the latest development, we decided to add the so called “stream burning” capability into the HexWatershed model.

    July

    • Streaming burning on MPAS mesh

      To generate a stream network that is realistic with the NHD flow line, we need to recondition the DEM, or the so-called “stream burning” into the DEM.

    May

    April

    • Depression filling on a sphere

      One of my interests and projects is about flow routing on a global scale or a sphere. This work has several major challenges awaiting to be resolved. Here I will explain a few of them.

    January

    • The life span of a project

      Recently I have been working on a project that I need to prepare some maps. I had to go back to use some code I wrote almost 5 years ago. Then I realized that this happened to me many times.

    • A workflow for distributed parallel data analysis on HPC with checkpoint

      A typical task we do nowadays is to submit a job to the cluster to run some data analysis. But there are some limitations we can do as I know, to some extend.

    • A review on litter decomposition modeling

      Litter decomposition involves series of processes.

    • A review on dissolved organic carbon modeling

      Researchgate and Mendeley recommend some nice papers to me based on my own publications. Here are some quick review on the DOC modeling I read today.

    • Paper discussion Streamflow in the Columbia River Basin

      Streamflow in the Columbia River Basin: Quantifying Changes Over the Period 1951‐2008 and Determining the Drivers of Those Changes (https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2018WR024256)

    • The impact of ice content on hydraulic conductivity

      In most aquifer test, the hydraulic conductivity is a function of the material, but it does not explicitly consider the impact of ice content on K. When soil or any material is partially or fully frozen, its actual K will decrease significantly. Here I want to explore how to model this impact in soil hydrology model. Special attention will be paid to the different impacts on vertical and horizontal K, respectively.

    2019

    December

    November

    • Learning in a diverse environment

      Asking for help sometimes can save your life. But how, where, when ask for help is always a challenge.

    • Actions that might help fighting climate change

      As an Earth science researcher, I talk a lot about climate change. But most of time I was basically sending message stating that climate change is real and bad. I didn’t spend much time thinking and explaining what should we do to battle climate change.

    October

    September

    • Lessons I have learnt during E3SM development

      I have been involved with the E3SM development since I joined PNNL as a postdoc. Over the course of time, I have learnt a lot from the E3SM model. I also found many issues within the model, which reflects lots of similar struggles in the lifespan of software engineering.

    • Rebuild the topology for billions of points on a sphere

      One of my projects requires an index system for a DGGS alike system with topology information. Unlike a 2D plane coordinate system which can be easily managed by a 2D matrix, plane coordinate system does work well on a sphere. One biggest issue is that we cannot locate the neighbors of each point.

    • Publication graphic generation workflow

      Preparing graphics for journal publication is an important step before we submit the manuscript. And sometimes this process is not as smooth as we expected. There are several problems we often encounter:

    • A revisit of spatial discretization

      Discretization by definition from Wikipedia: In applied mathematics, discretization is the process of transferring continuous functions, models, variables, and equations into discrete counterparts. This process is usually carried out as a first step toward making them suitable for numerical evaluation and implementation on digital computers. Now we add “spatial” to the term. The first intuitive definition is the discretization of functions in the spatial domain. There are two elements in this description: functions and spatial domain. For functions, we often refer to integral or ODEs/PDEs in numerical simulations. If these functions involve with gradient information, then they depend on spatial domain, which is how gradient is calculated. For spatial domain, we often refer to mesh or grid. And mesh can generally be classified into structured and unstructured grid. In practice, we have spent great effects on both aspects of the spatial discretization: mesh and corresponding function solvers. In Earth science, lots of functions involve with gradient. For example, gas exchange depends on pressure gradient or concentration gradient. As a result, Our spatial discretization is always associated with mesh grid. Many studies have developed methods to solve equations using the finite difference method (FDM), the finite volume method (FVM) and the finite element method (FEM). However, many of them have not considered the effects of mesh grid on these solvers. For example, most models use array to represent spatial domain. In this case, water flow at any location can only flow to 4 directions. However, in reality, water can flow to any direction because Earth has no discretization. The problem is how can we represent a direction if it is 30 degree? We need a better way to represent the spatial domain decomposition. We can improve the resolution. We can also use unstructured grid such as TIN. Ideally, we need a mesh that matches with reality. In my opinion, adaptive hexagon grid mesh is closest.

    March

    February

    • Tips for ArcSWAT issue

      I have to use ArcSWAT to prepare some SWAT model simulation inputs. And the experience wasn’t exactly good (maybe we need a different tool to do this the right way).

    2018

    December

    November

    October

    • The current state of ocean water storage

      In the hydrological cycle, ocean water has the largest storage without doubt. However, our estimates of the ocean water storage have great uncertainty. I have done some preliminary research into this field and I’d like to share my understanding of the current state of these estimates. I also post a related question in the ReseachGate website (Link).
      I will start with a few publications and reports. The first one is the USGS website: https://water.usgs.gov/edu/earthwherewater.html, which states the ocean storage to be around 96.5% of the global storage. The numbers are from “World fresh water resources” by Peter H Gleick 1993.

    August

    July

    June

    May

    • Hole and Boundary

      We sometimes do not understand well enough of what we are dealing with in our daily life. I was lucky enough to attend a seminar by Achille C. Varzi a few years ago, which inspired me on this topic: looking at hole and boundary from a different perspective.

    April

    March

    February

    • Another thought on Earth System Model challenge

      I recent had several discussions with several friends on various topics. Such as whether improving one component with an Earth System Model (ESM) necessarily improve the overall performance.

    January

    • How to write a computational climate model with confidence

      I have covered many aspects of climate model related topics so far and I have also discussed quite a few examples in model development. I think it is time to warp up a post to serve as a guidance for whose who may be interested in this field.

    2017

    September

    July

    June

    April

    • High performance computing: data management

      When you leave a position, such as graduation from university, you usually have to backup lots of data, which is usually considered as one critical step in data management.

    • Ecosystem modeling: uncertainty quantification

      I recently read a few news articles of skeptical attitudes towards climate change, and I decided to write something about it. I am not trying to promote climate change, but instead I want to point out there is great uncertainty in our current Earth system modeling.

    • Ecosystem modeling: model evaluation of current implementation in ECO3D 1.0

      As I am preparing my defense, the first version of ECOSYSTEM 1.0 in my thesis was finally completed. Looking forward to the next chapter of my life, I thought it is time to evaluate the ECOSYSTEM 1.0 to see what it is capable of and what still needs to be improved or expanded in the near future.

    March

    February

    January

    • Scientific writing: high quality scalable vector graphics

      I recently found a new way to prepare publication-ready high quality vector graphic using Google Drive. Here is the result:

    • High Performance Computing: Download and prepare data in a batch mode

      Over the time, I need to manipulate a lot of data on a Linux cluster. Some of these manipulations actually read/write data, whereas some are essentially file system operations, such as downloading the files. Here I present a list of similar operations suitable for HPC using pbs job approach whenever possible. I do not attempt to include all possible methods but only the ones that I find useful and easy to prepare in seconds. Download The most efficient way to download MODIS alike data using HPC. wget -r –no-parent -R “index.html” –retr-symlinks -A “.nc” ftp-url wget -r –no-parent -R “index.html” -A “MOD17A2.A2000.hdf” -A “MOD17A2.A2000.xml” http-url wget -r –no-parent -R “index.html” -A “MOD17.hdf” -A “MOD17.xml” http-url You can basically setup filter for file type, year and granule id. A live example: ///========================================================== #!/bin/bash
      #PBS -l nodes=1:ppn=1
      #PBS -l naccesspolicy=singleuser
      #PBS -l walltime=40:00:00
      #PBS -M your email address #PBS -m ae
      #PBS -N download
      #PBS -q standby
      cd $PBS_O_WORKDIR wget -r –no-parent -R “index.html” –retr-symlinks -A “.tar” ftp://somwhere ///==========================================================

    2016

    October

    • High Performance Computing: ParaFly

      In my recent posts I shared some first hand experience of parallel computing using OpenMP. While OpenMP is supported by many programming languages, there are still a few does not. So here I am sharing another approach to create a parallel computing job.

    • High Performance Computing: OpenMP and Anusplin

      This is a live example from the project I am working right now. Good luck if you are following! In order to speed my Anusplin program, I decided to use OpenMP. Anusplin is a package for climate data preparation.

    • Ecosystem modeling: how to deal with the time-variant datasets?

      Ecosystem modeling usually requires a variety of data to run a complete simulation. These data usually include both time-variant and time-invariant datasets. For example, we usually consider Digital Elevation Model (DEM) as time-invariant data because surface topography is relatively stable for a given period of time unless extreme events such as earthquake occur.

    September

    August

    July

    June

    • Numerical simulation: unit system

      Unit is always one of the most important factors in science. Most of the time, a number without unit is useless. For numerical simulation, a good design of unit system can be critical. Recently, I have written a short script/program to prepare climate data for my groundwater/surface water hydrology simulation. I downloaded some data from the National Climatic Data Center(NCDC) separately, and I did NOT notice the changes in unit system. In the end, I have to do everything over again.

    • Groundwater hydrology modeling: Why it is so difficult to simulate the groundwater dynamics in mountain area

      First, in mountain area, there is generally less groundwater observations, which makes it’s difficult to define the boundary condition. Without observations, the initial head can only be assumed or estimated using a steady state simulation. Second, groundwater recharge in mountain area is very low. This is because some mountain surface has no soil or a shallow soil layer. And the permeability of rock fractures is very low. As a result, infiltration from the surface/subsurface to groundwater systems is extremely low. Third, the hydraulic priorities of the fractured and unfractured rock material are also very low, which means the groundwater system is very sensitive. Beside, topography effects from mountain area usually require high spatial resolution grid cell in the numerical simulation.

    • Ecosystem modeling: challenges in water-carbon cycling in cold region numerical simulation

      Numerical simulation of water and carbon cycling in cold region usually encounter some unique challenges which must be taken into consideration in climate change study.

    • Ecosystem modeling: a question of chicken or eggs?

      Ecosystem modeling generally include three conceptual components: inputs, algorithm and outputs. For example, we usually need precipitation to estimate surface runoff. On the other hand, sometimes some outputs are also considered as inputs for other models. For example, vegetation dynamics also change the surface albedo and therefore the incoming radiation. In another word, the feedback among different processes are often too complex that we generally have to ignore some feedback processes.

    May

    April

    • Surface water hydrology modeling: scaling issue in Precipitation Runoff Modeling System(PRMS)

      Grid-based HRU network for stream routing in PRMS could be convenient for several reasons. Apparently the ideal data structure as matrix could be one of them.But it is not always the case if the scaling issue is not well handled. Below is an example how the scale issue could be a potential problem for the simulation framework. First, I would like to introduce the CRT. Cascade Routing Tool (CRT) is a computer application for watershed models that include the coupled Groundwater and Surface-water FLOW model GSFLOW and the Precipitation-Runoff Modeling System (PRMS). More information could be found at (http://water.usgs.gov/ogw/CRT/) The example run result is listed as:

    • Groundwater hydrology modeling: relation between steady state and transient simulation in MODFLOW simulation

      Standard groundwater modeling usually uses the three-dimensional Richard’s equation to describe the groundwater flow. Solution to the Richard equation usually requires numerical methods such as finite-difference equation used in USGS MODFLOW. To solute the array of equations within MODFLOW, an initial head is required regardless of steady state (SS) or transient (TR) simulation. As the MODFLOW manual states that in a steady state simulation, the storage terms are ignored since storage of each grid cell is constant with time. One of the common problems to run the MODFLOW is that we don’t have the initial head data for the simulation. Therefore it is usually suggested that a SS simulation could be run at first to provide the head for the TR simulation. This is because SS is independent with initial head, but TR isn’t. However, whether placing the SS simulation ahead of TR simulations is doubtful in many practices! Here are the explanations: First, the SS flow equation should only be simulated when the system is in equilibrium state.By definition, the steady state flow equation means for each grid cell, the flow and storage don’t change with time. In most natural scenarios, these requirements are rarely met. However, the hydrologic system may be very close to equilibrium under certain circumstances. For instance, during winter time, the water flow is ignorable if most hydrologic processes are impaired due to the snow cover and frozen ground.

    • Surface water hydrology modeling: deal with different types of precipitation

      In surface water hydrology, precipitation is one of the most important components. However, within most hydrology models, it is unclear of how precipitation is actually represented. For example, in a typical water cycle illustration from Wiki, precipitation is described as

    March

    • Scientific writing: how to prepare a good image figure

      “A picture is worth a thousand words”, from Wikipedia, unveils the importance of figures in writing. There are many types of figures or pictures, and in the field of Earth Science, image figure is one of the most important types. By image figure, I mean a figure composed of matrix, such as https://www.ncdc.noaa.gov/sotc/service/global/map-blended-mntp/201601.gif Link of the image.

    February

    • IDL programming: control variables in IDL across routines

      IDL, unlike C++, has its own approach and principle when handling variables across routines. These features are often very different with the variable scope within C++. I will try to answer several key questions and prove them using simple demos. Question 1: Will the value of a variable be changed after it’s passed into a function and changed inside? Demo:

    January

    2015

    November

    October