WRF does not run to completion

Hello TAs,
Sorry to bother you. I have compiled WRF successfully and am now trying to run it.

The job script I am using is:

#!/bin/bash -l
#SBATCH -A ACD112218
#SBATCH -N 1 
#SBATCH --ntasks-per-node=40
#SBATCH --cpus-per-task=1
#SBATCH -J wrfrun
#SBATCH -p ctest
#SBATCH --exclusive
#SBATCH -d singleton
#SBATCH -t 0:30:00
#SBATCH -o wrfPrac-%j.out

module purge
module load hdf5-1.8.21-t
module load compiler/intel/2019u5
module load netcdf-c-4.7.3-t 
module load netcdf-fortran-4.4.5-t
module list

export OMP_NUM_THREADS=1

ln -sf namelist.input-VALIDATE namelist.input
/usr/bin/time -p mpirun -np 40 ./wrf.exe
mkdir VALIDATE
mv rsl.* namelist.input namelist.output VALIDATE
mv wrfo* VALIDATE


ln -sf namelist.input-TIMING namelist.input
/usr/bin/time -p mpirun -np 40 ./wrf.exe
mkdir TIMING
mv rsl.* namelist.input namelist.output TIMING

echo "finish running"

The run log is as follows:

[u7807382@lgn303 WRF_practice_kit]$ cat wrfPrac-8893883.out 

Currently Loaded Modules:
  1) hdf5-1.8.21-t           3) netcdf-c-4.7.3-t
  2) compiler/intel/2019u5   4) netcdf-fortran-4.4.5-t

 

 starting wrf task           28  of           40
 starting wrf task           34  of           40
 starting wrf task           22  of           40
 starting wrf task           25  of           40
 starting wrf task           31  of           40
 starting wrf task           27  of           40
 starting wrf task           30  of           40
 starting wrf task           23  of           40
 starting wrf task           24  of           40
 starting wrf task           35  of           40
 starting wrf task           26  of           40
 starting wrf task           38  of           40
 starting wrf task           21  of           40
 starting wrf task           36  of           40
 starting wrf task           20  of           40
 starting wrf task           29  of           40
 starting wrf task           32  of           40
 starting wrf task           37  of           40
 starting wrf task           39  of           40
 starting wrf task            2  of           40
 starting wrf task           15  of           40
 starting wrf task            1  of           40
 starting wrf task           14  of           40
 starting wrf task           10  of           40
 starting wrf task            6  of           40
 starting wrf task           12  of           40
 starting wrf task           33  of           40
 starting wrf task            3  of           40
 starting wrf task            7  of           40
 starting wrf task            9  of           40
 starting wrf task           17  of           40
 starting wrf task            0  of           40
 starting wrf task            4  of           40
 starting wrf task            5  of           40
 starting wrf task           16  of           40
 starting wrf task           18  of           40
 starting wrf task           19  of           40
 starting wrf task            8  of           40
 starting wrf task           11  of           40
 starting wrf task           13  of           40
real 330.75
user 0.03
sys 0.07
 starting wrf task           20  of           40
 starting wrf task           21  of           40
 starting wrf task           26  of           40
 starting wrf task           27  of           40
 starting wrf task           28  of           40
 starting wrf task           29  of           40
 starting wrf task           34  of           40
 starting wrf task           39  of           40
 starting wrf task           15  of           40
 starting wrf task           16  of           40
 starting wrf task           18  of           40
 starting wrf task           31  of           40
 starting wrf task           19  of           40
 starting wrf task           11  of           40
 starting wrf task           38  of           40
 starting wrf task           35  of           40
 starting wrf task           33  of           40
 starting wrf task           25  of           40
 starting wrf task           24  of           40
 starting wrf task           22  of           40
 starting wrf task           23  of           40
 starting wrf task           36  of           40
 starting wrf task           10  of           40
 starting wrf task           37  of           40
 starting wrf task           30  of           40
 starting wrf task           32  of           40
 starting wrf task            7  of           40
 starting wrf task           13  of           40
 starting wrf task            0  of           40
 starting wrf task            5  of           40
 starting wrf task           12  of           40
 starting wrf task           14  of           40
 starting wrf task            1  of           40
 starting wrf task            4  of           40
 starting wrf task            9  of           40
 starting wrf task           17  of           40
 starting wrf task            3  of           40
[mpiexec@cpn3087] real 1.10

I have run into the following three problems:

  1. In VALIDATE, every rsl.error.00* file contains wrf: SUCCESS COMPLETE WRF, but some error messages appear before it. Does this count as a successful run? The full content of rsl.error.0000 is below:
[u7807382@lgn303 VALIDATE]$ cat rsl.error.0000 
taskid: 0 hostname: cpn3052
 module_io_quilt_old.F        2931 F
Quilting with   1 groups of   0 I/O tasks.
 Ntasks in X            5 , ntasks in Y            8
*************************************
Configuring physics suite 'conus'

         mp_physics:      8
         cu_physics:      0*
      ra_lw_physics:      4
      ra_sw_physics:      4
     bl_pbl_physics:      2
  sf_sfclay_physics:      2
 sf_surface_physics:      2

(* = option overrides suite setting)
*************************************
Domain # 1: dx =  3000.000 m
WRF V4.2.1 MODEL
 *************************************
 Parent domain
 ids,ide,jds,jde            1        1500           1        1500
 ims,ime,jms,jme           -4         307          -4         195
 ips,ipe,jps,jpe            1         300           1         188
 *************************************
DYNAMICS OPTION: Eulerian Mass Coordinate
 alloc_space_field: domain            1 ,            1996055616  bytes allocated
med_initialdata_input: calling input_input
 Input data is acceptable to use: wrfinput_d01
 CURRENT DATE          = 2019-05-05_22:00:00
 SIMULATION START DATE = 2019-05-05_18:00:00
Timing for processing wrfinput file (stream 0) for domain        1:   55.29344 elapsed seconds
Max map factor in domain 1 =  1.12. Scale the dt in the model accordingly.
 D01: Time step                              =    18.0000000      (s)
 D01: Grid Distance                          =    3.00000000      (km)
 D01: Grid Distance Ratio dt/dx              =    6.00000000      (s/km)
 D01: Ratio Including Maximum Map Factor     =    6.69506025      (s/km)
 D01: NML defined reasonable_time_step_ratio =    6.00000000
INPUT LandUse = "MODIFIED_IGBP_MODIS_NOAH"
 LANDUSE TYPE = "MODIFIED_IGBP_MODIS_NOAH" FOUND          33  CATEGORIES           2  SEASONS WATER CATEGORY =           17  SNOW CATEGORY =           15
INITIALIZE THREE Noah LSM RELATED TABLES
Skipping over LUTYPE = USGS
 LANDUSE TYPE = MODIFIED_IGBP_MODIS_NOAH FOUND          20  CATEGORIES
 INPUT SOIL TEXTURE CLASSIFICATION = STAS
 SOIL TEXTURE CLASSIFICATION = STAS FOUND          19  CATEGORIES
ThompMP: read qr_acr_qgV2.dat instead of computing
ThompMP: read qr_acr_qsV2.dat instead of computing
ThompMP: read freezeH2O.dat instead of computing
 mediation_integrate.G        1944 DATASET=HISTORY
 mediation_integrate.G        1945  grid%id            1  grid%oid            1
Timing for Writing wrfout_d01_2019-05-05_22:00:00 for domain        1:   51.82892 elapsed seconds
 Input data is acceptable to use: wrfbdy_d01
Timing for processing lateral boundary for domain        1:    1.59826 elapsed seconds
WRF NUMBER OF TILES FROM OMP_GET_MAX_THREADS =   1
 Tile Strategy is not specified. Assuming 1D-Y
WRF TILE   1 IS      1 IE    300 JS      1 JE    188
WRF NUMBER OF TILES =   1
Timing for main: time 2019-05-05_22:00:18 on domain   1:  169.33875 elapsed seconds
 mediation_integrate.G        1944 DATASET=HISTORY
 mediation_integrate.G        1945  grid%id            1  grid%oid            2
Timing for Writing wrfout_d01_2019-05-05_22:00:18 for domain        1:   35.91587 elapsed seconds
Timing for main: time 2019-05-05_22:00:36 on domain   1:   51.11222 elapsed seconds
open_hist_w : error opening wrfout_d01_2019-05-05_22:00:36 for writing. ***
 mediation_integrate.G        1944 DATASET=HISTORY
 mediation_integrate.G        1945  grid%id            1  grid%oid            2
Timing for Writing wrfout_d01_2019-05-05_22:00:36 for domain        1:   16.55020 elapsed seconds
Timing for main: time 2019-05-05_22:00:54 on domain   1:   31.76314 elapsed seconds
open_hist_w : error opening wrfout_d01_2019-05-05_22:00:54 for writing. ***
 mediation_integrate.G        1944 DATASET=HISTORY
 mediation_integrate.G        1945  grid%id            1  grid%oid            2
Timing for Writing wrfout_d01_2019-05-05_22:00:54 for domain        1:   16.52525 elapsed seconds
wrf: SUCCESS COMPLETE WRF
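  2. I checked the wrfout files, and two of them are tiny, only 512 bytes each. Inspecting them with cat wrfout_d01_2019-05-05_22\:00\:36 shows that both are empty, so the run seems to have completed only half of its work. Why does this happen, and how might it be fixed? The wrfout file sizes in VALIDATE are: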
[u7807382@lgn303 VALIDATE]$ du -sh wrfout_d01_2019-05-05_22\:00\:*
11G     wrfout_d01_2019-05-05_22:00:00
2.6G    wrfout_d01_2019-05-05_22:00:18
512     wrfout_d01_2019-05-05_22:00:36
512     wrfout_d01_2019-05-05_22:00:54
  3. ./TIMING/rsl.out.0000 has no content at all (its size is 0). The program seems to have terminated unexpectedly, but I am not sure why.

These are the problems I have run into over the past few days. Sorry to bother you.
Thank you, TAs!

Hello,

The first two problems are most likely caused by this error message:

open_hist_w : error opening wrfout_d01_2019-05-05_22:00:36 for writing. ***

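You can search Google for the keywords open_hist_w : error opening for writing to find some possible fixes. Among the results is a report written by the University of Hamburg cluster competition team for SCC16. They hit the same error message and describe some possible solutions, so please look it up and use it as a reference.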
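
Before searching, it may help to confirm which ranks logged the error and which history files came out truncated. The following is a minimal sketch, assuming the rsl.error.* and wrfout_d* files sit together in one run directory such as VALIDATE. The function names and the 1 MiB size threshold are illustrative assumptions and are not part of WRF.

```shell
#!/bin/bash
# Sketch: locate history-file open errors and suspiciously small wrfout files.
# find_open_errors and find_small_wrfout are hypothetical helper names.

# Print every "error opening" line (prefixed with its file name)
# found in the rsl.error.* files under directory $1.
find_open_errors() {
    local dir="$1"
    grep -H "error opening" "$dir"/rsl.error.* 2>/dev/null
}

# Print wrfout files directly under directory $1 that are smaller than
# $2 bytes (default 1048576, an arbitrary threshold chosen for illustration).
find_small_wrfout() {
    local dir="$1" limit="${2:-1048576}"
    find "$dir" -maxdepth 1 -type f -name 'wrfout_d*' -size -"${limit}"c
}
```

Run from the directory that contains VALIDATE, find_open_errors VALIDATE lists the affected rsl.error files and find_small_wrfout VALIDATE lists the history files that are likely incomplete.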

As for problem 3, when I ran the script you provided, ./TIMING/rsl.out.0000 did have content. You may need to debug this one yourself.
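
One possible starting point for that debugging is a small check that flags an empty rsl.out.0000 or any rsl.error.* file without the completion line. This is only a sketch: check_wrf_run is a hypothetical helper, not a WRF or Slurm tool, and it reports symptoms rather than the cause of the failure.

```shell
#!/bin/bash
# Sketch: report whether a finished run directory looks healthy.
# check_wrf_run is a hypothetical helper name chosen for illustration.
check_wrf_run() {
    local dir="$1" status=0 f
    # [ -s FILE ] is true only when FILE exists and is non-empty.
    if [ ! -s "$dir/rsl.out.0000" ]; then
        echo "$dir/rsl.out.0000 is missing or empty"
        status=1
    fi
    # Every rank's error log should end with the completion message.
    for f in "$dir"/rsl.error.*; do
        [ -e "$f" ] || continue   # skip the unexpanded glob when no files match
        if ! grep -q "SUCCESS COMPLETE WRF" "$f"; then
            echo "$f has no SUCCESS COMPLETE WRF line"
            status=1
        fi
    done
    return "$status"
}
```

In the job script, calling check_wrf_run VALIDATE and check_wrf_run TIMING after the corresponding mv commands would print any problems into the job's .out file, which makes an early abort easier to spot.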

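Thank you for the reply. The third problem is now resolved, and I have just read the University of Hamburg report. I am re-running the test now. Thank you very much for your help!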