Autosubmit 4: Experiment Creation On HPC Issue
Hey guys! I've hit a snag while creating experiments with Autosubmit 4, so I'm sharing the details in case anyone else has run into it or knows a workaround.
The Lowdown: What's Happening?
So, the gist of it is this: When I'm creating a new experiment in Autosubmit 4, specifically using the -H parameter to specify the HPC (High-Performance Computing) platform and the -y parameter to copy settings from a reference experiment, it seems like the HPC setting isn't quite sticking. The experiment creates successfully, but the configuration files, specifically expdef.yml, still point to the original experiment's HPC instead of the one I specified. This is a bit of a bummer because it means the experiment isn't set up to run on the intended HPC, which can lead to all sorts of issues.
Here's the Setup:
- Autosubmit Version: 4.1.15-conda (results may differ on other versions)
- Machine/Environment: bscesautosubmithub03 (I haven't checked whether it happens on other machines)
- Experiment ID: aa2c (the experiment I was creating when I found the issue)
I wanted the new experiment to target a different HPC platform than the reference experiment, but expdef.yml was not updated with the new one. The problem shows up when -y (copy from a reference experiment) is combined with -H (specify the HPC).
The Commands I Ran:
[eferre1@bscesautosubmit03 ~]$ module load autosubmit/4.1.15-conda 
[eferre1@bscesautosubmit03 ~]$ autosubmit expid -H ecmwf-hpc2020 -d"test exp creation on hpc2020" -y t0lo
Autosubmit is running with 4.1.15
The new experiment "aa2c" has been registered.
Generating folder structure...
Experiment folder: /esarchive/autosubmit/aa2c
Generating config files...
[WARNING] Yaml file /esarchive/autosubmit/aa2c/proj/auto-ecearth4/configs/autosubmit_files/default/settings.yml not found
[WARNING] Yaml file /esarchive/autosubmit/aa2c/proj/auto-ecearth4/configs/autosubmit_files/default/settings.yml not found
Experiment aa2c created
The Problem:
The core issue is that the HPCARCH variable in expdef.yml retains the HPC from the reference experiment instead of updating to the one specified with -H. Here's what the expdef.yml looks like after the experiment creation:
DEFAULT:
  # Job experiment ID.
  EXPID: "aa2c"
  # Default HPC platform name.
  HPCARCH: bsc-marenostrum5
HPCARCH should be ecmwf-hpc2020, but it is still set to bsc-marenostrum5, the value carried over from the reference experiment. That is not the expected behavior: the configuration should reflect the HPC platform I specified at creation time. As it stands, when the experiment is submitted it will try to run on the wrong HPC.
Reproducible Example: Try It Yourself!
If you want to try and replicate this, here's the simple recipe:
- Make sure you have Autosubmit 4 installed and configured.
- Use the autosubmit expid command to start a new experiment.
- Use the -H option, followed by the name of your desired HPC platform.
- Use the -y option to specify an existing experiment as a reference (this copies settings).
- Check the expdef.yml file in your newly created experiment directory. Check the value of HPCARCH and see if it's correct.
This should give you a good idea of whether the problem exists in your environment as well.
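The check in the last step can be scripted. Here is a minimal Python sketch that works on the text of expdef.yml rather than on a file, since the exact location of the file inside the experiment directory may differ between setups. The function names read_hpcarch and hpcarch_matches are my own for illustration, not part of Autosubmit, and the regex assumes HPCARCH sits on a single line as in the excerpt above.

```python
import re
from typing import Optional

# Sample text copied from the expdef.yml excerpt shown earlier in this post.
SAMPLE_EXPDEF = """DEFAULT:
  # Job experiment ID.
  EXPID: "aa2c"
  # Default HPC platform name.
  HPCARCH: bsc-marenostrum5
"""


def read_hpcarch(expdef_text: str) -> Optional[str]:
    """Return the first HPCARCH value found in the text, or None if absent.

    Hypothetical helper: a plain regex scan, not a full YAML parse.
    """
    match = re.search(r"^[ \t]*HPCARCH:[ \t]*(.*)$", expdef_text, flags=re.MULTILINE)
    if match is None:
        return None
    # Drop any trailing inline comment, then surrounding whitespace and quotes.
    value = match.group(1).split("#", 1)[0].strip()
    return value.strip("\"'") or None


def hpcarch_matches(expdef_text: str, expected_hpc: str) -> bool:
    """True when the configured HPCARCH equals the platform passed to -H."""
    return read_hpcarch(expdef_text) == expected_hpc


if __name__ == "__main__":
    print(read_hpcarch(SAMPLE_EXPDEF))                      # bsc-marenostrum5
    print(hpcarch_matches(SAMPLE_EXPDEF, "ecmwf-hpc2020"))  # False
```

To run it against a real experiment, read the contents of your expdef.yml into a string and pass it in together with the platform name you gave to -H.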
Expected Behavior: What Should Happen?
Ideally, when you create an experiment with the -H parameter, Autosubmit should update the experiment's configuration to reflect the new HPC settings. So, in the expdef.yml file, the HPCARCH variable should be updated to the HPC specified with the -H flag. This ensures that the experiment runs on the correct platform, allowing it to work as intended.
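To make the expected precedence concrete, here is a small Python sketch. It is not Autosubmit's actual code; resolve_default_section is a hypothetical function that only illustrates the rule I would expect: settings copied from the reference experiment are the baseline, and an explicit -H value overrides HPCARCH.

```python
from typing import Dict, Optional


def resolve_default_section(copied_default: Dict[str, str],
                            cli_hpc: Optional[str]) -> Dict[str, str]:
    """Illustrative only: merge copied settings with the -H value.

    copied_default: the DEFAULT section taken from the reference experiment.
    cli_hpc: the platform passed with -H, or None if -H was not given.
    """
    merged = dict(copied_default)  # do not mutate the caller's dict
    if cli_hpc:
        # An explicit -H should win over whatever was copied with -y.
        merged["HPCARCH"] = cli_hpc
    return merged


if __name__ == "__main__":
    copied = {"EXPID": "aa2c", "HPCARCH": "bsc-marenostrum5"}
    print(resolve_default_section(copied, "ecmwf-hpc2020")["HPCARCH"])  # ecmwf-hpc2020
    print(resolve_default_section(copied, None)["HPCARCH"])             # bsc-marenostrum5
```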
Why This Matters
This might seem like a small thing, but it can create major headaches. With the wrong HPCARCH, the job submission scripts, environment variables, and any other settings that depend on the HPC platform will be incorrect, potentially leading to job failures, incorrect results, or wasted computing resources. This matters most in a multi-HPC environment, where a copied experiment can end up targeting the wrong machine without anyone noticing.
Potential Workarounds (For Now)
While we wait for a fix, here are a couple of potential workarounds:
- Manual Editing: After creating the experiment, you can manually edit the expdef.yml file to correct the HPCARCH value. This is a bit tedious, but it ensures that your experiment runs on the correct HPC.
- Modify Configuration Files: You can try modifying the configuration files directly to set the HPCARCH variable. This is more difficult and can be error-prone, but it might be another way to get things working. You may need to modify the settings.yml or other configuration files where the HPC settings are defined to ensure the experiment uses the desired HPC platform. This may involve updating paths, modules, and other platform-specific configurations.
- Experiment Without Copying: Avoid using the -y parameter, and instead configure your experiment from scratch. This can be time-consuming, but at least you will have control over the HPC setting. Start a new experiment without the -y flag, then manually set up all the necessary parameters, including the HPC configuration. While this takes more initial setup time, it ensures the experiment is correctly configured for the target HPC platform.
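If you go the manual-editing route for several experiments, the edit can be scripted. The Python sketch below rewrites the HPCARCH line in the text of expdef.yml and leaves every other line untouched. set_hpcarch is a hypothetical helper of mine, not an Autosubmit command; it assumes a single-line HPCARCH entry and drops any inline comment on that line. Back up the file before writing the result back.

```python
import re

# Sample text copied from the expdef.yml excerpt shown earlier in this post.
SAMPLE_EXPDEF = """DEFAULT:
  # Job experiment ID.
  EXPID: "aa2c"
  # Default HPC platform name.
  HPCARCH: bsc-marenostrum5
"""

_HPCARCH_LINE = re.compile(r"^([ \t]*HPCARCH:[ \t]*).*$", flags=re.MULTILINE)


def set_hpcarch(expdef_text: str, new_hpc: str) -> str:
    """Return the text with the first HPCARCH value replaced by new_hpc.

    Raises ValueError when no HPCARCH key is present, so a silent no-op
    cannot be mistaken for a successful fix.
    """
    # A function replacement keeps new_hpc literal (no backslash escapes).
    patched, count = _HPCARCH_LINE.subn(
        lambda m: m.group(1) + new_hpc, expdef_text, count=1
    )
    if count == 0:
        raise ValueError("no HPCARCH key found in the given text")
    return patched


if __name__ == "__main__":
    print(set_hpcarch(SAMPLE_EXPDEF, "ecmwf-hpc2020"))
```

Only the HPCARCH line changes; platform-specific paths, modules, and other settings copied from the reference experiment still need a manual check.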
Conclusion: Seeking a Solution
That's the problem I found, and I hope the write-up helps you guys in some way! In short: when an experiment is created with both -H and -y, the HPC given with -H is not written to expdef.yml, so the new experiment stays configured for the reference experiment's HPC. Keep this in mind when you're working with Autosubmit 4 across different HPC platforms, since it can lead to job failures or wasted computing resources.
If you run into the same thing, let me know and we can compare notes. If you've found a fix or another workaround, or have any extra information, please share it. Hopefully we can find a fix soon!