Installing atomate¶
Introduction¶
This guide will get you up and running in an environment for running high-throughput workflows with atomate. atomate is built on the pymatgen, custodian, and FireWorks libraries to run materials science workflows. Briefly:
pymatgen is used to create input files and analyze the output of materials science codes
custodian runs your simulation code (e.g., VASP) and performs error checking/handling and checkpointing
FireWorks is used to design, manage and execute workflows.
Details about how atomate is designed can be found in the atomate paper, and an overview of how these different pieces interact is given in a Slideshare presentation. Running and writing your own workflows are covered in later tutorials. For now, these topics will be covered in enough depth to get you set up and to help you know where to troubleshoot if you are having problems.
It is assumed that you are comfortable with basic Linux shell commands and navigation. If not, Linux Journey and Linux Command briefly cover enough to get you started. It will also be helpful if you are familiar with Python, but it is not strictly required for installation.
Note that this installation tutorial is VASP-centric since almost all functionality currently in atomate pertains to VASP.
Objectives¶
Install and configure atomate on your computing cluster
Validate the installation with a test workflow that computes a band structure
Installation checklist¶
Completing everything on this checklist should result in a fully functioning environment. Each item will be covered in depth, but this can be used to keep track of the big picture and help reinstall on other systems.
Prerequisites¶
Before you install, you need to make sure that your “worker” computer (where the simulations will be run, often a computing cluster) that will execute workflows can (i) run the base simulation packages (e.g., VASP, LAMMPS, FEFF, etc.) and (ii) connect to a MongoDB database. For (i), make sure you have the appropriate licenses and compilation to run the simulation packages that are needed. For (ii), make sure your computing center doesn’t have firewalls that prevent database access. Typically, academic computing clusters as well as systems with a MOM-node style architecture (e.g., NERSC) are OK. High-security government supercomputing centers often require custom treatment and modified execution patterns - some further details are provided later in this installation guide.
VASP¶
Getting access to VASP on supercomputing resources typically requires that you be added to a user group on the system you work on after your license is verified. Ensure that you have access to the VASP executable and that it is functional before starting this tutorial.
MongoDB¶
MongoDB is a NoSQL database that stores each database entry as a document, which is represented in the JSON format (the formatting is similar to a dictionary in Python). Atomate uses MongoDB to:
store the workflows that you want to run as well as their state details (through FireWorks - required)
parse output files and create a database of calculation results (strongly recommended and assumed by most default settings of workflows, but technically optional)
Note that there are various tools to query this information later, ranging from general-purpose tools that work on all MongoDB databases (e.g., the MongoDB command line, MongoDB GUI programs) to analysis tools built into pymatgen-db, FireWorks, and atomate that are more specific to the information generated by these codebases.
MongoDB must be running and available to accept connections whenever you are running workflows. Thus, it is strongly recommended that you have a server to run MongoDB or (simpler) use a hosting service. Your options are:
use a commercial service to host your MongoDB instance. These are typically the easiest to use and offer high quality service but require payment for larger databases. mLab and MongoDB Atlas offer free 500 MB databases with payment required for larger databases; the free tier is certainly enough to get started for small to medium size projects, and it is easy to upgrade or migrate your database if you do exceed the free allocation.
contact your supercomputing center to see if they offer MongoDB hosting (e.g., NERSC has this, Google “request NERSC MongoDB database”)
self-host a MongoDB server
If you are just starting, we suggest the first (with a free plan) or second option (if available to you). The third option requires you to open up your network settings to properly accept outside connections, which can sometimes be tricky.
Next, create a new database and set up two new username/password combinations:
an admin user
a read-only user
You might choose to have two separate databases - one for the workflows and another for the results. We suggest that new users keep both sets of results in a single database and only consider using two databases if they run into specific problems.
Keep a record of your credentials - we will configure FireWorks to connect to them in a later step. Also make sure you note down the hostname and port for the MongoDB instance.
Note
The computers that perform the calculations must have access to your MongoDB server. Some computing resources have firewalls blocking connections. This is not a problem for most computing centers that allow such connections (particularly from MOM-style nodes, e.g., at NERSC, SDSC, etc.), but some of the more security-sensitive centers (e.g., LLNL, PNNL, ARCHER) will run into issues. If you run into connection issues later in this tutorial, some options are:
contact your computing center to review their security policy to allow connections from your MongoDB server (best resolution)
host your Mongo database on a machine that you are able to securely connect to, e.g. on the supercomputing network itself (ask a system administrator for help)
use a proxy service to forward connections from the MongoDB -> login node -> compute node (you might try, for example, the mongo-proxy tool).
set up an ssh tunnel to forward connections from allowed machines (the tunnel must be kept alive at all times you are running workflows)
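If you want to sanity-check that a given machine can reach your MongoDB server at all, a short pymongo script along the lines of the sketch below can help. This is only a rough example: it assumes pymongo is available (it is installed automatically as a dependency of FireWorks/atomate later in this guide) and uses the same <<...>> placeholders as the configuration files described later.

from pymongo import MongoClient

host = "<<HOSTNAME>>"        # your MongoDB hostname or IP (for Atlas, you may need a mongodb+srv:// URI instead)
port = 27017                 # the default MongoDB port; yours may differ
db_name = "<<DB_NAME>>"

client = MongoClient(host, port, username="<<ADMIN_USERNAME>>",
                     password="<<ADMIN_PASSWORD>>", authSource=db_name)
# listing collections forces an authenticated round trip, so this line will raise
# an exception if the host, port, or credentials are wrong
print(client[db_name].list_collection_names())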
Create a directory scaffold for atomate¶
Installing atomate includes installation of codes, configuration files, and various binaries and libraries. Thus, it is useful to create a directory structure that organizes all these items.
Log in to the compute cluster and make sure the Python module you want to use is loaded. We highly recommend making sure Python is loaded upon login, e.g., through an rc file (~/.bashrc at most centers or ~/.bashrc.ext at NERSC).
Create a directory in a spot on disk that has relatively fast access from compute nodes and that is only accessible by yourself or your collaborators. Your environment and configuration files will go here, including database credentials. We will call this place <<INSTALL_DIR>>. A good name might simply be atomate, but you could also use a project-specific name (e.g., atomate-solar).
Now you should scaffold the rest of your <<INSTALL_DIR>> for the things we are going to do next. Create directories named logs and config so your directory structure looks like:
atomate
├── config
└── logs
Create a Python 3 virtual environment¶
Note
Make sure to create a Python 3.6+ environment, as recent versions of atomate only support Python 3.6 and higher.
We highly recommend that you organize your installation of atomate and the other Python codes using a virtual environment (e.g., virtualenv or a similar tool such as Anaconda).
Ultimately, whether you use a virtual environment is optional, and you don't have to use one if you know what you are doing.
Virtual environments allow you to keep an installation of Python and all of the installed packages separate from the installation on the system.
Some of the main benefits are:
Different Python projects that have conflicting packages can coexist on the same machine.
Different versions of Python can exist on the same machine and be managed more easily (e.g. Python 2 and Python 3).
You have full rights and control over the environment. If it breaks, you can just delete the folder containing the environment and recreate it. On computing resources, this solves permissions issues with installing and modifying packages.
The easiest way to get a Python virtual environment is to use the virtualenv
tool.
Most Python distributions come with virtualenv
, but some clusters (e.g., NERSC) have moved towards using Anaconda, which is a popular distribution of Python designed for scientific computing that can serve the same purpose.
If the compute resource you want to access is using Anaconda, you will follow the same general steps, but create your environment with conda create
.
See the documentation for the conda command line tool here as well as a conversion between virtualenv and conda commands.
To set up your virtual environment:
Go to your install directory (<<INSTALL_DIR>>) and create a virtual environment there. A good name might be atomate_env. The default command to create the environment would be virtualenv atomate_env, which creates a folder atomate_env in the directory you are in.
You can ls this directory and see that you have the following structure:
atomate
├── atomate_env
│   ├── bin
│   ├── include
│   ├── lib
│   ├── lib64
│   └── pip-selfcheck.json
├── config
└── logs
If you look in the bin directory, you will see several programs, such as activate, pip, and Python itself. lib is where all of your installed packages will be kept, etc. Again, if anything goes wrong in installing Python codes, you can just delete the virtual environment directory (atomate_env) and start again.
Activate your environment by running source <<INSTALL_DIR>>/atomate_env/bin/activate. This makes it so that when you use the command python, the version of python that you use will be the one in the bin directory rather than the system-wide Python. You can read the activation script if you are interested; it just does a little magic to adjust your path to point towards the bin and other directories you created.
Consider adding source <<INSTALL_DIR>>/atomate_env/bin/activate to your .rc or .bash_profile file so that it is run whenever you log in. Otherwise, note that you must call this command after every log in before you can do work on your atomate project.
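To confirm that the activation worked, you can check which interpreter the python command now resolves to; this quick check (just a sketch) should print a path inside your atomate_env directory:

import sys
# with the environment activated, this should be something like
# <<INSTALL_DIR>>/atomate_env/bin/python rather than the system Python
print(sys.executable)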
Install Python packages¶
You have successfully set up a Python 3 environment in which to install atomate! Next, we will download and install all of the atomate-related Python packages.
You can install these packages automatically or in “development mode”. Development mode installation makes it easier to view and modify the source code to your needs, but requires a few more steps to set up and maintain.
To install packages automatically, the main tool we will use is pip (unless you have an Anaconda distribution where you’d again use conda). To install the packages run:
pip install atomate
Alternatively, if you would like to install atomate or any other codes in development mode via git, see the developer installation for installing atomate codes in development mode.
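Either way, once the installation finishes, a quick way to confirm that the main packages are importable from your environment is a short check like the sketch below (exact version numbers will vary):

import atomate
import custodian
import fireworks
import pymatgen

# each package should import without errors and report some version string
for pkg in (atomate, custodian, fireworks, pymatgen):
    print(pkg.__name__, getattr(pkg, "__version__", "version not reported"))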
Configure database connections and computing center parameters¶
We’ve now set up your environment and installed the necessary software. You’re well on your way!
The next step is to configure some of the software for your specific system - e.g., your MongoDB credentials, your computing cluster and its queuing system, etc. The setup below will be just enough to get your environment bootstrapped. For more details on the installation and specifics of FireWorks, read the installation guide.
Note
All of the paths here must be absolute paths. For example, the absolute path that refers to <<INSTALL_DIR>>
might be /global/homes/u/username/atomate
(don’t use the relative directory ~/atomate
).
Warning
Passwords will be stored in plain text! These files should be stored in a place that is not accessible by unauthorized users. Also, you should make random passwords that are unique only to these databases.
Create the following files in <<INSTALL_DIR>>/config
.
db.json¶
The db.json
file tells atomate the location and credentials of the MongoDB server that will store the results of parsing calculations from your workflows (i.e., actual property output data on materials). The db.json
file requires you to enter the basic database information as well as what to call the main collection that results are kept in (e.g. tasks
) and the authentication information for an admin user and a read only user on the database. Mind that valid JSON requires double quotes around each of the string entries and that all of the entries should be strings except the value of “port”, which should be an integer (no quotes).
{
"host": "<<HOSTNAME>>",
"port": <<PORT>>,
"database": "<<DB_NAME>>",
"collection": "tasks",
"admin_user": "<<ADMIN_USERNAME>>",
"admin_password": "<<ADMIN_PASSWORD>>",
"readonly_user": "<<READ_ONLY_PASSWORD>>",
"readonly_password": "<<READ_ONLY_PASSWORD>>",
"aliases": {}
}
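Before running the full connection test below, you can catch simple mistakes (a missing comma, a quoted port number) with a quick JSON syntax check such as this sketch, run in the same directory as db.json:

import json

with open("db.json") as f:
    cfg = json.load(f)          # raises an error pointing at any malformed line
assert isinstance(cfg["port"], int), "port should be an integer, not a string"
print("db.json parsed OK")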
If you want to test whether your db.json
is set up correctly (and you do not mind resetting your database!), try running the Python script below in the directory with your db.json
file:
from atomate.vasp.database import VaspCalcDb
x = VaspCalcDb.from_db_file("db.json")
x.reset()
print("SUCCESS")
If you would like to store data beyond the 16 MB limit of MongoDB, please read: Advanced Storage Strategies.
my_fworker.yaml¶
In FireWorks’ distributed server-worker model, each computing resource where you run jobs is a FireWorker (Worker). Each worker (like NERSC or SDSC or your local cluster) needs some configuration:
A name to help record-keeping of what calculation ran where
Two parameters (category and query) that can be used to control which calculations are executed on this Worker. Our default settings will just allow all calculations to be run.
An env that controls the environment and settings unique to the cluster, such as the path to the VASP executable or the location of a scratch directory, which depend on your computing system
If this is the only cluster you plan on using (or you plan on using just one Worker for all of your calculations), a minimal setup for the my_fworker.yaml
file is:
name: <<WORKER_NAME>>
category: ''
query: '{}'
env:
db_file: <<INSTALL_DIR>>/config/db.json
vasp_cmd: <<VASP_CMD>>
scratch_dir: null
Here, the <<WORKER_NAME>> is arbitrary and is useful for keeping track of which Worker is running your jobs (an example might be Edison
if you are running on NERSC’s Edison resource). db_file
points to the db.json
file that you just configured and contains credentials to connect to the calculation output database. The <<VASP_CMD>> is the command that you would use to run VASP with parallelization (srun -n 16 vasp
, ibrun -n 16 vasp
, mpirun -n 16 vasp
, …). If you don’t know which of these to use or which VASP executable is correct, check the documentation for the computing resource you are running on or try to find them interactively by checking the output of which srun
, which vasp_std
, etc. Optionally, you can set the scratch_dir
to something other than null if there is a particular location where you have fast disk access. This key sets the “root” scratch dir; a temporary directory will be created in this root directory for each calculation. Scratch directories are only used temporarily, while the calculation is executing.
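The keys under env are made available to atomate's FireTasks at run time. For example, when a workflow refers to >>vasp_cmd<< or >>db_file<< (as you will see in the test workflow later), atomate looks the value up in the Worker's env via its env_chk utility. A rough sketch of how that resolution works, using a hypothetical env in which vasp_cmd is set to srun -n 16 vasp_std:

from atomate.utils.utils import env_chk

# at run time, FireWorks exposes the Worker's env under the reserved "_fw_env" spec key
fw_spec = {"_fw_env": {"vasp_cmd": "srun -n 16 vasp_std",
                       "db_file": "<<INSTALL_DIR>>/config/db.json"}}
print(env_chk(">>vasp_cmd<<", fw_spec))   # -> srun -n 16 vasp_std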
If you later want to set up multiple Workers on the same or different machines, you can find information about controlling which Worker can run which job by using the name
field above, or the category
or query
fields that we did not define. For more information on configuring multiple Workers, see the FireWorks documentation for controlling Workers. Such features allow you to use different settings (e.g., different VASP command such as different parallelization amount) for different types of calculations on the same machine or control what jobs are run on various computing centers.
my_launchpad.yaml¶
The db.json
file contained the information to connect to MongoDB for the calculation output database. We must also configure the database for storing and managing workflows within FireWorks using my_launchpad.yaml
as in FireWorks’ server-worker model. Technically, these can be different databases but we’ll configure them as the same database.
The LaunchPad is where all of the FireWorks and Workflows are stored. Each Worker can query this database for the status of Fireworks and pull down Fireworks to reserve them in the queue and run them. A my_launchpad.yaml
file with fairly verbose logging (strm_lvl: INFO
) is below:
host: <<HOSTNAME>>
port: <<PORT>>
name: <<DB_NAME>>
username: <<ADMIN_USERNAME>>
password: <<ADMIN_PASSWORD>>
ssl_ca_file: null
logdir: null
strm_lvl: INFO
user_indices: []
wf_user_indices: []
Here’s what you’ll need to fill out:
<<HOSTNAME>> - the host of your MongoDB server
<<PORT>> - the port of your MongoDB server
<<DB_NAME>> - the name of the MongoDB database
<<ADMIN_USERNAME>> and <<ADMIN_PASSWORD>> - the (write) credentials to access your DB. Delete these lines if you do not have password protection on your DB (although you should).
You can optionally set logdir
to your <<INSTALL_DIR>>/logs
directory, although you shouldn't need it. The strm_lvl
sets the verbosity of the log and user_indices
and wf_user_indices
can be used to speed up targeted database queries if your project grows very large and queries are slow.
Note: If you prefer to use the same database for FireWorks and calculation outputs (this is what our tutorial assumes), these values will largely duplicate those in db.json. If you prefer to use different databases for workflows and calculation outputs, the information here will differ from db.json.
If you want to test whether your my_launchpad.yaml
is set up correctly (and you do not mind resetting your database!), try executing the following command on the command line:
lpad -l my_launchpad.yaml reset
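If you would rather not reset the database just to test the connection, a non-destructive check from Python might look roughly like the following sketch (run in the same directory as my_launchpad.yaml):

from fireworks import LaunchPad

lp = LaunchPad.from_file("my_launchpad.yaml")
# a freshly created database simply returns an empty list; a connection or
# authentication problem raises an exception instead
print(lp.get_wf_ids())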
my_qadapter.yaml¶
To run your VASP jobs at scale across one or more nodes, you usually submit your jobs through a queue system on the computing resources. FireWorks handles communicating with some of the common queue systems automatically. As usual, only the basic configuration options will be discussed. If you will use atomate as in this tutorial, this basic configuration is sufficient.
If you do change anything, one key aspect would be to change the rocket launcher command from rapidfire
to singleshot
, which will let you launch in reservation mode.
Using the qlaunch
with the -r
flag (reservation mode launching) means there is a 1:1 mapping of queue submission and VASP calculation.
This mode is also a bit more complex than normal launching.
It may be worth going through the FireWorks documentation to understand the difference between these modes and making an informed choice about which mode to use.
A minimal my_qadapter.yaml
file for SLURM machines might look like
_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch -c <<INSTALL_DIR>>/config rapidfire
nodes: 2
walltime: 24:00:00
queue: null
account: null
job_name: null
pre_rocket: null
post_rocket: null
logdir: <<INSTALL_DIR>>/logs
The _fw_name: CommonAdapter
means that the queue is one of the built in queue systems and _fw_q_type: SLURM
indicates that the SLURM system will be used. FireWorks supports the following queue systems out of the box:
PBS/Torque
SLURM
SGE
IBM LoadLeveler
Note
If you aren’t sure what queue system the cluster you are setting up uses, consult the documentation for that resource. If the queue system isn’t one of these preconfigured ones, consult the FireWorks documentation for writing queue adapters. The FireWorks documentation also has tutorials on setting up your jobs to run on a queue in a way that is more interactive than the minimal details specified here.
nodes and walltime are the default reservations made to the queue, as you would expect. queue refers to the name of the queue you will submit to; some clusters support this, and appropriate values might be regular, normal, knl, etc., as defined by the compute resource you are using. The account option refers to which account to charge; again, whether or not you need to set this depends on the resource. pre_rocket and post_rocket add lines before and after your job launches in your queue submission script. One use of this would be to enter directives such as #SBATCH -C knl,quad,cache to configure SLURM to run on KNL nodes. Any parameters left null will not be used to write the queue file.
This is not at all required, but if you want to see what the queue templates look like, you can see them here. The values you put in your my_qadapter.yaml
file above are used to fill in the unknown values of the template.
FW_config.yaml¶
As you may have noticed, there are lots of config files for controlling various aspects of FireWorks. The master config file is called FW_config.yaml
, which controls different FireWorks settings and can also point to the location of other configuration files. For a more complete reference to the FireWorks parameters you can control, see the FireWorks documentation for modifying the FW config. Here you simply need to tell FireWorks the location of the my_launchpad.yaml, my_qadapter.yaml, and my_fworker.yaml configuration files; then you only need to point FireWorks at this master config file and don't need to always specify the location of those other files.
Create a file called FW_config.yaml
in <<INSTALL_DIR>>/config
with the following contents:
CONFIG_FILE_DIR: <<INSTALL_DIR>>/config
The CONFIG_FILE_DIR
is expected to contain all your other FireWorks config files.
Finishing up¶
The directory structure of <<INSTALL_DIR>>/config
should now look like
config
├── db.json
├── FW_config.yaml
├── my_fworker.yaml
├── my_launchpad.yaml
└── my_qadapter.yaml
The last thing we need to do to configure FireWorks is add the following line to your RC / bash_profile file to set an environment variable telling FireWorks where to find the FW_config.yaml
export FW_CONFIG_FILE=<<INSTALL_DIR>>/config/FW_config.yaml
where <<INSTALL_DIR>>
is, as usual, your installation directory. Remember that the FW_config.yaml
will in turn give the location of your other config files.
That’s it. You’re done configuring FireWorks and most of atomate. You should now perform a check to make sure that you can connect to the database by sourcing your RC file (to set this environment variable) and initializing the database by running the command
lpad reset
which should return something like:
Are you sure? This will RESET 0 workflows and all data. (Y/N) y
2015-12-30 18:00:00,000 INFO Performing db tune-up
2015-12-30 18:00:00,000 INFO LaunchPad was RESET.
Configure pymatgen¶
If you are planning to run VASP, the last configuration step is to configure pymatgen to (required) find the pseudopotentials for VASP and (optional) set up your API key from the Materials Project.
Pseudopotentials¶
The pseudopotentials can be kept in any folder, as noted in the Prerequisites. For convenience, you might copy them to the same directory where you are installing atomate (such as <<INSTALL_DIR>>/pps), but this is not required. Regardless of their location, the directory structure should look like:
pseudopotentials
├── POT_GGA_PAW_PBE
│ ├── POTCAR.Ac.gz
│ ├── POTCAR.Ac_s.gz
│ ├── POTCAR.Ag.gz
│ └── ...
├── POT_GGA_PAW_PW91
│ ├── POTCAR.Ac.gz
│ ├── POTCAR.Ac_s.gz
│ ├── POTCAR.Ag.gz
│ └── ...
└── POT_LDA_PAW
├── POTCAR.Ac.gz
├── POTCAR.Ac_s.gz
├── POTCAR.Ag.gz
└── ...
This directory structure is needed so that the underlying pymatgen code correctly finds the POTCARs. Enter the location of this directory into a file called .pmgrc.yaml
in your home folder (i.e., ~/.pmgrc.yaml
) with the following contents
PMG_VASP_PSP_DIR: <<INSTALL_DIR>>/pps
If you’d like to use a non-default functional in all of your calculations, you can set the DEFAULT_FUNCTIONAL
key to a functional that is supported by VASP, e.g. PS
to use PBESol.
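To check that pymatgen can actually locate your pseudopotentials, you can try building a POTCAR from Python; this is just a sketch and assumes the PBE Si pseudopotential is present in your library:

from pymatgen.io.vasp.inputs import Potcar

# this will raise an error if PMG_VASP_PSP_DIR is not set or the POTCARs are missing
potcar = Potcar(["Si"], functional="PBE")
print(potcar[0].symbol)   # should print "Si"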
Materials Project API key¶
You can get an API key from the Materials Project by logging in and going to your Dashboard. Add this also to your .pmgrc.yaml
so that it looks like the following
PMG_VASP_PSP_DIR: <<INSTALL_DIR>>/pps
PMG_MAPI_KEY: <<YOUR_API_KEY>>
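To confirm that the API key works, you can try fetching a structure; this is a rough sketch and, depending on your pymatgen version and the type of API key you were issued, you may need the newer mp-api client instead:

from pymatgen.ext.matproj import MPRester

# MPRester() with no arguments reads PMG_MAPI_KEY from ~/.pmgrc.yaml
with MPRester() as mpr:
    structure = mpr.get_structure_by_material_id("mp-149")   # silicon
print(structure.composition.reduced_formula)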
Run a test workflow¶
To make sure that everything is set up correctly and in place, we’ll finally run a simple (but real) test workflow. Two methods to create workflows are (i) using atomate’s command line utility atwf
or (ii) by creating workflows in Python. For the most part, we recommend using method (ii), the Python interface, since it is more powerful and also simple to use. However, in order to get started without any programming, we’ll stick to method (i), the command line, using atwf
to construct a workflow. Note that we'll discuss the Python interface more in the Running Workflows Tutorial and provide details on writing custom workflows in the Creating Workflows tutorial.
Ideally, you set up a Materials Project API key in the Configure pymatgen section; otherwise, you will need to provide a POSCAR file for the structure you want to run. In addition, there are two different methods to use atwf - one using a library of preset functions for constructing workflows and another using a library of files for constructing workflows.
This particular workflow will only run a single calculation that optimizes a crystal structure (not very exciting). In the subsequent tutorials, we’ll run more complex workflows.
Add a workflow¶
Below are 4 different options for adding a workflow to the database. You only need to execute one of the below commands; note that it doesn’t matter at this point whether you are loading the workflow from a file or from a Python function.
Option 1 (you set up a Materials Project API key, and want to load the workflow using a file):
atwf add -l vasp -s optimize_only.yaml -m mp-149 -c '{"vasp_cmd": ">>vasp_cmd<<", "db_file": ">>db_file<<"}'
Option 2 (you set up a Materials Project API key, and want to load the workflow using a Python function):
atwf add -l vasp -p wf_structure_optimization -m mp-149
Option 3 (you will load the structure from a POSCAR file, and want to load the workflow using a file):
atwf add -l vasp -s optimize_only.yaml POSCAR -c '{"vasp_cmd": ">>vasp_cmd<<", "db_file": ">>db_file<<"}'
Option 4 (you will load the structure from a POSCAR file, and want to load the workflow using a Python function):
atwf add -l vasp -p wf_structure_optimization POSCAR
All of these commands specify (i) a type of workflow and (ii) the structure to feed into that workflow.
The -l vasp option states to use the vasp library of workflows.
The -s optimize_only.yaml option sets the specification of the workflow using the optimize_only.yaml file in this directory. Alternatively, the -p wf_structure_optimization option sets the workflow specification using the preset Python function located in this module. For now, it's probably best not to worry about the distinction but to know that both libraries of workflows are available to you.
The -c option is used in file-based workflows to make sure that one uses the vasp_cmd and db_file that are specified in the my_fworker.yaml you configured earlier. In the preset workflows, it is the default behavior to take these parameters from my_fworker.yaml, so this option is not needed.
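For reference, once you are comfortable with Python, the same preset workflow used in Options 2 and 4 can be added directly from a script. The sketch below is a rough equivalent of Option 4 (it assumes a POSCAR file in the working directory and that FW_CONFIG_FILE is set as described above); the Python interface is covered properly in the later tutorials.

from fireworks import LaunchPad
from pymatgen.core import Structure
from atomate.vasp.workflows.presets.core import wf_structure_optimization

structure = Structure.from_file("POSCAR")     # or fetch one with MPRester
wf = wf_structure_optimization(structure)     # same preset as -p wf_structure_optimization
lp = LaunchPad.auto_load()                    # finds my_launchpad.yaml via FW_config.yaml
lp.add_wf(wf)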
Verify the workflow¶
These commands added a workflow for running a single structure optimization FireWork to your LaunchPad. You can verify that by using FireWorks’ lpad
utility:
lpad get_wflows
which should return:
[
{
"state": "READY",
"name": "Si--1",
"created_on": "2015-12-30T18:00:00.000000",
"states_list": "REA"
},
]
Note that the lpad
command is from FireWorks and has many functions. As simple modifications to the above command, you can also try lpad get_wflows -d more
(or if you are very curious, lpad get_wflows -d all
). You can use lpad get_wflows -h
to see a list of all available modifications and lpad -h
to see all possible commands.
If this works, congrats! You’ve added a workflow (in this case, just a single calculation) to the FireWorks database.
Submit the workflow¶
To launch this FireWork through the queue, go to the directory where you would like your calculations to run (e.g., your scratch or work directories) and run the command
qlaunch rapidfire -m 1
There are lots of things to note here:
The -m 1 means to keep a maximum of 1 job in the queue to prevent submitting too many jobs. As with all FireWorks commands, you can get more options using qlaunch rapidfire -h or simply qlaunch -h.
The qlaunch mode specified above is the simplest and most general way to get started. It will end up creating a somewhat nested directory structure, but this will make more sense when there are many calculations to run.
One other option for qlaunch is “reservation mode”, i.e., qlaunch -r rapidfire. There are many nice things about this mode - you'll get pretty queue job names that represent your calculated composition and task type (these are really nice for seeing specifically which calculations are queued) and you'll have more options for tailoring specific queue parameters to specific jobs. In addition, reservation mode will automatically stop submitting jobs to the queue depending on how many jobs you have in the database, so you don't need to use the -m 1 parameter (this is usually desirable and nice, although in some cases it's better to submit to the queue first and add jobs to the database later, which reservation mode doesn't support). However, reservation mode does add its own complications and we do not recommend starting with it (in many if not most cases, it's not worth switching at all). If you are interested in this option, consult the FireWorks documentation for more details.
If you want to run directly on your computing platform rather than through a queue, use rlaunch rapidfire instead of the qlaunch command (go through the FireWorks documentation to understand the details).
If all went well, you can check that the FireWork is in the queue by using the commands for your queue system (e.g. squeue
or qstat
). When the job finally starts running, you will see the state of the workflow as running using the command lpad get_wflows -d more
.
Analyzing the results¶
Once this FireWork has launched and completed, you can use pymatgen-db to check that it was entered into your results database by running
mgdb query -c <<INSTALL_DIR>>/config/db.json --props task_id formula_pretty output.energy_per_atom
This time, <<INSTALL_DIR>>
can be relative. You should see the energy per atom you calculated for Si.
Note that the mgdb tool is only one way to see the results. You can connect to your MongoDB and explore the results using any MongoDB analysis tool. In later tutorials, we'll also demonstrate how various Python classes in atomate help in retrieving and analyzing data. For now, the mgdb command is a simple way to get basic properties.
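As a preview of the Python route, a query equivalent to the mgdb command above might look roughly like this sketch, using the same db.json you configured earlier:

from atomate.vasp.database import VaspCalcDb

db = VaspCalcDb.from_db_file("<<INSTALL_DIR>>/config/db.json", admin=False)
fields = {"task_id": 1, "formula_pretty": 1, "output.energy_per_atom": 1, "_id": 0}
for doc in db.collection.find({}, fields):
    print(doc)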
You can also check that the workflow is marked as completed in your FireWorks database:
lpad get_wflows -d more
which will show the state of the workflow as COMPLETED.
Next steps¶
That’s it! You’ve completed the installation tutorial!
See the following pages for more information on the topics we covered here:
To see how to run and customize the existing Workflows and FireWorks try the Running Workflows Tutorial (suggested next step)
For submitting jobs to the queue in reservation mode see the FireWorks advanced queue submission tutorial
For using pymatgen-db to query your database see the pymatgen-db documentation
Troubleshooting and FAQ:¶
Q: I can’t connect to my LaunchPad database¶
- A
Make sure the right LaunchPad file is getting selected. Adding the following line to your FW_config.yaml will cause the line to be printed every time that configuration is selected:
ECHO_TEST: Database at <<INSTALL_DIR>>/config/FW_config.yaml is getting selected.
Then running lpad version should give the following result if that configuration file is being chosen:
$ lpad version
Database at <<INSTALL_DIR>>/config/FW_config.yaml is getting selected.
FireWorks version: x.y.z
located in: <<INSTALL_DIR>>/atomate_env/lib/python3.6/site-packages/fireworks
If it's not being found, check that echo $FW_CONFIG_FILE returns the location of that file (you could use cat $FW_CONFIG_FILE to check the contents).
- A
Double check all of the configuration settings in my_launchpad.yaml.
- A
Have you had success connecting before? Is there a firewall blocking your connection?
- A
You can try following the FireWorks tutorials, which go through this process in a little more detail.
Q: My job fizzled!¶
- A
Check the *_structure_optimization.out and *_structure_optimization.error files in the launch directory for any errors. Also check the FW.json for a Python traceback.
Q: I made a mistake using reservation mode, how do I cancel my job?¶
- A
One drawback of using the reservation mode (the -r in qlaunch -r rapidfire) is that you have to cancel your job in two places: the queue and the LaunchPad. To cancel the job in the queue, use whatever command you usually would (e.g., scancel or qdel). To cancel or rerun the FireWork, run lpad defuse_fws -i 1 or lpad rerun_fws -i 1, where -i 1 means to perform the operation on the FireWork with ID 1. Run lpad -h to see all of the options.
The non-reservation mode for qlaunching requires a little less maintenance with certain tradeoffs, which are detailed in the FireWorks documentation.
Q: I honestly tried everything I can to solve my problem. I still need help!¶
- A
There is a support forum for atomate: https://discuss.matsci.org/c/atomate