.. title:: Creating workflows

.. _creating workflows:

==================
Creating Workflows
==================

Introduction
============

The :ref:`running workflows tutorial` walked you through customizing and running a preset VASP workflow from the library of YAML files. While these presets are useful standards for making specific, straightforward calculations, it is often desirable to create your own workflows.

Ideally, custom workflows can be constructed entirely with existing Fireworks. Generally speaking, any workflow that is a series of calculations where only the input files and calculation settings change can be constructed from existing Fireworks. The workflow can have either a simple or complex dependency graph. This guide will explain how to create such custom workflows in Python.

If your desired workflow cannot be created with existing Fireworks, you'll have to code your own Fireworks. It should be mostly straightforward to do so by looking at the currently implemented Fireworks in atomate and using that as a guide to code your own. For now, we will assume that the Fireworks you need already exist.

Objectives
==========

* Get familiar with the internal structure of Workflows, Fireworks, and Firetasks
* Learn to compose atomate Workflows in Python from existing Fireworks
* Understand what powerups are and how to use them
* Learn how the FireWorker environment can control behavior

Prerequisites
=============

In order for you to understand this guide you should

* Have a working installation of atomate
* Understand how to add Workflows to your FireWorks LaunchPad and run them
* Be comfortable with Python functions and classes

FireWorks Background
====================

All of the machinery for constructing workflows in atomate is from `FireWorks`_, which is a generalized library for constructing and running workflows.
atomate uses the FireWorks library to construct workflows that combine materials science manipulation and analysis implemented in pymatgen_, database communication from pymatgen-db_, and job running from custodian_. The FireWorks documentation provides many excellent tutorials and guides for using and understanding the library, which may be useful.

A FireWorks Workflow is made up of Fireworks, which are made up of Firetasks. Each of these is best described by the `FireWorks documentation`_ itself:

.. pull-quote::

    * A Workflow is a set of FireWorks with dependencies between them. For example, you might need a parent Firework to finish and generate some output files before running two child FireWorks.
    * A Firework contains the JSON spec that includes all the information needed to bootstrap your job. For example, the spec contains an array of Firetasks to execute in sequence. The spec also includes any input parameters to pass to your Firetasks. You can easily perform the same function over different input data by creating Fireworks with identical Firetasks but different input parameters in the spec. You can design your spec however you’d like, as long as it’s valid JSON. The JSON format used for Firework specs is extremely flexible, very easy to learn (Python users familiar with dicts and arrays essentially already know JSON), and immediately makes rich searches over the input data available to end users through MongoDB’s JSON document search capabilities.
    * A Firetask is an atomic computing job. It can call a single shell script or execute a single Python function that you define (either within FireWorks, or in an external package).

In atomate, we typically design each Firework to represent one calculation or analysis step.
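
As a rough illustration of the model quoted above, the sketch below uses plain Python dictionaries rather than the FireWorks classes. The names ``fireworks`` and ``depends_on`` and the ``tasks`` key are hypothetical stand-ins chosen for this example, and ``graphlib`` requires Python 3.9 or newer. It only shows that a spec can round-trip through JSON and that a parent has to run before its two children.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical stand-ins: each "Firework" is a dict whose spec holds an
# ordered list of "Firetasks" plus any input parameters for those tasks.
fireworks = {
    "parent": {"spec": {"tasks": ["write_inputs", "run_code", "parse_outputs"]}},
    "child_a": {"spec": {"tasks": ["run_code"], "mode": "a"}},
    "child_b": {"spec": {"tasks": ["run_code"], "mode": "b"}},
}

# The "Workflow" adds the dependencies: each key lists what it must wait for.
depends_on = {"parent": [], "child_a": ["parent"], "child_b": ["parent"]}

# A spec has to be valid JSON, so it should survive a round trip unchanged.
for fw in fireworks.values():
    assert json.loads(json.dumps(fw["spec"])) == fw["spec"]

# Any valid execution order starts with the parent; the two children are
# independent of each other, so their relative order is not significant.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

In real use, FireWorks stores the specs and resolves the execution order itself; the sketch only mirrors the idea.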
In the example below, a typical bandstructure calculation first optimizes a crystal structure, then performs a static calculation, and finally non-SCF calculations on the band path and uniformly through the structure (which can be executed in parallel since they are not dependent on one another). Each of these calculations (each time you would run VASP) corresponds to a Firework and thus this bandstructure workflow contains 4 Fireworks. One could design this workflow with more or fewer Fireworks, but in our opinion this is the most natural and useful mapping. For example, note that one Firework most typically corresponds to one queue submission on a supercomputing center (although this can vary depending on the chosen execution mode).

The Firetasks of each Firework are all of the individual steps that are performed for each calculation step. In the example, the structure optimization Firework has tasks to write the VASP input files based on the input structure, run VASP, parse the VASP outputs, and pass the output structure on to the static calculation. For the purposes of this tutorial, you don’t need to worry much about the details of Firetasks, but it helps to appreciate that Fireworks are simply composed of a few of these tasks performed in sequence that allow one to abstract out tedious calculation details like file IO. In the same way, Workflows exist to abstract out the details of different calculation and analysis steps contained in Fireworks.

.. figure:: _static/bandstructure_wf.png
    :alt: Bandstructure workflow.
    :scale: 50%

    Bandstructure workflow. The Optimize bulk structure Firework is a parent with one child: a Static SCF Firework. The Static SCF Firework has two children: a Static Non-SCF (line) Firework and a Static Non-SCF (uniform) Firework to get the DOS on a path and uniformly, respectively.

In terms of execution, the Optimize bulk structure Firework will run the four Firetasks in sequential order with data as described by the spec.
Once the Optimize bulk structure Firework and Static SCF Firework complete, the line and uniform non-SCF Fireworks can be launched and executed in parallel or in series. Since these Fireworks do not depend on each other, the execution order does not matter. Similar to the Optimize structure Firework, all of these Fireworks will run their respective Firetasks in series.

.. _pymatgen: http://pymatgen.org
.. _custodian: https://materialsproject.github.io/custodian/
.. _FireWorks: https://materialsproject.github.io/fireworks/
.. _pymatgen-db: https://materialsproject.github.io/pymatgen-db/
.. _FireWorks documentation: https://materialsproject.github.io/fireworks/index.html#workflow-model

Creating atomate Workflows
==========================

Sketch the workflow
-------------------

The first step when you go to write any workflow is to sketch a graph of the workflow. In particular, you want to sketch a `directed acyclic graph`_, which for atomate just means that each calculation step is a node in the graph and parents only point to their children such that there are no loops (cycles) in the graph. Again, you should think of each large single invocation of the scientific code (VASP, LAMMPS, etc.) as corresponding to one Firework. Analysis tasks that aggregate results from several Fireworks, e.g. tasks that analyze volume deformations, also need their own Fireworks that have the calculations as parents. All of the workflow examples in the docs and the atomate paper (submitted) are valid workflow graphs.

.. note::

    An advanced method of workflow programming in the FireWorks package allows the workflow to change depending on the results of execution. For example, a Firework can add more Fireworks to the workflow based on results that it obtained. If you need some kind of looping or branching at a high level, you'll have to write a custom Firetask that creates new Fireworks on the fly.
    Writing custom Firetasks in atomate may be covered in more depth in a future guide, but it is beyond the scope of this one. The `Fireworks documentation for dynamic workflows`_ discusses this from an abstract perspective.

.. _directed acyclic graph: https://en.wikipedia.org/wiki/Directed_acyclic_graph
.. _Fireworks documentation for dynamic workflows: https://materialsproject.github.io/fireworks/dynamic_wf_tutorial.html

Finding Fireworks
-----------------

Once you have identified each calculation or analysis step as a node in your graph, you must determine which Firework corresponds to each node. The full Python documentation for each of the atomate Fireworks can be found in the :py:mod:`atomate.vasp.fireworks` module documentation or the corresponding documentation page for the software you want to use. Currently FEFF, LAMMPS, and VASP are supported. Some available VASP Fireworks are

* OptimizeFW
* StaticFW
* TransmuterFW
* HSEBSFW
* NonSCFFW
* DFPTFW
* SOCFW
* MDFW
* BoltztrapFW
* NEBRelaxationFW
* NEBFW

.. warning::

    The majority of these Fireworks are not complicated to run and you can use them directly. Some are less obvious, and you should refer to the documentation for the specifics of each and how to use them before you do. For example, the HSEBSFW requires a previous calculation that gives VBM/CBM information or the high-symmetry kpoints.

.. note::

    Some of these Fireworks only differ in VASP settings or options. For example, a simple OptimizeFW could in principle be customized to achieve the same functionality as several other VASP Fireworks in the same way that two types of burgers at a fast-food restaurant might become equivalent if you customized each order enough. One of the main settings to pay attention to, particularly in VASP, is the input set used, which determines things like functional, pseudopotential, and convergence settings.
    The input sets are all defined in pymatgen, such as `pymatgen.io.vasp.sets`_ or `pymatgen.io.feff.sets`_.

Most of these are fairly straightforward, but one Firework to pay specific attention to is the TransmuterFW. The TransmuterFW is very powerful because it supports transforming structures by any of the transformations supported in `pymatgen.transformations`_. There are many supported transformations in pymatgen. Some of the more common ones include

* SupercellTransformation: Create supercells from a scaling matrix
* SubstitutionTransformation: Substitute one species for another
* PerturbStructureTransformation: Apply a perturbation of a specified amplitude to each site
* DeformStructureTransformation: Apply a deformation matrix to the lattice matrix of the structure

You'll notice that more generic Fireworks and Fireworks for analysis tasks are not in this list. Because they are intended to be simple and modular, these Fireworks are actually written as plain Firetasks that will get wrapped into a Firework when the Workflow is constructed. We’ll cover how to use them as Fireworks in the next section, but know that they can be found at places like :py:mod:`atomate.common.firetasks` for software-agnostic tasks such as ``PassCalcLocs`` or :py:mod:`atomate.vasp.firetasks package >vasp_cmd<<'`` and ``'>>db_file<<'`` for each of these parameters, respectively. More on this behavior in the `env_chk`_ section.

.. _Blanco et al. Comput. Phys. Commun., 158, (2004): http://dx.doi.org/10.1016/j.comphy.2003.12.001
.. _Togo and Tanaka, Scr. Mater., 108 (2015): http://dx.doi.org/10.1016/j.scriptamat.2015.07.021
.. _good visualization for Helmholtz energies and volumes with increasing temperature: https://atztogo.github.io/phonopy/qha.html#phonopy-qha

More help
---------

* Reading the source of the base Workflows (:py:mod:`atomate.vasp.workflows.base`) is a good way to find functional examples with reasonable parameters
* Creating workflows can also be done to some extent in YAML files. You can adapt the example in the :ref:`workflow YAML reference`, which comes with a short explanation.

Modifying workflows
===================

There are a few other interesting features of workflows in atomate that make writing dynamic and customizable workflows easier.

Powerups
--------

Powerups (:py:mod:`atomate.vasp.powerups`) enable modifications to be made to workflows **after** they have been created. The benefit of these is that you can apply powerups conditionally in code or to a large list of Workflows that you'll later add to your LaunchPad. Some powerups affect the behavior of your calculations, while others simply add metadata or change how the individual Fireworks interact with the database under the hood.

Some useful powerups that affect the behavior of VASP are

* ``add_modify_incar``: Update the INCAR of Fireworks specified by (partially matched) name at runtime
* ``set_fworker``: Specify certain FireWorkers for workflows. Useful for FireWorkers tuned for high-memory or high-precision jobs
* ``modify_to_soc``: Make all of the VASP calculations that match the constraints take spin orbit coupling into account
* ``remove_custodian``, ``use_custodian``, ``run_fake_vasp``: Choose to run VASP with or without custodian (or not at all, useful for debugging)

Powerups that modify how FireWorks runs and that can interact with workflows as they run include

* ``add_priority``: Add priority to root and child jobs to ensure that calculations that have started take priority over unstarted Fireworks
* ``add_namefile``: Put a ``FW-->>fw.name<<`` file in the launch directory so searching filesystems for particular Fireworks is easy
* ``add_trackers``: Fireworks will report the last few lines of OUTCAR and OSZICAR files that can be used to track jobs as they are still running
* ``add_wf_metadata`` and ``add_tags``: Add metadata to workflows for easier querying
* ``add_stability_check`` and ``add_bandgap_check``: End workflows if the calculated structure has much lower energy than a materialsproject.org structure or if the bandgap is above or below a certain threshold

.. _env_chk:

env_chk
-------

Workflows in atomate are powerful for getting science done quickly because they are designed to be easily run heterogeneously on different computing resources. ``env_chk`` enables this functionality by letting the user specify parameters that support ``env_chk``, such as ``db_file``, ``vasp_cmd``, and ``incar_update``. These allow different resources (or simply different ``my_fworker.yaml`` files on the same compute resource) to have settings specific to workflows that they run.
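
To make the idea concrete, here is a minimal, library-free sketch of this kind of lookup. It is a simplified illustration rather than atomate's actual implementation: the function name ``resolve_env_value`` is hypothetical, the commands and paths are made-up example values, and the sketch assumes that the FireWorker's ``env`` settings are available to the running Firework under a ``_fw_env`` key of its spec.

```python
def resolve_env_value(value, fw_spec, default=None):
    """Simplified illustration of the env_chk idea (not atomate's own code)."""
    # Only strings wrapped as '>>key<<' are treated as environment lookups.
    if isinstance(value, str) and value.startswith(">>") and value.endswith("<<"):
        key = value[2:-2]
        # Assumption: the worker's environment is exposed under "_fw_env".
        return fw_spec.get("_fw_env", {}).get(key, default)
    # Anything else is a literal value and passes through unchanged.
    return value


# Two hypothetical workers with different settings for the same workflow.
laptop_spec = {"_fw_env": {"vasp_cmd": "vasp_std", "db_file": "/home/me/db.json"}}
cluster_spec = {"_fw_env": {"vasp_cmd": "srun vasp_std", "db_file": "/project/db.json"}}

for spec in (laptop_spec, cluster_spec):
    # The workflow definition stays the same; only the worker's answer differs.
    print(resolve_env_value(">>vasp_cmd<<", spec))
```

The point of the design is that the workflow itself only ever contains the placeholder string, so the same workflow can be added once and picked up by whichever worker is available.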
Some ideas for using ``env_chk`` like this are

* Be able to quickly switch between different database files that are associated with different research projects
* Ensure more consistent and easier usage of INCAR parameters you use often, such as setting a high ``NEDOS`` INCAR parameter
* Set FireWorkers up for low and high precision jobs, or normal and high-memory jobs on the same computing resource.

To use ``env_chk``, you don't have to do anything explicitly. Just pass ``'>>db_file<<'``, ``'>>vasp_cmd<<'``, or ``'>>incar_update<<'`` to any parameter that supports ``env_chk``. Currently supported ``env_chk`` variables are:

* ``>>scratch_dir<<``
* ``>>gamma_vasp_cmd<<``
* ``>>vasp_cmd<<``
* ``>>db_file<<``

If you think there are other potentially useful variables that should support ``env_chk``, please propose your idea in the `atomate Discourse forum`_ (or better, submit a pull request)!

PassCalcLocs
------------

In order to act on and modify the initial structure as it changes throughout workflows, there needed to be a way to refer to previous calculations without hard coding file paths or meticulously keeping track of paths throughout Firetasks and Fireworks. ``PassCalcLocs`` (:py:mod:`atomate.common.firetasks.glue_tasks`) solves this problem by tracking previous calculation directories and making them available to consecutive Fireworks, even between different computing resources.

``PassCalcLocs`` enables Fireworks to be dynamically added and removed from workflows during runtime, enabling features such as branching during the calculation. Such branching could be used to provide different workflow steps for handling metals and non-metals.

Conclusion
==========

Understanding this guide has enabled you to create arbitrarily complex atomate workflows with any combination of Firetasks and Fireworks, but not everything could be covered in detail with examples.
See the :ref:`customizing workflows` documentation for specific examples for customizing workflows that you can adapt to your needs. If any of this was unclear, or if you feel that useful documentation is missing, please leave us feedback on the `atomate Discourse forum`_!

To see all of the different pieces you can control with Python, go to the :ref:`API documentation `. Many customization options and features of interest are not in atomate alone, but in `FireWorks`_, `pymatgen`_, and `custodian`_. Mastering FireWorks will enable you to get the most out of executing and managing your workflows. Mastering pymatgen will help you write complex materials workflows and perform sophisticated analyses of results.

.. _atomate Discourse forum: https://discuss.matsci.org/c/atomate