A First Encounter with SIESTA

Are you using this tutorial for a school?

Ask tutors if you need to set up your local environment. Do that now before proceeding.

This first tutorial provides a first acquaintance with SIESTA by studying one simple molecule, CH4. It covers most of the basic features and concepts, without worrying too much about issues of convergence or fine-tuning.

Take me to the exercises!

background Background Information

Additional information: Slides

These slides from previous presentations might help improve your understanding of the concepts presented in this tutorial:

As a general background for basis sets, you can watch the 2021 talks by Emilio Artacho:

This tutorial will only cover basic, general options for basis set generation. We encourage you to jump into the dedicated tutorial for basis set optimization afterwards.

The inputs of SIESTA

The essential inputs of SIESTA are the FDF (Flexible Data Format) file and the pseudopotential file.

FDF file

It is the core input, since it provides information about the system and sets the values of several parameters regarding the quality of the calculation, as well as the pseudopotential information.

The FDF file (input.fdf) is a plain-text input file. It is structured in name-value pairs—that is, a name characterizing the data and its value. Units can be specified after the value. These name-value pairs can be part of more complex structures called blocks (%block label [...] %endblock). The FDF file is case-insensitive, and characters -, _, and . in a data label are ignored.

The list of available descriptors is extensive; however, most of them have default values. To describe the system for a calculation, the minimum information needed is provided by the following descriptors:

As a part of the FDF file we can find the information regarding the basis set used. SIESTA offers many options for the specification of the basis set. Two of the key concepts are the cardinality of the set and range of the orbitals; both can be specified, in most cases, simply with a couple of lines.

  • The cardinality of the set, or its size, can be set by PAO.BasisSize. It follows the usual convention of amount of functions per valence orbital, the possible options being SZ, SZP, DZ, DZP… For example, for C, a SZ basis set has 4 functions (s+p) and a DZP basis set has 13 functions (2*s, 2*p, and a d polarization shell).

  • The energy shift is a physical concept determining the increase in energy brought about by the confinement of the orbital; the lower the energy shift, the larger the cut-off radii of basis orbitals. It is set using PAO.EnergyShift.

As in any DFT code, the exchange-correlation functional must be specified; the default option is using the local density approximation (LDA) for our calculations. However, it is also possible to use other functionals, such as those of GGA type. The information of the exchange-correlation functional should be provided like this:

XC.functional GGA
XC.authors    PBE

Notice that both XC.Authors and XC.Functional are needed, and they must be consistent, check the manual. For XC.Functional GGA, the most used variants are BLYP, PBE and PBEsol. For XC.Functional VDW, frequently used functionals are DRSLL, LMKLL and VV. See the manual for more options on XC functionals.

Pseudopotentials

The other essential input of SIESTA is the pseudopotential information. SIESTA uses pseudopotentials to represent the core electron-ion interaction, as do most plane-wave codes (in contrast to so-called “all-electron” programs). In particular, the pseudopotentials are of the “norm-conserving” kind. The choice of a pseudopotential significantly impacts the quality of the calculation.

SIESTA reads pseudopotentials from different files based on the species information in %block ChemicalSpeciesLabel. By default, files are searched for in the current directory. SIESTA accepts different formats, but PSML is the one recommended. Curated databases of high-quality PSML files are available. In particular, the Pseudo-Dojo project offers PSML files for almost the entire periodic table, along with reports detailing the tests performed during their generation. The database includes variants for scalar and fully relativistic pseudopotentials using LDA, PBE, and PBEsol functionals.

Generating a new pseudopotential from scratch is beyond the scope of this tutorial, but more details can be found in Pseudopotentials. Note that ATOM generates psf files, which are also supported by SIESTA.

The pseudopotential should be tested. The method for testing depends on the type of system being simulated, so reviewing existing literature is strongly recommended.

The output

SIESTA writes a log of its workings to standard output, which is recommended to be redirected to an output file. In the standard output, the entire thinking process of SIESTA can be followed: starting from reading the input parameters, defining the system, and proceeding to the self-consistent-field convergence.

System properties can be found in it; however, not all of them are written by default. More detailed information will be given in the exercises: see below.

basic Look at the input files

Are you under a school environment?

Enter directory 01-CH4-Basic

You will find an input file named ch4.fdf, along with the files C.psml and H.psml which contain the information about the pseudopotentials.

A peek at the fdf file
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
#General system specifications
SystemName          CH4 molecule
SystemLabel         ch4
NumberOfAtoms       5
NumberOfSpecies     2

%block ChemicalSpeciesLabel
 1  6 C   # Species index, atomic number, species label
 2  1 H   # Species index, atomic number, species label
%endblock ChemicalSpeciesLabel

#Atomic coordinates
AtomicCoordinatesFormat  Ang

%block AtomicCoordinatesAndAtomicSpecies
  0.000      0.000      0.000   1
  1.219     -0.284     -0.377   2
 -0.284      1.219     -0.377   2
 -0.140     -0.140      1.219   2
 -0.833     -0.833     -0.503   2
%endblock AtomicCoordinatesAndAtomicSpecies


#Unit cell for the calculation
LatticeConstant 15 Ang
%block LatticeVectors
  1.000 0.000 0.000
  0.000 1.000 0.000
  0.000 0.000 1.000
%endblock LatticeVectors


# Basis set definition
PAO.EnergyShift 250 meV
PAO.SplitNorm   0.15
PAO.BasisSize   SZ

#Real space grid
MeshCutoff 125.0 Ry

# Convergence of SCF
MaxSCFIterations 50
DM.MixingWeight 0.4
DM.NumberPulay  2

# Type of solution
SolutionMethod diagon





We find first the descriptors that specify the system

 2
 3
 4
 5
 6
 7
 8
 9
10
SystemName          CH4 molecule
SystemLabel         ch4
NumberOfAtoms       5
NumberOfSpecies     2

%block ChemicalSpeciesLabel
 1  6 C   # Species index, atomic number, species label
 2  1 H   # Species index, atomic number, species label
%endblock ChemicalSpeciesLabel

Note

You will notice several lines starting with #. These are comments on the file and will not be read by SIESTA.

The SystemName is only used for tagging purposes, while the SystemLabel will determine the root name for all output files (i.e., they will all begin by ch4.* in our case). NumberOfAtoms indicates the total number of atoms in our simulation box, and NumberOfSpecies indicates the different kinds of atoms present in the system.

Pay special attention to the %block ChemicalSpeciesLabel. In this block, you assign an index, an atomic number, and a label to each atomic species. The atomic number is, as usual, the nuclear charge of the atom. The label will allow SIESTA to recognize the files containing the information about the pseudopotential and the basis set (when provided); this means that the name of the label must coincide with the name of the pseudopotential file. The amount of species in this block must be the one specified by the NumberOfSpecies input.

Note

Options in the fdf file are case-insensitive. This means that NumberOfAtoms is the same as numberofatoms or NUMBEROFATOMS.

A few lines below, we find the coordinates for the atoms in the system

13
14
15
16
17
18
19
20
21
AtomicCoordinatesFormat  Ang

%block AtomicCoordinatesAndAtomicSpecies
  0.000      0.000      0.000   1
  1.219     -0.284     -0.377   2
 -0.284      1.219     -0.377   2
 -0.140     -0.140      1.219   2
 -0.833     -0.833     -0.503   2
%endblock AtomicCoordinatesAndAtomicSpecies

In this block, each of the lines corresponds to an atom. The first three numbers are the X, Y and Z coordinates for each atom, and the last (integer) number indicates the species index in the %block ChemicalSpeciesLabel. In this case, there is a Carbon atom (species index = 1) in the position (0.0,0.0,0.0), and the rest are the four Hydrogens (species index = 2).

The AtomicCoordinatesFormat option can also be set to Bohr (which is the default value), or to Fractional if one desires to set the coordinates as a fraction of the lattice vectors (see below).

You might have wondered about the existence of this block in the input file:

24
25
26
27
28
29
30
#Unit cell for the calculation
LatticeConstant 15 Ang
%block LatticeVectors
  1.000 0.000 0.000
  0.000 1.000 0.000
  0.000 0.000 1.000
%endblock LatticeVectors

It does not seem to make sense for a ‘molecule’ calculation. In fact, SIESTA always uses periodic boundary conditions (PBC); this means that we are doing a calculation for an infinite collection of regularly spaced molecules. If we want to simulate an isolated molecule, it is important to have enough distance between the molecule and its neighboring images. At the very minimum, there should not be overlap between the orbitals on different image molecules; however, this size should be bigger for molecules with a dipole, which will have a long-range interaction with their images in PBC.

The input option LatticeConstant is mandatory and multiplies the lattice vectors by that constant. In this case, we will end up with a cubic simulation box of 15 x 15 x 15 Angstrom.

Additional information: Periodic crystals

Imagine that instead of studying methane, we want to study bulk silicon. Silicon is a periodic face-centered cubic structure that can be defined by the primitive unit cell with a = 5.546790 Å, given by the lattice vectors:

\(a_1 = \frac{a}{2}(\hat{y} + \hat{z})\), \(a_2 = \frac{a}{2}(\hat{x} + \hat{z})\), and \(a_3 = \frac{a}{2}(\hat{x} + \hat{y})\).

The unit cell contains two atoms positioned at \((-0.125, -0.125, -0.125)\) and \((0.125, 0.125, 0.125)\) in fractional coordinates.

Answer: Can you write the structural parameters to define silicon bulk?

The descriptors that define the structure are: NumberOfSpecies, NumberOfAtoms, %block ChemicalSpeciesLabel, LatticeConstant, %block LatticeVectors, AtomicCoordinatesFormat and %block AtomicCoordinatesAndAtomicSpecies

Thus, the input will have an structure like the following:

SystemLabel       Si
NumberOfSpecies   1
NumberOfAtoms     2
%block ChemicalSpeciesLabel
  1  14  Si
%endblock ChemicalSpeciesLabel

LatticeConstant         5.546790 Ang
%block LatticeVectors
  0.00  0.50  0.50
  0.50  0.00  0.50
  0.50  0.50  0.00
%endblock LatticeVectors

AtomicCoordinatesFormat     Fractional
%block AtomicCoordinatesAndAtomicSpecies
-0.125  -0.125  -0.125     1    28.086
  0.125   0.125   0.125     1    28.086
%endblock AtomicCoordinatesAndAtomicSpecies

In the rest of the file you will find options regarding the quality of the calculation and other performance related variables:

33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
# Basis set definition
PAO.EnergyShift 250 meV
PAO.SplitNorm   0.15
PAO.BasisSize   SZ

#Real space grid
MeshCutoff 125.0 Ry

# Convergence of SCF
MaxSCFIterations 50
DM.MixingWeight 0.4
DM.NumberPulay  2

# Type of solution
SolutionMethod diagon

The specifics of these will be explored in other tutorials; but to provide a brief overview:

  • The first three variables are a rough definition the quality of the basis set. The number of orbitals per atom is defined by the parameter PAO.BasisSize, and we have set it to select a minimal basis (SZ) for a quick, cheap calculation disregarding quality.

  • The parameter MeshCutoff controls the fineness of the real-space grid used to compute the integrals for the matrix elements of the Hamiltonian. A higher value corresponds to a finer grid.

  • The next three options control the performance of the self-consistent field (SCF) cycle: MaxSCFIterations, DM.MixingWeight, DM.NumberPulay.

  • The kind of solver used is set by SolutionMethod.

While the parameters specifying the system are mandatory, all other parameters have some default values and, in principle, it is not necessary to explicitly include them in the input file. However, it is important to note that the default values do not always guarantee a fast or a good quality calculation.

basic Running SIESTA

To run SIESTA via command line, you just need to do

$ siesta < ch4.fdf > ch4.out

or simply

$ siesta ch4.fdf > ch4.out

You will see that a lot of output files are generated in the simulation directory. We will begin by having a look at the main output file, ch4.out (or any name that you gave to the standard output).

basic A look at the output

Take a look at ch4.out once the program has finished. The file contains a human-readable account of the doings of SIESTA for this calculation:

  • A header with the version and compilation options, and a copy (in the first mode above) or a mention (in the second mode) of the input file.

    Check it!
    Siesta Version  : 5.0
    Architecture    : ----
    Compiler version: GNU-11.4.0
    Compiler flags  : -fallow-argument-mismatch;-O3 -march=native
    PP flags        : ----
    Libraries       : ----
    Parallelisations: MPI
    
    [...]
    
    ************************** Dump of input data file ****************************
    #General system specifications
    SystemName          CH4 molecule
    SystemLabel         ch4
    NumberOfAtoms       5
    NumberOfSpecies     2
    
    [...]
    
    # Type of solution
    SolutionMethod diagon
    ************************** End of input data file *****************************
    
  • A summary of existing species and pseudopotential information.

    Check it!
    initatom: Reading input for the pseudopotentials and atomic orbitals ----------
    Species number:   1 Atomic number:    6 Label: C
    Species number:   2 Atomic number:    1 Label: H
    
    [...]
    
    Valence configuration for ps generation: (assumed as above)
    For C, standard SIESTA heuristics set lmxkb to 2
    (one more than the basis l, including polarization orbitals).
    Use PS.lmax or PS.KBprojectors blocks to override.
    For H, standard SIESTA heuristics set lmxkb to 1
    (one more than the basis l, including polarization orbitals).
    Use PS.lmax or PS.KBprojectors blocks to override.
    
  • Details on the basis set generation.

    Check it!
    <basis_specs>
    ===============================================================================
    C                    Z=   6    Mass=  12.011        Charge= 0.17977+309
    Lmxo=1 Lmxkb= 2    BasisType=split      Semic=F
    L=0  Nsemic=0  Cnfigmx=2
              i=1  nzeta=1  polorb=0  (2s)
    
    [...]
    
                      izeta = 1
                    lambda =    1.000000
                        rc =    4.810269
                    energy =   -0.447136
                    kinetic =    0.945150
        potential(screened) =   -1.392287
          potential(ionic) =   -1.930221
    atom: Total number of Sankey-type orbitals:  1
    
    atm_pop: Valence configuration (for local Pseudopot. screening):
    1s( 1.00)
    Vna: chval, zval:    1.00000   1.00000
    
    Vna:  Cut-off radius for the neutral-atom potential:   4.810269
    
  • A summary of the values used in the calculation for the most important parameters.

    Check it!
    siesta: ******************** Simulation parameters ****************************
    siesta:
    siesta: The following are some of the parameters of the simulation.
    siesta: A complete list of the parameters used, including default values,
    siesta: can be found in file out.fdf
    siesta:
    redata: Spin configuration                          = none
    redata: Number of spin components                   = 1
    
    [...]
    
    ts: **************************************************************
    ts: Save H and S matrices                           =    F
    ts: Save DM and EDM matrices                        =    F
    ts: Only save the overlap matrix S                  =    F
    ts: **************************************************************
    
  • Information of the geometry and matrix sparsity for a given geometry.

    Check it!
    outcell: Unit cell vectors (Ang):
          15.000000    0.000000    0.000000
            0.000000   15.000000    0.000000
            0.000000    0.000000   15.000000
    
    outcell: Cell vector modules (Ang)   :   15.000000   15.000000   15.000000
    outcell: Cell angles (23,13,12) (deg):     90.0000     90.0000     90.0000
    outcell: Cell volume (Ang**3)        :   3375.0000
    <dSpData1D:S at geom step 0
      <sparsity:sparsity for geom step 0
        nrows_g=8 nrows=8 sparsity=1.0000 nnzs=64, refcount: 7>
      <dData1D:(new from dSpData1D) n=64, refcount: 1>
    refcount: 1>
    new_DM -- step:     1
    
    [...]
    
    InitMesh: MESH = 108 x 108 x 108 = 1259712
    InitMesh: Mesh cutoff (required, used) =   125.000   143.274 Ry
    New grid distribution [1]: sub = 2
    
    stepf: Fermi-Dirac step function
    
  • Followed by a full breakdown of the initial, non-converged energy.

    Check it!
    siesta: Program's energy decomposition (eV):
    siesta: Ebs     =       -86.615317
    siesta: Eions   =       380.812454
    [...]
    siesta: Etot    =      -214.688379
    siesta: FreeEng =      -214.688379
    
  • A log of the self-consistent-field (SCF) cycle.

    Check it!
          iscf     Eharris(eV)        E_KS(eV)     FreeEng(eV)     dDmax    Ef(eV) dHmax(eV)
      scf:    1     -224.477055     -214.688379     -214.688379  1.130398 -6.803327  1.039357
    timer: Routine,Calls,Time,% = IterSCF        1       0.314  28.15
      scf:    2     -214.708050     -214.704182     -214.704182  0.022332 -6.481242  0.243330
      scf:    3     -214.704223     -214.704283     -214.704283  0.002871 -6.378875  0.148294
      scf:    4     -214.704298     -214.704314     -214.704314  0.001596 -6.231017  0.052566
      scf:    5     -214.704324     -214.704321     -214.704321  0.000517 -6.269207  0.001236
      scf:    6     -214.704321     -214.704321     -214.704321  0.000031 -6.268437  0.000383
    
    SCF Convergence by DM+H criterion
    max |DM_out - DM_in|         :     0.0000307918
    max |H_out - H_in|      (eV) :     0.0003831179
    SCF cycle converged after 6 iterations
    
  • A full breakdown of the total energy decomposition and forces, and stress.

    Check it!
    siesta: Program's energy decomposition (eV):
    siesta: Ebs     =       -88.492716
    siesta: Eions   =       380.812454
    [...]
    siesta:         Total =    -214.704321
    siesta:         Fermi =      -6.268437
    
    [...]
    siesta: Atomic forces (eV/Ang):
    siesta:      1    0.140447    0.140447   -1.039696
    siesta:      2   -2.496202    0.517116    0.806177
    siesta:      3    0.517116   -2.496202    0.806177
    siesta:      4    0.363575    0.363575   -1.130990
    siesta:      5    1.475788    1.475788    0.558249
    siesta: ----------------------------------------
    siesta:    Tot    0.000724    0.000724   -0.000084
    
    siesta: Stress tensor (static) (eV/Ang**3):
    siesta:     0.001324   -0.000018   -0.000133
    siesta:    -0.000018    0.001324   -0.000133
    siesta:    -0.000133   -0.000133    0.000671
    
  • The last part of the output file includes information on the cell pressure and the total electric dipole of the system.

    Check it!
    siesta: Cell volume =       3375.000000 Ang**3
    
    siesta: Pressure (static):
    siesta:                Solid            Molecule  Units
    siesta:          -0.00001205          0.00000000  Ry/Bohr**3
    siesta:          -0.00110663          0.00000026  eV/Ang**3
    siesta:          -1.77300959          0.00042191  kBar
    (Free)E+ p_basis*V_orbitals  =        -214.210260
    (Free)Eharris+ p_basis*V_orbitals  =        -214.210260
    
    siesta: Electric dipole (a.u.)  =   -0.011736   -0.011736    0.009580
    siesta: Electric dipole (Debye) =   -0.029830   -0.029830    0.024350
    
  • And to finish, literature regarding the simulation done.

    Check it!
    cite: Please see "ch4.bib" for an exhaustive BiBTeX file.
    cite: Please clearly indicate Siesta version in published work: 5.1-MaX
    cite: This calculation has made use of the following articles
    cite: which are encouraged to be cited in a published work.
        Primary SIESTA paper
          DOI: www.doi.org/10.1088/0953-8984/14/11/302
    

As you can see, SIESTA generates a ton of output files besides the main log/out file. Each of these might interest you depending on what you intend to do with your simulation, but some of the most commonly relevant files are:

  • *.DM files (ch4.DM) contain a restart of the Density Matrix. Useful to restart already converged or close-to-convergence calculations without doing so from scratch.

  • *.XV files (ch4.XV) contain a restart of the current geometry and cell size. This is particularly useful to restart or continue molecular dynamics runs.

  • *.FA files (ch4.FA) contain total forces over all atoms. Each line corresponds to the cartesian force (x, y and z) over each atom (in eV/Ang).

  • *.EIG files (ch4.EIG) contain the eigenvalues of the Hamiltonian for all k-points involved. k-points are explored in more detail in Sampling of the BZ with k-points.

  • *.times and TIMES files (ch4.times and TIMES) contain timing information, which is invaluable when optimizing parameters for fast performance.

Now that you had a brief overview of how running SIESTA works, let’s play a bit with your CH4 inputs. See what happens if…:

  • You change the fineness of the real-space grid. Try an unreasonably low value for the the parameter MeshCutoff (around 10 Ry) and check the resulting total energy and forces (you can also find the forces in the file ch4.FA). Then experiment with a much higher one (300 Ry). More on this topic will be covered in this tutorial.

    Answer: Change of MeshCutoff.

    Everytime that you run SIESTA in a folder, all the output will be overwritten, thus to be able to compare the results of the diferent runs, a specific directory should be created for each of them.

    The information or the total energy can be extracted in a more automatic way doing a grep of Total = of the standard output file and to extract the total force look for the line that contains siesta:    Tot; for example if you have structured the calculation in different folders:

    $ grep -i  "siesta:    Tot" */output.dat
    
    10ry/output.dat:siesta:    Tot    1.798364    1.798364   -3.227789
    300ry/output.dat:siesta:    Tot   -0.000058   -0.000058   -0.000291
    
    $ grep -i  "siesta:    Total =" */output.dat
    
    10ry/output.dat:siesta:         Total =    -230.840670
    300ry/output.dat:siesta:         Total =    -222.538059
    
  • You can play with the size of the lattice parameters to go from ‘interacting’ molecules to effectively isolated ones. Look at the variation in the total energy as a function of the cell size, to see how the interaction between molecules decreases with increasing distance between images. For this non-polar molecule, the interaction should be very small.

    Answer: Change of lattice parameter.

    Everytime that you run SIESTA in a folder, all the output will be overwritten, thus to be able to compare the results of the diferent runs, a specific directory should be created for each of them.

    The information or the total energy can be extracted in a more automatic way doing a grep of Total = of the standard output file and to extract the total force look for the line that contains siesta:    Tot. For example, if you have structured the calculation in different folders:

    $ grep -i  "Total =" */output.dat
    
    15AA/output.dat:siesta:         Total =    -222.538221
    35AA/output.dat:siesta:         Total =    -222.538164
    7_5AA/output.dat:siesta:         Total =    -222.538303
    

basic Simple basis-sets options

Now let’s change some options related to basis-sets.

Are you under a school environment?

Enter directory 02-CH4-Basis

You can find the input files here:

A peek at the fdf file
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
#General system specifications
SystemName          CH4 molecule
SystemLabel         ch4
NumberOfAtoms       5
NumberOfSpecies     2

# Species index, atomic number, species label
%block ChemicalSpeciesLabel
 1  6 C
 2  1 H
%endblock ChemicalSpeciesLabel

#Atomic coordinates
AtomicCoordinatesFormat  Ang
%block AtomicCoordinatesAndAtomicSpecies
  0.000      0.000      0.000   1
  1.219     -0.284     -0.377   2
 -0.284      1.219     -0.377   2
 -0.140     -0.140      1.219   2
 -0.833     -0.833     -0.503   2
%endblock AtomicCoordinatesAndAtomicSpecies

#Unit cell for the calculation
LatticeConstant 15 Ang
%block LatticeVectors
  1.000 0.000 0.000
  0.000 1.000 0.000
  0.000 0.000 1.000
%endblock LatticeVectors

#Real space grid
MeshCutoff 125.0 Ry

# Convergence of SCF
DM.MixingWeight 0.4
DM.NumberPulay  2

# Variables for Geometry optimization
# MD.TypeOfRun   CG
# MD.NumCGsteps  50

# Basis set definition
PAO.BasisSize   SZ
PAO.EnergyShift 250 meV

Change basis-set size or cardinality of the basis. Set PAO.BasisSize to any of: DZ, SZP, DZP, TZP. See what happens with the total energy.

Answer: Change of basis-set size.

In order to modify the bases-set size you should edit the line with the specific descriptor in the fdf file.

33
34
35
36
# Basis set definition
PAO.EnergyShift 250 meV
PAO.SplitNorm   0.15
PAO.BasisSize   SZ

SZ should be replaced by DZ, SZP or DZP.

Everytime that you run SIESTA in a folder, all the output will be overwritten; thus, to be able to compare the results of the diferent runs, a specific directory should be created for each of them.

The information of the total energy can be obtained by manually checking the standard output (as suggested above) or in a more automatic way doing a grep of Total = of the standard output file. For example, if you have structured the calculation in different folders:

$ grep -i "Total =" */output.dat
ene_DZ/output.dat:siesta:         Total =    -223.913569
ene_DZP/output.dat:siesta:         Total =    -224.380785
ene_SZ/output.dat:siesta:         Total =    -222.538221
ene_SZP/output.dat:siesta:         Total =    -222.825577

Let’s look at the effect of changing the basis set in some parameters with physical meaning, like the bond length. In this tutorial you will be doing a geometry optimization for CH4.

The input file ch4.fdf is already set up to optimize the geometry of a single methane molecule, but until now it was comemented. Let’s uncomment the corresponding lines by removing # (line 39 and 40).

38
39
40
# Variables for Geometry optimization
# MD.TypeOfRun   CG
# MD.NumCGsteps  50

At the bottom of the file, you will find two options used to define the basis set:

PAO.BasisSize   SZ
PAO.EnergyShift 250 meV

The idea of this exercise is to see what happens with the total energy and the bond lengths of the system when modifying these options. The bond lengths can be found in the ch4.BONDS_FINAL file.

The SystemLabel.BONDS_FINAL contains the information regarding the atomic position of each atom of the unit cell and their distance with respect to each of the other ones.

  • See what happens when you increase the size of the basis set to SZP and DZP.

    Answer: Effect on the bond’s length.

    The bond length will be found in the ch4.BONDS_FINAL. As we are studing ch4 we will have 1 C and 4 H, for each of them we will know the distances to their nearest neighbors. Let’s check the output for the SZ case focusing on the carbon atom:

    Neighbors of:     1 C at:  -0.0150 -0.0150 -0.0133
    4 H                     1.1911  Ang. Really at:  -0.1868 -0.1868  2.2244
    3 H                     1.1913  Ang. Really at:  -0.5086  2.0880 -0.6471
    2 H                     1.1913  Ang. Really at:   2.0880 -0.5086 -0.6471
    5 H                     1.1915  Ang. Really at:  -1.4499 -1.4499 -0.9885
    

    The index and position of carbon are given by the first line, then, for each H atom, we have the index, label and distance to the carbon atom. Also, their absolute position is included. Your final results should look similar to this:

    Size     Total energy               C-H2    C-H3   C-H4   C-H5
    --------------------------------------------------------------
    SZ : Total: -222.996782 eV, Bonds: 1.1913/1.1913/1.1911/1.1915
    SZP: Total: -223.522888 eV, Bonds: 1.1601/1.1601/1.1596/1.1591
    DZP: Total: -225.830785 eV, Bonds: 1.1126/1.1126/1.1146/1.1149
    
  • For DZP, try changing the energy shift to 100 meV, 50 meV and 10 meV.

    Answer: Change of the energy shift.
    Shift     Total energy              C-H2    C-H3   C-H4   C-H5
    --------------------------------------------------------------
    100: Total: -225.929648 eV, Bonds: 1.1220/1.1220/1.1222/1.1223
    50 : Total: -225.929092 eV, Bonds: 1.1194/1.1194/1.1196/1.1197
    10 : Total: -225.760365 eV, Bonds: 1.1289/1.1289/1.1292/1.1296
    

Consider that the experimental bond-length for gas methane is 1.087 Å.

  • Which has the biggest effect in the energy, increasing the cardinality/size of the basis set, or changing the energy shift?

    Answer: Effect in the energy.

    From the first experiment, we can see that the cardinality of the basis set induces a change in the range of a few eV, which is much larger than that induced by the energy shift. Be aware that this happens when the basis set is relatively small; for basis sets larger than DZP (such as TZP) you don’t usually get a huge improvement. However, going to SZ to SZP, or from SZP to DZP, the change in flexibility of the basis set is a determining factor in the quality of the results.

  • In which case would you consider the basis set is “good enough”?

    Answer: How do we select the good one?

    We definitely need the DZP basis set to start getting reasonable results. But with regards to the energy shift, we can see that 50 meV is already reasonable enough to get good results. The total energy seems to be already converged to the meV level, and the bond lengths are also reasonable. That said, increasing the energy shift is not detrimental in this case, it will just increase the time of the calculation.

  • Is it ok to compare the results to an experimental bond length? Or would it be better to check against an already converged basis set?

    Answer: Is the comparison with the experimental possible?

    The question is tricky and depends on what you are going after. If you are testing convergence of parameters, then by all means, compare to an already converged case, even if it is with another type of basis set (such as gaussians or plane-waves). That said, if you are already sure that your input parameters are good enough, then comparing to experimental values is a good idea: it will give you a hint on how your level of theory (i.e., DFT with a given functional) is able to model the real world data. In most cases, the best approach is to compare trends between experiments and theory, rather than absolute values.

basic DFT functional

Are you under a school environment?

Enter directory 03-CH4-XC-Functional

You can find the input files here:

A peek at the fdf file
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
#General system specifications
SystemName          CH4 molecule
SystemLabel         ch4
NumberOfAtoms       5
NumberOfSpecies     2

# Species index, atomic number, species label
%block ChemicalSpeciesLabel
 1  6 C
 2  1 H
%endblock ChemicalSpeciesLabel

#Atomic coordinates
AtomicCoordinatesFormat  Ang
%block AtomicCoordinatesAndAtomicSpecies
  0.000      0.000      0.000   1
  1.219     -0.284     -0.377   2
 -0.284      1.219     -0.377   2
 -0.140     -0.140      1.219   2
 -0.833     -0.833     -0.503   2
%endblock AtomicCoordinatesAndAtomicSpecies

#Unit cell for the calculation
LatticeConstant 15 Ang
%block LatticeVectors
  1.000 0.000 0.000
  0.000 1.000 0.000
  0.000 0.000 1.000
%endblock LatticeVectors

# Basis set definition
PAO.EnergyShift 250 meV
PAO.SplitNorm   0.15
PAO.BasisSize   SZ

# Exchange-Correlation functional
XC.Functional LDA   # Try: GGA
XC.Authors    PZ    # Try: PBE, PBEsol

#Real space grid
MeshCutoff 125.0 Ry

# Convergence of SCF
DM.MixingWeight 0.4
DM.NumberPulay  2


Up to now we have been implicitly using the local density approximation (LDA) for our calculations, which is the default option in SIESTA. However, it is also possible to use other functionals, such as those of GGA type. Edit the ch4.fdf file to include this block:

XC.functional GGA
XC.authors    PBE

Notice that both XC.Authors and XC.Functional are needed, and they must be consistent. For XC.Functional GGA, the most used variants are BLYP, PBE and PBEsol. For XC.Functional VDW, frequently used functionals are DRSLL, LMKLL and VV. See the manual for more options on XC functionals.

Run SIESTA and look for possible lines with WARNING or ERROR in them.

Answer: Change of the functional.

Even though the program ran successfully, you will find the following warning in the output:

xc_check: WARNING: Pseudopotential generated with LDA PW92 functional

You are using a GGA functional with a pseudopotential generated using LDA, as this is not consistent!. Fortunately, we have produced also the pseudopotentials using GGA for you. They are in the files C.gga.psml and H.gga.psml.

Modify the input file again to use these files by simply changing the species strings:

%block ChemicalSpeciesLabel
1  6 C.gga   # Species index, atomic number, species label
2  1 H.gga   # Species index, atomic number, species label
%endblock ChemicalSpeciesLabel

Remember, the species name can be anything you want, but it must be consistent with the name of the pseudopotential files. Run the program again and check whether the warning disappears from the output.

It is worth noting that, as with pseudopotentials, DFT functionals are not easy to compare between each other systematically. Again, it is worth expending some time searching the literature for the type of simulations you are trying to run.

Additional information: Visualizing the structure

To visualize the structure included in the fdf file siesta provides a utility that allows you to convert it to stardand crystal formats, such us xsf or POSCAR, which then can be open with most of the graphical tools.

  • xv2xsf [it is available with the usual SIESTA compilation]. Converts the XV output, where atomic position and lattice vectors are saved, to an xsf file.

$ xv2xsf

It will ask which is the SystemLabel used, which, for this initial tutorial, it was ch4.

Specify  SystemLabel (or 'siesta' if none): ch4
Found and opened: ch4.XV

Then, a ch4.XSF is genereated, and it can be visualize, for example using ovito.

../../../_images/ch4.png
  • sgeom [it is an exerternal tool part to the sisl python library]. Converts either fdf file or the XV to POSCAR file, additionally allows you to do some simple modifications to the structure.

$ sgeom ch4.fdf POSCAR

or

$ sgeom ch4.fdf ch4.xsf