Note

Before doing this tutorial, we encourage you to do and review the more general tutorial on basis sets.

# Basis set optimization¶

Using the default basis sets generated by SIESTA might be enough for some applications, but doing some degree of manual optimization of the basis sets may help to achieve better results with similar computational costs. This is especially advisable when dealing with extremely large systems, where going for high basis set cardinality (triple-zeta, quadruple-zeta) is not really an option.

Note

This tutorial shows just one possible method to optimize your basis sets. Different people tend to use different techniques, there is no “one true way”. This method does, however, work reliably across different kinds of systems.

## General Concepts¶

When optimizing basis sets, the concept of transferability is one of the most important ideas to keep in mind. You might get the ultimate basis set that gives the perfect results for a given set of atoms in a given set of coordinates; however, if even a slight re-arrangement of those atoms produces terrible results, that means your basis set is not transferable. Ideally, one should aim for highly transferable basis sets that also produce reasonable results.

This ties in with a key aspect of basis set optimization: defining your optimization function, i.e., the magnitude that you will be using to monitor how well your basis is performing. One good magnitude to look at is the total energy of a given system, but even better is to use the concept of basis enthalpy (see below).

While it might be tempting to optimize basis sets using bond lengths, interaction energies, or even band structures, these typically result in poorly transferable basis sets. The use of these magnitudes as optimization functions is not encouraged; however, these can be used after the optimization is done in order to benchmark the behaviour of the newly optimized basis set.

Having defined the optimization function, one needs to define which are the optimization variables, i.e. the things that we will modify manually to improve our basis sets. While SIESTA has a ton of possibilities to modify your basis set, the most important targets for optimization are the orbital radii: the cut-off radii for first-zeta orbitals, and the matching radii for multiple-zeta. Polarization orbitals can also be optimized separately, but the way these are generated in SIESTA is generally good and they usually do not require further optimization.

Note

To summarize:
• We aim for good-quality, highly-transferable basis sets.

• Typically we will be modifying the cut-off radii of first-zeta and the matching radii of multiple-zeta orbitals.

• We will take the basis enthalpy (or the total energy) of a system as our main way of optimizing the basis sets.

## Basis Enthalpy¶

The basis enthalpy is a fictitious magnitude that adds an extra term to the energy, which depends on the volume of the orbitals; essentially, this prevents the basis orbitals from getting unnecessarily large. This extra term is essentially P*V, where V is the orbital volume and P is a fictitious “pressure”. This pressure is an input we will use in SIESTA.

For bulk materials such as metals (Cu, Fe) or semiconductors (Si, Ge), the default value of 0.2 GPa for the basis pressure produces reasonable results. However, for small atoms in isolated molecules (C, O, H, etc.) this pressure results in extremely short cut-off radii. One typically reduces the basis pressure to a tenth of the default value, i.e. 0.02 GPa.

How do you know whether your orbitals are large enough? A good rule of thumb is to consider that orbitals with a cut-off radius of less than 5.0 Bohr are very small, while a cut-off radius larger than 10.0 Bohr is unnecessarily large. Of course, there are a few exceptions: 1s and 2s orbitals of the first two rows of the periodic table (specially H) can be slightly smaller than 5.0 Bohr, but going below 4.0 Bohr should be avoided. In the oposite case, when dealing with surfaces or 2D materials one might need larger, diffuse orbitals in order to better describe the area around the exposed atoms: these diffuse orbitals might easily reach 10 to 12 Bohr.

## The Water molecule¶

Molecular dynamics of water (or aqueous systems in general) is still one of the most common types of simulations. In this tutorial, we will optimize the basis set for a single water molecule, and we will use the energy of a water dimer as a way to benchmark the quality of our basis functions.

Take a look at the wat.fdf file, in particular, the following lines:

PAO.SoftDefault     T
PAO.OldStylePolOrbs F
PAO.EnergyShift     0.001 Ry


The PAO.SoftDefault flag enables the usage of soft-confinement potentials (ensuring the derivatives of the radial functions become zero at the orbital cut-off radii). PAO.OldStylePolOrbs being set to False indicates that we will be using the newest way to generate polarization orbitals. You can try turning on and off these options to see how it impacts the radial shape of your basis functions.

The value for PAO.EnergyShift was chosen as explained in the previous tutorial. Since we will be optimizing the orbital radii, the value of the energy shift will become irrelevant once we are done; however, choosing an appropriate value for it provides a good starting point for the optimization.

Further below, you will find the full basis especifications under the PAO.Basis block. For example

%block PAO.Basis
H 1
n=1 0 2 P 1     # n, l (=0, s-orbital), number of zetas, include 1 set of polarization orbitals
0.0 0.0     # rcut for first zeta, rmatch for second zeta
O 2               # Number of different l
n=2 0 2         # n, l (=0, s-orbital), number of zetas
0.0 0.0     # rcut for first zeta, rmatch for second zeta
n=2 1 2 P 1     # n, l (=1, p-orbital), number of zetas, include 1 set of polarization orbitals
0.0 0.0     # rcut for first zeta, rmatch for second zeta
%endblock PAO.Basis


In this example, the input is the same as specifying the flag PAO.BasisSize DZP, since we have a double-zeta basis set for all valence orbitals, and we are including a single polarization orbital set for each atom.

Notice that the values for rcut and rmatch are all set to 0.0; this tells SIESTA that it should automatically find appropriate values for each of those radii using the energy shift. By the end of this tutorial, we will have set each of these manually.

Since we are using small atoms, the basis pressure will be set to 0.02 GPa, as you can see in the following line:

BasisPressure  0.02 GPa


It is also important to note that we have set up a relatively big simulation box. Since water is an isolated molecule and not a bulk metal, we want to avoid orbital overlaps between one molecule and its periodic images

LatticeConstant 15.0 Ang
%block LatticeVectors
1.0 0.0 0.0
0.0 1.0 0.0
0.0 0.0 1.0
%endblock LatticeVectors


## Optimizing the first-zeta cut-off radii¶

Note

Go to the subdirectory 1-BasisOptimization.

We will first start by optimizing the cut-off radii for the first zeta orbitals. This means we will be changing the first 0.0 we find for each of the orbitals:

%block PAO.Basis
H 1
n=1 0 2 P 1     # n, l (=0, s-orbital), number of zetas, include 1 set of polarization orbitals\\
*0.0* 0.0     # rcut for first zeta, rmatch for second zeta
O 2               # Number of different l
n=2 0 2         # n, l (=0, s-orbital), number of zetas
*0.0* 0.0     # rcut for first zeta, rmatch for second zeta
n=2 1 2 P 1     # n, l (=1, p-orbital), number of zetas, include 1 set of polarization orbitals
*0.0* 0.0     # rcut for first zeta, rmatch for second zeta
%endblock PAO.Basis


Modify the value for rcut in the 1s orbital of Hydrogen, going from 4.0 Bohr to 6.0 Bohr by 0.5 Bohr increases. Take note of the basis enthalpy of the system for each value of rcut, printed in the first line of the BASIS_ENTHALPY file.

Now we choose the cut-off radii for which the basis enthaly is minimized. Assuming the value is 4.5 Bohr, your basis block will look somewhat similar to this:

%block PAO.Basis
H 1
n=1 0 2 P 1     # n, l (=0, s-orbital), number of zetas, include 1 set of polarization orbitals
*4.5* 0.0   # rcut for first zeta, rmatch for second zeta
O 2               # Number of different l
n=2 0 2         # n, l (=0, s-orbital), number of zetas
0.0 0.0     # rcut for first zeta, rmatch for second zeta
n=2 1 2 P 1     # n, l (=1, p-orbital), number of zetas, include 1 set of polarization orbitals
0.0 0.0     # rcut for first zeta, rmatch for second zeta
%endblock PAO.Basis


After choosing a value for the Hydrogen 1s orbital, do the same procedure for the 2s orbital of Oxygen, and then for the 2p orbital. Explore values from 4.5 Bohr to 7.5 Bohr for both orbitals.

Do the values for the optimized cut-off radii coincide with what you would expect from chemical intuition? Which is the largest orbital and which one is the shortest?

For the purposes of this tutorial, we will stop here the optimization of rcut. In a more general case, it would be better to re-optimize all of the orbitals iteratively to guarantee that the rcut values are truly converged.

## Optimizing the second-zeta matching radii¶

We will now expand the previous block to the one we had at the beginning for the DZP basis set, but instead of zeroes in the rcut, we will place our optimized values for the single zeta basis (i.e. replace RCUT1, RCUT2 and RCUT3 for your own optimized values):

%block PAO.Basis
O 2                  # Number of different l
n=2 0 2            # n, l (=0, s-orbital), number of zetas
*RCUT1 0.0*    # rcut for first zeta, rmatch for second zeta
n=2 1 2 P 1        # n, l (=1, p-orbital), number of zetas, include 1 set of polarization orbitals
*RCUT2 0.0*    # rcut for first zeta, rmatch for second zeta
H 1
n=1 0 2 P 1        # n, l (=0, s-orbital), number of zetas, include 1 set of polarization orbitals
*RCUT3 0.0*    # rcut for first zeta, rmatch for second zeta
%endblock PAO.Basis


We will do the same as before, but now changing the value for the matching radii (which are still 0.0 in the block).

Using a similar procedure as before, find the value of rmatch for the 1s orbital of Hydrogen that minimizes the basis enthalpy, evaluating it from 1.5 Bohr to 3.0 Bohr. Do the same for Oxygen 2s and then 2p, changing the values from 2.5 Bohr to 4.0 Bohr for each of them.

## Calculating the binding energy of a water dimer¶

Note

Go to the subdirectory 2-WaterDimer.

Now is the time to put our optimized basis set to test. We will use as an example the binding energy of a water dimer, i.e.:

E~binding~ = E~dimer~ - 2*E~monomer~


Copy your optimized basis block into both *monomer.fdf* and *dimer.fdf*, and run a single point calculation for both systems. Keep the total energies of both systems. Now do the same but instead of the optimized basis set, use the default DZP basis set with different values for the energy shift (for example, 0.1 Ry and 0.0001 Ry).

How well does our newly optimized basis set perform? Look at both the total energies for each system, and the total binding energy. Take into account that the binding energy for the water dimer using planewaves is 228.6 meV ([Corsetti et al.: “Optimal finite-range atomic basis sets for liquid water and ice”, J. Phys. Condens. Matter 2013](https://arxiv.org/abs/1307.3187)).

## The Basis Set Superposition Error¶

Note

Go to the subdirectory 3-WaterDimerBSSE.

You might have noticed that when choosing the default setting for SIESTA, the binding energy obtained before can be twice as large as the planewaves result. Even for our optimized basis set, we might be up to 50 meV away from it. Is our basis set that bad?

In truth, we are missing one part of the puzzle that is key for atomic-centered basis in general: the Basis Set Superposition Error or BSSE. All kinds of DFT codes are subject to this error unless they are using planewaves. The general idea is that, given an atom A, the basis functions coming from neighbouring atoms B also help improve the description of the electronic density around A. If we remove those neighbours, we effectively end up with a poorer description.

It just so happens that in our case, the dimer, each of the water molecules is slightly better described if we take in to account the basis functions of the other water molecule. Thus, it is as if our monomer had a poorer basis set than the dimer.

Luckily enough, there are a few ways so solve this issue, the most common one involving ghost atoms. As mentioned in the the previous tutorial, ghost atoms are just points in space in which we add basis functions that would belong to a given atom if it were there. In our case, we will turn each of the water molecules in a ghost ending up with the following systems:

Water~1~ + (Ghost)Water~2~
(Ghost)Water~1~ + Water~2~


In terms of “true” atoms, each of those systems is a water monomer, but each have the additional basis functions that would correspond to the other water molecule if it were there. Thus, both systems have the same coordinates as the water dimer. Now, our binding energy becomes:

E~binding~ = E~dimer~ - E~ghost1~ - E~ghost2~


Copy your optimized basis block into both *ghost1.fdf* and *ghost2.fdf*, and run a single point calculation for both systems. Keep the total energies of both systems and recalculate the binding energy. Do the same for the other cases you might have tried in the previous step.

Are our binding energies now closer to the planewaves result? How much of our original binding energy can be attributed to the BSSE? What are the differences between our optimized basis set and the default ones?

## Optional: Optimizing the polarization orbitals¶

Note

Go to the subdirectory 4-PolarizationOrbitals.

One last step in the orbital optimization could be to optimize the polartization orbitals. For that purpose, we will expand the basis block to add one more orbital set per atom:

%block PAO.Basis
H *2*
n=1 0 2              # n, l (=0, s-orbital), number of zetas
*RCUT1 RMATCH1*  # rcut for first zeta, rmatch for second zeta
n=2 1 1 *Q 0.0*      # n, l (=1, p-orbital), number of zetas, charge value for confinement
*RCUT1*          # rcut for first zeta, rmatch for second zeta
O *3*
n=2 0 2              # n, l (=0, s-orbital), number of zetas
*RCUT2 RMATCH2*  # rcut for first zeta, rmatch for second zeta
n=2 1 2              # n, l (=1, p-orbital), number of zetas
*RCUT3 RMATCH3*  # rcut for first zeta, rmatch for second zeta
n=3 2 1 *Q 0.0*      # n, l (=2, d-orbital), number of zetas, charge value for confinement
*RCUT3*          # rcut for first zeta
%endblock PAO.Basis


Note that, in both cases, the polarization orbital belongs to the following principal number (n=3 for Oxygen, n=2 for Hydrogen). In addition, all polarization orbitals have the same cut-off radius as the orbital they are polarizing.

One of the recommended methods to generate appropriate polarization orbitals is the charge confinement method (more details in the manual), but suffice to say that this method relies on seeing how the atomic wavefunctions change in the presence of a charge confinement potential, $${ V(r;Q) = Q / r^{2}}$$. In order to optimize our polarization orbitals, we just need to change the value of the charge Q.

Modify the charge used in the Hydrogen 2p polarization orbital (the number following Q) from 3.5 to 6.5, and choose a value that minimizes the basis enthalpy. Do the same for the Oxygen 3d polarization orbital, modifying the charge from 5.0 to 8.0.

Repeat the calculations for the water dimer using your newly optimized basis set. How do results change?

Note

In addition to the charge value, one can also change a decay parameter $${\lambda}$$, changing the charge confinement potential to $${ V(r;Q, \lambda) = Q * e^{-\lambda r} / r^{2}}$$. This is simply added as an additional number after Q. For example, n=2 1 1 Q 3.0 0.4 has a value of 0.4 for $${\lambda}$$ and 3.0 for the charge.

The decay parameter is not independent from the value of the charge, therefore both values need to be optimized at the same time (or iteratively) for a given orbital. Refer to the manual for more information on the charge confinement potential.