Chain scan¤

Example script: examples/run_chain_scan.py | Code documentation

Description¤

Tool for running matrix scans of multiple parameters: for each value of a given parameter, all values of the other parameters are tested.

Parameter 1 \ Parameter 2	Value 1	Value 2
Value A	Scan 1: Value 1 and Value A	Scan 2: Value 2 and Value A
Value B	Scan 3: Value 1 and Value B	Scan 4: Value 2 and Value B

Multiple ASTRA simulations are run at the same time, depending on the settings. Nevertheless, the number of simulations to be performed may become very large when scanning several parameters.
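
The growth in the number of simulations can be estimated before starting a scan. The following is a minimal sketch in plain Python that does not use the astra package; the parameter names and values are placeholders taken from the table above, and the number of rounds is only a lower bound that assumes at most n_parallel simulations run at once.

```python
import itertools
import math

# Placeholder values only; any parameter names and ranges can be used.
scan_values = {
    "Parameter 1": ["Value 1", "Value 2"],
    "Parameter 2": ["Value A", "Value B"],
}

# Full matrix scan: every value of one parameter is combined with
# every value of all the other parameters (Cartesian product).
combinations = list(itertools.product(*scan_values.values()))

n_simulations = len(combinations)
n_parallel = 6  # same meaning as in the example script
n_rounds = math.ceil(n_simulations / n_parallel)

for combo in combinations:
    print(" and ".join(combo))
print(f"{n_simulations} simulations, at least {n_rounds} round(s)")
```

With three parameters of 10 values each, the same calculation already gives 1000 simulations.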

Tip

To run a simple scan of one parameter in a given range, see parallel scan instead.

Example script¤

examples/run_chain_scan.py

"""Chain (matrix) scan execution scripts. 

Allows defining several parameters for which a full matrix scan will be performed: there will be one simulation for each combination of values of all parameters -> large number of simulations.
"""

import sys
sys.path.append("..")

from astra import chain_scan

# no changes above
#####################################################################################################
# change the values of the parameters

# directory where the scan should be performed, must be full path
working_dir = r""
# maximum number of parallel processes to be run
n_parallel = 6

#####################################################################################################
# no changes here

def main():
    chain = chain_scan.ChainScan(working_dir)

    #####################################################################################################
    # make changes here to match what you want to scan

    chain.add_layer("sig_clock", start=0.001, end=0.02, n_steps=10)
    chain.add_scanning_parameter("MaxB(1)", start=0.22, end=0.27, step=0.005)

    #####################################################################################################
    # no changes below

    chain.run()

if __name__ == '__main__':
    main()

For the script to work, the working directory (working_dir) passed to the ChainScan constructor must be prepared as described in the working directory setup section. The script itself is explained in detail later on this page.

With the Python environment set up, the script can be run with:

python run_chain_scan.py

Setup¤

Working directory setup¤

The working directory must contain a folder called BASE/, which serves as the base folder for running the scans. It must contain the files listed in the table below. It is recommended not to keep any other files or folders in the working directory, except for plain-text (.txt) files containing comments, so that they do not interfere with the files created by the simulation.

Example directory setup

working_directory/ # (1)
    BASE/ # (2)
        ...
    ... # (3)

1. Working directory that is used in the initialization process.
2. BASE directory containing the initial settings files as well as the ASTRA executables. For the content of this directory, see the table below.
3. It is recommended not to include other files in the working_directory so that they do not interfere with the output files. It should be safe, though, to include .txt files with additional information (scan settings etc.).

File	Comment
`Astra.exe`	Astra executable file.
`generator.exe`	Astra initial particle distribution generator.
`.dat` files	Files defining the structures used in the simulations.
Generator file	File containing the definition of the initial distribution, which will be generated by `generator.exe`. By default, the Python code assumes this file is named `generator.in`, but this can be changed in the object setup.
Simulation definition file	File containing the simulation setup definition. By default, the Python code assumes this file is named `photo_track.in`, but this can be changed in the object setup.
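
Before starting a long scan, it can be useful to check that BASE/ is complete. The sketch below is not part of the astra package: missing_base_files is a hypothetical helper, the default file names are the ones listed in the table, and the requirement of at least one .dat file is an assumption of this sketch.

```python
import pathlib


def missing_base_files(working_dir, generator_file="generator.in",
                       simulation_file="photo_track.in"):
    """Return a list of items missing from <working_dir>/BASE (empty if none)."""
    base = pathlib.Path(working_dir) / "BASE"
    if not base.is_dir():
        return ["BASE/ directory"]
    required = ["Astra.exe", "generator.exe", generator_file, simulation_file]
    missing = [name for name in required if not (base / name).is_file()]
    # Assumption of this sketch: at least one .dat structure file is expected.
    if not any(base.glob("*.dat")):
        missing.append("*.dat files")
    return missing
```

Calling missing_base_files(working_dir) before constructing the ChainScan object and stopping when the returned list is not empty avoids launching simulations that cannot run.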

Run script setup¤

Important

Please note that the run script does not have to be in the working directory. It is recommended to keep the script and working directory separated and have the run script in a subdirectory of the main code directory (for example create a scripts/ directory inside the main code directory) to avoid problems when importing the code. The working directory will be passed as an argument, as explained later in this section.

The chain_scan module (astra/chain_scan.py) must be imported from the astra subpackage of the main package:

from astra import chain_scan

If this fails, the main package directory must be added to the Python path explicitly so that Python can find it. This can be done with the sys module, and the full import is then:

import sys
sys.path.append("..") #(1)

from astra import chain_scan

1. Use .. when the script is located in a subdirectory of the main code folder (recommended); otherwise use the full path to the main code directory.
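
Note that a relative entry such as ".." in sys.path is resolved against the current working directory of the Python process, not against the location of the script, so it only works when the script is started from its own folder. A possible alternative, sketched below, derives the path from the script location instead; add_code_root is a hypothetical helper and assumes the script sits exactly one level below the main code directory.

```python
import pathlib
import sys


def add_code_root(script_file):
    """Append the parent of the script's folder to sys.path and return it."""
    code_root = pathlib.Path(script_file).resolve().parent.parent
    if str(code_root) not in sys.path:
        sys.path.append(str(code_root))
    return code_root


# Intended use at the top of a run script (commented out here so that
# the sketch runs without the astra package being present):
# add_code_root(__file__)
# from astra import chain_scan
```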

Optimization object setup¤

The code is object-oriented. Therefore, after importing the corresponding module, a ChainScan object has to be created.

from astra import chain_scan

def main():
    # define the optimizer and the working directory
    chain = chain_scan.ChainScan("<workingDirectory>") #(1)
    ...

if __name__ == "__main__":
    main()

Full path to the working directory. See the full list of possible parameters here.

The full list of parameters that can be set when initializing the object is given in the code documentation. The most important one is the working_dir parameter, which defines where the scan takes place.

Scan setup¤

After the ChainScan object is created, the scan settings must be defined. Two methods are used for this; the difference between them is explained below:

Layer

A layer needs to be created for each parameter to be scanned (except for the last one, see the following point), defining the values to be scanned.

To create a layer, use the ChainScan.add_layer method.
How to define the scanning range

There are several ways to define the range of the parameter to be scanned:
1. Full values list (values)
2. Start and end values (start, end) + step size (step)
3. Start and end values (start, end) + number of steps (n_steps)
This list also corresponds to the order in which the definition of the parameter is checked. Once the parameter is fully defined (one of the options above is satisfied), the remaining arguments may be internally overwritten to match the values calculated from the full definition.
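The order of checks described above can be sketched in plain Python. This is not the ChainScan implementation: resolve_values is a hypothetical helper, and it assumes that the end value is included and that n_steps is the number of values, both of which should be verified against the code documentation.

```python
def resolve_values(values=None, start=None, end=None, step=None, n_steps=None):
    """Return the values to scan, checking the definitions in the documented order."""
    # 1. An explicit list of values takes precedence over everything else.
    if values is not None:
        return list(values)
    if start is None or end is None:
        raise ValueError("Give either values or both start and end.")
    # 2. Start, end and step size.
    if step is not None:
        n_values = int(round((end - start) / step)) + 1  # assumption: end is included
        return [start + i * step for i in range(n_values)]
    # 3. Start, end and number of steps.
    if n_steps is not None:
        if n_steps == 1:
            return [start]
        spacing = (end - start) / (n_steps - 1)  # assumption: n_steps counts the values
        return [start + i * spacing for i in range(n_steps)]
    raise ValueError("Give values, or start and end with step or n_steps.")
```

In this sketch, a call that passes both step and n_steps uses step and ignores n_steps, which mirrors the statement that later options are only consulted when the earlier ones are not satisfied.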
Multiple parameters can be varied within one layer at the same time: use the layer_idx argument of the ChainScan.add_layer method to specify the layer, and call the method once for each parameter assigned to that layer. The number of values to be scanned must be the same for all parameters assigned to one layer.
Example
```
    ...
    chain = chain_scan.ChainScan("<workingDirectory>")

    chain.add_layer("Lx",  start=1.5, end=2.9, n_steps=10, layer_idx = 1)
    chain.add_layer("Q_total",  start=1, end=10, n_steps=10, layer_idx = 1)
    ...
```
When the layer index layer_idx is not specified, a new layer is created for the parameter. The layers are numbered by integers, starting from 0. The number of values to be scanned may differ between layers.

A separate directory with subdirectories corresponding to the layers with higher index will be created for each value of the parameter(s) within this layer.
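
The difference between parameters sharing a layer and parameters in separate layers can be illustrated without the astra package. In the sketch below, the values are made up and layer_points is a hypothetical helper; parameters in the same layer advance together, while different layers are combined with each other.

```python
import itertools

# Made-up layer definitions: parameters in the same dict share one layer.
layers = [
    {"Lx": [1.5, 2.2, 2.9], "Q_total": [1, 5, 10]},  # two parameters, varied together
    {"sig_clock": [0.001, 0.02]},                     # a separate layer
]


def layer_points(layer):
    """Return one dict per step of the layer; all parameters advance together."""
    lengths = {len(values) for values in layer.values()}
    if len(lengths) != 1:
        raise ValueError("All parameters in a layer need the same number of values.")
    names = list(layer)
    return [dict(zip(names, step)) for step in zip(*layer.values())]


# Different layers are combined with each other (Cartesian product).
combinations = [
    {name: value for point in combo for name, value in point.items()}
    for combo in itertools.product(*(layer_points(layer) for layer in layers))
]
print(len(combinations))  # 3 steps in the first layer x 2 in the second = 6
```

Lx = 1.5 is only ever paired with Q_total = 1 here, because both belong to the same layer; placing them in separate layers would instead give 3 x 3 x 2 = 18 combinations.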
Scanning parameter

Important

The scanning parameter must always be defined.

The last layer of the chain scan has to be defined differently, as a scanning parameter. It is recommended to pick the parameter with the largest number of values to be scanned as the scanning parameter.

To define the last scanning parameter, use the ChainScan.add_scanning_parameter method.
How to define the scanning range

There are several ways to define the range of the parameter to be scanned:
1. Full values list (values)
2. Start and end values (start, end) + step size (step)
3. Start and end values (start, end) + number of steps (n_steps)
This list also corresponds to the order in which the definition of the parameter is checked. Once the parameter is fully defined (one of the options above is satisfied), the remaining arguments may be internally overwritten to match the values calculated from the full definition.
For each value of this scanning parameter, a separate ASTRA run will be performed in the directory corresponding to the values of parameters in different layers.
Example
```
    ...
    chain = chain_scan.ChainScan("<workingDirectory>")

    chain.add_layer("Lx",  start=1.5, end=2.9, n_steps=10, layer_idx = 1)
    chain.add_layer("Q_total",  start=1, end=10, n_steps=10, layer_idx = 1)

    chain.add_scanning_parameter("MaxB(1)", start=0.125, end=0.375, step=0.005)
    ...
```
Multiple scanning parameters can be defined, provided that the number of values to be scanned is the same for all of them. To define multiple parameters, call the ChainScan.add_scanning_parameter method once for each parameter.

Output¤

In the background, for each combination of the values of the parameters in different layers, a parallel scan of the parameter(s) defined via the ChainScan.add_scanning_parameter method is performed.

Therefore, the output of the chain scan is divided into multiple directories corresponding to the values of parameters in different layers. The simulations themselves are located in the directory corresponding to the layer with the highest index. The individual ASTRA runs (and their output files) correspond to the values of the scanning parameter defined by the ChainScan.add_scanning_parameter method.

Example output folder structure with N layers

BASE/ # (1)
chain_output_file.xlsx #(2)
Layer1_Value1/
    Layer2_Value1/
        ...
            LayerN_Value1/
                <scanDirectoryOutput> # (3)
            LayerN_Value2/
                <scanDirectoryOutput> 
    Layer2_Value2/
        ...
            LayerN_Value1/
                <scanDirectoryOutput> 
            LayerN_Value2/
                <scanDirectoryOutput> 
    ...
Layer1_Value2/
    Layer2_Value1/
        ...
            LayerN_Value1/
                <scanDirectoryOutput> 
            LayerN_Value2/
                <scanDirectoryOutput> 
    Layer2_Value2/
        ...
            LayerN_Value1/
                <scanDirectoryOutput> 
            LayerN_Value2/
                <scanDirectoryOutput> 
    ...
...

1. BASE/ directory with the initial setup. See working directory setup.
2. The chain output file containing the values of parameters in different layers that have been scanned.
3. The simulations are performed within the LayerN_ValueX/ subdirectories: thus both the simulation output files and the basic scan analysis are located here. ASTRA runs correspond to the values of the parameter(s) defined by the ChainScan.add_scanning_parameter method.
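
The analysis script further down this page builds each directory name from the parameter name, an underscore and the formatted value. Assuming that naming convention, the path of the innermost directory for one combination of layer values could be assembled as sketched below; scan_directory is a hypothetical helper, and the fixed number of decimals is an assumption that may not match the names created by the scan.

```python
import pathlib


def scan_directory(working_dir, layer_values, decimals=3):
    """Build the path of the innermost layer directory for one combination.

    layer_values maps parameter names to values, ordered from the first
    layer to the last one. The "<name>_<value>" naming and the fixed
    number of decimals are assumptions of this sketch.
    """
    # PurePosixPath keeps the printed result identical on every platform;
    # use pathlib.Path when working with real directories.
    path = pathlib.PurePosixPath(working_dir)
    for name, value in layer_values.items():
        path = path / f"{name}_{value:.{decimals}f}"
    return path


print(scan_directory("working_directory", {"Lx": 1.5, "sig_clock": 0.001}))
```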

As the output data are distributed among multiple folders, the following script can be used to load all the results of all the simulations performed within the chain scan.

Example output analysis script

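import pandas as pd
import pathlib

main_dir = pathlib.Path(r"fullPathToWorkingDir") #(1)

# Chain output file with the scanned values of the layer parameters.
# Assumption: the file name follows the folder structure shown above
# and the first column of the file is the index.
df_chainOutput = pd.read_excel(main_dir.joinpath("chain_output_file.xlsx"), index_col=0)

# Number of decimals used in the directory names; it is increased below
# whenever a value needs more digits. Assumption: adjust the starting
# value if the directory names created by the scan use a fixed precision.
rangeLen = 0

df = pd.DataFrame()
for i in range(df_chainOutput.shape[0]):
    # build the path to the innermost layer directory for this row
    output_file = main_dir
    data = {}
    for column in df_chainOutput.columns:
        value = df_chainOutput[column].iloc[i]
        while abs(value - round(value, rangeLen)) > 1e-9:
            rangeLen += 1
        output_file = output_file.joinpath(f"{column}_{value:.{rangeLen}f}")
        data[column] = value

    output_file = output_file.joinpath(r"plots").joinpath(r"MaxB(1)").joinpath(r"data.xlsx") #(2)
    try:
        dff = pd.read_excel(output_file, index_col=0)
        # add the layer parameter values to every run of this scan
        for key in data.keys():
            dff[key] = data[key]
        df = pd.concat([df, dff])
        df = df.reset_index(drop=True)
    except FileNotFoundError:
        print("Output file not found: " + str(output_file))

df["MaxB(1)"] = df["scanning_parameter"] #(3)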

1. Full path to the working directory.
2. Replace MaxB(1) with the scanning parameter actually used, as defined in the ChainScan.add_scanning_parameter method.
3. Replace MaxB(1) with the scanning parameter actually used, as defined in the ChainScan.add_scanning_parameter method.

Warning

To use the script above, replace MaxB(1) in it with the scanning parameter actually used, as defined in the ChainScan.add_scanning_parameter method.

At the end of the script above, the df dataframe contains all the simulation results and the parameters used in the chain scan: each row in it corresponds to a performed simulation.

The df can be directly used for data analysis in Python. Alternatively, the contents of the dataframe can be saved to a .csv to be used in the data analysis software of the user's choice. To do so, append the following line to the output analysis script:

df.to_csv(r"nameOfTheCsvFile.csv")

Description of columns in the df from the presented analysis script

Each simulation is represented by a row in the dataframe. The first set of columns represents the simulation outcome and is described in detail in the table below. Additionally, the values of parameters in the defined layers and the scanning parameter value used for the given simulation are saved in the other columns: each column represents a different parameter.

Column	Description
Index	Index column.
`run`	Run number
`scanning_parameter`	Value of the parameter that was scanned.
`energy`	Bunch energy.
`energy_spread`	Rms energy spread.
`x_emit`	Rms normalized transverse emittance in the x plane.
`y_emit`	Rms normalized transverse emittance in the y plane.
`x_rms`	Rms transverse bunch size in the x plane.
`y_rms`	Rms transverse bunch size in the y plane.
`xBar_rms`	Rms beam divergence in the x plane.
`yBar_rms`	Rms beam divergence in the y plane.
`Active particle ratio`	Ratio of the active particles (those not lost during the simulation). 1 means no particles were lost.
`Beam size (z_rms)`	Rms bunch length.
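
As an illustration of how these columns can be used, the sketch below selects, for each value of a layer parameter, the run with the smallest x emittance among the runs without particle losses. The numbers are made up and stand in for the df produced by the analysis script; sig_clock is used as an example layer parameter.

```python
import pandas as pd

# Made-up numbers standing in for the df built by the analysis script.
df = pd.DataFrame({
    "sig_clock": [0.001, 0.001, 0.02, 0.02],         # layer parameter
    "scanning_parameter": [0.22, 0.27, 0.22, 0.27],  # e.g. MaxB(1)
    "x_emit": [1.4, 0.9, 1.1, 1.6],
    "Active particle ratio": [1.0, 1.0, 0.8, 1.0],
})

# Keep only runs in which no particles were lost.
complete = df[df["Active particle ratio"] == 1.0]

# For each layer value, pick the run with the smallest x emittance.
best = complete.loc[complete.groupby("sig_clock")["x_emit"].idxmin()]
print(best[["sig_clock", "scanning_parameter", "x_emit"]])
```

With more than one layer, a list of all layer column names can be passed to groupby instead of the single column used here.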

To see all the columns in the dataframe, append the following line to the output analysis script:

print(df.columns)

Last update: October 29, 2023
Created: September 14, 2023