Specifications

Detailed workflow

The sequence of events is shown with numbered arrows in the diagram below:

[Figure: detailed workflow]

Architecture of shared folder

The folder that holds the main script or program also contains a file named “server.txt”. This file contains only the network path of the main shared folder that GridCompute uses for all users (the folder that contains “Settings”, “Cases” and “Results”). Ex: \\Server010\\Folder\\GridCompute.

This shared folder has the following architecture:

  • Settings folder (users need at least read access, administrators need read-write access)

    • Software_Per_Machine.csv: This file lists the machine names and, for each application (identified by its unique ID), whether it is installed on that machine. The program looks for a 1 in the matrix to decide that a machine can run an application.

      Ex:

      Machine name,Software 1,Software 2,Software 3
      Machine 1,1,1,1
      Machine 2,0,1,0
      Machine 3,0,0,0
      

      Note

      This file is only used to detect which machines can run process functions. All machines can submit or receive cases.

    • settings.txt: contains the database parameters, one per line, in the form parameter name: value. The following parameters must be defined:

      • mongodb server: address, including the connection port, of the mongo instance that contains the gridcompute database.

        Ex: mongodbserver.com:888 or 10.0.0.1:888 or Machine123:888

      • user group: Login used to connect to the mongo database.

      • password: Password used to connect to the mongo database.

      • instance: Data instance to consider. Example: 0 or debug.

    • Applications folder

      • One folder per application, named with the application's unique ID (ex: Software 1).

        Warning

        The folder name cannot contain a dot.

        It contains:

        • send.py: Defines how to select input files and send calculations.
        • process.py: Runs when analyses are executed, based on the input received, and creates output files.
        • receive.py: Runs when output files are present on the server.

        More details are provided in Application-specific scripts.

  • Cases folder: Contains one folder per user and, inside it, one folder per machine (so that users can easily find their files). Each case is stored as a zip file that holds all the required input files/folders.

  • Results folder: Contains one folder per user and, inside it, one folder per machine. Each result is stored as a zip file that holds all the output files/folders.
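
As an illustration of how the Software_Per_Machine.csv matrix described above can be read, here is a minimal sketch. The function name `applications_for_machine` and the `SAMPLE` constant are hypothetical and not part of GridCompute; the sketch simply assumes that the first column holds the machine name and that a cell equal to 1 marks an application as available.

```python
import csv
import io


def applications_for_machine(csv_text, machine_name):
    """Return the application IDs flagged with 1 for machine_name.

    Hypothetical helper: assumes the layout shown in the example above
    (first column = machine name, other columns = application IDs).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # "Machine name" followed by the application IDs
    app_ids = [name.strip() for name in header[1:]]
    for row in reader:
        if not row:
            continue  # skip empty lines
        if row[0].strip() == machine_name:
            return [app for app, flag in zip(app_ids, row[1:])
                    if flag.strip() == "1"]
    return []  # machine not listed: no application can be processed


SAMPLE = """Machine name,Software 1,Software 2,Software 3
Machine 1,1,1,1
Machine 2,0,1,0
Machine 3,0,0,0
"""

print(applications_for_machine(SAMPLE, "Machine 2"))  # ['Software 2']
```

With the sample matrix, Machine 3 gets an empty list, which matches the note above: such a machine cannot run process functions but can still submit or receive cases.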

Note

A template is provided in the template folder of the source code and can be used to set up the shared folder.
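
The "parameter name: value" format of settings.txt can be parsed along the following lines. This is only a sketch: `parse_settings` and `SAMPLE_SETTINGS` are hypothetical names, the password value is made up, and the actual parsing done by GridCompute may differ.

```python
def parse_settings(text):
    """Parse 'parameter name: value' lines into a dictionary.

    Hypothetical helper. Only the first colon separates the name from
    the value, so an address such as '10.0.0.1:888' keeps its port.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # ignore blank or malformed lines
        name, value = line.split(":", 1)
        settings[name.strip()] = value.strip()
    return settings


# Example content; the password is an invented placeholder.
SAMPLE_SETTINGS = """mongodb server: 10.0.0.1:888
user group: ENGINEERING DEPARTMENT
password: example-password
instance: debug
"""

settings = parse_settings(SAMPLE_SETTINGS)
print(settings["mongodb server"])  # 10.0.0.1:888
```

Splitting on the first colon only is the design choice that matters here, because the mongodb server value itself contains a colon before the port.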

Architecture of Mongo Database

GridCompute communicates with a mongo database that contains all the details on cases. The following entries are present:

  • collection cases
    • _id: Unique Object Id based on timestamp of the case.
    • user_group: User group. Ex: ENGINEERING DEPARTMENT.
    • instance: Instance used to isolate grids. Ex: 0 or debug.
    • status: Current status of the case. It can be to process, processing, processed or received.
    • last_heartbeat: Timestamp of last heartbeat sent to notify the database that the process is still alive.
    • application: Application associated to the case.
    • path: Path on the file server referring to the input/output case.
    • origin: Machine/User who submitted the case to the database.
      • machine
      • user
      • time
        • start: Time the case has been submitted to server.
        • end: Time the results have been retrieved from server.
    • processors
      • processor_list: List of Machine/Users who tried to process the case (some attempts to process may have failed).
        • machine
        • user
      • time (start and end) for the last attempt to process
        • start: Time of the last attempt to process.
        • end: Time the process returned.
  • collection versions (optional)
    • _id: Versions of program recognized by database.
    • status: Can be either allowed, warning or refused.
    • message: Message to be displayed when status is not allowed.
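
To make the nesting of the cases collection easier to picture, the sketch below builds one case document as a plain Python dictionary. The function `new_case_document` is hypothetical, the nesting is one reading of the list above, and the example values (machine, user, path) are invented. The _id field is left out because MongoDB generates an ObjectId when a document is inserted without one.

```python
from datetime import datetime, timezone


def new_case_document(user_group, instance, application, path, machine, user):
    """Build a dictionary shaped like an entry of the cases collection.

    Hypothetical helper for illustration only; field names follow the
    list above, the exact layout used by GridCompute may differ.
    """
    now = datetime.now(timezone.utc)
    return {
        "user_group": user_group,
        "instance": instance,
        "status": "to process",  # then: processing, processed, received
        "last_heartbeat": now,
        "application": application,
        "path": path,
        "origin": {
            "machine": machine,
            "user": user,
            "time": {"start": now, "end": None},  # end set once results are retrieved
        },
        "processors": {
            "processor_list": [],  # filled with machine/user of each attempt
            "time": {"start": None, "end": None},  # last attempt to process
        },
    }


# Invented example values.
case = new_case_document(
    user_group="ENGINEERING DEPARTMENT",
    instance="debug",
    application="Software 1",
    path="example_case.zip",
    machine="Machine 1",
    user="example_user",
)
print(case["status"])  # to process
```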

Application-specific scripts

Applications can easily take advantage of distributed computing by providing three scripts, as detailed in the following sections.

Note

Some examples are present in template/Shared_Folder/Settings/Applications.

send.py

This script is executed when submitting cases to the server. It takes as input a file selected by the user and returns one or several cases to submit to the server.

send.select_input_files(filepath)

Submit a case to the grid.

This function returns, from a selected file, one or several cases to run. Each case can be made of several input files.

Parameters: filepath (str) – Path of the selected file.
Returns: str list: A list (or tuple) of cases. Each case is a list (or tuple) of the input files required to process that case.
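
A minimal sketch of a send.py is given below. It assumes, purely for illustration, that the selected file is a text file listing one input file name per line, relative to its own folder, and that each listed file forms one case. Real applications will define their own selection logic.

```python
import os


def select_input_files(filepath):
    """Return one case per non-empty line of the selected list file.

    Illustrative assumption: each line names a single input file located
    in the same folder as the selected file.
    """
    folder = os.path.dirname(os.path.abspath(filepath))
    cases = []
    with open(filepath, encoding="utf-8") as handle:
        for line in handle:
            name = line.strip()
            if name:
                # Each case is itself a list of input files (here, just one).
                cases.append([os.path.join(folder, name)])
    return cases
```

The return value is a list of lists, matching the description above: the outer list holds the cases and each inner list holds the input files of one case.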

process.py

This script is used to process cases. Its input is the ordered list of files submitted by the send.py script. At the end of execution, it returns a list of output files, which are sent back to the server.

process.process_case(input_files)

Process a case and return its results.

This function processes a case from the grid and returns a list of output files that are sent back to the server. Processing is executed in a temporary folder into which all files are copied.

Parameters: input_files (str list) – Ordered list (or tuple) of input file paths.
Returns: str list: An ordered list (or tuple) of output files to return to the server.
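
As a hedged example, the process.py sketch below counts the lines of each input file and writes the count to a ".out" file next to it. The computation and the ".out" naming are invented for illustration; an actual script would typically launch the application associated with the case.

```python
import os


def process_case(input_files):
    """Write a '<name>.out' file with the line count of each input file.

    Placeholder computation for illustration only; the output files are
    created next to the inputs, i.e. in the temporary working folder.
    """
    output_files = []
    for path in input_files:
        with open(path, encoding="utf-8") as handle:
            line_count = sum(1 for _ in handle)
        out_path = os.path.splitext(path)[0] + ".out"
        with open(out_path, "w", encoding="utf-8") as handle:
            handle.write(f"{line_count}\n")
        output_files.append(out_path)
    return output_files  # same order as input_files
```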

receive.py

This script is used to receive cases that have been processed, i.e. to specify what to do with the output files returned by the process.py script.

receive.receive_case(output_files)

Receive a case from the grid.

This function receives a case that has been processed on the grid. It is executed in a temporary folder into which all files are copied.

Parameters: output_files (str list) – Ordered list (or tuple) of output file paths.
Returns: None.
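
One possible receive.py, sketched under assumptions, copies the output files out of the temporary folder into a permanent location. The `DESTINATION` folder and the optional `destination` argument are hypothetical additions for illustration; the documented signature only takes `output_files`.

```python
import os
import shutil

# Hypothetical destination folder, not defined by GridCompute.
DESTINATION = os.path.join(os.path.expanduser("~"), "GridCompute_results")


def receive_case(output_files, destination=DESTINATION):
    """Copy each output file into the destination folder and return None.

    The extra 'destination' argument is optional, so the function can
    still be called with output_files only.
    """
    os.makedirs(destination, exist_ok=True)
    for path in output_files:
        shutil.copy2(path, os.path.join(destination, os.path.basename(path)))
```

Copying rather than processing in place matters here because the files live in a temporary folder, so anything to be kept has to be moved elsewhere before the function returns.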

Main code layout

For details on GridCompute source code layout, refer to Source code layout.