Kyle Kernick edited this page Jun 10, 2024 · 8 revisions

Overview

This documentation is for anyone who wants to contribute to Heatmapper or deploy it to a server.

Deployment

For reference, you should look at Running Client-Side PyShiny, as those instructions are also applicable to hosting Heatmapper on a server.

The setup.sh script in the root of the repository can set up a complete environment for running Heatmapper: it creates a virtual environment, installs dependencies, clones Heatmapper, and resolves LFS files. It’s a bash script, so deployment on a Windows server will need to be done manually. To get the script, find setup.sh in the repository’s file list. Clicking on it should open GitHub’s file viewer, which has a download button in the top-right corner. Or, if you only have access to a terminal, you can curl the script via:

curl -O https://raw.githubusercontent.com/WishartLab/heatmapper2/main/setup.sh

From there, make it executable:

chmod +x setup.sh

Then, place it into the directory you want Heatmapper to live in, and run it. setup.sh will create two directories:

  1. The python virtual environment in venv
  2. The Heatmapper source code in heatmapper2

Once the script is finished, or you’ve manually handled dependencies and installation, you’ll need to activate the virtual environment (assuming you’re using a venv and not installing dependencies system-wide). From the folder containing the venv folder, run:

source venv/bin/activate

You can deactivate the virtual environment at any time by typing:

deactivate

Now, enter the heatmapper2 directory. For batch deployment, there are two scripts which automate the process:

  1. deploy.sh will deploy each application on the host, starting at port 8000 for Expression, and ending with 8006 for Spatial. Each application will run in a separate process, so the script (and user session) can be closed without tearing down the applications themselves.
  2. teardown.sh will send a KILL signal to all applications listening on the ports 8000-8006. If you’re only selectively hosting Heatmapper’s applications, this might kill non-related applications if they’re listening to that port.
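Because teardown.sh kills whatever is listening on ports 8000-8006, it can help to check which of those ports are in use before running it. The following is a minimal sketch using only the Python standard library; the ListeningPorts function is hypothetical and not part of Heatmapper.

```python
from socket import socket, AF_INET, SOCK_STREAM


def ListeningPorts(host="127.0.0.1", first=8000, last=8006):
	"""
	@brief Find which ports in a range accept TCP connections.
	@param host The host to probe.
	@param first The first port in the range (inclusive).
	@param last The last port in the range (inclusive).
	@returns A list of the ports that accepted a connection.
	"""
	open_ports = []
	for port in range(first, last + 1):
		with socket(AF_INET, SOCK_STREAM) as probe:
			probe.settimeout(0.5)
			# connect_ex returns 0 on success instead of raising an exception.
			if probe.connect_ex((host, port)) == 0: open_ports.append(port)
	return open_ports
```

Calling ListeningPorts() with no arguments probes the default Heatmapper range on the local machine. It only reports that something is listening, not which application it is.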

However, if you want to be more selective about which applications are run, you have two primary options:

  1. Running it as a PyShiny application. To do this, navigate into the project to run, such as expression, and enter its src directory. From there, execute: shiny run --host 0.0.0.0. The host argument matters: 0.0.0.0 makes the application listen on all network interfaces. To enable reloading, so that changes to the src folder are picked up by the running application without needing to stop it, add --reload. To specify a port, use the --port argument.
  2. Running it as a static WebAssembly application. This mode instead serves a connecting client the WebAssembly files, which are then run on their computer. The project folder, such as expression, contains two sub-folders, src and site. From the project folder, run python3 -m http.server --directory site --bind localhost 8008, where the value 8008 specifies the port.
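If you would rather start the static server from a Python script than from the command line, the standard library exposes the same functionality. This is a sketch of a rough equivalent of the http.server command above; MakeStaticServer is a hypothetical helper, not something shipped with Heatmapper.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def MakeStaticServer(directory="site", host="localhost", port=8008):
	"""
	@brief Create an HTTP server that serves static files from a directory.
	@param directory The folder to serve, such as a project's site folder.
	@param host The address to bind to.
	@param port The port to listen on.
	@returns A server object; call serve_forever() on it to start serving.
	"""
	# partial fixes the directory argument for every request handler instance.
	handler = partial(SimpleHTTPRequestHandler, directory=directory)
	return ThreadingHTTPServer((host, port), handler)
```

To serve until interrupted, run MakeStaticServer().serve_forever() from the project folder.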

Contributing

This section outlines some general guidance on working within the Heatmapper repository.

Coding Convention

For the sake of consistency, Python code should:

  1. Always use from imports, rather than importing the entire module: Do from shiny import App, not import shiny
  2. Use double quotes rather than single quotes for strings
  3. Use tabs, rather than spaces
  4. For naming convention:
    1. Local variables should use snake_case
    2. Global variables, functions, classes, and Shiny IDs should use PascalCase
  5. Prefer code that is more concise. If a function only has a single line, put it on the same line as the function definition, such as async def Reset(): await DataCache.Purge(input)
  6. Strive to consistently document the code-base. All non-trivial functions should have doc strings, which should follow Doxygen format.
  7. Use shared.py definitions over creating something custom. If functionality is missing, add it to the shared.py implementation.
  8. Always use the Cache object for handling input
  9. Always use the Filter function to determine column names.
  10. Always use the NavBar function to create a navigation bar shared across all applications.
  11. shared.py should always be a symlink within the src folder. Do not copy it.
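As an illustration of the style rules above (from imports, double quotes, tabs, the naming conventions, single-line functions, and Doxygen-style doc strings), here is a small sketch. The names and the exact doc string tags are made up for this example and do not come from the Heatmapper code-base.

```python
from math import sqrt

# Global variables use PascalCase.
DefaultScale = 2.0


def Magnitude(x, y):
	"""
	@brief Compute the length of a 2D vector.
	@param x The horizontal component.
	@param y The vertical component.
	@returns The Euclidean length of (x, y).
	"""
	# Local variables use snake_case.
	sum_of_squares = x * x + y * y
	return sqrt(sum_of_squares)


# A single-line function lives on the same line as its definition.
def Scaled(value): return value * DefaultScale


class UnitLabel:
	"""
	@brief Pair a value with a unit for display.
	"""
	def __init__(self, unit): self.unit = unit
	def Format(self, value): return f"{value} {self.unit}"
```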

When creating a new Application, there are a few things to note:

  1. You should create a DataCache variable from the shared.Cache class, which will handle all your user-input. This should be in the server function.
  2. If you need to extend the Cache, such as adding more file-types, create a function that you can pass to the Cache call.
    1. Treat it like a switch statement. You will be passed a single argument, path. Compare against the suffix to see if it matches your custom file type. If it doesn’t, return DataCache.DefaultHandler(path). Do not modify the Default Handler, as doing so bogs down all the applications.
  3. FileSelection should be used to generate the UI for uploading/selecting input. Importantly:
    1. It will create the Shiny input IDs SourceFile, for whether the user is selecting Upload or Example; File, for the user-uploaded file; and Example, for the selected example. Additionally, it will create the ExampleInfoButton and ExampleInfo IDs. ID conflicts cause Shiny to fail, so do not reuse these IDs.
    2. You will need to manually set ExampleInfo. The easiest way is to make a reactive function that looks up the selected example in a dictionary defined in the server: def ExampleInfo(): return Info[input.Example()]
    3. The multiple argument should be used with caution. It requires you to manually handle parsing input. See Spatial for an implementation.
  4. The MainTab function supports adding additional tabs via the *args argument. See Spatial or Expression for implementations. It will create the IDs Heatmap, which should be your main page heatmap, and Table, which you shouldn’t need to touch, as it handles creating all the associated values; the tab set itself has an ID of MainTab. You may need to add the Reset ID to your reactive functions so that they update when the user updates the table.
  5. You will need to manually Filter columns. This involves calling Filter in a reactive function with the following arguments:
    1. The input, usually (await DataCache.Load(input)).columns
    2. The type of column to look for, see shared.py for values.
    3. A UI element to update, such as NameColumn

Rebasing

When changes are made within the code-base, they are not reflected in the WebAssembly site, which can cause incongruity when pushed to GitHub. Run the rebase.sh script at the root of the repository to bring the WebAssembly site up to date across all applications.

Configuration

Heatmapper is designed to be easily deployed for different purposes, and to that end most of the interface can be modified without having to modify the code itself (technically the configuration is itself code, but only so that it is bundled into the WebAssembly build).

Each project contains a config.py file, a Python file which provides defaults and overrides for every configurable option in that program. However, the base config.py is within Heatmapper’s version control system, which means that modifying it can cause clashes when attempting to update. For that reason, you should copy config.py, creating a file named user.py. Heatmapper will first check if user.py exists and use that for configuration, only falling back to config.py if it doesn’t. Do not modify config.py. Consider the configuration provided in Pairwise:

# Distance/Correlation
"MatrixType": Config(selected="Distance", visible=True),

This variable is attached to the associated input.MatrixType, which defines whether the user wants to select a Distance Matrix or a Correlation Matrix. Let’s break it down:

  • MatrixType is the input name. It cannot be modified, as it’s explicitly used within the main program. You cannot add new configurations (every user input that can be modified is already present in the file).
  • Config is from shared.py, and is simply a class that wraps configuration. Every configuration is a Config object.
  • selected is the only required argument of a configuration that carries a value. This specifies what Heatmapper should assign as the default value when loading the application. A comment above each Config outlines what your values can be. Some configurations use value instead, simply because some inputs “select” a value, such as the aptly named ui.input_select, whereas others simply have a value, such as ui.input_checkbox. The configuration already provides the correct keyword, so this has no impact on configuring the application so long as the original keyword isn’t deleted.
  • visible is an optional argument that defaults to True. When visible is True, the associated user input in the sidebar will be visible when loading the application, and the user can make modifications to the value. When visible is False, the input will be hidden from the sidebar, and the user will be unable to change the selected value. This is useful where an application has no need for the option to be available (such as only needing to display Distance Matrices) and helps declutter the sidebar and prevent user confusion.
  • Finally, something that is not shown in any of the default configurations is that the Config class takes any keyword argument and stores it, applying it directly to the Shiny input object. Therefore, if we wanted to make sure MatrixType’s radio buttons are not inline, we could modify the configuration to "MatrixType": Config(selected="Distance", inline=False). You may notice that Heatmapper already defines inline=True within Pairwise’s code, but Config objects check for these conflicts and defer to the configuration. You can therefore override all of the parameters of the input, save the input type itself. Refer to Shiny’s documentation if you want to make any such changes, and note that some modifications may cause issues with the application (e.g. specifying multiple=True where Heatmapper does not expect multiple inputs).
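To make the override behavior concrete, here is a simplified sketch of how configured keyword arguments could take precedence over the ones written in the application code. ConfigSketch and FakeRadioButtons are hypothetical stand-ins, not the real Config class or a real Shiny input, and returning None for a hidden input is an assumption made only for this example.

```python
class ConfigSketch:
	"""
	@brief Simplified stand-in for the Config class; not the real implementation.
	"""
	def __init__(self, visible=True, **kwargs):
		self.visible = visible
		self.kwargs = kwargs

	def UI(self, make_input, **code_kwargs):
		"""
		@brief Build an input, letting configured values override the code's values.
		@param make_input A callable that builds the input from keyword arguments.
		@param code_kwargs The keyword arguments written in the application code.
		@returns The built input, or None when the input is hidden (an assumption).
		"""
		# Later keys win in a dict merge, so the configuration overrides the code.
		merged = {**code_kwargs, **self.kwargs}
		return make_input(**merged) if self.visible else None


# A stand-in for a Shiny input function: it simply echoes its arguments.
def FakeRadioButtons(**kwargs): return kwargs


MatrixType = ConfigSketch(selected="Distance", inline=False)
Built = MatrixType.UI(FakeRadioButtons, id="MatrixType", inline=True)
```

Here Built carries inline=False from the configuration even though the application code asked for inline=True, while id is passed through untouched.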

Heatmapper has some configurations that do not have an associated value. There are two such types, both of which warrant additional explanation:

  1. Configurations that are only there for visibility. Consider: "DownloadTable": Config(). This is an input that doesn’t expose any “values,” it’s simply a button. These configurations exist to toggle visibility of features through the visible keyword.
  2. Configurations that are dynamic inputs. Examples include "Keys": Config() in Spatial, and "KeyColumn": Config() in Geomap. These inputs are dynamically updated by Heatmapper because input files often have different column names for different values, such as some files using NAME, others using KEY, etc. While these configurations support both the selected= and visible= keywords, the behavior differs in important ways:
    1. When visible=True, the selected value is used as the default so long as it exists in the data. If you define selected="NAME", Heatmapper will default to that value (case-sensitive; remember, the user can still change it while the input is visible) until a file is provided where the column does not exist. When that happens, Heatmapper will use its Filtering mechanism and automatically choose a more appropriate column name.
    2. When visible=False, the selected value is constant and unchanging. Even if the column doesn’t exist in the input data, Heatmapper will use it. This means that you need to be very careful with what you select for a default value and what input you provide to the application: if the column name doesn’t exist, Heatmapper will not rectify the incongruity and will simply fail to render.
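The two behaviors of dynamic inputs can be summarized in a few lines. This is a sketch of the decision logic as described above, not Heatmapper’s actual code; ResolveColumn and its guess parameter (standing in for whatever column the Filtering mechanism would pick) are hypothetical.

```python
def ResolveColumn(selected, visible, columns, guess):
	"""
	@brief Sketch of how a dynamic input's column could be chosen.
	@param selected The configured column name.
	@param visible Whether the input is shown in the sidebar.
	@param columns The column names present in the loaded data.
	@param guess The column that filtering would pick on its own.
	@returns The column name to use.
	"""
	# Hidden inputs are constant, even if the column is missing from the data.
	if not visible: return selected
	# Visible inputs prefer the configured name (case-sensitive) when present.
	if selected in columns: return selected
	# Otherwise fall back to whatever filtering suggests.
	return guess
```

The hidden case is the one to watch: the function returns the configured name without checking it against the data, which mirrors why a mismatched file fails to render.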

Column filtering is an important facet of Heatmapper’s design, so it’s recommended not to touch the dynamic inputs, and especially not to disable their visibility, as doing so ties the application to hard-coded values that are antithetical to its design. However, if your use-case requires very specific file formats, where the column names are known and will not change, disabling the Filtering can reduce user confusion.

Working with Configuration

If you’re working within the code-base, you may wonder how to actually work with Configuration values. In essence, they’re just wrappers on Shiny’s input values (if the input UIs aren’t visible, that’s literally all they are). They can’t be used as reactive decorators, but with caching you shouldn’t need to use reactive decorators in the first place.

Configuration variables are optional. You can use regular Shiny inputs just as well as configuration values: a Shiny input does not need a configuration value, but every configuration value needs a Shiny input behind it. To create a Configuration value, there are three steps:

  1. Define the Config class within the config.py file. See the above Configuration section on its structure.
  2. Wrap the ui.input value in the app_ui with the Configuration’s UI member. For example, if you have a config "MatrixType": Config(), you’ll want to take the Shiny input with id="MatrixType" within the app_ui, and change it to: config.MatrixType.UI(ui.input, id="MatrixType", ...). Some things to note:
    1. Pass the ui.input function itself rather than calling it with the keyword arguments; don’t do ui.input(id="MatrixType", ...).
    2. You must exclusively use keyword arguments, and they’ll be passed to the ui.input object.
  3. Replace uses of input.X() with config.X(). Don’t use them in reactive decorators.

Caching

Heatmapper employs two types of Caching, Web Resource Caching and Computation Caching:

Web Resource Caching

Web Resource Caching should always be utilized, and if you fetch information using the DataCache it will be done automatically. You’ll need to use the FileSelection function within your app_ui. If you need to fetch more than just a single example, you can fetch any arbitrary content using the Cache. Consider an example from Geomap. Firstly, you need to define a reactive variable, and an updater function:

JSON = reactive.value(None)

#...

@reactive.effect
@reactive.event(input.JSONUpload, input.JSONSelection, input.JSONFile)
async def UpdateGeoJSON(): JSON.set(await DataCache.Load(
		input,
		source_file=input.JSONUpload(),
		example_file=input.JSONSelection(),
		source=URL,
		input_switch=input.JSONFile(),
		default=None
	))

# ...
json = JSON()

Some things to note:

  1. Use a reactive variable. Constantly querying the Cache is wasteful and inefficient.
  2. Ensure you have reactive decorators. This is one of the only functions in Heatmapper that should have decorators, as it modifies the reactive variable, which in turn triggers all functions that rely on it.
  3. Make it asynchronous; as with decorators, this will be the only function where you should do this, and you should only let the server call this function. When you need the value, call the variable: json = JSON().
  4. You cannot use configuration values for the reactive values. You need to use regular Shiny input values.
  5. Note the arguments to the Cache:
    1. source_file is a Shiny ui.input_file. Shiny and Heatmapper handle taking user input and parsing it.
    2. example_file dictates the name of the example file. It accepts two formats:
      1. A file name relative to the source variable. By default, this points to your example directory, so if you have a file stored in example_input/my_test, input.JSONSelection() can simply be my_test.
      2. A URL. If the example_file starts with https://, source will be completely ignored and the example_file will be fetched directly. Look at Geomap’s example files to see how one of the examples is fetched from outside the normal place, simply by using a URL.
    3. source defines where example_file will be located. Usually, this is the example_input folder for the application, but you can set it wherever you want. For this example, the URL points to data within the Geomap folder. Importantly, this source has to be local when running as a server, or remote when running as WebAssembly. You’ll need to use the Pyodide variable in shared to know what mode Heatmapper is running in; for Geomap, it sets the URL to ../data in server mode and a link to GitHub otherwise. Heatmapper expects example files to be located on disk when not running under Pyodide.
    4. input_switch is the input that determines whether we’re expecting an example file or a user-uploaded file. If it’s equal to "Upload", it’ll be looking at source_file, otherwise it looks at example_file.
    5. default defines what to return if there’s nothing to return. This defaults to a DataFrame, but you may want to change it to whatever type you expect to return, otherwise you might get unexpected objects when there is nothing to return.
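The way the Cache arguments interact can be sketched as a small decision function. This only models which location a load would read from, based on the description above; ResolveLocation is a hypothetical helper, and how the real Cache joins source and example_file is an assumption.

```python
def ResolveLocation(input_switch, source_file, example_file, source):
	"""
	@brief Sketch of which location a load would read from.
	@param input_switch The value of the Upload/Example switch.
	@param source_file The user-uploaded file.
	@param example_file The example name, or a full https:// URL.
	@param source The folder or base URL that example names are relative to.
	@returns The location to read from.
	"""
	# An upload always takes the user-provided file.
	if input_switch == "Upload": return source_file
	# Full URLs bypass source entirely.
	if example_file.startswith("https://"): return example_file
	# Otherwise the example name is resolved relative to source (assumed join).
	base = source.rstrip("/")
	return f"{base}/{example_file}"
```

For instance, an example named my_test with a source of ../data resolves to ../data/my_test, while an example that is already a URL is returned unchanged.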

Computation Caching

Heatmapper also supports arbitrary computation caching, although you’ll need to go out of your way to use it. In essence, you’ll be using three functions in your Cache object: In(), Get(), and Store(). Firstly, you’ll need to make a list of inputs that this computation uses. That way, changes to inputs will ensure that an invalid cached object isn’t returned. Heatmapper makes no effort to ensure all your inputs are accounted for. Consider the image caching used by Pairwise, Expression, and Image. At the start of each Heatmap call, it creates a list of inputs:

inputs = [
	input.File() if input.SourceFile() == "Upload" else input.Example(),
	input.Image(),
	config.ColorMap(),
	config.Opacity(),
	config.Algorithm(),
	config.Levels(),
	config.Features(),
	config.TextSize(),
	config.DPI(),
]

Notice that we take the value of these (i.e. it’s a list of values, not a list of reactive objects), and that we can be conditional about what values truly make up the hash (we don’t need both File and Example, we just need whatever is selected). Then, we use the first function, In():

if not DataCache.In(inputs):
	# ...

It’s recommended to check for the absence of the object in the Cache, compute it and place it in the Cache, and then return it, so that both branches of the condition share the same return statement. When the object isn’t in the Cache, the application does the regular computation to create the output, and then stores it in the Cache:

b = BytesIO()
fig.savefig(b, format="png", dpi=config.DPI())
b.seek(0)
DataCache.Store(b.read(), inputs)

Note that we cannot store Matplotlib plots directly; instead, we save the plot as an image and store the image’s bytes within the Cache, associating them with the inputs used to make it. Finally, we return the object within the Cache:

b = DataCache.Get(inputs)
with NamedTemporaryFile(delete=False, suffix=".png") as temp:
	temp.write(b)
	temp.close()
	img: types.ImgData = {"src": temp.name, "height": f"{config.Size()}vh"}
	return img

The temporary file handling isn’t important; what is important is that we use Get() to retrieve the cached bytes, and then return them appropriately.
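Putting the three calls together, the overall pattern looks like the sketch below. ComputationCacheSketch is a hypothetical in-memory stand-in that mimics the In(), Get(), and Store() calls shown above; how the real Cache derives its keys is not documented here, so hashing the repr of the inputs is an assumption.

```python
from hashlib import sha256


class ComputationCacheSketch:
	"""
	@brief Simplified stand-in for the In/Get/Store interface; not the real Cache.
	"""
	def __init__(self): self.objects = {}

	@staticmethod
	def Key(inputs):
		# Hash the values themselves, so any changed input yields a new key.
		return sha256(repr(inputs).encode()).hexdigest()

	def In(self, inputs): return self.Key(inputs) in self.objects
	def Get(self, inputs): return self.objects[self.Key(inputs)]
	def Store(self, obj, inputs): self.objects[self.Key(inputs)] = obj


DataCache = ComputationCacheSketch()
Calls = []


def Render(color_map, dpi):
	"""
	@brief Return cached bytes for these inputs, computing them only once.
	@param color_map A stand-in for an input value.
	@param dpi A stand-in for another input value.
	@returns The cached bytes.
	"""
	inputs = [color_map, dpi]
	if not DataCache.In(inputs):
		# Stand-in for the expensive computation; Calls records each real run.
		Calls.append(inputs)
		DataCache.Store(f"{color_map}@{dpi}".encode(), inputs)
	# Both branches share this single return.
	return DataCache.Get(inputs)
```

Calling Render twice with the same arguments computes once and reads from the cache the second time; changing any input produces a new key and a fresh computation.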
