Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

experiment class for laghos #362

Open
wants to merge 13 commits into
base: develop
Choose a base branch
from

Conversation

august-knox
Copy link
Contributor

bin/benchpark experiment init --dest laghos

Don't believe there's any variants that need to be specified here

@github-actions github-actions bot added the experiment New or modified experiment label Sep 6, 2024
@github-actions github-actions bot added the application New or modified application label Sep 12, 2024
Copy link
Collaborator

@dyokelson dyokelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job so far! Some updates requested, let me know if you have any questions.

description='number of serial refinements',
workloads=['laghos'])

workload_variable('rp', default='0',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this "example" default at what it was before so default value should be 5, since we can pass in different values now via the scaling variants in experiment.

workload_variable('rp', default='0',
description='number of parallel refinements',
workloads=['laghos'])
workload_variable('ms', default='4',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise let's update this default value back to 500

def compute_applications_section(self):
variables = {}

if self.spec.satisfies("scaling=weak"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's start with the scaling assuming a single node, but change problem size per rank instead, once we know we have this right we can add more nodes or make updates. So for weak scaling:
rs values: 1
num processes: 8
rp values (increases workload per rank): 1,2,3,4

Another option if you can also implement this is rs values 1,2,3 and mpi processes 1,8,64

variables["rs"] = ["5","6"]
variables["n_nodes"] = ["1","8"]
elif self.spec.satisfies("scaling=strong"):
variables["n_nodes"] = ["1","8"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of increasing nodes for strong scaling let's keep it on one node and increase MPI ranks, 1,8,64. We should lower the rs value to 1 or 2 here as well, depending on how long this takes to solve, these would ideally be less than 5 minutes to solve.

variables["rs"] = "5"
else:
variables["n_nodes"] = ["1","2","4","8","16","32","64","128"]
variables["n_ranks"] = "{sys_cores_per_node} * {n_nodes}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're setting ranks explicitly above let's move both n_nodes and n_ranks into the else, and set this to one node, 8 ranks since this will be for the example run.

"problem": {
#"variables": variables,
"experiments": {
"laghos_{n_nodes}_{n_ranks}": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the other parameters into the naming scheme here, so basically anything that gets passed in (rs, rp, etc) should be in here, see the naming for the kripke experiment in https://github.com/LLNL/benchpark/pull/344/files

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@august-knox I enabled a dry-run test and it is failing. Please diagnose/fix.

@github-actions github-actions bot added the ci Involving Project CI & Unit Tests label Sep 24, 2024
@pearce8 pearce8 added ci Involving Project CI & Unit Tests changes requested Changes requested and removed ci Involving Project CI & Unit Tests application New or modified application labels Sep 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changes requested Changes requested ci Involving Project CI & Unit Tests experiment New or modified experiment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants