How to Talk to Computers¶

Imagine, Your Lab in the Cloud

Welcome to scientific research computing in the 21st century!

While using a mouse in a Graphical User Interface (GUI) is more familiar to most of us, the real power of reproducible science comes from the Command Line Interface (CLI).

The CLI allows you to give precise, text-based instructions to a computer, automating tasks and managing massive datasets in ways a GUI cannot.

In this lesson, you'll learn to use the CLI on CyVerse, a powerful, cloud-based platform designed for science. Instead of installing complex software on your own machine, you'll use CyVerse to access everything you need right from your web browser.

What is CyVerse?¶

CyVerse is not just one tool, but a comprehensive cyberinfrastructure built to support researchers throughout the entire data lifecycle (Swetnam et al. 2024).

It's composed of four main integrated platforms:

Data Management: At its core, CyVerse provides a robust, large-scale data storage system. It allows you to store, manage, and share massive datasets securely and provides tools for high-speed data transfer. We'll be using this Data Store today.

Discovery Environment (DE): The DE is a web-based portal for accessing scientific tools and workflows. You can use its intuitive interface to launch analyses, from simple virtual machines to complex, multi-step bioinformatics pipelines, without needing to be an expert in high-performance computing.

Cloud Native Services: For more advanced users, CyVerse offers a container-based, composable infrastructure that runs on NSF's Jetstream-2 Cloud, AWS, GCloud, or Azure. These Infrastructure as Code (IaC) tools allow you to deploy your own custom applications and services at scale, giving you greater flexibility and control over your computational environments.

Artificial Intelligence (AI/ML): AI-Verde provides access to cutting-edge hardware, including GPUs, and pre-configured tools to support generative AI (LLMs). In the Discovery Environment, and on Jetstream-2, we support a wide array of machine learning and artificial intelligence research. These platforms make it easier to train, test, and deploy AI models on large datasets.

Today, we will use the Data Store to move files and the Discovery Environment to launch virtual analyses. We will briefly introduce generative AI tools. Next week, we'll focus on the AI/ML platforms to run a large language model and practice prompt engineering in more depth.

CyVerse Account Creation¶

Create your account: https://user.cyverse.org

You will be prompted to enter a valid email address and then create a password. For this workshop, use the same email address you gave to NCEMS staff.
Enroll in the NCEMS workshop: https://user.cyverse.org/workshops/192

As part of NCEMS, you will be granted special access to CyVerse.

We have pre-registered the email account you enrolled with NCEMS in the workshop enrollment form.

AFTER you have created and verified your CyVerse account, go to the Workshops in the User Portal and find the NCEMS workshop. Click on 'Enroll' to be automatically added to the workshop.

Your account will be promoted and you will be given access to the Discovery Environment's interactive applications (which we will use in the next section).
Log into the Discovery Environment (DE): https://de.cyverse.org

Discovery Environment Tour¶

If this is your first time logging into the DE, allow the Tour to show you all of the GUI features.

Starting a VICE App¶

Let's start out with the Cloud Shell

Interactive Apps (GUI) can be found in the Featured Apps list

Launch a Cloud Shell from the list or from the quick launch icon in the Tool Bar by clicking on the icon.

The Unix Shell¶

With the CLI, the computer is stripped down to a Terminal, from which you run powerful commands that are executed by the Shell.

Though there are technical differences between them, the terms Command Line Interface, Terminal, Shell, and BASH will be used more or less interchangeably throughout the lesson.

Attention Windows users

Much of what we are going to teach is based on open-source software that runs in the cloud and is incompatible with Windows.

We work on Unix-based systems such as Linux (Ubuntu) and macOS, as many scientific tools require a Unix operating system (OS).

Several software packages allow Windows users to execute Unix commands; however, we recommend the use of Windows Subsystem for Linux (WSL) 2.0.

Quickstart installation of Windows WSL

A system reboot is necessary

Open PowerShell in Administrator mode (open Search and look for PowerShell, right click and select "Run as Administrator")
type wsl --install
Restart your machine
Open Search and open WSL; create a username and password, wait for it to finish setting up (should take a few minutes)
You're now ready to use Linux on your Windows Machine!

Where is the WSL Home folder?

The Home folders for Linux and Windows are different. The Windows path to the WSL home folder is \\wsl$\Ubuntu\home\<username>.

We suggest creating a bookmark on your Windows machine for quicker access to files on the Linux partition.

To quickly open the folder, open WSL and execute `explorer.exe .` (the single trailing dot means the current directory). This will open a Windows Explorer window at the Linux Home folder.

Managing Your CyVerse Data with GoCommands¶

Before launching an analysis, you need a way to move data between your personal computer and the CyVerse Data Store. The best tool for this is gocmd, a command-line client written by CyVerse research software engineers in Google's Go language.

You can install and run gocmd in your computer's terminal to move files from your computer(s) into CyVerse. You can also use gocmd to move files around within the CyVerse Data Store.

Install and Configure GoCommands

Installation instructions vary by operating system (Windows, macOS, Linux). Please follow the complete, up-to-date guide on the CyVerse Learning site.

GoCommands Installation Guide

Visit the official GoCommands Installation and Configuration Guide and follow the steps for your operating system.

Optionally, add the folder that contains the gocmd binary to your PATH environment variable (for example, by editing your .bashrc).
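As a minimal sketch of that optional step, the lines below add a folder to PATH only if it is not already listed. The location $HOME/bin is an assumption made for illustration; substitute the folder where you actually saved the gocmd binary.

```shell
# Hypothetical install location: change this to the folder that holds gocmd.
GOCMD_DIR="$HOME/bin"

# Prepend the folder to PATH only if it is not already listed.
case ":$PATH:" in
  *":$GOCMD_DIR:"*) echo "$GOCMD_DIR is already on PATH" ;;
  *) export PATH="$GOCMD_DIR:$PATH"; echo "Added $GOCMD_DIR to PATH" ;;
esac

# Check whether the shell can now find gocmd.
command -v gocmd || echo "gocmd not found; check GOCMD_DIR"
```

The change above lasts only for the current terminal session. Appending the export line to ~/.bashrc would apply it to every new terminal.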

The most important step is running gocmd init to connect the tool to your CyVerse account.

You will be prompted to enter CyVerse-specific information for the Data Store. You can accept the default values for the configuration.

| Configuration Key | Value |
|---|---|
| `irods_host` | `data.cyverse.org` |
| `irods_port` | `1247` |
| `irods_zone_name` | `iplant` |
| `irods_user_name` | `<CyVerse Username>` |
| `irods_user_password` | `<CyVerse Password>` |

Use these credentials for anonymous access to the Data Store:

| Configuration Key | Value |
|---|---|
| `irods_user_name` | `anonymous` |
| `irods_user_password` | (leave empty) |

graph LR
    subgraph "Your Local Machine"
        direction LR
        Local_PC[("💻<br>Local Computer")]
        Local_Files["Local Files<br>(e.g., my_data.csv)"]
    end

    subgraph "CyVerse Cloud Platform"
        direction TB

        subgraph "Jupyter Lab (Ephemeral Storage)"
            direction TB
            VM["⚙️ Jupyter Lab VM"]
            VM_FS["VM Filesystem<br>(/home/jovyan, etc.)"]
            VM_Mount["/data-store<br>(Mount Point)"]
            VM --> VM_FS
            VM --> VM_Mount
        end

        subgraph "CyVerse Data Store (Persistent iRODS Storage)"
            direction TB
            iRODS[("☁️<br>iRODS Data Store")]
            iRODS_Home["/iplant/home/&lt;user&gt;"]
            iRODS_Shared["/iplant/home/shared"]
            iRODS --> iRODS_Home
            iRODS --> iRODS_Shared
        end

        VM_Mount -- "Mounted via<br>Kubernetes CSI Driver" --> iRODS
    end

    Local_PC -- "Transfer via<br>gocmd put/get" --> iRODS

    classDef system fill:#f0f9ff,stroke:#0ea5e9,stroke-width:2px;
    classDef storage fill:#eef2ff,stroke:#6366f1;
    classDef connector stroke-dasharray: 5 5;
    class Local_PC,VM,iRODS system;
    class Local_Files,VM_FS,VM_Mount,iRODS_Home,iRODS_Shared storage;

Transfer Data

Once installed and configured, you can manage your files from your local terminal. Here are the essential commands:

| Command | Explanation |
|---|---|
| `gocmd ls` | List the contents of your home directory in the CyVerse Data Store. |
| `gocmd put <local_file>` | Put (upload) a file from your computer to your CyVerse home directory. |
| `gocmd get <cyverse_file>` | Get (download) a file from your CyVerse home directory to your computer. |

Example: Let's say you have a file my_data.csv on your computer that you want to analyze in CyVerse.

On your local computer's terminal

First, upload the file to CyVerse

gocmd put my_data.csv

You can verify it's there by listing your files

gocmd ls

Now the file is in the CyVerse Data Store, ready to be used in an analysis.
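The upload-then-verify steps above can be wrapped in a small guard so that a missing tool or file produces a clear message instead of a confusing error. This is only a sketch: it uses the gocmd put and gocmd ls commands shown above, the file name my_data.csv comes from the example, and the UPLOAD_STATUS variable is a hypothetical name introduced here.

```shell
# File name taken from the example above; change it to your own file.
FILE="my_data.csv"

if ! command -v gocmd >/dev/null 2>&1; then
  UPLOAD_STATUS="skipped: gocmd is not installed or not on PATH"
elif [ ! -f "$FILE" ]; then
  UPLOAD_STATUS="skipped: $FILE not found in $(pwd)"
elif gocmd put "$FILE"; then
  UPLOAD_STATUS="uploaded"
  gocmd ls    # list your Data Store home directory to confirm the upload
else
  UPLOAD_STATUS="failed: check the settings you entered with gocmd init"
fi

echo "$UPLOAD_STATUS"
```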

Other ways to move data to CyVerse

There are several ways to access the Data Store. These methods vary in speed, flexibility, and technical knowledge required. Different methods may suit your needs for different projects at different times.

| Method | Access Point | OS | Upload/Download | Installation/Setup Required | Account Required | Max File Size |
|---|---|---|---|---|---|---|
| Discovery Environment | Web | Any | Both | No | Yes | 2GB/file upload, no limit for import |
| WebDAV | Web & Command line | Any | Both | No | Yes (No for public data) | No limit |
| GoCommands | Command line | Any | Both | Yes | Yes (No for public data) | No limit |
| iCommands | Command line | Linux & macOS | Both | Yes | Yes (No for public data) | No limit |
| SFTP | Desktop App & Command line | Any | Both | No (Yes for desktop app) | Yes (No for public data) | No limit |

Not Comfortable with the CLI yet?

That is totally okay!

There are many GUI based ways to move data to CyVerse Data Store.

CyVerse VM File System Layout (Ubuntu)¶

This diagram illustrates the typical file system structure you'll encounter in a CyVerse virtual machine.

graph TD
    A["/ (root)"]:::dir;

    subgraph "Standard Linux Directories"
        A --> B["/home"];
    end

    subgraph "User Home Directory"
        B --> C["/home/jovyan"];
    end

    subgraph "CyVerse Data (Mounted via Kubernetes)"
        A --> D["/data-store (Physical Mount Point)"];
        D --> D_Home["/data-store/home"];
        D --> D_Shared["/data-store/shared"];
        D_Home --> D_User["/data-store/home/&lt;cyverse-username&gt;<br/>(Personal Data)"];
        D_Shared --> D_Community["Community & Published Data"];
    end

    subgraph "Symbolic Link for Convenience"
         C --> E["/home/jovyan/data-store<br/>(Symbolic Link)"];
    end

    %% --- Links ---
    E-.->|links to| D;

    %% --- Styling ---
    classDef dir fill:#e6f4ff,stroke:#6ba2d1,stroke-width:2px;
    class A,B,C,D,D_Home,D_Shared,D_User,D_Community,E dir;

Diagram Key

Solid Lines (-->): Represent a direct parent-child relationship in the file system (e.g., /home is inside /).

Dotted Lines (-.->): Represent a symbolic link. The /home/jovyan/data-store folder is a shortcut that points directly to the /data-store mount point.

/data-store: This is not a normal directory. It's a special mount point managed by Kubernetes, connecting your virtual machine to the main CyVerse Data Store.
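To see how a symbolic link behaves, the sketch below rebuilds a miniature version of this layout in a throwaway directory. The paths only imitate the real VM; nothing here touches the actual /data-store mount.

```shell
# Build a scratch copy of the layout inside a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/data-store/home" "$demo/home/jovyan"

# Create the shortcut: ln -s <target> <link name>
ln -s "$demo/data-store" "$demo/home/jovyan/data-store"

# ls -l marks the link with an arrow pointing at its target.
ls -l "$demo/home/jovyan"

# readlink prints the path that the link points to.
link_target=$(readlink "$demo/home/jovyan/data-store")
echo "$link_target"

# Listing through the link shows the contents of the target directory.
ls "$demo/home/jovyan/data-store/"
```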

Launching a Virtual Machine in the DE¶

Now that your data is in the cloud, you can launch a virtual machine to work with it. We recommend starting with a Cloud Shell, or Jupyter Lab, as both provide a user-friendly interface with a built-in terminal.

Log in to the CyVerse Discovery Environment.¶

Click the Apps button to open the Apps window.

Search for "Jupyter Lab" and select the latest version.

On the App page, you can give your analysis a name. For now, the default settings are fine.

Click Launch Analysis. Your App will take a few moments to start.

Once the status says "Running," click the hyperlink for your analysis. A new browser tab will open with your Jupyter Lab session.

The Gateway to Commands: The Terminal

All actions from this point forward will be performed inside the terminal of your CyVerse virtual machine.

To Open the Terminal in Jupyter Lab: Click the File menu -> New -> Terminal.

You will see a window with a prompt, waiting for your commands. This is the Shell.

The commands for navigating the file system in your VM are the same standard Unix commands you would use elsewhere.

First, you'll need to get the data you uploaded with gocmd from the Data Store into your running VM. For this, we use gocmd get to move files.

In your CyVerse VM Terminal¶

Download the file from the Data Store to your VM¶

cd

gocmd get --progress /iplant/home/shared/NCEMS/pre-summit-foss/sample-data .

The get command tells the gocmd program to "get", or download, data. The --progress flag tells the program to display progress updates as the download runs. The final . is the destination: the current directory.

Now you can see it in your VM's file system

ls -F

Is the sample-data/ folder there? What do you see?

Create a reproducible Python environment with a package manager¶

To ensure your analysis is reproducible, you should define the exact software and versions you need. We can do this using a Conda environment managed by mamba, a fast implementation of the conda package manager.

Conda environments are defined in a YAML file, conventionally named environment.yml. For this lesson the file is called cowsay.yml and is included with the sample data.

Inspect the Environment File

In your terminal, move into the sample data folder and print the file with cat:

cd ~/sample-data
ls -F
cat cowsay.yml

# environment.yml
# This file defines the Conda environment for the CLI art app.
#
# To create the environment, run:
# conda env create -f environment.yml
# To activate the environment, run:
# conda activate cli_art_env
#
# IMPORTANT: This app also requires the following command-line tools
# which should be installed with your system's package manager:
#
# On Debian/Ubuntu (using apt-get):
#   sudo apt-get update
#   sudo apt-get install cowsay fortune lolcat
#
# On macOS (using Homebrew):
#   brew install cowsay fortune lolcat
#
# On Fedora/CentOS (using dnf):
#   sudo dnf install cowsay fortune-mod lolcat

name: cowsay

channels:
  - conda-forge
  - defaults

dependencies:
  - python=3.9
  - pip
  - rich

Conda vs Mamba

Conda is the original, industry-standard, open-source system for managing packages and environments for any language. It is robust and widely trusted.

Mamba is a re-implementation of the conda package manager in C++. It uses the same commands and configuration but dramatically improves speed by using parallel processing for downloading packages and a much faster dependency solver. For our purposes, it is a drop-in replacement for conda that will make setting up our environments much faster.

You might also be familiar with pip, Python's standard package manager, which installs packages from the Python Package Index (PyPI). While Conda/Mamba can manage packages for any language and even the Python interpreter itself, pip is exclusively for Python. It's common to use pip to install a package inside a Conda environment if that package isn't available on Conda channels. The general rule is to use Conda/Mamba whenever possible, and then use pip for any remaining Python-specific packages.

Build and Activate the Environment

Now, use mamba to create the environment from your file.

mamba env create -f cowsay.yml

Approve the installation

Mamba will download and install the specified packages. Once it's finished, you can activate your new environment to start using it.

conda activate cowsay

Your terminal prompt should now change to show (cowsay), indicating that you are inside the environment. Any command you run will use the software installed there.

To exit the environment, simply run:

conda deactivate
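If you are unsure which environments exist or which one is active, a quick check like the sketch below can help. It assumes only that mamba or conda may be on your PATH, and it relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated. ENV_TOOL is a hypothetical variable name used for illustration.

```shell
# Pick whichever tool is available; both accept the "env list" subcommand.
if command -v mamba >/dev/null 2>&1; then
  ENV_TOOL="mamba"
elif command -v conda >/dev/null 2>&1; then
  ENV_TOOL="conda"
else
  ENV_TOOL="none"
fi

if [ "$ENV_TOOL" = "none" ]; then
  echo "Neither mamba nor conda is on PATH"
else
  # Print every environment known to the package manager.
  "$ENV_TOOL" env list
  # CONDA_DEFAULT_ENV holds the name of the active environment, if any.
  echo "Active environment: ${CONDA_DEFAULT_ENV:-none}"
fi
```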

Run the app¶

After you've activated the new conda environment, run the app.

python cowsay.py

What happens?

Managing Analyses¶

On the Analyses page (https://de.cyverse.org/analyses) you can view your running or completed analyses.

If your analysis is still active, it should show a timer counting down. You can extend the time by clicking on the ellipsis and selecting extend time.

You can also terminate your app by clicking on the red X.

Note: apps will time out on their own, but they will burn your valuable allocation hours if left running.

Practice shutting down your app by clicking on the check box and terminating it.

Jupyter Lab GPUs w/ Ollama¶

Start a new App.

In Apps (https://de.cyverse.org/apps), choose a Jupyter Lab PyTorch GPU app.

Waiting for apps to start

When you start an interactive application, expect to wait a minute or two. It should not take more than 5 minutes to start a Featured application. Private apps may take longer because their Docker container must be downloaded from an external registry into CyVerse.

Run Your Own AI Model with Ollama¶

For advanced tasks, CyVerse offers Apps with Graphics Processing Units (GPUs), which are essential for AI and machine learning. These GPU-enabled Apps come with Ollama pre-installed. Ollama lets you easily run powerful, open-source Large Language Models (LLMs) locally within your private session.

Download an AI Model

To use a model, you first need to download it. We'll use gemma:2b, a relatively small but capable model from Google. The download only needs to be run once per VM.

In your CyVerse GPU VM Terminal¶

Open a Terminal in your Jupyter Lab

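Install the Ollama Python client and start the Ollama server:
```
# Install the Ollama Python client library (the Ollama program itself comes pre-installed in the GPU apps)
pip install ollama
# Start the Ollama server; it keeps running in this terminal until you stop it
ollama serve
```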
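Run the Model

Open another terminal window, because the server keeps the first one busy, and download the small Gemma model from the Ollama registry.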
```
ollama pull gemma:2b
```
Now you can run the model and interact with it directly from your command line.
```
ollama run gemma:2b
```
The prompt will change, and you can now ask the AI questions. Let's try one related to our last topic:
```
>>> What is the purpose of a conda environment?
```
The model will generate a detailed explanation right in your terminal. To exit the Ollama session, type /bye and press Enter.
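Besides the interactive session, ollama run also accepts the prompt as an argument, which is handy inside scripts. The sketch below assumes the Ollama server is running and uses the gemma:2b model named above. OLLAMA_STATUS and ANSWER are hypothetical variable names introduced for illustration.

```shell
MODEL="gemma:2b"
PROMPT="What is the purpose of a conda environment?"

if ! command -v ollama >/dev/null 2>&1; then
  OLLAMA_STATUS="skipped: ollama is not installed or not on PATH"
elif ANSWER=$(ollama run "$MODEL" "$PROMPT" 2>/dev/null); then
  # The model's reply was captured in ANSWER; print it.
  OLLAMA_STATUS="answered"
  echo "$ANSWER"
else
  OLLAMA_STATUS="failed: is 'ollama serve' running in another terminal?"
fi

echo "$OLLAMA_STATUS"
```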

This gives you a powerful, private AI assistant that can help you write code, debug errors, and understand complex topics—all within your secure CyVerse environment. We will explore this in much more detail in the next lesson!

Introductory Shell Commands¶

The following tutorial material was taken from the Carpentries Shell Module.

Download Some Data from the Carpentries

To follow along with the tutorial, please download and unzip this data: shell-lesson-data.zip

The Command Line Way to Download and Unzip!

Execute the following commands:

$ sudo apt install unzip
$ wget https://swcarpentry.github.io/shell-novice/data/shell-lesson-data.zip
$ unzip shell-lesson-data.zip

Help with Commands

For almost every command, typing man (manual) before the command name will open the manual for that command.

$ man ls

The above command will result in opening the manual for the ls command. You can exit the man page by pressing q.

Command Flags

Each command has flags, or options, that you can specify. They are introduced with a -, such as <command> -<flag>.

$ ls -a -l -h

The above command calls for the -a (all), -l (long), and -h (human readable) flags. This causes ls to output a list of all files (including hidden files/folders) with human-readable file sizes (e.g., it will list 2.9M instead of 3000000), permissions, owner, and last modification date.
If you do not know what flags are available, you can refer to the man command (or, for many tools, use the -h or --help flag).
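A quick way to see what each flag changes is to try them on a scratch folder. The sketch below creates a temporary directory with one visible and one hidden file; the file names are made up for illustration.

```shell
# Create a scratch folder with one visible and one hidden file.
demo=$(mktemp -d)
touch "$demo/visible.txt" "$demo/.hidden.txt"

ls "$demo"           # the hidden file is not shown
ls -a "$demo"        # -a adds ., .., and .hidden.txt
ls -a -l -h "$demo"  # long listing with human-readable sizes
ls -alh "$demo"      # single-letter flags can usually be combined
```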

Tips for Directory Navigation

. refers to current directory

.. refers to the parent directory (one level up)

/ is the directory separator

~ indicates the home directory

For example:

$ ls .            # lists files and folders in the current directory
$ ls ..           # lists files and folders in the parent directory
$ ls ~            # lists files and folders in the home directory
$ ls ~/Documents  # lists files and folders in Documents (a folder present in the home directory)
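The same shortcuts work with cd. The sketch below builds a small, made-up folder tree in a temporary directory and moves around it, so you can experiment without touching your own files.

```shell
# Build a scratch tree: <temporary dir>/project/data
demo=$(mktemp -d)
mkdir -p "$demo/project/data"

cd "$demo/project/data"
pwd                 # ends in project/data

cd ..               # move up one level
parent=$(pwd)
echo "$parent"      # ends in project

cd ./data           # . is the current directory, so this enters project/data
cd ../..            # move up two levels at once
top=$(pwd)
echo "$top"         # the temporary directory itself

echo ~              # prints the path of your home directory
```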

Figure: Linux Directory Structure

| Command | Explanation |
|---|---|
| `pwd` | print working directory |
| `ls` | list content of folder |
| `cd` | change directory |

By typing pwd, the current working directory is printed.

$ pwd

/home/jgillan

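We can then use ls to see the contents of the current directory. By using the -F flag (ls -F) we can also see the type of each item. Note: a slash (/) at the end of a name denotes a folder and an asterisk (*) denotes an executable file; regular files have no suffix. The files in the example outputs here carry an asterisk because they happen to be marked executable on the system where the output was captured.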

$ ls -F 
shell-lesson-data/   shell-lesson-data.zip*
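To check what the ls -F suffixes mean, the sketch below creates one folder, one regular file, and one executable file in a temporary directory. The names are invented for illustration. With ls -F, a folder gets a trailing /, an executable file gets a trailing *, and a regular file gets no suffix.

```shell
demo=$(mktemp -d)
mkdir "$demo/folder"
touch "$demo/notes.txt" "$demo/run.sh"

# Mark one file as executable so that ls -F flags it with an asterisk.
chmod +x "$demo/run.sh"

listing=$(ls -F "$demo")
echo "$listing"
```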

We can then move inside the folder of our choice using cd. Running ls after entering a folder will show the contents of the folder you just moved into. Feel free to explore the contents of the folders by using cd and ls.

$ cd shell-lesson-data
$ ls -F

exercise-data/  north-pacific-gyre/

$ ls -F exercise-data/

animal-counts/  creatures/  numbers.txt*  proteins/  writing/

Use the Tab key to autocomplete

You do not need to type the entire name of a folder or file. By using the tab key, the Shell will autocomplete the name of the files or folders. For example, typing the following

$ ls -F exer

and pressing the tab key, will result in autocompletion.

$ ls -F exercise-data/

You can then press tab twice, to print a list of the contents of the folder.

$ ls -F exercise-data/
animal-counts/ creatures/     numbers.txt    proteins/      writing/

Working with Files and Directories¶

| Command | Explanation |
|---|---|
| `mkdir` | make a directory |
| `touch` | create an empty file |
| `nano` or `vim` | text editors |
| `mv` | move command |
| `cp` | copy command |
| `rm` | remove command |

Return to shell-lesson-data, and create a directory with mkdir <name of folder>.

$ mkdir my_folder
$ ls -F

exercise-data/  my_folder/  north-pacific-gyre/

Notice the new my_folder directory.

Naming your files

It is strongly suggested that you avoid using spaces when naming your files. When using the Shell to communicate with your machine, a space can cause errors when loading or transferring files. Instead, use dashes (-), underscores (_), periods (.) and CamelCase when naming your files.

Acceptable naming:

$ mkdir my_personal_folder
$ mkdir my_personal-folder
$ mkdir MyPersonal.Folder

What will happen if you create a directory with spaces?

You will obtain as many folders as typed words!

$ mkdir my folder
$ ls -F
exercise-data/  folder/  my/  north-pacific-gyre/

Notice the two folders my and folder.
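If you ever do need a name that contains spaces, wrapping it in quotes makes the shell treat it as a single argument. The sketch below contrasts the two behaviours in a temporary directory; the folder names are only examples.

```shell
demo=$(mktemp -d)
cd "$demo"

# Unquoted: the shell splits on the space and mkdir receives two names.
mkdir my folder

# Quoted: mkdir receives one name that contains spaces.
mkdir "my other folder"

ls -F
```

Every later command that touches the quoted folder needs the quotes too, which is why avoiding spaces is the simpler habit.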

Create an empty file with touch <name of file>

$ touch new_file.txt

touch will create an empty file

Add text to the new file

nano new_file.txt

Use mv <name of file or folder you want to move> <name of destination folder> to move your newly created file to the directory you created previously (you can then use ls to check if you successfully moved the file).

$ ls -F
exercise-data/  my_folder/  new_file.txt*  north-pacific-gyre/

$ mv new_file.txt my_folder/
$ ls -F
exercise-data/  my_folder/  north-pacific-gyre/

$ ls -F my_folder/
new_file.txt*

mv can also be used to rename a file or folder with mv <name of file or folder you want to change> <new name>.

$ cd my_folder/
$ mv new_file.txt my_file
$ ls -F
my_file*

cp is the command to copy a file with the syntax cp <name of file you want to copy> <name of copy file>

$ cp my_file copy_my_file
$ ls -F 
copy_my_file*  my_file*

Copying folders

To copy folders and the content of these folders, you will have to use the -r flag (recursive) for cp in the following manner cp -r <name of folder you want to copy> <name of copy folder> (following example is from the shell-lesson-data/ directory).

$ cp -r my_folder/ copy_my_folder
$ ls -F
copy_my_folder/  exercise-data/  my_folder/  north-pacific-gyre/

$ ls -F my_folder/
copy_my_file*  my_file*

$ ls -F copy_my_folder/
copy_my_file*  my_file*

To remove an unwanted file, use rm <name of file to remove>.

$ rm copy_my_file
$ ls -F 
my_file*

Removing folders

As in the "Copying folders" note, you have to use the -r flag to remove a folder: rm -r <name of folder you want to remove> (the following example is from the shell-lesson-data/ directory).

$ rm -r copy_my_folder/
$ ls -F
exercise-data/  my_folder/  north-pacific-gyre/
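The whole sequence from this section can be rehearsed safely in a throwaway directory. The sketch below repeats the same commands with made-up file names, so nothing in shell-lesson-data is touched.

```shell
work=$(mktemp -d)
cd "$work"

mkdir my_folder                     # make a directory
touch new_file.txt                  # create an empty file
mv new_file.txt my_folder/          # move the file into the directory

cd my_folder
mv new_file.txt my_file.txt         # rename the file
cp my_file.txt copy_my_file.txt     # copy the file
cd ..

cp -r my_folder copy_my_folder      # copy the directory and its contents
rm my_folder/copy_my_file.txt       # remove a single file
rm -r copy_my_folder                # remove a directory and its contents

ls -F my_folder                     # only my_file.txt remains
```

Note that rm deletes immediately and there is no recycle bin in the shell, so it is worth double-checking the path before running rm -r.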

Shell Script¶

Here we are going to show an example command line automation using a shell script. This is what makes the command line powerful!

Shell Script

A shell script is a file with the extension '.sh'. It is essentially a text file that lists out multiple shell commands. When the shell script is run, the computer will run all of the commands in sequence in an automated way.
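As a minimal illustration of that idea, the sketch below writes a two-command script into a temporary directory and runs it with bash. The file name hello.sh and its contents are invented for this example.

```shell
demo=$(mktemp -d)

# Write a tiny script; the quoted 'EOF' keeps the text exactly as typed.
cat > "$demo/hello.sh" <<'EOF'
#!/bin/bash
echo "Hello from a shell script"
echo "Today is $(date +"%Y-%m-%d")"
EOF

# Run every command in the file, in order, by handing the file to bash.
bash "$demo/hello.sh"
```

Passing the file to bash works without any extra setup. Making a script executable so it can be run as ./script.sh requires one more step, chmod +x.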

Navigate to the shell-lesson-data directory

$ cd ~/shell-lesson-data

Create the shell script

$ nano backup.sh

The text editor Nano will pop up and it will be empty.

Copy and paste the following commands into backup.sh

#!/bin/bash
#the line above must be the first line of the file; it tells the system to run the following commands with the Bash shell

## Variables
#the directory you want to back up (e.g., shell-lesson-data in your home directory)
SOURCE_DIR="$HOME/shell-lesson-data"

#location where the backup will be stored
BACKUP_DIR="$HOME/Backup"

#used to create a unique name for each backup based on the current date and time
TIMESTAMP=$(date +"%Y-%m-%d_%H-%M-%S")

# name of the compressed backup file
ARCHIVE_NAME="backup_$TIMESTAMP.tar.gz"

# Create backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"

# Create a compressed archive of the source directory
tar -czf "$BACKUP_DIR/$ARCHIVE_NAME" -C "$SOURCE_DIR" .

# Output the result
echo "Backup of $SOURCE_DIR completed!"
echo "Archive created at $BACKUP_DIR/$ARCHIVE_NAME"

Save and exit nano with ctrl + x, then press y and Enter to confirm the file name

Modify permission to make the shell script executable

$ chmod +x backup.sh

Run the shell script

$ ./backup.sh

Go back to your home directory and look for the new backup directory

$ cd ~
$ ls

There should be a new directory called 'Backup' with a compressed file within it.
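A backup is only useful if it can be read back. The sketch below shows how to list and restore a .tar.gz archive. It builds its own small archive in temporary directories rather than using the Backup folder; the directory and file names are placeholders for illustration.

```shell
# Scratch directories standing in for the source, backup, and restore locations.
src=$(mktemp -d)
backup=$(mktemp -d)
restore=$(mktemp -d)
echo "sample" > "$src/notes.txt"

# Create a compressed archive of everything inside the source directory.
tar -czf "$backup/backup_test.tar.gz" -C "$src" .

# List the archive contents without extracting anything.
tar -tzf "$backup/backup_test.tar.gz"

# Extract the archive into the restore directory and check the file.
tar -xzf "$backup/backup_test.tar.gz" -C "$restore"
cat "$restore/notes.txt"
```

The flags mirror those in backup.sh: c creates, t lists, x extracts, z applies gzip compression, and f names the archive file.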
