Parallel R Workflows: Welcome!


Introduction

Dr Anna Krystalli

R-RSE


https://r-rse-parallel-r.netlify.app

👋 Hello

me: Dr Anna Krystalli

  • Research Software Engineering Consultant, r-rse

    • mastodon annakrystalli@fosstodon.org
    • twitter @annakrystalli
    • github @annakrystalli
    • email r.rse.eu[at]gmail.com
  • Editor rOpenSci

  • Founder & Core Team member ReproHack

Objectives

In this course we’ll:

  • Understand general concepts and strategies for parallelisation.

  • Explore the R packages available for parallelising computation in R.

  • Focus on the futureverse collection of packages.

  • Deploy parallel workflows on the Iridis cluster.

Background

Computation


Transistor icons created by surang - Flaticon

Computers represent information in binary code, as digital 1s and 0s held in the central processing unit (CPU) and RAM. These digits are electrical signals that are either on or off.

Each transistor is a switch: 0 when turned off and 1 when turned on. The more transistors, the more switches.

Transistors are the basic building blocks that regulate the operation of computers, mobile phones, and all other modern electronic circuits, and they are the basic unit of the CPU.

Moore’s law

At a constant price, the number of components that can fit on an integrated circuit doubles every 18-24 months, and performance doubles with it. In other words, the computing performance that a dollar can buy more than doubles every 18-24 months.
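To get a feel for what a fixed doubling period implies, here is a minimal sketch of the arithmetic in R. The helper `growth_factor()` is a hypothetical name introduced for illustration; it simply assumes smooth exponential growth at the stated doubling period.

```r
# Growth factor after `years`, assuming a quantity doubles
# every `doubling_months` months (hypothetical helper for illustration).
growth_factor <- function(years, doubling_months) {
  2^(years * 12 / doubling_months)
}

slow <- growth_factor(10, 24)  # doubling every 24 months for 10 years: 2^5 = 32
fast <- growth_factor(10, 18)  # doubling every 18 months for 10 years: 2^(20/3), about 101.6

print(slow)
print(fast)
```

Over a single decade, the difference between an 18-month and a 24-month doubling period is roughly a factor of three.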

Yet…

we’ve hit clock speed stagnation

50 Years of Processor Trends. Distributed by Karl Rupp under a CC-BY 4.0 License


About computer hardware

CPU (Processing)

RAM (memory)

Hard Disks, Networks (I/O)

CPU

  • The central processing unit (CPU), or the processor, is the brains of a computer. The CPU is responsible for performing numerical calculations.

  • The faster the processor, the faster R will run.

  • The clock speed (or clock rate, measured in hertz) is the frequency with which the CPU executes instructions. The faster the clock speed, the more instructions a CPU can execute in a second.
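Modern CPUs usually contain several cores, which is what parallel code can exploit. As a rough sketch, the `parallel` package that ships with R can report how many cores the current machine exposes. The variable names are illustrative, and the reported numbers depend on the platform.

```r
# `parallel` ships with base R, so no installation is needed.
library(parallel)

# Logical cores include hyper-threads where the CPU supports them.
logical_cores <- detectCores(logical = TRUE)

# Physical cores only. The `logical` argument is not honoured on every
# platform, and the result may be NA if the count cannot be determined.
physical_cores <- detectCores(logical = FALSE)

cat("Logical cores: ", logical_cores, "\n")
cat("Physical cores:", physical_cores, "\n")
```

On a shared machine or cluster node, the number reported here may be larger than the number of cores you are actually allowed to use.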

RAM

  • Random access memory (RAM) is a type of computer memory that can be accessed randomly: any byte of memory can be accessed without touching the preceding bytes.

  • The amount of RAM R has access to is incredibly important. Since R loads objects into RAM, the amount of RAM you have available can limit the size of data set you can analyse. An analysis limited in this way is MEMORY BOUND.

Even if the original data set is relatively small, your analysis can generate large objects.
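A small sketch of how to check this in practice, using `object.size()` from base R. The data here is simulated purely for illustration: a modest matrix of points produces a much larger object once all pairwise distances are computed.

```r
# A numeric (double) vector stores 8 bytes per element,
# so one million elements need about 8 MB.
x <- numeric(1e6)
print(object.size(x), units = "MB")

# A small input: 2,000 simulated points in two dimensions.
pts <- matrix(rnorm(2000 * 2), ncol = 2)

# The pairwise distances hold 2000 * 1999 / 2 doubles,
# far more than the 4,000 doubles in the input.
d <- dist(pts)

print(object.size(pts), units = "MB")
print(object.size(d), units = "MB")
```

The same effect matters for parallel work: if every worker holds its own copy of a large object, memory use can grow with the number of workers.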

About R

R is an interpreted language

Compiled Languages

Converted directly into machine code that the processor can execute.

  • Tend to be faster and more efficient to execute.

  • Need a “build” step that compiles the code for the system it will run on.

  • Examples: C, C++, Erlang, Haskell, Rust, and Go

Interpreted Languages

Code interpreted line by line during run time.

  • Tend to be significantly slower, although just-in-time compilation is closing that gap.

  • Tend to be much more compact and easier to write.

  • Examples: R, Ruby, Python, and JavaScript.

R performance

  • R offers some excellent features: dynamic typing, lazy functional evaluation and object-orientation

  • Side effect: operations are undertaken in single-threaded mode, i.e. sequentially

  • Many routines in R are written in compiled languages like C & Fortran.

  • R performance can be enhanced by linking to optimised Linear Algebra Libraries.

  • Many packages wrap more performant C, Fortran, C++ code.

  • R offers many ways to parallelise computations.

R as a language and environment is reasonably well established and understood. A combination of dynamic typing, lazy functional evaluation and object-orientation (in several flavors) makes for a somewhat unique combination.

One side-effect of this design is that core operations are undertaken in single-threaded mode, or, in other words, sequentially.
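Before reaching for parallelism, it is often worth checking whether the work can be handed to R's compiled routines. The sketch below compares an explicit loop with a vectorised call on simulated data. The function names are hypothetical, and timings will vary by machine and R version (R byte-compiles functions, which narrows the gap).

```r
# Simulated input for illustration.
x <- runif(1e6)

# 1. Explicit loop: each iteration is evaluated by the R interpreter.
loop_sum_sq <- function(v) {
  total <- 0
  for (value in v) {
    total <- total + value^2
  }
  total
}

# 2. Vectorised: `^` and sum() each run a single loop in compiled C code.
vec_sum_sq <- function(v) sum(v^2)

loop_time <- system.time(loop_result <- loop_sum_sq(x))
vec_time  <- system.time(vec_result  <- vec_sum_sq(x))

print(loop_time["elapsed"])
print(vec_time["elapsed"])

# Both approaches agree up to floating-point tolerance.
print(all.equal(loop_result, vec_result))
```

Both versions still run sequentially on a single core. Vectorising reduces interpreter overhead, while parallelising spreads the work across cores.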

About Parallel Computing

Types of parallel processing (problem types)

  • A single program, multiple data: all processors use the same program, though each has its own data. (Data parallelism)

  • Multiple programs, single data: each processor uses a different program on the same data. (Task parallelism)

  • Multiple programs, multiple data: each processor uses a different program and its own data. (Mixed Data and Task parallelism)
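As a minimal sketch of data parallelism, the example below applies one function to several chunks of data using base R's `parallel` package. The chunking scheme, the `chunk_sum()` helper and the choice of two workers are all assumptions made for illustration.

```r
library(parallel)

# Split the data into four chunks of 25 values each.
chunks <- split(1:100, rep(1:4, each = 25))

# The single "program" that every worker runs on its own chunk.
chunk_sum <- function(chunk) sum(chunk)

# Start two worker processes (an arbitrary number for this sketch).
cl <- makeCluster(2)

# Same function, different data on each call.
partial_sums <- parLapply(cl, chunks, chunk_sum)

stopCluster(cl)

# Combine the partial results in the main session.
total <- Reduce(`+`, partial_sums)
print(total)
```

Task parallelism would instead send different functions to the workers, for example fitting two different models to the same data set.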

Types of parallel programming (architecture types)

  • Embarrassingly parallel: the simplest type of parallelism to implement where each task in a job is completely independent of the others. There is no communication required between tasks, which is what makes it easy to implement.

  • Shared-memory parallelism: when tasks are run on separate CPU-cores of the same computer. In other words, a single program can access many cores on one machine. CPU-cores share memory because they are on the same computer and all have access to the same physical RAM.

  • Distributed-memory parallelism: running tasks as multiple processes that do not share the same space in memory. This is one of the more complicated types of parallelism, since it requires a high level of communication between different tasks to ensure that everything runs properly.
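The practical difference shows up in how data reaches the workers. In the sketch below, which assumes base R's `parallel` package and two local workers, each worker is a separate R process with its own memory, so anything it needs has to be sent to it explicitly. This is a small-scale stand-in for the distributed-memory case, not a cluster deployment. The variable `scale_factor` is hypothetical.

```r
library(parallel)

# Two workers, each a separate R process with its own memory.
cl <- makeCluster(2)

# Confirm the workers are distinct processes from the main session.
main_pid    <- Sys.getpid()
worker_pids <- unlist(clusterCall(cl, Sys.getpid))
print(main_pid)
print(worker_pids)

# A value defined in the main session (hypothetical, for illustration).
scale_factor <- 10

# Copy it to every worker; nothing is shared automatically.
clusterExport(cl, "scale_factor", envir = environment())

# Each task is independent of the others (embarrassingly parallel).
scaled <- parSapply(cl, 1:4, function(i) i * scale_factor)
print(scaled)

stopCluster(cl)
```

On Unix-like systems, `parallel::mclapply()` forks the current session instead, so workers start with access to the parent's objects without an explicit export step.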

About this course

  • I normally like to live code…BUT!

  • There’s a lot of material to get through, so I will be copying & pasting from the materials a lot

  • Have the materials handy to follow along

  • Please stop me for questions or to share your own experiences

  • Lunch around 1pm

Let’s go!
