Parallel Computing Architecture 048874

also known as Manycores for Machine Learning

Fall 2017/2018

Last updated: 14 January 2018

Time and Place

Sunday 17:00-19:00 in EE-961


Prof. Ran Ginosar

Course Plan

·         The PLURAL manycore architecture (and other architectures), including programming models, algorithms and performance evaluation

·         Machine learning on manycore (and on other architectures)

·         Parallel computing: Many manycores for Big Data Machine Learning


1.      What is parallel computing? A short summary / intro is available here.

2.      The semi-annual beauty contest of supercomputers takes place at the Top500 list.

3.      Wikipedia (kind-of) describes Amdahl's Law, Gustafson's Law, Karp-Flatt Metric, Speedup, Parallel Computing, and various related topics linked from these pages

4.      Google-Tech-Talk by Mark Hill (U. Wisconsin) on Amdahl's Law for Multicores; also visit his Amdahl's page. He starts in the right direction but then veers off, and we will take issue with his vision.

5.      Google-Tech-Talk by Jack Dongarra (UTK) on HPC (high performance computing). He has been a key leader in developing linear algebra libraries for parallel computing. The lecture provides a good background on supercomputers and then suggests one of the best ways for fighting Amdahl's Law on them.

6.      Google-Tech-Talk by Dave Patterson (UC Berkeley) on The View from Berkeley; also visit their web page. It attempts to review the entire field, but Dave offers a gloomy outlook, which we should take as a challenge.

7.      Slides for Introduction to Concurrency in Programming Languages by Sottile, Mattson and Rasmussen (slides under “course material”). The presentation is useful from an architect's point of view.

8.      Slides of a tutorial following Structured Parallel Programming by McCool, Robison and Reinders. There's lots of structure there!

9.      Implementer’s viewpoint of Deep Learning, Neural Networks and Machine Learning:
A short summary for this course is here. We will follow this summary in class and in homework.

10.  We plan to use the MNIST database of handwritten digits as an example for playing with Machine Learning. It has been widely used as a benchmark, although most studies investigate recognition error rates and ignore runtime performance and computational complexity.

Class Schedule

Lecture 1, 25 Oct 2017: Intro, Motivation, PLURAL architecture (LLNL web, Top500 web)

Lecture 2, 29 Oct 2017

5 Nov 2017: No class

Lecture 3, 12 Nov 2017: NN, Motivation and Background (Intro to NN, LLNL web)

Lecture 4, 19 Nov 2017: Plural architecture

Lecture 5, 26 Nov 2017: Plural architecture, MTE simulation, PRAM

Lecture 6, 3 Dec 2017: Other Manycores

Lecture 7, 10 Dec 2017. HW1 due

17 Dec 2017: No class

Lecture 8, 24 Dec 2017: Xeon Phi, Linpack, HPC Challenge Web

Lecture 9, 31 Dec 2017: Roofline Model, Roofline Xeon Phi KNL, Roofline Compressed SpMV. HW2 due

Lecture 10, 7 Jan 2018: GPU slides and John Nickolls lecture

14 Jan 2018: No class

Lecture 11, 21 Jan 2018: F&O and XMT, ICE, ICE-poster

30 Jan 2018: HW3 due


Homework Assignments

1.     Sequential inference on FCNN: MNIST inference with a 2-layer, 100-hidden-unit NN as a sequential C program (doc, parameters, labels, parameter reader, data reader) [due 10-Dec-2017]

2.     Manycore inference on FCNN using MTE (doc, Firas script, Roye script)

3.     Manycore inference on CNN using MTE (doc, parameter reader, parameters)


      Limited shared-memory version of HW3