In the summer of 2016, Ivo Filot went to Penn State University (USA) to stay for two months in Adri van Duin’s research group. The goal of his stay was to develop a new algorithm for fitting ReaxFF potentials. In this blog, kept especially for MCEC, Ivo tells you all about his research, his professional as well as his personal findings, and the perks and peculiarities of working halfway across the world.
Week 2: Parallelization
(Monday July 11th – Sunday July 17th)
Great to see that you are here for my next blog. Like I said, I typically start every working week by setting a few goals for myself. Last week, my goal was to get acquainted with my new environment, but I ended up solving a functionality problem as well. For this week, one of my targets is to parallelize my program. Now what exactly do I mean by that?
Parallelization is the process of modifying your existing code so that its work can be divided into smaller tasks, which can then run at the same time.
Imagine you want to build a house that consists of only four walls (for the purpose of this analogy, it’s a very simple house). You could hire a single construction worker who builds all four walls. He will do a fine job, for sure. But you could also hire four construction workers who each build one of the four walls. Assuming these workers cooperate nicely, your house will be ready much sooner with the latter option.
The same goes for code. In the code, you write what kind of task needs to be done and you let your processors (or processor cores), a.k.a. your construction workers, do the job.
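The blog does not show the actual fitting code, but the idea can be sketched in a few lines of Python (a hypothetical stand-in, not the real ReaxFF program): four independent “walls” are handed to a pool of four workers, who build them at the same time. A thread pool is used here purely for simplicity; the worker function just sleeps to mimic real effort.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def build_wall(wall_id: int) -> str:
    """Stand-in for one independent chunk of work (one wall)."""
    time.sleep(0.1)  # pretend this takes real effort
    return f"wall {wall_id} finished"

def build_house_in_parallel(n_workers: int = 4) -> list:
    # Hand the four independent tasks to a pool of workers: the
    # software equivalent of hiring four construction workers.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(build_wall, range(4)))

start = time.perf_counter()
reports = build_house_in_parallel()
elapsed = time.perf_counter() - start
print(reports)
print(f"built 4 walls in {elapsed:.2f} s")  # roughly 0.1 s, not 0.4 s
```

Because the four tasks run concurrently, the whole “house” takes about as long as one wall, not four.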
Now you might think: if all you need to do is use more cores to get the job done faster, why aren’t all programs parallelized? The reason is that not every job can be parallelized. Some tasks are what we call, in technical terms, inherently serial, while others can indeed be parallelized.
When you’re building a four-walled house, you first have to mix the mortar before you can start building the brick wall. You cannot simultaneously mix the mortar and lay the bricks, or prepare the mixture while in the meantime already building the wall, or mix the sand and cement by just piling bricks onto it. It just would not work. Thus, we say that the tasks of building a wall have to be executed in serial.
To parallelize a program, the programmer asks the following question: “Which parts of the program depend on other parts, and which do not?” The reason that tasks are serial is that they have a dependency: something else has to be finished before they can start. In the end, I actually found several tasks in my program that are independent and could be parallelized. So this brings us to the next question: “Which tasks do you choose?”
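The dependency question can be made concrete with a toy Python sketch (again hypothetical, not the fitting code itself): squaring each number in a list is independent work, while a running total is a dependency chain in which every step needs the result of the step before it.

```python
def independent_tasks(values):
    # Each result depends only on its own input value, so the
    # iterations could be handed out to different workers.
    return [v * v for v in values]

def dependent_tasks(values):
    # Each step needs the previous result first: inherently serial.
    totals = []
    running = 0
    for v in values:
        running += v
        totals.append(running)
    return totals

print(independent_tasks([1, 2, 3, 4]))  # [1, 4, 9, 16]
print(dependent_tasks([1, 2, 3, 4]))    # [1, 3, 6, 10]
```

No matter how many workers you hire, the running total cannot be finished faster by splitting it up, because each entry waits for the previous one.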
The answer is to pick the tasks that lie highest in the hierarchy. That sounds difficult, but bear with me. Let us go back to the example of building a house with only four walls and four construction workers. You could parallelize this task by letting each construction worker build his own wall. Alternatively, you could let each construction worker add a single stone to each line of stones in the same wall, because adding a single stone is a more-or-less independent task. (And to avoid any further problems with the laying of the cement, let us from now on assume the house is made of LEGO.)
As laying individual stones is actually part of building a wall, the latter process (each worker contributes stones to the same wall) lies hierarchically lower than the former (each worker builds his own wall). You might argue that in principle, it should not matter! All tasks cost exactly the same amount of time, whether you build a house wall-by-wall or stone-by-stone.
And you would be right, of course, if that were the full story. Processor cores, just like construction workers, need to communicate with each other. Every time a task is executed, a bit of communication has to occur so that all processors, or construction workers, know what the others are doing. For hierarchically lower-lying tasks, typically more communication is needed: imagine having to communicate about the laying of every single brick versus communicating about the building of one of the four walls. You can imagine how the former slows down the overall process, right? This is the reason why you want to parallelize the hierarchically highest tasks.
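This trade-off in granularity can be illustrated with another hypothetical Python sketch. The same 16 LEGO bricks are laid either brick-by-brick (one task per brick) or wall-by-wall (one task per chunk of four bricks); the number of tasks submitted to the pool is a rough proxy for communication overhead, since every task means a round of coordination.

```python
from concurrent.futures import ThreadPoolExecutor

def lay_brick(i: int) -> int:
    return i * i  # stand-in for placing one LEGO brick

def build(bricks, chunk_size):
    """Split the bricks into chunks; each chunk is one task for a worker.

    Fewer, larger chunks mean fewer tasks, and therefore less
    coordination between the workers and the pool.
    """
    chunks = [bricks[i:i + chunk_size]
              for i in range(0, len(bricks), chunk_size)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(lambda chunk: [lay_brick(b) for b in chunk], chunks)
    flat = [r for chunk in results for r in chunk]
    return flat, len(chunks)

bricks = list(range(16))
fine, fine_tasks = build(bricks, chunk_size=1)      # brick-by-brick
coarse, coarse_tasks = build(bricks, chunk_size=4)  # wall-by-wall
print(fine == coarse)            # True: same house either way
print(fine_tasks, coarse_tasks)  # 16 tasks vs 4 tasks
```

Both versions produce exactly the same result, but the coarse-grained version needs a quarter of the coordination, which is why parallelizing at the wall level beats parallelizing at the brick level.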
In the end, I was able to parallelize my program, and it can now run efficiently on multiple cores. This is good news, as we can now make efficient use of supercomputers, which require our program to be parallelized. Want to know more about supercomputers? Read all about them in next week’s entry, when I will be using them.