Mastering Concurrency in Java: Part 1 – Getting Started
Introduction
Concurrent programming allows a program to handle multiple tasks at the same time. These tasks can either take turns on the CPU (interleaving) or run truly in parallel when multiple CPU cores are available. For example, consider a music streaming application like Apple Music or Amazon Music. Such an application handles several tasks concurrently: fetching a song from the network, playing it, and displaying the lyrics, all at the same time. In this post, we will go through the basics of concurrent programming in general.
Concurrency in an application can be achieved in two primary ways:
- Multi-Processing
- Multi-Threading
What Are Processes and Threads?
Process
A process is an independent program in execution. Each process operates in its own memory space, which is isolated from other processes. This means every process is allocated a specific amount of memory and resources during execution.
Thread
A thread is a smaller unit of execution within a process. A process can have multiple threads and always has at least one thread, typically called the Main Thread. All threads within a process share the same memory and resources, such as open files, network sockets, and global variables.
Threads vs. Processes
Since threads share the memory and resources of their parent process, the creation and management of threads involve less overhead compared to processes. This lightweight nature of threads makes them more efficient for tasks requiring frequent context switching or communication.
On the other hand, processes are fully isolated from one another. While this isolation enhances security and stability, it also makes communication between processes more complex. Processes must use Inter-Process Communication (IPC) mechanisms, such as pipes, sockets, or shared files, to exchange data. These mechanisms enable communication between processes on the same machine or even across different machines.
Thread Communication
Threads within a process can communicate easily since they share a common memory space. However, this shared memory can lead to synchronisation issues and potential race conditions. Developers must handle these problems carefully using synchronisation tools like mutexes, semaphores, or locks to ensure thread-safe operations.
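As a small taste of what synchronisation looks like in practice, here is a minimal sketch with a hypothetical Counter class. Two threads increment the same shared field; marking the methods synchronised makes each increment atomic, so no updates are lost.

```java
// Hypothetical example: two threads sharing one counter.
public class Counter {
    private int count = 0;

    // Without 'synchronized', count++ is a read-modify-write that can
    // interleave between threads and lose updates.
    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        Counter counter = new Counter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter.increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join(); // wait for both threads to finish
        t2.join();
        System.out.println(counter.get()); // prints 200000
    }
}
```

If you remove the synchronised keyword and rerun, the final count will often be less than 200000, which is exactly the kind of race condition the next post digs into.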
Let's discuss multithreading and potential issues like deadlocks, livelocks, and more in the next blog: "Mastering Concurrency in Java - Part 2: Low-Level Concurrency."
In contrast, since processes do not share memory, their communication requires explicit mechanisms like IPC. Although this adds complexity, it prevents the issues caused by shared memory in threads, such as data corruption or inconsistent states.
The Importance of Concurrent Programming
Let’s say you are building an application that allows users to search for a word across multiple files and display its occurrences, similar to the recursive grep command in Unix-based operating systems.
Sequential Approach
In a sequential implementation:
- The application reads each file one by one.
- It searches for the word in the current file.
- Once it finishes processing the current file, it moves on to the next file.
- The process continues until all files are processed, and the results are finally displayed.
While this approach is straightforward, it has significant drawbacks:
- If the files are large or there are many files, the application can take a long time to finish.
- The CPU might remain underutilized because it processes files sequentially, waiting for I/O operations (like reading files from disk) to complete before moving to the next task.
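The sequential steps above can be sketched as follows. The file contents and the search word here are illustrative assumptions; the sample files are created inside the program so the sketch is self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sequential sketch: process one file at a time, then report the total.
public class SequentialSearch {

    static long countOccurrences(Path file, String word) throws IOException {
        long count = 0;
        for (String line : Files.readAllLines(file)) {
            int idx = 0;
            while ((idx = line.indexOf(word, idx)) != -1) {
                count++;
                idx += word.length();
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Create small sample files so the example runs anywhere.
        Path a = Files.createTempFile("a", ".txt");
        Path b = Files.createTempFile("b", ".txt");
        Files.writeString(a, "java is fun\njava java\n");
        Files.writeString(b, "no match here\njava again\n");

        long total = 0;
        for (Path file : List.of(a, b)) { // one file at a time
            total += countOccurrences(file, "java");
        }
        System.out.println("Total occurrences: " + total); // prints 4
    }
}
```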
Concurrent Approach
In a concurrent implementation, the process is broken down into smaller, parallel tasks. For example:
- Divide and Conquer: Split the list of files into smaller groups and assign each group to a separate thread or process.
- Parallel Search: Each thread/process searches for the word in its assigned files simultaneously.
- Aggregate Results: Once all threads/processes complete, their results are combined and displayed to the user.
This approach has significant advantages:
- Faster Execution: By processing multiple files simultaneously, the application can leverage multi-core CPUs, reducing the overall execution time.
- Efficient Resource Utilization: While one thread waits for a file to load, another thread can be actively searching in a file that’s already loaded. This keeps the CPU busy.
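One possible sketch of this divide-and-conquer search uses plain threads, with an AtomicLong (an atomic variable from java.util.concurrent.atomic, which we will meet again in Part 3) to aggregate results safely. The file contents and thread count are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Concurrent sketch: split the files across threads, search in parallel,
// aggregate the counts once every thread has finished.
public class ConcurrentSearch {

    static long countOccurrences(Path file, String word) throws IOException {
        long count = 0;
        for (String line : Files.readAllLines(file)) {
            int idx = 0;
            while ((idx = line.indexOf(word, idx)) != -1) {
                count++;
                idx += word.length();
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Self-contained sample input: four small files.
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Path p = Files.createTempFile("part" + i, ".txt");
            Files.writeString(p, "java here\njava there\n");
            files.add(p);
        }
        String word = "java";
        int threadCount = 2;
        AtomicLong total = new AtomicLong();

        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            final int offset = t;
            Thread worker = new Thread(() -> {
                // Divide: each thread takes every threadCount-th file.
                for (int i = offset; i < files.size(); i += threadCount) {
                    try {
                        total.addAndGet(countOccurrences(files.get(i), word));
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            });
            threads.add(worker);
            worker.start(); // parallel search
        }
        for (Thread worker : threads) {
            worker.join(); // aggregate only after all threads complete
        }
        System.out.println("Total occurrences: " + total.get()); // prints 8
    }
}
```

A production version would use a thread pool rather than raw threads; that belongs to the high-level API covered in Part 3.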
Relation Between Multi-Processing, Multi-Threading, and CPU Cores
The way multi-processing and multi-threading perform depends on the number of CPU cores in a system. Here’s a breakdown of how each works:
Multi-Processing
- What it is: Multi-processing means running multiple processes at the same time. Each process is separate and can run independently.
- How it works with multiple cores:
- If a system has N cores, it can run up to N processes truly simultaneously, with each process using a different core.
- This helps make full use of all cores, especially for tasks that require a lot of CPU power (CPU-bound tasks).
- On a single core: Processes can't run at the same time. The CPU switches between processes, making them appear to run simultaneously, but only one process runs at a time.
Multi-Threading
- What it is: Multi-threading involves multiple threads running within the same process. Threads share the same memory and resources, so they can communicate easily with each other.
- How it works with multiple cores: Threads can run on different cores, allowing them to truly run in parallel, which is great for CPU-heavy tasks.
- On a single core: Threads take turns running because the core can only execute one thread at a time. This doesn’t provide true parallelism, but it still allows for concurrent execution (making progress on multiple tasks by interleaving them, rather than running them at the exact same instant).
- I/O-bound tasks: Multi-threading is especially useful for tasks like reading files or waiting for data because threads can keep working on one task while waiting for another to finish.
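How many threads can truly run in parallel is bounded by the core count, which the JVM can report at runtime:

```java
// Query how many cores the JVM can use. The result depends on the machine
// (and on container CPU limits), so it is at least 1 but otherwise varies.
public class Cores {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("This JVM can use " + cores + " core(s)");
    }
}
```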
Concurrency in Java
Java has supported concurrency since its initial release (Java 1.0). The Thread class (java.lang.Thread) provides a low-level API for creating and managing threads, enabling basic multi-threading capabilities. However, low-level threading APIs can be error-prone and challenging to manage.
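Here is a minimal example of the Thread class in action: one extra thread runs alongside the main thread, and join() makes the main thread wait for it to finish.

```java
// Minimal java.lang.Thread usage: start a thread, then wait for it.
public class HelloThread {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            System.out.println("Hello from " + Thread.currentThread().getName());
        });
        worker.start(); // schedules the new thread to run
        worker.join();  // main thread blocks until worker finishes
        System.out.println("Back on " + Thread.currentThread().getName());
    }
}
```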
To simplify concurrent programming, Java introduced high-level concurrency APIs in Java 5 through the java.util.concurrent package. These APIs streamline the development of robust and scalable concurrent applications.
Here is what the next posts will cover:
Part 2: Low-Level Concurrency
In the next post, we will explore low-level concurrency concepts in detail. Topics include:
- Basics of threads and thread lifecycle
- Issues like memory inconsistency, contention, deadlocks, livelocks, and starvation
- Techniques to avoid these issues, such as:
- Synchronised methods
- Synchronised statements (blocks)
- Atomic access
- Intrinsic locks
Part 3: High-Level Concurrency API
This part will introduce advanced concurrency tools provided in java.util.concurrent, including:
- Executors and thread pools
- Synchronisers like CountDownLatch, CyclicBarrier, and Semaphore
- Concurrent collections (ConcurrentHashMap, CopyOnWriteArrayList, etc.)
- Atomic variables (AtomicInteger, AtomicLong, etc.)
- Fork/Join framework
Each post will include examples to illustrate these concepts, helping you understand both the potential challenges and the tools to overcome them. We will discuss best practices to write clean, efficient, and thread-safe code.
Feel free to share your thoughts or ask questions in the comments below. Stay tuned for these in-depth guides, with the next part coming soon!