Mastering Concurrency in Java: Part 1 – Getting Started
Introduction
Concurrent programming allows a program to handle multiple tasks at the same time. These tasks can either take turns on the CPU (interleaving) or run truly in parallel when multiple CPU cores are available. For example, consider a music streaming application like Apple Music or Amazon Music. Such an application handles several tasks concurrently: fetching a song from the network, playing it, and displaying the lyrics, all at the same time. In this post, we will go through the basics of concurrent programming in general.
Concurrency in an application can be achieved in two primary ways:
- Multi-Processing
- Multi-Threading
What Are Processes and Threads?
Process
A process is an independent program in execution. Each process operates in its own memory space, which is isolated from other processes. This means every process is allocated a specific amount of memory and resources during execution.
Thread
A thread is a smaller unit of execution within a process. A process can have multiple threads and always has at least one thread, typically called the Main Thread. All threads within a process share the same memory and resources, such as open files, network sockets, and global variables.
Threads vs. Processes
Since threads share the memory and resources of their parent process, the creation and management of threads involve less overhead compared to processes. This lightweight nature of threads makes them more efficient for tasks requiring frequent context switching or communication.
On the other hand, processes are fully isolated from one another. While this isolation enhances security and stability, it also makes communication between processes more complex. Processes must use Inter-Process Communication (IPC) mechanisms, such as pipes, sockets, or shared files, to exchange data. These mechanisms enable communication between processes on the same machine or even across different machines.
Thread Communication
Threads within a process can communicate easily since they share a common memory space. However, this shared memory can lead to synchronisation issues and potential race conditions. Developers must handle these problems carefully using synchronisation tools like mutexes, semaphores, or locks to ensure thread-safe operations.
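As a small taste of what synchronisation looks like in practice, here is a minimal sketch with a hypothetical Counter class. Two threads increment the same shared field; marking the methods synchronised makes each increment atomic, so no updates are lost.

```java
// Hypothetical example: two threads sharing one counter.
public class Counter {
    private int count = 0;

    // Without 'synchronized', count++ is a read-modify-write that can
    // interleave between threads and lose updates.
    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        Counter counter = new Counter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter.increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join(); // wait for both threads to finish
        t2.join();
        System.out.println(counter.get()); // prints 200000
    }
}
```

If you remove the synchronised keyword and rerun, the final count will often be less than 200000, which is exactly the kind of race condition the next post digs into.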
Let's discuss multithreading and potential issues like deadlocks, livelocks, and more in the next blog: "Mastering Concurrency in Java - Part 2: Low-Level Concurrency."
In contrast, since processes do not share memory, their communication requires explicit mechanisms like IPC. Although this adds complexity, it prevents the issues caused by shared memory in threads, such as data corruption or inconsistent states.
The Importance of Concurrent Programming
Let’s say you are building an application that allows users to search for a word across multiple files and display its occurrences, similar to the recursive grep command in Unix-based operating systems.
Sequential Approach
In a sequential implementation:
- The application reads each file one by one.
- It searches for the word in the current file.
- Once it finishes processing the current file, it moves on to the next file.
- The process continues until all files are processed, and the results are finally displayed.
While this approach is straightforward, it has significant drawbacks:
- If the files are large or there are many files, the application can take a long time to finish.
- The CPU might remain underutilized because it processes files sequentially, waiting for I/O operations (like reading files from disk) to complete before moving to the next task.
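The sequential steps above can be sketched as follows. The file contents and the search word here are illustrative assumptions; the sample files are created inside the program so the sketch is self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sequential sketch: process one file at a time, then report the total.
public class SequentialSearch {

    static long countOccurrences(Path file, String word) throws IOException {
        long count = 0;
        for (String line : Files.readAllLines(file)) {
            int idx = 0;
            while ((idx = line.indexOf(word, idx)) != -1) {
                count++;
                idx += word.length();
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Create small sample files so the example runs anywhere.
        Path a = Files.createTempFile("a", ".txt");
        Path b = Files.createTempFile("b", ".txt");
        Files.writeString(a, "java is fun\njava java\n");
        Files.writeString(b, "no match here\njava again\n");

        long total = 0;
        for (Path file : List.of(a, b)) { // one file at a time
            total += countOccurrences(file, "java");
        }
        System.out.println("Total occurrences: " + total); // prints 4
    }
}
```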
Concurrent Approach
In a concurrent implementation, the process is broken down into smaller, parallel tasks. For example:
- Divide and Conquer: Split the list of files into smaller groups and assign each group to a separate thread or process.
- Parallel Search: Each thread/process searches for the word in its assigned files simultaneously.
- Aggregate Results: Once all threads/processes complete, their results are combined and displayed to the user.
This approach has significant advantages:
- Faster Execution: By processing multiple files simultaneously, the application can leverage multi-core CPUs, reducing the overall execution time.
- Efficient Resource Utilization: While one thread waits for a file to load, another thread can be actively searching in a file that’s already loaded. This keeps the CPU busy.
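One possible sketch of this divide-and-conquer search uses plain threads, with an AtomicLong (an atomic variable from java.util.concurrent.atomic, which we will meet again in Part 3) to aggregate results safely. The file contents and thread count are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Concurrent sketch: split the files across threads, search in parallel,
// aggregate the counts once every thread has finished.
public class ConcurrentSearch {

    static long countOccurrences(Path file, String word) throws IOException {
        long count = 0;
        for (String line : Files.readAllLines(file)) {
            int idx = 0;
            while ((idx = line.indexOf(word, idx)) != -1) {
                count++;
                idx += word.length();
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Self-contained sample input: four small files.
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Path p = Files.createTempFile("part" + i, ".txt");
            Files.writeString(p, "java here\njava there\n");
            files.add(p);
        }
        String word = "java";
        int threadCount = 2;
        AtomicLong total = new AtomicLong();

        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            final int offset = t;
            Thread worker = new Thread(() -> {
                // Divide: each thread takes every threadCount-th file.
                for (int i = offset; i < files.size(); i += threadCount) {
                    try {
                        total.addAndGet(countOccurrences(files.get(i), word));
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            });
            threads.add(worker);
            worker.start(); // parallel search
        }
        for (Thread worker : threads) {
            worker.join(); // aggregate only after all threads complete
        }
        System.out.println("Total occurrences: " + total.get()); // prints 8
    }
}
```

A production version would use a thread pool rather than raw threads; that belongs to the high-level API covered in Part 3.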
Relation Between Multi-Processing, Multi-Threading, and CPU Cores
The way multi-processing and multi-threading perform depends on the number of CPU cores in a system. Here’s a breakdown of how each works:
Multi-Processing
- What it is: Multi-processing means running multiple processes at the same time. Each process is separate and can run independently.
- How it works with multiple cores:
- If a system has N cores, it can run up to N processes truly simultaneously, with each process using a different core.
- This helps make full use of all cores, especially for tasks that require a lot of CPU power (CPU-bound tasks).
- On a single core: Processes can't run at the same time. The CPU switches between processes, making them appear to run simultaneously, but only one process runs at a time.
Multi-Threading
- What it is: Multi-threading involves multiple threads running within the same process. Threads share the same memory and resources, so they can communicate easily with each other.
- How it works with multiple cores: Threads can run on different cores, allowing them to truly run in parallel, which is great for CPU-heavy tasks.
- On a single core: Threads take turns running because the core can only execute one thread at a time. This doesn’t provide true parallelism, but it still allows for concurrent execution (making progress on multiple tasks by interleaving them, rather than running them at the exact same instant).
- I/O-bound tasks: Multi-threading is especially useful for tasks like reading files or waiting for data because threads can keep working on one task while waiting for another to finish.
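How many threads can truly run in parallel is bounded by the core count, which the JVM can report at runtime:

```java
// Query how many cores the JVM can use. The result depends on the machine
// (and on container CPU limits), so it is at least 1 but otherwise varies.
public class Cores {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("This JVM can use " + cores + " core(s)");
    }
}
```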
Concurrency in Java
Java has supported concurrency since its initial release (Java 1.0). The Thread class (java.lang.Thread) provides a low-level API for creating and managing threads, enabling basic multi-threading capabilities. However, low-level threading APIs can be error-prone and challenging to manage.
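Here is a minimal example of the Thread class in action: one extra thread runs alongside the main thread, and join() makes the main thread wait for it to finish.

```java
// Minimal java.lang.Thread usage: start a thread, then wait for it.
public class HelloThread {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            System.out.println("Hello from " + Thread.currentThread().getName());
        });
        worker.start(); // schedules the new thread to run
        worker.join();  // main thread blocks until worker finishes
        System.out.println("Back on " + Thread.currentThread().getName());
    }
}
```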
To simplify concurrent programming, Java introduced high-level concurrency APIs in Java 5 through the java.util.concurrent package. These APIs streamline the development of robust and scalable concurrent applications.
Here is what the next posts will cover:
Part 2: Low-Level Concurrency
In the next post, we will explore low-level concurrency concepts in detail. Topics include:
- Basics of threads and thread lifecycle
- Issues like memory inconsistency, contention, deadlocks, livelocks, and starvation
- Techniques to avoid these issues, such as:
- Synchronised methods
- Synchronised statements (blocks)
- Atomic access
- Intrinsic locks
Part 3: High-Level Concurrency API
This part will introduce advanced concurrency tools provided in java.util.concurrent, including:
- Executors and thread pools
- Synchronisers like CountDownLatch, CyclicBarrier, and Semaphore
- Concurrent collections (ConcurrentHashMap, CopyOnWriteArrayList, etc.)
- Atomic variables (AtomicInteger, AtomicLong, etc.)
- Fork/Join framework
Each post will include examples to illustrate these concepts, helping you understand both the potential challenges and the tools to overcome them. We will discuss best practices to write clean, efficient, and thread-safe code.
Feel free to share your thoughts or ask questions in the comments below. Stay tuned for these in-depth guides, with the next part coming soon!