Event Sourcing and CQRS

Reading Time: 5 minutes

Introduction

Event Sourcing and CQRS have been around for quite a long time, and the patterns caught my attention while I was reading blogs. I have gone through some materials and prepared a demo project to reinforce my understanding. I won't bore you with in-depth details, which can be found elsewhere, but I'll cover the fundamentals and show the key points so you can better grasp the topic and the concepts.

Event Sourcing

The fundamental idea in Event Sourcing is to preserve every change of state in a system as a Domain Event. Ultimately, event sourcing is only about insertion: each event is appended on top of the previous one, and modifications are not allowed. Each inserted event is atomic and immutable.

The idea is similar to a ledger in accounting: the accountant keeps inserting transactions but never modifies them.

Corrections are still possible, by inserting a new correction event on top of the previous events. This way modifications are expressed as new facts, and the data in the system stays consistent.

Additionally, the major benefit of the Event Sourcing pattern is the ability to replay the events, so that you can rebuild the history and see the transitions if you need to look for a specific piece of the past. On top of that, you can build snapshots as checkpoints so you don't have to traverse the whole history, which can grow quite large. Finally, I want to highlight a great example from Greg's presentation[2]. You will see how the Event Sourcing pattern can transform an architecture and preserve the history of actions.

With such a transformation, you end up with a series of historical facts that explain how the designated events came to occur.

In addition to Greg's illustration above, I'd like to quote one of his statements here:

State transitions are an important part of our problem space and should be modeled within our domain. Facts are about the past.
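To make the append-and-replay idea more concrete, here is a minimal in-memory sketch in Java. It only illustrates the principle; the class and event names are my own, and a real event store would persist the events durably.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AccountEventStore {

    // An immutable fact about the past; corrections are simply new events appended later.
    public static class AccountEvent {
        public final String type;   // e.g. "DEPOSITED" or "WITHDRAWN"
        public final long amount;

        public AccountEvent(String type, long amount) {
            this.type = type;
            this.amount = amount;
        }
    }

    private final List<AccountEvent> events = new ArrayList<>();

    // The only write operation: append on top of the previous events, never modify.
    public void append(AccountEvent event) {
        events.add(event);
    }

    // Replaying the full history rebuilds the current state; a snapshot would just be a
    // cached balance plus the index of the last event it covers, used as a checkpoint.
    public long replayBalance() {
        long balance = 0;
        for (AccountEvent event : events) {
            balance += "DEPOSITED".equals(event.type) ? event.amount : -event.amount;
        }
        return balance;
    }

    public List<AccountEvent> history() {
        return Collections.unmodifiableList(events);
    }
}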

CQRS

The abbreviation stands for Command Query Responsibility Segregation. The pattern suggests a strict separation of reads and writes, treating them as different kinds of commands. It makes the most sense when the ratio of one kind of operation is much higher than the other.

For example, if the ratio of reads is much higher than the ratio of writes in your system, you may consider this pattern. In data-intensive applications it increases performance; the disadvantage is that complexity increases as well.

Event Sourcing and CQRS patterns provide eventual consistency.

When business transactions span multiple microservices, ACID properties are not supported across such distributed systems. In remote communication, we accept that the changes performed on the other end of the system become eventually consistent. Coordination patterns such as Saga are outside the scope of Event Sourcing and CQRS themselves.

However, the Saga pattern is helpful when administering a business flow, and an Orchestrator Saga can provide rollback by executing compensation operations.

Furthermore, the implementation of the CQRS pattern can initially start in the application layer. You can separate your interfaces in the code as follows:

public interface UserReadRepository {
      public User findById(Long id);      
      public List<User> getAll();
}
public interface UserWriteRepository {
      public void deleteById(Long id);      
      public void create(User user);
}

This is a simple and straightforward head start if you are decomposing a monolith or want to split a brand-new data-intensive microservice into individual high-performing services. You'll see the logical separation of services in the upcoming demo.

Demo Project

In the demo project, I'll be simulating a series of financial operations in the "ASLAN Bank", which exposes APIs for making transactions and retrieving account details.

Let’s look at the illustration here to evaluate what we will achieve in this demo

Let me introduce the participating parties here:

  • Customer DDoS Service: For good measure, this service sends random withdrawal and deposit requests for the existing accounts to the Gateway. It bombards the backend with a high ratio of writes, so it is a good way of evaluating the system under heavy load,
  • ASLAN Bank Gateway Service: As the name suggests, this service is a gateway whose sole purpose is to forward requests:
    1. POST requests to the Account Service,
    2. GET requests to the Account History Service,
  • Account Service: The service handles only the write requests, received via POST as deposits or withdrawals that carry the account information and the designated amount (see the sketch after this list):
    1. The service has its own customer account table which holds only basic client information, but no history,
    2. This table also guards the consistency of the balance, which must not go below zero,
    3. Once the invariant checks pass, the account balance is updated in the database,
    4. The event is then sourced to the broker along with the current account balance, the designated amount and the transaction type,
  • Account History Service: The Account History Service is responsible for reads only:
    1. Via REST, the service returns the client's account history of transaction types and amounts along with the current account balance,
    2. The account information is updated asynchronously whenever an event arrives from the sourcing system,
  • Kafka Message Broker: Kafka is well suited to the role of the "single source of truth" in event sourcing terms. The participating services benefit from exchanging messages asynchronously and from keeping the history of account states in case of service failures.
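Here is a minimal, hypothetical sketch of how the Account Service might source such an event to Kafka with Spring's KafkaTemplate. The payload class, its fields and the topic name "account-events" are illustrative assumptions, not the demo's exact code.

import java.math.BigDecimal;
import org.springframework.kafka.core.KafkaTemplate;

public class AccountEventPublisher {

    // Hypothetical event payload carrying the data described above.
    public static class AccountEvent {
        public Long accountId;
        public BigDecimal amount;          // designated transaction amount
        public BigDecimal currentBalance;  // balance after the transaction
        public String transactionType;     // "DEPOSIT" or "WITHDRAW"

        public AccountEvent(Long accountId, BigDecimal amount, BigDecimal currentBalance, String transactionType) {
            this.accountId = accountId;
            this.amount = amount;
            this.currentBalance = currentBalance;
            this.transactionType = transactionType;
        }
    }

    private final KafkaTemplate<String, AccountEvent> kafkaTemplate;

    public AccountEventPublisher(KafkaTemplate<String, AccountEvent> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Called after the invariant checks pass and the balance is updated in the database.
    public void publish(AccountEvent event) {
        // keying by account id keeps all events of one account ordered on one partition
        kafkaTemplate.send("account-events", String.valueOf(event.accountId), event);
    }
}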

Prerequisites and Source Code

The project stack is built with Java and Spring, and you should have Java 8+ and Maven 3+ installed on your system. On top of that, Apache Kafka is used as the message broker, which also requires ZooKeeper. For simplicity, I'll be using dockerized Kafka and ZooKeeper, so you will also need Docker and Docker Compose installed on your system.

All of the home-brewed projects are hosted on my GitHub. Just clone the repository and switch to the folder "CQRSSample", where you will find all the necessary services.

Environment Setup, Running Projects and Resources

Kafka and Zookeeper

I stumbled upon this very useful GitHub repository, whose author has done a great job of making it possible to launch a Kafka broker with no special setup. The project seems to be well maintained and provides ways to run a single Kafka broker or multiple brokers. In addition, you can choose to run an older version as well.

All the steps are explained in the README, but I'll briefly summarize them for a single-node launch:

  1. Clone the repository somewhere convenient in your file system,
  2. Open the "docker-compose-single-broker.yml" configuration file with your text editor,
  3. Add your computer's IP address as the value of the "KAFKA_ADVERTISED_HOST_NAME" key,
  4. Open your terminal and switch to the cloned project folder,
  5. Issue the command to run everything:
sudo docker-compose -f docker-compose-single-broker.yml up

Running Projects

If you prefer the command line, open the projects "AslanBankGatewayService", "AccountService" and "AccountHistoryService" in separate terminals and run the command below for each:

mvn spring-boot:run

Alternatively, if you want to evaluate the services under heavy load, run the "CustomersDDoSService" after the services above have started. As soon as the DDoS service starts up, it will send requests to the Gateway service.

Resources

Here you can find the resources to send requests for

getting the account history:

curl http://localhost:8080/accounts/3/history -H "Content-Type: application/json"

making a transaction:

curl -X POST http://localhost:8080/transactions/ -H "Content-Type: application/json" -d "{\"accountId\":\"1\",\"amount\":\"45.00\",\"type\":\"DEPOSIT\"}"

References

  1. Greg Young — A Decade of DDD, CQRS, Event Sourcing
  2. Greg Young – CQRS and Event Sourcing – Code on the Beach 2014
  3. An Introduction to CQRS and Event Sourcing Patterns – Mathew McLoughlin
  4. Building microservices with event sourcing and CQRS
  5. What do you mean by “Event-Driven”? by Martin Fowler
  6. CQRS by Martin Fowler
  7. Focusing on Events by Martin Fowler
  8. Event Collaboration by Martin Fowler
  9. Event Sourcing by Martin Fowler
  10. Eventually Consistent by Werner Vogels

Spring Cloud – Rabbit MQ Comprehensive Glance

Reading Time: 6 minutes

Introduction

In this blog post, I'll dive into the Spring Cloud Stream API, which lives under the Spring umbrella. The API has been around for some time and has many beneficial use cases. I felt the need to write this post because, at work, we were working towards streaming and integration among our microservices.

Sample Project

I have prepared two demonstration projects: one named "Automation" that will create events, and another named "Pki" that will consume all produced events. Upon successful validation, the event will be persisted in an in-memory database. Automation has a simple REST endpoint that takes a Certificate Order and starts the stream. Here is an illustration of what we will achieve:

Requirements

To run this project, you will need the following setup:

  1. Maven 3,
  2. Java 8,
  3. Docker

Running a Rabbit MQ inside Docker:

docker run -d --hostname my-rabbit --name some-rabbit -p 15672:15672 -p 5672:5672 rabbitmq:3-management

Sending a Certification Request to the Automation API:

curl -X POST http://localhost:7000/api/certificates -H "Content-Type: application/json" -d "{\"commonName\":\"tugrulaslan.com\",\"algorithm\":\"sha256\"}" 

Running Tests

Both projects have their Unit and Integration tests. You can have a peek at them to find out how each module works.

Concept

The Spring Cloud Stream API allows application code to decouple from heavy, tightly coupled, provider-specific integration code, as well as from the settings and details of the exchange systems. By using the API, applications stay agnostic of those details and focus only on communication. The API ensures that the queues and topics in the messaging tool are created and that the connections among the modules are established.

Applications perform their business logic and transmit/emit events using inputs and outputs, which we will look at in the upcoming chapters. The figure below, taken from the project documentation, demonstrates the integration layout:

SCSt with binder

Communication and Interfaces

As I stated earlier, with the Cloud API we only use interfaces for the communication. Since the two directions of the stream differ, I'll cover the details in two separate chapters, one for the Producer and one for the Consumer.

Furthermore, the API itself provides a basic way to exchange messages over the default interfaces, namely "Sink", "Source" and "Processor". However, I am taking my sample to the next level with its own custom interfaces to demonstrate more features. Eventually, the sample code will use a Topic Exchange so that each consumer receives only the type of events it is interested in. You can see similar samples and the Topic Exchange type in the reference post[1].

Producer Side

On this side we have the Automation project, which is responsible for producing the events triggered via its REST API. First, let's look at the programming side of the API and examine the interface called "CertificateEventStreamSource":

import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;

public interface CertificateEventStreamSource {
    @Output("certificateOrderedChannel")
    MessageChannel certificateOrderedChannel();
}

Let me elaborate the concepts here:

  • @Output: The annotation defines a name for the communication channel and marks the interface as a Producer. The name given in the annotation must match the binding name in the configuration file, which we will see shortly,
  • MessageChannel: The interface provides ways to send the event to the MQ

For now, this is all your application needs to know about the communication. As a last step, we need to annotate the class that uses this interface, shown here for the class called "CertificateService":

@EnableBinding(CertificateEventStreamSource.class)
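To complete the picture, here is a minimal sketch of what the sending side might look like inside "CertificateService"; it is an illustration rather than the demo's exact code, and the payload is kept as a plain Object.

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.messaging.support.MessageBuilder;

@EnableBinding(CertificateEventStreamSource.class)
public class CertificateService {

    private final CertificateEventStreamSource streamSource;

    public CertificateService(CertificateEventStreamSource streamSource) {
        this.streamSource = streamSource;
    }

    // Wraps the payload in a Spring Message and pushes it onto the output channel;
    // the binder delivers it to the configured exchange on RabbitMQ.
    public void publishCertificateOrdered(Object certificateOrderedEvent) {
        streamSource.certificateOrderedChannel()
                .send(MessageBuilder.withPayload(certificateOrderedEvent).build());
    }
}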

Our next target is the configuration, and the steps are very simple. Spring Boot provides auto-configuration out of the box for many projects, which means you don't have to define certain default entries such as the local RabbitMQ host and port. You can see this in the documentation[2].

application.yaml

spring.cloud.stream.bindings.certificateOrderedChannel.destination=CREDS_EXCHANGE
spring.cloud.stream.rabbit.bindings.certificateOrderedChannel.producer.routing-key-expression='creds.certificate.ordered'

Let’s break the entries down for the explanation:

destination: It points to the exchange on the broker. The binding name in the property path ("certificateOrderedChannel") needs to match the channel name defined in the interface's @Output annotation so that the framework can bind the interface. The value is the name of the exchange that will appear in RabbitMQ,

routing-key-expression: The second and last entry defines the routing key for the event. With the help of this entry, RabbitMQ binds the event to this key, and the designated consumers will later receive the events by this key.

If you'd like to know more about the available Producer bindings, consult the API documentation[3]. At this point we have configured our Automation API to work fully with the event stream. No further configuration is required, so now we can focus on our consumer.

Consumer Side

For the consumption of the events, we will look at the "PKI" component, which consumes the events, runs some validation and then persists each event in the database. The elaboration of this component differs from the Producer (Automation), because I have enhanced this consumer with some beneficial use cases that I will explain here.

Before diving into the details, I'll show the interface first, followed by the configuration. Let's look at "CertificateEventStreamSource":

import org.springframework.cloud.stream.annotation.Input;
import org.springframework.messaging.SubscribableChannel;

public interface CertificateEventStreamSource {

    String CERTIFICATE_ORDERED_CHANNEL = "certificateOrderedSubscribableChannel";
    @Input(CERTIFICATE_ORDERED_CHANNEL)
    SubscribableChannel certificateOrdered();
}

Let's look at what we have here and shed some light on it:

  • @Input: The annotation defines a name for the communication channel and marks the interface as a Consumer. The name given in the annotation must correspond to the binding name in the properties,
  • SubscribableChannel: The interface through which the application receives events from the MQ

These two are semantically the same as the Producer's interface. Additionally, we need to annotate the class "CertificateOrderedListener" with:

@EnableBinding(CertificateEventStreamSource.class)
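Here is a minimal sketch of what the listener class might look like; the payload type "CertificateOrderedEvent" is a placeholder I introduce for illustration, not necessarily the demo's real class.

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;

@EnableBinding(CertificateEventStreamSource.class)
public class CertificateOrderedListener {

    // Placeholder payload type mirroring the fields sent by the Automation API.
    public static class CertificateOrderedEvent {
        public String commonName;
        public String algorithm;
    }

    // Invoked by the binder for every message arriving on the bound queue.
    @StreamListener(CertificateEventStreamSource.CERTIFICATE_ORDERED_CHANNEL)
    public void onCertificateOrdered(CertificateOrderedEvent event) {
        // validate and persist here; an exception thrown from this method triggers the
        // retry (max-attempts) and, once the attempts are exhausted, the message goes to the DLQ
    }
}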

So far we have let our application integrate with the API programmatically. Now we'll move on to the configuration, which tells the Cloud API how to manage our stream and some failure scenarios:

application.yaml

spring.cloud.stream.bindings.certificateOrderedSubscribableChannel.destination=CREDS_EXCHANGE
spring.cloud.stream.rabbit.bindings.certificateOrderedSubscribableChannel.consumer.binding-routing-key=creds.certificate.ordered
spring.cloud.stream.rabbit.bindings.certificateOrderedSubscribableChannel.consumer.bind-queue=true
spring.cloud.stream.bindings.certificateOrderedSubscribableChannel.group=certificateOrderedQueue
spring.cloud.stream.rabbit.bindings.certificateOrderedSubscribableChannel.consumer.auto-bind-dlq=true
spring.cloud.stream.rabbit.bindings.certificateOrderedSubscribableChannel.consumer.republishToDlq=true
spring.cloud.stream.bindings.certificateOrderedSubscribableChannel.consumer.max-attempts=2

Let's dive into the details. First of all, notice the repeated "certificateOrderedSubscribableChannel" segment after each "bindings" entry; it points to the value in the @Input annotation. These two must match, otherwise we will never be able to bind the consumer properly. Now let's go through each entry:

destination: It defines the exchange that we will connect to,

binding-routing-key: this key is defined on both sides; as we saw in the Producer, it maps the consumer to the event's routing key. More details are given in the Producer's section,

bind-queue: sets queues for the given routing key,

group: grouping is a vital feature and deserves a longer explanation. By setting this value, you define a strategy where each designated consumer receives one message at a time, in the so-called "round robin" fashion. If the grouping is not defined, every consumer will get every message. In case the explanation is confusing, let me visualize it:

More information and available settings are in the documentation[4]

auto-bind-dlq: It sets up a DLQ and configures the original queue to forward rejections into this queue. In the following chapters, I’ll give an example from the application code,

republishToDlq: The binder forwards the message to the DLQ with the exception information in the header,

max-attempts: It sets the maximum number of attempts to process a message when errors occur,

You can find more about the available use cases for the Exceptions and the retry mechanisms in the documentation[5]

Exception Handling

The Spring Cloud API lets you define only a strategy for the failure cases in your application; the rest of the details are taken care of for you. When your application throws exceptions, the API wraps them and handles the situation depending on your configuration. In the configuration above, I have defined a strategy with retry attempts and a dead letter queue.

In the PKI application, I have a small demonstration of this behaviour, triggered by the certification request below:

curl -X POST http://localhost:7000/api/certificates -H "Content-Type: application/json" -d "{\"commonName\":\"tugrulaslan.com\",\"algorithm\":\"sha1\"}" 

After PKI receives such an event, it will retry twice and then move the message to the dead letter queue "CREDS_EXCHANGE.certificateOrderedQueue.dlq".

Benefits

Here are some of the benefits that I personally believe the Cloud API will bring into your projects:

  • Applications are kept apart from heavy settings,
  • Abstraction from the inner workings of the exchange communication and provider-specific API implementations,
  • Another abstraction benefit: when changing the message broker, you can easily swap one binder for another and your application will not notice; see the list of binders[6],
  • Easy management of exception cases and retry strategies,
  • The Spring Cloud Sleuth API is enabled by default, so you can easily trace the messages,
  • Since it's a Spring project, other native Spring projects can easily be hooked up,
  • Automatic marshalling and unmarshalling in the event exchange right out of the box, without any prior configuration

References

  1. https://www.cloudamqp.com/blog/2015-09-03-part4-rabbitmq-for-beginners-exchanges-routing-keys-bindings.html
  2. https://docs.spring.io/spring-boot/docs/current/reference/html/appendix-auto-configuration-classes.html
  3. https://docs.spring.io/spring-cloud-stream/docs/current/reference/html/_configuration_options.html#_producer_properties
  4. https://docs.spring.io/spring-cloud-stream/docs/current/reference/html/_configuration_options.html#_consumer_properties
  5. https://docs.spring.io/spring-cloud-stream/docs/current/reference/htmlsingle/#spring-cloud-stream-overview-error-handling
  6. https://cloud.spring.io/spring-cloud-stream/spring-cloud-stream.html#_binder_implementations

Exception Testing

Reading Time: 2 minutes

Introduction

This post briefly focuses on testing exceptions in Java. We will look at the traditional way of handling exceptions in JUnit, and then benefit from the AssertJ test library.

The case study

In this section we will see a small service whose only responsibility is to throw a custom unchecked exception 🙂 and then we will figure out how to test that exception.

public class ProcessService {
    public void runSomeProcess(String processName) {
        String message = String.format("The process '%s' was interrupted because of unknown error", processName);
        throw new ProcessInterruptedException(message);
    }

    public static class ProcessInterruptedException extends RuntimeException {
        public ProcessInterruptedException(String message) {
            super(message);
        }
    }
}

Traditional way of testing

Generally speaking, whenever I had to test a method that resulted in an exceptional situation, I favored the traditional approach: declare the expected exception, assume the method throws it, and carry on.

@Test(expected = ProcessService.ProcessInterruptedException.class)
public void shouldRunProcessAndCatchTheExceptionUsingJunit() {
    //given
    String processName = "NvidiaBackgroundScanner";

    //when - then
    processService.runSomeProcess(processName);
}

In the good old days, this just allowed me to declare the expected exception in the annotation, and the method invocation was expected to raise it.

What if we need to assert multiple things in the exception?

That's a splendid question, isn't it? What if we want a couple of checks: first and most important, the exception message, and then the type of the exception? Here the AssertJ test library comes to the rescue! The code snippet below covers our essential needs:

@Test
public void shouldRunProcessAndCatchTheExceptionUsingAssetJ() {
    //given
    String processName = "NvidiaBackgroundScanner";

    //when
    Throwable throwable = catchThrowable(() -> processService.runSomeProcess(processName));

    //then
    String expectedException = String.format("The process '%s' was interrupted because of unknown error", processName);
    assertThat(throwable).isNotNull();
    assertThat(throwable)
            .isInstanceOf(ProcessService.ProcessInterruptedException.class)
            .hasMessage(expectedException);
}

As you can see, the AssertJ library catches the exception and gives us ways to check both the type of the exception and its message. Testing with AssertJ is simple and intuitive. That's all folks. Last of all, here are the imports you need in the test:

 import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.catchThrowable;

IBM AMQP

Reading Time: 3 minutes

Introduction

This blog post gathers information about the IBM MQ product and its AMQP protocol capabilities, along with an intuitive sample Producer and Consumer application that I have coded. I have deliberately collected some definitions from IBM's documentation and enriched the content with my own reflections.

Prerequisites

In order to have a smooth and easy setup, I'll use the official IBM MQ Light Docker image, which brings two good benefits:

  • A nice and intuitive UI that shows a summary of clients and messages,
  • No complex installation and configuration of IBM MQ with AMQP support; been there, done that before.

On top of that, the project is built with Maven, so I assume you already have that tool.

IBM MQ Light Docker Image

I'll be using the Docker image straight from its repository. A well-documented explanation can be found on the GitHub page.

In my local setup Docker is installed in a virtual machine, so in the following chapters you will see my connection string as "virtualcentos". Anyway, let's pull the image and run the container:

sudo docker run \
  --env LICENSE=accept \
  --volume /var/example:/var/mqlight \
  --publish 5672:5672 \
  --publish 9180:9180 \
  --detach \
  ibmcom/mqlight:1.0

Docker will pull the latest image and expose port 5672 for application connections and port 9180 for the web application. Once the container is up and running, open http://localhost:9180 and verify the container works as expected.

Source Code

I have written a sample application that covers the event producing and consuming scenarios. The full code is available in my GitHub repository.

After checking out the code, you can use these requests to test the system:

curl -X POST http://localhost:8080/api/certificate -H "Content-Type: application/json" -d "{\"name\":\"myCertificate\",\"fingerprint\":\"34:AB:45:3e:6y:XX\"}" 

curl -X POST http://localhost:8080/api/credential -H "Content-Type: application/json" -d "{\"username\":\"root\",\"password\":\"p@$$w0rd\"}" 

Concepts

There are a few core concepts in the ecosystem which I'll briefly summarize. It is important to grasp these basics.

Message

Message is the simplest form of the data that is being transmitted across the network. A message can simply consist of String, Bytes or JSON. For JSON formats, the API client internally utilizes GSON.

Topic

Topic is the main entry point for publishing applications; messages initially arrive at a topic. A topic is also responsible for routing the messages to the right destinations. A topic name can contain multiple "/" separators, which create sub-topics. For the wildcard rules, consult the documentation.

Destination

Destinations hold the messages and are associated with topics. A destination can have one or multiple topics associated with it. If a message is sent to a topic without a destination, it will not be delivered to any application. In addition, Destinations are always created with a time-to-live value, which can be automatically set or programmatically set. Once the TTL is exceeded, the destination will be removed automatically.

Sharing

Sharing allows applications to explicitly share the same destination. Applications sharing a destination receive the messages in a round-robin fashion, meaning the messages are distributed among the applications.

Message quality of service (QOS)

The IBM MQ product addresses the assurance of connections and messages in several ways:

  • Message quality of service (QOS)
  • Subscription time-to-live
  • Subscriber auto-confirm
  • Message persistence

The QOS topic is vital when designing for messaging durability and performance. You can mix different settings for different scenarios. For more detailed answers, consult the IBM documentation.

IBM MQ Light Non-blocking API

The non-blocking API is provided by IBM for connecting to the product. Here is the link to browse the Java API. In the sample project you can find the Maven dependency for the Java client; for other clients, you can browse the IBM page. In this section I mainly wanted to draw your attention to this API.


Russian Roulette in Testing

Reading Time: 4 minutes

Description

Russian Roulette, the dangerous Russian game that we are all familiar with, usually from movies. The Urban Dictionary gives this short description: "A game where a revolver with one bullet is placed and spun. Then you take turns putting the gun to your head and pull the trigger. If it's empty, you pass it on until someone dies or becomes extremely injured."

Now that we have filled our heads with this literature, let's narrow down how this blog post relates to Russian Roulette. The good news is that I'm not suggesting we play it, of course, but it is a good metaphor for a situation I experienced, and the phrase came to mind. Here is the background: recently I was reviewing my colleagues' PRs and noticed code snippets in the integration tests that fetched objects by positional access from list data structures, assuming that the objects at the zero, first and second positions had the expected values.

After I read those test cases a few times, the notion of Russian Roulette popped up in my head. Eventually I came up with proper solutions to prevent such messy assertions. I'll describe two situations along with my suggested solutions. Please note that the demonstrations are kept simple and uncomplicated to highlight the cause.

Last of all, to demonstrate my samples, I am using the mirror servers of Fedora Linux Distribution @ https://mirrors.fedoraproject.org/mirrorlist?repo=epel-7&arch=x86_64

Situation: Asserting a Single value in a Collection

While coding a method that gathers or filters a list from the persistence unit, we tend to iterate through the list, grab the Nth element and assume it will be our expected value. In essence, this leads to very naive behavior; here is a code snippet:

@Test
public void shouldFindPolishHostFromSecureEuropeanHostsRussianRoulette() {
    //given
    final String expectedAddress = "https://ftp.icm.edu.pl/pub/Linux/fedora/linux/epel/7/x86_64/";

    //when
    final List<Host> hosts = repository.findByAddressContaining("http");

    //then
    assertThat(hosts).isNotNull();
    assertThat(hosts.get(2).getAddress()).isEqualTo(expectedAddress);
}

As you read the code and grasp the logic, we are fetching the list of hosts whose address contains "http". Here is the catch: we are very sure that the element at index 2 will be the expected one. It may well be, however the following things may happen in the future:

  • If you update the in-memory database package or switch to another one, the new alternative may return the values in a different order because of its internal design, and your test will fail,
  • When the next developer adds a new entry before or after your test data, such tests will most likely fail.

Solution: Implement ways to look for the Exact Figure

The title of the solution reveals the whole deal. Which is the more reliable way to ensure integrity: assumption or precision? Obviously precision is the keyword here. Here is the code snippet:

@Test
public void shouldFindPolishHostFromSecureEuropeanHostsNonRussianRoulette() {
    //given
    final String expectedAddress = "https://ftp.icm.edu.pl/pub/Linux/fedora/linux/epel/7/x86_64/";

    //when
    final Host host = repository.findByAddress(expectedAddress);

    //then
    assertThat(host).isNotNull();
    assertThat(host.getAddress()).isEqualTo(expectedAddress);
}

My suggestion is that when we are looking for a single value, we must be certain and query for the exact value, as in the example above. Implement methods that query the persistence unit for specific values.

Situation: Asserting a Collection of values with hard coded values

Throughout my software development career, I have often come across situations where I have a list of expected values and must ensure the code produced the same, or similar, output. This situation comes in many varieties. Instead of discussing possibilities in the abstract, let's look at this code snippet:

@Test
public void shouldFindAllSecureEuropeanHostsRussianRoulette() {
    //given-when
    final List<Host> hosts = repository.findByAddressContaining("https");

    //then
    assertThat(hosts).isNotNull();
    assertThat(hosts.get(0).getAddress()).isEqualTo("https://mirror.karneval.cz/pub/linux/fedora/epel/7/x86_64/");
    assertThat(hosts.get(1).getAddress()).isEqualTo("https://ftp.icm.edu.pl/pub/Linux/fedora/linux/epel/7/x86_64/");
    assertThat(hosts.get(2).getAddress()).isEqualTo("https://ftp-stud.hs-esslingen.de/pub/epel/7/x86_64/");
}

In this example, we are hard-coding the assumption that the Nth element will have exactly the value we typed in. It has the same failure points as the situation above; nothing more to add here.

Solution: Compare Collections with Collections

As the title spoils, wouldn't it be nice to simply compare collections with collections? I must say I really like how this approach turned out, and how well the testing framework supports it. Let's look at my solution here:

@Test
public void shouldFindAllSecureEuropeanHostsNonRussianRoulette() {
    //given
    ArrayList<String> expectedHosts = new ArrayList<>(Arrays.asList("https://mirror.karneval.cz/pub/linux/fedora/epel/7/x86_64/",
            "https://ftp.icm.edu.pl/pub/Linux/fedora/linux/epel/7/x86_64/",
            "https://ftp-stud.hs-esslingen.de/pub/epel/7/x86_64/"));
    //when
    final List<Host> hosts = repository.findByAddressContaining("https");

    //then
    final List<String> hostNames = hosts.stream().map(Host::getAddress).collect(Collectors.toList());
    assertThat(hosts).isNotNull();
    assertThat(hostNames).isNotNull();
    assertThat(hostNames).containsAnyElementsOf(expectedHosts);
}

The code is very simple to understand: in this solution I don't need to know at which position I get which value. What matters is ensuring the produced output contains the desired values, it's as simple as that. Furthermore, the testing framework provides a method that verifies the output meets the expectation.

Bonus Solution: Extracted Properties of Collections

I asked a colleague at work to review my work, as I was curious about his opinion. He pointed me to some features of the AssertJ library that I find very useful; the Assertions on Iterables feature in particular is quite comprehensive. I added one more test to cover the concept:

@Test
public void shouldFindAllSecureEuropeanHostsNonRussianRouletteNonStream() {
    //given
    ArrayList<String> expectedHosts = new ArrayList<>(Arrays.asList("https://mirror.karneval.cz/pub/linux/fedora/epel/7/x86_64/",
            "https://ftp.icm.edu.pl/pub/Linux/fedora/linux/epel/7/x86_64/",
            "https://ftp-stud.hs-esslingen.de/pub/epel/7/x86_64/"));
    //when
    final List<Host> hosts = repository.findByAddressContaining("https");

    //then
    assertThat(hosts).isNotNull();
    assertThat(hosts).extracting("address")
            .containsAnyElementsOf(expectedHosts);
}

In my sample I only demonstrated "extracting", which eliminated the stream operation. I'd also like to share some links that demonstrate far more advanced test cases than my humble test.

Conclusion

Assumptions can be evil in programming. Unless you work with sorted data structures that guarantee order, relying on positional access is risky. My sincere suggestion is that we always strive for the practices above to overcome such bad habits, so we don't waste time fixing unnecessary test failures. You can find the full solution in my GitHub repository @ https://github.com/tugrulaslan/BlogCodeSnippets/tree/master/RussianRoulette

Featured Image Courtesy telegraph.co.uk – https://i.telegraph.co.uk/multimedia/archive/02500/Russian_roulette_2500016k.jpg

Heap

Reading Time: 3 minutes

Description

Heap is a tree-based data structure with some special attributes embedded into it. The Heap data structure has the following characteristics:

  • It is a form of Complete Binary Tree,
  • Every parent node's key is compared with its children's keys, and this comparison is maintained whenever a new node is inserted or removed.

In addition to these characteristics, a Heap is usually implemented internally on top of an array; in turn, the Heap is the standard way to implement a Priority Queue.

Types of Heap

The Heap data structure has two main types, which correspond to how the heap is ordered. Let's have a look at the types in detail:

Min Heap

Min Heap Tree

The values of children are greater than or equal to the values of their parents, which means parent nodes always have lower (or equal) values than their children.

Max Heap


Max Heap Tree

The values of children are less than or equal to the values of their parents, which means parent nodes always have greater (or equal) values than their children.

Complexity of Heap

The time and space complexities of a binary heap can be summed up as follows:
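  • Insert: O(log n),
  • Delete (extract the min/max): O(log n),
  • Peek at the min/max: O(1),
  • Search for an arbitrary value: O(n),
  • Space: O(n)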

Usage Areas of Heap

Heap Data Structure makes great use in the following areas:

  1. Heap Sort: a very efficient sorting algorithm whose time complexity is O(n log n) in every case,
  2. Priority Queues: the priority version of a queue benefits from the Heap data structure, which provides insertion, deletion, extract-maximum and decrease-key operations in O(log n) time

Terminology

Heapify

Heapifying is a recursive process of turning the tree into a Max Heap: the algorithm walks over the non-leaf nodes, looks for the largest value among each node and its children, and continuously raises the greater values towards the top.

Max Heap

Parent Node

Parent of the node at position index in the array representation: (index - 1) / 2;

Left Node

Left child of the node at position index in the array representation: 2 * index + 1;

Right Node

Right child of the node at position index in the array representation: 2 * index + 2;
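Using these index formulas, here is a minimal array-based sketch of the sift-down step that heapify repeats for every non-leaf node; it is an illustration, not the repository's exact code.

// Restores the max-heap property for the subtree rooted at index, within the first `size` elements.
static void siftDown(int[] heap, int index, int size) {
    while (true) {
        int left = 2 * index + 1;      // left child position
        int right = 2 * index + 2;     // right child position
        int largest = index;
        if (left < size && heap[left] > heap[largest]) largest = left;
        if (right < size && heap[right] > heap[largest]) largest = right;
        if (largest == index) return;  // heap property already holds here
        int tmp = heap[index];         // otherwise swap and keep sifting down
        heap[index] = heap[largest];
        heap[largest] = tmp;
        index = largest;
    }
}

Building a max heap from an arbitrary array is then just a matter of calling siftDown for every non-leaf node, from (size / 2) - 1 down to 0.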

Heap Sort

Heap sort is a very efficient algorithm for sorting arrays.

Time Complexity

All the cases are O(n log n)

Space Complexity

O(1)
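As a complement to the repository code linked below, here is a minimal sketch of heap sort built on the siftDown method from the earlier snippet; it sorts the array in place, which is where the O(1) space figure comes from.

static void heapSort(int[] a) {
    // build a max heap: sift down every non-leaf node, starting from the last one
    for (int i = a.length / 2 - 1; i >= 0; i--) {
        siftDown(a, i, a.length);
    }
    // repeatedly move the current maximum to the end and shrink the heap by one
    for (int end = a.length - 1; end > 0; end--) {
        int tmp = a[0];
        a[0] = a[end];
        a[end] = tmp;
        siftDown(a, 0, end);
    }
}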

Heap Sort Code

You can check out my GitHub repository @ https://github.com/tugrulaslan/BlogCodeSnippets/tree/master/SortingAlgorithms

Radix Sort

Reading Time: 2 minutes

Definition

Radix Sort is an optimal algorithm for numbers ranging from 1 to n². Internally, it uses Counting Sort to sort the array; the keys are the digits of each element.
It starts from the least significant digit (the right-most digit) and moves towards the most significant digit (the left-most).

Each digit goes to its corresponding numbered bucket. After the buckets are filled with the elements of the array, the elements are collected again according to their bucket positions. Let's see an example illustration to better understand the logic; we will sort the numbers "551, 12, 346, 311":

Visual Array Status of Sorting Phases by Radix Sort
Statuses of Buckets in each pass

Now we have a rough idea of how Radix Sort works internally. There is one gap I'd like to point out: what happens to 12, which has two digits while the others have three? In this situation such numbers are treated as if they were padded with leading 0s, so for the missing digits they always land in bucket zero.
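To tie the passes and the buckets together, here is a minimal LSD radix sort sketch for non-negative integers, using one counting-sort pass per digit; it is an illustration rather than the repository's exact code.

// Sorts non-negative integers digit by digit, starting from the least significant digit.
static void radixSort(int[] a) {
    int max = 0;
    for (int value : a) max = Math.max(max, value);   // the largest value decides how many passes we need
    for (int exp = 1; max / exp > 0; exp *= 10) {     // exp selects the current digit: 1s, 10s, 100s, ...
        int[] output = new int[a.length];
        int[] count = new int[10];                    // ten buckets, one per digit 0-9
        for (int value : a) count[(value / exp) % 10]++;
        for (int i = 1; i < 10; i++) count[i] += count[i - 1];   // prefix sums give end positions per bucket
        for (int i = a.length - 1; i >= 0; i--) {     // walk backwards to keep the sort stable
            int digit = (a[i] / exp) % 10;
            output[--count[digit]] = a[i];
        }
        System.arraycopy(output, 0, a, 0, a.length);  // the array is now ordered by the current digit
    }
}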

Complexity

Time Complexity

n: the number of elements
k: the number of digit positions in the keys; the counting pass is repeated once per digit
The time complexity of Radix Sort is always O(n * k), in every case

Space Complexity

Space complexity of Radix Sort is O(n+k)

Code

You can check out my GitHub repository @ https://github.com/tugrulaslan/BlogCodeSnippets/tree/master/SortingAlgorithms

Linked List

Reading Time: 3 minutes

Definition

A Linked List is a linear data structure like an array, but its internals are completely different from other data structures. Let's first have a visual look at what the data structure looks like:

The internal of Linked List

As you can see in the image above, a Linked List maintains a chain of objects linked to each other, as the name suggests. To sum up some characteristics of a Linked List:

  1. Each node has a next (and, in some variants, previous) pointer/reference, and we iterate the list by following these references,
  2. The references at the ends of the list are usually null,
  3. However, the next/previous references do not refer to null in certain variants (in the following chapters I'll demonstrate the reasons)

Linked lists are used internally by other data structures such as Queues and Stacks. Check those out separately and see how the Linked List fits their requirements.

Advantages

  1. No size limitation compared to arrays,
  2. Inserting and removing between nodes is cheap. Such operations are very costly in arrays with larger data sets, because all the following elements have to be shifted,

Disadvantages

  1. Random access is not possible; the data structure must be traversed to reach the designated object,
  2. Storing the next and previous references takes up extra memory.

Time and Space Complexity

image courtesy of  bigocheatsheet.com
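For reference, the standard figures for a linked list are:

  • Access: O(n),
  • Search: O(n),
  • Insertion at a known position: O(1),
  • Deletion at a known position: O(1),
  • Space: O(n)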

Types of Linked Lists

Linked Lists come in several variations that often confuse us. I'll show all the implementations in sub-sections with visuals, descriptions and code so you can grasp the slight differences better.

Singly Linked List

In a Singly Linked List the traversal is unidirectional: each node refers to the next node in the chain, and there is no reference to previous nodes. The last node's next reference is null.

See the Implementation “SinglyLinkedList.java” and the Unit Test “SinglyLinkedListUnitTest.java” to apprehend all the operations and internals of the Singly Linked List.
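To give a quick feel for the structure, here is a minimal generic sketch of a singly linked list with a couple of the operations listed later; it is only an illustration, not the repository's SinglyLinkedList.java.

public class SinglyLinkedListSketch<T> {

    private static class Node<T> {
        T data;
        Node<T> next;   // reference to the next node; null at the tail

        Node(T data) {
            this.data = data;
        }
    }

    private Node<T> head;

    public boolean isEmpty() {
        return head == null;
    }

    // insertFirst: the new node becomes the head, O(1)
    public void insertFirst(T data) {
        Node<T> node = new Node<>(data);
        node.next = head;
        head = node;
    }

    // containsIterativeSearch: follow the next references until a match or null
    public boolean contains(T data) {
        for (Node<T> current = head; current != null; current = current.next) {
            if (current.data.equals(data)) {
                return true;
            }
        }
        return false;
    }
}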

Doubly Linked List

A Doubly Linked List maintains a bidirectional path: each node contains next and previous links, where next refers to the next node and previous refers to the previous one. This maintenance comes with extra overhead. Finally, the first node's previous and the last node's next are null.

See the Implementation “DoublyLinkedList.java” and the Unit Test “DoublyLinkedListUnitTest.java” to apprehend all the operations and internals of the Doubly Linked List.

Circular Linked List

The Circular Linked List is the last variation of the implementation. I like to call it the spiced-up version of the Singly and Doubly Linked List implementations. As the name suggests, the list is circular. This also clears up the third characteristic in the definition above; a Circular Linked List has two distinct characteristics:

  1. The head and the tail of the data structure don't point to null; instead the head's previous reference points to the tail and the tail's next reference points to the head,
  2. A Circular List can be built on either the Singly or the Doubly Linked List implementation.
Singly Circular Linked List

Doubly Circular Linked List

Operations

The implementations support the following operations:

  • isEmpty: checks whether the Linked List is empty,
  • insertFirst: inserts the given node at the head of the Linked List,
  • insertAfter: inserts the given node after an existing node in the Linked List,
  • insertLast: inserts the given node at the end of the Linked List,
  • deleteFirst: deletes the node at the head of the Linked List,
  • deleteNode: deletes the given node in the Linked List,
  • deleteLast: deletes the node at the end of the Linked List,
  • containsIterativeSearch: iteratively searches the Linked List,
  • containsRecursiveSearch: recursively searches the Linked List.

Code

The code can also be found in my GitHub repository @ https://github.com/tugrulaslan/BlogCodeSnippets/tree/master/DataStructures. To see how the code works, you can also check its unit tests.

Queue

Reading Time: 2 minutes

Definition

A Queue is a linear data structure that maintains the FIFO setting: First In, First Out. A Queue comes in two possible internal implementations, a Singly Linked List or an Array. To picture FIFO, think of a group of people queued up to buy cinema tickets: the first person in the queue buys a ticket first, and the rest of the queue follows.

Sample Queue Usage Areas

  • Hardware scheduling: CPUs and disks are scheduled fairly in concurrent environments,
  • Asynchronous communication, where two processes wait for each other's responses in sequence, is another great use case

Implementation

Internally, a Queue can be implemented with a Singly Linked List or an Array, and the time complexity of the operations will differ slightly between the two. This Stack Overflow article offers more insight and discussion about the implementations. In my own implementation I preferred the Singly Linked List.

Complexity of Queue

Since the internals differ for each variation (Singly Linked List and Array), the operation costs can differ too. The table below applies to the Singly Linked List implementation:


image courtesy of  bigocheatsheet.com

Operations on Queue


A Queue has three vital operations that we need to cover. Some languages' Queue implementations, like Java's, certainly offer additional operations. However, the operations below are the fundamental properties of the Queue data structure (a minimal sketch follows the list):

  • enqueue: inserts the element at the back of the queue,
  • dequeue: removes the element at the front of the queue and returns the dequeued value,
  • peek: returns the front element's data but doesn't delete it; it just takes a peek at it.
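Here is a minimal generic sketch of a linked-list-backed queue showing these three operations; it is only an illustration, not the repository's implementation.

public class QueueSketch<T> {

    private static class Node<T> {
        T data;
        Node<T> next;

        Node(T data) {
            this.data = data;
        }
    }

    private Node<T> front;   // dequeue and peek work here
    private Node<T> rear;    // enqueue works here

    // enqueue: O(1), append the new node at the back
    public void enqueue(T data) {
        Node<T> node = new Node<>(data);
        if (rear == null) {
            front = rear = node;
            return;
        }
        rear.next = node;
        rear = node;
    }

    // dequeue: O(1), remove and return the node at the front
    public T dequeue() {
        if (front == null) throw new IllegalStateException("queue is empty");
        T data = front.data;
        front = front.next;
        if (front == null) rear = null;
        return data;
    }

    // peek: O(1), look at the front without removing it
    public T peek() {
        if (front == null) throw new IllegalStateException("queue is empty");
        return front.data;
    }
}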

Code

The code can also be found in my GitHub repository @ https://github.com/tugrulaslan/BlogCodeSnippets/tree/master/DataStructures. To see how the code works, you can also check its unit tests.

Stack

Reading Time: 2 minutes

Definition

A Stack is a very useful data structure that brings the LIFO setting into the game. Let's elaborate: LIFO is the abbreviation of Last In, First Out. What does LIFO really mean for us? One way to picture it is a pile of things stacked on top of each other, from which we always take the top one. Let's look at the illustration below:

Yes, we understood it well: in the LIFO setting we add new items on top of the pile and take items from the top.

Sample Stack Usage Areas

  • "Undo" operations in text editors, when we want to revert an unwanted entry,
  • Browsers' back buttons use a similar mechanism to navigate to earlier pages,
  • Recursive methods also make good use of a stack: from the first call to the last, method executions are added on top of each other.

Implementation

Internally, a Stack can be implemented with a Singly Linked List or an Array, and the time complexity of the operations will differ slightly between the two. This Stack Overflow article offers more insight and discussion about the implementations. In my own implementation I preferred the Singly Linked List.

Complexity of Stack

Since the internals differ for each variation (Singly Linked List and Array), the operation costs can differ too. The table below applies to the Singly Linked List implementation:

image courtesy of  bigocheatsheet.com

Operations on Stack

A Stack has three vital operations that we need to cover. Some languages' Stack implementations, like Java's, certainly offer additional operations. However, the operations below are the fundamental properties of the Stack data structure (a minimal sketch follows the list):

  • push: pushes the element onto the top of the stack,
  • pop: removes the element from the top and returns the popped value,
  • peek: returns the top element's data but doesn't delete it; it just takes a peek at it.
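Here is a minimal generic sketch of a linked-list-backed stack showing these three operations; it is only an illustration, not the repository's implementation.

public class StackSketch<T> {

    private static class Node<T> {
        T data;
        Node<T> next;

        Node(T data) {
            this.data = data;
        }
    }

    private Node<T> top;   // push, pop and peek all work on the top node

    // push: O(1), the new node becomes the new top
    public void push(T data) {
        Node<T> node = new Node<>(data);
        node.next = top;
        top = node;
    }

    // pop: O(1), remove and return the top node's value
    public T pop() {
        if (top == null) throw new IllegalStateException("stack is empty");
        T data = top.data;
        top = top.next;
        return data;
    }

    // peek: O(1), look at the top without removing it
    public T peek() {
        if (top == null) throw new IllegalStateException("stack is empty");
        return top.data;
    }
}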

Code

The code can also be found in my GitHub repository @ https://github.com/tugrulaslan/BlogCodeSnippets/tree/master/DataStructures. To see how the code works, you can also check its unit tests.