Thursday 3 February 2022

Kubernetes Secrets?

Kubernetes Secrets are not actually secret ;)

Sometimes, a Kubernetes cluster needs access to sensitive information. To make such a secret available to a containerized application, #Kubernetes has a dedicated API resource object called Secret. The secret can be consumed via a volume mount or an environment variable.
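
For instance, here is a minimal sketch (the names, image, and password value are just placeholders I made up) of a Secret and a Pod consuming it through an environment variable:

    apiVersion: v1
    kind: Secret
    metadata:
      name: db-credentials           # placeholder name
    type: Opaque
    data:
      password: cGFzc3dvcmQxMjM=     # "password123", Base64-encoded
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: demo-app
    spec:
      containers:
        - name: app
          image: nginx               # placeholder image
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: password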

These Secret objects are stored in etcd, the distributed state database behind Kubernetes. Since the data is only Base64-encoded, anyone with access to etcd can read every secret stored there. And we all know Base64 is not secure at all: it is an encoding, not encryption, and anyone can decode it just like that.
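
You can see how weak this is by decoding the value from the sketch above in one line:

    $ echo 'cGFzc3dvcmQxMjM=' | base64 --decode
    password123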

So, how do we enable real restrictions on our secrets in Kubernetes?

Kubernetes External Secrets lets us rely on third-party systems like HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, etc. to get REAL SECRETS. #HashiCorp Vault is the one that interests me here, so let's look at it in a bit more detail.

                     Kubernetes — HashiCorp Vault

Vault has two important building blocks: Authentication Methods and Secrets Engines. Kubernetes External Secrets can authenticate Kubernetes service accounts with the help of Vault's Kubernetes auth method, and this works independently of the underlying vendor (be it #AWS, #Azure, #GCP, ...).
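
As a rough sketch (the paths, role, and policy names below are placeholders, not a production setup), wiring up the Kubernetes auth method looks something like this:

    # Enable the Kubernetes auth method and a KV secrets engine
    $ vault auth enable kubernetes
    $ vault secrets enable -path=secret kv-v2

    # Tell Vault how to reach the cluster's API server for token review
    $ vault write auth/kubernetes/config \
        kubernetes_host="https://<KUBE_APISERVER>:6443" \
        kubernetes_ca_cert=@ca.crt

    # Bind a Kubernetes service account to a Vault policy
    $ vault write auth/kubernetes/role/my-app \
        bound_service_account_names=my-app \
        bound_service_account_namespaces=default \
        policies=my-app-read \
        ttl=1h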

===== Interesting features of #HashiCorpVault =====

Centralized Storage Of All Secrets -> Imagine having 100+ Services, Vault is your savior.

RBAC (Role-Based Access Control) -> Enable read/write permissions for users over specific secrets.

Audits -> Who created a secret? How many times has it been accessed? Etc.

Encrypted Secrets -> No more plain texts or bad encryptions

Dynamic Secrets -> Create secrets on demand with a calculated Time To Live (TTL), so a leaked credential has a much smaller window for abuse.

Password Rotation -> You wanna rotate secrets every 30 days? Simple with Vault.

Encryption As Service -> Flexible APIs for encryption

High Availability -> Vault supports multi-server mode for HA

Whenever I say what an awesome product it is, experts warn me that the complex configuration of HashiCorp #Vault is the real trap.

Maybe it's time to get some real hands-on experience. I'll make a post about it some other day.

Is HashiCorp Vault configuration really that complex? Are you someone who deals with this configuration conundrum in your everyday job? I'd be happy to hear about your interesting configuration/troubleshooting challenges and experiences in the comments below.

Wednesday 20 January 2021

Infrastructure as Code - Deployment of customizable and scalable webserver in Azure

 

Before the advent of DevOps automation, deploying an application to a server was a complex process that required server setup, network configuration, route table creation, and software and DB configuration. It demanded a lot of manual effort and, at the same time, was prone to human error. The manual process could also cause scalability, availability, and consistency issues. The fast-paced improvements happening in the DevOps world offer a solution to this problem through automation. Infrastructure automation has become crucial nowadays, as it is common for application deployments to happen multiple times per day with different configuration requirements.

Generally, infrastructure management covers components like networks, virtual machines, and load balancers. Infrastructure as Code (IaC) is the key DevOps practice used for defining, deploying, updating, and destroying infrastructure. IaC ensures that the target environment is deployed with the same configuration irrespective of the environment's starting state, thus solving the problem of environment drift in the release pipeline.

IaC captures the environment description and version configurations in precisely documented code formats such as JSON, and it enables a declarative configuration model that can be relied on for quick configuration and consistent deployments, avoiding manual errors. Thus, the team gets the benefit of creating a stable, reliable, and scalable infrastructure with ease.

As the system is configurable, infrastructure maintenance such as addition/removal of servers, software updates, and software reconfiguration can be done with minimal changes to the configuration files alone.

Infrastructure as Code can be achieved with the help of several tools. Infrastructure provisioning and management can be done with tools like Terraform, Red Hat Ansible, Chef, Puppet, SaltStack, and AWS CloudFormation. Terraform helps with the initial provisioning and configuration, while tools like Chef, Puppet, and Ansible can be used to configure and deploy the application.

        Infrastructure as Code

To explore Infrastructure as Code with a simple project, I have created a scalable and customizable webserver using Azure, Terraform, and Packer.

I have created a custom policy in Azure that restricts the usage of resources without tags and assigned the policy to my resource group. I then specified the resource group in my server template and created the image using Packer, the free and open-source server templating software. This tool creates images for virtual machines and thus helps launch fully provisioned and configured machines in seconds. It is cross-platform compatible and can build multiple images in parallel. In the Packer template, I have used builders, the Packer version, provisioners, and variables. Builders are an array of objects that define how the image is created. Provisioners are an array used to run scripts or commands over the image. Variables are key-value pairs that define the user variables in the server template.
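
A trimmed sketch of what such a template can look like (the resource group, image names, and inline commands here are illustrative placeholders, not the exact file from the project):

    {
      "min_packer_version": "1.4.0",
      "variables": {
        "client_id": "{{env `ARM_CLIENT_ID`}}",
        "client_secret": "{{env `ARM_CLIENT_SECRET`}}",
        "subscription_id": "{{env `ARM_SUBSCRIPTION_ID`}}"
      },
      "builders": [{
        "type": "azure-arm",
        "client_id": "{{user `client_id`}}",
        "client_secret": "{{user `client_secret`}}",
        "subscription_id": "{{user `subscription_id`}}",
        "managed_image_resource_group_name": "packer-rg",
        "managed_image_name": "webserver-image",
        "os_type": "Linux",
        "image_publisher": "Canonical",
        "image_offer": "UbuntuServer",
        "image_sku": "18.04-LTS",
        "location": "East US",
        "vm_size": "Standard_B1s"
      }],
      "provisioners": [{
        "type": "shell",
        "inline": [
          "echo 'Hello, World!' > index.html",
          "nohup busybox httpd -f -p 80 &"
        ]
      }]
    }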

After building the image with the required configuration using Packer, I used the provisioning tool Terraform to create the infrastructure based on the provided configuration. Terraform uses a domain-specific language called HashiCorp Configuration Language (HCL) to describe resources across cloud providers. The biggest advantage of Terraform is that it is cloud-agnostic and supports multiple providers. Terraform allows the developer to create, update, and destroy infrastructure with ease. I used the same resource group here that I used when creating the Packer image, and the Packer image itself is referenced in the Terraform configuration.

The Terraform configuration includes a resource group, virtual network, network security group, network interface, public IP, and load balancer. I have used a configurable file called variables.tf where you can set the number of VMs to deploy, the tags to include, the name of the resource group, location preferences, etc.
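
A sketch of what that variables file can look like (the variable names here are illustrative, not necessarily the exact ones in the repo):

    variable "prefix" {
      description = "Prefix used to name all resources"
      default     = "webserver"
    }

    variable "location" {
      description = "Azure region to deploy into"
      default     = "East US"
    }

    variable "vm_count" {
      description = "Number of VMs placed behind the load balancer"
      type        = number
      default     = 2
    }

    variable "tags" {
      description = "Tags applied to every resource (required by the tagging policy)"
      type        = map(string)
      default = {
        project = "webserver-iac"
      }
    }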

Terraform is initialized with the command `terraform init`. We can preview the plan using `terraform plan -out solution.plan`. After confirming the plan, we can apply the changes and deploy using `terraform apply solution.plan`.
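
Put together, the workflow looks like this:

    $ terraform init                        # download the azurerm provider and set up state
    $ terraform plan -out solution.plan     # preview the changes and save the plan
    $ terraform apply solution.plan         # deploy exactly what the saved plan describes
    $ terraform destroy                     # tear everything down when you are done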

More details about the project can be found at https://github.com/arunprakashpj/Azure-Infrastructure-Operations-Project

Friday 7 August 2020

Exploring Java 8 Features

 

Many companies still rely on Java 7. But do you know when Java 8 was released? It was almost 6 years back!! Yeah, Java 8 was released on March 18, 2014, and it is still not adopted by many! Let's explore one Java 8 feature every day in upcoming posts! Here's the list, with a tiny teaser sketch right after it.

Java 8 Features

  • Functional Interfaces and Lambda Expressions
  • Java Stream API for Bulk Data Operations on Collections
  • forEach() method in Iterable interface
  • default and static methods in Interfaces
  • Java Time API
  • Collection API improvements
  • Concurrency API improvements
  • Java IO improvements
  • Miscellaneous Core API improvements
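
As a tiny teaser, here is a sketch that combines a few of these (lambda expressions, the Stream API, and forEach):

    import java.util.Arrays;
    import java.util.List;

    public class Java8Teaser {
        public static void main(String[] args) {
            List<String> languages = Arrays.asList("Java", "Kotlin", "Scala", "Groovy");

            // Lambda + Stream API: filter and transform the collection in one pipeline
            languages.stream()
                     .filter(name -> name.length() > 4)    // lambda expression
                     .map(String::toUpperCase)             // method reference
                     .forEach(System.out::println);        // forEach on the stream
        }
    }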

Tuesday 9 June 2020

Self Sovereign Identity


Every day we are seeing a lot of identity breaches. From the Facebook data leak to the Aadhaar information leak in India, people's identities have come under the scanner. We all deserve to own our identity and control its usage without involving third parties. In the physical world, we each have a unique set of documents to prove we are citizens of a particular country, that we know how to drive, or that we are eligible to vote: passport, driving licence, voter ID, etc. We own these identity proofs and can keep them safe, but when it comes to the digital world, we are dealing with a different scenario.






Current digital identity has two major problems. First, we don't own the identity. We identify ourselves with usernames/passwords or by logging into different systems using SSO authentication from third-party organizations like Google, Facebook, etc. Thus, our identity is not owned by us, and we don't control how it is used. The next big problem is oversharing of information. If you are going to vote, you are only supposed to prove that you are 18+. But the voter ID reveals other unnecessary information like date of birth, address, etc. This problem is witnessed in both physical and digital identities.

Self Sovereign Identity came as a one-stop solution to these problems. It combines attributes from different credentials and presents them as a single proof. It relies on Zero Knowledge Proofs, so the proof only reveals yes/no answers. If the question is "are you eligible to vote?", the proof gives only yes or no as the answer. The identity proof is presented in a way that lets the verifier check the authenticity of the credentials, such as the credential issuer, its uniqueness, and its integrity, and ensure that it is not tampered with or revoked, without contacting the issuer.

For example, if you want to vote, the issuer (the government) will put the public key in the ledger store and issue a unique token to you. When you reach the voting booth, the verifier can verify that it is really you just by checking the data in the ledger store. This ledger store is not a centralized authority; it is not run by any single organization. We call this ledger store a sovereign ledger, which is tamper-resistant and ordered chronologically.

The relationship between the voting booth and you is made only once, and it is unique. Again, consider another case: if you go to the bank, you will make another unique relationship by showing them possession of your credentials. The connection setup and credential exchange happen off-ledger, privately, without involving third parties. Finally, you will be given a digital token by the bank after authorization. Once you have the credential and the relationship, there is no need for a username/password. No login, nothing. Just by proving possession of the credentials over the connection you set up, you are going to say: it's me, and here is the digital proof.

By establishing a peer-to-peer connection, we are safe from man-in-the-middle attacks. To make it work, we need some open protocols and standards. Several organizations around the world have come forward to maintain the standard ledger, abiding by certain principles and rules to ensure that identity control stays with people themselves. This factor separates sovereign ledgers from Bitcoin and Ethereum. Here, the Hyperledger community comes into the picture. Stay tuned for the next write-ups!

Tuesday 2 June 2020

Exploring Hazelcast Jet Runner


Apache Beam is an open-source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open-source Beam SDKs, users can build a program that defines the pipeline. The Beam Pipeline Runners then translate the data processing pipeline the user defined with their Beam program into the API compatible with the distributed processing back-end of their choice.
A Beam Runner runs a Beam pipeline on a specific (often distributed) data processing system. Available runners are listed below:
  • DirectRunner: Runs locally on your machine.
  • ApexRunner: Runs on Apache Apex.
  • FlinkRunner: Runs on Apache Flink.
  • SparkRunner: Runs on Apache Spark.
  • DataflowRunner: Runs on Google Cloud Dataflow.
  • GearpumpRunner: Runs on Apache Gearpump (incubating).
  • SamzaRunner: Runs on Apache Samza.
  • NemoRunner: Runs on Apache Nemo.
  • JetRunner: Runs on Hazelcast Jet.

The vision of Beam is to support end users who want to write pipelines in the language of their choice, SDK writers who wish to unleash the power of Beam through new languages, and finally runner writers who have a distributed processing environment and want to support Beam pipelines.

The Hazelcast Jet Runner is one such runner; it executes Beam pipelines on Hazelcast Jet. It allows the user to write modern Java code that focuses purely on data transformation, while Jet does all the heavy lifting of getting the data flowing and the computation running across a cluster of nodes. It supports both bounded (batch) and unbounded (streaming) data.
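
To make that concrete, here is a minimal word-count style sketch (assuming the beam-sdks-java-core and beam-runners-jet dependencies are on the classpath; the input strings and output path are placeholders). The runner is chosen purely through the pipeline options, which is exactly the portability the Beam model promises:

    import java.util.Arrays;

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.FlatMapElements;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.TypeDescriptors;

    public class WordCountSketch {
        public static void main(String[] args) {
            // Run with --runner=JetRunner to execute on Hazelcast Jet,
            // or --runner=DirectRunner to test locally on your machine.
            Pipeline pipeline = Pipeline.create(
                    PipelineOptionsFactory.fromArgs(args).withValidation().create());

            pipeline
                .apply(Create.of("beam on jet", "jet runs beam"))
                .apply(FlatMapElements.into(TypeDescriptors.strings())
                        .via((String line) -> Arrays.asList(line.split(" "))))
                .apply(Count.perElement())
                .apply(MapElements.into(TypeDescriptors.strings())
                        .via((KV<String, Long> wc) -> wc.getKey() + ": " + wc.getValue()))
                .apply(TextIO.write().to("word-counts"));

            pipeline.run().waitUntilFinish();
        }
    }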

As part of my coursework, I decided to specialize in the distributed systems track. After completing Distributed Systems, I enrolled in the Advanced Distributed Systems course as well, which gave me an interesting opportunity to develop a streaming query that analyses data from the Linear Road Benchmark; I deployed that query on a Flink cluster.

This was the initial spark that triggered my interest in data streaming, and I continued to explore Apache Flink, Apache Spark, Apache Samza, and their runner support for Apache Beam. While diving deeper into Beam pipeline runners, the conference talks on the Apache Flink runner for Beam and the Samza portable runner for Beam gave me an architectural insight into Beam portable runners. Recently I worked on these distributed computing projects and gained some hands-on experience with basic data streaming modules. I have also developed a blackboard system to implement strict, loose, and eventual consistency models as part of my distributed systems coursework.

I wanted to explore the field of distributed systems further. Finally, I found this interesting DAG-based distributed computing Java library, Hazelcast Jet with its Beam Jet Runner, for building fault-tolerant and elastic data processing pipelines that can distribute DAG tasks across cores and nodes to run in parallel. One other interesting feature of Jet is its use of application-level cooperative threads, which enable efficient parallelism without the overhead of context switching between OS-level threads. Thus, Jet guarantees high-end performance with no external planning requirement.

Identity Management

Identity Management is a very significant term. We use usernames/passwords to identify ourselves online, and identity management is in the hands of organizations like Google, Facebook, etc., which results in oversharing of information. A practical example: anyone over 18 is eligible to vote, but the voter ID not only reveals the age but also overshares information like date of birth, address, etc. Yeah! Oversharing of information! This is an interesting case to solve. As a one-stop solution, Self Sovereign Identity comes into the picture. The Decentralized Identifier (DID) acts as the identifier for a verifiable digital identity. The holder of this DID has complete control over their data rather than having it held by some organization. It helps us overcome the problem of information oversharing with its exciting properties like decentralization, self-sovereignty, and interoperability. Cryptography enables us to use Zero-Knowledge Proofs to identify ourselves with ease, with the help of ledger technologies like blockchain. Oh! This is super exciting to explore!! This summer, I am one among the very few people selected by The Linux Foundation to work on the Hyperledger projects!

Sunday 6 October 2019

Understanding Chmod 777

One of the most common errors students face day to day is:

       "You do not have the permissions to upload file to the folder"

As most of you are aware, chmod 777 <directory_name> is the command that gives you permission to read/write/execute the contents of the folder, so this article is about the number 777. Yes, what does the number 777 in chmod actually mean?

Each of the three digits in the mode is an octal number built from 3 permission bits. By default, 0 means no permission is granted. Granting read permission adds 4 to the digit, write permission adds 2, and execute permission adds 1. So read + write + execute together is 4 + 2 + 1 = 7.

Now, the immediate follow-up question is: why three 7s?

The first digit (7) grants permissions to the owner.
The second digit (7) grants permissions to the group.
The third digit (7) grants permissions to others (everyone else).
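
A quick way to see this in action (the folder name is just a placeholder):

    $ mkdir my_folder
    $ chmod 777 my_folder    # drwxrwxrwx -> owner, group, and others can all read/write/execute
    $ chmod 755 my_folder    # drwxr-xr-x -> owner rwx (7), group r-x (5), others r-x (5)
    $ ls -ld my_folder       # prints the permission string so you can verify the change

For the "cannot upload to the folder" error above, a scoped 755 or 775 is usually a safer fix than a blanket 777.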