
Aug 3, 2025

AWS S3 Access Points

This post delves into AWS S3 Access Points, highlighting how they simplify managing data access at scale by providing dedicated access policies per application. Learn how Access Points streamline S3 permissions, enhance security with granular controls, and support services like AWS PrivateLink for secure connectivity. Discover best practices for implementing and leveraging S3 Access Points for efficient and secure data lake management on AWS, crucial for modern cloud architectures.

Jul 24, 2025

Scala Notes

Explore fundamental Scala programming concepts, including its functional and object-oriented nature, immutable and mutable variables, type inference, and basic syntax for defining functions and classes. This guide introduces key Scala features to developers familiar with other programming languages who want to understand its core principles and get started with Scala development. Discover the power and expressiveness of Scala through this concise overview of its building blocks.

Jul 24, 2025

Scala Collections

Explores the powerful Scala Collections library, detailing immutable and mutable collection types like Lists, Sets, and Maps, along with their common operations for efficient data manipulation. Understand the benefits of immutability and the flexibility of mutable collections in Scala for building robust applications. This guide highlights essential tools for any Scala developer working with structured data.

Jun 8, 2025

Azure DevOps pipeline to deploy Elasticsearch

This guide provides a comprehensive walkthrough for deploying an Elasticsearch application on an Azure Virtual Machine using an automated Azure DevOps pipeline. The process is broken down into four main parts: Azure VM Setup, Azure DevOps Pipeline Setup, Troubleshooting and Optimisation, and Security Recommendations.

Jun 6, 2025

UV is better than Pyenv for Python

UV is an excellent alternative to Pyenv, though the two tools serve slightly different purposes. I have been using Pyenv for more than 10 years. Is it time for an alternative? Note that UV doesn't support Python 2.x.

May 1, 2025

LangChain for AWS Bedrock

This second part describes how to integrate LangChain with AWS Bedrock to build AI applications. It covers the implementation of AWS Bedrock with Amazon Titan and Claude models, as well as key LangChain components, including prompt templates, embeddings, memory, and chains. Code examples demonstrate everything from basic model invocation to creating conversational agents with memory, perfect for developers building production AI solutions.

📚 The first part explained the LLM basics of AWS Bedrock.

Apr 27, 2025

AWS Bedrock Foundation Models

Dive into the essentials of Large Language Models (LLMs) and Foundation Models (FMs) on Amazon Web Services (AWS). This guide explores leveraging AWS Bedrock and related services for building and interacting with powerful generative AI models.

Learn about key concepts, including prompt engineering, fine-tuning techniques such as prompt-based learning and domain adaptation, and managing inference parameters such as Temperature, Top K and Top P. Discover how to use the AWS FM APIs. This post provides the foundational knowledge required to get started with LLMs on AWS.

📚 The second part describes how to use LangChain with AWS Bedrock.

Apr 26, 2025

Maven Proxy handling

Here are the common challenges when working with development tools like Maven and VSCode behind a corporate proxy within a WSL 2 environment.

My computer is currently behind a corporate proxy. As a Java programmer using Maven 3 in a WSL 2 Ubuntu 20.04 Linux environment, I had to set the proxy in settings.xml under the /home/user/.m2 folder. Even with the proxy set, it doesn't work as expected: mvn compile complains about unresolved Scala dependencies. Another problem is that VSCode doesn't show IntelliSense, even though Microsoft's Java extension pack is fully installed.

Feb 28, 2025

Lua filters for Pandoc

Lua filters used with Pandoc 3.6.3. This post has solutions for:

Creating a glossary for an EPUB 3 book
GitHub-style alerts

Feb 26, 2025

AWS PITR Explained

PITR stands for Point-in-Time Recovery, which is a feature offered by several AWS services to provide continuous data protection and the ability to restore data to a specific point in time.

Dec 30, 2024

Encrypting in Dockerfile and Decrypting in Python

This approach allows you to encrypt sensitive data (like a database password) during Docker build and decrypt it safely at runtime in your Python application.

Nov 3, 2024

Spark - create database and table

This is a short note on creating a Hive metastore using Spark 3.3.1.

May 11, 2024

Semantic search with ELSER in Elasticsearch

Elastic Learned Sparse EncodeR (ELSER) is a retrieval model trained by Elastic that enables you to perform semantic search to retrieve more relevant search results.

👉 In this post, I use Docker to demonstrate the Linux-optimised ELSER v2 model. The Elasticsearch version is 8.11.1.

Sep 29, 2023

Elasticsearch Introduction

Learn Elasticsearch from zero to hero with this comprehensive guide covering installation, CRUD operations, mapping, and advanced search techniques.

Jul 18, 2023

Kafka PySpark streaming example

Architecture of the streaming application

The diagram shows that the Kafka producer reads from Wikimedia and writes to the Kafka topic. The Spark Kafka consumer then pulls the data from the Kafka topic and writes the stream batches to disk.

Jul 8, 2023

Terraform for_each iteration

This post explains Terraform's for_each looping technique. In this example, three buckets are created to demonstrate the looping idea.

Jun 13, 2023

Spark to create a table in AWS Redshift

In this post, Spark reads the data from a CSV file into a DataFrame and saves that DataFrame as a Redshift table.

Jun 9, 2023

Spark Kafka Docker Configuration

This is the continuation of [Spark Streaming Basics](/apache%20spark/2023/06/09/Spark-Streaming-part-1.html), which explained a basic streaming example that runs on a single AWS Glue container. There, the stream producer was Netcat and the sink was a text file. In this post, the stream producer is still Netcat, but the sink is Kafka. Both Kafka and Spark run in Docker containers.

Jun 9, 2023

Spark Streaming Basics

This is a very basic example created to explain Spark Streaming. Spark runs locally in an AWS Glue container.

Apr 28, 2023

Introduction to Lambda Calculus

This is a short description of lambda calculus. Lambda calculus is often described as the smallest programming language: it has only variables, single-argument function definitions and function application, which works by variable substitution. Haskell is a functional programming language based on lambda calculus, which I will explore. I have already explained how to use VSCode for Haskell development to run the code listed here.
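The post's own code is in Haskell. As an illustration only, the idea that functions alone are enough can be sketched with Python lambdas using Church numerals, where a number n is encoded as a function that applies `f` to `x` n times. The names `zero`, `succ`, `add`, `mul` and `to_int` are hypothetical helpers, not taken from the post.

```python
# Church numerals: a numeral n takes f and x and applies f to x n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))                 # one more application of f
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))     # apply f n times, then m times
mul = lambda m: lambda n: lambda f: m(n(f))                     # repeat "apply f n times" m times


def to_int(n):
    """Convert a Church numeral to a Python int by counting applications of f."""
    return n(lambda k: k + 1)(0)


one = succ(zero)
two = succ(one)
three = add(one)(two)

print(to_int(three))
print(to_int(mul(two)(three)))
```

Every value here is a function, and "computation" is nothing more than substituting arguments into function bodies, which is the core of lambda calculus.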

Feb 26, 2023

Python Parameter passing

Discusses the common ways of passing parameters to Python functions.
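A minimal sketch of the main parameter kinds Python supports; it is not the post's code, and the functions `describe` and `add3` are hypothetical. It shows positional-or-keyword parameters, defaults, `*args`, keyword-only parameters, `**kwargs`, positional-only parameters (Python 3.8+) and argument unpacking at the call site.

```python
def describe(a, b=2, *args, c, d=4, **kwargs):
    """Collect every kind of parameter into a dict for inspection."""
    # a: positional-or-keyword; b: positional-or-keyword with a default
    # *args: any extra positional arguments, packed into a tuple
    # c, d: keyword-only because they follow *args; d has a default
    # **kwargs: any extra keyword arguments, packed into a dict
    return {"a": a, "b": b, "args": args, "c": c, "d": d, "kwargs": kwargs}


def add3(x, y, /, z):
    # Parameters before "/" are positional-only (Python 3.8+)
    return x + y + z


print(describe(1, c=3))
print(describe(1, 5, 6, 7, c=3, e=9))
print(describe(*[1, 5], **{"c": 3, "d": 0}))  # unpacking a list and a dict at the call site
print(add3(1, 2, z=3))
```

Calling `add3(x=1, y=2, z=3)` raises a `TypeError`, because `x` and `y` can only be passed by position.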

Feb 25, 2023

Python Data Classes

Python data classes using collections.namedtuple, typing.NamedTuple and the newer @dataclass decorator.
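A short side-by-side sketch of the three approaches, using a hypothetical `Point` record. The two named-tuple variants produce immutable tuples, while a `@dataclass` produces a regular class that is mutable unless declared with `frozen=True`.

```python
from collections import namedtuple
from dataclasses import dataclass, field
from typing import NamedTuple

# 1. collections.namedtuple: field names only, no type hints
PointNT = namedtuple("PointNT", ["x", "y"])


# 2. typing.NamedTuple: class syntax with type hints and defaults; still an immutable tuple
class PointTyped(NamedTuple):
    x: float
    y: float = 0.0


# 3. @dataclass: generates __init__, __repr__ and __eq__ for an ordinary class
@dataclass
class PointDC:
    x: float
    y: float = 0.0
    tags: list = field(default_factory=list)  # mutable defaults need default_factory


p1 = PointNT(1, 2)
p2 = PointTyped(1.0)
p3 = PointDC(1.0, 2.0)
p3.x = 5.0  # allowed: dataclass instances are mutable unless frozen=True
print(p1, p2, p3)
```

Assigning `p1.x = 5` or `p2.x = 5` would raise an `AttributeError`, since named tuples cannot be modified after creation.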

Dec 3, 2022

Scala - S3 bucket operations

How to list and upload S3 bucket contents using Scala.

Dec 3, 2022

Scala - AWS EMR Serverless

AWS EMR Serverless is a cost-effective AWS service to which you can submit Spark Scala jobs.

Nov 19, 2022

AWS CI/CD pipeline to Copy files to S3 bucket

Sometimes it is necessary to copy files to AWS S3 via CI/CD build pipelines.

Nov 19, 2022

Notes on Introduction to Advanced Bash Usage

These are my hands-on notes from working through the YouTube talk and its associated presentation. It is recommended to go through the basics first. You can also refer to the Bash Ref Manual for more information.

Aug 13, 2022

Pandas type conversion

Sometimes we need to remove unnecessary data and store columns in the right format in Pandas DataFrames.

Aug 13, 2022

AWS Glue run locally

This post explains how to create an AWS Glue container¹ to develop PySpark scripts locally. I’ve already explained how to run Glue locally in Glue Development using Jupyter.

¹ Develop and test AWS Glue version 3.0 jobs locally using a Docker container

Jul 24, 2022

Access AWS SSM via AWS Step Functions

Configuration is available throughout the pipeline if it is held in the AWS Step Functions state. Generally, however, configuration should be stored in the SSM Parameter Store. So how can you access the SSM Parameter Store from AWS Step Functions?

Jul 11, 2022

Glue Development using Jupyter

Developing and testing Glue jobs in the VSCode IDE is one of the best development options, because Jupyter on its own doesn’t provide IDE features. In this post, I set up a Glue Docker instance on EC2 and use the VSCode Jupyter notebook feature to develop Glue jobs. If you want to build your own customised Docker image, please see AWS Glue run locally.

Apr 8, 2022

AWS CFN - Create IGW and NAT

In this post, let’s see how to create an Internet Gateway (IGW) and a NAT Gateway using CloudFormation (CFN).

This post is a continuation of the AWS CFN - Create VPC and subnets.

Apr 1, 2022

AWS CFN - Create VPC and subnets

This is a basic example of creating an AWS VPC and its subnets using AWS CloudFormation (CFN). In the next post, I discuss AWS CFN - Create IGW and NAT.

Mar 20, 2022

Spark to consume Kafka Stream

A simple PySpark example showing how to consume a Kafka stream (based on the given Kafka tutorial).

Mar 10, 2022

Kubernetes API

Let’s see how to experiment with K8s on macOS using Minikube. Some of the topics are very basic, such as how to create a namespace and a pod in it, shell into the pod, and then delete the pod and the namespace. However, the post also addresses concepts such as ConfigMaps, Secrets, resource sharing and Helm charts.

Mar 4, 2022

RegEx on MacOS

Regular expressions (RegExs) are very useful in everyday work. Most of the following RegExs can be run in the macOS terminal, where command-line tools such as `grep` and `sed` get much of their value from RegExs. In addition, I've used some popular tools, referenced in the footnotes, to explain complex operations later in the document.

Mar 2, 2022

PySpark Date Example

An example of converting dates stored as strings to the date type in PySpark, and of using PySpark SQL functions such as `datediff` to calculate differences in days.

Feb 25, 2022

Python Sequences

Basic operations on Python lists and tuples are discussed here.
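A small sketch of typical list and tuple operations; the variable names are illustrative only. Lists are mutable and can be changed in place, while tuples are immutable but share the common sequence operations such as indexing, slicing, concatenation, `len` and `in`.

```python
# Lists are mutable sequences
fruits = ["apple", "banana"]
fruits.append("cherry")      # add to the end
fruits.insert(0, "kiwi")     # insert at index 0
fruits.remove("banana")      # remove the first matching value
last = fruits.pop()          # remove and return the last item

# Tuples are immutable sequences; they support indexing, slicing and unpacking
point = (3, 4, 5)
x, *rest = point             # extended unpacking: x gets 3, rest gets the remainder as a list

try:
    point[0] = 0             # item assignment is not allowed on a tuple
except TypeError as exc:
    print("tuples are immutable:", exc)

# Operations shared by all sequences
combined = tuple(fruits) + point   # concatenation
print(fruits, last, x, rest, combined[1:3], len(combined), "kiwi" in fruits)
```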

Oct 23, 2021

PySpark Data Frame to Pie Chart

I am sharing a Jupyter notebook.

Jun 28, 2021

Jenkins in Docker Container

This is the source code to create a Jenkins Docker container.

Apr 1, 2021

Java Annotations

Annotations are metadata that can be retained at the Java source, class file or runtime level.

Mar 27, 2021

Understand JPMS

The Java Platform Module System (JPMS) was introduced in Java 9, which divided the JDK into 90 modules. This is a simple example created using IntelliJ IDEA.

As shown in the above diagram, there are three modules, Application, Service and Provider.

Download source

Mar 23, 2021

Java Thread interrupt

It is important to understand how Java thread interrupts work.

| Source | Target | Action |
|---|---|---|
| New | Runnable | `Thread.start()` is called. |
| Runnable | Blocked | The thread waits to acquire a `synchronized` lock. |
| Runnable | Waiting | The thread calls `Object.wait()`. |
| Runnable | Timed-Waiting | The thread calls `Thread.sleep(...)`. |
| Runnable | Terminated | The thread finishes. |

Mar 16, 2021

Use of default and static methods

Default methods were added to maintain backward compatibility: they allow older implementing classes (without modification) to work with a new version of an interface.

Since Java 9, interfaces can have private methods and private static methods. These methods support code reuse at the interface level.

Mar 15, 2021

Java Nested Classes

Classes can be defined inside other classes to encapsulate logic and constrain the context of use.

Jan 20, 2021

Normalization

E. F. Codd proposed the first three normal forms: 1NF, 2NF and 3NF (1970-1971). A revised definition (1974) was given by R. F. Boyce and Codd, known as the Boyce-Codd Normal Form (BCNF, sometimes called 3.5NF), to distinguish it from the older definition of the third normal form. R. Fagin introduced 4NF (1977), 5NF (1979) and DKNF (1981). 2NF, 3NF and BCNF are defined in terms of functional dependencies, whereas 4NF and 5NF are based on the concepts of multivalued dependencies and join dependencies respectively.

Jan 3, 2021

MapReduce

A well-known pain point of Hadoop MapReduce is writing too much Java code for a simple job. This post explains how to use the MRJob package in Python to write and run a MapReduce job over movie ratings.
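MRJob hides the plumbing, but the map, shuffle and reduce phases it wraps can be shown in plain Python. This is not the MRJob API: it is a hypothetical single-process sketch that counts how many times each rating value appears, assuming MovieLens-style tab-separated lines (user_id, movie_id, rating, timestamp). In an MRJob job, the class's `mapper` and `reducer` methods play these roles, and Hadoop (or MRJob's local runner) performs the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Assumed input format: "user_id<TAB>movie_id<TAB>rating<TAB>timestamp"
    _user_id, _movie_id, rating, _timestamp = line.split("\t")
    yield rating, 1  # emit (rating, 1) for every review


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    return key, sum(values)  # total number of reviews with this rating


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(k, v) for k, v in sorted(grouped.items()))


sample = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "22\t377\t1\t878887116",
]
print(run_job(sample))
```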

Jan 1, 2021

HDFS Basics

After installing the Hortonworks sandbox, you can visit http://localhost:50070 to find information about the HDFS cluster. The YARN job manager can be accessed via http://localhost:8088.

Nov 21, 2020

Java Future

Java Futures are a way to support asynchronous operations. Learn the basics of Java 9 Parallelism before reading this post.

Nov 21, 2020

Java Concurrent CompletableFuture

`CompletableFuture` was introduced in JDK 8 (2014). It is an abstraction built on `java.util.concurrent`. Learn the basics of Java 9 Parallelism before reading this post.

Nov 14, 2020

Spring boot property and profile management

Spring Boot property and profile management is explained.

Nov 14, 2020

Spring boot CLI

This is a short explanation of how to use the Spring Boot CLI to create and run a project on macOS.

Nov 5, 2020

GitHub API

The GitHub API is hypermedia-based. This is an elementary post introducing how to interact with the GitHub API using the curl and jq tools.

Nov 1, 2020

Quick sort in Python

Quick sort's best-case and average running time is \(O(n\log n)\).

To learn more about Python generators, see python fun.
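A compact sketch of quick sort in Python, not necessarily the post's implementation. It uses the simple out-of-place variant with list comprehensions and picks the middle element as the pivot (an assumption; other pivot choices are common). Each level partitions the input in linear time, and balanced splits give about \(\log n\) levels, hence \(O(n\log n)\); consistently unbalanced splits degrade to \(O(n^2)\).

```python
from typing import List


def quick_sort(items: List[int]) -> List[int]:
    """Return a new sorted list; the input list is not modified."""
    if len(items) <= 1:
        return items[:]
    pivot = items[len(items) // 2]              # middle element as pivot (assumed choice)
    less = [x for x in items if x < pivot]      # partition into three groups
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return quick_sort(less) + equal + quick_sort(greater)


print(quick_sort([3, 6, 1, 8, 1, 2]))
```

This version trades extra memory for readability; an in-place partition scheme such as Lomuto's or Hoare's avoids the intermediate lists.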

Oct 31, 2020

Selection sort in Python

Selection sort has a high running time of \(O(n^2)\).
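A minimal Python sketch of selection sort (not necessarily the post's version). The nested loops make the quadratic cost visible: the inner loop scans the unsorted remainder on every pass, giving \(n(n-1)/2\) comparisons regardless of the input order.

```python
from typing import List


def selection_sort(items: List[int]) -> List[int]:
    """Return a sorted copy by repeatedly selecting the smallest remaining element."""
    a = list(items)
    n = len(a)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):       # scan the unsorted part for the minimum
            if a[j] < a[min_idx]:
                min_idx = j
        a[i], a[min_idx] = a[min_idx], a[i]  # move it to the end of the sorted part
    return a


print(selection_sort([64, 25, 12, 22, 11]))
```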

Oct 29, 2020

Binary search in Python

Binary search is one of the most fundamental algorithms.

I explain procedural and functional implementations of the binary search algorithm.
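A sketch of both styles on a sorted list; the function names are hypothetical and this is not claimed to be the post's code. The procedural version narrows a `[lo, hi]` window in a loop, while the functional version expresses the same narrowing as recursion. Both return the index of `target`, or `None` if it is absent, in \(O(\log n)\) steps.

```python
from typing import Optional, Sequence


def binary_search_iter(items: Sequence[int], target: int) -> Optional[int]:
    """Procedural style: shrink the search window in a loop."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return None


def binary_search_rec(items: Sequence[int], target: int,
                      lo: int = 0, hi: Optional[int] = None) -> Optional[int]:
    """Functional style: the same window expressed through recursive calls."""
    if hi is None:
        hi = len(items) - 1
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    if items[mid] == target:
        return mid
    if items[mid] < target:
        return binary_search_rec(items, target, mid + 1, hi)
    return binary_search_rec(items, target, lo, mid - 1)


data = [1, 3, 5, 7, 9, 11]
print(binary_search_iter(data, 7), binary_search_rec(data, 7))
```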

Oct 20, 2020

Python run on containers

I have already explained Website hosted as a container. This post explains how to host a Flask web application.

Oct 3, 2020

Apache Spark begins with PySpark

PySpark is one of the most popular ways of using Spark. This post covers the basics of Spark SQL with DataFrames.

Oct 3, 2020

Website hosted as a container

This is a very short tutorial showing how to quickly create a web server using a Docker container. Docker must be installed on your machine as a prerequisite.

Sep 26, 2020

First step to AWS CDK

This is my first step in using AWS CDK on macOS. I use the Pyenv tool to create a Python environment, as explained in Python my workflow.

Here is a simple example created using AWS CDK.

I followed the AWS CDK Python workshop.

Sep 17, 2020

Jul 13, 2020

Mac keyboard shortcut to copy file path as markdown

How to create a macOS Automator Quick Action that copies a file or folder path as Markdown using shortcut keys, the same way you copy the path name of a file or folder.
