📝 Research: https://ojitha.blogspot.com.au for my lengthy articles.

Spark Dataset APIs

November 7, 2025
Scala Functors
A comprehensive technical guide to the Apache Spark Dataset API, which it defines as a distributed collection that provides type safety while benefiting from the performance optimisations of the Catalyst Optimiser. It explains key internal mechanisms such as Encoders, which manage serialisation between domain-specific JVM objects and Spark’s internal binary format, and uses the MovieLens dataset to illustrate conceptual data entities. The text analyses fundamental transformations, including narrow functional transformations such as map and flatMap, and contrasts the standard, untyped join with the type-safe joinWith operation. The guide also highlights significant performance considerations for wide transformations, noting that groupByKey requires a full data shuffle and lacks the map-side combine optimisation available in the standard DataFrame groupBy. Finally, it examines a physical query plan to show how Adaptive Query Execution (AQE) dynamically optimises resource usage by adjusting partition sizes based on runtime statistics.
More…

Functional Programming Abstractions in Scala

October 31, 2025
Scala functional fundamentals

Master the foundation of modern Scala development by exploring five essential functional programming abstractions. This guide takes a deep dive into algebraic structures, starting with Semigroup and Monoid for combining values. It progresses to type constructors, explaining how Functors transform wrapped data, Applicatives combine independent contexts, and Monads sequence dependent computations. By understanding these core patterns, developers can write more polymorphic, composable, and algebraically sound Scala code that works across diverse data types.
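
As a taste of the first two abstractions, here is a minimal, library-free sketch of Semigroup and Monoid. The trait definitions, the `MonoidSketch` object, and the `combineAll` helper are illustrative names assumed for this example; they are not taken from the article or from any particular library.

```scala
// Illustrative definitions only (assumed for this sketch, not a library API).

// A Semigroup knows how to combine two values of the same type.
trait Semigroup[A] {
  def combine(x: A, y: A): A
}

// A Monoid is a Semigroup that also has a neutral "empty" element.
trait Monoid[A] extends Semigroup[A] {
  def empty: A
}

object MonoidSketch {
  // Integer addition: 0 is the neutral element.
  implicit val intAddition: Monoid[Int] = new Monoid[Int] {
    def combine(x: Int, y: Int): Int = x + y
    def empty: Int = 0
  }

  // String concatenation: the empty string is the neutral element.
  implicit val stringConcat: Monoid[String] = new Monoid[String] {
    def combine(x: String, y: String): String = x + y
    def empty: String = ""
  }

  // One polymorphic function that works for any type with a Monoid instance.
  def combineAll[A](values: List[A])(implicit m: Monoid[A]): A =
    values.foldLeft(m.empty)((acc, a) => m.combine(acc, a))
}
```

With `import MonoidSketch._` in scope, `combineAll(List(1, 2, 3))` adds the integers and `combineAll(List("a", "b"))` concatenates the strings, which is the kind of polymorphism the abstract refers to.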

More…

Scala 2 Collections explained

October 27, 2025
Scala Functors

Scala collections are a powerful feature providing rich data structures for working with sequences, sets, and maps. The collection hierarchy comprises three main types: Seq for ordered, indexed access, Set for unique elements, and Map for key-value pairs. Scala emphasises immutable collections by default, ensuring thread safety and referential transparency, while mutable collections enable efficient in-place modifications. Key collection types include List for linked-list operations, Vector for random access, and Range for memory-efficient numeric sequences. Understanding the distinction between immutable and mutable collections is essential for writing safe, concurrent Scala code. Iterators enable lazy evaluation, allowing large datasets to be processed one element at a time without loading them fully into memory.
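
The distinctions above can be sketched in a few lines using only the standard library. The object name `CollectionsSketch` and the sample values are assumptions made for illustration.

```scala
import scala.collection.mutable

object CollectionsSketch {
  // Immutable List: prepending returns a new list; `base` is left untouched.
  val base: List[Int] = List(2, 3)
  val extended: List[Int] = 1 :: base

  // Vector: suited to indexed (random) access.
  val letters: Vector[String] = Vector("a", "b", "c")
  val second: String = letters(1)

  // Range: described by start, end, and step rather than by stored elements.
  val evens: Range = 0 to 10 by 2

  // Mutable buffer: modified in place.
  val buffer: mutable.ArrayBuffer[Int] = mutable.ArrayBuffer(1, 2)
  buffer += 3

  // Iterator: elements are produced on demand, so taking three items
  // from an unbounded source still terminates.
  val firstSquares: List[Int] =
    Iterator.from(1).map(n => n * n).take(3).toList
}
```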

More…

Scala 2 Functors explained

October 26, 2025
Scala Functors

This comprehensive guide explores Scala 2 Functors, one of functional programming's fundamental abstractions rooted in category theory. Learn how Functors enable transforming values within computational contexts like List, Option, Either, and Future without leaving that context. The article covers mathematical foundations, including categories, objects, and morphisms, then demonstrates practical Scala implementations with concrete examples. Discover the Functor trait's map operation, understand identity and composition laws, and explore advanced concepts like contravariant functors and functor composition. Perfect for Scala developers seeking to master functional programming patterns, this guide bridges theoretical category theory with real-world Scala code examples and best practices.
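
For a flavour of what the article covers, the following is a minimal sketch of a Functor trait with instances for Option and List, plus helpers that check the identity and composition laws on sample values. The names `FunctorSketch`, `identityLaw`, and `compositionLaw` are hypothetical and chosen for this example; the article's own definitions may differ.

```scala
// Illustrative Functor type class (assumed shape, not a specific library's API).
trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

object FunctorSketch {
  implicit val optionFunctor: Functor[Option] = new Functor[Option] {
    def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa.map(f)
  }

  implicit val listFunctor: Functor[List] = new Functor[List] {
    def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
  }

  // Identity law: mapping the identity function changes nothing.
  def identityLaw[F[_], A](fa: F[A])(implicit functor: Functor[F]): Boolean =
    functor.map(fa)(a => a) == fa

  // Composition law: mapping f then g equals mapping (f andThen g) once.
  def compositionLaw[F[_], A, B, C](fa: F[A])(f: A => B)(g: B => C)(
      implicit functor: Functor[F]): Boolean =
    functor.map(functor.map(fa)(f))(g) == functor.map(fa)(f andThen g)
}
```

These helpers only spot-check the laws for the values passed in; they are not a proof that an instance is lawful.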

More…

Ontology Evals for LLMs

October 8, 2025

My previous posts on RDF Structured Data mining[1] and Apache Jena[2] covered ontologies, which provide formal, structured representations of domain knowledge that can be harnessed as evaluation frameworks (Evals) for assessing the outputs of LLMs. Using the well-known Pizza ontology[3] as a running example, this post illustrates how domain-specific ontologies can guide the evaluation of LLM-generated content such as recipe emails. The framework emphasises mapping unstructured LLM outputs into ontology-aligned structured data, applying reasoning engines to verify factual and logical coherence, and deriving quantitative and qualitative evaluation metrics.

More…