Software Theory and Practice – Blog for Distributed Framework for Data Analytics

How to index geospatial data with Apache Spark Solr Connector and query with Solr Client

Alvin Henrick Leave a Comment 6705 Views

This post describes how to ingest geospatial data into Apache Solr for search and query. The pipeline is built with Apache Spark and the Apache Spark Solr connector. The purpose of this project is to ingest and index

Apache Spark Analytical Window Functions

Alvin Henrick 1 Comment 45297 Views

It’s been a while since I wrote a post, so here is an interesting one that will help you do some cool stuff with Spark and windowing functions. I would also like to thank my colleague Suresh for helping me

Apache Spark User Defined Functions

Alvin Henrick 1 Comment 33016 Views

I have been working with Apache Spark for a while now and would like to share some UDF tips and tricks I have learned over the past year. Below is the sample data (i.e. people.json) used to demonstrate an example of UDF

Query Nested JSON via Spark SQL

Alvin Henrick Leave a Comment 25111 Views

It’s been a while since I wrote a blog post, so here you go. I have been researching Apache Spark lately and had to query a complex nested JSON data set. I encountered some challenges and ended up learning currently the best

Docker backup and restore volume container

Alvin Henrick 2 Comments 29369 Views

This is a continuation of my previous post, where I explained how to run a Spring Boot app inside a Docker container as a daemon, using MongoDB as storage with the [/data/db] volume mounted as a Docker container volume.

Spring Boot App deployed with Docker and Data Only Container Pattern Explained.

Alvin Henrick 2 Comments 21430 Views

The more I use and learn about Docker, the more I feel like I can’t live without it. This blog is about Docker’s amazing feature, Volume Containers. I wanted to write a Spring Boot app and deploy it to the Docker

Apache Storm and Kafka Cluster with Docker

Alvin Henrick 18 Comments 63752 Views

This post is all about real-time analytics on large data sets. I am sure everyone has heard about Apache Kafka (a distributed publish-subscribe messaging broker) and Apache Storm (a distributed real-time computation system), and if you were disappointed

Hadoop (YARN) Multinode Cluster with Docker

Alvin Henrick 15 Comments 45155 Views

It’s been a while since I started planning to write a blog and share my knowledge. My wife has been trying to convince me for a very long time that I should write a tech blog because she thinks
