Review of Latent Dirichlet Allocation (A Clustering Algorithm)


Once again I would like to thank all my readers, wherever you are. Since today is the last day of October, I took a quick look at this month’s stats, and the result is quite pleasing. This month has the highest number of views and visitors ever, and the trend has been going up from month to month. It would not be possible without you guys, so I hope you enjoy my writing on this blog.

This week’s post is about a clustering algorithm called Latent Dirichlet Allocation (LDA). When I first read about this algorithm, I thought it was just another clustering algorithm like K-Means, which I wrote about several posts ago. Boy, was I wrong to underestimate this algorithm.

Continue reading “Review of Latent Dirichlet Allocation (A Clustering Algorithm)”

Spark GraphX: The Fast Map Reduce for Graph


First of all, I would like to apologize for writing this week’s post later than normal; I usually post at the beginning of each week. A small project kept me from doing so last weekend. But this post should be quite interesting.

This post is dedicated to Spark GraphX, Spark’s library for handling graphs. In my previous post I wrote about graph databases. Spark GraphX is not a graph database; it is a library for handling graph data, similar to TinkerPop. The graph data can be stored in HDFS or in NoSQL databases, or anywhere else as long as Spark can read the data and run operations on it.

Normally the graph data is stored in two entities (tables in a database or files in HDFS). The first one contains all the vertices, or nodes. The second one contains the edges that connect the vertices.
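To make the two-entity layout concrete, here is a minimal sketch in plain Python. It does not use Spark or GraphX; the sample vertices, edges, and the `describe` helper are made up for illustration only.

```python
# Hypothetical sample data, not from any real dataset.
# Vertex "table": one row per node, as (vertex_id, name).
vertices = [
    (1, "alice"),
    (2, "bob"),
    (3, "carol"),
]

# Edge "table": one row per connection, as (source_id, destination_id, label).
edges = [
    (1, 2, "follows"),
    (3, 2, "follows"),
    (2, 1, "follows"),
]

# Look up vertex names by id so edges can be joined back to the vertices.
names = dict(vertices)


def describe(edge):
    """Turn an edge row into a readable sentence using the vertex table."""
    src, dst, label = edge
    return f"{names[src]} {label} {names[dst]}"


described = [describe(e) for e in edges]
for line in described:
    print(line)
```

The edge rows only carry ids, so the vertex table is what gives them meaning. That is the same join a graph library would do when it builds a graph from the two sources.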

Spark GraphX provides methods to create graphs, to run computations on them, and so on. A famous use case for graph computation is finding influencers in a network of people. Many companies want to identify these influencers in order to approach them to promote their products, or sometimes to bash their competitors.
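As a rough illustration of the influencer idea, the sketch below ranks people by how many followers point at them (their in-degree). This is plain Python rather than the GraphX API, the sample network is made up, and in-degree is only one simple proxy for influence; an algorithm such as PageRank would be a more refined choice.

```python
from collections import Counter

# Hypothetical "follows" edges as (follower, followed); the names are made up.
edges = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("dave", "bob"),
    ("bob", "alice"),
    ("dave", "alice"),
]


def rank_by_followers(edge_list):
    """Count incoming edges per person and sort from most to least followed."""
    in_degree = Counter(followed for _, followed in edge_list)
    return in_degree.most_common()


ranking = rank_by_followers(edges)
for person, followers in ranking:
    print(person, followers)
```

People who are never followed (carol and dave here) do not appear in the ranking at all, which is usually fine when the goal is to surface only the top candidates.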


Let’s Go With Go


A few posts ago, I wrote about Python as an alternative language for Big Data solutions. That is quite an obvious choice, since Python is a mature language and technology. Almost all big data solutions, if not all of them already, can be written in Python, be it Spark, Storm, MapReduce, etc. Python also has an active and vibrant community that has produced data science libraries like scikit-learn, pandas, etc.

OK, I should stop talking about Python. This post is about introducing a new, much younger language called Go. It was introduced by Google in 2009. Although it is a relatively new language compared to Python, Scala, or Java, Go has gained huge popularity in recent years.

Continue reading “Let’s Go With Go”

Story of ELK (Elasticsearch, Logstash, and Kibana)


I would like to thank my readers for taking the time to visit this blog. It’s been a good month, with high traffic and new followers joining in. So welcome, and I hope you enjoy the blog.

In the past few days I have been reading about ELK. It is not new to me; I had read about it before, but I had never tried all three of them together. Before, I only considered Elasticsearch the star of the show because of its search capabilities. But in the past few days I really learned all three and found that they can be useful together.

Continue reading “Story of ELK (Elasticsearch, Logstash, and Kibana)”

So What Is Data Analytics, Anyway?



OK, we have finally arrived at a post in Indonesian. From time to time I do write a post in Indonesian, to please the majority of this blog’s readers, who are from Indonesia.

Today’s post will not go into much technical detail; it is more of a general overview of data analytics. I actually wanted to render the term in Indonesian as “analisis data”, but it sounds a bit awkward, and keeping the English term should make it easier to study data analytics further, given that most sources on data analytics are in English.

Data analytics is defined as the activity of examining and inspecting raw data in order to reach accurate conclusions based on the data that has been collected. Those conclusions depend heavily on the question or problem we want answered from the collected data.

Continue reading “So What Is Data Analytics, Anyway?”