First of all, I would like to thank all my beloved readers for reading this blog. Because of you, I received another stats-boosting notification from WordPress. Once traffic reaches a high enough number, I will migrate this blog to a dedicated website with its own domain name.
Also, for my Indonesian readers: my definition of Big Data in Bahasa Indonesia has been selected by Google as its featured definition of Big Data in Bahasa Indonesia. So, thank you very much: thank you, merci, grazie, obrigado, xie xie, kansahamnida, arigatou, danke, gracias, धन्यवाद. Those are "thank you" in some of the languages my visitors come from. I want to say it in all of them, but I still cannot speak Swahili 🙂
OK, now back to the post. In the past two weeks I learned about an emerging technology in software development. It's not really new, actually; it has been around for more than three years. And yes, of course, its name is Docker.
For those who don't know, Docker is a container technology in which we can install various software. It is a kind of lightweight virtual machine. Docker has gained popularity in the past few years.
One of the things I like about Docker is Docker Hub. Docker Hub is a kind of repository that contains many Linux images pre-installed with various software. For example, you can find a Linux image with the MySQL database pre-installed. With this image you can quickly set up a Docker container without having to install all the required dependencies. We can also create our own image and put it on Docker Hub.
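As a quick sketch of what this looks like in practice (assuming Docker is installed, and using the official `mysql` image and its documented `MYSQL_ROOT_PASSWORD` environment variable):

```shell
# Download the official MySQL image from Docker Hub
docker pull mysql

# Start a container from it; the official mysql image requires
# a root password to be set via an environment variable
docker run --name mydb -e MYSQL_ROOT_PASSWORD=secret -d mysql

# Verify the container is up and running
docker ps
```

No manual installation of MySQL or its dependencies is needed; the image already contains everything.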
Docker makes it easier for us to learn, develop, and test big data applications. How? Exactly by using pre-defined images with all the required dependencies.
I looked on Docker Hub and saw that quite a few big data software packages are available there, for example Elasticsearch, Cassandra, Spark, MongoDB, and Couchbase. All of them are official images, not something someone created and just uploaded to Docker Hub.
Using this method, there is no need to set up a NoSQL database, a data analytics framework, and so on by hand. It is much faster to start creating a big data application and test it right on our PC/laptop. We can even simulate a cluster there.
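For instance, simulating a small Cassandra cluster on a single laptop might look roughly like this (a sketch assuming the official `cassandra` image and its documented `CASSANDRA_SEEDS` environment variable):

```shell
# Start the first Cassandra node; it will act as the seed
docker run -d --name cass1 cassandra

# Start a second node, telling it to join the first node
# by passing the seed's container IP address
docker run -d --name cass2 \
  -e CASSANDRA_SEEDS="$(docker inspect -f '{{ .NetworkSettings.IPAddress }}' cass1)" \
  cassandra
```

Two "nodes" of the cluster are now running as containers on the same machine, which is plenty for learning and local testing.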
One thing I could not find is an official Hadoop image on Docker Hub. Perhaps because, even dockerized, most laptops/PCs cannot cope with Hadoop.
However, there is no need to fear: the good thing about this community is that if something is not there, someone will put it there soon. And it's true. There is Ferry, which helps us create big data clusters, not only Hadoop. There is Pachyderm, an analytics tool to analyze data in a container. And Coho, to help run big data applications as microservices.
I personally would not use those three tools in production right now, not because they are bad, but because they are new and I haven't heard of many companies using them, so they are not really combat-proven enough to rely on for big data infrastructure. Another reason is that some of them are not free and open source, but that is a matter of personal preference. Nevertheless, Docker and all of those tools are good for learning big data. Hence, in the title I put "Learning" and not "Operating in Production" nor "Deploying in Production".
Hope this helps 🙂