Trip to Japan. Part 1: Preparing for the trip.

The journey through Tokyo and Yokohama is here. The journey through Osaka and Kyoto is here.

    For many reasons I feel drawn to many facets of Japanese culture, so traveling to Japan had always been a goal of mine. However, I had never seriously set out to organize a trip. Several friends have traveled to Japan and recommended it to me, which made me realize that it is not as unattainable or complicated as it seems at first. After wondering what to do with two weeks of vacation, Briana suggested going to Japan, and that is how we started the preparations.

    Here I will recount the events of the trip, as a sort of memoir. I will also try to give advice to people who plan to travel to Japan in the future. Japan is an extremely organized country and, as such, it requires both its inhabitants and its visitors to be organized in order to fit into its society. The train system (which I will discuss at length later) is a good example of this, since using it properly means juggling a multitude of schedules and lines. Such a long list of preparations may seem intimidating at first, but every item helped us make the most of our time and enjoy the trip more. The trip would be relatively long and we planned to visit many places and do many things, so we had to prepare months in advance. We did certain things that were vital to pulling this off:

  • I wrote a program that finds flights matching certain criteria (see the sketch right after this list). It took me quite a while to complete, but it was truly worth it, since it lets me find cheap flights almost as soon as they go on sale. Briana also spent a lot of time monitoring flight-deal sites, which taught us to tell an expensive flight from a cheap one and, in turn, gave me a good idea of which parameters to feed my program. The program found a cheap flight from LAX to Narita (on All Nippon Airways), so we looked for another cheap flight from Guadalajara to Los Angeles (on Volaris).
  • Our past trip to New York taught me that wheeled suitcases are impractical if you are going to walk a lot, so we bought backpacking backpacks, specifically the High Sierra Lighting 35. All Nippon Airways let us take them as carry-on luggage, but Volaris did not. We bought this model because it looked practical and not too big, and because it was relatively easy to find in stores in Guadalajara. Although they were a bit expensive, they turned out to be very useful and we hope to use them on many future trips.
  • We bought a JR Pass. Basically, a JR Pass is a ticket for free, unlimited access to all of Japan Rail's means of transportation. It was extremely useful and highly recommendable, since it saved us a lot of money, especially on long trips. Given that Japan Rail is the dominant company in Japan, the pass can be used in the vast majority of local train stations, as well as on certain buses, the Shinkansen, the Narita Express, and even a ferry. Curiously, it is impossible to buy a JR Pass inside Japan, and Japan Rail has no official store, so you have to look for good prices among the travel agencies that specialize in this. We found the best price at Japan Experience, which ships the JR Pass voucher via FedEx. This way of selling it struck me as very impractical, since you must receive a physical voucher and later exchange it for the actual JR Pass at one of the JR stations in Japan, when it would be much simpler to email some kind of code. The reason for this is to make sure that only foreigners use the JR Pass. Japan Experience also sent, free of charge, a little book with advice on how to get around on Japanese trains and which places to visit. That book turned out to be unexpectedly useful and practical.
  • Since we did not want to stay in Tokyo for the whole two weeks of the trip, we used AirBnB to book two apartments: one in Tokyo (in Shibuya, specifically) and another in Osaka. Both provided two amenities that proved very useful: a pocket wi-fi and a washing machine. The washing machine was vital because it allowed us to travel with a small amount of clothing and wash it as needed. The pocket wi-fi kept us connected to the Internet even while we were out of the apartment, which was vital for looking up information and not getting lost. Very few places in Japan offer free, open Internet access. The two times we tried to connect to McDonald's wi-fi, it did not work. Starbucks requires an online registration process before you even arrive at the restaurant, which makes it almost impossible to use in an emergency.
  • I downloaded subway maps of Tokyo, Osaka, and Kyoto. This was a very good idea, because when we landed at Narita airport we were quite lost and did not know how to get to Shibuya station. One warning, though: I only downloaded one map per city, and it turns out that each city has several completely independent train systems (even if they share some stations), so you need several maps per city. I will talk about this in more detail later.
  • We looked for power adapters, but they were not necessary. Japan uses 100v, so most devices that run on 110v also work there without adapters.
  • We bought yen in advance. This was also a good idea, because we had some trouble withdrawing money from ATMs. Globo Exchange is one of the few places that sell yen in Mexico, so we made a reservation and bought them the very day the trip started.
  • My Garmin GPS, which had been so useful until now, turned out to be totally useless in Japan. Although it could pinpoint my location correctly, it had no urban maps of Japan, and therefore it could not plot routes or give exact directions. I would strongly recommend carrying a GPS that does not rely on an Internet connection, but first make sure it has detailed, up-to-date maps of the place you are traveling to. I later discovered that my GPS has no maps of Asia because I bought it in Mexico; so far I have only seen it carry maps of cities in the Americas. Because of this, we used Google Maps on our smartphones to find our way, keeping them connected to the apartment's pocket wi-fi.
  • I debated for a long time whether to bring a laptop. In the end I brought my old Toshiba netbook and, although it was not essential, it did help: I was able to back up the hundreds of photos we took, and it let us browse and review documents we needed.
  • I bought a Nikon Coolpix S9900 camera. Besides being very small, this camera has many interesting options and produces images of far higher quality than almost any current phone. During the trip I saw many tourists using their phones to take photos, and to some extent that is acceptable, but Japan has such beautiful landscapes that a low-quality lens simply cannot do them justice.
  • I studied Japanese for about two years. To be clear, when I started studying I had no plans to travel to Japan, but even though my Japanese is very basic, it proved very useful on the trip, since very few people in Japan speak English (let alone Spanish). I will talk about this in more detail later.
  • I asked for advice. There is nothing better than getting advice from someone who has already made a similar trip; that is precisely why I am writing this. Avena (who has lived in Japan for several years) and Alejandra (who traveled there recently) gave me very good advice.
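
    For the curious, here is a minimal sketch of what a flight-watcher like mine could look like in Python. It uses the requests library against a hypothetical fare-search endpoint; the URL, the response fields, and the price ceiling are all illustrative assumptions, not my actual program.

```python
# Minimal flight-watcher sketch. The API endpoint and its JSON shape
# are hypothetical; a real implementation would target whatever fare
# source you have access to.
import time
import requests

API_URL = "https://api.example-fares.com/search"  # hypothetical endpoint

def find_cheap_flights(origin, destination, max_price_usd):
    """Query the (hypothetical) fare API and keep fares under our ceiling."""
    response = requests.get(API_URL, params={
        "origin": origin,            # e.g. "LAX"
        "destination": destination,  # e.g. "NRT" (Narita)
        "currency": "USD",
    })
    response.raise_for_status()
    fares = response.json().get("fares", [])
    return [f for f in fares if f["price"] <= max_price_usd]

if __name__ == "__main__":
    # Poll a few times a day; cheap fares sell out quickly.
    while True:
        for fare in find_cheap_flights("LAX", "NRT", max_price_usd=600):
            print(fare["airline"], fare["price"], fare["departure_date"])
        time.sleep(6 * 60 * 60)  # check again in six hours
```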

 

Arriving in Japan.

    Tokyo has two main airports: Narita and Haneda. The LAX-Narita flight was operated by All Nippon Airways although, oddly, the ticket was purchased through United Airlines. The flight from LAX to Narita took about 11 hours, which were made a little less tiring by the airline's excellent service. I can confirm that All Nippon Airways' economy class is comparable to, and in many respects better than, United Airlines' first class. During the flight we received two quite good full meals and plenty of snacks and drinks, all included in the ticket price. Every seat on the Boeing 777 we flew has a screen for watching movies, documentaries, and other videos, as well as a map of the route updated in real time. The movie selection was surprisingly good and recent for economy class, and you can also watch tourism videos about Japan to make last-minute plans.

    Once we arrived at Narita, we went through customs, found a route to Shibuya, and took more than an hour to cover it; the real challenge was finding the apartment we had booked. Even though the apartment was less than two blocks from the station, we had no way of knowing it, since my GPS could not give us directions and addresses in Japan follow a completely incomprehensible format. With no way to get online and use Google Maps, we had to ask a taxi driver for help. He very kindly helped us despite the difficulty of understanding each other and, what is more, he refused to accept money for his help. This was the first time we experienced the famous Japanese kindness, and it would not be the last.


Alan Verdugo / 2016/11/23 / Uncategorized / 1 Comment

Comptia Linux+ certification

    I recently completed the Comptia Linux+ certification. I spent much more time on this than I had hoped, and because of that, I wanted to write about it. After all, this is the reason why I did not update this blog as frequently as I wanted.

    First of all, let me tell you about the basic stuff. I chose this particular certification as my first one because I am very interested in Linux and everything related to open source. Also, this particular certification has a 3-for-1 offer: if you complete the certification requirements, you not only get the Comptia Linux+ certification, you also get the LPIC-1 and SUSE CLA certifications. Alas, after September 1st, 2016, SUSE decided to stop participating in this offer, so now it is actually a 2-for-1 offer, which is still pretty good in my opinion.

    In order to get the certification, you need to pass two exams: LX0-103 and LX0-104. Currently, each attempt at each exam costs $194 USD. Each exam consists of 60 questions that you must answer within a 90-minute period. To pass an exam, you need a minimum of 500 points (on a scale of 200 to 800). I am still not sure how the individual questions are weighted.

Preparing for the exams.

    The only material I used for studying was the “Comptia Linux+ Powered by Linux Professional Institute Study Guide: Exam LX0-103 and Exam LX0-104, 3rd Edition” book. Its name alone should tell you how long and boring it is to read (like most technical books). However, it is the tool that allowed me to get certified, so it does deliver what it promises and I would recommend it. The book also includes a discount code for the exams and access to a website where you can study using flashcards and a practice exam.

    I admit I did not study consistently: there were days when I read the book for a couple of hours, and then I would not pick it up again until weeks later because I just did not have the time. I know for a fact that proper discipline and a regular study schedule with this same book will result in better grades on the exams. Still, I read the book three times from beginning to end. It was boring and painful, and I got sick of reading the same thing over and over again (I had committed to not reading any other book until I got the certification), but it was worth it in the end.

Taking the exams.

    Once you have paid for and scheduled your exam, you just need to go to the PearsonVue center you selected. You only need to take a couple of official IDs with you. The lady who helped me was very kind and made sure to explain the whole process clearly. She asked for my IDs, verified that my signature and picture matched, and then took another picture of me. All of this is just to ensure that nobody else can take the exam and claim it was you. So, if you were thinking of asking a friend to go and take your certification test for you, it will simply not work. Security is very tight, and I think that is a good thing. I was given a set of rules and asked to agree to them. The rules basically say that you will not cheat and will not help other people cheat (which is practically impossible anyway).

    After that, I was given a key and told to put all my things in a drawer. You are not allowed to sit at the computer with your cellphone, jacket, keys, notebooks, or anything else that could be used to cheat. I was given a marker and a small whiteboard, which I was supposed to use as a notepad if needed.

    As for the actual questions, some are multiple choice, some are what I like to call multiple-multiple-choice (“choose the 3 correct answers from 5 options”), and in some you have to actually type the answer into a text box. I think 90 minutes is much more time than is actually needed for 60 questions, since you either know the answer right away or you don't; in both cases you only need a few seconds per question. I used my extra time to re-read the questions and reconsider the answers I chose, since some of them can be very tricky.

    Once you finish the exam, you are given your grade, so you know right away whether you passed. The only “feedback” you receive is the list of exam objectives you failed; you never learn which questions you answered incorrectly or why. If you failed a question related to network routing (for example), your results sheet will show a message saying that “network routing” is one of the exam objectives you failed. And that's it. Of course, this is done to further ensure that you do not spread information about the questions or answers after taking the exam.

Lessons learned.

    I spent several months studying for the exams. Actually, I spent so much time studying that the original exams (LX0-101 and LX0-102) were updated to new versions, which forced me to start over with new study materials because the exams' objectives had also been updated. In the future I will try to complete certifications faster to avoid this. The SUSE CLA offer was removed just after I scheduled my second exam, but before I actually took it, so I lost that opportunity as well, just because I wanted more time to study. This is just another example of how quickly technology advances: you can literally watch projects become outdated in a matter of days. If you want to stay current, you need to move fast, and that is something not a lot of people can or want to do.

    Would I do this again? Yes, I would. Maybe not this year or even the next, but I think certifications are valuable, not just because of the title on your CV, but because they show that you are willing to undertake a challenge, prepare for it, and actually achieve it, learning new tricks along the way. Maybe Comptia Linux+ and LPIC-1 are not as famous as the RedHat certifications, and even though I passed both exams on my first try, they were much harder than I expected; because of that, I think they deserve to be taken more seriously by employers and recruiters. I considered myself an advanced Linux user with professional experience as a system administrator, but I was still able, and required, to learn many new things in order to get the certification. For that fact alone, I think it was worth it.


Alan Verdugo / 2016/09/21 / Uncategorized / 0 Comments

Web syndication: The most useful thing nobody uses.

    In Alan Moore's seminal comic book Watchmen, the retired superhero Ozymandias (considered the smartest man on the planet), now an extremely successful businessman, is seen watching a wall of television sets, each tuned to a different channel. He does this in order to absorb as much information as possible in a reduced amount of time. He uses that information and his intellect to help his businesses grow.

    Watchmen is set during the Cold War era. At that time, television was the best instrument of mass communication, so Ozymandias was right to use it. We now live in the 21st century, however, and our broadcast instruments (both physical and logical) are vastly superior to a wall of TVs. Yet it is evident that not many of us are using them to their full potential, even when we are shown every day that information is indeed power. We relinquish some of that power every time we use an unorganized mechanism of information consumption.

    In a world that is more connected than ever, we are surrounded by constant updates from an increasing number of sources. We are pressured to keep up with an ever-increasing amount of information, yet we have an ever-decreasing amount of time to do it. Web syndication mechanisms arose to solve problems like this one, and they are actually very effective at it. However, their usage has been relatively low and it keeps declining.

    The most popular web syndication mechanism is RSS[1]. Reading an RSS feed is like reading the front page of a very organized newspaper: a newspaper that updates itself every minute.

    As an avid user of RSS feeds, I can start my browser and catch up on all the important news in just a couple of minutes. More importantly, I can do it again every few hours to see if anything new has happened, and that will only take me another minute. This is a great boost to my productivity, since I can still be aware of everything that is happening in a fraction of the time it used to take before I started using feeds. Before using feeds, I browsed through news pages aimlessly, wasting hundreds of hours every month. Now I still get the same content, but in far less time.
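
    As an illustration, here is a minimal sketch of this “front page” routine in Python, using the feedparser library; the feed URLs are just examples of what a subscription list could contain.

```python
# Print the newest headlines from a list of feeds, newspaper-style.
# Requires: pip install feedparser. The feed URLs are examples only.
import feedparser

FEEDS = [
    "https://www.reddit.com/r/linux/.rss",
    "https://hnrss.org/frontpage",
]

for url in FEEDS:
    feed = feedparser.parse(url)
    print("==", feed.feed.get("title", url), "==")
    for entry in feed.entries[:5]:   # the five newest headlines
        print(" -", entry.title, "->", entry.link)
```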

    There was some controversy when Firefox 4 launched and the RSS button was suddenly removed from the default layout. According to Mozilla, the reason behind that decision was that only 3% of users actually clicked it[2], one of the lowest usage rates among the main UI elements in the whole browser. This proves the point: web syndication is one of the most useful mechanisms on the Internet, but only a waning minority takes advantage of it.

Heatmap showing the usage rate of the RSS button in Firefox, based on over 117,000 Windows 7 and Vista Test Pilot submissions from 7 days in July 2010. Source: https://heatmap.mozillalabs.com/

    Google Chrome plainly offers no native support for feeds; you must install an extension in order to use them. In 2013, Google discontinued Google Reader, the most popular RSS client at the time. Twitter stopped supporting RSS in 2013. Apple removed support for RSS in Safari and Mail when OS X Mountain Lion launched in July 2012; from then on, users are directed to the Mac App Store, where they can buy an RSS reader[3]. Worst of all, the number of web designers who add an RSS button to their designs is decreasing, as is the number of back-end programmers who implement an RSS feed in the page to begin with.

    No wonder the usage of Web syndication is declining. It was low to begin with, and the tech giants are making it harder for people to use it or even to know that it exists.

    But why have all these companies forsaken RSS? After all, RSS is a very useful feature that is easy to implement, and it does not require many resources. In the case of Google Reader, it was said that product usage had declined, a similar case to the removal of the RSS button in Firefox. Apple, however, went a step further and removed a feature that was already working, causing many problems for its users in the process. RSS in OS X was not broken, but Apple decided to “fix it” by simply removing it. In other words, RSS was not hurting anybody, but they decided it was time for it to go.

RSS usage statistics. Source: http://trends.builtwith.com/feeds/RSS

    Without RSS, I would have to go to each news page and look for the news, which I have found to be very distracting. With RSS, the news comes to me and waits there to be read. I can choose when to visit a site to read further. I can preview a site's content and decide beforehand whether I really want to visit it.

    And this raises the question: are RSS/Atom feeds bad for a website? After all, if I have to actually go to a site to read the headlines, it is much more likely that I will read more articles, click on banners, and spend more time browsing the site, which directly or indirectly increases the site's income. In this sense, for a webmaster, feed reading is like window-shopping, when the user could instead enter the store and be subjected to a much more complete marketing experience. This seems to be the case: Facebook and Twitter have both reduced or completely removed support for RSS feeds. They obviously prefer their users to spend more time on their advertisement-plagued main sites instead of just getting plain-text updates via a feed. Window-shopping is bad business, while a complete “shopping” experience has proven much more profitable. Just ask Starbucks.

References:

  1. http://trends.builtwith.com/feeds
  2. https://heatmap.mozillalabs.com/
  3. https://en.wikipedia.org/wiki/OS_X_Mountain_Lion#Dropped_and_changed_features

Alan Verdugo / 2016/06/19 / Uncategorized / 0 Comments

Introduction to Apache Spark

   Spark is an open source cluster computing framework widely known for being extremely fast. It was started by AMPLab at UC Berkeley in 2009 and is now an Apache top-level project. Spark can run on its own or, for example, on Hadoop or Mesos, and it can access data from diverse sources, including HDFS, Cassandra, HBase, and Hive. Spark shares some characteristics with Apache Hadoop, but there are important differences: Spark was developed to overcome the limitations of Hadoop's MapReduce with regard to iterative algorithms and interactive data analysis.

Logistic regression in Hadoop and Spark.

   From the very beginning, Spark showed great potential. Soon after its creation, it was already proving to be ten or twenty times faster than MapReduce for certain jobs. It is often said that it can be up to 100 times faster, and this has been proven many times. For that reason, it is now widely used in areas where analysis is fundamental, like retail, astronomy, biomedicine, physics, marketing, and of course, IT. Thanks to this, Spark has become synonymous with a new term, “fast data”: the capability to process large amounts of data as fast as possible. Let's not forget Spark's motto and raison d'être: “Lightning-fast cluster computing”.

   Spark can efficiently scale up and down using minimal resources, and developers enjoy a more concise API, which helps them be more productive. Spark supports Scala, Java, Python, and R, and it offers interactive shells for Scala and Python.

Components:

  • Spark Core and Resilient Distributed Datasets. As its name implies, Spark Core contains the basic Spark functionality: task scheduling, memory management, fault recovery, interaction with storage systems, and more. Resilient Distributed Datasets (RDDs) are Spark's main programming abstraction: logical collections of data partitioned across nodes. RDDs can be seen as the “base unit” in Spark. Generally, RDDs reside in memory and only use the disk as a last resort, which is why Spark jobs are usually much faster than MapReduce jobs (see the sketch after this list).
  • Spark SQL. This is how Spark interacts with structured data (like SQL or HQL). Shark was an earlier effort at this, later abandoned in favor of Spark SQL. Spark SQL allows developers to intermix SQL with any of Spark's supported programming languages.
  • Spark Streaming. Streaming, in the context of big data, is not to be confused with video or audio streaming, even if they are similar concepts. Spark Streaming handles any kind of data, not just video or audio feeds: it is used to process data that keeps arriving with no particular end (like a stream), for example tweets or logs from production web servers.
  • MLlib. A machine learning library containing common machine learning algorithms such as classification, regression, clustering, and collaborative filtering. All of these algorithms are designed to scale out across the cluster.
  • GraphX. Apache Spark's API for graphs and graph-parallel computation; it comes with a variety of graph algorithms.
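
   To make the RDD idea concrete, here is a minimal PySpark sketch (assuming a local Spark installation); the numbers and operations are arbitrary:

```python
# A small PySpark sketch of the RDD abstraction described above.
# Transformations build a lineage lazily; nothing runs until an
# action such as count() or take() is called.
from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-example")

# An RDD partitioned across the (local) cluster.
numbers = sc.parallelize(list(range(1, 1001)), numSlices=4)

# Transformations are lazy and composable.
squares = numbers.map(lambda n: n * n)
evens = squares.filter(lambda n: n % 2 == 0)

evens.cache()  # keep it in memory: this is where Spark's speed comes from

print(evens.count())   # action: triggers the actual computation
print(evens.take(5))   # reuses the cached result, no recomputation

sc.stop()
```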


   Spark can recover failed nodes by recomputing the Directed Acyclic Graph (DAG) of the RDDs, and it also supports a recovery method using “checkpoints”. This is a clever way of guaranteeing fault tolerance while minimizing network I/O. RDDs achieve this through lineage: if an RDD is lost, it has enough information about how it was derived from other RDDs (i.e., the DAGs are “replayed”) to be rebuilt easily. This works better than fetching data from disk every time, so fault tolerance is achieved without replication.
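
   A tiny sketch of what this looks like from the PySpark API, again assuming a local setup; the checkpoint directory is just an example path:

```python
# Lineage is visible via toDebugString(), and checkpoint() truncates
# it by saving the RDD to reliable storage.
from pyspark import SparkContext

sc = SparkContext("local[*]", "lineage-example")
sc.setCheckpointDir("/tmp/spark-checkpoints")  # example path

rdd = sc.parallelize(range(100)).map(lambda n: n * 2).filter(lambda n: n > 10)
print(rdd.toDebugString())  # the recorded lineage: this RDD's parents

rdd.checkpoint()  # cut the lineage; recovery now reads the checkpoint
rdd.count()       # an action materializes the checkpoint
sc.stop()
```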

Spark usage metrics:

   In late December 2014, Typesafe ran a survey about Spark and noticed a “hockey-stick-like” growth in its use[1], with many people already using Spark in production or planning to do so soon. The survey reached 2,136 technology professionals. These are some of its conclusions:

  • 13% of respondents currently use Spark in production, while 31% are evaluating Spark and 20% planned to use it in 2015.
  • 78% of Spark users hope to use the tool to solve issues with fast batch processing of large data sets.
  • Low awareness and/or experience is currently the biggest barrier for users implementing Spark effectively.
  • Top 3 industries represented: Telecoms, Banks, Retail.
Typesafe survey results snapshot.

   So, is Spark better than Hadoop? That is a very difficult question to answer. The topic is frequently discussed, and the only conclusion is that there is no clear winner. Both technologies offer different advantages, and they can be used alongside each other perfectly well. That is why the Apache foundation has not merged the two projects, and this will likely not happen, at least not anytime soon. Hadoop's reputation was cemented as the big data poster child, and with good reason. Because of that, with every new project that emerges, people wonder how it relates to Hadoop. Is it a complement to Hadoop? A competitor? An enabler? Something that could leverage Hadoop's capabilities? All of the above?

Comparison of Spark's stack and alternatives.

   As you can see in the table above, Spark offers a lot of functionality out of the box. If you wanted to build an environment with the same capabilities, you would need to install, configure, and maintain several projects at the same time. This is one of Spark's great advantages: a full-fledged data engine ready to work out of the box. Of course, many of the projects are interchangeable; for example, you could easily use HDFS for storage instead of Tachyon, or YARN instead of Mesos. The fact that all of these are open source projects gives them a lot of versatility, which means users have many options available: they can have their cake and eat it. For example, if you are used to programming in Pig and want to use it on Spark, a new project called Spork (you've got to love the name) was created so you can do just that. Hive, Hue, Mahout, and many other tools from the Hadoop ecosystem already work, or soon will work, with Spark.[2]

   Let's say you want to build a cluster, and you want it to be cheap. Since Spark uses memory heavily, and RAM is relatively expensive, one could think that Hadoop MapReduce is cheaper, since MapReduce relies more on disk space than on RAM. However, the potentially more expensive Spark cluster could finish the job faster precisely because it uses RAM heavily, so you could end up paying for a few hours of usage of the Spark cluster instead of days for the Hadoop cluster. The official Spark FAQ page describes how Spark was used to sort 100 TB of data three times faster than Hadoop MapReduce on 1/10th of the machines, winning the 2014 Daytona GraySort benchmark[3].
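
   As a back-of-the-envelope illustration of that trade-off (with entirely invented prices and runtimes), the arithmetic could look like this:

```python
# Back-of-the-envelope cluster cost comparison. All numbers here are
# invented for illustration; real cloud prices and runtimes vary widely.
mapreduce_hourly = 8.0    # cheaper disk-heavy nodes, USD/hour (assumed)
spark_hourly = 14.0       # pricier RAM-heavy nodes, USD/hour (assumed)

mapreduce_hours = 48      # the job runs for two days (assumed)
spark_hours = 5           # the same job, finished in hours (assumed)

print("MapReduce:", mapreduce_hourly * mapreduce_hours)  # 384.0
print("Spark:    ", spark_hourly * spark_hours)          # 70.0
```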

   If you have specific needs (like running machine learning algorithms, for example), you may decide for one technology or the other. It all really depends on what you need to do and how you are paying for resources. It is basically a case-by-case decision. Neither Spark nor Hadoop is a silver bullet; in fact, I would say there are no silver bullets in big data yet. For example, while Spark has streaming capabilities, Apache Storm is generally considered better at streaming.

Real-world use-cases:

   There are many ingenious and useful examples of Spark in the wild. Let’s talk about some of them.

   IBM has been working with NASA and the SETI Institute using Spark in order to analyze 100 million radio events detected over several years. This analysis could lead to the discovery of intelligent extraterrestrial life. [4] [5] [6]

   The analytic capabilities of Spark are also being used to identify suspicious vehicles mentioned in AMBER alerts. Basically, video feeds are fed into a Spark cluster using Spark Streaming, then processed with OpenCV for image recognition and MLlib for machine learning; together, they identify the model and color of cars, which in turn could help find missing children.[4] Spark's speed is crucial here: huge amounts of live data need to be processed as quickly as possible, and continually, i.e., processed as the data is being collected, hence the use of Spark Streaming. [7]
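
   Here is a minimal sketch of what the streaming half of such a pipeline looks like in PySpark. I am substituting lines of text over a socket for the video feed, and a trivial keyword filter for the OpenCV/MLlib stages, so everything beyond the Spark Streaming API itself is an illustrative assumption.

```python
# Minimal Spark Streaming (DStream) sketch: micro-batches of text
# lines stand in for video frames; the filter stands in for the
# real image-recognition and machine-learning stages.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "streaming-example")
ssc = StreamingContext(sc, batchDuration=5)  # 5-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)  # stand-in data source
alerts = lines.filter(lambda line: "red sedan" in line.lower())
alerts.pprint()  # in a real pipeline this would feed MLlib, not print

ssc.start()
ssc.awaitTermination()
```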

   Warren Buffet created an application where social network analysis is performed in order to predict stock trends. Based on this, the user gets recommendations from the application about when and how to buy, sell, or hold stocks. It is obvious that a lot of people would be interested in suggestions like these, especially when they are drawn from live, real data like tweets. All this is accomplished with Spark Streaming and MLlib.[4]

   Of course, there is also a long list of companies using Spark for their day-to-day analytics: the “Powered by Spark” page lists important names like Yahoo, eBay, Amazon, NASA, Nokia, IBM Almaden, UC Berkeley, and TripAdvisor, among many others.

   Take, for example, mapping Twitter activity based on streaming data. In the video below you can see a Spark Notebook consuming the Twitter stream, filtering the tweets that have geospatial information, and plotting them on a map that narrows the view to the minimal bounding box enclosing the last batch's tweets. It is very easy to imagine how streaming technologies and the Internet of Things will end up working together: all the data generated by IoT devices will need to be processed, and streaming tools like Spark Streaming and/or Apache Storm will be there to do the job.

Conclusion:

   Spark was designed in a very intelligent way. Since it is newer, its architects applied the lessons learned from other projects (mainly from Hadoop). The emergence of the Internet of Things is already producing a constant flow of large amounts of data. There will be a need to gather that data, process it, and draw conclusions from it. Spark can do all of this, and do it blindingly fast.

   IBM has shown a tremendous amount of interest in, and commitment to, Spark. For example, they founded the Spark Technology Center in San Francisco, enabled a Spark-as-a-Service model on Bluemix, and organized Spark hackathons[8]. They also committed to train more than 1 million data scientists, and donated SystemML (a machine learning technology) to further advance Spark's development[9]. That does not happen unless an initiative has support at the highest levels of the company. In fact, they have called Spark “potentially, the most significant open source project of the next decade”. [10]

   All this heralds a bright future for Spark and its related projects. It is hard to predict how the project will evolve, but the impact it has already had on the big data ecosystem is something to take very seriously.

References:

[1] http://www.slideshare.net/Typesafe_Inc/sneak-preview-apache-spark

[2] http://es.slideshare.net/sbaltagi/spark-or-hadoop-is-it-an-eitheror-proposition-by-slim-baltagi

[3] https://spark.apache.org/faq.html

[4] http://www.spark.tc/projects/

[5] http://blog.ibmjstart.net/2015/07/14/seti-sparks-machine-learning-to-sift-big-data/

[6] http://blog.ibmjstart.net/2015/08/06/types-of-bigdata-from-the-allen-telescope-array/

[7] https://github.com/hackspark/Amber-Alert-Aid

[8] http://blog.ibmjstart.net/2015/06/29/why-is-ibm-involved-with-apache-spark/

[9] http://www.ibm.com/analytics/us/en/technology/spark/

[10] https://www-03.ibm.com/press/us/en/pressrelease/47107.wss


Alan Verdugo / 2016/02/09 / Uncategorized / 0 Comments

Introduction to Raspberry Pi, OpenELEC and RetroPie

    Being such a good boy finally paid off, and last Christmas Santa Briana gave me a Raspberry Pi 2 Model B. Not only that, but she went the extra mile and got me an Ultimate kit from Canakit, which includes many extra toys to play with. This is easily one of the best gifts I have ever received, and as soon as I got it I knew I would get more Raspberries sooner or later.

    In case you have been living under a rock on Mars, I'll briefly explain what a Raspberry is and why they are so popular with the hacker/maker/DIY scene. A Raspberry (besides a fruit) is basically just a tiny computer, about the size of a credit card. It does not even come inside a case (you have to buy extra accessories separately, or buy a custom kit). Due to its small form factor, it does not have a lot of processing power, but this brings advantages like a low price and low power consumption, all of which makes it perfect for education, electronics, clustering, or embedded projects.

Yes, that is the whole thing.

    After thinking for a couple of minutes about what to do with it, I chose two initial projects: OpenELEC and RetroPie. The Ultimate kit comes with an 8GB micro-SD card pre-installed with “NOOBS” (New Out Of the Box Software), a utility that installs Raspbian onto that same micro-SD card so you are ready to play with it. One of the best things about Raspberries is that you can switch micro-SD cards to instantly have a new OS or set of tools. So, I went ahead and bought two 32GB micro-SD cards.


    Raspbian is an operating system based on Debian, optimized for use on a Raspberry. Together, Raspbian and a Raspberry provide more than enough resources for people who want to learn Linux, programming, or just basic computer usage at a very low cost. I could see this being distributed to young kids as a way to encourage their education; in fact, this was the vision of the Raspberry's creators. I know I would have killed to have access to something like this when I was younger. Raspbian does need some basic setup for things like configuring the wifi, so you may need a keyboard, a mouse, and an ethernet cable; after that you can connect to the Raspberry remotely. I was lucky to have a wireless keyboard that Avena brought from Japan, so I used that.


    OpenELEC (Open Embedded Linux Entertainment Center) is a beautiful attempt at creating a free, open source, and very complete media center. It uses Kodi (formerly known as XBMC), another very mature open source project, to do the heavy lifting. It is not exclusive to Raspberries (although it offers an image optimized for them), so it can be installed on any hardware you no longer use, turning it into a full-fledged entertainment center. I was very impressed with the results, and I think it even surpasses things like Netflix, AppleTV, or Google's Chromecast. It has many features, like support for a wide variety of video, audio, and image formats, subtitle support (it can even connect to the Internet and look for subtitles for the movies you are watching), auto-updates, and many things I have not discovered yet.


    RetroPie is an open source project aimed specifically at converting a Raspberry into the ultimate emulation machine. In the same way OpenELEC uses Kodi, RetroPie uses EmulationStation, which is, as you may have guessed by now, another open source project. You basically download a Debian-based image that includes pre-installed emulators for old games, install it onto a micro-SD card, add your ROMs, and you are good to go. A lot of effort has been put into this project, and it shows in how easy it is to use: after maybe 30 minutes I was already playing my favourite old games with a controller on my HD screen. Since this project is designed specifically for standardized Raspberry hardware, any compatibility problems were solved long ago. The majority of the games play flawlessly, maybe even better than on PCs; so far I have only had problems with a few N64 games and a few hacked SNES ROMs. I am eager to test it with Dreamcast games and see if the tiny Raspberry can handle them, although I do not expect any problems, since it can natively run 3D games like Quake 3 Arena without any drop in fps.

    My old wired Acteck USB gamepad was recognized instantly. However, I wanted to play wirelessly, so I got a TRENDnet Bluetooth adapter (TBW-107UB) and configured a Dualshock 3 to work with it. Now I can just turn on my Raspberry, turn on my TV, choose a game, and start playing. The only thing I regret is getting only a 32GB micro-SD card; I should have gotten a 64GB or even a 128GB card (I got around 14,000 ROMs just for the SNES, and they barely fit on my card). RetroPie was created by people who, like me, grew up with old videogames and wanted to enjoy them again as they were meant to be played, on a TV and with a gamepad, without needing a high-end PC for it. Those people and their passion have made this a very polished and complete project which, for me, makes the Raspberry pay for itself instantly.

Conclusion:

    Raspberries were a game-changer. The unexpected approach of selling low-cost, low-resource hardware has proven very successful, and many have tried to replicate it. If you think about it, this is the opposite of the high-specs, high-price, closed-source model that companies like Apple have tried to shove down our throats for years. And this is good: some parts of the world and some economic sectors just need an easy and cheap way into computer literacy. In order to learn, people (especially kids) need a proving ground they can break without worrying about wasting hundreds of dollars, and what better way than to leverage open source projects for that. As a side effect, the FLOSS community (and with it, all of the IT industry) has been improved thanks to the Raspberry project, which in turn has spawned many software and hardware projects of its own. All of the related projects (from Linux to Kodi, OpenELEC, RetroPie, EmulationStation, and the Raspberry itself) have astounding quality and deserve any and all contributions. As for me, I will think about what to do next; Raspberries have so much potential that it is hard to decide what to do with them.


Alan Verdugo / 2016/01/08 / Uncategorized / 0 Comments