Obtaining the Google Analytics Individual Qualification (GAIQ)

    A few days ago I got the Google Analytics Individual Qualification. This was one of my New Year’s resolutions, so I am quite happy about it, but while I was researching and studying for it, I noticed that the available information is sparse and most of it is outdated. In this post I will explain the preparation and certification process, and I will also share my thoughts on Google Analytics itself.

    Google Analytics is a free web-based tool designed to collect, analyze and report data generated by visitors and clients of websites. It is mostly focused on E-commerce, marketing and merchandising. At first I thought it was similar to IBM’s Watson Analytics, but I soon realized the similarities end at the name and at the fact that both are web-based tools.

    To prepare for the certification exam I took two courses: Google Analytics for Beginners and Advanced Google Analytics. Both are free, easy and quite short (a few hours at most). They include sets of videos, practice questions and interactive activities that are well designed and engaging. Even for somebody not trying to get the certification, I would still recommend taking at least the first course, since it is interesting and provides a glimpse of how Google and other companies track their users and potential customers, how they use the data to design marketing campaigns, tactics and strategies, and how they come to understand the behavior of their own market. In this day and age, an E-commerce site is at a serious disadvantage if it does not understand its customers, and Google Analytics is the perfect tool to help with that. I really think that an online business that does not exploit its data’s potential could only be successful through sheer luck.

    While I am not a marketing person, I understand the potential and I am very impressed with the amount of data and insights that Google Analytics is able to deliver. The tool is easy to set up and use, even though it has so many options that it can look intimidating to beginners. It can also be linked to other Google services, like AdWords and Data Studio, to leverage the data even further.

    After studying both courses I took the certification exam and passed with flying colors. This means I went from having minimal knowledge of what Google Analytics was to being certified in a few hours scattered across three days, so you can tell it is not a hard certification if you prepare for it. As I said, this certification is not aimed at programmers or technical people; it is much more focused on the marketing side of the business. The tool itself is tailored to merchants, including conversion goals, behavior analysis and other interesting options that marketing experts will enjoy. In fact, I am still looking for more technically-oriented options, like downloading the generated datasets or connecting directly to the Google Analytics database or API in order to programmatically create more specific reports. I am not worried about not finding something like that, since Google Analytics is such a mature product that I am sure there is some way to do it, but if you are used to getting your hands dirty with code, you may feel a little disappointed and patronized by the GUI.
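    One such option is the Google Analytics Reporting API. Below is a minimal sketch of a query for sessions per country over the last week; the service-account key file and the view ID are placeholders:

```python
# Minimal sketch: querying the Google Analytics Reporting API v4 with a
# service account. The key file name and view ID are placeholders.
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    "service_account_key.json", SCOPES)
analytics = build("analyticsreporting", "v4", credentials=credentials)

# Sessions per country over the last seven days.
response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": "XXXXXXXX",  # placeholder view ID
        "dateRanges": [{"startDate": "7daysAgo", "endDate": "today"}],
        "metrics": [{"expression": "ga:sessions"}],
        "dimensions": [{"name": "ga:country"}],
    }]
}).execute()

for row in response["reports"][0]["data"].get("rows", []):
    print(row["dimensions"], row["metrics"][0]["values"])
```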

    The passing grade for the certification exam is 80%. Once that is achieved, you receive a congratulatory email and an automatically-generated diploma, but that is it. I was really hoping for an actual certificate, but all of this was free, so I will not complain. The certification is valid for 18 months; after that, the exam has to be passed again to renew it.

    I do not understand why so few companies offer free training for their own products. After all, if people learn to use them and like them, they will use them and recommend them, so it is a win-win situation. Let’s hope Google continues to provide training and certifications for Google Analytics and other products. The whole process was interesting, engaging and easy to follow; even taking the exam did not require using my webcam or other annoying measures, so I will recommend this to anybody with even a minimal interest in online business or analytics.

Books read in 2017

01 – “The life-changing magic of tidying up: The Japanese art of decluttering and organizing”, by Marie Kondo.

    During my recent trip to Japan, one of the things that surprised me most was the extremely small size of the apartments in Tokyo, but I was even more surprised by the organization required to live in them. I suspected there is something inherent in the Japanese that makes them organized and efficient enough to fit their entire lives into a few square meters. I read this book trying to identify how they manage it.

    The “KonMari” method guarantees that once your possessions are organized, your house will never be messy again. Basically, it boils down to this: get rid of useless things, and assign every item a specific place. It sounds very simple, but I noticed the method emphasizes that its followers take responsibility for buying only useful things, and that is the basis of its success. The book is quite short, quite repetitive, and reads more like a long magazine article for homemakers, so I would only recommend it to people as disorganized as I am.

 

02 – “His master’s voice”, by Stanisław Lem.

Betrayal is the result of conscious decision, but what causes us to be drawn to destruction? What black hope, in destruction, beckons man? Its utter inutility rules out any rational explanation. This hunger has been suppressed in vain by numerous civilizations. It is as irrevocably a part of us as two-leggedness. To him who seeks a reason but cannot abide any hypothesis of a design, whether in the form of Providence or of the Diabolical, there remains only the rationalist’s substitute for demonology – statistics. Thus it is from a darkened room filled with the smell of corruption that the trail leads to my mathematical anthropogenesis. With the formulae of stochastics I strove to undo the evil spell. But this, too, is only conjecture, therefore a self-defensive reflex of the mind.

    I spent years looking for this book and found it the very day I finished the previous one. The book takes its name from a record label that used that famous painting of a little dog listening, attentive and confused, to its master’s voice on a gramophone. The name is remarkably fitting, since the book tells how humans receive an extraterrestrial transmission and try to decipher the message. I have no doubt that if something like that happened, the whole human race would wear that same perplexed and innocent expression of the little dog.

    Stanisław Lem focuses on the fascinating scientific repercussions an event like this would have. Scientists from many fields are summoned to try to decipher the message, and each one offers a different explanation. In other words, the book focuses almost 100% on the science part of “science fiction”. For a moment I thought a book about first contact that did not directly involve the aliens would be boring, but I quickly realized that, ironically, the aliens themselves would be the least interesting part of such an event.

    The project’s scientists raise countless questions about their task, and, in the same way, the reader inevitably ends up asking those and many more. For example: what kind of message should humanity send to other civilizations? Who would be its author? In what language? Through what medium? How do you make a message survive an environment as hostile as space? How appropriate were the messages we have already sent?

    Sending an open letter to the entire universe is a problem as difficult as it is interesting. On the one hand, nothing can be assumed about the potential recipients of the message, since any assumption could mean that certain civilizations would be unable to interpret it. For example, sending images of any kind would be a bad idea, since there could be life forms that do not even have eyes and interact only through touch, sound, smell or other means, and even if they had eyes, what guarantees that they perceive light in the same range of the spectrum as we do? Likewise, we cannot assume they use a decimal numbering system; perhaps the recipients evolved to have eight fingers and therefore most likely developed an octal system. Or perhaps they do not have fingers at all. The message would have to use the most basic and universal elements, such as the speed of light, but that leads to another problem: how do you communicate units of measurement? The recipients would have no idea what a kilometer or a mile is.

    I could talk for days on end about this subject, which I find fascinating, but the best thing I can do is recommend the book, since Stanisław Lem handles it much better than I could.

 

03 – “The panda’s thumb”, by Stephen Jay Gould.

    This book had been recommended to me several times and I finally got hold of it. It is, in a way, similar to The selfish gene, by Richard Dawkins, which I read last year. In fact, one chapter of The panda’s thumb is devoted to criticizing Dawkins’s theory, and not very favorably. I have to say this book left me a little disappointed, perhaps because I expected too much from it. The panda’s thumb felt more like reading a school textbook than something more accessible and entertaining (like Dawkins’s book or Stephen Hawking’s), but it also never reaches Stanisław Lem’s delicious style of diving into utterly incomprehensible paragraphs. The panda’s thumb lives in that uncomfortable zone where it is perhaps too advanced for ordinary people (like me) yet somewhat simple for everyone else.

    Each chapter is a different essay on some aspect of evolution. However, the chapters feel very distant from one another, which made me feel the book lacked a central idea. There are very interesting chapters and others I would have left out, like the one centered on the almost teenage drama of certain anthropologists arguing whether certain fossils are genuine or were forged by that other anthropologist “who gave me a dirty look at that anthropology convention”.

    Fortunately, everything is redeemed in the final chapters. One of them discusses bacteria that have evolved magnetic components in their bodies in order to tell up from down. These bacteria were forced to evolve this way because they are so small that gravity has no considerable effect on them, making it impossible for them to rely on gravity to move or orient themselves.

    The last chapter discusses evidence in fossils for the variation of the Earth’s rotation speed. For example, the rings found in the minerals of certain animal fossils prove that, in the distant past, the Earth completed its rotation more frequently, which obviously has repercussions in astronomy as well as in anthropology and biology.

    Also, behold that glorious cover.

 

04 – “Me talk pretty one day”, by David Sedaris.

When asked “What do we need to learn this for?” any high-school teacher can confidently answer that, regardless of the subject, the knowledge will come in handy once the student hits middle age and starts working crossword puzzles in order to stave off the terrible loneliness.

At the end of a miserable day, instead of grieving my virtual nothing, I can always look at my loaded wastepaper basket and tell myself that if I failed, at least I took a few trees down with me.

    After the previous books, I really needed a much lighter read, a more relaxed book I could enjoy calmly, for the simple pleasure of reading. This is a collection of the author’s hilarious experiences, from his school days to his move to Paris, where he lived for a long time without speaking a single word of French. This is one of those books that cause spontaneous laughter in public places, inevitably accompanied by looks of curiosity and concern from everyone present.

 

05 – “Context: further selected essays on productivity, creativity, parenting, and politics in the 21st Century”, by Cory Doctorow.

The futurists were just plain wrong. An “information economy” can’t be based on selling information. Information technology makes copying information easier and easier. The more IT you have, the less control you have over the bits you send out into the world. It will never, ever, EVER get any harder to copy information from here on in. The information economy is about selling everything except information.

    Like all of Cory Doctorow’s books, Context can be downloaded freely from his official website. The book is a collection of very interesting essays on subjects like Digital Rights Management, electronic books, and the legality of free file sharing. In case you have not guessed it yet, Cory Doctorow is a kind of cyber-activist (something like a slightly less grumpy Richard Stallman).

    Some essays are very interesting even for people who are somewhat isolated from technology, although that is not many people since, as one of the essays itself explains, technology affects us all, whether we like it or not. I have already touched on some of these subjects on this very blog, and it was very interesting to see that someone else reached similar or even identical conclusions.

    I definitely recommend this book to anyone, if only to better understand how many companies, and even governments, play with consumers’ rights as if they were chess pieces.

 

06 – “Unikernels”, by Russell C. Pavlicek.

    This is a very short book that could be seen as an introductory essay on unikernels. A unikernel, basically, is a software package that includes only one application and a kernel. The resulting package is used as if it were a virtual machine, but with several advantages. For example, the size of a unikernel is frequently in the kilobytes. This means they are incredibly fast, which naturally translates into savings in processor time, disk space, RAM and, eventually, power consumption.

    Server technologies have always tended toward saving resources. In the beginning, bare-metal servers were used, which meant a considerable initial investment plus recurring expenses in maintenance and security. As hardware capabilities advanced, small virtual servers could be installed inside the bare-metal ones; they were called virtual machines. That is how all the “Cloud” fame began. Nowadays, the trend is to use containers, which allow applications and services to be isolated for more efficient and organized management.

    The idea behind unikernels is very similar to containers. It is no surprise, then, that Docker became interested in this new technology and is already investing a lot of money in it. Personally, I feel unikernels promise many very interesting things, but the technology is still in its early stages. Suffice it to say that you have to compile the unikernel before you can use it. Once the idea and the tools are more mature and accessible, I think they will definitely become an integral part of infrastructure. The idea I found most interesting was Transient services. Basically, the tiny size of a unikernel allows it to boot in milliseconds, which means the unikernel can start only when a request for one of its services arrives and then shut down again. This would put an end to permanently running servers that are only used sporadically, which, again, would translate into savings in many kinds of resources: hardware, money and electricity.

    The absence of other applications in a unikernel also means fewer security risks. An attacker could not exploit vulnerabilities in software residing inside the unikernel, since in general the unikernel would not even have a shell or multi-user support.

 

07 – “The girl with the dragon tattoo”, by Stieg Larsson.

    This book has the worst beginning I have ever read. Basically, the entire first chapter is two people talking about corporate loans made by the Swedish government. I had to read roughly 50% of the book before the interesting things really started to happen. The only things that kept me reading were the recommendations several people had already given me about this book. That, and the fact that I bought it more than six years ago and had never read it.

    I had the idea that this book was a cyberpunk novel, and there definitely is some hacking in the plot, but for some reason I thought it was going to be similar to Neuromancer, and it definitely is not. Basically, this is a crime and mystery novel. Mikael Blomkvist, a writer and financial investigator, is hired to look into the disappearance of the heiress of one of Sweden’s biggest industrial empires; however, the disappearance happened more than 50 years ago and nothing has been heard from her since. Eventually, Lisbeth Salander, a young prodigy hacker, joins the investigation and, together with Blomkvist, they discover many things they should not.

 

08 – “Good omens”, by Terry Pratchett and Neil Gaiman.

God does not play dice with the universe; He plays an ineffable game of His own devising, which might be compared, from the perspective of any of the other players, to being involved in an obscure and complex version of poker in a pitch-dark room, with blank cards, for infinite stakes, with a Dealer who won’t tell you the rules, and who smiles all the time.

“I don’t reckon it’s allowed, going round setting fire to people”, said Adam. “Otherwise people’d be doin’ it all the time.”

He had heard about talking to plants in the early seventies, on Radio Four, and thought it an excellent idea. Although talking is perhaps the wrong word for what Crowley did. What he did was put the fear of God into them.

If you sit down and think about it sensibly, you come up with some very funny ideas. Like: why make people inquisitive, and then put some forbidden fruit where they can see it with a big neon finger flashing on and off saying ‘THIS IS IT!’?

    The Book of Revelation is probably the most adapted and parodied story, especially by Hollywood. Here, Terry Pratchett and Neil Gaiman join forces to tell the story of the young Antichrist who, after a plot orchestrated by an angel and a demon, grows up in a perfectly normal human family, totally ignorant of his demonic nature. All this in an attempt to keep the end of the world from ever happening, since even angels and demons have grown used to the status quo of relative peace between them. But there are always individuals who try to end the world at every chance they get, so what ends up happening is a mash-up of factions: the aforementioned angels and demons, the Antichrist’s human friends, the horsemen of the apocalypse (who are now a biker gang), an ancient society of witch hunters, and a fortune teller who owns the only book that accurately predicts every event of the apocalypse.

    I read this book expecting English humor like Douglas Adams’s, but the truth is it was not as good as I hoped. Maybe my expectations were too high because of the authors’ fame. For some comedy, I would rather recommend the aforementioned Douglas Adams, or “Me talk pretty one day”, reviewed above.

 

09 – “Freakonomics”, by Steven Levitt and Stephen J. Dubner.

Information is a beacon, a cudgel, an olive branch, a deterrent – all depending on who wields it and how.

The day that a car is driven off the lot is the worst day of its life, for it instantly loses as much as a quarter of its value.

Evolution, it seems, has molded our brains so that if you stare at your own baby’s face day after day, it starts to look beautiful.

    An economist and a New York Times editor write about the various factors that influence everyday problems and back all their conclusions with data, facts and statistics. For example, they explain how the legalization of abortion unexpectedly reduced crime in the United States; the similarities between the Ku Klux Klan and real-estate agents; the finances of Chicago’s street-level drug dealers; and why a swimming pool is far more dangerous than a firearm. They also cover the influence, positive and negative, of people’s names on their personality and job prospects. The best way to summarize Freakonomics would be the old saying “numbers don’t lie”, although I would add “…but our perception of the numbers does”.

    Even though many people disagree with Levitt and Dubner’s conclusions, it is almost impossible to disagree with the figures. I have read negative reviews of this book, but they tend to focus on the ethical questions the authors raise, which many people did not like, such as the advantages of abortion I mentioned earlier. This book only increased the great interest in data analysis I have developed lately.

 

10 – “Thinking, fast and slow”, by Daniel Kahneman.

This is your System 1 talking. Slow down and let your System 2 take control.

For some of our most important beliefs we have no evidence at all, except that people we love and trust hold these beliefs. Considering how little we know, the confidence we have in our beliefs is preposterous – and it is also essential.

Nothing in life is as important as you think it is when you are thinking about it.

    The author of this book won the Nobel Prize in economics, and the book itself has won several awards. I started reading it without really knowing what it was about, but I quickly realized that, in a way, it is closely related to Freakonomics.

    Kahneman proposes that the human mind uses two systems to make decisions. System 1 is fast but error-prone, and is associated with words like “instinct”, “intuition” or “sixth sense”. System 2 is slow and analytical, and generally engages only when it is time to make very important decisions.

    If Freakonomics analyzed hard data and the evidence derived from data analysis, Thinking, fast and slow uses that evidence to explain human behavior, especially when that behavior is mistaken, counterproductive or simply delusional. Practically every page presents the results of some psychological test demonstrating how unreliable System 1 can be, but also how this is balanced by a speed of reaction that is simply not possible with System 2.

    Just like The panda’s thumb, this book felt somewhat heavy to read because of its more scientific nature. I am definitely not used to reading popular-science books (something I am trying to correct), but that does not mean either of these two books is bad; in fact, they strike me as two excellent options for getting into these subjects.

 

11 – “Flowers for Algernon”, by Daniel Keyes.

Solitude gives me a chance to read and think, and now that the memories are coming through again – to rediscover my past, to find out who and what I really am. If anything should go wrong, I’ll have at least that.

Whatever happens to me, I will have lived a thousand normal lives by what I might add to others not yet born. That’s enough.

Who’s to say that my light is better than your darkness? Who’s to say that death is better than your darkness? Who am I to say?…

    Charlie Gordon has an IQ of 68, is 32 years old, and has the mental age of a small child. Charlie is selected to be the first human test subject of an experimental operation that dramatically increases intelligence. The only other test subject has been Algernon, a laboratory mouse whose intelligence is now comparable to a human’s. This book won the 1966 Nebula Award, and the short story it was based on won the 1960 Hugo Award. It is written as a series of progress reports that Charlie keeps to measure his progress before and after the operation.

    I must admit that before reading this book, I felt deep sadness whenever I saw a person with any kind of mental disability; now, however, I understand that it is very likely they do not mind, or that they may even be much happier than someone without their condition. More intelligence definitely does not necessarily bring more happiness, and the author forces us to debate whether ignorance and innocence are preferable for certain people. He examines the extreme fragility of the human mind and the mental and social problems that fragility entails. At the time it was written, this book clearly belonged to science fiction, but today techniques like CRISPR make us seriously confront ethical decisions that could eventually affect people like Charlie. On the other hand, this was an extremely distressing, frustrating and terrifying book, because it made me face one of my worst fears: wondering what would happen to me if I suffered an accelerated decline in intelligence and lost control of, and contact with, reality. This book made me appreciate even more the little my mind has given me, and that is why I highly recommend it.

Correlations between ruling political parties and journalist assassinations

Abstract

    This research uses a dataset provided by the Committee to Protect Journalists to analyze the number of journalists’ deaths in Mexico and the US from 1992 to 2016. It compares the results for both countries and tries to determine whether the ruling political party is an important factor that could be the root cause of said deaths. The analysis points out that the ruling political party cannot be directly linked to the cause of the deaths, but the political strategies implemented by the government could be related indirectly.

Motivation

    Mexico is widely regarded as one of the most dangerous countries for journalists[1], even compared to countries in a state of war. In addition, the Mexican government is famous for being corrupt and untrustworthy, either bribing the media or threatening it. This has been happening since the 1910s[1], and Mexico’s involvement in illegal drug traffic has only added to the dangers journalists face.

    Since the Mexican government has been known to bribe or threaten journalists, I wanted to get actual data that could show a correlation between the ruling political parties of the past and the amount of violence towards media workers. This could help Mexican citizens make a decision before the presidential election of 2018.

Dataset

    The dataset used is a comma-separated file provided by the Committee to Protect Journalists (https://cpj.org). It contains data about journalist assassinations committed from 1992 to 2016. The dataset contains 1782 records with 18 variables: Type, Date, Name, Sex, Country_killed, Organization, Nationality, Medium, Job, Coverage, Freelance, Local_Foreign, Source_fire, Type_death, Impunity_for_murder, Taken_captive, Threatened, Tortured.

    The dataset can be downloaded from Kaggle.com: https://www.kaggle.com/cpjournalists/journalists-killed-worldwide-since-1992

Data preparation and cleaning

    Some of the records did not have a date for the death of the journalist (it was labeled either “Unknown” or “Date unknown”). This prevented me from assigning them to a year, and thus to a presidential administration, so, sadly, I had to ignore them. Also, the date format was relatively hard to parse since it did not follow a standard format: dates appeared as “February 9, 1998” instead of the much more international and machine-friendly “1998-02-09”.
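    A minimal sketch of this cleaning step, assuming the column names from the dataset description above (the CSV file name is a placeholder):

```python
# Minimal cleaning sketch; column names are taken from the dataset
# description above and the file name is a placeholder.
import pandas as pd

data = pd.read_csv("cpj.csv")

# Discard records whose date could not be determined.
data = data[~data["Date"].isin(["Unknown", "Date unknown"])]

# Parse dates like "February 9, 1998" and extract the year, so that
# each death can be assigned to a presidential administration.
data["Date"] = pd.to_datetime(data["Date"], format="%B %d, %Y",
                              errors="coerce")
data = data.dropna(subset=["Date"])
data["Year"] = data["Date"].dt.year
```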

Research questions

    Is there a correlation between the ruling political party and the number of journalist assassinations?

    Can we identify a corrupt government by analyzing the acts of violence against journalists that occurred during its administration?

Methods

    Using the Pandas library in Python, I filtered all records belonging to the countries of interest, grouped them by year, and built charts with Matplotlib according to the length of each presidential administration. This allowed me to show visualizations that easily convey any increase or decrease in journalist assassinations under the ruling political party of each period. After that, I built a pie chart showing the distribution of the “Source_fire” variable, which gives a better idea of the reason behind each assassination. This helps to understand whether a death was a deliberate, targeted attack on the journalist or could be regarded as a work-related accident.
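    The sketch below illustrates the idea, continuing from the cleaning step above. The party coloring is deliberately simplified (PRI before 2000 and from 2012 onwards, PAN in between); the full script linked below handles the actual administration boundaries:

```python
# Illustrative sketch of the aggregation and charting; party periods
# are simplified and the real logic lives in the script linked below.
import matplotlib.pyplot as plt

mexico = data[data["Country_killed"] == "Mexico"]
deaths_per_year = mexico.groupby("Year").size()

# Color each bar by the party in power during that year.
colors = ["tab:green" if year < 2000 or year >= 2012 else "tab:blue"
          for year in deaths_per_year.index]
deaths_per_year.plot(kind="bar", color=colors)
plt.title("Journalists killed in Mexico per year")
plt.xlabel("Year")
plt.ylabel("Deaths")
plt.show()

# Distribution of the "Source_fire" variable as a pie chart.
mexico["Source_fire"].value_counts().plot(kind="pie", autopct="%1.0f%%")
plt.title("Source of fire for journalists' deaths in Mexico")
plt.show()
```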

    The entire script that processed the data and generated the visualizations can be found here: https://github.com/alanverdugo/journalists_deaths_analysis/blob/master/cpj.py

Findings

Figure 1: Number of journalists killed in Mexico.

    In this chart we see the two parties that have been in power in Mexico (in different colors). An increase in journalists’ deaths occurred during the mid-2000s; the number then appeared to decrease but now seems to be rising again.


Figure 2: Number of journalists killed in the USA.

    For comparison purposes, this is the same chart, but using US data. It can be seen that the US is much safer for journalists, and that there is no clear correlation between the political party in power and journalists’ deaths.


Figure 3: Source of fire for journalists’ deaths in Mexico.

    This is a pie chart of the source of the fire that caused the journalists’ deaths in Mexico. We can clearly see that most of the deaths were caused by criminal groups (very probably drug cartels).

Limitations

    The dataset is fairly small. This is one of the rare cases where not having a lot of data is a good thing (after all, even a single assassination is a tragic event). However, the relatively low number of deaths makes it hard to safely find patterns or correlations. Due to the mystery and inherent danger behind some of the deaths, it is likely that many of them were never reported to the authorities, and even then, acts of corruption could limit the reach or veracity of the data. In other words, we are probably working with incomplete data.

Conclusions

    Practicing journalism in Mexico has been, and still is, a dangerous activity. Compared with other countries, we can see the relatively dangerous situation that journalists located in Mexico experience every day. A change in the Mexican political status quo did not solve the problem; in fact, it appears to have worsened it. This could mean that just changing the political party in power is not enough, and that serious strategic changes in security, transparency and drug-related policies are needed to ensure the safety of journalists and of Mexican citizens in general.

    The war on drugs military campaign that started in 2006 was one of the main triggers for the increase in violence during the late 2000s[2]. Drug cartels fought against the military and against each other for control of the territories. However, that does not mean that journalists did not experience attacks before the war on drugs, or that they will not experience them in the future. It is unknown how many bribes or threats journalists receive from corrupt officials or criminal organizations, so these findings should not be regarded as definitive.

    The geo-political and socio-economic situation of each country is also a complex subject that cannot be fully grasped with such a small set of data. For these reasons, a more complete analysis should be conducted to safely identify any correlation between a country’s ruler and acts of violence towards the media.

References

  1. List of journalists and media workers killed in Mexico. (2017, November 28). In Wikipedia, The Free Encyclopedia. Retrieved 17:38, December 9, 2017, from https://en.wikipedia.org/w/index.php?title=List_of_journalists_and_media_workers_killed_in_Mexico&oldid=812565094
  2. Timeline of the Mexican Drug War. (2017, December 4). In Wikipedia, The Free Encyclopedia. Retrieved 04:52, December 9, 2017, from https://en.wikipedia.org/w/index.php?title=Timeline_of_the_Mexican_Drug_War&oldid=813681343
  3. Mexican Drug War. (2017, December 4). In Wikipedia, The Free Encyclopedia. Retrieved 04:52, December 9, 2017, from https://en.wikipedia.org/w/index.php?title=Mexican_Drug_War&oldid=813724408
  4. List of Presidents of the United States. (2017, December 7). In Wikipedia, The Free Encyclopedia. Retrieved 05:09, December 8, 2017, from https://en.wikipedia.org/w/index.php?title=List_of_Presidents_of_the_United_States&oldid=814280717

Portable Stream and Batch Processing with Apache Beam

    I had the opportunity to attend another of Wizeline’s academy sessions. This time it was about Apache Beam, an open-source batch and stream processing project. Wizeline brought three instructors from Google itself to explain what Beam is, how to use it, and its main advantages over its competitors. All three instructors had impressive backgrounds and were very nice and open to comments and questions from a group of students with only basic knowledge of the subject.

    Davor Bonaci’s explanations were particularly useful and interesting. He has a lot of experience speaking at conferences about these topics, and it shows. He was able to clearly explain such a complex technology in a way anyone could understand, while still making sure we grasped the huge potential in Beam.

 

    There were three concepts that I found extremely interesting:  

    Windowing: In streaming, we will eventually receive data that arrives late relative to when it was generated and to other data we have already received. Windowing defines how lenient we will be with this “late” data, and gives us an easy way to bucket data into groups according to our business rules.

    An example of this would be getting records at 12:30 that were created at 12:10. At that point in time, we should only be processing records created in the last few minutes (or even seconds, depending on your needs). However, that late record could be crucial for our processing, and we need a way to decide whether to keep it or ignore it completely. With windowing, we can achieve this.
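    To make this concrete, here is a minimal sketch using the Beam Python SDK with made-up, hand-timestamped events, each of which lands in a different fixed 60-second window. How leniently late data is treated is configured separately, through triggers and allowed lateness:

```python
# Minimal windowing sketch with the Beam Python SDK. The events and
# their epoch-second timestamps are made up for illustration.
import apache_beam as beam
from apache_beam.transforms import window

events = [
    ("pageview", 10.0),   # window [0, 60)
    ("pageview", 70.0),   # window [60, 120)
    ("pageview", 125.0),  # window [120, 180)
]

with beam.Pipeline() as p:
    (p
     | "Create" >> beam.Create(events)
     | "Stamp" >> beam.Map(lambda e: window.TimestampedValue(e[0], e[1]))
     | "FixedWindows" >> beam.WindowInto(window.FixedWindows(60))
     | "CountPerWindow" >> beam.combiners.Count.PerElement()
     | "Print" >> beam.Map(print))
```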

    Autoscaling: This is probably the holy grail of IT infrastructure management. The ideal scenario would be to have a “lean infrastructure”: one that, at any given moment, has only the exact amount of processing power that is needed, no more and no less. However, thanks to the global nature of the Internet and the variations in usage across time zones and seasons, it is practically impossible to achieve this. Resources are either over- or under-allocated: the first means wasting at least some of the infrastructure (and hence money); the second means not being able to handle usage spikes when they occur (and they will occur eventually if you are doing things right).

    As the name implies, Autoscaling attempts to let the infrastructure grow organically when and how it needs to, and to shrink it once it is not needed anymore. This obviously has huge benefits, like the peace of mind of knowing that the infrastructure can take care of itself and that servers will not be carelessly over-provisioned. The cloud only needs to be properly orchestrated according to the data-processing needs, and we can finally deliver this. I can only imagine what will be possible once this is combined with the power of containers, or even Unikernels and their transient microservices.

    You can read more about Autoscaling here: https://cloud.google.com/blog/big-data/2016/03/comparing-cloud-dataflow-autoscaling-to-spark-and-hadoop
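    As an illustration, when Beam runs on the Dataflow runner, autoscaling can be requested through ordinary pipeline options. This is only a sketch; the project, bucket and worker limit below are placeholders:

```python
# Sketch: requesting throughput-based autoscaling on the Dataflow
# runner. Project, bucket and worker limit are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--temp_location=gs://my-bucket/tmp",
    "--autoscaling_algorithm=THROUGHPUT_BASED",
    "--max_num_workers=10",
])

with beam.Pipeline(options=options) as p:
    _ = (p
         | "Create" >> beam.Create(range(100))
         | "Square" >> beam.Map(lambda x: x * x))
```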

    Dynamic workload rebalancing: When a processing job is created, it is (hopefully) distributed equally across all the nodes of a cluster. However, due to subtle differences in bandwidth, latency and other factors, some nodes always end up finishing their assignment later, and at the end of the day, we cannot say the job is finished until the last of these stragglers is done. In the meantime, many nodes sit idle, waiting for the stragglers to finish. Dynamic workload rebalancing means these idle nodes will try to help the stragglers as much as possible, which, in theory, reduces the overall completion time of the job. This, coupled with Autoscaling, could reduce the waste of resources to a minimum.

    You can read more about dynamic workload rebalancing here: https://cloud.google.com/blog/big-data/2016/05/no-shard-left-behind-dynamic-work-rebalancing-in-google-cloud-dataflow

 

    One student asked whether it would be worth studying Beam and forgetting about Spark or the other platforms. It may sound like a simple question, but it is something we were all thinking. Davor’s response was great. He said that whatever we studied, our main focus should be on writing code and building infrastructure that can scale regardless of the platform we choose. Beam is not a Spark killer; they have different approaches and methodologies, and the people working on these projects have different ambitions, goals and beliefs. Besides, the community keeps evolving: projects will continue to change, some will be forgotten and new ones will be created. There is huge interest in data-processing tools due to the ever-increasing speed and volume of our data needs, so this part of IT is experiencing violent and abrupt growing pains. I just do not think we can settle on learning a single technology right now, since it may disappear (or change dramatically) in the very near future.

    There are some things that can still be improved in Beam. One example is the interaction with Python and Spark; another would be making it more user-friendly. But there is a group of smart people quickly tackling these issues and adding great new features, so it is a good idea to keep learning about Beam and to consider it for our batch and stream processing needs.

    Overall, I really enjoyed the workshop. As I mentioned before, all three instructors were very capable and had a deep understanding of the technology, its use cases and its potential. Besides, it was really enjoyable to talk with them about the current state and the future of data processing. I will certainly keep paying attention to the Beam project.

    I would like to thank Davor, Pablo and Gris from Google, and all the team behind the Wizeline academy initiative.

Analyzing movie rating data from an IMDB.com dataset using Python, Pandas and Matplotlib

    Since the dawn of cinema, the quality and enjoyment produced by motion pictures have been a complicated and controversial subject. An entire sub-industry has been created to review, criticize, recommend, analyze, categorize and rate movies. This, added to the subjective nature of each individual’s likes and dislikes, has resulted in mixed experiences and expectations among the public. Some movies regarded as timeless classics by some people are seen as boring or even bad by others. The passage of time and the recent heavy use of special effects and CGI also affect how movies will be regarded in a few years, when those special effects look outdated.

    However, we may find a general trend of increased satisfaction or dissatisfaction if we analyze a large number of movie ratings.

    The code I wrote for this analysis is available in my GitHub repository.

Research question

    Since many critics refer to their favorite period as the best era cinema has to offer (or, alternatively, claim that movie quality is in decline), we will attempt to answer the following question:

Has the perceived quality of movies increased or decreased over time?

    Whatever answer we find, we will support it with data and visualizations.

Findings

    Using an IMDB.com dataset, I analyzed 45,844 movies and 26,024,290 ratings for those movies. The oldest movie in the dataset was released in 1874 and the newest in 2017.

    I grouped the movies by release year and calculated the average rating of the movies released each year. By doing this, I wanted to get an idea of the overall quality of movies through time.
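    A minimal sketch of that aggregation, assuming hypothetical MovieLens-style file and column names (movieId, year, rating), which may differ from the actual dataset layout; the full code is in the repository linked above:

```python
# Aggregation sketch; file and column names are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

movies = pd.read_csv("movies.csv")    # movieId, title, year, ...
ratings = pd.read_csv("ratings.csv")  # userId, movieId, rating, ...

# Attach each rating to its movie's release year, then average per year.
merged = ratings.merge(movies[["movieId", "year"]], on="movieId")
average_per_year = merged.groupby("year")["rating"].mean()

average_per_year.plot()
plt.title("Movie average rating per year")
plt.xlabel("Release year")
plt.ylabel("Average rating")
plt.show()
```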

    While the technical aspects of motion pictures have obviously advanced thanks to new technology, it was not clear whether this has also improved the overall quality of the movies. The next image presents a chart of the relationship between release year and the average rating of the movies released in each year.

Fig. 1 – Movie average rating per year.

    Some interesting facts I got from this analysis:

  • The initial period (from 1874 to around 1915) is chaotic and experimental. Films could be under a minute long, there was little to no cinematic technique, and films were usually black and white and silent.
  • In the 1920s, a process of normalization begins. Movies become more popular and attainable, and by this time the public seems to have learned what to expect from directors and actors. The first steps in the commercialization of sound cinema were taken in the mid-to-late 1920s, which could have contributed to this normalization.
  • 2014 was a relatively disastrous year for cinema. The average rating for that year is 2.95, the lowest since 1917 (which is still part of the “experimental period”). The causes are beyond this analysis, but I will remind the reader that 2014 gave us movies like Transformers: Age of extinction and Left behind, which currently holds a 1% score on rottentomatoes.com.
  • On a scale from 0 to 5, the average tends to be slightly above 3, and there is no noticeable increase or decrease from this average over the last century. So, to answer the research question, we cannot say that the perceived quality of cinema has substantially increased or decreased.

References

  1. History of film. (2017, November 26). In Wikipedia, The Free Encyclopedia. Retrieved 04:03, November 29, 2017, from https://en.wikipedia.org/w/index.php?title=History_of_film&oldid=812220038
  2. Sound film. (2017, November 28). In Wikipedia, The Free Encyclopedia. Retrieved 04:04, November 29, 2017, from https://en.wikipedia.org/w/index.php?title=Sound_film&oldid=812587250