Stories by Christophe Bourguignat on Medium

Story of a Data Tweet

Christophe Bourguignat — Sun, 08 Nov 2020 21:01:43 GMT

This recent tweet, with its striking chart, attracted a very large audience:

In a nutshell, it suggests that, contrary to what journalists and political leaders explain, there is no particular recent acceleration of the pandemic in France: the trend has been the same for months, so there is no surprise, and it was predictable.

Indeed, the illustration shows a curve that has looked linear and predictable for 5 months (the author is a scientist, Léonard Blier, a PhD student at Facebook research and Inria Paris).

And that’s also what several comments highlight:

Wait a minute…

If you look carefully, the tweet mentions the ‘daily new cases’. But the curve represents the ‘daily new confirmed cases’.

What is the difference? No one knows the exact number of ‘daily new cases’. What we can measure is the ‘daily new confirmed cases’, which of course depends on the number of daily tests, and that number has soared over the past few months.

That’s what several comments point out:

Wait a second minute…

What’s more, the log scale of the graph (indeed a very good way to represent an exponential phenomenon) can be misleading: it can hide the fact that the growth rate has not been steady for months, contrary to what the initial tweet suggests.
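To make this concrete, here is a minimal Python sketch. The weekly counts are made up for illustration (they are not the French data), and the helper names `growth_factors` and `doubling_times` are my own. On a log scale, both halves of this series would look like roughly straight segments, yet the doubling time is more than halved along the way.

```python
import math

def growth_factors(counts):
    """Ratio between each count and the previous one."""
    return [later / earlier for earlier, later in zip(counts, counts[1:])]

def doubling_times(counts, step_days=7):
    """Doubling time (in days) implied by each consecutive pair of counts.

    Assumes strictly growing counts sampled every `step_days` days.
    """
    return [step_days * math.log(2) / math.log(factor)
            for factor in growth_factors(counts)]

# Hypothetical weekly confirmed cases, made up for illustration only
weekly_cases = [1000, 1200, 1440, 2160, 3240]

for factor, days in zip(growth_factors(weekly_cases), doubling_times(weekly_cases)):
    print(f"weekly growth x{factor:.2f} -> doubling every {days:.1f} days")
```

A slope change that is barely visible on a log chart can therefore correspond to a much faster epidemic.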

Here is what the tweet’s chart would have looked like with a linear scale (source):

Wait a third minute…

Whatever the representation (linear or log scale), at first glance things seem to behave as purely exponential, and so predictable.

However, they do not, as some comments point out. If you look carefully, a few weeks before the tweet (end of September), things seemed to be getting better, which may have led the government to hesitate over when to decide on a new lockdown.

This is also visible on other indicators like hospital admissions.

Even on the original tweet we can see it:

This plateau also comes from the fact that the number of tests was reduced for a few weeks (source).

Clarification

One day after the initial tweet, and given the feedback, Léonard Blier had the good idea to complete his initial tweet with further constructive details and clarifications.

This is another great example of the political and societal debates of tomorrow, magnified by social networks: they will require an increasing understanding of data science and tech.

Educating citizens on data and crypto basics

Christophe Bourguignat — Wed, 28 Oct 2020 21:41:15 GMT

In France, the national Covid-19 tracing app is called TousAntiCovid.

Unlike the tracing apps of other similar European countries, TousAntiCovid doesn’t use the Google / Apple tracing API, but a dedicated protocol instead.

I read many arguments against this app, including:

Some argue that it goes against their privacy rights, and that they don’t want their data to be collected by the government, as if the app had been designed without any guarantee of anonymization
Others would have preferred the government to use the Google / Apple tracing API, for a more seamless integration on their device and compatibility with other countries, as if depending on private big tech companies, and France being unable to analyze the anonymized raw data to help manage the crisis, were not an issue

TousAntiCovid seems to be a simple app for citizens, but forming an informed judgment about it is complex, and requires at least a basic understanding of:

cryptography: how encryption, security protocols, privacy by design, anonymization, … work
big data: how machine learning works, why owning and processing data at scale is so strategic, and how, for example, big tech companies or social networks do it

That’s symptomatic of the new kind of political and societal debates we will have to address more and more: they require a good deal of technological understanding, and future generations will have to learn it early at school.

A short history of Matt Turck’s Data and AI landscape

Christophe Bourguignat — Sun, 04 Oct 2020 08:38:02 GMT

Six years ago, in 2014, Matt Turck (investor at FirstMark Capital) published the first version of his legendary yearly landscape: The state of big data.

2014 Big Data Landscape

At that time, the term big data was trending, but machine learning, deep learning and AI were not yet fashionable.

Dataiku, the French-born unicorn, was not yet featured there. I remember I hacked the landscape to supercharge it with French startups, and called it the FrenchData Landscape.

2014 FrenchData Landscape

And here comes the 2020 edition, now branded Data & AI Landscape.

2020 Data & AI Landscape

Quite a crowded one!

And guess what? Zelros makes its entry in the Insurance category. I’m very proud to see this logo here, in this landscape that has inspired me for almost a decade!

In 2017, the landscape started to identify groups by industry, showing the beginning of verticalization of AI applications. In particular, a dedicated insurance category appeared, already including the French trailblazer Shift Technology.

The landscape insurance category over time

I’m also glad to see other awesome French data startups enter the landscape this year, like Toucan Toco, Dawex, CybelAngel or Saagie!

Can’t wait for the 2021 edition!

Bridging Information Theory and Machine Learning

Christophe Bourguignat — Sat, 26 Sep 2020 09:27:24 GMT

Information Theory

Between 1997 and 2000, I studied the fascinating domain of information theory and digital communications:

source coding: entropy, compression, Huffman, …
noisy communication channels and their capacity: Shannon limit, …
error detection and correction codes: Hamming, Hadamard, BCH, convolutional, turbocodes, …
optimal receiving and decoding: sequential, Viterbi, …
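As a small illustration of the first item (source coding), here is a minimal Python sketch of Shannon entropy: the average number of bits per symbol that an ideal lossless compressor needs, under the simplifying assumption that symbols are independent. The function name `shannon_entropy` is my own, not taken from any library.

```python
import math
from collections import Counter

def shannon_entropy(message):
    """Entropy of the empirical symbol distribution, in bits per symbol."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(shannon_entropy("aaaa"))  # a single symbol carries no information
print(shannon_entropy("aabb"))  # two equally likely symbols: 1 bit each
print(shannon_entropy("abcd"))  # four equally likely symbols: 2 bits each
```

A Huffman code built on the same symbol frequencies has an average code length between this entropy and the entropy plus one bit.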

I applied this knowledge between 2000 and 2013 to the applications of that time: 2G/3G mobile networks, modems (V90, ADSL, …), satellite communications, digital microwave radio links, …

My reference book was Digital Communications, by John G. Proakis.

Machine Learning

In 2013, I decided to radically change my field, and started to learn machine learning, and apply it to enterprise data.

My first sources of inspiration were Stanford’s Andrew Ng course, and Elements of Statistical Learning, by Hastie, Tibshirani and Friedman.

Bridging the gap

In my mind, those two disciplines, information theory on one side and machine learning on the other, were two separate domains, corresponding to two distinct periods of my engineering career (2000–2013, and then 2014–2020+).

But I recently discovered a new book: Information Theory, Inference and Learning Algorithms, by MacKay, and it made me change my mind.

This book first addresses information theory (data compression, noisy-channel coding, …), and then neural networks. And suddenly, in Chapter 40, page 483, the two fields are reunited:

Neural network models involve the adaptation of a set of weights w in response to a set of data points, for example a set DN of N target values t1, …tN at given locations x1, …xN. The adapted weights are then used to process subsequent data. This process can be viewed as a communication process, in which the sender examines the data DN and creates a message w that depends on those data. The receiver then uses w; for example, the receiver might use the weights to try to reconstruct what the DN was. This is using the neuron for ‘memory’ rather than for ‘generalization’, i.e. extrapolating from the observed data to the value of tN+1 at some new location xN+1. The adapted network weights w therefore play the role of a communication channel, conveying information about the training data to a future user of that neural net. The question we now address is, ‘what is the capacity of this channel?’, that is, ‘how much information can be stored by training a neural network?’

How magical it is to realize that two fields you love, and thought were different, are in fact linked.

Countries which Primarily use Antimalarial Drugs as COVID-19 Treatment See the Same Dynamic of daily Deaths as Others

Christophe Bourguignat — Wed, 29 Apr 2020 17:22:20 GMT

Disclaimer — this article is an attempt to show that with the same data, and different analysis methods, a scientific paper can be summarized with different titles, revealing different angles of the results, and leading to different conclusions. It is based on a recent paper published under two different titles:

Countries which Primarily Use Antimalarial Drugs As COVID-19 Treatment See Slower Dynamic of Daily Deaths

and

National Consumption of Antimalarial Drugs and COVID-19 Deaths Dynamics : an Ecological Study

I here propose a third title, with its demonstration:

Countries which Primarily use Antimalarial Drugs as COVID-19 Treatment See the Same Dynamic of daily Deaths as Others

I don’t try to conclude anything here. I do this to raise citizens’ awareness of how the same data can be used to demonstrate different things.

COVID-19 (Coronavirus Disease-2019) is an international public health problem with a high rate of severe clinical cases. Several treatments are currently being tested worldwide. This paper focuses on anti-malarial drugs such as chloroquine or hydroxychloroquine, which a recent systematic review identified as good potential candidates and which a recent survey of physicians reported as the most used treatment. We compare the dynamics of COVID-19 daily deaths in countries using anti-malaria drugs as a treatment from the start of the epidemic versus countries that do not, from the day of the 3rd death through the following 10 days. We show that the first group has the same dynamic in daily deaths as the second group. This univariate analysis is of course only one additional piece of evidence in the debate regarding the efficiency of anti-malaria drugs, and it is also limited, as the two groups certainly have other systematic differences in the way they responded to the pandemic, in the way they report deaths, or in their populations, which may better explain differences in dynamics (systematic differences that may also explain their choice to rely on anti-malaria drugs in the first place). Nevertheless, the similarity in the dynamics of daily deaths is so striking that we believe the urgency of the context warrants presenting the univariate analysis before delving into further analysis.

Method

In this study, we set up two groups of countries and study the dynamics of the number of deaths between the day of the 3rd death and the following 10 days. The first group is made up of countries that we know used or produced chloroquine or hydroxychloroquine on a massive scale during this period. The second group consists of countries that did not use or produce chloroquine or hydroxychloroquine in large quantities during the period under consideration. When we calculate the averages of each of the two groups, we find very marked similarities in their temporal dynamics (see results).

The 60 countries most affected by the epidemic (in terms of number of cases) were studied one by one in descending order to determine whether or not they were conducting a national strategy for the large-scale use or production of chloroquine at the beginning of the epidemic in the country (around the 3rd death). If there was no evidence of such a strategy, or even if sources indicated a strategy to the contrary, the country was classified in the “control group”, until a sufficiently large panel of countries was obtained, provided that daily death data were available for the 10 days following the 3rd death. The second group was made up of the countries, among the 60 most affected in terms of number of cases, for which sources indicate the massive use or production of chloroquine at the beginning of the epidemic in the country (around the 3rd death), provided that they have daily death data for the 10 days following the 3rd death. The groups of countries were built according to the information available in the international press on their use or mass production of such drugs over the period under consideration. The two groups thus contain 15 and 17 countries respectively.

For each of the two groups, the number of daily deaths is noted each day from the 3rd death in the country and the following 10 days. Then, we normalize by the size of the population older than 65 in each country (the population that is the most vulnerable to COVID-19), to allow fair comparisons. Then the average of the normalized daily deaths is established for each day for each group of countries.
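The normalization and averaging steps described above can be sketched in a few lines of Python. This is only an illustration: the function names are mine, the figures are made up, and expressing the result per million people over 65 is my assumption (the method only says that deaths are normalized by the size of that population).

```python
def normalized_daily_deaths(daily_deaths, population_over_65):
    """Daily deaths per million inhabitants older than 65 (assumed unit)."""
    return [deaths * 1_000_000 / population_over_65 for deaths in daily_deaths]

def group_mean_curve(curves):
    """Day-by-day average of several countries' normalized curves."""
    return [sum(day_values) / len(day_values) for day_values in zip(*curves)]

# Hypothetical countries: deaths on the day of the 3rd death and the next day
country_a = normalized_daily_deaths([3, 6], population_over_65=1_000_000)
country_b = normalized_daily_deaths([3, 12], population_over_65=3_000_000)

print(group_mean_curve([country_a, country_b]))
```

Each group’s mean curve is then obtained by applying `group_mean_curve` to the normalized curves of its countries.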

Number of normalized daily deaths after day with 3 deaths, “antimalarial drugs group”.

Number of normalized daily deaths after day with 3 deaths, “control group”.

Results

The graphical projection of the mean curves indicates a similarity in the dynamics of the daily death curves of the two groups of countries, which is very clear for the period studied (i.e. from the beginning of the epidemic).

Means of the number of normalized daily deaths for each group

Limitations

It should also be noted that while many sources exist to determine the health action of governments, including their use or mass production of chloroquine from the onset of the crisis, quantitative data are lacking and do not allow for more in-depth temporal analyses and causality tests. There also might be systematic differences between the two groups, in particular political differences, urban differences or differences in other aspects of strategy such as testing. There is strong evidence for places like South Korea and Japan that mass testing is an effective strategy to control the epidemic, and our study might be a proxy for testing strategies. All these aspects should be examined in a later study.

Conclusion

We find no difference in death rates between countries using antimalarial drugs and those which do not. This univariate analysis is of course only one additional piece of evidence in the debate regarding the efficiency of anti-malaria drugs, and it is also limited, as the two groups certainly have other systematic differences in the way they responded to the pandemic. Nevertheless, the similarity in dynamics is so striking that we believe the urgency of the context warrants presenting this analysis before delving into further analysis.

Restaurants Seeded the Massive Coronavirus Epidemic in New York City

Christophe Bourguignat — Mon, 20 Apr 2020 21:18:50 GMT

Disclaimer: this post is a copy/paste parody of the original MIT paper ‘The Subways Seeded the Massive Coronavirus Epidemic in New York City’, but the data are still real. I do this to raise citizens’ awareness of why we should be careful when reading papers that claim to prove things with data.

Figure 1. Numbers of Newly Diagnosed COVID-19 Cases (Pink Data Points, Left Axis) and restaurant meals (Blue Bars, Right Axis), New York City, March 1–April 3, 2020.

New York City’s multitentacled restaurant network was a major disseminator, if not the principal transmission vehicle, of coronavirus infection during the initial takeoff of the massive epidemic that became evident throughout the city during March 2020. The near shutoff of restaurants in NYC (down by over 80 percent at the end of March) correlates strongly with the substantial increase in the doubling time of new cases in the city.

Figure 1 simultaneously tracks the daily movements of two variables from March 1 through April 3, 2020. The pink-filled circles show the numbers of new coronavirus infections reported each day by the New York City Department of Health. For this variable, the vertical axis on the left is rendered on a logarithmic scale. That way, a straight-line trend would represent the exponential growth typically seen during the initial upsurge of an epidemic where everyone in the population is naïve to the infectious agent.

For the same variable of newly reported cases, the horizontal axis at the bottom ticks off the date that the coronavirus test was performed.

The second variable tracked in Figure 1 above represents the total numbers of seated diners at restaurants every day throughout New York City. These counts are reported each week by OpenTable. This variable is represented as sky-colored vertical bars, along the vertical axis on the right side of Figure 1. For this variable, the horizontal axis measures the dates on which restaurant meals happened.

Figure 1 shows only the volume of seated diners from March 1 onward. Still, the counts shown during the first full week of the month, from Sunday March 1 through Saturday March 7, are quite typical of the pattern for prior weeks. The decline in diner volume accelerates markedly beginning on Monday March 16, the day that New York City Mayor de Blasio issued an order limiting gatherings and closing numerous places of congregation. By the third week overall, restaurant diners are down 68 percent from the first week in March, and by the fourth week, they are down 86 percent.

Simple comparison of the two trends in Figure 1 cannot by itself answer questions of causation. Still, the parallel between the continued high diner number and the rapid, exponential surge in infections during the first two weeks of March supports the hypothesis that the restaurants played a role. While the subsequent plummeting of diners appears likewise to parallel the flattening of the reported incidence curve, the steep fall in the heights of the blue bars may just as well represent the public’s response to widespread publicity about the ferocity of the outbreak that had been gathering storm for two weeks. As economists say, the precipitous drop in restaurant meals may well have been endogenous. Even so, the temporal pattern in Figure 1 is compatible with the conclusion that the restaurants were the vehicle by which the public’s response was translated into reduced transmission of the virus.

Bike Sharing System Seeded the Massive Coronavirus Epidemic in New York City

Christophe Bourguignat — Sun, 19 Apr 2020 21:30:57 GMT

Figure 1. Numbers of Newly Diagnosed COVID-19 Cases (Pink Data Points, Left Axis) and bike sharing rides (Blue Bars, Right Axis), New York City, March 1–April 3, 2020.

New York City’s multitentacled bike sharing system was a major disseminator, if not the principal transmission vehicle, of coronavirus infection during the initial takeoff of the massive epidemic that became evident throughout the city during March 2020. The near shutoff of bike ridership in NYC (down by over 80 percent at the end of March) correlates strongly with the substantial increase in the doubling time of new cases in the city.

For the same variable of newly reported cases, the horizontal axis at the bottom ticks off the date that the coronavirus test was performed.

The second variable tracked in Figure 1 above represents the total numbers of bike sharing rides every day throughout New York City’s bike sharing stations. These counts are reported each week by Citi Bike. This variable is represented as sky-colored vertical bars, along the vertical axis on the right side of Figure 1. For this variable, the horizontal axis measures the dates on which riders used the bikes.

Figure 1 shows only the volume of rides from March 1 onward. Still, the counts shown during the first full week of the month, from Sunday March 1 through Saturday March 7, are quite typical of the pattern for prior weeks. The decline in bike use accelerates markedly beginning on Monday March 16, the day that New York City Mayor de Blasio issued an order limiting gatherings and closing numerous places of congregation. By the third week overall, bike usage is down 68 percent from the first week in March, and by the fourth week, it’s down 86 percent.

Simple comparison of the two trends in Figure 1 cannot by itself answer questions of causation. Still, the parallel between the continued high ridership on bikes and the rapid, exponential surge in infections during the first two weeks of March supports the hypothesis that the bikes played a role. While the subsequent plummeting of ridership appears likewise to parallel the flattening of the reported incidence curve, the steep fall in the heights of the blue bars may just as well represent the public’s response to widespread publicity about the ferocity of the outbreak that had been gathering storm for two weeks. As economists say, the precipitous drop in bike ridership may well have been endogenous. Even so, the temporal pattern in Figure 1 is compatible with the conclusion that the bike sharing system was the vehicle by which the public’s response was translated into reduced transmission of the virus.
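Why does the same text work equally well for subways, restaurants and bikes? A minimal Python sketch, using made-up weekly activity indices (not the real NYC figures) and a hand-written Pearson correlation, shows that any activities that collapsed at the same time correlate almost perfectly with each other, whatever their actual role in transmission.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly activity indices (first week = 100), made up for
# illustration: all three drop because of the same lockdown order.
subway_rides = [100, 95, 40, 15, 10]
restaurant_meals = [100, 90, 30, 12, 8]
bike_rides = [100, 98, 45, 20, 14]

print(pearson(subway_rides, restaurant_meals))
print(pearson(subway_rides, bike_rides))
```

Series that share a common cause are strongly correlated with each other, so a strong correlation with the epidemic curve cannot single out one of them as the transmission vehicle.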

Data Science: Time To Change!

Christophe Bourguignat — Sat, 06 Aug 2016 14:24:44 GMT

[This post was initially published on Zelros blog]

Using their data efficiently is a complex process for companies: only 17% of them consider themselves mature when it comes to data analysis.

You have probably witnessed it: a majority of Data projects remain at the “concept” stage, and never lead to applications in production.

By application, we mean intelligent data software, actively integrated into a business process or a service, and transforming it. This goes beyond simple data visualization or dashboards, sometimes (harshly) described as the Minitels of Big Data!

Difficulties in reaching production can come from several factors:

Data Scientists’ findings remain trapped in their computers, because they are not designed to be shared in a way that is compatible with enterprise IT, or are not user-friendly enough
End users are not involved early enough in the project, so the question of business alignment is only raised at the end
The organization is unsuitable: silos between the “thinkers” and the “doers” are a source of frustration
and so on …

The traditional way of doing things

Kaggle, the internet, MOOCs, … teach us how to run a “Big Data” project: scoping, data gathering, cleaning, analysis (data visualization), modeling, reporting (dashboards), production.

This approach focuses first on the technical aspects, and only then leads to the application. It has been suitable for the last 3 years, while entry barriers were mainly technological.

But a breakthrough is happening: Data Science tools are now becoming mature and easier to use, by an increasing number of people. Amazon, Google and Microsoft have launched their specialized data platforms, Spark 2 will be released, and deep learning is going mainstream.

Let’s take an analogy: for a startup in the digital economy, in many cases it isn’t technology that makes the difference, but rather a well-designed interface or an exceptional customer experience. The same holds for Data projects: usage, people and design are the critical success factors, no longer technical aspects alone.

Towards a new approach

That’s why it’s time to change, and reverse the cycle of Data Science projects:

No more starting from the Data Lake, to reach the application, but starting from the application and going back to the Data Lake

In many cases, everybody can find satisfaction:

Business users can express their needs early in the project
IT teams can anticipate production, and prepare the suitable environment
Data Scientists finally know which precise problem they have to crack, and can concentrate their energy on it
Software developers are involved early in the application development, which is favorable to agile increments, and a better engagement

This innovative approach, breaking the culture of the permanent POC, is at the center of the Zelros product value proposition. From day 1 of the project, a minimum viable Data Application is deployed: software that is still limited, but functional and open to refinement. These Data Applications help our customers make their projects concrete, and focus teams on a common goal, reachable in a few weeks rather than several months.

Try this recipe in your next project; we would love to hear your feedback. What are the results in your context?

DataQuo.net, a Narrative Way To Share And Discover Awesome Data Science Contents

Christophe Bourguignat — Wed, 23 Mar 2016 21:45:42 GMT

We live in a data world. Data science is beautiful, but complicated, fragmented and changing fast. It’s a mess, and it’s going to be worse in the future.

As a data scientist, I was looking for a way to understand our data-centric era, both simply and quickly. I realized that the quote was the perfect solution for that.

The unsuspected power of quotes

As often in innovation, the idea of DataQuo.net came from a personal experience. I’m an active user of Twitter, and I like to share readings about Data in general. Through successive observations over time, I noticed that when I quoted carefully chosen article excerpts, it had much more impact than just quoting the paper headline itself.

The reason is simple: by selecting a quote, you add value. It’s proof that you have read, understood and analyzed the paper (8 out of 10 people read only the headline). It’s not an easy exercise to ask yourself: “what would be the best quote in this article (excluding its title), and why?” If you find the answer, you allow the target audience to (re)discover the article, by offering a new angle of understanding.

And the magic is that people love quotes, because they are short, intense, and self-explanatory. They are the beginning of a story, and the perfect entry point for a deeper dive into a subject.

An exchange platform for the most surprising data quotes

On DataQuo.net, data lovers meet each other. Everyday.

For the quote producer, the site is an opportunity to share their vision, to make the world discover their exciting findings, or to surface thoughts that would otherwise have remained unnoticed.

For the quote consumer, it’s a chance to stimulate their thinking, to discover good readings, or simply to find original material to illustrate a PowerPoint presentation.

In a nutshell, I want people to share the best data quotes on DataQuo.net, coming from all sorts of sources: academic papers, blog posts, talks, or presentations.

A sober platform

DataQuo.net uses a Hacker News-style interface. I discovered it through DataTau.com, a data science forum. OK, its look and feel is a bit rough, but I like the sobriety of its design. It lets you focus on the essentials. And it has good usability, both on desktop and mobile devices.

What’s more, the integrated voting system keeps the best content on the first page, with fresh material appearing regularly.

The bottom line: my favorite quotes of the moment

At the moment on the platform, I’m attracted by several quotes. I like the mind-blowing ones, like

If people could see in high dimensions, machine learning would not be necessary.

Or :

The cost of a top, world-class deep learning expert is about the same as a top NFL quarterback prospect.

I also like humorous and sarcastic quotes, like

A Data Scientist is a device for turning coffee and data into better decisions.

Or :

The ultimate insult in Data Science : “it explains everything, but predicts nothing”.

and :

Statisticians, like artists, have the bad habit of falling in love with their models

I’m also impressed by those quotes that are so ubiquitous and timeless:

Torture the data long enough, and it will confess to anything

and finally,

Essentially, all models are wrong, but some are useful

I hope to see you soon on DataQuo.net. Discover and submit awesome data thoughts there !

A Data Science Landscape, One Year Later

Christophe Bourguignat — Sun, 18 Oct 2015 20:47:46 GMT

This is the transcript of the “Data Scientist 2015” Paris conference opening keynote.

A Kudu

Hi everyone,

I prepared this keynote by asking myself a question: what topics would I have mentioned if I had given the introductory talk at last year’s edition? Would they still be relevant today? Or already totally outdated?

Last year, for instance, I definitely would have tried (yes, I say tried) to describe what a data scientist is. You know, this fictional role: half math nerd, half software geek, and half skilled communicator. Three halves, showing that it doesn’t really exist. Today, I’m even more confused. A recent survey depicted a data scientist as a spider with … 25 legs! Maybe after this conference day, we will know a bit more about this new role, and understand how broad it is.

Data Scientist, a spider with … 25 legs

However, compared to last year, we start having data about data scientists. After the quantified self, it’s time for the quantified data scientist — data science on data scientists. Two weeks ago, a linear model predicting the salary of data scientists was published.

What is noticeable? If you are a woman, unfortunately, you lose points. It won’t surprise anybody. Too bad, but even the data scientist job, like a lot of technical positions, cannot escape this rule.

More amusingly, the more time a data scientist (/analyst/engineer) spends in meetings, the more they earn. And if they spend too much time exploring data (4+ hours / day), they earn less! That beats everything!

Data science on data scientists : a linear model to predict how much they earn

Some months ago, I would have criticized France’s lack of awareness of what the data revolution represents. Let’s recognize that the landscape has changed. A new role has been created, France’s Chief Data Officer, who recently also became France’s Chief Information Officer, showing that public IT is moving to a more data-centric approach. France now also has its own data science team, and a new word is born: “mégadonnées”, “Big Data” in French.

Henri Verdier, France’s CDO / CIO

Last year, I would have talked about pioneering companies experimenting with data, destined for a bright future. Today, I would be more nuanced. The data maturity of companies is very uneven, and the most advanced of them are starting to doubt. 75% have invested in Big Data, but only 10% have projects in production. For the first time, “machine learning”, one of the key components of data projects, is sliding down the latest Gartner “Hype Cycle”.

Companies face disillusionment. And ask themselves questions: I know how much it costs, but how much do I earn? What is the ROI?

Even projects with small data surface new problems: how do I use my data scientists’ discoveries? This implies change management and modifying established business processes. One retailer, for example, learned that it could increase profits substantially by extending the time items were on the floor before and after discounting. But implementing that change would have required a complete redesign of the supply chain, which the retailer was reluctant to undertake.

For the first time, “machine learning” is sliding down the latest Gartner “Hype Cycle”

On another level, technological this time (because data science involves a lot of technologies), I would probably have mentioned MapReduce, a programming model designed by Google about 10 years ago to allow distributed processing of large volumes of data. A short time ago, it was a star. Today, it has been swept away by a tsunami called Spark.
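For readers who never met it, the idea behind MapReduce fits in a few lines. The sketch below is plain, single-machine Python of my own (it uses neither Hadoop nor Spark): a map step emits (word, 1) pairs, a shuffle step groups them by key, and a reduce step sums each group. In a real cluster, each step would be spread across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word of one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Combine the values of each key, here by summing them."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))

print(word_count(["big data", "Big deal"]))
```

The appeal of the model is that the map and reduce functions stay this simple while the framework takes care of distribution and failures.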

Chris. Bourguignat on Twitter

Today I explained Map Reduce to my parents #BigData pic.twitter.com/uzPoN5dQxG

Let’s take another example. Two weeks ago, Cloudera announced Kudu, a new column store entirely bypassing HDFS, the current de facto big data storage technology. Aside from the fact that it helps data scientists improve their zoological knowledge (the kudu is a woodland antelope found throughout eastern and southern Africa), Kudu makes analysts wonder whether HDFS has joined MapReduce in the emerging “legacy Hadoop project” category…

On the other hand, I would undoubtedly not have talked about Deep Learning, a branch of Artificial Intelligence (AI): incredibly powerful neural networks that learn from data like humans, and sometimes better. This domain has recently made decisive advances. These algorithms have shown that they are able to paint, write, or compose music. What’s next?

Deep Learning Paintings

Neither would I have talked about ethics. Yes, ethics: who would have thought it would enter the debate? A society where every single decision regarding citizens is driven by predictive models raises concerns.

That’s why data for good, transparency in predictive algorithms, and education about AI are currently growing topics.

Elon Musk on Twitter

Hope we're not just the biological boot loader for digital superintelligence. Unfortunately, that is increasingly probable

To conclude: don’t try too hard to remember what I just presented, it will be partially obsolete next year! At least, that’s my prediction.

One thing, however, will remain. D.J. Patil, recently named “US Chief Data Scientist” by Barack Obama, wrote in 2012 in a famous and visionary Harvard Business Review article that Data Scientist would become the “Sexiest Job of the 21st Century”. I’m also deeply convinced of that. Data Scientist is one of the most thrilling jobs in the world, and this will remain unchanged for a long time. We are just at the beginning of the story.

I wish you an amazing day.