by Jason Bissell and Calvin Hoon, Talend
The Internet of Things (IoT) brings businesses many benefits, primarily access to more data and better insights. However, most companies we have spoken with are still largely puzzled about what to do with their data: whether they should store or discard it, and, if they store it, what the best approach is to making that data a strategic asset for their company.
Gartner estimates that there will be 25 billion “things” connected to the Internet by 2020. The sheer size and speed of data collected when every device involved in your business process is online, connected, and communicating can strain the sturdiest network infrastructure. As such, despite the widespread proliferation of sensors, the majority of IoT data collected is never analysed, which is tragic.
Many existing IoT platform solutions are painfully slow, expensive, and a drain on resources, which makes analysing even the data that is collected extremely difficult. Furthermore, in situations where timing is critical, delays caused by bandwidth congestion or inefficiently routed data can cause serious problems.
The key takeaway is that data is the most valuable asset for any company, so it would be a shame to discard it entirely or let it lie dormant in an abandoned data lake somewhere. It is imperative that data scientists tap into their swelling pools of IoT data, make sense of the various endpoints of information, and develop conclusions that ultimately deliver business outcomes. We are firmly against discarding data without processing it.
In a few years there will be an additional 15 to 40 billion devices generating data at the edge compared with what we have today, and that brings new challenges. Imagine the infrastructure needed to transfer all of this data to data lakes and processing hubs. The load will continue to rise exponentially over the coming months and years, stretching the limits of your infrastructure. The value of this data comes only from analysis, whether it originates from the traffic of “things” or from surveillance cameras. In time-critical situations, a delayed analysis may simply arrive “too late”, and delays can stem from many causes, such as limited network availability or overloaded central systems.
A relatively new approach to solving this issue is called “edge analytics”. Put simply, it means performing analysis at the point where the data is generated, in real time and on site. The architectural design of “things” should therefore consider built-in analysis. For example, sensors built into a train, or traffic lights that provide intelligent monitoring and management of traffic, should be powerful enough to raise an alarm to nearby fire or police departments based on their analysis of the local surroundings. Security cameras are another good example: transmitting live video in which nothing changes is largely wasted effort. Algorithms exist that detect change between frames and, when a new image can be reconstructed from the previous one, send only the differences. Events like these make more sense to process locally rather than being sent over the network for analysis. It is important to understand where edge analytics makes sense and, where devices do not support local processing, how we can architect a connected network so that data generated by sensors and devices is made sense of at the nearest possible location.
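To make the security-camera example concrete, here is a minimal sketch of frame-difference filtering at the edge. The threshold values and function name are illustrative assumptions, not a specific vendor's API; real cameras use far more sophisticated change-detection and video codecs, but the principle of deciding locally whether a frame is worth transmitting is the same.

```python
import numpy as np

CHANGE_THRESHOLD = 0.02  # fraction of pixels that must differ before we transmit

def significant_change(prev_frame: np.ndarray, frame: np.ndarray,
                       pixel_delta: int = 25) -> bool:
    """Return True if enough pixels changed to justify sending the frame."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    changed = np.count_nonzero(diff > pixel_delta)
    return changed / frame.size > CHANGE_THRESHOLD

# Simulated 8-bit greyscale frames from a static camera
prev = np.zeros((120, 160), dtype=np.uint8)
static = prev.copy()            # nothing moved in the scene
moved = prev.copy()
moved[40:80, 60:100] = 200      # an object enters the scene

print(significant_change(prev, static))  # no change: drop the frame locally
print(significant_change(prev, moved))   # change detected: transmit for analysis
```

The unchanged frame is dropped at the device, so only frames carrying new information ever touch the network.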
Companies such as Cisco and Intel are proponents of edge computing and promote their gateways as edge computing devices. IBM Watson IoT, a joint IBM and Cisco project, is designed to offer powerful analytics anywhere. Dell, traditionally a server hardware vendor, has developed special devices (the Dell Edge Gateway) to support analytics at the edge. Dell has built a complete system, hardware and software, that allows an analytics model to be created in one location, such as the cloud, and deployed to other parts of the ecosystem.
However, edge analytics involves compromises that must be considered. Only a subset of the data is processed and analysed, and only the analysis results are transmitted over the network, which means some of the raw data is effectively discarded and some insights may be missed. The question is whether that “loss” is bearable: do we need the entire data set, or is the result generated by the analysis enough? What is the impact of using only a subset? There are no generalizations to be made here. An aeroplane system cannot afford to miss any data, so all of it should be captured in order to detect any pattern that could lead to an abnormality; yet transferring that data during flight is still impractical, so edge analytics during flight combined with offline collection once the plane lands is the better approach. Other systems with greater fault tolerance can accept that not everything will be analysed. This is where organizations will have to learn by experience as they enter this new field of IoT analytics and review the results.
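The trade-off described above can be sketched in a few lines. Suppose an edge device summarizes a batch of sensor readings and sends only the summary upstream; the readings and field names here are hypothetical, chosen purely to show what survives the summarization and what is lost.

```python
import statistics

# Hypothetical batch of temperature readings collected at the edge
readings = [21.3, 21.4, 21.2, 29.8, 21.5, 21.3, 21.4]

# Edge summary: the only payload that actually crosses the network
summary = {
    "count": len(readings),
    "mean": round(statistics.mean(readings), 2),
    "max": max(readings),
    "min": min(readings),
}

# The anomalous spike (29.8) survives in "max", but its position in the
# sequence and the shape of the spike are lost: the "bearable loss"
# described in the text.
print(summary)
```

Whether discarding the raw sequence is acceptable depends entirely on the use case, which is exactly the judgment call the paragraph above describes.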
Again, data is valuable. All data should be analysed to detect patterns and support market analysis, and data-driven companies are making far more progress than “digital laggards”. IoT edge analytics is an exciting space, and many big companies are investing in it. An IDC FutureScape report for IoT notes that by 2018, 40 percent of IoT data will be stored, processed, analysed, and acted upon close to where it is created, before being transferred to the network.
IDC FutureScape, “The Data of Things: How Edge Analytics and IoT Go Hand in Hand,” September 2015.
Bernard Marr, “Will Analytics on the Edge Be the Future of Big Data?”, Forbes, August 2016.