In January 1848, James W. Marshall discovered gold nuggets at Sutter’s Mill in Coloma, California, near the Sacramento Valley. The discovery set off one of the largest mass migrations in American history, drawing about 300,000 people to California to mine and process the gold hidden in its riverbeds. Today, the ability to collect and process massive amounts of big data has created a new gold rush, not only in Silicon Valley but throughout the United States and the world. AI promises scientific advances in identifying the most important success factors for organizations by analyzing this treasure trove of big data with data science, statistics, mathematics, and machine learning algorithms.
The growth of data was anticipated long ago. In 1944, Fremont Rider predicted that the holdings of libraries would double every 16 years. In 1950, the U.S. government leveraged NIST (Gaithersburg, MD, and Boulder, CO) for research on memory, input-output methods, signal timing, and logic circuits. Today, leveraging a big data analytics strategy to improve decision-making has become one of the most active areas of research and practice for organizations.
Big Data and Data Science Center of Excellence
Setting up a data center of excellence is key to an organization’s success when implementing data science methodologies and machine learning algorithms. Such a center defines the cost-effective management and analysis needed to extract insights from the largest and most complex big data sets. The five key features of big data, commonly called the five Vs, are volume, velocity, variety, veracity, and value:
Volume refers to the large scale of the data and to the innovative tools needed to build a big data analytics (BDA) stack: high-level tools, compute, a complex data acquisition system that can handle all big data formats, data portability, storage, hardware, a software platform, high-performance web hosting providers for large-scale businesses, and data science methodologies. With such a landscape in place, an organization’s data scientists can segment and analyze the data at an unparalleled level of granularity. The organization needs a strategy not only to handle the volume of the data but also to deal with the breaches and hacks that big data systems face daily. Data infrastructure, data governance, data management, and data analytics give the corporation scalable processes and interconnected systems around the world on which to apply machine learning and statistical algorithms. After estimating the data volumes, the organization also has to decide whether to implement traditional neural networks or spiking neural networks on neuromorphic hardware with FPGA accelerators. The company can also settle on a long-term strategy of building its own data centers or using cloud computing infrastructure to run at an unprecedented scale.
Figure 1: Big data analytics framework
Velocity is the rate at which data is streamed, published, and updated in the big data ecosystem, and corporations need to adopt a strategy for it based on real-time big data analytics platforms. The data can come from multiple platforms and ecosystems, such as mobile networks and eCommerce web stores. The corporation needs to set up a methodology and streaming tools to handle data arriving over 3G, 4G, or 5G networks with different optimization techniques. In the telecommunications, energy, and utility industries, data streams in at high volume and high velocity every second. Mobile Internet traffic today is a high-velocity and very powerful source of shopping data. It’s important to design the methodology and tools around the customer experience and the industry by developing mathematical models that can handle heavy mobile and web traffic with machine learning algorithms.
Variety means that, to be successful, the corporation must adopt a strategy to handle heterogeneous data sources and formats such as spreadsheets, JSON files, XML, audio, video, and sensor-specific data. The unstructured nature of much of this data can introduce challenges, especially because so much big data is generated from such varied sources: geospatial and GIS information from satellites, data from drones, data from wireless networks, and IoT streaming networks. Data arriving from various channels is not limited to corporations. Governments are also implementing big data analytics solutions, for example building city geospatial dashboards that bring insights from big data to disaster management. Their success depends on the big data and AI strategy the government adopts. No matter where an organization is in its journey of adopting big data analytics, data science, and AI, success depends on understanding the real-world projects undertaken by pioneering corporations in the field, because big data and AI projects differ from traditional IT projects. Big data is everywhere and comes in many formats. When building a strategy for predictive analytics, prescriptive analytics, and demand forecasting, it’s important to understand which KPIs the business stakeholders are looking for and which exploratory data processes the business plans to implement for customer experience before building a big data analytics, data science, and AI success strategy.
Figure 2: City Geospatial Dashboard overview
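As a rough illustration of handling the variety described above, the sketch below loads the same kind of sensor reading from CSV, JSON, and XML and maps each into one common record shape. The field names (`sensor_id`, `value`), the XML tag `reading`, and the sample payloads are hypothetical and chosen only for this example; a real pipeline would need schema validation and error handling.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def from_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_json(text):
    """Parse JSON text; accept either one object or a list of objects."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def from_xml(text, record_tag="reading"):
    """Parse XML text; each <record_tag> element becomes one dict."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in rec}
            for rec in root.iter(record_tag)]


def normalize(record):
    """Coerce a raw record into the common (hypothetical) schema."""
    return {"sensor_id": str(record["sensor_id"]),
            "value": float(record["value"])}


# Hypothetical sample payloads, one per source format.
csv_text = "sensor_id,value\ns1,20.5\n"
json_text = '[{"sensor_id": "s2", "value": 21}]'
xml_text = ("<readings><reading><sensor_id>s3</sensor_id>"
            "<value>19.0</value></reading></readings>")

raw = from_csv(csv_text) + from_json(json_text) + from_xml(xml_text)
records = [normalize(r) for r in raw]
```

Keeping one small parser per format and a single `normalize` step means a new source only needs a new parser, while downstream analytics see one record shape.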
Veracity matters because some unstructured data is complex to understand and interpret. These big data assets may come from reviews on Yelp or consumer opinions posted on social media, which can be ambiguous, inconsistent with one another, and imprecise about the topic under discussion. Mining such data is not free of bias. For example, Twitter has 150 million daily visitors. Corporations mining tweets should consider building a big data and AI enterprise playbook that defines the strategy for dealing with the veracity of data and for building machine learning classifiers that label market sentiment as positive, negative, or neutral. These classifiers use natural language processing techniques, primarily logistic regression, Naïve Bayes, k-nearest neighbors, random forest, and support vector machine algorithms.
Value sits at the top of the big data pyramid of the five Vs. Setting up a big data analytics strategy and implementing a data science methodology with various machine learning algorithms should lead to a successful business outcome that generates revenue. Predictive analytics should help corporations gain insight into the factors that contribute to business growth.
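To make the sentiment classification idea concrete, here is a minimal sketch of one of the algorithms named above, a multinomial Naïve Bayes classifier with add-one smoothing, written in plain Python. The class name, the whitespace tokenizer, and the six training sentences are assumptions made for illustration; they are not real tweets or reviews, and a production system would use a much larger labeled corpus and an established library.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple tokenizer (assumption): lowercase and split on spaces."""
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + self.alpha)
                                  / (total_words + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set, invented for this sketch.
texts = [
    "great food and friendly service",
    "terrible service and cold food",
    "the store opens at nine",
    "love this phone great battery",
    "awful battery and terrible screen",
    "the order ships on monday",
]
labels = ["positive", "negative", "neutral",
          "positive", "negative", "neutral"]

model = NaiveBayesSentiment().fit(texts, labels)
```

Calling `model.predict("great battery")` scores each of the three labels and returns the most probable one. With so little data the result mainly shows the mechanics; the veracity problems described above (ambiguity, inconsistency, bias) are exactly what a toy model like this cannot handle.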
Crossing the chasm
To be successful, organizations need to move from strategy to production deployments in the commercial market. Big data analytics adoption sits at the center of the structural changes that reshape the big data landscape, and its advanced tools and techniques, every quarter because of the data deluge. Enthusiasm for big data analytics is high at many corporations because the digital data in the universe doubles every 2 years and the volume of data that businesses handle is growing exponentially. Ninety percent of the big data in today’s world has been generated in the past few years.
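The doubling claim can be checked with a little arithmetic. The sketch below assumes that the cumulative amount of data doubles every fixed period (2 years by default); under that assumption, which is an idealization and not a measured result, it computes the growth factor over a span of years and the share of all data created within the most recent years. The function names are illustrative.

```python
import math


def growth_factor(years, doubling_period=2.0):
    """How many times larger the data is after `years` of steady doubling."""
    return 2.0 ** (years / doubling_period)


def recent_share(years, doubling_period=2.0):
    """Fraction of all data to date that was created in the last `years`."""
    return 1.0 - 2.0 ** (-years / doubling_period)


def years_for_share(share, doubling_period=2.0):
    """How far back one must look to cover `share` of all data to date."""
    return -doubling_period * math.log2(1.0 - share)
```

Under this model, `growth_factor(10)` is 32, half of all data is at most 2 years old, and `years_for_share(0.9)` is a little under 7 years, so the 90% figure is roughly consistent with a 2-year doubling period if "the past few years" means six to seven.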
The rise of reinforcement learning
A treasure trove of big data should lead a corporation to apply algorithmic development and powerful computing to self-driving cars, robotics, and superintelligence, spawning new lines of business. Building next-generation AGI and new electronic appliances with quantum materials depends on understanding how the brain functions. OpenAI continues to make strides in reinforcement learning in 2020. Although reinforcement learning algorithms are optimized through trial and error, it’s critical to avoid errors that can lead to fatalities. Safety is therefore critical in reinforcement learning environments, especially when sending spacecraft to the surfaces of other planets or their moons, where an error must not occur. In 2019, work on DeepMind’s StarCraft and AlphaGo agents, the Arcade Learning Environment, Control Suite, and OpenAI’s Dota agents demonstrated superhuman performance. Mathematics, numerical methods, and statistical techniques are applied in the environment to optimize the behavior of advanced RL systems by accumulating reward over many iterations. Just as babies are taught their first steps and then learn to walk on their own, deep reinforcement learning is expected to revolutionize robotics by giving robots the autonomy to walk without human intervention. Reinforcement learning is expected to produce advanced AI systems for robotics in industries such as transportation, healthcare, retail, manufacturing, and aerospace.
Figure 3: OpenAI constraint elements in the Safety Gym reinforcement learning environment
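The reward accumulation mentioned in this section can be sketched in a few lines. The example below computes the discounted return for every step of one episode, a standard quantity in reinforcement learning, and adds a simple check of total safety cost against a budget in the spirit of constrained environments. The function names, the discount factor, and the cost budget are assumptions for illustration and are not taken from OpenAI’s Safety Gym API.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of an episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def within_cost_budget(costs, budget):
    """Hypothetical safety check: total episode cost must not exceed budget."""
    return sum(costs) <= budget
```

For example, `discounted_returns([1, 1, 1], gamma=0.5)` gives `[1.75, 1.5, 1.0]`: rewards further in the future count for less. An agent trained only to maximize these returns learns by trial and error, which is why a separate cost signal and budget are often tracked when some errors are unacceptable.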
Artificial intelligence endured several long, cold winters before commercial production deployments rolled out in healthcare and finance at economies of scale. Adoption of AI is increasing exponentially. According to Gartner, AI augmentation alone will create $2.9 trillion of business value in 2021. Google’s launch of TensorFlow Enterprise in 2019 will accelerate deployments for enterprises. According to Deloitte Consulting’s State of AI in the Enterprise survey of 1,900 executives worldwide, which covered an AI technology portfolio of machine learning, deep learning, natural language processing, and computer vision, AI will create waves in 2020. The national AI strategy set out in the U.S. executive order on AI leadership focuses on government investment in AI across the military (Navy, Army, and Air Force), education, and healthcare. The research report also shows that the window for competitive differentiation with AI is quickly closing.
Figure 4: AI Enterprise adoption forecast
Data science lifecycle – The Hallmarks of the Great Beyond
Gartner’s 2015 survey showed big data analytics to be a catalyst in revolutionizing how businesses operate and succeed. However, 80% of organizations failed to implement the big data analytics and data science strategy and methodology they had defined in their data science lifecycle projects. That’s a stunning number, considering that 65% of organizations even went under the waterline on strategic innovation management and data management investments. Only 35% of organizations succeeded after jumping on the bandwagon of big data analytics strategies. The survey results showed that big data dreams were hard to achieve when organizations didn’t define the correct strategy, didn’t choose the right big data compute infrastructure, or didn’t apply machine learning algorithms effectively. Hence, organizations need to create a big data analytics strategy that supports organizational decision-making and improves the business by raising the quality of operational and strategic decisions. The colossal volumes of big data collected from internal and external sources should be processed with advanced big data analytics tools and algorithms to generate insights that benefit the organization’s stakeholders. Most importantly, it’s critical to watch for the essential success factors of the organizations that have implemented such a strategy.
About the Author
Ganapathi Pulipaka is Chief AI HPC Scientist at DeepSingularity LLC, working on AI infrastructure, supercomputing, high-performance computing (HPC), parallel computing, AI strategy, neural network architecture, data science, machine learning, and deep learning in C, C++, Java, Python, R, TensorFlow, and PyTorch on Linux, macOS, and Windows. With 21+ years of experience, he breaks new ground in machine learning for conversational AI, NLP, robotics, IoT, IIoT, and reinforcement learning algorithms. He is ranked the #5 data science influencer by Onalytica.
Lwin, K. K., Takeuchi, W., Sekimoto, Y., & Zettsu, K. (2019, March 12). City Geospatial Dashboard: IoT and Big Data Analytics for Geospatial Solutions Provider in Disaster Management. IEEE. http://dx.doi.org/10.1109/ICT-DM47966.2019.9032921
Rueda, D. F., Vergara, D., & Reniz, D. (2018, January 24). Big Data Streaming Analytics for QoE Monitoring in Mobile Networks: A Practical Approach. IEEE. http://dx.doi.org/10.1109/BigData.2018.8622590