Write-up for Roundtable 3

The goal for Roundtable 3 was to determine whether AI tools could provide industry-wide access to data in a prevailing manufacturing culture that emphasizes protection of intellectual property.

With the overarching objective of finding solutions and knowledge to inform a national strategy for advancing manufacturing processes, a mix of academic, government, and industry participants held an illuminating conversation for the third roundtable in the series. In general, the lack of shared information is a rate-limiting factor for the adoption of AI and ML in manufacturing, making the topic of AI for Industry-Wide Data Sharing particularly relevant in Workshop 2. The session solicited commentary from experts at Lockheed Martin, Microsoft, NSF, NIST, IBM, and DOE, along with academics from various higher education institutions.

Why Data Should Be Shared

The question of why data should be shared provoked a prolonged discussion of its merits. Panelists were quick to note the perils of data sharing, such as confidentiality and competitive risks, the accuracy and quality of data, lack of curation and context, and legal issues, while hesitating to acknowledge the upsides or identify specific benefits. The unintended consequences of data sharing resounded loudly, with industries seeking to secure and privatize data to maintain their competitive advantage and to keep ownership of their intellectual property. Data was largely regarded as the “bread and butter,” or the “secret sauce,” that allows manufacturers to be competitive. In this regard, sharing data was met with trepidation and caution: how can we share data without losing our competitive advantage?

Admittedly, there is still much to learn and much room for improvement in data-sharing protocols. A common way to describe manufacturing processes is necessary to maximize production capacity in coordination with suppliers, customers, and other departments within the same company. Manufacturers need a shared body of definitions; this is especially important for small and medium-sized manufacturers (SMMs), whose day-to-day operations can benefit from a common dictionary. For this, an industry-wide ontology (a semantic tool that formalizes concepts and their relationships), expressed in a standardized formal language (such as XML), seems necessary to ensure shareability and interoperability. Such standards could allow AI to extract knowledge from disparate data sources. One example of this approach involves shop-floor maintenance logs, which were identified as a possible source of data to improve machine performance, especially in small businesses. Currently, extracting meaningful analytics from these logs requires human intervention to “tidy” entries that are often written in a local vernacular not shared across industries. Taking a large dataset of maintenance logs, applying natural language processing (NLP) and statistical analysis to improve language understanding, and iteratively “training” the models could lead to performance optimization, as in the sketch below.
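
As a concrete illustration, the following minimal Python sketch normalizes hypothetical maintenance-log vernacular into standardized vocabulary, then clusters the cleaned entries so that recurring failure modes surface as groups. The phrase map, sample log entries, and cluster count are assumptions for illustration, not artifacts of any system discussed at the roundtable.

    # Minimal sketch: normalize shop-floor maintenance logs so they can be
    # pooled and analyzed. All vocabulary and log entries are hypothetical.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    # Hypothetical local vernacular mapped to shared, standardized terms.
    VOCAB_MAP = {
        "spindle's actin' up": "spindle fault",
        "coolant low again": "coolant level low",
        "belt squeal": "drive belt wear",
    }

    def normalize(entry: str) -> str:
        """Replace known local phrases with standardized vocabulary."""
        text = entry.lower()
        for local, standard in VOCAB_MAP.items():
            text = text.replace(local, standard)
        return text

    logs = [
        "Spindle's actin' up on mill 3, shut down for the shift",
        "Coolant low again on lathe 2, topped off",
        "Belt squeal on grinder 1, replaced drive belt",
        "Spindle's actin' up on mill 4, bearing replaced",
    ]
    cleaned = [normalize(entry) for entry in logs]

    # Cluster the normalized entries so recurring failure modes group together.
    vectors = TfidfVectorizer().fit_transform(cleaned)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
    for label, entry in sorted(zip(labels, cleaned)):
        print(label, entry)

In practice, the vocabulary mapping itself would be learned and refined iteratively rather than hand-coded, which is precisely where the shared industry-wide ontology discussed above would pay off.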

Standardizing a process for using AI to extract knowledge could then have wide-ranging, positive effects on the industry’s performance. Having multiple “niche” operations build knowledge in this way, from the bottom up, encourages a groundswell of activity that uses data analytics to solve problems; this is a more likely path to data sharing than relying on giant companies, which tend to be more risk averse. With more use cases like this, the entire manufacturing industry can benefit from scalable, innovative AI tools and methods.

How Data Can Be Shared

Given the disparities between industry sectors, building use cases and setting clear benchmarks from a manufacturing perspective are imperative. The healthcare industry offers an obvious use case for successful and impactful data sharing: the goals and stakes are high, and if sharing data can enable a new treatment for a rare cancer, who wouldn’t want it? In manufacturing, however, the incentive is not as crystal clear. Resources are typically referred to as “machines,” and “jobs” are tasks done on a machine. A “model” may thus consist of a job that is a single operation, or a collection of operations conducted on multiple machines. Models and algorithms are used to improve performance on production lines (uptime) and minimize downtime. Throughput is often an important performance indicator that is directly related to a company’s profit margin. Data can therefore enable much more, by identifying best practices, improving product and system design, and advancing innovations, but having a clear example of why to share is critical.
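
To make this vocabulary concrete, here is a minimal Python sketch of machines, jobs, and throughput. All class names, machines, and timings are hypothetical illustrations, not a model presented at the roundtable.

    # Minimal sketch: a job is one or more operations, each run on a machine,
    # and line throughput is bounded by the slowest stage. Numbers are made up.
    from dataclasses import dataclass

    @dataclass
    class Machine:
        name: str
        uptime_hours: float  # hours available per day

    @dataclass
    class Operation:
        machine: Machine
        hours_per_unit: float

    @dataclass
    class Job:
        operations: list  # a single operation, or several across machines

    mill = Machine("mill-1", uptime_hours=20.0)
    lathe = Machine("lathe-2", uptime_hours=16.0)
    job = Job([Operation(mill, 0.5), Operation(lathe, 0.25)])

    # Throughput (units/day) is limited by the bottleneck machine:
    # mill-1 yields 40 units/day, lathe-2 yields 64, so the line does 40.
    throughput = min(op.machine.uptime_hours / op.hours_per_unit
                     for op in job.operations)
    print(f"{throughput:.0f} units/day")

Even a toy model like this shows why uptime, downtime, and throughput dominate the performance conversation: the weakest stage sets the profit-relevant number.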

One impediment to sharing data is that many SMMs have yet to embrace the cloud. This reluctance may be driven by fear of exposing information that would endanger a business model, or by a lack of resources to implement and maintain the required computing systems. Trust in cloud technologies, security concerns, and the vulnerability of networks all come into play, and it is well known that SMMs often devote all of their limited resources to solving day-to-day problems. In either case, there is a lack of appreciation for the need, benefit, and value of data sharing. Manufacturers need ways to make more data accessible to their AI programs and increase business growth while protecting data privacy. Two approaches were discussed in the roundtable. The first was a trust model in which the creation and preservation of data is curated by subject experts: the data stays local, with algorithmic models in place to protect knowledge. The second was federated learning, a newer paradigm for collaboration and partnership between companies, in which common, powerful machine learning models build knowledge without ever exchanging raw data samples. An example could be a federation between machinery suppliers and machinery operators that provides ongoing improvements in predictive maintenance; a minimal sketch of the idea follows. This would enable collaboration between industries on learning models and machine learning explorations.
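
The sketch below shows federated averaging, the core mechanic behind federated learning: each company trains a copy of a shared model on its own private data, and only the model weights, never the data, are pooled. The data, model, and update rule here are hypothetical placeholders.

    # Minimal sketch of federated averaging: private data never leaves each
    # site; only locally updated model weights are averaged centrally.
    import numpy as np

    def local_update(weights, X, y, lr=0.1):
        """One gradient step of linear-regression training on private data."""
        grad = X.T @ (X @ weights - y) / len(y)
        return weights - lr * grad

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])

    # Three companies, each holding private data that stays on site.
    companies = []
    for _ in range(3):
        X = rng.normal(size=(50, 2))
        y = X @ true_w + rng.normal(scale=0.1, size=50)
        companies.append((X, y))

    weights = np.zeros(2)
    for _ in range(100):
        # Each company refines the shared model on its own data...
        local = [local_update(weights, X, y) for X, y in companies]
        # ...and only the resulting weights are averaged into the global model.
        weights = np.mean(local, axis=0)

    print(weights)  # approaches true_w; no raw records were ever exchanged

In a supplier/operator federation for predictive maintenance, the same pattern would let each party improve a shared failure-prediction model without exposing its operational records.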

What Incentives Encourage Data Sharing

In an industry draped in a culture of secrecy, with systems designed to increase competitive advantage, what incentives will encourage data sharing? Building use cases where manufacturers benefit from sharing data is an important step in setting priorities and understanding what is at stake. There is value in collecting data, doing it right, and extracting knowledge that can benefit an entire industry without infringing on the competitive advantages of individual entities. However, these values are not yet clearly defined. At this point, a general consensus formed among participants around the need for SMMs to get involved in sharing data to start addressing mutual problems. For example, machine-tool crashes and physical injuries can be avoided through the federation of machine-tool documentation, with vendors tailoring their models on pooled data resources to avoid crashes. In such scenarios, the curation of data is of critical importance. Another example comes in the form of government-funded programs designed to make knowledge and research available in order to grow a specific area of research: NSF-funded projects come with agreements to release, publish, and share data publicly within six months. And, as mentioned previously, there is mutual consensus to share medical data between hospitals as long as privacy concerns are addressed appropriately.

One solution that prevents derivative works from compromising competitive advantages is to bring in trusted third parties. They could help resolve potential liability issues by validating and verifying models in order to certify products. Furthermore, manufacturers may be more willing to share data with a trusted third party that can handle curation and protection than to release it directly to the public.

Another idea, in an industry that is collectively data rich but whose individual manufacturers are often data poor, is the creation of synthetic data. Given the need for big data to drive AI exploration in deep learning and data analytics, models that create synthetic or “fake” data can generate information that adds dimensionality and context for testing algorithms.
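
As a minimal illustration, the sketch below fits a simple generative model to a small, hypothetical private dataset and samples a much larger synthetic set that preserves its statistics. The sensor readings, model choice, and sample sizes are assumptions for illustration only.

    # Minimal sketch: generate synthetic process data from a small private
    # dataset so algorithms can be tested without sharing the real records.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(1)

    # Hypothetical private dataset: (spindle temperature C, vibration mm/s).
    real = np.column_stack([
        rng.normal(65.0, 3.0, size=200),
        rng.normal(1.2, 0.2, size=200),
    ])

    # Fit a generative model to the private data...
    gm = GaussianMixture(n_components=2, random_state=0).fit(real)

    # ...then sample a larger synthetic set that mimics its statistics and
    # can be shared or used to stress-test analytics pipelines.
    synthetic, _ = gm.sample(5000)
    print(synthetic.mean(axis=0), real.mean(axis=0))  # means closely agree

Generative models of this kind can still leak information if fit too closely to sensitive data, so the expert curation and privacy review discussed above remain necessary.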

The widespread adoption of data sharing faces many challenges. Understanding the context of data is important: how data is used operationally, how it is annotated, and what it means should all be curated by experts in the particular field. Suppliers, for example, may choose to federate their data to build better predictive models of overall supply chain performance, resulting in mutually beneficial management. With more successful use cases, more organizations will be willing to share data, and ultimately the tremendous potential to advance knowledge through a collective ability to learn from data will take hold in the manufacturing industry.

Workshop 2, Roundtable 3

Dave Dorheim, lead workshop writer, DWD Advisors
Yoh Kawano, Roundtable 3 writer, Office of Advanced Research Computing, UCLA