AI Data Analytics
https://lakefs.io/hive-metastore-why-its-still-here-and-what-can-replace-it/
https://www.linkedin.com/pulse/using-olap-graph-data-integration-harmonization-sean-martin/
- Data is the food for AI.
- Better data => better AI; no data => AI starves.
- So we first need to get data for AI, and then get better data for AI. How?
- By creating an innovative AI-powered data analytics platform that:
  - automatically collects quality data from existing data analytics tasks (data analytics for AI)
  - automatically uses that data to power AI for better data analytics (AI for data analytics)
- Recursive loop: better data analytics produces better food for AI, and better AI powers better data analytics.
- Reinforcement Learning
- Transfer Learning / Federated Learning
- Imbalanced Data Learning
- Causality Discovery / Data Dependency Detection
Data Fabric: Cambridge Semantics - Anzo
Privacy:
- (Default) In-cluster/in-house intelligence (DS-KG, DAG) + general intelligence (General KG)
- (After agreement to terms) In-cluster data, after a privacy-preserving transformation, can be transferred and aggregated into one evolving General KG
Reinforcement Learning: (Mainly for the execution platform) For each performance-critical part of the system, multiple strategies/components are implemented. RL (multi-armed bandit, deep reinforcement learning) combined with a cost model selects the optimal strategy/component. The decision can be adaptive and can take the user's preference (cost-sensitive or performance-sensitive) into account.
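A minimal sketch of the strategy-selection idea above, using an epsilon-greedy multi-armed bandit. Everything named here is an assumption for illustration: the strategy names, the `OBSERVED` latency/cost figures, the `cost_weight` knob (0 = performance-sensitive, 1 = cost-sensitive), and the linear reward used as a stand-in for the cost model. A real platform would measure latency and cost from actual executions.

```python
import random
from dataclasses import dataclass, field

# Hypothetical (latency, cost) per strategy; real values would be measured.
OBSERVED = {
    "hash_join": (1.0, 3.0),        # fast but expensive
    "sort_merge_join": (2.0, 1.0),  # slower but cheap
}


@dataclass
class StrategyBandit:
    """Epsilon-greedy bandit choosing among interchangeable strategies."""
    strategies: list
    epsilon: float = 0.1
    cost_weight: float = 0.5  # 0 = performance-sensitive, 1 = cost-sensitive
    counts: dict = field(default_factory=dict)
    values: dict = field(default_factory=dict)
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def __post_init__(self):
        for s in self.strategies:
            self.counts[s] = 0
            self.values[s] = 0.0

    def reward(self, latency, cost):
        # Stand-in cost model: weighted sum, negated so higher is better.
        return -((1 - self.cost_weight) * latency + self.cost_weight * cost)

    def select(self):
        # Try every strategy once before exploiting.
        untried = [s for s in self.strategies if self.counts[s] == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.strategies)  # explore
        return self.best()                           # exploit

    def update(self, strategy, latency, cost):
        # Incremental mean of the observed rewards for this strategy.
        self.counts[strategy] += 1
        r = self.reward(latency, cost)
        self.values[strategy] += (r - self.values[strategy]) / self.counts[strategy]

    def best(self):
        return max(self.strategies, key=lambda s: self.values[s])


def simulate(cost_weight, rounds=50):
    """Run the bandit against the hypothetical profile; return its preferred strategy."""
    bandit = StrategyBandit(list(OBSERVED), cost_weight=cost_weight)
    for _ in range(rounds):
        chosen = bandit.select()
        latency, cost = OBSERVED[chosen]
        bandit.update(chosen, latency, cost)
    return bandit.best()
```

Calling `simulate(0.0)` models a performance-sensitive user and `simulate(1.0)` a cost-sensitive one. With the fixed figures above the two settings settle on different strategies, which is the adaptive behavior the note describes.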
Better data collection:
- Causality discovery in data warehouse ()
- Transformation to structured data