AI Data Analytics

https://lakefs.io/hive-metastore-why-its-still-here-and-what-can-replace-it/
https://www.linkedin.com/pulse/using-olap-graph-data-integration-harmonization-sean-martin/

  • data is the food for AI
  • better data => better AI, no data => AI starves
  • so we first need to get data for AI, and then get better data for AI. How?
  • by building an AI-powered data analytics platform that:
    • automatically collects quality data from existing data analytics tasks (data analytics for AI)
    • automatically uses that data to power AI for better data analytics (AI for data analytics)
    • forms a feedback loop: better data analytics produces better food for AI, and better AI powers better data analytics

Topics:
  • Reinforcement Learning
  • Transfer Learning / Federated Learning
  • Imbalanced Data Learning
  • Causality Discovery / Data Dependency Detection

Data Fabric: Cambridge Semantics - Anzo

Privacy:
  • (Default) In-cluster/in-house intelligence (DS-KG, DAG) + general intelligence (General KG)
  • (After agreement to terms) In-cluster data, after a privacy-preserving transformation, can be transferred and aggregated into one evolving General KG

Reinforcement Learning: (Mainly for the execution platform) For each performance-critical part of the system, multiple strategies/components are implemented. RL (multi-armed bandits, deep reinforcement learning) combined with a cost model can then select the best strategy/component. The decision can be adaptive and take user preferences (cost-sensitive vs. performance-sensitive) into account.
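
A minimal sketch of the bandit idea, assuming an epsilon-greedy policy and a toy cost model (a weighted sum of runtime and cost). The class `StrategyBandit`, the strategy names, and the numbers in `OBSERVED` are all hypothetical and only illustrate how a user preference could shift the selected strategy; a real execution platform would feed in measured runtimes and costs.

```python
import random


class StrategyBandit:
    """Epsilon-greedy multi-armed bandit over named strategies (illustrative only)."""

    def __init__(self, strategies, cost_weight=0.5, epsilon=0.1, seed=0):
        self.strategies = list(strategies)
        # cost_weight = 1.0 is fully cost-sensitive, 0.0 fully performance-sensitive.
        self.cost_weight = cost_weight
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {s: 0 for s in self.strategies}
        self.values = {s: 0.0 for s in self.strategies}  # running mean reward

    def select(self):
        # Try every strategy once before trusting the estimates.
        for s in self.strategies:
            if self.counts[s] == 0:
                return s
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.strategies)
        return self.best()

    def best(self):
        # Strategy with the highest estimated reward so far.
        return max(self.strategies, key=lambda s: self.values[s])

    def update(self, strategy, runtime, cost):
        # Toy cost model (an assumption): reward is the negated weighted sum.
        reward = -(self.cost_weight * cost + (1.0 - self.cost_weight) * runtime)
        self.counts[strategy] += 1
        n = self.counts[strategy]
        self.values[strategy] += (reward - self.values[strategy]) / n


# Hypothetical observations: strategy -> (runtime, cost), in arbitrary units.
OBSERVED = {"broadcast_join": (2.0, 9.0), "shuffle_join": (8.0, 3.0)}


def learn_preference(cost_weight, rounds=50):
    """Run the bandit against the fixed observations and return its final pick."""
    bandit = StrategyBandit(OBSERVED, cost_weight=cost_weight)
    for _ in range(rounds):
        choice = bandit.select()
        runtime, cost = OBSERVED[choice]
        bandit.update(choice, runtime, cost)
    return bandit.best()
```

With these made-up numbers, `learn_preference(1.0)` settles on the cheaper strategy and `learn_preference(0.0)` on the faster one. Deep RL would replace the running-mean table with a learned value function, which this sketch does not attempt.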

Better data collection:
  • Causality discovery in data warehouse ()
  • Transformation to structured data
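
As a small illustration of the data dependency detection topic, here is a sketch that checks whether a functional dependency holds in a table. This is an assumption about what a first step might look like: a functional dependency is a structural property of the data and is not evidence of causality on its own. The helper `holds_fd` and the sample rows are hypothetical.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if the columns in `lhs` functionally determine column `rhs`.

    `rows` is a list of dicts; `lhs` is a list of column names; `rhs` is one name.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        # The first value seen for a key is recorded; any later mismatch breaks the FD.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Hypothetical warehouse rows.
rows = [
    {"zip": "10001", "city": "New York", "customer": "a"},
    {"zip": "10001", "city": "New York", "customer": "b"},
    {"zip": "94105", "city": "San Francisco", "customer": "c"},
]
```

Here `holds_fd(rows, ["zip"], "city")` is true, while `holds_fd(rows, ["city"], "customer")` is false because one city maps to two customers.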