Topics

  • Pipeline for data analysis, knowledge discovery in databases (KDD), data science, etc.
  • Tasks: optimisation, regression, classification, clustering, exploration
  • CRISP-DM
  • Inputs: features, imputation
  • Environmental interaction: deltas and gradients, supervised learning, unsupervised learning, reinforcement learning, fitness function, steps
  • Tools (methods): singular, mixed and joint (especially with non-AI approaches)
  • Outputs: evaluation, visualization, interpretation
  • Reality: easily fooled, politics, ethics, evidence, hybridise or die
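
The pipeline view in the topics above (inputs with imputation, a tool, then evaluation) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a recommended implementation: the synthetic two-class data, the 10% missing-value rate, the 70/30 train/test split and the nearest-centroid classifier are all assumptions chosen for brevity, and every function name is hypothetical.

```python
import random
from statistics import mean


def fit_column_means(rows):
    """Learn per-column means from the observed (non-None) training values."""
    n_cols = len(rows[0])
    return [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]


def impute(rows, col_means):
    """Fill missing values (None) with the column means learned on the training set."""
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def fit_centroids(X, y):
    """Nearest-centroid 'model': one mean vector per class label."""
    centroids = {}
    for label in set(y):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def run_pipeline(seed=0):
    """Generate data, corrupt it, split, impute, fit and return test accuracy."""
    rng = random.Random(seed)
    # Synthetic two-class data: class 0 near (0, 0), class 1 near (5, 5).
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 5.0)):
        for _ in range(50):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            y.append(label)
    # Blank out roughly 10% of the values to mimic incomplete data.
    for row in X:
        for j in range(len(row)):
            if rng.random() < 0.1:
                row[j] = None
    # Shuffle, then hold out 30% of the rows as a test set.
    order = list(range(len(X)))
    rng.shuffle(order)
    cut = int(0.7 * len(order))
    train, test = order[:cut], order[cut:]
    X_train, y_train = [X[i] for i in train], [y[i] for i in train]
    X_test, y_test = [X[i] for i in test], [y[i] for i in test]
    # Imputation statistics come from the training rows only, to avoid leakage.
    col_means = fit_column_means(X_train)
    X_train, X_test = impute(X_train, col_means), impute(X_test, col_means)
    model = fit_centroids(X_train, y_train)
    hits = sum(predict(model, x) == t for x, t in zip(X_test, y_test))
    return hits / len(X_test)


if __name__ == "__main__":
    print(f"test accuracy: {run_pipeline():.2f}")
```

Each stage is a separate function so that any one of them (the imputer, the model, the evaluation) can be swapped without touching the others, which is the point of treating the tool as a pipeline rather than a single box.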

Tools

  • Data mining (DM) – Weka, KEEL, scikit-learn
  • Regression – ECJ
  • Classification – TensorFlow/Keras
  • Resources – MOOCs, Jupyter Notebooks, blogs, lists (KDnuggets) and repositories
  • Languages – Python (NumPy,…), MATLAB toolboxes
  • Competitions – Kaggle, CEC
  • Conferences and journals
  • Distributed AI – Big Data, Cloud-based computation, Grids, Supercomputers
  • Proprietary tools – Watson

Threshold Concepts

  1. AI tools are a pipeline not a single box
  2. Data is often corrupt, incomplete and only a snapshot model of the patterns in real life
  3. Sample search space vs solution search space, in conjunction with the curse of dimensionality and the meaning of NP.
  4. Benefits and costs of data handling, from feature manipulation to proper experimentation (training, validation and test sets).
  5. Environmental interaction guides tool selection.
  6. The no-free-lunch theorem bounds what any single method can achieve, but it can be sidestepped by tuning appropriate models to specific problems; therefore understand the problem as much as the tool.
  7. Where standard methods are useful, where they are not, and where they should be hybridised.
  8. Importance of deltas, gradients and filters in classification and regression.
  9. Importance of beliefs, priors, likelihoods, posteriors and expectations in optimisation.
  10. Probabilistic vs stochastic vs deterministic results
  11. Visualization of patterns discovered, e.g. trade-off surfaces in multi-objective problems; white-, grey- and black-box techniques.
  12. Analysis feeds back to set-up, e.g. significance, power and ethics of results.
  13. Politics, ethics and trust in someone else’s tool?
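
To make the trade-off surface in concept 11 concrete, here is a small sketch that extracts the Pareto front (the non-dominated set) from a list of candidate solutions. It assumes two objectives that are both minimised; the (cost, error) pairs are made-up illustrative values, and the function names are hypothetical rather than taken from any library.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate solutions as (cost, error) pairs, both to be minimised.
candidates = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 5), (6, 2), (7, 3), (9, 1)]
front = pareto_front(candidates)
print(front)  # [(1, 9), (2, 7), (4, 4), (6, 2), (9, 1)]
```

Plotting the front (cost on one axis, error on the other) gives the trade-off curve: no point on it can be improved in one objective without getting worse in the other. The pairwise comparison here is quadratic in the number of candidates, which is fine for a sketch but would need a faster method for large populations.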