Key points
- Humans could learn but computers couldn’t
- Neural networks allow us to program ‘parallel computers’
- Researchers needed to figure out how learning in neural networks works. Nobody knew – it was experimentation and finding the right question to ask
- Beginnings of generative models; the question became: what gets the most traction? It wasn’t clear this was the right question, but in hindsight it turned out to be
- It seemed that supervised learning would get traction (this started as an intuition among researchers)
- If a neural network is deep and large, it could be configured to solve a hard task
- A large and deep neural network can represent a good solution to the problem (most networks at the time were small)
- This required a big dataset and a lot of computing power (to actually do the work)
- Optimisation (for smaller networks?) was a bottleneck in development
- Train the network, make it big, find the data, and make it succeed
- OpenAI idea: unsupervised learning through compression
- Hypothesis: really good compression of data will lead to unsupervised learning
- If you compress data really well, you extract all the hidden secrets that exist in it; therefore compression is key
- Test: when predicting the next character of Amazon reviews, there will be a neuron inside the LSTM (long short-term memory network) that corresponds to the review’s sentiment. This showed traction for unsupervised learning
- So: it was validated that predicting the next character was a clue to finding the secrets in data
- Reinforcement learning: solving a real-time strategy game, DOTA2 (a competitive sport: smart, teamwork, fast, and competing against another team), with the goal of playing against the best (human) players in the world
- Some work seemed like detours, but it led up to the outcomes through convergence
- A nice convergence where GPT created the foundation, and the DOTA experiments created reinforcement learning from human feedback
- When training a large neural network to accurately predict the next character or word in lots of different contexts, what we are doing is learning a world model
- In learning just the statistical correlations and compressing them really well, what is actually learned is the process that produced the text. The text is a projection of the world
- The more accurate the prediction, the higher the fidelity of this process
- Prediction can be accurate, but being helpful (like an assistant) requires reinforcement learning. It is not a matter of teaching new knowledge; it is more about ‘what we want it to be’
- The ability to learn everything from the world through the projection of text
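The compression-equals-prediction point above can be made concrete: a model's average code length on data is its cross-entropy, so a model that predicts characters better also compresses them better. A minimal sketch (not from the talk; the tiny text and the unigram model are illustrative assumptions):

```python
import math
from collections import Counter

text = "the model that predicts the next character best also compresses best"

# Baseline: a uniform model over the characters that appear in the text.
# Code length per character is log2(alphabet size).
alphabet = sorted(set(text))
uniform_bits = len(text) * math.log2(len(alphabet))

# Better predictor: unigram character frequencies estimated from the text.
# Code length is the cross-entropy of the data under this model.
counts = Counter(text)
total = len(text)
unigram_bits = -sum(math.log2(counts[c] / total) for c in text)

print(f"uniform model: {uniform_bits:.1f} bits")
print(f"unigram model: {unigram_bits:.1f} bits")
```

Because the character distribution is non-uniform, the frequency-based model assigns fewer total bits; stronger models (n-grams, LSTMs, transformers) push the bit count lower still, which is the sense in which better compression tracks better learning.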
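The next-character-prediction idea itself can be sketched in miniature. The Amazon-reviews experiment used an LSTM; as a stand-in, here is the simplest possible version, a bigram model that predicts the next character from a one-character context (the corpus and function names are invented for illustration):

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat. the dog saw the cat."

# Count which character follows each character (a 1-character context).
following = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    following[cur][nxt] += 1

def predict_next(context_char):
    """Return the most frequent next character after context_char."""
    return following[context_char].most_common(1)[0][0]

print(predict_next("t"))
```

Even this toy model picks up regularities of the text ("t" is most often followed by "h" here); scaling the context and the model is what lets hidden structure, like sentiment, emerge in individual units.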
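The learning-from-human-feedback step can also be sketched. One common formulation (an assumption here, not a description of OpenAI's actual pipeline) fits a reward model to pairwise human preferences via a Bradley–Terry objective, where P(a preferred over b) = sigmoid(r(a) − r(b)). A self-contained toy version with synthetic preferences and invented feature vectors:

```python
import math
import random

random.seed(0)
true_w = [2.0, -1.0]  # hidden "human taste" used only to synthesize labels

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Synthetic preference pairs: (preferred, rejected) by the hidden taste.
pairs = []
for _ in range(200):
    a = [random.uniform(-1, 1) for _ in range(2)]
    b = [random.uniform(-1, 1) for _ in range(2)]
    pairs.append((a, b) if score(true_w, a) >= score(true_w, b) else (b, a))

# Fit reward weights by gradient ascent on the Bradley-Terry log-likelihood.
w = [0.0, 0.0]
lr = 0.5
for _ in range(300):
    grad = [0.0, 0.0]
    for a, b in pairs:
        p = 1 / (1 + math.exp(-(score(w, a) - score(w, b))))
        for i in range(2):
            grad[i] += (1 - p) * (a[i] - b[i])
    for i in range(2):
        w[i] += lr * grad[i] / len(pairs)

# The learned reward should rank pairs the way the preferences did.
agree = sum(score(w, a) > score(w, b) for a, b in pairs) / len(pairs)
print(f"agreement with preferences: {agree:.0%}")
```

In full RLHF the learned reward then drives a policy-gradient update of the language model; the point of the sketch is only the first half, that "what we want it to be" can be learned from comparisons rather than labels.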
Amazing how much of the conversation was about following intuition, looking for the right questions, experimenting, and converging. Aligns fully with the non-objective search.