Notes from watching Leonard Susskind's Theoretical Minimum series. These are scribed notes of what I've learned, ideally at a level that's easy to understand. Another goal of this blog is to be succinct - these notes came from 18.5 hours of lectures, which were then turned into a ~400 page book. To be more succinct, I will leave out some details and avoid being overly wordy; the reader should think of these notes as a skeleton of the actual course, rather than a replacement.
Please reach out to me if you would like to discuss anything here or have any questions!
Classical mechanics is intuitive because we can visualize the objects being studied. While it becomes mathematically abstract with concepts like Lagrangians and Poisson brackets, its basic principles are motivated by the observable world. In classical mechanics, we consider 1) a system of objects, and 2) the state of this system. Its deterministic nature means that knowing the system's state at any given time allows us to predict its future behavior.
Beyond classical mechanics, abstraction increases. Our brains operate in three dimensions, making it impossible to visualize higher or lower dimensions accurately. No one can visualize 4 dimensions, and when we visualize lower dimensions (say 2d), we are seeing a 2d plane embedded in 3 dimensions. Good physicists excel not in visualizing higher dimensions but in using abstract mathematics to describe them. Thus, understanding quantum mechanics requires relying on abstract math rather than trying to visualize it as an extension of classical mechanics.
In classical mechanics, the key question is: what is the state of a closed system? The space of states encompasses all possible states of a system. For example, a die has the state space {1, 2, 3, 4, 5, 6}, and a coin has {heads, tails}. For a point particle on a line, the phase space includes all possible positions and momenta, forming a continuously infinite, two-dimensional plane that we can graph.
Consider the die again. If we look at two subsets:
odd numbers: {1, 3, 5}
numbers ≤ 3: {1, 2, 3}
We see that "AND" is their intersection, {1, 3}, and "OR" is their union, {1, 2, 3, 5}. These concepts come from set theory, but in quantum mechanics, the space of states is a vector space, not a set.
Rather than diving into vector spaces, let's focus on experiments.
Experiments
Let's examine a coin in classical mechanics (cbit) and quantum mechanics (qubit). The quantum analog is more intriguing. Each qubit has two states: heads or tails, represented by σ=+1 for heads (↑) and σ=−1 for tails (↓).
We also have an apparatus with a "THIS SIDE UP" sign. When placed upright, the apparatus detects the qubit's state and displays σ on the screen. Instead of just measuring, think of this as preparing the system. If we measure σ=+1, reset the apparatus, and quickly repeat the experiment with the same qubit, the screen consistently shows +1. Thus, the apparatus not only measures but also prepares the qubit's state.
Let's flip the apparatus so that "THIS SIDE UP" is on the bottom, orienting it downwards. Now, redoing the experiment with the same qubit gives σ=−1. Repeating the experiment consistently yields σ=−1, but flipping the apparatus back shows σ=+1 again. This suggests the qubit has directionality, indicating it behaves like a vector. Alternatively, the detector's orientation could influence the screen to display the vector component along its direction.
What if we rotate the apparatus 90∘ so "THIS SIDE UP" points left? Using multiple ↑ qubits one after another, we expect the screen to display 0, as the horizontal component of ↑ is 0. In classical physics, this would happen. However, in quantum mechanics, the screen shows either +1 or −1, averaging to 0 over many trials, so the probability of getting +1 and −1 is the same.
If we generalize by changing the angle to θ, the component of the ↑ vector along the tilted axis is cosθ. Classically, we'd expect the output to be cosθ. In the quantum world, the output is still +1 or −1, but the average value over many trials will be cosθ.
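To make the statistics concrete, here is a minimal numpy simulation (my own illustration, not from the lectures) of the tilted apparatus. It assumes the Born-rule probabilities cos²(θ/2) and sin²(θ/2), which are derived later in these notes; each trial clicks ±1, and only the average reproduces cosθ.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.pi / 3  # tilt angle of the apparatus

# Probability of reading +1 on an up-spin measured along a tilted axis
# (derived later in these notes): cos^2(theta/2).
p_plus = np.cos(theta / 2) ** 2

# Each individual trial is +1 or -1; never a fractional reading.
outcomes = rng.choice([+1, -1], size=100_000, p=[p_plus, 1 - p_plus])
print("average over trials:", outcomes.mean())  # ~0.5
print("cos(theta):         ", np.cos(theta))    # 0.5
```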
Some Mathematics
Let's take a detour to introduce more mathematics - let's discuss vector spaces. For now, it suffices to define a vector space. Normally, we use v or v⃗ to denote vectors; in quantum mechanics, we use bra-ket notation, where ⟨v∣ is a bra, and ∣v⟩ is a ket. We can now define a vector space as follows:
A vector space V is a collection of vectors ∣v⟩, satisfying the following:
Addition: ∣a⟩+∣b⟩=∣c⟩, where ∣a⟩,∣b⟩,∣c⟩∈V.
Multiplication: z∣a⟩=∣a′⟩ for z∈C and ∣a⟩,∣a′⟩∈V.
There are a couple more axioms that are generally included (e.g. there is a zero vector ∣0⟩ such that ∣0⟩+∣a⟩=∣a⟩ for all ∣a⟩∈V), but Professor Susskind ignores them for now.
Let's take a look at a few examples:
The real numbers form a real vector space (here we must restrict the scalars to z∈R).
The complex numbers are a complex vector space.
Functions form a vector space.
The collection of all column vectors (α1, α2, ..., αn) with α1, α2, ..., αn∈C is a vector space.
Now, let's introduce the idea of a dual vector space. Given a vector space V, we construct a dual vector space V∨. The basic idea is for every vector ∣v⟩∈V, there is a vector ⟨v∣∈V∨.
We can formulate this more rigorously as follows:
Given a vector space V containing vectors ∣v⟩, the dual vector spaceV∨ contains vectors ⟨v∣, such that there is a 1-to-1 correspondence between:
∣a⟩↔⟨a∣
∣a⟩+∣b⟩↔⟨a∣+⟨b∣.
z∣a⟩↔⟨a∣z∗, where z∗ is the complex conjugate of z.
Now, let's define the inner product, the analogue of the dot product.
Given ⟨a∣ and ∣b⟩, the inner product is an operation that gives ⟨a∣b⟩, which satisfies ⟨a∣b⟩=⟨b∣a⟩∗.
Let's look at an example.
Define ⟨b∣a⟩=β1∗α1+β2∗α2, where ∣a⟩ has components α1, α2 and ∣b⟩ has components β1, β2. Then ⟨a∣b⟩=α1∗β1+α2∗β2. Now we can see ⟨a∣b⟩=⟨b∣a⟩∗.
What if we set b=a? Then ⟨a∣a⟩=⟨a∣a⟩∗, so the inner product of any vector with itself is always real (and, from the formula above, non-negative). Its square root is generally known as the length of the vector.
We say two vectors are orthogonal if ⟨b∣a⟩=0. Define the dimension of a vector space as the maximum number of nonzero mutually orthogonal vectors. For example, in the vector space of 2d column vectors, (1, 0) and (0, 1) are orthogonal, and since no third nonzero vector is orthogonal to both, this vector space has dimension 2.
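Since we'll lean on these definitions constantly, here is a quick numpy sanity check (the vectors are my own arbitrary examples). Note that np.vdot conjugates its first argument, which is exactly the bra:

```python
import numpy as np

a = np.array([1 + 2j, 3 - 1j])  # |a>
b = np.array([0.5j, 2 + 0j])    # |b>

# np.vdot conjugates its first argument, so vdot(b, a) = <b|a>.
print(np.isclose(np.vdot(a, b), np.conj(np.vdot(b, a))))  # <a|b> = <b|a>* -> True

# <a|a> is real and non-negative; its square root is the length.
print(np.vdot(a, a))  # (15+0j)

# Orthogonality and dimension: the standard 2d basis vectors.
e1, e2 = np.array([1, 0]), np.array([0, 1])
print(np.vdot(e1, e2))  # 0 -> orthogonal
```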
Lecture 2 - Quantum States
Logic
Let's rename ↑ to spin up, and ↓ to spin down. When we measure a spin up qubit in an upwards-oriented apparatus, we are measuring σz=±1 - the z-component of the spin. And if we turn the apparatus on its side, we get σx=±1. And so on...
Now we generalize. Suppose we have previously prepared an experiment where everything is oriented along the n̂ axis. Then, we rotate the apparatus in 3d space so that the orientation vector points in the m̂ direction. It turns out that the average value of the experiment is ⟨σm⟩=n̂⋅m̂=cosθ, the component of n̂ along m̂.
Classically, a proposition about the state of a system corresponds to the subset of states for which it is true. One type of proposition is a NOT statement, which is exactly what it sounds like - the subset of states that do not satisfy the proposition. A proposition can be either TRUE or FALSE. Two other types of propositions we've seen already are AND and OR statements - these behave differently in quantum mechanics. In English, there are two types of OR - the inclusive one (the union) and the exclusive one (in one subset, but not the other). Generally when we speak, we mean the exclusive one, but in logic, we mean the inclusive OR (the union).
Take two propositions:
A:σz=+1
B:σx=+1
Let's design an experiment to check A OR B on ↑. First, test A: orient the apparatus upward and measure the spin along the z-axis. Since the spin was prepared in ↑, we get σz=+1, so A is true and we are done. A OR B is TRUE because the OR only requires that one of them be true.
Let's redo the experiment, but flip the order of A and B.
We will get σx=+1 with probability 1/2. Then B OR A is TRUE.
We will get σx=−1 with probability 1/2, so we still need to test A. But measuring B left the spin pointing sideways, so when we now test A, we get +1 with probability 1/2 and −1 with probability 1/2. So the chance B OR A is true is 1/2 + 1/4 = 3/4, and order does matter.
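Here is a small Monte Carlo sketch of this order dependence (my own illustration). It models each measurement as a Born-rule collapse onto the measured axis, a rule stated formally in later lectures:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

up, down = np.array([1, 0]), np.array([0, 1])
right = (up + down) / np.sqrt(2)  # sigma_x = +1 state
left = (up - down) / np.sqrt(2)   # sigma_x = -1 state

def measure(state, plus, minus):
    """Born rule: collapse `state` onto one of two orthogonal outcomes."""
    p_plus = abs(np.vdot(plus, state)) ** 2
    return (+1, plus) if rng.random() < p_plus else (-1, minus)

# Testing A (sigma_z) first on an up spin always succeeds: P = 1.
# Testing B (sigma_x) first, then A only if B failed:
true_count = 0
for _ in range(N):
    state = up
    b, state = measure(state, right, left)  # proposition B
    if b == +1:
        true_count += 1
        continue
    a, state = measure(state, up, down)     # proposition A
    true_count += (a == +1)

print("P(B OR A) ~", true_count / N)  # ~0.75, not 1
```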
The space of states of a quantum system is a linear vector space, so given any two states, we can add them to create a third state. You cannot do this with sets - given two elements of a set, you can't always combine them to make a third element.
The Connection
Finally, let's state the connection between a vector space and the space of states of a quantum system. In our previous experiments, we saw that σz, σx, or σy=±1. Let's name the corresponding states ∣U⟩ (up), ∣D⟩ (down), ∣L⟩ (left), ∣R⟩ (right), ∣I⟩ (in), and ∣O⟩ (out). Suppose we had a ket vector ∣A⟩=αU∣U⟩+αD∣D⟩. The probability of measuring up would be PU=αU∗αU, and the probability of measuring down is PD=αD∗αD.
One basic postulate that we will accept: orthogonality means that two states are sufficiently different that with a single experiment you can tell the difference. So ∣U⟩ and ∣D⟩ are orthogonal to each other. We will show that they can serve as the basis for our vector space (of dimension 2).
Since PU+PD=1, we have αU∗αU+αD∗αD=1 ⟺ ⟨A∣A⟩=1. Using similar ideas, we can guess that ∣R⟩=(1/√2)∣U⟩+(1/√2)∣D⟩ and ∣L⟩=(1/√2)∣U⟩−(1/√2)∣D⟩. We can now derive the identities:
∣U⟩=(∣R⟩+∣L⟩)/√2, ∣D⟩=(∣R⟩−∣L⟩)/√2.
Furthermore, we can show that ∣I⟩=(1/√2)∣U⟩+(i/√2)∣D⟩ and ∣O⟩=(1/√2)∣U⟩−(i/√2)∣D⟩.
We can rewrite everything as column vectors to conclude the following:
Let ∣i⟩ be basis vectors (mutually orthonormal vectors). Then we can write any element in our vector space as a linear combination of the basis vectors, ie. ∣A⟩=∑iαi∣i⟩ for αi∈C.
We can now do inner products: ⟨j∣A⟩=∑i αi⟨j∣i⟩=∑i αi δij=αj, where δij is the Kronecker delta:
δij = 1 if i=j, and 0 otherwise.
So ∣A⟩=∑i ∣i⟩⟨i∣A⟩, which is a useful identity used to simplify things. The exact same thing is true for bra vectors: ⟨A∣=∑i ⟨A∣i⟩⟨i∣.
Here's another reason why we want to use linear algebra: observables (the things we measure) are linear operators.
A linear operator M, acting on a ket ∣A⟩, gives a unique ket vector ∣B⟩=M∣A⟩, satisfying:
M[z∣A⟩]=zM∣A⟩.
M[∣A⟩+∣B⟩]=M∣A⟩+M∣B⟩.
Using this definition, we can show that
⟨i∣M∣A⟩=⟨i∣B⟩=βi
Using our useful identity, we can rewrite this as
∑j ⟨i∣M∣j⟩⟨j∣A⟩ = ∑j ⟨i∣M∣j⟩αj = βi.
We define the matrix elements as Mij:=⟨i∣M∣j⟩, which are important because a linear operator is completely characterized by its matrix elements. So we can rewrite the above as ∑j Mij αj = βi, which in matrix notation is just the matrix (Mij) multiplying the column vector (αj) to give the column vector (βi).
Since we have matrices, it will be useful to define eigenvectors.
The eigenvectors of M are vectors ∣λi⟩ such that M∣λi⟩−λi∣λi⟩=0, and each eigenvector ∣λi⟩ has eigenvalue λi.
These are the vectors whose direction is unchanged by the operator; the only thing that happens to the vector is that it's scaled, and this scaling factor is called the eigenvalue.
We defined linear operators on ket vectors; they can also act on bra vectors, written ⟨A∣M.
One may think that M∣A⟩=∣B⟩ ⟺ ⟨A∣M=⟨B∣, but this is wrong. The correct statement is M∣A⟩=∣B⟩ ⟺ ⟨A∣M†=⟨B∣, where M†=[Mᵀ]∗ (i.e. change mij to mji∗: first transpose the matrix, then complex conjugate all of its entries) is the Hermitian conjugate.
Since quantum mechanical measurements are always real, quantum mechanical observables are represented by Hermitian operators, which satisfy the property M=M†. One nice property of Hermitian operators is that their eigenvalues must be real: for a Hermitian operator L, L∣λ⟩=λ∣λ⟩ ⟹ ⟨λ∣L∣λ⟩=λ⟨λ∣λ⟩, while ⟨λ∣L=⟨λ∣L†=λ∗⟨λ∣ ⟹ ⟨λ∣L∣λ⟩=λ∗⟨λ∣λ⟩, so λ=λ∗, i.e. λ is real. Professor Susskind calls the next statement the fundamental theorem because of how important it is; we can state it precisely as follows:
The eigenvectors of a Hermitian operator form an orthonormal basis.
Take L∣λ1⟩=λ1∣λ1⟩ (equivalently, ⟨λ1∣L=λ1⟨λ1∣) and L∣λ2⟩=λ2∣λ2⟩. Taking inner products gives ⟨λ1∣L∣λ2⟩=λ1⟨λ1∣λ2⟩ and ⟨λ1∣L∣λ2⟩=λ2⟨λ1∣λ2⟩. Subtracting gives (λ1−λ2)⟨λ1∣λ2⟩=0, so if λ1≠λ2, the two eigenvectors are orthogonal.
Now suppose λ1=λ2=λ (a degenerate eigenvalue) and let ∣A⟩=α∣λ1⟩+β∣λ2⟩. Then L∣A⟩=αL∣λ1⟩+βL∣λ2⟩=λ(α∣λ1⟩+β∣λ2⟩)=λ∣A⟩, so any linear combination of the two is again an eigenvector with the same eigenvalue. Within such a degenerate subspace we can use Gram-Schmidt to pick mutually orthonormal eigenvectors. It remains to show that if our space is N-dimensional, there are N orthonormal eigenvectors, which is a simple linear algebra exercise.
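numpy's eigh routine is organized around exactly this theorem; a quick check on a random Hermitian matrix (my own example):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
L = (A + A.conj().T) / 2  # a random Hermitian operator

eigvals, V = np.linalg.eigh(L)  # eigh assumes Hermitian input
print(eigvals)  # returned as real floats: the eigenvalues are real

# The columns of V are the eigenvectors and form an orthonormal basis.
print(np.allclose(V.conj().T @ V, np.eye(4)))             # V^dagger V = I
print(np.allclose(V @ np.diag(eigvals) @ V.conj().T, L))  # L rebuilt from them
```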
Principles
Professor Susskind states that there are 4 basic principles of quantum mechanics:
The observable or measurable quantities of quantum mechanics are represented by linear operators L.
If the system is in eigenstate ∣λi⟩, the result of a measurement is guaranteed to be λi.
Unambiguously distinguishable states are represented by orthogonal vectors.
If ∣A⟩ is the state vector of a system, and the observable L is measured, the probability to observe value λi is P(λi)=⟨A∣λi⟩⟨λi∣A⟩=∣⟨A∣λi⟩∣2.
In particular, these conditions combine to imply that L must be Hermitian.
Explicit Pauli Matrices
Let's write down the spin operators as 2×2 matrices. Starting with σz, principle 2 tells us σz∣U⟩=∣U⟩ and σz∣D⟩=−∣D⟩, and principle 3 tells us ⟨U∣D⟩=0. Writing these as matrix equations and solving gives σz = [[1, 0], [0, −1]]. Doing the same with ∣R⟩ and ∣L⟩ gives σx = [[0, 1], [1, 0]].
For our last direction, σy, we use ∣I⟩=(1/√2)∣U⟩+(i/√2)∣D⟩ and ∣O⟩=(1/√2)∣U⟩−(i/√2)∣D⟩, i.e. the column vectors ∣I⟩=(1/√2, i/√2) and ∣O⟩=(1/√2, −i/√2), so
σy = [[0, −i], [i, 0]].
These three boxed matrices are collectively called the Pauli matrices.
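Writing the three matrices in numpy and checking them against principles 2 and 3 (a sketch with my own variable names):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)

print(np.allclose(sz @ up, up))       # sigma_z |U> = +|U>
print(np.allclose(sz @ down, -down))  # sigma_z |D> = -|D>
print(np.vdot(up, down))              # <U|D> = 0

# Each Pauli matrix is Hermitian with eigenvalues -1 and +1.
for s in (sx, sy, sz):
    assert np.allclose(s, s.conj().T)
    print(np.linalg.eigvalsh(s))      # [-1.  1.]
```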
Implications
Although σ is not a 3-vector (its components are matrices, not numbers), it behaves a lot like one. Define σn=σ⋅n̂=σxnx+σyny+σznz. Then
σn = [[nz, nx−i ny], [nx+i ny, −nz]],
which gives us a way to calculate what happens when we orient the apparatus along n^, painting a complete picture of spin measurements in 3d space.
If we let n^ sit in the x−z plane, then
σn = [[cosθ, sinθ], [sinθ, −cosθ]].
The eigenvectors are ∣λ1⟩=(cos(θ/2), sin(θ/2)) and ∣λ2⟩=(−sin(θ/2), cos(θ/2)), with eigenvalues +1 and −1, respectively. Suppose our apparatus initially points along the z-axis, and we rotate it so it lies along the n̂ axis. Giving it a spin in the ∣U⟩ state, what's the probability of observing σn=±1? Principle 4 gives us
P(+1)=∣⟨U∣λ1⟩∣²=cos²(θ/2), P(−1)=∣⟨U∣λ2⟩∣²=sin²(θ/2).
The expected value is thus
⟨L⟩=∑i λi P(λi)=cos²(θ/2)−sin²(θ/2)=cosθ.
There is one more theorem for this section:
(The Spin-Polarization Principle)
Any state of a single spin is an eigenvector of some component of the spin.
This means given a state ∣A⟩, there exists some direction n̂ such that (σ⋅n̂)∣A⟩=∣A⟩. Two implications: 1) for any spin state, we can orient the apparatus in some direction so that it registers +1, and 2) there is no state where the expected values of all three components of the spin are 0. In fact, ⟨σx⟩²+⟨σy⟩²+⟨σz⟩²=1.
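A numeric check of that last identity on a random normalized spin state (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
psi = rng.standard_normal(2) + 1j * rng.standard_normal(2)
psi /= np.linalg.norm(psi)  # normalize the state

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

expectations = [np.vdot(psi, s @ psi).real for s in (sx, sy, sz)]
print(sum(e ** 2 for e in expectations))  # 1.0 for every pure spin state
```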
Lecture 4 - Time and Change
Unitarity
In classical mechanics, Professor Susskind introduced what he called the minus first law: information from the past is never lost. The quantum analog is the conservation of distinctions. Let the state of the system be ∣Ψ(t)⟩. The state at time t is obtained from the state at time 0 by a linear operator U(t): ∣Ψ(t)⟩=U(t)∣Ψ(0)⟩, where U is called the time-development operator for the system. This is the basic dynamical assumption of quantum mechanics: if you know the state at one time, then the quantum equations of motion tell you what it will be later.
The main difference between classical and quantum determinism is as follows: classical determinism allows us to predict the results of an experiment, whereas quantum determinism only allows us to compute the probabilities of the outcomes of later experiments.
Suppose ∣Ψ(0)⟩ and ∣Φ(0)⟩ are orthogonal. Then ⟨Ψ(0)∣Φ(0)⟩=0, and the conservation of distinctions implies ⟨Ψ(t)∣Φ(t)⟩=0. Rewriting ⟨Ψ(t)∣=⟨Ψ(0)∣U†(t), and using ∣Φ(t)⟩=U(t)∣Φ(0)⟩ from earlier, implies ⟨Ψ(0)∣U†(t)U(t)∣Φ(0)⟩=0. Consider an orthonormal basis ∣i⟩ that includes ∣Φ(0)⟩ and ∣Ψ(0)⟩ as basis vectors. We get ⟨i∣U†(t)U(t)∣j⟩=δij, so U†U=I (an operator that satisfies this condition is called unitary).
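For any Hermitian H, the matrix exponential U(t)=e^(−iHt) is unitary, and orthogonal states stay orthogonal; a quick scipy check (the Hamiltonian here is an arbitrary example of mine, with ℏ=1):

```python
import numpy as np
from scipy.linalg import expm

H = np.array([[1.0, 0.5], [0.5, -1.0]])  # any Hermitian Hamiltonian
U = expm(-1j * H * 2.7)                  # time-development operator at t = 2.7

print(np.allclose(U.conj().T @ U, np.eye(2)))  # U^dagger U = I -> True

# Conservation of distinctions: orthogonal states remain orthogonal.
psi0 = np.array([1, 0], dtype=complex)
phi0 = np.array([0, 1], dtype=complex)
print(np.vdot(U @ psi0, U @ phi0))  # ~0
```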
Now we can introduce our fifth principle of quantum mechanics:
The evolution of state vectors with time is unitary.
The Hamiltonian
Let U(ϵ)=I−iϵH for a small time ϵ. Unitarity requires (I+iϵH†)(I−iϵH)=I, which to first order in ϵ gives H†=H. This operator H is called the quantum Hamiltonian, an observable whose eigenvalues measure the energy of a quantum system. Using ∣Ψ(t)⟩=U(t)∣Ψ(0)⟩ and taking t=ϵ, we get ∣Ψ(ϵ)⟩=∣Ψ(0)⟩−iϵH∣Ψ(0)⟩. Rearranging and taking ϵ→0 gives
∂∣Ψ⟩/∂t = −iH∣Ψ⟩,
the time-dependent Schrödinger equation. So the reason we care about the quantum Hamiltonian is that it tells us how the state of an undisturbed system evolves with time.
Planck's constant is ℏ=1.0545×10⁻³⁴ kg⋅m²/s, which we need to insert into the time-dependent Schrödinger equation to make the dimensions actually make sense:
ℏ ∂∣Ψ⟩/∂t = −iH∣Ψ⟩.
It is worth noting that Planck originally came up with the constant h=2πℏ, but later physicists changed it to remove the need to write 2π in a ton of places.
Averaging
Last lecture, we looked at averages. Here's a nice trick to compute them:
⟨L⟩=⟨A∣L∣A⟩
where ∣A⟩ is the normalized state of a quantum system. To prove this, rewrite ∣A⟩=∑i αi∣λi⟩ ⟹ L∣A⟩=∑i αi L∣λi⟩. Since L∣λi⟩=λi∣λi⟩, we get L∣A⟩=∑i αi λi∣λi⟩. Lastly, take the inner product with ⟨A∣ to get ⟨A∣L∣A⟩=∑i (αi∗αi)λi, and the result follows.
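Checking the trick numerically against a brute-force average over the eigenvalue distribution (a sketch; the observable and state are arbitrary examples of mine):

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
L = (M + M.conj().T) / 2  # an arbitrary observable

A = rng.standard_normal(3) + 1j * rng.standard_normal(3)
A /= np.linalg.norm(A)    # normalized state |A>

direct = np.vdot(A, L @ A).real  # <A|L|A>

# Weighted sum over eigenvalues, with P(lambda_i) = |<lambda_i|A>|^2.
lam, vecs = np.linalg.eigh(L)
probs = np.abs(vecs.conj().T @ A) ** 2
print(direct, (lam * probs).sum())  # the same number
```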
We can use averaging to show that we can always scale a state-vector by a phase factor e^(iθ) and nothing will change: take ∣A⟩=∑i αi∣λi⟩. If we let ∣B⟩=e^(iθ)∣A⟩, we can show that they have the same magnitude because ⟨B∣B⟩=⟨A∣e^(−iθ)e^(iθ)∣A⟩=⟨A∣A⟩. Similarly, the outcome will be λj with probability αj∗e^(−iθ)e^(iθ)αj=αj∗αj for both ∣A⟩ and ∣B⟩. Finally, we have ⟨L⟩=⟨B∣L∣B⟩=⟨A∣e^(−iθ)Le^(iθ)∣A⟩=⟨A∣L∣A⟩.
Now let's compute how the average of an observable changes with time:
d⟨L⟩/dt = d/dt ⟨Ψ(t)∣L∣Ψ(t)⟩ = ⟨Ψ̇(t)∣L∣Ψ(t)⟩ + ⟨Ψ(t)∣L∣Ψ̇(t)⟩ = (i/ℏ)⟨Ψ∣(HL−LH)∣Ψ⟩,
where the second line follows from Schrödinger. The term [L,M]=LM−ML is called the commutator and is usually not 0. One important fact is that the commutator bracket is skew symmetric: [L,M]=−[M,L]. Sometimes, we rewrite this whole thing more succinctly as
dL/dt = −(i/ℏ)[L,H].
We may notice that the commutator [L,H] is awfully similar to the Poisson bracket: it corresponds to iℏ{L,H}. If we plug this into the previous equation, we get
dL/dt = {L,H}.
The major difference in quantum physics is that FG and GF differ in general for two linear operators F,G, whereas the corresponding classical quantities always commute.
What does it mean when an observable (call it Q) is conserved? It means [Q,H]=0 (which implies [Qⁿ,H]=0). H is the definition of energy in quantum mechanics, and obviously [H,H]=0, which is an example of conservation of energy.
Spin has an energy depending on its orientation when placed in a magnetic field: let H∼σ⋅B=σxBx+σyBy+σzBz. For a simple example, take the magnetic field along the z-axis. Then H=(ℏω/2)σz for some constant ω.
The Pauli matrices verify that [σx,σy]=2iσz, [σy,σz]=2iσx, and [σz,σx]=2iσy. Thus, the average values evolve as
d⟨σx⟩/dt = −ω⟨σy⟩, d⟨σy⟩/dt = ω⟨σx⟩, d⟨σz⟩/dt = 0.
These are exactly the equations of a classical rotor in a magnetic field! In classical mechanics, it's the x and y components of the angular momentum that precess; in quantum mechanics, it's their expected values.
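We can watch the precession numerically by evolving a spin that starts along +x under H=(ℏω/2)σz (my own sketch, with ℏ=1):

```python
import numpy as np
from scipy.linalg import expm

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

omega = 2.0
H = 0.5 * omega * sz                                 # hbar = 1
psi0 = np.array([1, 1], dtype=complex) / np.sqrt(2)  # spin along +x

for t in np.linspace(0.0, np.pi / omega, 5):
    psi = expm(-1j * H * t) @ psi0
    ex, ey, ez = (np.vdot(psi, s @ psi).real for s in (sx, sy, sz))
    # expect <sx> = cos(omega t), <sy> = sin(omega t), <sz> = 0
    print(f"t={t:.2f}  <sx>={ex:+.3f}  <sy>={ey:+.3f}  <sz>={ez:+.3f}")
```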
Solving Schrödinger
The iconic Schrödinger equation is
iℏ ∂Ψ(x)/∂t = −(ℏ²/2m) ∂²Ψ(x)/∂x² + U(x)Ψ(x).
Earlier, we saw the time-dependent Schrödinger equation:
ℏ ∂∣Ψ⟩/∂t = −iH∣Ψ⟩.
There is also the time-independent Schrödinger equation:
H∣Ej⟩=Ej∣Ej⟩.
Because H is energy, the Ej are the energy eigenvalues and the ∣Ej⟩ are the energy eigenvectors. Suppose we know these. We can now solve the time-dependent analog by plugging in ∣Ψ(t)⟩=∑j αj(t)∣Ej⟩ to get αj(t)=αj(0)e^(−iEjt/ℏ), so
∣Ψ(t)⟩ = ∑j αj(0) e^(−iEjt/ℏ) ∣Ej⟩.
Using this, we can now predict probabilities: the probability for outcome λ is Pλ(t)=∣⟨λ∣Ψ(t)⟩∣2, and we can calculate the ket using Schrödinger.
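The recipe (expand in the energy eigenbasis, attach the phases e^(−iEjt/ℏ)) translates directly into code; a sketch with an arbitrary Hamiltonian of mine and ℏ=1:

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
H = (M + M.T) / 2  # an arbitrary Hermitian Hamiltonian

E, V = np.linalg.eigh(H)  # solves the time-independent Schrodinger equation

psi0 = rng.standard_normal(4) + 0j
psi0 /= np.linalg.norm(psi0)

def evolve(t):
    alpha0 = V.conj().T @ psi0                 # alpha_j(0) = <E_j|Psi(0)>
    return V @ (np.exp(-1j * E * t) * alpha0)  # attach phases, re-expand

psi_t = evolve(3.0)
print(np.linalg.norm(psi_t))  # 1.0 -- the evolution is unitary
# P_lambda(t) = |<lambda|Psi(t)>|^2 then follows for any observable.
```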
Wave Function Collapse
Let the state-vector be ∑j αj∣λj⟩ before the measurement of L. The apparatus measures λj with probability ∣αj∣², and then leaves the system in a single eigenstate of L, namely ∣λj⟩. This is because we need to consider the apparatus as part of our single quantum system; otherwise, the state vector might reduce to a single eigenstate due to interaction with the external world. This is known as wave function collapse, which we'll talk about later.
Lecture 5 - Uncertainty and Time Dependence
Simultaneous Eigenvectors
Consider a two-spin system. If we measure both spins, the system is in a state that is simultaneously an eigenvector of L and M - a simultaneous eigenvector. Assume that there is a basis of state vectors ∣λ,μ⟩ that are simultaneous eigenvectors, i.e.
L∣λ,μ⟩=λ∣λ,μ⟩, M∣λ,μ⟩=μ∣λ,μ⟩.
Using some algebra we can get
[L,M]∣λ,μ⟩=0.
So the condition for two observables to be simultaneously measurable is that they commute.
Wave Functions
Let ∣a,b,c,...⟩ be orthonormal basis vectors labeled by the eigenvalues of commuting observables A,B,C,.... We can rewrite any state vector as
∣Ψ⟩ = ∑_{a,b,c,...} ψ(a,b,c,...) ∣a,b,c,...⟩.
Then the wave function is
ψ(a,b,c,...)=⟨a,b,c,...∣Ψ⟩.
The probability for the commuting observables to have values a,b,c,... is
P(a,b,c,...)=ψ∗(a,b,c,...)ψ(a,b,c,...)
and we know that the total probability sums to 1:
∑_{a,b,c,...} ψ∗(a,b,c,...) ψ(a,b,c,...) = 1.
We use Ψ for state-vectors and ψ for wave functions.
Uncertainty
Another reason to care about the Pauli matrices is that every 2×2 Hermitian operator has the Pauli matrices and the identity matrix as a basis. Can we simultaneously measure any pair of spin components? When two observables do not commute, in general, it is impossible to precisely know everything about both. For example, since [σx,σy]=2iσz,[σy,σz]=2iσx, and [σz,σx]=2iσy, we cannot simultaneously measure two spin components.
In general, we cannot simultaneously measure two observables with perfect precision unless they commute. We must have some uncertainty, by which we mean the standard deviation. Define the deviation operator
Ā = A − ⟨A⟩.
The eigenvalues of Ā are ā = a − ⟨A⟩, and the square of the uncertainty is (ΔA)² = ∑_a ā² P(a) = ∑_a (a−⟨A⟩)² P(a) = ⟨Ψ∣Ā²∣Ψ⟩.
To bound uncertainties, we often need inequalities, most notably the triangle inequality (in the form ∣X∣∣Y∣≥∣X⋅Y∣) and the Cauchy-Schwarz inequality, which is sometimes written as ∣X∣²∣Y∣²≥∣⟨X∣Y⟩∣², but we will use the (equivalent) form 2∣X∣∣Y∣≥∣⟨X∣Y⟩+⟨Y∣X⟩∣.
Let ∣Ψ⟩ be a ket and A,B be observables. Define ∣X⟩=A∣Ψ⟩,∣Y⟩=iB∣Ψ⟩. Plugging this into Cauchy-Schwarz gives
2√⟨A²⟩√⟨B²⟩ ≥ ∣⟨Ψ∣[A,B]∣Ψ⟩∣ ⟹ ΔA ΔB ≥ (1/2)∣⟨Ψ∣[A,B]∣Ψ⟩∣
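A numeric spot check of this inequality with A=σx, B=σy on a random state (my own sketch, using the deviation operators from above):

```python
import numpy as np

rng = np.random.default_rng(6)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])

psi = rng.standard_normal(2) + 1j * rng.standard_normal(2)
psi /= np.linalg.norm(psi)

def uncertainty(op):
    mean = np.vdot(psi, op @ psi).real
    dev = op - mean * np.eye(2)  # deviation operator, e.g. A - <A>
    return np.sqrt(np.vdot(psi, dev @ dev @ psi).real)

comm = sx @ sy - sy @ sx  # [sigma_x, sigma_y] = 2i sigma_z
lhs = uncertainty(sx) * uncertainty(sy)
rhs = 0.5 * abs(np.vdot(psi, comm @ psi))
print(lhs, ">=", rhs, ":", bool(lhs >= rhs - 1e-12))
```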
Lecture 6 - Combining Systems: Entanglement
Tensor Products
Now, let there be two systems: system A with space of states SA and system B with space of states SB. The combined system is then SA⊗SB, with basis ∣ab⟩.
Suppose Charlie has two coins, one with σ=+1 and one with σ=−1, and gives one each to Alice and Bob. Then, Alice and Bob travel very far away from each other without looking at their coins. When Alice finally looks at her coin, she will immediately know what Bob's coin is. So ⟨σA⟩=⟨σB⟩=0 and ⟨σAσB⟩=−1. ⟨σAσB⟩−⟨σA⟩⟨σB⟩ is the statistical correlation, and since it is nonzero, Alice and Bob's observations are correlated.
Let's take the quantum version, with spins rather than coins. We can write any state in the combined system as ∣Ψ⟩=∑_{a,b} ψ(a,b)∣ab⟩. Let the components of Alice's spin be σx,σy,σz with her ket vectors notated ∣A}, and Bob's spin be τx,τy,τz with his ket vectors notated as usual. If Alice prepares her spin in state αU∣U}+αD∣D} and Bob prepares his in state βU∣U⟩+βD∣D⟩, the combined product state is then
αUβU∣UU⟩ + αUβD∣UD⟩ + αDβU∣DU⟩ + αDβD∣DD⟩.
Note that the tensor product is a vector space for studying combined systems; a product state is a state vector of the product space. Most state-vectors in the product space are not product states.
Tensor products work in matrices the same way we'd expect them to work.
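Concretely, the matrix form of the tensor product is the Kronecker product, np.kron; a sketch building Alice's operator σz⊗I, Bob's I⊗τz, and a product state:

```python
import numpy as np

sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

up = np.array([1, 0], dtype=complex)
down = np.array([0, 1], dtype=complex)

sigma_z = np.kron(sz, I2)  # acts on Alice's (first) factor
tau_z = np.kron(I2, sz)    # acts on Bob's (second) factor

UD = np.kron(up, down)     # the product state |UD>
print(sigma_z @ UD)        # +|UD>: Alice's spin is up
print(tau_z @ UD)          # -|UD>: Bob's spin is down
```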
Entanglement
The most general vector is
ψUU∣UU⟩+ψUD∣UD⟩+ψDU∣DU⟩+ψDD∣DD⟩
with normalization condition ψUU∗ψUU+ψUD∗ψUD+ψDU∗ψDU+ψDD∗ψDD=1; together with the physical irrelevance of the overall phase, this leaves 6 real parameters. This space of states is much more complicated than just the individual ones combined; this is due to entanglement.
Two examples of maximally entangled states are the singlet state ∣sing⟩=(1/√2)(∣UD⟩−∣DU⟩) and the triplet states (1/√2)(∣UD⟩+∣DU⟩), (1/√2)(∣UU⟩+∣DD⟩), and (1/√2)(∣UU⟩−∣DD⟩).
Inside a maximally entangled state: 1) the entangled state is a complete description of the combined system - nothing more can be known about it, and 2) nothing is known about the individual subsystems.
Recall the spin-polarization principle: it holds for all product states, but does not hold for ∣sing⟩. In fact, we can show that ⟨σx⟩=⟨σy⟩=⟨σz⟩=0: for example, σz∣sing⟩=(1/√2)(∣UD⟩+∣DU⟩), which is orthogonal to ∣sing⟩, so ⟨σz⟩=0, and similarly for the other components. Moreover, σz flips the sign of the ∣DU⟩ term while τz flips the sign of the ∣UD⟩ term, so τzσz∣sing⟩=−∣sing⟩, i.e. ∣sing⟩ is an eigenvector of τzσz with eigenvalue −1. We can check that this is also true when we replace z with x or y.
So we need an apparatus that measures σ⋅τ directly, instead of measuring one component at a time. How? Sometimes the Hamiltonian of neighboring spins is proportional to σ⋅τ, so we just need to measure the energy of the atomic pair.
Why are they called singlets and triplets? The singlet is an eigenvector of σ⋅τ with a unique eigenvalue (−3), while the three triplets are all eigenvectors sharing a single degenerate eigenvalue (+1).
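Verifying the eigenvalue claims numerically with np.kron (my own sketch):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

up, down = np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)
UD, DU = np.kron(up, down), np.kron(down, up)
UU, DD = np.kron(up, up), np.kron(down, down)

sing = (UD - DU) / np.sqrt(2)
triplets = [(UD + DU) / np.sqrt(2), (UU + DD) / np.sqrt(2), (UU - DD) / np.sqrt(2)]

# tau_z sigma_z |sing> = -|sing>
tz_sz = np.kron(sz, I2) @ np.kron(I2, sz)
print(np.allclose(tz_sz @ sing, -sing))  # True

# sigma . tau = sum over components of (sigma_i tensor tau_i)
sigma_dot_tau = sum(np.kron(s, s) for s in (sx, sy, sz))
print(np.vdot(sing, sigma_dot_tau @ sing).real)                # -3 (singlet)
print([np.vdot(t, sigma_dot_tau @ t).real for t in triplets])  # [1, 1, 1]
```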
Lecture 7 - More On Entanglement
Outer Products and Density Matrices
We can also form the outer product ∣ψ⟩⟨ϕ∣, which is a linear operator that acts on a ket by (∣ψ⟩⟨ϕ∣)∣A⟩=∣ψ⟩⟨ϕ∣A⟩ and acts on a bra by ⟨B∣(∣ψ⟩⟨ϕ∣)=⟨B∣ψ⟩⟨ϕ∣. The case ∣ψ⟩⟨ψ∣ is called a projection operator, which is Hermitian and satisfies the following: ∣ψ⟩ is an eigenvector of its projection operator with eigenvalue 1; any vector orthogonal to ∣ψ⟩ is an eigenvector with eigenvalue 0; (∣ψ⟩⟨ψ∣)²=∣ψ⟩⟨ψ∣; the trace is 1; ∑i ∣i⟩⟨i∣=I for any orthonormal basis ∣i⟩; and lastly, ⟨L⟩=Tr(∣ψ⟩⟨ψ∣L).
The reason why we care is the density matrix: for a system that is in state ∣ψ⟩ or ∣ϕ⟩ with probability 1/2 each, define ρ=(1/2)∣ψ⟩⟨ψ∣+(1/2)∣ϕ⟩⟨ϕ∣, which is useful because ⟨L⟩=Tr(ρL). We can extend this to n states easily.
Suppose Alice knows a wave function Ψ(a,b), but wishes to extract as much knowledge about a as she can without caring about b. For an observable L, the expectation is
⟨L⟩ = ∑_{ab,a′b′} Ψ∗(a′b′) L_{a′b′,ab} Ψ(ab),
and we can package Alice's information into a matrix ρ_{aa′}=∑_b Ψ∗(a′b)Ψ(ab). This shows that we can remove Bob's information without removing any of Alice's information: taking L_{a′b′,ab}=L_{a′a}δ_{b′b} to "filter out" Bob's information gives
⟨L⟩ = ∑_{a,a′} L_{a′a} ρ_{aa′} = Tr(ρL).
To know Alice's density matrix, we must first know the entire wave function. After that, we can disregard the rest and still compute everything about Alice using just her density matrix.
Here are some properties of density matrices: they are Hermitian; Tr(ρ)=1; the eigenvalues lie between 0 and 1; for a pure state, ρ²=ρ and Tr(ρ²)=1; for a mixed or entangled state, ρ²≠ρ and Tr(ρ²)<1.
For a quick example: for the state vector ∣Ψ⟩=(1/√2)(∣UD⟩+∣DU⟩), Alice's density matrix is ρ = [[1/2, 0], [0, 1/2]].
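A sketch computing Alice's density matrix by tracing out Bob, then running the purity test from the list above:

```python
import numpy as np

up, down = np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)
psi = (np.kron(up, down) + np.kron(down, up)) / np.sqrt(2)  # entangled state

# Reshape the 4-vector into psi(a, b); then rho_{aa'} = sum_b psi(a,b) psi*(a',b).
psi_ab = psi.reshape(2, 2)
rho_alice = psi_ab @ psi_ab.conj().T

print(rho_alice.real)                        # [[0.5, 0. ], [0. , 0.5]]
print(np.trace(rho_alice).real)              # 1.0
print(np.trace(rho_alice @ rho_alice).real)  # 0.5 < 1 -> mixed, i.e. entangled
```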
Tests for Entanglement
The correlation test: there is no entanglement if and only if ⟨AB⟩−⟨A⟩⟨B⟩=0 for every observable A of Alice and B of Bob.
The density matrix test: if the composite Alice-Bob system is in a product state, then Alice's density matrix has exactly one eigenvalue equal to 1, and the others are 0. Additionally, the eigenvector with the nonzero eigenvalue is nothing but the wave function of Alice's half of the system.
One issue with quantum mechanics is that in an experiment, the apparatus does not "know" the spin state until Alice looks at it. But once she does, the wave function collapses. If we bring in Bob to consider Alice, the apparatus, and the spin as one system, once he looks at this system, their wave function collapses. And so on...
Quantum mechanics does not violate locality, which states that it is impossible to send a signal faster than the speed of light. Let Alice's density matrix be ρ_{aa′}=∑_b ψ∗(a′b)ψ(ab) and let U_{bb′} be the unitary matrix for whatever happens to Bob's half when he does his experiment. We get ψ_final(ab)=∑_{b′} U_{bb′}ψ(ab′) and ψ_final∗(a′b)=∑_{b″} ψ∗(a′b″)U†_{b″b}, but when we plug these in, we still get the original ρ_{aa′}=∑_b ψ∗(a′b)ψ(ab), because U is unitary and the U and U† factors contract to a Kronecker delta.
Bell's Theorem
Consider a video game that tries to fool you into thinking there is a quantum spin in a magnetic field inside the computer, and we get to experiment to test this. The computer stores two normalized complex numbers αU and αD. At the start of the game, the computer initializes these values, then solves the Schrödinger equation to update the α's like they were the components of the spin's state-vector. We are allowed to manipulate the angle of the apparatus.
What if we do this on two computers? As long as they are connected through a cable and the computers can send messages instantaneously, we are good. But disconnecting the cable destroys the simulation.
This is essentially Bell's theorem: classical computers need to be connected with an instantaneous cable to simulate entanglement. This is not a problem about quantum mechanics, but rather a problem with simulating quantum mechanics inside classical computers.
Lecture 8 - Particles and Waves
Interlude: Mathematics
When we consider functions as vectors, we form a Hilbert space. But we need to modify things in three ways: 1) integrals replace sums, so ⟨Ψ∣Φ⟩=∫_{−∞}^{∞} ψ∗(x)ϕ(x)dx; 2) probability densities replace probabilities, so P(a,b)=∫_a^b ψ∗(x)ψ(x)dx; and 3) Dirac delta functions replace Kronecker deltas: δ(x−x′) is defined by the property that for any function F(x),
∫−∞∞δ(x−x′)F(x′)dx′=F(x).
The delta function is not a genuine function; for large values of n, it is approximated better and better by δ(x) ≈ (n/√π) e^(−(nx)²).
In quantum mechanics, the limits of integration span the entire axis, and our wave functions must go to 0 at ±∞ to be normalizable, so we can rewrite integration by parts as:
∫_{−∞}^{∞} F (dG/dx) dx = −∫_{−∞}^{∞} (dF/dx) G dx.
Consider X and D, the multiply-by-x operator and the differentiation operator (Dψ = dψ/dx). We can show that X is Hermitian and D is anti-Hermitian (D†=−D). We can build a Hermitian operator from D by taking −iℏD, which satisfies
−iℏDψ(x) = −iℏ dψ(x)/dx.
Eigenstuff for Position and Momentum
For X, every real number x0 is an eigenvalue, and the corresponding eigenvectors are functions infinitely concentrated at x=x0, i.e. δ(x−x0). Additionally, we have ⟨x0∣Ψ⟩=∫_{−∞}^{∞} δ(x−x0)ψ(x)dx=ψ(x0), so ⟨x∣Ψ⟩=ψ(x), and we call ψ(x) the wave function in the position representation.
Define the momentum operator P=−iℏD. The eigenvectors are ψp(x)=(1/√(2π)) e^(ipx/ℏ) with eigenvalue p. Note that ⟨x∣p⟩=⟨p∣x⟩∗. We can now see that the wavelength of e^(ipx/ℏ) is λ=2πℏ/p, which is one reason we call it a wave function. Let's call ψ̃(p)=⟨p∣Ψ⟩ the wave function in the momentum representation.
The relationship between the two representations is given by the Fourier transforms
ψ(x) = (1/√(2π)) ∫ ψ̃(p) e^(ipx/ℏ) dp, ψ̃(p) = (1/√(2π)) ∫ ψ(x) e^(−ipx/ℏ) dx.
We have [X,P]=iℏ (which ⟹{x,p}=1 in the classical case). Specializing the general uncertainty principle from earlier to this case gives the famous Heisenberg's Uncertainty Principle:
ΔX ΔP ≥ ℏ/2
Lecture 9 - Particle Dynamics
Kinematics
How do particles move in quantum mechanics? Plugging H=cP into the Schrödinger equation gives
∂ψ(x,t)/∂t = −c ∂ψ(x,t)/∂x,
so any ψ(x−ct) is a solution. Since we want ∫_{−∞}^{∞} ψ∗(x)ψ(x)dx=1, ψ must look like a wave packet. All together: this particle can only move to the right, its energy can be either positive or negative, and it can only exist in states moving at this particular velocity. The classical description is that momentum is conserved and the position moves with fixed velocity c; the quantum analog is that the whole probability distribution and the expected value move with velocity c.
If we instead use the nonrelativistic Hamiltonian H=(1/2)mv²=p²/2m, messing around with some algebra gives us the traditional Schrödinger equation for an ordinary nonrelativistic free particle:
iℏ ∂ψ/∂t = −(ℏ²/2m) ∂²ψ/∂x².
To solve the time-dependent Schrödinger equation, we need to solve the time-independent version first. The function ψ(x)=e^(ipx/ℏ) solves the time-independent version, and we can use this to solve the time-dependent version:
ψ(x,t) = ∫ ψ̃(p) exp[(i/ℏ)(px − (p²/2m)t)] dp.
Let's look at the quantum analog of v=p/m. We have
v = (d/dt) ∫ ψ∗(x,t) x ψ(x,t) dx.
From lecture 4, we have
v = (i/2mℏ)⟨[P²,X]⟩ = (i/2mℏ)⟨P[P,X]+[P,X]P⟩
and using [P,X]=−iℏ gives ⟨P⟩=mv.
As we saw, we can take a classical system, replace the classical phase space with a linear vector space, replace x with X and p with P, and use the Hamiltonian to solve the time-dependent equation (how the wave function changes with time) or the time-independent equation (to find the eigenvectors and eigenvalues of the Hamiltonian). This process is known as quantization.
Forces
Classically,
F(x) = m d²x/dt² = −∂V/∂x.
Quantization tells us to take H=P²/2m+V(x) and modify the Schrödinger equation to
iℏ ∂ψ/∂t = −(ℏ²/2m) ∂²ψ/∂x² + V(x)ψ.
We can show that [X,V(x)]=0 and
d⟨P⟩/dt = (i/2mℏ)⟨[P²,P]⟩ + (i/ℏ)⟨[V,P]⟩.
Using [P²,P]=0 and [V(x),P]=iℏ dV/dx, we get d⟨P⟩/dt = −⟨dV/dx⟩.
This shows that the classical equations are only approximations, good when we can replace the average of dV/dx with dV/dx evaluated at the average x. We can do this when V(x) varies slowly compared to the size of the wave packets.
For an example of a "bad" potential, consider a bunch of large, closely packed spikes of size δx with δx<Δx. The Heisenberg Uncertainty Principle tells us Δx∼ℏ/(mΔv), which shows that large masses and smooth potentials behave more classically, while particles with low mass moving through an abrupt potential behave like a quantum mechanical system. When the equality in Heisenberg's Uncertainty Principle holds, we get the Gaussian wave packets, which we'll discuss later.
Path Integrals
Classically, the action is A=∫_{t1}^{t2} L(x,ẋ)dt=∫_{t1}^{t2} ((1/2)mẋ²−V(x))dt. The quantum analog question is: given that a particle starts at (x1,t1), what is the amplitude C_{1,2} that it shows up at (x2,t2)? The answer is
C_{1,2} = ∑_{paths} e^(iA/ℏ),
summed over all paths connecting the two points, which is the extremely powerful path integral formulation by Feynman.
Lecture 10 - Harmonic Oscillator
Classical vs Quantum
Classically, L=(1/2)mẏ²−(1/2)ky²=(1/2)ẋ²−(1/2)ω²x², where x=√m y and ω=√(k/m). The Euler-Lagrange equation ∂L/∂x=(d/dt)∂L/∂ẋ gives x=A cos(ωt)+B sin(ωt).
We have p=∂L/∂ẋ=ẋ, so H=pẋ−L=(1/2)ẋ²+(1/2)ω²x²=(1/2)p²+(1/2)ω²x². For the quantum mechanical version, we have Hψ(x)=−(ℏ²/2)∂²ψ(x)/∂x²+(1/2)ω²x²ψ(x). Plugging this into the time-dependent Schrödinger equation gives
i ∂ψ/∂t = −(ℏ/2) ∂²ψ/∂x² + (1/2ℏ) ω²x²ψ.
Energy Levels
We can also calculate the energy levels by solving the time-independent Schrödinger:
−(ℏ²/2) ∂²ψE(x)/∂x² + (1/2)ω²x²ψE(x) = E ψE(x).
But there are a bunch of solutions that make no sense physically, so we need to impose conditions such as: physical solutions of the Schrödinger equation must be normalizable.
The lowest energy level is the ground state ψ0(x). To identify it, there is a theorem (left unproved): the ground-state wave function for any potential has no zeros, and it's the only energy eigenstate that has no nodes. With a huge amount of algebra, plugging in the guess ψ0(x)=e^(−ωx²/2ℏ) reduces the Schrödinger equation to
(ℏω/2) e^(−ωx²/2ℏ) = E e^(−ωx²/2ℏ).
So E0=ℏω/2 and ψ0(x)=e^(−ωx²/2ℏ).
Creation and Annihilation
Take H=(1/2)(P²+ω²X²)=(1/2)(P+iωX)(P−iωX)+ℏω/2. Consider the lowering operator a⁻=(i/√(2ωℏ))(P−iωX) and the raising operator a⁺=(−i/√(2ωℏ))(P+iωX). We can now rewrite H=ℏω(a⁺a⁻+1/2). Additionally, we have [a⁻,a⁺]=1, and defining N=a⁺a⁻ gives H=ℏω(N+1/2), as well as the relations [a⁻,N]=a⁻ and [a⁺,N]=−a⁺. Since a⁺∣n⟩∝∣n+1⟩ and a⁻∣n⟩∝∣n−1⟩, we can show that En=ℏω(n+1/2).
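The ladder algebra is easy to see in a truncated matrix representation, where a⁻ has √n just above the diagonal (a standard representation, not from the lectures; truncating at a finite cutoff spoils the very last entry, as the code notes):

```python
import numpy as np

cutoff = 8
a_minus = np.diag(np.sqrt(np.arange(1, cutoff)), k=1)  # a-|n> = sqrt(n)|n-1>
a_plus = a_minus.T                                     # a+|n> = sqrt(n+1)|n+1>

N = a_plus @ a_minus          # the number operator
H = N + 0.5 * np.eye(cutoff)  # H = hbar*omega*(N + 1/2), with hbar = omega = 1

print(np.diag(H))             # [0.5 1.5 2.5 ...] = E_n = n + 1/2

# [a-, a+] = 1 except in the last slot (a truncation artifact).
comm = a_minus @ a_plus - a_plus @ a_minus
print(np.diag(comm))          # [1, 1, ..., 1, 1 - cutoff]
```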
Wave Functions, Again
We have (i/√(2ωℏ))(P−iωX)ψ0(x)=0 ⟹ dψ0/dx=−(ω/ℏ)xψ0(x), which has solution e^(−ωx²/2ℏ). Applying the raising operator gives ψ1(x)=2iωxψ0(x), and we can do this as many times as we want to get ψn(x); the polynomials appearing in this sequence are called the Hermite polynomials.
These eigenfunctions also exhibit quantum tunneling: they approach zero asymptotically but never reach it, so there is a nonzero probability of finding the particle "outside the bowl" defined by its potential energy function.