DGYA proud WE-ARE-NOT-DOOMED fan.
https://dgyblog.com/
Extreme Learning Machines<p>Recently I’ve been studying an idea called the “Extreme Learning Machine”. I first came across this method around two years ago, when I used it to quickly classify some features, but I never really dug into it.</p>
<p>The Extreme Learning Machine (ELM) is a fairly simple method. It is a generalization of Single-hidden Layer Feedforward Neural Networks (SLFNs): you project your data into the hidden layer with random weights, and then compute the output weights for the target with a least-squares solution. All of this suggests that it is essentially a simple regression model.</p>
<p>ELM is known for its simplicity, short running time and surprisingly good performance. These three aspects are perhaps the most problematic parts of conventional models: MLPs and Convolutional Neural Networks (CNNs) are very painful to train, and RBM methods are even more painful without proper optimization. However, before I write anything about ELM, I need to state my major concerns:</p>
<ul>
<li>
<p>Is there a reason the mainstream Machine Learning community has distanced itself from ELM? That is probably too strong a phrase. As far as I know, the method was invented at NTU in Singapore, and there is even a dedicated ELM conference in China. However, we rarely find much discussion of this method, and people do not take it very seriously.</p>
</li>
<li>
<p>The big doubt about this method is that it relies heavily on inverting a square matrix. That inverse is a natural product of the closed-form solution to the cost function, and it is common in numerical computing. However, we usually avoid it in learning algorithms because numerical inversion has limited precision (especially when the matrix is ill-conditioned), which in turn limits performance. That is pretty much why we like Gradient Descent.</p>
</li>
<li>
<p>ELM is designed for SLFNs, so by default it has only one hidden layer. There are ways to make it deep, but they are not natural: to construct a deeper net, you need to train the model like Stacked Auto-encoders, using Greedy Layer-wise Training.</p>
</li>
</ul>
<p><strong>While I was writing this note, I received many comments on ELM from other researchers. I felt that I should list all of these perspectives so that you can get a complete view of the method.</strong></p>
<ul>
<li><a href="https://plus.google.com/+YuhuangHu/posts/MoEFUJu7938">My post to Deep Learning community at Google+</a></li>
<li><a href="https://www.facebook.com/yann.lecun/posts/10152872571572143">Yann LeCun’s recent comments on ELM</a></li>
<li><a href="http://theanonymousemail.com/view/?msg=ZHEZJ1AJ">The ELM Scandal</a></li>
<li><a href="http://www.reddit.com/r/MachineLearning/comments/34u0go/yann_lecun_whats_so_great_about_extreme_learning/">Reddit’s discussion on Yann LeCun’s comments</a></li>
<li><a href="http://www.reddit.com/r/MachineLearning/comments/34y2nk/the_elm_scandal_a_formal_complaint_launched/">Reddit’s discussion on The ELM Scandal</a></li>
<li><a href="http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-Rosenblatt-Neumann.pdf">What are Extreme Learning Machines? ELM inventor’s fight back</a></li>
<li><a href="http://libgen.in/scimag/get.php?doi=10.1016%2Fj.neunet.2014.10.001">Trends in extreme learning machines: A review</a></li>
</ul>
<p>There are a thousand ways of criticizing ELM, and another thousand ways of supporting it. Bottom line: at the current scale of data, it works. It may not work at large scale (I have never seen it produce comparable results on large datasets such as ImageNet or MIT Places), but studying it may not be a waste of time. The potential of random weights has not yet been fully explored, and there is no way to tell whether current ANN models are plausible in biological systems (they definitely are not; if our brains fired like this, we would be screwed).</p>
<h3 id="extreme-learning-machine-basic-story">Extreme Learning Machine: basic story</h3>
<p><em>The terminology in this section may differ from that used in ELM papers, but the idea is the same.</em></p>
<p>ELM is an SLFN. Let \(W_{i}\) be the input weights, \(X=\{x^{(1)}, x^{(2)}, \ldots, x^{(N)}\}\) the input samples, and \(b\) the bias. Then the hidden activation \(H\) is computed by</p>
\[H=f(X\cdot W_{i} + b)\]
<p>where \(f(\cdot)\) is the activation function.</p>
<p>Given the output weights \(W_{o}\) and the output target \(T\), ELM minimizes the following cost function:</p>
\[J(H, T, W_{o})=C||H\cdot W_{o} - T||_{p}^{\sigma_{1}}+||W_{o}||_{q}^{\sigma_{2}}\]
<p>where \(C\) is the regularization parameter, \(\sigma_{1}>0\) and \(\sigma_{2}>0\), and \(p\) and \(q\) indicate the norms. The above function is clearly a (regularized) Linear Regression formulation. Note that, unlike conventional neural networks, there is no activation function on the output layer (or, equivalently, it is linear). We could use Gradient Descent to find \(W_{o}\). However, ELM offers an analytical solution when \(p=q=\sigma_{1}=\sigma_{2}=2\), obtained by setting the gradient of \(J\) with respect to \(W_{o}\) to zero:</p>
\[W_{o}=\left(\frac{I}{C}+H^{\top}H\right)^{-1}H^{\top}T\]
<p>And this is the entire story of basic ELM.</p>
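<p>The basic recipe above fits in a few lines of NumPy. The sketch below is a minimal illustration, not reference ELM code. It assumes \(f=\tanh\), Gaussian random input weights and bias, and the \(p=q=\sigma_{1}=\sigma_{2}=2\) case, and the names <code class="language-plaintext highlighter-rouge">elm_fit</code> and <code class="language-plaintext highlighter-rouge">elm_predict</code> are made up for this example. It solves the linear system with <code class="language-plaintext highlighter-rouge">np.linalg.solve</code> instead of forming the inverse explicitly, which is the usual way to soften the precision concern about matrix inversion.</p>

```python
import numpy as np

def elm_fit(X, T, n_hidden=50, C=1e3, seed=0):
    """Basic ELM: random (untrained) input layer, ridge-regression output layer."""
    rng = np.random.default_rng(seed)
    W_i = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                   # random bias
    H = np.tanh(X @ W_i + b)                            # H = f(X W_i + b)
    # W_o = (I/C + H^T H)^{-1} H^T T, computed as a linear solve
    A = np.eye(n_hidden) / C + H.T @ H
    W_o = np.linalg.solve(A, H.T @ T)
    return W_i, b, W_o

def elm_predict(X, W_i, b, W_o):
    return np.tanh(X @ W_i + b) @ W_o  # linear output layer

# Usage: regress sin(x) on 200 points in [-3, 3]
X = np.linspace(-3, 3, 200).reshape(-1, 1)
T = np.sin(X)
W_i, b, W_o = elm_fit(X, T)
mse = np.mean((elm_predict(X, W_i, b, W_o) - T) ** 2)
```

<p>Only the output layer is learned. The input weights and bias keep their random values, which is why training reduces to a single linear solve.</p>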
<h3 id="elm-auto-encoder">ELM Auto-encoder</h3>
<p>ELM learns a mapping from the randomly projected hidden activations to the target, so its output layer acts as a decoder. We can therefore modify the cost function slightly so that it learns a mapping between the hidden representation and the signal itself:</p>
\[J(H, X, W_{o})=C||H\cdot W_{o} - X||_{p}^{\sigma_{1}}+||W_{o}||_{q}^{\sigma_{2}}\]
<p>This makes ELM act like an auto-encoder, and the transpose of the output weight matrix, \(W_{o}^{\top}\), acts as an encoder.</p>
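<p>Here is a minimal NumPy sketch of the ELM auto-encoder. It assumes the same setup as basic ELM (\(\tanh\) activation, Gaussian random weights, squared norms). The only change is that the target \(T\) is replaced by the input \(X\), and encoding is then \(X\mapsto XW_{o}^{\top}\). Whether to pass this code through an activation function is a design choice this sketch leaves out. The function names are hypothetical.</p>

```python
import numpy as np

def elm_autoencoder(X, n_hidden, C=1e3, seed=0):
    """ELM-AE sketch: random hidden layer, output weights fitted to reconstruct X."""
    rng = np.random.default_rng(seed)
    W_i = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W_i + b)
    # Same closed form as basic ELM, with the target T replaced by X itself
    W_o = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)
    return W_o  # shape (n_hidden, n_features)

def encode(X, W_o):
    return X @ W_o.T  # W_o^T used as a (linear) encoder

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 20))   # stand-in data
W_o = elm_autoencoder(X, n_hidden=8)
Z = encode(X, W_o)                   # 8-dimensional codes
```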
<h3 id="stacked-elm-auto-encoders">Stacked ELM Auto-encoders</h3>
<p>As with Stacked Auto-encoders (SAEs), you can use the same Greedy Layer-wise Training principle to train a multi-layered ELM network. At the end you will have an unsupervised feature extractor (or a supervised network, if you plug a normal ELM onto the end of the network), and it doesn’t need any further fine-tuning.</p>
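<p>Greedy layer-wise stacking can be sketched the same way. Each step fits an ELM auto-encoder on the current representation, encodes with \(W_{o}^{\top}\), and feeds the code to the next layer. This is an illustrative sketch under stated assumptions, not a reference implementation: the \(\tanh\) between layers, the layer sizes, and the helper names are my own choices.</p>

```python
import numpy as np

def elm_ae_weights(X, n_hidden, rng, C=1e3):
    """One ELM-AE layer: returns W_o, whose transpose acts as the encoder."""
    W_i = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W_i + b)
    return np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ X)

def stacked_elm_ae(X, layer_sizes, C=1e3, seed=0):
    """Greedy layer-wise training: each layer is an ELM-AE fitted on the previous code."""
    rng = np.random.default_rng(seed)
    encoders, Z = [], X
    for n_hidden in layer_sizes:
        W_o = elm_ae_weights(Z, n_hidden, rng, C)
        encoders.append(W_o.T)
        Z = np.tanh(Z @ W_o.T)  # assumption: tanh between layers
    return encoders, Z

X = np.random.default_rng(2).standard_normal((200, 30))  # stand-in data
encoders, Z = stacked_elm_ae(X, layer_sizes=[16, 8])
```

<p>No layer is revisited after it is fitted, so there is no global fine-tuning step. For a supervised network, the final code <code class="language-plaintext highlighter-rouge">Z</code> would be the input to a normal ELM.</p>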
<h3 id="local-receptive-field-elm-lrf-elm">Local Receptive Field-ELM (LRF-ELM)</h3>
<p>As we know, ELM is essentially a generalization of SLFNs. Therefore, all variants of SLFNs can apply the same learning principle as ELM, and we can extend this idea to Convolutional Neural Networks (ConvNets). The basic idea of a ConvNet is that, instead of learning a complex function of the entire receptive field, it learns representations from small regions of it. As a result, you need many feature maps to obtain good results.</p>
<p>Assuming you understand the idea of ConvNets: LRF-ELM first initializes \(K\) random filters (you can also orthogonalize these weights using SVD), and then computes the feature maps \(f^{(k)}\) by:</p>
\[f^{(k)}=F\left(X* w^{(k)}\right)\]
<p>where \(w^{(k)}\) is the \(k\)-th random filter and \(F\) is the activation function. After this, you can then apply a pooling operation to the feature maps.</p>
<p>The rest of the story is simple: flatten the feature maps and learn the target with the equation mentioned previously.</p>
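<p>Below is a rough NumPy sketch of the LRF-ELM pipeline. It uses random filters orthogonalized with SVD, a “valid” convolution implemented as cross-correlation (as in most ConvNet code), \(\tanh\) as \(F\), non-overlapping average pooling, flattening, and finally the same closed-form solution as basic ELM. The filter size, pooling type and helper names are assumptions for illustration. <code class="language-plaintext highlighter-rouge">sliding_window_view</code> requires NumPy 1.20 or newer.</p>

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def random_filters(K, r, seed=0):
    """K random r x r filters, orthogonalized with SVD (requires K <= r*r)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((K, r * r))
    _, _, Vt = np.linalg.svd(A, full_matrices=False)  # rows of Vt are orthonormal
    return Vt.reshape(K, r, r)

def lrf_features(images, filters, pool=2):
    """images: (N, H, W). Returns pooled, flattened feature maps of shape (N, K*h*w)."""
    r = filters.shape[1]
    patches = sliding_window_view(images, (r, r), axis=(1, 2))      # (N, H-r+1, W-r+1, r, r)
    maps = np.tanh(np.einsum("nijab,kab->nkij", patches, filters))  # f^(k) = F(X * w^(k))
    N, K, h, w = maps.shape
    h2, w2 = h // pool, w // pool
    maps = maps[:, :, :h2 * pool, :w2 * pool]
    pooled = maps.reshape(N, K, h2, pool, w2, pool).mean(axis=(3, 5))  # average pooling
    return pooled.reshape(N, -1)

rng = np.random.default_rng(1)
images = rng.standard_normal((10, 8, 8))   # stand-in data
filters = random_filters(K=4, r=3)
H = lrf_features(images, filters)          # plays the role of H in basic ELM
T = rng.standard_normal((10, 2))           # stand-in targets
C = 1e3
W_o = np.linalg.solve(np.eye(H.shape[1]) / C + H.T @ H, H.T @ T)
```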
<h3 id="lrf-elm-auto-encoder">LRF-ELM Auto-encoder</h3>
<p>The solution for an LRF-ELM Auto-encoder is not so obvious once you try to figure it out. Because LRF-ELM is similar to a ConvNet, you can’t use the same solution as before to derive the result. After some investigation, I figured out a way of learning the filters in an unsupervised manner.</p>
<p>Suppose we have \(K\) filters (their size doesn’t matter). The feature map \(f^{(k)}\) is then calculated as</p>
\[f^{(k)}=F\left(X* w^{(k)}\right)\]
<p>where \(w^{(k)}\) is the \(k\)-th random filter and \(F(\cdot)\) is the activation function.</p>
<p>In the decoding stage, the filters can be learned by using the feature maps as filters. The learned filter \(\hat{w}^{(k)}\) is computed by</p>
\[\hat{w}^{(k)}=X* f^{(k)}\cdot \left(\frac{1}{\frac{1}{C}+\sum_{i,j}\left(f_{i,j}^{(k)}\right)^{2}}\right)\]
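<p>The learned-filter formula can be transcribed directly. In this sketch, \(*\) is taken as a “valid” cross-correlation, so correlating an \(H\times W\) input with an \((H-r+1)\times(W-r+1)\) feature map returns an \(r\times r\) filter. Because the denominator is a scalar, the regularization term is written as \(1/C\). The function name and the random data are hypothetical.</p>

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def learned_filter(X, f, C=1e3):
    """w_hat = (X * f) / (1/C + sum_ij f_ij^2), with * as 'valid' cross-correlation.
    X: (H, W) input, f: (H-r+1, W-r+1) feature map -> returns an (r, r) filter."""
    windows = sliding_window_view(X, f.shape)     # (r, r, H-r+1, W-r+1)
    corr = np.einsum("abij,ij->ab", windows, f)   # X * f
    return corr / (1.0 / C + np.sum(f ** 2))

# Encode one image with a random 3x3 filter, then learn a filter back from the feature map
rng = np.random.default_rng(0)
img = rng.standard_normal((12, 12))
w = rng.standard_normal((3, 3))
f = np.tanh(np.einsum("ijab,ab->ij", sliding_window_view(img, (3, 3)), w))  # (10, 10)
w_hat = learned_filter(img, f)
```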
<h3 id="incremental-learning-of-elm">Incremental Learning of ELM</h3>
<p>One of the problems with the original ELM is that it says nothing about the size of the data. In most recent applications it is almost impossible to compute with the entire dataset at once, so we must find a way of learning the output weights incrementally.</p>
<p>An earlier paper, <a href="http://www.ntu.edu.sg/home/egbhuang/pdf/OS-ELM-TNN.pdf">A Fast and Accurate Online Sequential Learning Algorithm for Feedforward Networks</a>, offered a nice solution to this problem: you can update the output weights by</p>
\[P_{k+1}=P_{k}-P_{k}H_{k+1}^{\top}\left(I+H_{k+1}P_{k}H_{k+1}^{\top}\right)^{-1}H_{k+1}P_{k}\]
\[W_{o}^{k+1}=W_{o}^{k}+P_{k+1}H_{k+1}^{\top}(T_{k+1}-H_{k+1}W_{o}^{k})\]
<p>where \(P_{k}=K_{k}^{-1}\). \(K_{0}\) is \(H_{0}^{\top}H_{0}\), \(W_{o}^{0}\) is \(K_{0}^{-1}H_{0}^{\top}T_{0}\) and</p>
\[K_{k+1}=K_{k}+H_{k+1}^{\top}H_{k+1}\]
<p>You might think that if this is how the output weights are computed, then LRF-ELM must be a real mess. It turns out to be even simpler, since \(K\) in LRF-ELM is a scalar, and its inverse is simply \(\frac{1}{K}\).</p>
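<p>The OS-ELM update rules above translate almost line by line into NumPy. This sketch uses random matrices as stand-ins for the hidden activations \(H_{k}\). It checks that, after all chunks have been processed, the sequential weights agree with the batch least-squares solution \(\left(H^{\top}H\right)^{-1}H^{\top}T\) (no regularization term, matching the equations). Function names are illustrative.</p>

```python
import numpy as np

def os_elm_init(H0, T0):
    """Initial chunk: P_0 = K_0^{-1} = (H_0^T H_0)^{-1}, W_o^0 = P_0 H_0^T T_0."""
    P = np.linalg.inv(H0.T @ H0)  # H0 needs at least as many rows as hidden units
    return P, P @ H0.T @ T0

def os_elm_update(P, W, H, T):
    """One OS-ELM step for a new chunk (H, T); only a chunk-sized system is solved."""
    S = np.eye(H.shape[0]) + H @ P @ H.T
    P = P - P @ H.T @ np.linalg.solve(S, H @ P)  # P_{k+1}
    W = W + P @ H.T @ (T - H @ W)                # W_o^{k+1}, using the updated P
    return P, W

rng = np.random.default_rng(0)
L = 10                                   # number of hidden units
H_all = rng.standard_normal((100, L))    # stand-in hidden activations
T_all = rng.standard_normal((100, 2))    # stand-in targets
P, W = os_elm_init(H_all[:20], T_all[:20])
for start in range(20, 100, 20):
    P, W = os_elm_update(P, W, H_all[start:start + 20], T_all[start:start + 20])
W_batch = np.linalg.solve(H_all.T @ H_all, H_all.T @ T_all)
```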
<p>Another way of updating the output weights is suggested in another paper: <a href="http://arxiv.org/abs/1412.8307">Fast, simple and accurate handwritten digit classification using extreme learning machines with shaped input-weights</a>. Instead of updating the output weights, it updates the two components of the solution, \(H^{\top}H\) and \(H^{\top}T\). This way you only maintain these two fixed-size matrices, and the output weights can be computed from them at any time. The update rule is easy to derive:</p>
\[K_{k+1}=K_{k}+H_{k+1}^{\top}H_{k+1}\]
\[A_{k+1}=A_{k}+H_{k+1}^{\top}T_{k+1}\]
<p>where \(K_{0}=H_{0}^{\top}H_{0}\) and \(A_{0}=H_{0}^{\top}T_{0}\)</p>
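<p>Here is a sketch of this second approach, which keeps only the two running sums. I add the \(\frac{I}{C}\) term from the basic ELM solution when solving; that is my assumption about how one would regularize here, not something taken from the paper. The class name is hypothetical.</p>

```python
import numpy as np

class AccumulatedELM:
    """Keep only K = H^T H and A = H^T T; output weights can be solved at any time."""

    def __init__(self, n_hidden, n_out):
        self.K = np.zeros((n_hidden, n_hidden))
        self.A = np.zeros((n_hidden, n_out))

    def partial_fit(self, H, T):
        self.K += H.T @ H  # K_{k+1} = K_k + H_{k+1}^T H_{k+1}
        self.A += H.T @ T  # A_{k+1} = A_k + H_{k+1}^T T_{k+1}

    def output_weights(self, C=1e3):
        # W_o = (I/C + K)^{-1} A
        return np.linalg.solve(np.eye(len(self.K)) / C + self.K, self.A)

rng = np.random.default_rng(0)
H_all = rng.standard_normal((100, 10))   # stand-in hidden activations
T_all = rng.standard_normal((100, 2))    # stand-in targets
model = AccumulatedELM(n_hidden=10, n_out=2)
for start in range(0, 100, 25):
    model.partial_fit(H_all[start:start + 25], T_all[start:start + 25])
W_o = model.output_weights(C=1e3)
```

<p>Memory stays fixed at one \(L\times L\) and one \(L\times m\) matrix no matter how many chunks arrive, and starting from zeros makes the first call produce \(K_{0}=H_{0}^{\top}H_{0}\).</p>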
<h3 id="maybe-another-way-of-training-elm">Maybe another way of training ELM</h3>
<p>So far we have reviewed and explored variants of ELM. However, ELM doesn’t really fit into conventional Machine Learning, where we use back-propagation and the like. The incremental learning in the last section is somewhat “Machine Learning”-ish, but it still doesn’t feel natural enough to me.</p>
<p>This brought me to a recently proposed method, described in <a href="http://arxiv.org/abs/1411.0247">Random feedback weights support learning in deep neural networks</a>. I admire the approach it describes. Instead of propagating errors backward through the transpose of the forward weights (that is, computing the exact gradient), it sends them back through fixed random feedback weights, and the network still learns. With this approach, one can train MLPs, ConvNets and other Feedforward Network variants without any trouble.</p>
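<p>To make the idea concrete, here is a toy NumPy sketch of feedback alignment for a network with one \(\tanh\) hidden layer. The output layer is updated with its usual gradient, but the error reaching the hidden layer is sent back through a fixed random matrix <code class="language-plaintext highlighter-rouge">B</code> instead of the transpose of the output weights. The data, sizes and learning rate are arbitrary choices for illustration, not settings from the paper.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ rng.standard_normal((5, 1))             # a linear target, for illustration

W1 = rng.standard_normal((5, 32)) / np.sqrt(5)  # forward weights, layer 1
W2 = rng.standard_normal((32, 1)) * 0.1         # forward weights, layer 2
B = rng.standard_normal((1, 32)) * 0.1          # fixed random feedback weights (never trained)
lr = 0.01

def loss():
    return 0.5 * np.mean((np.tanh(X @ W1) @ W2 - y) ** 2)

initial_loss = loss()
for _ in range(500):
    h = np.tanh(X @ W1)
    e = h @ W2 - y                      # output error
    delta_h = (e @ B) * (1 - h ** 2)    # error sent back through B, not W2.T
    W2 -= lr * h.T @ e / len(X)
    W1 -= lr * X.T @ delta_h / len(X)
final_loss = loss()
```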
<h3 id="generalizing-elm">Generalizing ELM</h3>
<p>I should say this again: the original proposal of ELM is a method that tries to generalize SLFNs, and it clearly tries to characterize either the output target or the signal itself. But it does not clean the data. In fact, most neural networks out there do not try to clean the signal before characterizing it; the hope is that the hierarchical structure does the cleaning, so that the abstract features come out clean. Has an ELM ever tried to characterize the hidden activation itself? Instead of a feedforward hidden layer, could we use a recurrent hidden layer? And under that assumption, does ELM have to be a single-layer structure?</p>
<p>Okay, so far, like other neural networks, ELM models either a target (classification labels or a regression target) or the input itself. This is conventional. However, what if we replace the target with the hidden activation itself? Then the cost function looks like this:</p>
\[J(H, W_{o})=C||H\cdot W_{o} - H||_{p}^{\sigma_{1}}+||W_{o}||_{q}^{\sigma_{2}}\]
<p>Does this even make sense? Everyone will say that in this case \(W_{o}\) is basically a regularized identity map: the reconstruction term drops to 0 if \(W_{o}\) is the identity matrix. So if it’s (nearly) an identity matrix, why would I even try to learn it?</p>
<p>Well, wrong. There are plenty of things to discover. If you replace \(T\) with \(H\) in the ELM solution, it’s not hard to see that this \(W_{o}\) is closely related to the correlation matrix of the hidden activations. And why does ELM’s hidden layer have to be a feedforward layer? Can it be a recurrent layer? The answer is YES. By the way, this whole thing is called a conceptor network. It was developed by Herbert Jaeger at Jacobs University Bremen; you can find the complete technical report <a href="http://minds.jacobs-university.de/conceptors">here</a>. I think Professor Jaeger himself didn’t realize that a close relative of this idea had been around for almost 10 years.</p>
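<p>The link to the correlation matrix can be made explicit with a short derivation. Substitute \(T=H\) into the ELM solution, and write \(R=\frac{1}{N}H^{\top}H\) for the correlation matrix of the \(N\) hidden-activation samples:</p>
\[W_{o}=\left(\frac{I}{C}+H^{\top}H\right)^{-1}H^{\top}H=\left(\frac{I}{NC}+R\right)^{-1}R=R\left(R+\frac{I}{NC}\right)^{-1}\]
<p>The last step holds because \(R\) commutes with \(\left(R+\frac{I}{NC}\right)^{-1}\). This has the same form as Jaeger’s conceptor matrix \(R\left(R+\alpha^{-2}I\right)^{-1}\), with aperture \(\alpha^{2}=NC\).</p>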
<p>Alright, so you can extend ELM to conceptors; what’s the big deal? The BIG DEAL is that ELM and its theory don’t have to be only a generalization of SLFNs. They can be a multi-layer hierarchy that automatically cleans the data and learns its patterns, and that offers a natural, biologically plausible way of controlling neural dynamics. I’m not going to review the full context of the naive conceptor formulation; instead I’ll go directly to the final offering of the whole theory, the Random Feature Conceptor. (What follows assumes an understanding of naive conceptors and auto-conceptors, and uses the conceptor terminology.)</p>
<p>The basic idea is to project the reservoir state into a larger feature space, so the network is formulated as:</p>
\[r(n+1)=\tanh(Gz(n)+W^{\text{in}}p(n)+b)\]
\[z(n+1)=\text{diag}(c(n))F'r(n)\]
<p>where \(r(n)\in\mathbb{R}^{N\times 1}\), \(G\in\mathbb{R}^{N\times M}\), \(z(n)\in\mathbb{R}^{M\times 1}\), \(p(n)\in\mathbb{R}^{L\times 1}\), \(c(n)\in\mathbb{R}^{M\times 1}\), \(F'\in\mathbb{R}^{M\times N}\).</p>
<p>Now it’s easy to formulate the cost function as before:</p>
\[J(z, c)=\frac{1}{M}\sum_{i}||z_{i}-c_{i}z_{i}||^{2}+\frac{\alpha^{-2}}{M}\sum_{i}c_{i}^{2}\]
<p>You can also calculate the fixed-point solution in this way:</p>
\[c_{i}=E\left[z_{i}^{2}\right] \left(E\left[z_{i}^{2}\right]+\alpha^{-2}\right)^{-1}\]
<p>And of course you can adapt the entire \(c\) using stochastic gradient descent.</p>
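<p>Here is a small NumPy sketch of both options. The first is the fixed-point formula computed from samples of \(z\). The second is an online update that takes one gradient step on \(\left(z_{i}-c_{i}z_{i}\right)^{2}+\alpha^{-2}c_{i}^{2}\) per sample, with the factor 2 absorbed into the learning rate. The random \(z\) samples stand in for the feature-space states of a real reservoir, and the learning rate is a guess. With a small step size, the online estimate should hover near the fixed point.</p>

```python
import numpy as np

def conceptor_fixed_point(Z, alpha):
    """c_i = E[z_i^2] / (E[z_i^2] + alpha^-2), expectations taken over the rows of Z."""
    Ez2 = np.mean(Z ** 2, axis=0)
    return Ez2 / (Ez2 + alpha ** -2)

def conceptor_sgd(Z, alpha, lr=0.001):
    """One pass of per-sample gradient steps on (z_i - c_i z_i)^2 + alpha^-2 c_i^2."""
    c = np.zeros(Z.shape[1])
    for z in Z:
        c += lr * ((1 - c) * z ** 2 - alpha ** -2 * c)
    return c

rng = np.random.default_rng(0)
scales = np.array([0.5, 1.0, 2.0, 3.0])        # made-up signal strengths per feature
Z = rng.standard_normal((20000, 4)) * scales   # stand-in samples of z(n)
c_fp = conceptor_fixed_point(Z, alpha=1.0)
c_sgd = conceptor_sgd(Z, alpha=1.0)
```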
<p>Here is a drawback, and a serious one: it’s very hard to scale up, especially for high-dimensional data like images. With millions of images, the system has to compute with very large matrices.</p>
<p>At this point, we don’t really understand the full power of conceptor networks and ELM yet. But one thing is for sure: there is no evidence that ELM or conceptor networks have succeeded on large datasets, and this is also what worries us most.</p>
Tue, 14 Aug 2018 00:00:00 +0000
https://dgyblog.com/notes/2018/08/14/elm/
https://dgyblog.com/notes/2018/08/14/elm/Deep Learning Book Notes<h2 id="7-regularization-for-deep-learning">7. Regularization for Deep Learning</h2>
Mon, 17 Apr 2017 00:00:00 +0000
https://dgyblog.com/notes/2017/04/17/deep-learning-book-notes/
https://dgyblog.com/notes/2017/04/17/deep-learning-book-notes/Working with Terminal<p>I’ve been a fan of the terminal ever since I first set up my first laptop with Ubuntu.
I know, I know, there are lots of OS (Open Source) people out there who are probably shouting about how evil Ubuntu is. But we are talking about Ubuntu 10.10, when Ubuntu was just an innocent, stable Linux-based Operating System.
About 2 years later, I got my first MacBook Pro, which is a great machine. After a year of use it became sluggish and started throwing all
sorts of errors. So I bought a new Mac, which is the one I’m typing this post on.</p>
<p>After I was exposed to the <code class="language-plaintext highlighter-rouge">*nix</code> OS family, my interest in using the terminal just kept growing. I mean, it’s not that I want to be
a nerd or a geek. Using the terminal alone, I can really improve my workflow a lot.</p>
<p>I’ve used a lot of tools through my years as a programmer. I started with Windows, so I tried Free Pascal (does anyone still know what that is?), DEV-C++, Visual Studio (like everyone did), and Eclipse. Eclipse is still the only IDE I’ve ever used on all three platforms. I also used Emacs for a pretty long time. After I got tired of editing my Emacs configuration, I switched to a more “modern” solution: Atom.</p>
<p>I have to say, Atom is a fine editor. It’s easy to use, easy to configure, easy to extend and easy to modify. And I like the idea of developing desktop applications with a browser core plus JavaScript. This is a totally new pattern for designing things. And of course, if you have an entire HTML engine at your disposal, you can make it as pretty as you like, right?</p>
<p>I thought I was done; I was going to be a hardcore Atom fan after spending a few weeks configuring Atom the way I needed.
Then came HackZurich 2016. I met a guy there (no love story in between) and witnessed a person who could do his job entirely within the terminal. This was not how I had done my job before. I mean, even though I had used Emacs and tried to use Vim, my first thought was always to install their GUI versions.</p>
<p>For all those years, I had treated the terminal as an individual component of my workflow, not its manager.
Witnessing that, I decided that I had to change.</p>
<p>OK, OK, enough history. In this post, I’m going to review my recent attempt at closing the loop: working entirely in the terminal.
At this point, I’m pretty happy with what I’ve done, and I hope every sensible developer will do the same.</p>
<p>So, this is what I needed:</p>
<ul>
<li>iTerm: a much more powerful terminal emulator than Terminal.app</li>
<li><code class="language-plaintext highlighter-rouge">zsh</code>: so much better than macOS’s default <code class="language-plaintext highlighter-rouge">bash</code>, by the way</li>
<li>Tmux: a terminal multiplexer that sometimes can be very handy</li>
<li>Vim: yes, I finally picked up Vim, and it turns out that after the training I got from Atom, it’s not so hard this time</li>
<li><code class="language-plaintext highlighter-rouge">macman</code>: a shell script of frequently used commands that I wrote for myself (inspired by <a href="https://github.com/guarinogabriel/Mac-CLI">Mac-CLI</a>)</li>
<li><code class="language-plaintext highlighter-rouge">linuxman</code>: similar to <code class="language-plaintext highlighter-rouge">macman</code>; it gives me the same toolkit on Linux that I have on the Mac.</li>
</ul>
<p>And... that’s it. At least, I can’t think of anything else I need to do my job.</p>
<h2 id="iterm">iTerm</h2>
<p>I don’t need many apps. I’ve used an iPhone for over a year, an iPad for more than 2 years, and macOS for more than 3 years, but I still manage to have only two pages of apps. I’m generally against installing more apps, since they pollute my system. I need my system to be clean at all times.</p>
<p>This is why it took me around 2 weeks to decide to install iTerm. Now it’s been my default terminal emulator for over a month. I configured it to look like the Ubuntu terminal, and it matches my bash highlighting scheme perfectly.</p>
<p>If you care about your terminal at all, you should switch to iTerm. Not only is it a better terminal, it also has many more fun features.</p>
<h2 id="zsh"><code class="language-plaintext highlighter-rouge">zsh</code></h2>
<p>I’ve used <code class="language-plaintext highlighter-rouge">bash</code> as my default shell for many years and never switched to a different one, though I knew the others were out there.</p>
<h2 id="vim">Vim</h2>
<h2 id="macman-and-linuxman"><code class="language-plaintext highlighter-rouge">macman</code> and <code class="language-plaintext highlighter-rouge">linuxman</code></h2>
<p>(TO BE CONTINUED)</p>
Sun, 16 Oct 2016 00:00:00 +0000
https://dgyblog.com/work/life/2016/10/16/use-terminal/
https://dgyblog.com/work/life/2016/10/16/use-terminal/What I did and what I didn't<p>OK, so the last post I wrote (containing only one sentence, in Chinese) was about
7 months ago. Since then, a lot of things have changed around me.</p>
<p>First, I’ve moved to Zurich, a beautiful town, very expensive, and not as boring
as I thought. I’ve started my Master’s degree here at INI (<a href="https://www.ini.uzh.ch/people/yuhu">check
out my profile photo</a>). Somehow they chose to
use my middle name (which I don’t have) and my last name as my ID.
Well, it’s not exactly the ugliest ID I’ve ever had, but close. I got 4 more
email addresses (thanks to independent IT infrastructure everywhere) and,
surprisingly, more than one homepage and some disk space at my disposal.
I’ve met many fantastic people (including Professors, Drs, PhDs, master’s students, etc.).
Not much social life back at my student house, since I’m not that into partying,
or into cooking.</p>
<p>(<em>Side thought: Am I giving away too much information about myself? Is this going to
backfire on me when I’m famous someday? Am I thinking too much?</em>)</p>
<p>I’m not much of a blogging guy anymore (that guy died about 4 years ago).
I switched from Emacs to Atom, and I love Atom now, for one big reason: I can
write more standardized code than I did with Emacs. I love Markdown, by the way.
My English is still bad, but I’m clearly not the worst. I did my first
semester project, and I’m about to publish it (exciting!!). I like the lab;
everyone helps everyone (if you ask). I attended my first set of classes;
some of them are difficult, some not so much.
I did more coursework than I did at UM (where I was basically sleeping all the time
as an undergraduate). I attended countless seminars (thanks to INI’s
arrangements). So far so good.</p>
<h3 id="updates-on-july-5th">Updates on July 5th</h3>
<p>I have moved to a new place. It’s lovely. I still have lots to figure out,
and somehow I have survived on a daily data plan.</p>
Thu, 05 May 2016 00:00:00 +0000
https://dgyblog.com/life/2016/05/05/what-i-did-and-didnt/
https://dgyblog.com/life/2016/05/05/what-i-did-and-didnt/Tianming Collection III: Jieyu<p>Recently, because the school was celebrating its anniversary</p>
Fri, 02 Oct 2015 00:00:00 +0000
https://dgyblog.com/review/2015/10/02/tianming-3/
https://dgyblog.com/review/2015/10/02/tianming-3/The day I leave WordPress<p>I’ve been using WordPress as my primary blogging system for almost 10 years. And today, I have to say goodbye.</p>
<p>I first came across WordPress when I was trying to set up my personal blog and searching for the best blogging service provider. It was tough: there were Baidu, Microsoft, and a bunch of other providers. Then I found an impressive blogging client, and that was the first time I tried WordPress.</p>
<p>I’m not a natural blogger. For a start, I don’t really write about everything; I have other ways of expressing my emotions and feelings. So blogging has always been my secondary way of writing things.</p>
<p>OK, enough history, for you (my beloved readers) and for me. Let’s talk about the recent problem. I hosted my old WordPress-powered blog on my own host under my own domain. A few days ago, my provider contacted me and said: “Man, we have to suspend your site for a while. This spam doesn’t look like spam; it looks more like an attack.” Honestly, I was busy with something else, so I replied “sure”. And I decided to turn my entire site into a pure HTML site. Since I’ve fought spam for too long, I need to find a secure way of handling the comment section.</p>
<p>My plan is to move everything to GitHub and host it there. GitHub clearly has a better strategy for fighting hostile behavior, and Disqus provides a better comment system. Honestly, I don’t much care whether anyone leaves a message below my posts, but I have to find out whether I can really stop this.</p>
<p>It’s more secure too. I don’t have to worry about being hacked (as long as nobody has my passwords). It’s pure HTML: no database, no anything. And I can control basically everything with minimal knowledge of HTML and CSS (well, a bit better than minimal).</p>
<p>Anyway, like Taylor Swift’s song, SHAKE IT OFF, SHAKE IT OFF!</p>
<div align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/nfWlot6h_JM" frameborder="0" allowfullscreen=""></iframe>
</div>
Tue, 29 Sep 2015 00:00:00 +0000
https://dgyblog.com/life/2015/09/29/the-day-i-leave-wordpress/
https://dgyblog.com/life/2015/09/29/the-day-i-leave-wordpress/Hopeless Me Asks the Math Gods for Help<p>My recent work involves a small math problem, and I want to write it down here.</p>
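<p>The problem can be simplified as follows. There is a set \(A\) of \(n\) random real numbers (\(a_{1}, a_{2}, \ldots, a_{n}\)), and I want it to satisfy two conditions. Condition 1: every element is greater than or equal to zero. Condition 2: the sum of all elements (denoted \(s\)) is less than or equal to some positive real number \(p\). Since the elements of \(A\) are random, I apply two operations to make it satisfy these conditions. Operation 1: subtract a non-negative real number \(b\) from every element of \(A\). Operation 2: set every negative element of \(A\) to zero. After operations 1 and 2, \(A\) must satisfy conditions 1 and 2. Find the minimum value of \(b\) (denoted \(b_{\min}\)).</p>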
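<p>To illustrate the problem, here is an example. Suppose the random set \(A\) contains the elements \(\{1,2,3\}\) and the required \(p\) is 3. Then \(A\) already satisfies condition 1 but not condition 2: the sum \(s\) of all its elements is 6, which is larger than 3. So I apply operation 1 and subtract a number \(b\) from every element of \(A\). Here the minimum value of \(b\) is clearly 1. After the subtraction, \(A\) becomes \(\{0,1,2\}\) and \(s\) is exactly 3. Operation 2 is not needed, because \(A\) now has no negative elements.</p>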
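<p>How can we prove that \(b=1\) is the minimum in this example? If \(b\) is slightly less than 1, say 0.9, the sum will be slightly larger than 3, so the condition is not satisfied. If \(b\) is slightly larger than 1, say 1.1, the sum will be less than 3; the condition is satisfied, but a smaller \(b\) also works, so 1.1 is not the minimum. Approaching from both sides, \(b=1\) satisfies the condition exactly, so it is the minimum value of \(b\).</p>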
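<p>However, this argument is neither a rigorous proof nor applicable to more general cases. Setting aside for now how to rigorously find an expression for \(b_{\min}\), let’s first look at what the more general cases are.</p>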
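<p>First, the example above does not use operation 2 at all. When is it needed? One simple case is when \(A\) already contains negative elements. Since \(b\) is non-negative, there will still be negative elements after \(b\) is subtracted from every element, so condition 1 is violated. We then apply operation 2 and set these negative numbers to zero. Another case: even if \(A\) has no negative elements to begin with, subtracting \(b\) from every element in operation 1 may turn some elements negative, and again operation 2 is needed for \(A\) to satisfy condition 1. Note that operation 2 affects the choice of \(b\). Why? Because operation 2 can increase the sum \(s\), since negative numbers are turned into zeros. I think this is exactly where the difficulty of the problem lies. Suppose that, ignoring operation 2, we have already found the minimum \(b\). After subtracting, we find negative elements, so condition 1 fails. We apply operation 2 and then find that the sum \(s\) exceeds \(p\), so condition 2 fails, and the initial \(b\) has to be made larger. But by how much? Will a larger \(b\) create new negative numbers? If it does, don’t we have to adjust \(b\) yet again? In this situation, is there a non-recursive expression for \(b_{\min}\)? If not, how do we prove that? And how should we compute it?</p>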
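<p>When I first analyzed this problem, I did not clearly see how the two operations affect each other. I just wrote a few simple examples to see how the choice of \(b\) changes with operation 2 in different situations (the initial \(A\) violates condition 1, condition 2, or both), and then adjusted \(b\) step by step to arrive at what I thought was the best expression. When I believed I had covered all the cases, I happily wrote a program to verify it in general. Unsurprisingly, all sorts of errors appeared. Then I found some holes in my reasoning, and the more I thought about it, the more worried I became, because I realized that one operation decreases \(s\) (operation 1) while the other increases it (operation 2). So my intuition was that there is no general algebraic solution, and \(b_{\min}\) can only be found by searching. Searching is easy: set a lower bound of 0 and an upper bound for \(b_{\min}\), then keep bisecting. This is simple brute force, but it is not exact; remember that the elements in our problem are real numbers. But I really could not think of a better way.</p>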
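<p>The bisection search described above can be sketched in a few lines of Python. This is a minimal sketch, and the function names are made up for illustration. It relies on one observation: after both operations the sum is \(g(b)=\sum_{i}\max(a_{i}-b,0)\), which is continuous and non-increasing in \(b\). So if \(g(0)>p\), there is a point where it crosses \(p\), and at \(b=\max_{i}a_{i}\) the sum is already 0. In double precision, 100 halvings of the interval are more than enough to pin \(b_{\min}\) down to machine accuracy.</p>

```python
def clipped_sum(a, b):
    """Sum of the set after operation 1 (subtract b) and operation 2 (clip negatives to 0)."""
    return sum(max(x - b, 0.0) for x in a)

def b_min_bisect(a, p, iters=100):
    """Smallest b >= 0 with clipped_sum(a, b) <= p, found by bisection."""
    if clipped_sum(a, 0.0) <= p:
        return 0.0                 # b = 0 already satisfies both conditions
    lo, hi = 0.0, max(a)           # at b = max(a) every element becomes 0, so the sum is 0 <= p
    for _ in range(iters):
        mid = (lo + hi) / 2
        if clipped_sum(a, mid) <= p:
            hi = mid               # invariant: clipped_sum(a, hi) <= p
        else:
            lo = mid               # invariant: clipped_sum(a, lo) > p
    return hi

# The example from the text: A = {1, 2, 3}, p = 3
b = b_min_bisect([1.0, 2.0, 3.0], 3.0)
```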
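<p>A digression.</p>
<p>Being rather weak, I usually analyze problems by intuition. I don’t have much mathematical ability, either in experience or in logic. Right now my supervisor particularly wants me to derive many formulas and analyze many problems myself. He has given me a lot of good mathematical material to read, and keeps stressing that whenever I run into a problem I shouldn’t hesitate to ask; he will do his best to answer. Besides math, he is also very keen on teaching me about communication networks (after all, that is what our project is about). Whenever I feel that my English is still not good enough and I can’t put my ideas across, I am far more delighted by the thrill of quickly picking up knowledge through discussions with him. This environment has given a hopeless slacker like me a hint of ambition, the thing I lack most of all. I hope this is a good start.</p>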
Sun, 30 Aug 2015 00:00:00 +0000
https://dgyblog.com/yang/2015/08/30/help-me/
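https://dgyblog.com/yang/2015/08/30/help-me/It’s All the Same with the Lights Off<p>How did this suddenly pop up? After a day of lazing around, I was browsing the web and discovered that the latest post on this site was from a year ago. Not a single post in all of 2014, which is a bit embarrassing. It’s as if I were paying for nothing, just to enshrine old relics; the site has lost its reason to exist. Old relics can be enshrined anywhere. The dead should be buried, except those who pay a fortune to stay in a crystal coffin, and relics like ours clearly don’t deserve such treatment. So since I’m still paying, I should do something. Even running wild is fine; it’s my own turf anyway.</p>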
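<p>So here I am, writing. I don’t know what to write. Suddenly wanting to write but not knowing what: this has happened before. But in the past I could always keep going. Now I can only write when I have a feeling that I must get something out no matter what. Compared with before, I can barely even muster interest in writing random nonsense out of thin air. As I write, I lose steam, and then I abandon it. When I was in school I still liked writing, because back then I held a pen every day. I wrote the most in my Chinese textbooks: whenever something came to mind, I would write it in the blank margins. Of course, that was still less than in my history and politics textbooks, because there the teachers made us write in the margins. Things we had to memorize... Come to think of it, the characters I write in a year now may be fewer than what I wrote in a day back then. I mean actually written with a pen, of course.</p>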
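<p>I bought a new computer. Hesitantly, on and off. I’d been saying I would buy one for over a year, and when I actually bought it, it was quite sudden. Honestly, no matter how I try now to work out why I bought it so suddenly, I can’t. The result is that I bought it. On Amazon. A Lenovo. Without an operating system. I’m using it right now. Although I bought it quickly, I did compare. Comparing has become my homework before buying anything, not that I buy much in a year. Having compared, naturally I bought what I considered the best. “Best” isn’t quite right; I should say the one with very good value for money. I compared not only with other products in Germany but also with prices back in China, and found that buying a computer in Germany isn’t necessarily that expensive. Of course, other people wouldn’t go to such trouble doing homework like me. Actually, if I had money, really rich-people money, I would still do the homework. So it has nothing to do with money. Doing the homework gives a sense of achievement, or something like that. It’s part of the fun of buying, I suppose, probably the same kind of thing as girls enjoying shopping. Anyway, I bought a new computer.</p>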
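<p>I recently started my master’s. Coming to RWTH Aachen was probably herd mentality at work: everyone said they wanted to come, so I came too. Before coming I had grand ambitions. Now, only three months in, I no longer care. That’s normal. If I hadn’t cared even before coming, I’d probably have dropped out by now. Things aren’t like before. I remember that when I had just started my bachelor’s, I kept complaining about how bad the teaching was and how I didn’t like my major. Now I don’t complain anymore. But not because the teaching has improved or I’ve come to like my major. Quite the opposite: the teaching here may be even worse than in my bachelor’s, and I like my major less than ever. But I don’t complain. I’m just muddling through to graduate, and graduating isn’t easy to muddle through. Rather than complaining, better to read a book. Nobody dragged you here; you chose to come, so explain it to yourself. If you don’t want to explain, just carry on doing what you should. Many things are like that. Do what you should. To see the stars, you first wait for dark. Being anxious is useless; complaining is even more useless. That’s how I’ll muddle through grad school.</p>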
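<p>Last October I started going to church. A Christian church. We met once a week: eating, singing hymns, studying the Bible, chatting. Once a week, and I rarely missed it. This October, after coming to Aachen, the first thing I did was contact the local Chinese church. Again once a week. As I got to know people, I joined more gatherings and activities. The people at church are all very nice, the church activities are all good, and Christianity itself is good too. But I haven’t formally joined. I haven’t been baptized. I haven’t even said the prayer of commitment. Still, since coming to Aachen this October I have prayed before sleeping every day, without missing a single day. I truly feel in my heart that a Christian’s life is good. Yet I haven’t joined. People have asked me when I will. For now, I don’t really want to. Not joining, yet doing what a Christian should do: I think this state is fine. But it’s not right. It definitely isn’t right, neither in my heart nor in my conduct. But for now, let it be. I’ve noticed my life is becoming more and more like a small boat without oars on the open sea, just drifting with the waves. I don’t want to join any fleet. I have no destination either. Living, that’s all.</p>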
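<p>Having written this far, I find the keyboard of the new computer quite comfortable to type on. I need to go out now. I make a point of going out once a day, even if it’s just to take out the trash. I’m going to wander around the FH campus near my home; I hope it’s open today. I’ll continue when I’m back.</p>
<p>Walking…</p>
<p>Back... so cold... taking a shower to warm up first...</p>
<p>Showering…</p>
<p>Done showering, much warmer. Actually it wasn’t that cold outside. A couple of days ago the forecast said it would be minus eleven degrees today, which was pretty scary. But today it turned out to be only minus one or two. It snowed yesterday, and today I could still see traces of snow in the soil by the roadside. The cars on the street were covered with a layer of snow; the black ones had all turned white. But that was all. No snow deep enough to crunch underfoot, no ice glinting in the light. After a day of sleeping, walking outside felt indescribably good. I walked quite far, for an hour. I had meant to look around the FH campus, but as expected the gate was locked. Although there seemed to be lights in a building far inside, the gate was definitely shut tight. No matter, it was to be expected. So I kept walking, on roads I had never walked before.</p>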
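<p>I quite like walking. Not strolling, not long-distance trekking, and certainly not dragging my tired body to and from school, the supermarket, or the shops. The walking I like is wandering aimlessly along a long stretch of road I’ve never walked before. In a city, that usually means heading for the outskirts. When I first came to Germany, I lived in a tiny, tiny town. I once described how small this town called Linnich is; one way I can think of describing it now is that you could effortlessly build a full-scale game map of it. Anyway, in that little town I walked a lot, mostly at night. I’d walk until I was out of town, and then along forest paths, along the highway, along the railway, along rivers, along fields, around wind turbines, past villa after villa. I came across wild rabbits, wildcats, hedgehogs, foxes or skunks or something like that, and even a naked woman... yes, a naked woman. Back then I mostly listened to music while walking, hardly thinking about anything. There wasn’t much scenery to see either, since it was night; the wilderness had almost no light, so there was nothing much to look at. Later I went to Duisburg for my bachelor’s. The first year I lived in Moers, a city a bit bigger than Linnich. Just a bit bigger. Soon after arriving I had walked all over the city, so I started walking around it. The city was surrounded almost entirely by highways, not much to walk. Then I found that the city had many rivers. The rivers meet the soil, and the soil spreads out into parks and such places, with lots of wild ducks and geese. That made walking fun again. I lived in Moers for almost a year, then moved to Duisburg. Duisburg is a boring city, run-down and without any old-town charm, so there was hardly anything to walk within the city. Later I discovered a forest, one set up specifically for hikers and joggers. Following the longest trail took two hours. Very pleasant. It also led straight to a neighboring city where many classmates lived, so I often detoured through the forest to their homes, sat for a while, and came back by the road. Great walking experiences. After that came Aachen, where I am now. South of where I live there are large stretches of woods, a suburban feel, quite suitable for walking. That’s the kind of road I walked today.</p>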
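<p>My walks follow a few rules: first, no turning back; second, no repeating routes; third, no looking at maps.</p>
<p>Usually I follow one road to its end, no matter what, until I can’t go any further. Then I turn and figure out how to get back. As long as I keep the general direction, I can always get back; a detour doesn’t matter. But I never walk a road I’ve already walked. Sometimes, as I walk, the road links up with one I walked before, and then I have a picture of that area in my head and don’t walk there again; I go looking for new roads. Sometimes I’m quite surprised to see a bus stop along the way: oh, so this bus comes all the way out here. Bus stops usually have maps. Sometimes I look at them and realize that even after walking so far, I’ve only covered a small patch of the map. While walking these roads I also hope for some special encounter. Sadly, it has almost never happened. Such a pity.</p>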
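<p>I always feel I can’t yet get a grip on the future. That’s true. The problem is that I’ve already reached the age when I need to. Yet I don’t want to. I keep thinking that when the time comes, everything will sort itself out. But quite possibly the way things get sorted out will be decided by what I give up now. And still I don’t want to take hold of it. Since nothing ties me to home anymore, I think I’ll just find some way of life to keep myself going. A truly resigned, couldn’t-care-less face. I don’t even care about survival itself. A complete good-for-nothing... This is the real me. The me that hasn’t changed in years. The me who wore down my own edges before even entering society. The me who dreams every day, with dreams always more exciting than reality. The me who every few days sinks into lethargy, finds life boring, and goes walking. The me who can’t get interested in anything anymore and feels that, with the lights off, it’s all the same.</p>
<p>I’ll stop writing here.</p>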
Mon, 29 Dec 2014 00:00:00 +0000
https://dgyblog.com/yang/2014/12/29/only-one/
https://dgyblog.com/yang/2014/12/29/only-one/Quantum Mechanics for Scientists and Engineers Notes 7<h2 id="1-angular-momentum">1. Angular Momentum</h2>
<h3 id="11-angular-momentum-operators">1.1. Angular momentum operators</h3>
<p>We will have operators corresponding to angular momentum about different orthogonal axes, \({\hat{L}_{x}}\), \({\hat{L}_{y}}\) and \({\hat{L}_{z}}\), though they will not commute with each other, in contrast to the linear momentum operators \({\hat{p}_{x}}\), \({\hat{p}_{y}}\) and \({\hat{p}_{z}}\), which do commute. We will, however, find another useful angular momentum operator, \({\hat{L}^{2}}\), which does commute separately with each of \({\hat{L}_{x}}\), \({\hat{L}_{y}}\) and \({\hat{L}_{z}}\). The eigenfunctions of \({\hat{L}_{x}}\), \({\hat{L}_{y}}\) and \({\hat{L}_{z}}\) are simple. Those of \({\hat{L}^{2}}\), the spherical harmonics, are more complicated, but can be understood relatively simply, and they form the angular shapes of the hydrogen atom orbitals.</p>
<p>The classical angular momentum of a small object of (vector) linear momentum \(\mathbf{p}\) centered at a point given by the vector displacement \(\mathbf{r}\) relative to some origin is \(\mathbf{L}=\mathbf{r}\times\mathbf{p}\).</p>
<p>As usual</p>
\[\mathbf{C}=\mathbf{A}\times\mathbf{B}=\mathbf{c}AB\sin\theta\equiv\begin{vmatrix}\mathbf{i} & \mathbf{j} & \mathbf{k} \\ A_{x} & A_{y} & A_{z} \\ B_{x} & B_{y} & B_{z}\end{vmatrix}\]
<p>where \({\mathbf{i}}\), \({\mathbf{j}}\), \({\mathbf{k}}\) are unit vectors in the \({x}\), \({y}\) and \({z}\) directions, \({\mathbf{C}}\) is perpendicular to the plane of \({\mathbf{A}}\) and \({\mathbf{B}}\), \({A}\) and \({B}\) are the magnitudes of \({\mathbf{A}}\) and \({\mathbf{B}}\), \({\theta}\) is the angle between the vectors \({\mathbf{A}}\) and \({\mathbf{B}}\), and \({\mathbf{c}}\) is the unit vector in the direction of \({\mathbf{C}}\). Note that in the determinant form of the previous equation, the ordering of the multiplications is chosen so that it also works when one or the other vector is an operator rather than a number: in each term, the factors are always multiplied in the order of the rows, from top to bottom.</p>
<p>With classical angular momentum, we can explicitly write out the various components</p>
\[L_{x}=yp_{z}-zp_{y}\quad L_{y}=zp_{x}-xp_{z}\quad L_{z}=xp_{y}-yp_{x}\]
<p>Now we can propose a quantum mechanical angular momentum operator \({\hat{\mathbf{L}}}\) based on substituting the position and momentum operators</p>
\[\hat{\mathbf{L}}=\hat{\mathbf{r}}\times\hat{\mathbf{p}}=-i\hbar(\mathbf{r}\times\nabla)\]
<p>and similarly write out component operators</p>
\[\begin{array}{rcl} \hat{L}_{x}&=&\hat{y}\hat{p}_{z}-\hat{z}\hat{p}_{y}=-i\hbar\left(y\frac{\partial}{\partial z}-z\frac{\partial}{\partial y}\right) \\ \hat{L}_{y}&=&\hat{z}\hat{p}_{x}-\hat{x}\hat{p}_{z}=-i\hbar\left(z\frac{\partial}{\partial x}-x\frac{\partial}{\partial z}\right) \\ \hat{L}_{z}&=&\hat{x}\hat{p}_{y}-\hat{y}\hat{p}_{x}=-i\hbar\left(x\frac{\partial}{\partial y}-y\frac{\partial}{\partial x}\right) \end{array}\]
<p>which are each Hermitian, and so, correspondingly, is the operator \({\hat{\mathbf{L}}}\) itself.</p>
<p>The operators corresponding to individual coordinate directions obey commutation relations</p>
\[\begin{array}{rcl} \hat{L}_{x}\hat{L}_{y}-\hat{L}_{y}\hat{L}_{x}&=&[\hat{L}_{x}, \hat{L}_{y}]=i\hbar\hat{L}_{z} \\ \hat{L}_{y}\hat{L}_{z}-\hat{L}_{z}\hat{L}_{y}&=&[\hat{L}_{y}, \hat{L}_{z}]=i\hbar\hat{L}_{x} \\ \hat{L}_{z}\hat{L}_{x}-\hat{L}_{x}\hat{L}_{z}&=&[\hat{L}_{z}, \hat{L}_{x}]=i\hbar\hat{L}_{y} \\ \end{array}\]
<p>These individual commutation relations can be written in a more compact form</p>
\[\hat{\mathbf{L}}\times \hat{\mathbf{L}}=i\hbar\hat{\mathbf{L}}\]
<p>Unlike the operators for position and for linear momentum, the different components of this angular momentum operator do not commute with one another. Although a particle can simultaneously have a well-defined position in both the \({x}\) and \({y}\) directions, or simultaneously a well-defined momentum in both the \({x}\) and \({y}\) directions, it cannot in general simultaneously have a well-defined angular momentum component in more than one direction.</p>
<h3 id="12-angular-momentum-eigenfunctions">1.2. Angular momentum eigenfunctions</h3>
<p>The relation between spherical polar and Cartesian coordinates is</p>
\[\begin{array}{rcl} x&=& r\sin\theta\cos\phi \\ y&=& r\sin\theta\sin\phi \\ z&=& r\cos\theta \end{array}\]
<p>\({\theta}\) is called the polar angle and \({\phi}\) the azimuthal angle.</p>
<p>In inverse form</p>
\[\begin{array}{rcl} r&=&\sqrt{x^{2}+y^{2}+z^{2}} \\ \theta&=&\sin^{-1}\left(\frac{\sqrt{x^{2}+y^{2}}}{\sqrt{x^{2}+y^{2}+z^{2}}}\right) \\ \phi&=&\tan^{-1}\left(\frac{y}{x}\right) \end{array}\]
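<p>In code, the conversion is usually done with a two-argument arctangent, which resolves the quadrant ambiguity of \({\tan^{-1}(y/x)}\), and with \({\theta=\cos^{-1}(z/r)}\), which covers the full range \({0\leq\theta\leq\pi}\) directly (the principal value of the \({\sin^{-1}}\) form above lies between 0 and \({\pi/2}\), so the sign of \({z}\) has to be taken into account separately). The following minimal Python sketch shows the round trip; the function names are illustrative.</p>

```python
import math


def to_spherical(x, y, z):
    """Cartesian (x, y, z) -> spherical polar (r, theta, phi), theta in [0, pi], phi in (-pi, pi]."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0  # angles are undefined at the origin; return 0 by convention
    cos_theta = max(-1.0, min(1.0, z / r))  # clamp guards against rounding just outside [-1, 1]
    theta = math.acos(cos_theta)            # polar angle, valid for z < 0 as well
    phi = math.atan2(y, x)                  # azimuthal angle in the correct quadrant
    return r, theta, phi


def to_cartesian(r, theta, phi):
    """Spherical polar (r, theta, phi) -> Cartesian (x, y, z)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))


if __name__ == "__main__":
    point = (-1.5, 0.5, -2.0)
    print(to_spherical(*point))
    print(to_cartesian(*to_spherical(*point)))
```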
<p>With these definitions of spherical polar coordinates and with standard partial derivative relations of the form</p>
\[\frac{\partial}{\partial x}=\frac{\partial r}{\partial x}\frac{\partial}{\partial r}+\frac{\partial\theta}{\partial x}\frac{\partial}{\partial\theta}+\frac{\partial\phi}{\partial x}\frac{\partial}{\partial\phi}\]
<p>for each of the Cartesian coordinate directions, we can rewrite the angular momentum operator components in spherical polar coordinates.</p>
<p>Applying these relations to the Cartesian forms of the operator components above, we obtain</p>
\[\begin{array}{rcl} \hat{L}_{x}&=&i\hbar\left(\sin\phi\frac{\partial}{\partial\theta}+\cot\theta\cos\phi\frac{\partial}{\partial\phi}\right) \\ \hat{L}_{y}&=&i\hbar\left(-\cos\phi\frac{\partial}{\partial\theta}+\cot\theta\sin\phi\frac{\partial}{\partial\phi}\right) \\ \hat{L}_{z}&=&-i\hbar\frac{\partial}{\partial\phi} \end{array}\]
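<p>Sign and sine/cosine slips are easy to make in formulas like these, and a numerical comparison is a quick way to catch them. The sketch below (an illustration, not a derivation) evaluates \({\hat{L}_{x}=i\hbar(\sin\phi\,\partial_{\theta}+\cot\theta\cos\phi\,\partial_{\phi})}\), \({\hat{L}_{y}=i\hbar(-\cos\phi\,\partial_{\theta}+\cot\theta\sin\phi\,\partial_{\phi})}\) and \({\hat{L}_{z}=-i\hbar\,\partial_{\phi}}\) on one test function at one point and compares them with the Cartesian definitions. The units \({\hbar=1}\), the finite-difference step, the test function <code>g</code> and the helper names are assumptions for the example.</p>

```python
import numpy as np

HBAR = 1.0   # assumption: units in which hbar = 1
STEP = 1e-5  # finite-difference step (illustrative choice)


def g(x, y, z):
    """Hypothetical smooth test function in Cartesian coordinates."""
    return np.exp(-0.5 * (x**2 + y**2 + z**2)) * (x + 2.0 * y * z + 0.5 * x * y)


def G(r, theta, phi):
    """The same function written in spherical polar coordinates."""
    return g(r * np.sin(theta) * np.cos(phi),
             r * np.sin(theta) * np.sin(phi),
             r * np.cos(theta))


def d(f, args, i):
    """Central-difference derivative of f with respect to its i-th argument."""
    up, down = list(args), list(args)
    up[i] += STEP
    down[i] -= STEP
    return (f(*up) - f(*down)) / (2 * STEP)


def cartesian_L(x, y, z):
    """(Lx, Ly, Lz) applied to g, from the Cartesian definitions."""
    gx, gy, gz = (d(g, (x, y, z), i) for i in range(3))
    return (-1j * HBAR * (y * gz - z * gy),
            -1j * HBAR * (z * gx - x * gz),
            -1j * HBAR * (x * gy - y * gx))


def spherical_L(r, theta, phi):
    """(Lx, Ly, Lz) applied to G, from the spherical-polar expressions."""
    d_theta = d(G, (r, theta, phi), 1)
    d_phi = d(G, (r, theta, phi), 2)
    cot = np.cos(theta) / np.sin(theta)
    return (1j * HBAR * (np.sin(phi) * d_theta + cot * np.cos(phi) * d_phi),
            1j * HBAR * (-np.cos(phi) * d_theta + cot * np.sin(phi) * d_phi),
            -1j * HBAR * d_phi)


if __name__ == "__main__":
    r, theta, phi = 1.3, 0.9, 2.2
    xyz = (r * np.sin(theta) * np.cos(phi), r * np.sin(theta) * np.sin(phi), r * np.cos(theta))
    for name, a, b in zip(("Lx", "Ly", "Lz"), cartesian_L(*xyz), spherical_L(r, theta, phi)):
        print(name, a, b)
```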
<p>Using \({\hat{L}_{z}=-i\hbar\frac{\partial}{\partial\phi}}\), we solve for the eigenfunctions and eigenvalues of \({\hat{L}_{z}}\). The eigenvalue equation is</p>
\[\hat{L}_{z}\Phi(\phi)=m\hbar\Phi(\phi)\]
<p>where \({m\hbar}\) is the eigenvalue to be determined. The solution of this equation is</p>
\[\Phi(\phi)=\exp(im\phi)\]
<p>Requiring that the wavefunction and its derivative be continuous when we return to where we started (that is, when \({\phi}\) increases by \({2\pi}\)) means that \({m}\) must be an integer (positive, negative or zero). Hence the angular momentum around the \({z}\) axis is quantized in units of \({\hbar}\).</p>
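<p>Both steps can be checked in a few lines. The sketch below (with \({\hbar=1}\) and a finite-difference derivative, both assumptions for illustration; the function names are hypothetical) confirms that \({-i\hbar\,d/d\phi}\) acting on \({\exp(im\phi)}\) returns \({m\hbar\exp(im\phi)}\) for any \({m}\), and that \({\exp(im\phi)}\) comes back to its starting value after a full turn only when \({m}\) is an integer.</p>

```python
import cmath
import math

HBAR = 1.0   # assumption: units in which hbar = 1
STEP = 1e-6  # finite-difference step (illustrative choice)


def Phi(m, phi):
    """Candidate L_z eigenfunction exp(i m phi)."""
    return cmath.exp(1j * m * phi)


def Lz_eigenvalue(m, phi=0.8):
    """Estimate (-i hbar dPhi/dphi) / Phi at one angle; should equal m * hbar."""
    deriv = (Phi(m, phi + STEP) - Phi(m, phi - STEP)) / (2 * STEP)
    return -1j * HBAR * deriv / Phi(m, phi)


def single_valued(m, phi=0.37, tol=1e-9):
    """True if exp(i m phi) returns to the same value after phi -> phi + 2 pi."""
    return abs(Phi(m, phi + 2 * math.pi) - Phi(m, phi)) < tol


if __name__ == "__main__":
    for m in (-2, 0, 1, 3, 0.5, 1.3):
        print(m, Lz_eigenvalue(m), single_valued(m))
```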
<h2 id="2-the-l-squared-operator">2. The L Squared Operator</h2>
<h3 id="21-separating-the-l-squared-operator">2.1. Separating the L squared operator</h3>
<p>In quantum mechanics, we also consider another operator associated with angular momentum, the operator \({\hat{L}^{2}}\). This should be thought of as the “dot” product of \({\hat{\mathbf{L}}}\) with itself, and is defined as</p>
\[\hat{L}^{2}=\hat{L}_{x}^{2}+\hat{L}_{y}^{2}+\hat{L}_{z}^{2}\]
<p>It is straightforward to show that</p>
\[\hat{L}^{2}=-\hbar^{2}\nabla_{\theta,\phi}^{2}\]
<p>where the operator \({\nabla_{\theta,\phi}^{2}}\) is given by</p>
\[\nabla_{\theta,\phi}^{2}=\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right)+\frac{1}{\sin^{2}\theta}\frac{\partial^{2}}{\partial\phi^{2}}\right]\]
<p>which is the \({\theta}\) and \({\phi}\) part of the Laplacian (\({\nabla^{2}}\)) operator in spherical polar coordinates, hence the notation.</p>
<p>\({\hat{L}^{2}}\) commutes with each of \({\hat{L}_{x}}\), \({\hat{L}_{y}}\) and \({\hat{L}_{z}}\). For \({\hat{L}_{z}=-i\hbar\partial/\partial\phi}\) this is easy to see, because \({\nabla_{\theta,\phi}^{2}}\) involves \({\phi}\) only through \({\partial^{2}/\partial\phi^{2}}\), with coefficients that depend only on \({\theta}\). Of course, the choice of the \({z}\) direction is arbitrary. We could equally well have chosen the polar axis along the \({x}\) or \({y}\) direction, and then it would similarly be obvious that \({\hat{L}^{2}}\) commutes with \({\hat{L}_{x}}\) or \({\hat{L}_{y}}\). Because \({\hat{L}^{2}}\) commutes with each of \({\hat{L}_{x}}\), \({\hat{L}_{y}}\) and \({\hat{L}_{z}}\), we can choose the eigenfunctions of \({\hat{L}^{2}}\) to be the same as those of any one of \({\hat{L}_{x}}\), \({\hat{L}_{y}}\) and \({\hat{L}_{z}}\) (only one at a time, since these three do not commute with one another).</p>
<p>We want eigenfunctions of \({\hat{L}^{2}}\) or, equivalently, \({\nabla_{\theta, \phi}^{2}}\) and so the equation we hope to solve is of the form</p>
\[\nabla_{\theta,\phi}^{2}Y_{lm}(\theta,\phi)=-l(l+1)Y_{lm}(\theta,\phi)\]
<p>We anticipate the answer by writing the eigenvalue in the form \({-l(l+1)}\), but for now it is just an arbitrary number to be determined. The notation \({Y_{lm}(\theta,\phi)}\) also anticipates the final answer, but for now it is an arbitrary function to be determined.</p>
<p>We presume that the final eigenfunctions can be separated in the form</p>
\[Y_{lm}(\theta,\phi)=\Theta(\theta)\Phi(\phi)\]
<p>where \({\Theta(\theta)}\) only depends on \({\theta}\) and \({\Phi(\phi)}\) only depends on \({\phi}\).</p>
<p>Substituting this form in the previous equation gives</p>
\[\frac{\Phi(\phi)}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right)\Theta(\theta)+\frac{\Theta(\theta)}{\sin^{2}\theta}\frac{\partial^{2}\Phi(\phi)}{\partial\phi^{2}}=-l(l+1)\Theta(\theta)\Phi(\phi)\]
<p>Multiplying by \({\sin^{2}\theta/[\Theta(\theta)\Phi(\phi)]}\) and rearranging gives</p>
\[\frac{1}{\Phi(\phi)}\frac{\partial^{2}\Phi(\phi)}{\partial\phi^{2}}=-l(l+1)\sin^{2}\theta-\frac{\sin\theta}{\Theta(\theta)}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right)\Theta(\theta)\]
<p>The left hand side depends only on \({\phi}\), whereas the right hand side depends only on \({\theta}\), so both must equal the same (“separation”) constant. Anticipating the answer, we choose a separation constant of \({-m^{2}}\), where \({m}\) is still to be determined.</p>
<p>The \({\phi}\) equation is therefore</p>
\[\frac{d^{2}\Phi(\phi)}{d\phi^{2}}=-m^{2}\Phi(\phi)\]
<p>The solutions to an equation like this are of the form \({\sin m\phi}\), \({\cos m\phi}\) or \({\exp(im\phi)}\). We choose the exponential form \({\exp(im\phi)}\), so that \({\Phi}\) is also a solution of the \({\hat{L}_{z}}\) eigenvalue equation</p>
\[\hat{L}_{z}\Phi(\phi)=m\hbar\Phi(\phi)\]
<p>Since \({\Phi}\) and its derivative must be continuous, this wavefunction must repeat every \({2\pi}\) of angle \({\phi}\); hence \({m}\) must be an integer.</p>
<p>After rearranging, the \({\theta}\) equation is</p>
\[\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d}{d\theta}\right)\Theta(\theta)-\frac{m^{2}}{\sin^{2}\theta}\Theta(\theta)+l(l+1)\Theta(\theta)=0\]
<p>This is the associated Legendre equation, whose solutions are the associated Legendre functions</p>
\[\Theta(\theta)=P_{l}^{m}(\cos\theta)\]
<p>Well-behaved solutions (finite at \({\theta=0}\) and \({\theta=\pi}\)) require that \({l=0,1,2,3,\ldots}\) and \({-l\leq m\leq l}\), with \({m}\) an integer.</p>
<p>The associated Legendre functions can conveniently be defined using Rodrigues’ formula</p>
\[P_{l}^{m}(x)=\frac{1}{2^{l}l!}(1-x^{2})^{m/2}\frac{d^{l+m}}{dx^{l+m}}(x^{2}-1)^{l}\]
<p>We see that these functions \({P_{l}^{m}(x)}\) have following properties:</p>
<ul>
<li>The highest power of the argument \({x}\) is always \({x^{l}}\).</li>
<li>The functions for a given \({l}\) with \({+m}\) and \({-m}\) are identical other than for numerical prefactors.</li>
<li>Less obviously, between -1 and +1, not counting the end points, the functions have \({l-\vert m\vert}\) zeros.</li>
</ul>
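<p>Rodrigues’ formula translates directly into code with polynomial arithmetic. The sketch below (Python with NumPy's <code>numpy.polynomial.polynomial</code> module; the function names are illustrative) builds \({P_{l}^{m}(x)}\) exactly as written above, without any extra \({(-1)^{m}}\) phase, and spot-checks two of the listed properties: for \({\pm m}\) the functions differ only by a constant factor, and there are \({l-\vert m\vert}\) zeros strictly between -1 and +1, counted here as sign changes on a fine grid.</p>

```python
import math

import numpy as np
from numpy.polynomial import polynomial as P


def assoc_legendre(l, m, x):
    """P_l^m(x) from Rodrigues' formula, with no extra (-1)^m phase factor."""
    if l < 0 or abs(m) > l:
        raise ValueError("need l >= 0 and -l <= m <= l")
    base = P.polypow([-1.0, 0.0, 1.0], l)  # coefficients of (x^2 - 1)^l, lowest power first
    deriv = P.polyder(base, l + m)         # d^(l+m)/dx^(l+m) (x^2 - 1)^l
    x = np.asarray(x, dtype=float)
    return (1 - x**2) ** (m / 2) * P.polyval(x, deriv) / (2**l * math.factorial(l))


def count_interior_zeros(l, m, n=2000):
    """Count sign changes of P_l^m on a grid over (-1, 1), end points excluded."""
    x = np.linspace(-1.0, 1.0, n)[1:-1]
    s = np.sign(assoc_legendre(l, m, x))
    return int(np.count_nonzero(s[1:] != s[:-1]))


if __name__ == "__main__":
    x = np.array([-0.8, -0.45, 0.2, 0.6, 0.75])
    print(assoc_legendre(3, -2, x) / assoc_legendre(3, 2, x))  # a constant ratio
    for l in range(4):
        print(l, [count_interior_zeros(l, m) for m in range(-l, l + 1)])
```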
<p>Putting this all together, the eigenvalue equation is</p>
\[\hat{L}^{2}Y_{lm}(\theta,\phi)=\hbar^{2}l(l+1)Y_{lm}(\theta,\phi)\]
<p>with <em>spherical harmonics</em> \({Y_{lm}(\theta,\phi)}\) as the eigenfunctions which, after normalization, can be written</p>
\[Y_{lm}(\theta,\phi)=(-1)^{m}\sqrt{\frac{2l+1}{4\pi}\frac{(l-m)!}{(l+m)!}}P_{l}^{m}(\cos\theta)\exp(im\phi)\]
<p>where \({l=0,1,2,3,\ldots}\), \({m}\) is an integer with \({-l\leq m\leq l}\), and the eigenvalues are \({\hbar^{2}l(l+1)}\).</p>
<p>As is easily verified, these spherical harmonics are also eigenfunctions of the \({\hat{L}_{z}}\) operator. Explicitly, we have the eigenvalue equation</p>
\[\hat{L}_{z}Y_{lm}(\theta,\phi)=m\hbar Y_{lm}(\theta,\phi)\]
<p>with eigenvalues of \({\hat{L}_{z}}\) being \({m\hbar}\).</p>
<p>It makes no difference to the \({\hat{L}_{z}}\) eigenfunctions if we multiply them by a function of \({\theta}\), since \({\hat{L}_{z}=-i\hbar\partial/\partial\phi}\) acts only on \({\phi}\); the product is still an eigenfunction with the same eigenvalue.</p>
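<p>A short numerical check ties these results together. The sketch below is an illustration under stated assumptions (NumPy, a midpoint-rule quadrature, and finite differences with step sizes chosen for the example; all function names are hypothetical). It builds \({Y_{lm}}\) from the normalized formula above, with \({P_{l}^{m}}\) from Rodrigues’ formula, checks that the functions are orthonormal over the sphere, and checks that \({\nabla_{\theta,\phi}^{2}Y_{lm}=-l(l+1)Y_{lm}}\) and \({\hat{L}_{z}Y_{lm}=m\hbar Y_{lm}}\) at one point, with \({\hbar=1}\).</p>

```python
import math

import numpy as np
from numpy.polynomial import polynomial as P


def assoc_legendre(l, m, x):
    """P_l^m(x) from Rodrigues' formula (no (-1)^m phase; that factor is in Y below)."""
    deriv = P.polyder(P.polypow([-1.0, 0.0, 1.0], l), l + m)
    x = np.asarray(x, dtype=float)
    return (1 - x**2) ** (m / 2) * P.polyval(x, deriv) / (2**l * math.factorial(l))


def Y(l, m, theta, phi):
    """Normalized spherical harmonic Y_lm(theta, phi) as written in the text."""
    norm = math.sqrt((2 * l + 1) / (4 * math.pi) * math.factorial(l - m) / math.factorial(l + m))
    return (-1) ** m * norm * assoc_legendre(l, m, np.cos(theta)) * np.exp(1j * m * phi)


def overlap(l1, m1, l2, m2, n=400):
    """Midpoint-rule estimate of the integral of conj(Y_l1m1) * Y_l2m2 over the sphere."""
    h = math.pi / n
    theta = (np.arange(n) + 0.5) * h    # n points in theta on (0, pi)
    phi = (np.arange(2 * n) + 0.5) * h  # 2n points in phi on (0, 2 pi)
    T, F = np.meshgrid(theta, phi, indexing="ij")
    integrand = np.conj(Y(l1, m1, T, F)) * Y(l2, m2, T, F) * np.sin(T)
    return integrand.sum() * h * h


def angular_laplacian(f, theta, phi, h=1e-4):
    """Finite-difference nabla^2_{theta,phi} f at a point away from the poles."""
    f0 = f(theta, phi)
    d_th = (f(theta + h, phi) - f(theta - h, phi)) / (2 * h)
    d2_th = (f(theta + h, phi) - 2 * f0 + f(theta - h, phi)) / h**2
    d2_ph = (f(theta, phi + h) - 2 * f0 + f(theta, phi - h)) / h**2
    # (1/sin)(d/dtheta)(sin d/dtheta) = d2/dtheta2 + cot(theta) d/dtheta
    return d2_th + d_th * math.cos(theta) / math.sin(theta) + d2_ph / math.sin(theta) ** 2


def Lz_over_hbar(f, theta, phi, h=1e-5):
    """-i d/dphi by central difference (L_z in units of hbar)."""
    return -1j * (f(theta, phi + h) - f(theta, phi - h)) / (2 * h)


if __name__ == "__main__":
    th, ph = 1.1, 0.4
    for l in range(3):
        for m in range(-l, l + 1):
            f = lambda t, p, l=l, m=m: Y(l, m, t, p)
            print(l, m,
                  angular_laplacian(f, th, ph) / f(th, ph),  # expect -l(l+1)
                  Lz_over_hbar(f, th, ph) / f(th, ph),       # expect m
                  overlap(l, m, l, m))                       # expect 1
```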
<h3 id="22-visualizing-spherical-harmonics">2.2. Visualizing spherical harmonics</h3>
<p>The lowest solution, \({l=0}\), \({m=0}\), is called the “breathing” mode: the spherical shell expands and contracts periodically. For all other solutions there are one or more nodal circles on the sphere. A nodal circle is a circle on the sphere that stays fixed in that particular oscillating mode.</p>
<p>Note the following rules for the spherical shell modes:</p>
<ul>
<li>the surfaces on opposite sides of a nodal circle oscillate in opposite directions.</li>
<li>the total number of nodal circles is equal to \({l}\).</li>
<li>the number of nodal circles passing through the poles is \({\vert m\vert}\), and they divide the sphere equally in the azimuthal angle \({\phi}\).</li>
<li>the remaining nodal circles are either equatorial or parallel to the equator symmetrically distributed between top and bottom halves of the sphere.</li>
</ul>
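<p>The counting rules can be checked for the real part of \({Y_{lm}}\), which is proportional to \({P_{l}^{\vert m\vert}(\cos\theta)\cos(m\phi)}\); taking this real pattern as the picture of the oscillating shell is an assumption of the sketch. Nodal circles of constant \({\theta}\) come from the zeros of \({P_{l}^{\vert m\vert}(\cos\theta)}\) between the poles, and nodal circles through the poles come from the zeros of \({\cos(m\phi)}\), each such great circle crossing the range \({0\leq\phi<2\pi}\) twice. The NumPy sketch below counts sign changes on fine grids; the function names are illustrative.</p>

```python
import math

import numpy as np
from numpy.polynomial import polynomial as P


def latitude_nodes(l, m, n=4000):
    """Nodal circles of constant theta: sign changes of P_l^|m|(cos theta) for 0 < theta < pi."""
    theta = (np.arange(n) + 0.5) * math.pi / n
    x = np.cos(theta)
    deriv = P.polyder(P.polypow([-1.0, 0.0, 1.0], l), l + abs(m))  # Rodrigues' formula, up to a positive constant
    vals = (1 - x**2) ** (abs(m) / 2) * P.polyval(x, deriv)
    s = np.sign(vals)
    return int(np.count_nonzero(s[1:] != s[:-1]))


def polar_nodes(m, n=4000):
    """Nodal great circles through the poles: sign changes of cos(m phi) around a full turn, halved."""
    phi = (np.arange(n) + 0.5) * 2 * math.pi / n
    s = np.sign(np.cos(m * phi))
    changes = np.count_nonzero(s != np.roll(s, 1))  # np.roll also compares the last point with the first
    return int(changes) // 2


if __name__ == "__main__":
    for l in range(4):
        for m in range(-l, l + 1):
            print(l, m, latitude_nodes(l, m), polar_nodes(m))
```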
<h3 id="23-notations-for-spherical-harmonics">2.3. Notations for spherical harmonics</h3>
<p>We often use Dirac notation in writing equations associated with angular momentum. It is common to write</p>
\[\hat{L}^{2}\vert l,m\rangle=\hbar^{2}l(l+1)\vert l,m\rangle\]
<p>and</p>
\[\hat{L}_{z}\vert l,m\rangle=m\hbar\vert l,m\rangle\]
<p>The spherical harmonics arise in the solution of the hydrogen atom problem. Different values of \({l}\) give rise to different sets of spectral lines from hydrogen, identified empirically in the 19th century. Spectroscopists identified groups of lines called</p>
<ul>
<li>“sharp” (s)</li>
<li>“principal” (p)</li>
<li>“diffuse” (d) and</li>
<li>“fundamental” (f)</li>
</ul>
<p>Each of these is now identified with a specific value of \({l}\), and the notation is extended alphabetically to higher \({l}\) values: s: \({l=0}\), p: \({l=1}\), d: \({l=2}\), f: \({l=3}\), g: \({l=4}\), h: \({l=5}\) and so on. It is convenient that the “s” wavefunctions are all spherically symmetric, even though the “s” of the notation originally had nothing to do with spherical symmetry.</p>
<h2 id="3-the-hydrogen-atom">3. The Hydrogen Atom</h2>
<h3 id="31-multiple-particle-wavefunctions">3.1. Multiple particle wavefunctions</h3>
<p>We start by generalizing the Schroedinger equation, writing it generally for time-independent problems as</p>
\[\hat{H}\psi=E\psi\]
<p>where we now mean that the Hamiltonian \({\hat{H}}\) is the operator representing the energy of the entire system, and \({\psi}\) is the wavefunction representing the state of the entire system.</p>
<p>For a hydrogen atom, there are two particles: the electron and the proton. Each of these has a set of coordinates associated with it: \({x_{e}}\), \({y_{e}}\) and \({z_{e}}\) for the electron and \({x_{p}}\), \({y_{p}}\) and \({z_{p}}\) for the proton. The wavefunction will therefore in general be a function of all six of these coordinates.</p>
<h3 id="32-solving-the-hydrogen-atom-problem">3.2. Solving the hydrogen atom problem</h3>
<p>The electron and proton have masses \({m_{e}}\) and \({m_{p}}\), respectively. We expect kinetic energy operators associated with each of these masses, and a potential energy from the electrostatic attraction between the electron and the proton.</p>
<p>Hence, the Hamiltonian becomes</p>
\[\hat{H}=-\frac{\hbar^{2}}{2m_{e}}\nabla_{e}^{2}-\frac{\hbar^{2}}{2m_{p}}\nabla_{p}^{2}+V(\vert \mathbf{r}_{e}-\mathbf{r}_{p}\vert)\]
<p>where we mean \({\nabla_{e}^{2}\equiv\frac{\partial^{2}}{\partial x_{e}^{2}}+\frac{\partial^{2}}{\partial y_{e}^{2}}+\frac{\partial^{2}}{\partial z_{e}^{2}}}\) and similarly for \({\nabla_{p}^{2}}\) and \({\mathbf{r}_{e}=x_{e}\mathbf{i}+y_{e}\mathbf{j}+z_{e}\mathbf{k}}\) is the position vector of the electron coordinates and similarly for \({\mathbf{r}_{p}}\).</p>
<p>The Coulomb potential energy</p>
\[V(\vert\mathbf{r}_{e}-\mathbf{r}_{p}\vert)=-\frac{e^{2}}{4\pi\varepsilon_{0}\vert\mathbf{r}_{e}-\mathbf{r}_{p}\vert}\]
<p>depends only on the distance between the electron and the proton, which is important in simplifying the solution.</p>
<p>Because the potential is only a function of \({\vert \mathbf{r}_{e}-\mathbf{r}_{p}\vert}\), we can choose a new set of six coordinates in which three are the relative positions \({x=x_{e}-x_{p}}\), \({y=y_{e}-y_{p}}\), \({z=z_{e}-z_{p}}\), from which we obtain</p>
\[r=\sqrt{x^{2}+y^{2}+z^{2}}=\vert \mathbf{r}_{e}-\mathbf{r}_{p}\vert\]
<p>The other three coordinates describe the center of mass. The position \({\mathbf{R}}\) of the center of mass of two masses is the same as the balance point of a lightweight beam with the two masses at opposite ends, and so is the mass-weighted average of the positions of the two individual masses</p>
\[\mathbf{R}=\frac{m_{e}\mathbf{r}_{e}+m_{p}\mathbf{r}_{p}}{M}\]
<p>where \({M}\) is the total mass \({M=m_{e}+m_{p}}\).</p>
<p>Now we construct the differential operators we need in terms of these coordinates with</p>
\[\mathbf{R}=X\mathbf{i}+Y\mathbf{j}+Z\mathbf{k}\]
<p>then for the new coordinates in the \({x}\) direction, we have</p>
\[X=\frac{m_{e}x_{e}+m_{p}x_{p}}{M}\]
<p>and similarly for the \({y}\) and \({z}\) directions.</p>
<p>We use the standard method of changing partial derivatives to new coordinates, fully notating the variables held constant. The first derivatives in the \({x}\) direction become</p>
\[\left.\frac{\partial}{\partial x_{e}}\right\vert_{x_{p}}=\left.\frac{\partial X}{\partial x_{e}}\right\vert_{x_{p}}\left.\frac{\partial}{\partial X}\right\vert_{x}+\left.\frac{\partial x}{\partial x_{e}}\right\vert_{x_{p}}\left.\frac{\partial}{\partial x}\right\vert_{X}=\frac{m_{e}}{M}\left.\frac{\partial}{\partial X}\right\vert_{x}+\left.\frac{\partial}{\partial x}\right\vert_{X}\]
<p>and similarly, since \({\partial x/\partial x_{p}=-1}\),</p>
\[\left.\frac{\partial}{\partial x_{p}}\right\vert_{x_{e}}=\left.\frac{\partial X}{\partial x_{p}}\right\vert_{x_{e}}\left.\frac{\partial}{\partial X}\right\vert_{x}+\left.\frac{\partial x}{\partial x_{p}}\right\vert_{x_{e}}\left.\frac{\partial}{\partial x}\right\vert_{X}=\frac{m_{p}}{M}\left.\frac{\partial}{\partial X}\right\vert_{x}-\left.\frac{\partial}{\partial x}\right\vert_{X}\]
<p>The second derivatives become</p>
\[\left.\frac{\partial^{2}}{\partial x_{e}^{2}}\right\vert_{x_{p}}=\left(\frac{m_{e}}{M}\right)^{2}\left.\frac{\partial^{2}}{\partial X^{2}}\right\vert_{x}+\left.\frac{\partial^{2}}{\partial x^{2}}\right\vert_{X}+\frac{m_{e}}{M}\left(\left.\frac{\partial}{\partial x}\right\vert_{X}\left.\frac{\partial}{\partial X}\right\vert_{x}+\left.\frac{\partial}{\partial X}\right\vert_{x}\left.\frac{\partial}{\partial x}\right\vert_{X}\right)\]
<p>and similarly</p>
\[\left.\frac{\partial^{2}}{\partial x_{p}^{2}}\right\vert_{x_{e}}=\left(\frac{m_{p}}{M}\right)^{2}\left.\frac{\partial^{2}}{\partial X^{2}}\right\vert_{x}+\left.\frac{\partial^{2}}{\partial x^{2}}\right\vert_{X}-\frac{m_{p}}{M}\left(\left.\frac{\partial}{\partial x}\right\vert_{X}\left.\frac{\partial}{\partial X}\right\vert_{x}+\left.\frac{\partial}{\partial X}\right\vert_{x}\left.\frac{\partial}{\partial x}\right\vert_{X}\right)\]
<p>So dropping the explicit statement of variables held constant</p>
\[\frac{1}{m_{e}}\frac{\partial^{2}}{\partial x_{e}^{2}}+\frac{1}{m_{p}}\frac{\partial^{2}}{\partial x_{p}^{2}}=\frac{1}{M}\frac{\partial^{2}}{\partial X^{2}}+\frac{1}{\mu}\frac{\partial^{2}}{\partial x^{2}}\]
<p>where \({\mu}\) is the so-called reduced mass \({\mu=\frac{m_{e}m_{p}}{m_{e}+m_{p}}}\).</p>
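<p>A quick consistency check of this identity (not a replacement for the derivation) is to apply both sides to a plane wave \({\exp[i(k_{e}x_{e}+k_{p}x_{p})]}\). With \({K=k_{e}+k_{p}}\) and \({k=(m_{p}k_{e}-m_{e}k_{p})/M}\), the same wave can be written \({\exp[i(KX+kx)]}\), and each second derivative just brings down a factor of minus the square of the corresponding wavevector, so the identity reduces to \({k_{e}^{2}/m_{e}+k_{p}^{2}/m_{p}=K^{2}/M+k^{2}/\mu}\). The Python sketch below checks both statements for arbitrary numbers and evaluates \({\mu/m_{e}}\) for hydrogen; the CODATA 2018 mass values and the function names are inputs supplied for the example, not taken from the text.</p>

```python
import math
import random

M_E = 9.1093837015e-31   # electron mass in kg (CODATA 2018, supplied for this example)
M_P = 1.67262192369e-27  # proton mass in kg (CODATA 2018, supplied for this example)


def reduced_mass(m1, m2):
    """mu = m1 m2 / (m1 + m2)."""
    return m1 * m2 / (m1 + m2)


def plane_wave_check(me, mp, ke, kp, xe, xp):
    """Return (old phase, new phase, old kinetic factor, new kinetic factor)."""
    M = me + mp
    mu = reduced_mass(me, mp)
    X = (me * xe + mp * xp) / M  # centre-of-mass coordinate
    x = xe - xp                  # relative coordinate
    K = ke + kp                  # wavevector conjugate to X
    k = (mp * ke - me * kp) / M  # wavevector conjugate to x
    phase_old = ke * xe + kp * xp
    phase_new = K * X + k * x
    kinetic_old = ke**2 / me + kp**2 / mp  # from (1/m_e) d2/dx_e2 + (1/m_p) d2/dx_p2, up to a factor -1
    kinetic_new = K**2 / M + k**2 / mu     # from (1/M) d2/dX2 + (1/mu) d2/dx2, up to a factor -1
    return phase_old, phase_new, kinetic_old, kinetic_new


if __name__ == "__main__":
    rng = random.Random(1)
    me, mp = rng.uniform(0.5, 3.0), rng.uniform(0.5, 3.0)
    ke, kp, xe, xp = (rng.uniform(-2.0, 2.0) for _ in range(4))
    p_old, p_new, k_old, k_new = plane_wave_check(me, mp, ke, kp, xe, xp)
    print(math.isclose(p_old, p_new), math.isclose(k_old, k_new))
    print("mu / m_e for hydrogen:", reduced_mass(M_E, M_P) / M_E)
```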
<p>The same kind of relation can be written for each of the other Cartesian directions, so if we define</p>
\[\nabla_{\mathbf{R}}^{2}\equiv\frac{\partial^{2}}{\partial X^{2}}+\frac{\partial^{2}}{\partial Y^{2}}+\frac{\partial^{2}}{\partial Z^{2}}\quad \nabla_{\mathbf{r}}^{2}\equiv\frac{\partial^{2}}{\partial x^{2}}+\frac{\partial^{2}}{\partial y^{2}}+\frac{\partial^{2}}{\partial z^{2}}\]
<p>we can write the Hamiltonian in a new form with center of mass coordinates</p>
\[\hat{H}=-\frac{\hbar^{2}}{2M}\nabla_{\mathbf{R}}^{2}-\frac{\hbar^{2}}{2\mu}\nabla_{\mathbf{r}}^{2}+V(\mathbf{r})\]
<p>which now allows us to separate the problem.</p>
<p>We presume the wavefunction can be written</p>
\[\psi(\mathbf{R},\mathbf{r})=S(\mathbf{R})U(\mathbf{r})\]
<p>Substituting this form into the Schroedinger equation with the Hamiltonian above and dividing by \({S(\mathbf{R})U(\mathbf{r})}\), we obtain</p>
\[-\frac{1}{S(\mathbf{R})}\frac{\hbar^{2}}{2M}\nabla_{\mathbf{R}}^{2}S(\mathbf{R})=E-\frac{1}{U(\mathbf{r})}\left[-\frac{\hbar^{2}}{2\mu}\nabla_{\mathbf{r}}^{2}+V(\mathbf{r})\right]U(\mathbf{r})=E_{CoM}\]
<p>The left hand side depends only on \({\mathbf{R}}\) and the right hand side depends only on \({\mathbf{r}}\), so both must equal a “separation” constant which we call \({E_{CoM}}\).</p>
<p>Hence, we have two separated equations</p>
\[-\frac{\hbar^{2}}{2M}\nabla_{\mathbf{R}}^{2}S(\mathbf{R})=E_{CoM}S(\mathbf{R})\quad\text{Center of mass motion}\]
\[\left[-\frac{\hbar^{2}}{2\mu}\nabla_{\mathbf{r}}^{2}+V(\mathbf{r})\right]U(\mathbf{r})=E_{H}U(\mathbf{r})\quad\text{Relative motion}\]
<p>where \({E_{H}=E-E_{CoM}}\)</p>
<p>The <em>center of mass motion</em> equation is the Schroedinger equation for a free particle of mass \({M}\), with wavefunction solutions</p>
\[S(\mathbf{R})=\exp(i\mathbf{K}\cdot\mathbf{R})\]
<p>and eigenenergies</p>
\[E_{CoM}=\frac{\hbar^{2}K^{2}}{2M}\]
<p>This is the motion of the entire hydrogen atom as a particle of mass \({M}\).</p>
<p>The <em>relative motion</em> equation corresponds to the “internal” relative motion of the electron and proton and will give us the internal states of the hydrogen atom.</p>
<h3 id="33-informal-solution-for-the-relative-motion">3.3. Informal solution for the relative motion</h3>
<p>We presume that the hydrogen atom will have some characteristic size, which we call the Bohr radius \({a_{0}}\). We expect that the “average” potential energy (strictly, its expectation value) will therefore be</p>
\[\langle E_{potential}\rangle\approx-\frac{e^{2}}{4\pi\varepsilon_{0}a_{0}}\]
<p>For a reasonably smooth wavefunction \({\psi(\mathbf{r})}\) of size \({\sim a_{0}}\), the second spatial derivative will be roughly</p>
\[\sim-\psi(0)/a_{0}^{2}\]
<p>Remembering that, for a mass \({\mu}\), the kinetic energy operator is \({-(\hbar^{2}/2\mu)\nabla^{2}}\), the “average” kinetic energy will therefore be</p>
\[\langle E_{kinetic}\rangle\approx\frac{\hbar^{2}}{2\mu a_{0}^{2}}\]
<p>Now, in the spirit of a “variational” calculation, we adjust the parameter \({a_{0}}\) to get the lowest value of the total energy. Some variational approaches can be justified rigorously as approximations for the lowest energy.</p>
<p>With our very simple model, the total energy is</p>
\[\langle E_{total}\rangle\approx\frac{\hbar^{2}}{2\mu a_{0}^{2}}-\frac{e^{2}}{4\pi\varepsilon_{0}a_{0}}\]
<p>The total energy is a balance between the potential energy and the kinetic energy. For this simple model, differentiating with respect to \({a_{0}}\) shows that the choice of \({a_{0}}\) that minimizes the overall energy is</p>
\[a_{0}=\frac{4\pi\varepsilon_{0}\hbar^{2}}{e^{2}\mu}\approx 0.529\,\mathring{\mathrm{A}}=5.29\times 10^{-11}\,\mathrm{m}\]
<p>which is the standard definition of the Bohr radius. With this choice of \({a_{0}}\), the corresponding total energy of the state is</p>
\[E_{total}=-\frac{\mu}{2}\left(\frac{e^{2}}{4\pi\varepsilon_{0}\hbar}\right)^{2}\]
<p>It is usual to define a “Rydberg” energy unit from this result:</p>
\[Ry=-\langle E_{total}\rangle\approx 13.6\,\mathrm{eV}\]
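<p>The numbers quoted above can be reproduced directly from the formulas. The Python sketch below uses CODATA 2018 SI values for the constants (inputs supplied for this example, not taken from the text; the function names are illustrative), evaluates \({a_{0}}\) and the Rydberg energy with the reduced mass \({\mu}\), and confirms that \({\langle E_{total}\rangle}\) at \({a_{0}}\) equals \({-Ry}\) and rises when \({a_{0}}\) is moved slightly either way. With \({\mu}\) rather than \({m_{e}}\), the results come out very slightly different from the values usually tabulated for an infinitely heavy nucleus.</p>

```python
import math

# CODATA 2018 values in SI units (supplied for this example)
HBAR = 1.054571817e-34      # reduced Planck constant, J s
E_CHARGE = 1.602176634e-19  # elementary charge, C
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m
M_E = 9.1093837015e-31      # electron mass, kg
M_P = 1.67262192369e-27     # proton mass, kg

COULOMB = E_CHARGE**2 / (4 * math.pi * EPS0)  # e^2 / (4 pi eps0), in J m


def total_energy(a, mu):
    """Rough estimate <E_total>(a) = hbar^2 / (2 mu a^2) - e^2 / (4 pi eps0 a)."""
    return HBAR**2 / (2 * mu * a**2) - COULOMB / a


def bohr_radius(mu):
    """a0 = 4 pi eps0 hbar^2 / (e^2 mu), the minimizer of total_energy."""
    return HBAR**2 / (COULOMB * mu)


def rydberg(mu):
    """Ry = (mu / 2) (e^2 / (4 pi eps0 hbar))^2, in joules."""
    return mu / 2 * (COULOMB / HBAR) ** 2


if __name__ == "__main__":
    mu = M_E * M_P / (M_E + M_P)
    a0 = bohr_radius(mu)
    print(f"a0 = {a0:.4e} m")
    print(f"Ry = {rydberg(mu) / E_CHARGE:.3f} eV")
```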
Thu, 14 Nov 2013 00:00:00 +0000
https://dgyblog.com/oldtimes/2013/11/14/quantum-for-scieng-7/
https://dgyblog.com/oldtimes/2013/11/14/quantum-for-scieng-7/Quantum Mechanics for Scientists and Engineers Notes 6<h3 id="1-types-of-linear-operators">1. Types of Linear Operators</h3>
<h4 id="11-bilinear-expansion-of-operators">1.1. Bilinear expansion of operators</h4>
<p>We know that we can expand functions in a basis set as in \({f(x)=\sum_{n}c_{n}\psi_{n}(x)}\) or \({\vert f(x)\rangle=\sum_{n}c_{n}\vert \psi_{n}(x)\rangle}\). What is the equivalent expansion for an operator? We can deduce this from our matrix representation.</p>
<p>Consider an arbitrary function \({f}\), written as the ket \({\vert f\rangle}\) from which we can calculate a function \({g}\), written as the ket \({\vert g\rangle}\) by acting with a specific operator \({\hat{A}}\):</p>
\[\vert g\rangle=\hat{A}\vert f\rangle\]
<p>We expand \({g}\) and \({f}\) on the basis set \({\psi_{i}}\): \({\vert g\rangle=\sum_{i}d_{i}\vert \psi_{i}\rangle}\), \({\vert f\rangle=\sum_{j}c_{j}\vert \psi_{j}\rangle}\). From our matrix representation of \({\vert g\rangle=\hat{A}\vert f\rangle}\), we know that \({d_{i}=\sum_{j}A_{ij}c_{j}}\), and, by the definition of the expansion coefficient, we know that \({c_{j}=\langle\psi_{j}\vert f\rangle}\) so</p>
\[d_{i}=\sum_{j}A_{ij}\langle\psi_{j}\vert f\rangle\]
<p>Substituting this into \({\vert g\rangle=\sum_{i}d_{i}\vert \psi_{i}\rangle}\) gives</p>
\[\vert g\rangle=\sum_{i,j}A_{ij}\langle\psi_{j}\vert f\rangle\vert \psi_{i}\rangle\]
<p>Remember that \({\langle\psi_{j}\vert f\rangle\equiv c_{j}}\) is simply a number, so we can move it within the multiplication expression. Hence we have</p>
\[\vert g\rangle=\sum_{i,j}A_{ij}\vert \psi_{i}\rangle\langle\psi_{j}\vert f\rangle=\left[\sum_{i,j}A_{ij}\vert \psi_{i}\rangle\langle\psi_{j}\vert \right]\vert f\rangle\]
<p>But \({\vert g\rangle=\hat{A}\vert f\rangle}\) and \({\vert g\rangle}\) and \({\vert f\rangle}\) are arbitrary, so</p>
\[\hat{A}\equiv\sum_{i,j}A_{ij}\vert \psi_{i}\rangle\langle\psi_{j}\vert\]
<p>This form is referred to as a “bilinear expansion” of the operator \({\hat{A}}\) on the basis \({\vert \psi_{i}\rangle}\) and is analogous to the linear expansion of a vector on a basis. Any linear operator that operates within the space can be written this way.</p>
<p>Although the Dirac notation is more general and elegant, for functions of a single variable, where the operator acts through a kernel \({A(x,x_{1})}\) as</p>
\[g(x)=\int A(x,x_{1})f(x_{1})\,dx_{1}\]
<p>we can analogously write the bilinear expansion in the form</p>
\[A(x,x_{1})=\sum_{i,j}A_{ij}\psi_{i}(x)\psi_{j}^{*}(x_{1})\]
<p>The Dirac form of the expansion contains outer products of two vectors. An outer product expression of the form \({\vert g\rangle\langle f\vert }\) generates a matrix, so the bilinear expansion \({\hat{A}\equiv\sum_{i,j}A_{ij}\vert \psi_{i}\rangle\langle\psi_{j}\vert }\) is actually a sum of matrices. When the \({\vert \psi_{i}\rangle}\) are the basis used for the representation, the matrix \({\vert \psi_{i}\rangle\langle\psi_{j}\vert }\) has its element in the \({i}\)th row and \({j}\)th column equal to 1, and all other elements zero.</p>
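<p>In a finite-dimensional space, the bilinear expansion can be checked directly with matrices. The NumPy sketch below is illustrative (the dimension, the random operator and the helper names are assumptions): it takes a random orthonormal basis \({\vert \psi_{i}\rangle}\), computes \({A_{ij}=\langle\psi_{i}\vert \hat{A}\vert \psi_{j}\rangle}\), and rebuilds \({\hat{A}}\) as the sum of the matrices \({A_{ij}\vert \psi_{i}\rangle\langle\psi_{j}\vert}\).</p>

```python
import numpy as np


def random_orthonormal_basis(n, rng):
    """Columns of a random unitary matrix (QR of a complex Gaussian matrix) as a list of kets."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return [q[:, i] for i in range(n)]


def ket_bra(a, b):
    """Outer product |a><b| as a matrix."""
    return np.outer(a, b.conj())


def matrix_elements(A, basis):
    """A_ij = <psi_i|A|psi_j>."""
    return np.array([[np.vdot(bi, A @ bj) for bj in basis] for bi in basis])


def bilinear(A_elems, basis):
    """sum_ij A_ij |psi_i><psi_j|."""
    n = len(basis)
    return sum(A_elems[i, j] * ket_bra(basis[i], basis[j]) for i in range(n) for j in range(n))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    basis = random_orthonormal_basis(n, rng)
    print(np.allclose(bilinear(matrix_elements(A, basis), basis), A))
    e = np.eye(n)
    print(ket_bra(e[1], e[2]).real)  # a single 1 in row 1, column 2
```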
<h4 id="12-the-identity-operator">1.2. The identity operator</h4>
<p>The identity operator \({\hat{I}}\) is the operator that when it operates on a vector (function) leaves it unchanged. In matrix form, the identity operator is</p>
\[\left[\begin{array}{cccc}1 & 0 & 0 & \cdots \\ 0 & 1 & 0 & \cdots \\ 0 & 0 & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots\end{array}\right]\]
<p>In bra-ket form, the identity operator can be written as</p>
\[\hat{I}=\sum_{i}\vert \psi_{i}\rangle\langle\psi_{i}\vert\]
<p>where the \({\vert \psi_{i}\rangle}\) form a complete basis for the space. This statement is trivial if \({\vert \psi_{i}\rangle}\) is the basis used to represent the space.</p>
<p>Note, however, that \({\hat{I}=\sum_{i}\vert \psi_{i}\rangle\langle\psi_{i}\vert }\) holds even if the basis used to represent the space is not the set \({\vert \psi_{i}\rangle}\). Then a specific \({\vert \psi_{i}\rangle}\) is not a vector with an \({i}\)th element of 1 and all other elements 0, and the matrix \({\vert \psi_{i}\rangle\langle\psi_{i}\vert }\) in general has possibly all of its elements non-zero. Nonetheless, the sum of all the matrices \({\vert \psi_{i}\rangle\langle\psi_{i}\vert }\) still gives the identity matrix \({\hat{I}}\).</p>
<p>The expression above has a simple vector meaning. In the expression \({\vert f\rangle=\sum_{i}\vert \psi_{i}\rangle\langle\psi_{i}\vert f\rangle}\), \({\langle\psi_{i}\vert f\rangle}\) is just the projection of \({\vert f\rangle}\) onto the \({\vert \psi_{i}\rangle}\) axis, so multiplying \({\vert \psi_{i}\rangle}\) by \({\langle\psi_{i}\vert f\rangle}\) that is \({\langle\psi_{i}\vert f\rangle\vert \psi_{i}\rangle=\vert \psi_{i}\rangle\langle\psi_{i}\vert f\rangle}\) gives the vector component of \({\vert f\rangle}\) on the \({\vert \psi_{i}\rangle}\) axis.</p>
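<p>The completeness relation and the projection picture can both be seen numerically. In the NumPy sketch below (the dimension, the random basis and the function names are assumptions for illustration), each individual \({\vert \psi_{i}\rangle\langle\psi_{i}\vert}\) is a full matrix with many non-zero elements, yet the sum over the whole basis is the identity matrix, and the vector components \({\langle\psi_{i}\vert f\rangle\vert \psi_{i}\rangle}\) add back up to \({\vert f\rangle}\).</p>

```python
import numpy as np


def random_orthonormal_basis(n, seed=1):
    """A random complete orthonormal basis of C^n, as a list of kets."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return [q[:, i] for i in range(n)]


def resolution_of_identity(basis):
    """sum_i |psi_i><psi_i| over the given kets."""
    return sum(np.outer(psi, psi.conj()) for psi in basis)


def components(f, basis):
    """Vector components <psi_i|f> |psi_i> of |f> along each basis axis."""
    return [np.vdot(psi, f) * psi for psi in basis]


if __name__ == "__main__":
    basis = random_orthonormal_basis(5)
    print(np.round(np.outer(basis[0], basis[0].conj()), 2))  # one projector: generally a full matrix
    print(np.allclose(resolution_of_identity(basis), np.eye(5)))
```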
<p>Since the identity operator is the same no matter which complete orthonormal basis we use to represent it, we can use the following trick. First, we “insert” the identity operator, written in some basis, into an expression. Then we rearrange the expression. Finally, we find an identity operator (possibly in a different basis) that we can take out of the result.</p>
<p>Consider the sum \({S}\) of the diagonal elements of an operator \({\hat{A}}\) on some complete orthonormal basis \({\vert \psi_{i}\rangle}\)</p>
\[S=\sum_{i}\langle\psi_{i}\vert \hat{A}\vert \psi_{i}\rangle\]
<p>Now we suppose we have some other complete orthonormal basis \({\vert \phi_{m}\rangle}\). We can therefore also write the identity operator as</p>
\[\hat{I}=\sum_{m}\vert \phi_{m}\rangle\langle\phi_{m}\vert\]
<p>In \({S}\), we can insert an identity operator just before \({\hat{A}}\) which makes no difference to the result since \({\hat{I}\hat{A}=\hat{A}}\), so we have</p>
\[S=\sum_{i}\langle\psi_{i}\vert \hat{I}\hat{A}\vert \psi_{i}\rangle=\sum_{i}\langle\psi_{i}\vert \left(\sum_{m}\vert \phi_{m}\rangle\langle\phi_{m}\vert \right)\hat{A}\vert \psi_{i}\rangle\]
<p>We can then rearrange this expression in the following sequence:</p>
\[\begin{array}{rcl} S&=&\sum_{m}\sum_{i}\langle\psi_{i}\vert \phi_{m}\rangle\langle\phi_{m}\vert \hat{A}\vert \psi_{i}\rangle \\ &=&\sum_{m}\sum_{i}\langle\phi_{m}\vert \hat{A}\vert \psi_{i}\rangle\langle\psi_{i}\vert \phi_{m}\rangle \\ &=&\sum_{m}\langle\phi_{m}\vert \hat{A}\left(\sum_{i}\vert \psi_{i}\rangle\langle\psi_{i}\vert \right)\vert \phi_{m}\rangle \\ &=&\sum_{m}\langle\phi_{m}\vert \hat{A}\vert \phi_{m}\rangle \end{array}\]
<p>Hence the trace of an operator (the sum of the diagonal elements) is independent of the basis used to represent the operator.</p>
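<p>The basis independence of the trace is easy to confirm numerically. The NumPy sketch below (random matrices and bases chosen only for illustration; the helper names are hypothetical) evaluates \({S=\sum_{i}\langle\psi_{i}\vert \hat{A}\vert \psi_{i}\rangle}\) in two different random orthonormal bases and in the standard basis, and compares each with <code>numpy.trace</code>.</p>

```python
import numpy as np


def random_orthonormal_basis(n, rng):
    """A random complete orthonormal basis of C^n, as a list of kets."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return [q[:, i] for i in range(n)]


def trace_in_basis(A, basis):
    """S = sum_i <psi_i|A|psi_i>."""
    return sum(np.vdot(psi, A @ psi) for psi in basis)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 6
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    for basis in (random_orthonormal_basis(n, rng), random_orthonormal_basis(n, rng), list(np.eye(n))):
        print(trace_in_basis(A, basis), np.trace(A))
```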
<h4 id="13-inverse-and-unitary-operators">1.3. Inverse and unitary operators</h4>
<p>For an operator \({\hat{A}}\) acting on an arbitrary function \({\vert f\rangle}\), the inverse operator, if it exists, is the operator \({\hat{A}^{-1}}\) such that</p>
\[\vert f\rangle=\hat{A}^{-1}\hat{A}\vert f\rangle\]
<p>Since the function \({\vert f\rangle}\) is arbitrary, we can therefore identify</p>
\[\hat{A}^{-1}\hat{A}=\hat{I}\]
<p>Since the operator can be represented by a matrix, finding the inverse of the operator reduces to finding the inverse of a matrix.</p>
<p>A unitary operator \({\hat{U}}\) is one for which</p>
\[\hat{U}^{-1}=\hat{U}^{\dag}\]
<p>that is, its inverse is its Hermitian adjoint.</p>
<p>Note first that it can be shown generally that, for two matrices \({\hat{A}}\) and \({\hat{B}}\) that can be multiplied,</p>
\[(\hat{A}\hat{B})^{\dag}=\hat{B}^{\dag}\hat{A}^{\dag}\]
<p>That is, the Hermitian adjoint of a product is the “flipped round” product of the Hermitian adjoints. Explicitly, for matrix-vector multiplication,</p>
\[(\hat{A}\vert h\rangle)^{\dag}=\langle h\vert \hat{A}^{\dag}\]
<p>Consider the unitary operator \({\hat{U}}\) and vectors \({\vert f_{old}\rangle}\) and \({\vert g_{old}\rangle}\). We form two new vectors by operating with \({\hat{U}}\)</p>
\[\vert f_{new}\rangle=\hat{U}\vert f_{old}\rangle\qquad \vert g_{new}\rangle=\hat{U}\vert g_{old}\rangle\]
<p>Then \({\langle g_{new}\vert =\langle g_{old}\vert \hat{U}^{\dag}}\).</p>
<p>So,</p>
\[\langle g_{new}\vert f_{new}\rangle=\langle g_{old}\vert \hat{U}^{\dag}\hat{U}\vert f_{old}\rangle=\langle g_{old}\vert f_{old}\rangle\]
<p>Hence, the unitary operation does not change the inner product. So, in particular \({\langle f_{new}\vert f_{new}\rangle=\langle f_{old}\vert f_{old}\rangle}\), the length of a vector is not changed by a unitary operator.</p>
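<p>These properties can be illustrated with a random unitary matrix. The NumPy sketch below (the construction of \({\hat{U}}\) from a QR decomposition and the helper names are assumptions for the example) checks that \({\hat{U}^{\dag}\hat{U}=\hat{I}}\), that \({(\hat{A}\hat{B})^{\dag}=\hat{B}^{\dag}\hat{A}^{\dag}}\), and that inner products and vector lengths are unchanged by \({\hat{U}}\).</p>

```python
import numpy as np


def random_unitary(n, rng):
    """Random unitary matrix: the Q factor of the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q


def dag(M):
    """Hermitian adjoint (conjugate transpose)."""
    return M.conj().T


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n = 4
    U = random_unitary(n, rng)
    f_old = rng.normal(size=n) + 1j * rng.normal(size=n)
    g_old = rng.normal(size=n) + 1j * rng.normal(size=n)
    f_new, g_new = U @ f_old, U @ g_old
    print(np.vdot(g_new, f_new), np.vdot(g_old, f_old))  # <g_new|f_new> vs <g_old|f_old>
    print(np.linalg.norm(f_new), np.linalg.norm(f_old))
```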
<h3 id="2-unitary-and-hermitian-operators">2. Unitary and Hermitian Operators</h3>
<h4 id="21-using-unitary-operators">2.1. Using unitary operators</h4>
<p>Suppose that we have a vector (function) \({\vert f_{old}\rangle}\) that is represented, when expanded on the functions \({\vert \psi_{n}\rangle}\), by the mathematical column vector</p>
\[\vert f_{old}\rangle=\left[\begin{array}{c}c_{1} \\ c_{2} \\ c_{3} \\ \vdots\end{array}\right]\]
<p>These numbers \({c_{1}}\), \({c_{2}}\), \({c_{3}}\), … are the projections of \({\vert f_{old}\rangle}\) on the orthogonal coordinate axes in the vector space labeled \({\vert \psi_{1}\rangle}\), \({\vert \psi_{2}\rangle}\), \({\vert \psi_{3}\rangle}\), …</p>
<p>Suppose we want to represent this vector on a new set of orthogonal axes, which we will label \({\vert \phi_{1}\rangle}\), \({\vert \phi_{2}\rangle}\), \({\vert \phi_{3}\rangle}\), … Changing the axes, which is equivalent to changing the basis set of functions, does not change the vector we are representing, but it does change the column of numbers used to represent it.</p>
<p>Working out the transformation for each basis vector \({\vert \psi_{n}\rangle}\), we find that we get the correct transformation if we define the matrix</p>
\[\hat{U}=\left[\begin{array}{cccc}u_{11} & u_{12} & u_{13} & \cdots \\ u_{21} & u_{22} & u_{23} & \cdots \\ u_{31} & u_{32} & u_{33} & \cdots \\ \vdots & \vdots & \vdots & \ddots\end{array}\right]\]
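<p>where \({u_{ij}=\langle\phi_{i}\vert \psi_{j}\rangle}\). Inserting the identity operator \({\sum_{j}\vert \psi_{j}\rangle\langle\psi_{j}\vert}\) shows that the coefficients on the new basis are \({\langle\phi_{i}\vert f\rangle=\sum_{j}u_{ij}c_{j}}\), so we define our new column of numbers \({\vert f_{new}\rangle}\) as</p>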
\[\vert f_{new}\rangle=\hat{U}\vert f_{old}\rangle\]
<p>Now we can prove \({\hat{U}}\) is unitary. Writing the matrix multiplication in its sum form</p>
\[\begin{array}{rcl} (\hat{U}^{\dag}\hat{U})_{ij}&=&\sum_{m}u_{mi}^{*}u_{mj}=\sum_{m}\langle\phi_{m}\vert \psi_{i}\rangle^{*}\langle\phi_{m}\vert \psi_{j}\rangle \\ &=&\langle\psi_{i}\vert \left(\sum_{m}\vert \phi_{m}\rangle\langle\phi_{m}\vert \right)\vert \psi_{j}\rangle \\ &=&\langle\psi_{i}\vert \psi_{j}\rangle=\delta_{ij} \end{array}\]
<p>So, \({\hat{U}^{\dag}\hat{U}=\hat{I}}\), hence, \({\hat{U}}\) is unitary.</p>
<p>Consider a number such as \({\langle g\vert \hat{A}\vert f\rangle}\) where vector \({\vert g\rangle}\), \({\vert f\rangle}\) and operator \({\hat{A}}\) are arbitrary. This result should not depend on the coordinate system. So the result in an “old” coordinate system should be the same in a “new” coordinate system, that is, we should have \({\langle g_{new}\vert \hat{A}_new\vert f_{new}\rangle=\langle g_{old}\vert \hat{A}\vert f_{old}\rangle}\). Note the subscripts “new” and “old” refer to representations not the vectors (or operators) themselves which are not changed by change of representation. Only the numbers that represent them are changed. With unitary \({\hat{U}}\) operator to go from “old” to “new” systems, we can write</p>
\[\begin{array}{rcl} \langle g_{new}\vert \hat{A}_{new}\vert f_{new}\rangle&=&(\vert g_{new}\rangle^{\dag})\hat{A}_{new}\vert f_{new}\rangle \\ &=&(\hat{U}\vert g_{old}\rangle)^{\dag}\hat{A}_{new}(\hat{U}\vert f_{old}\rangle) \\ &=&\langle g_{old}\vert U^{\dag}\hat{A}_{new}\hat{U}\vert f_{old}\rangle=\langle g_{old}\vert \hat{A}_{old}\vert f_{old}\rangle \end{array}\]
<p>since \({\hat{A}_{old}=\hat{U}^{\dag}\hat{A}_{new}\hat{U}}\) or \({\hat{U}\hat{A}_{old}\hat{U}^{\dag}=(\hat{U}\hat{U}^{\dag})\hat{A}_{new}(\hat{U}\hat{U}^{\dag})=\hat{A}_{new}}\).</p>
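<p>A minimal numerical sketch of this invariance, again assuming NumPy: the unitary \({\hat{U}}\), the operator and the vectors are hypothetical random data in the “old” representation. It forms \({\vert f_{new}\rangle=\hat{U}\vert f_{old}\rangle}\) and \({\hat{A}_{new}=\hat{U}\hat{A}_{old}\hat{U}^{\dag}}\) and compares the two matrix elements.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4

def random_complex(shape, rng):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

# Hypothetical unitary U (from a QR factorization) and arbitrary old-representation data
U, _ = np.linalg.qr(random_complex((N, N), rng))
A_old = random_complex((N, N), rng)   # arbitrary operator, not necessarily Hermitian
f_old = random_complex(N, rng)
g_old = random_complex(N, rng)

# Change of representation
f_new = U @ f_old
g_new = U @ g_old
A_new = U @ A_old @ U.conj().T

old_value = g_old.conj() @ A_old @ f_old  # <g_old|A_old|f_old>
new_value = g_new.conj() @ A_new @ f_new  # <g_new|A_new|f_new>
print(np.isclose(old_value, new_value))   # True
```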
<p>If the quantum state \({\vert \psi\rangle}\) is expanded on the basis \({\vert \psi_{n}\rangle}\) to give</p>
\[\vert \psi\rangle=\sum_{n}a_{n}\vert \psi_{n}\rangle\]
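<p>then, for a normalized state, \({\sum_{n}\vert a_{n}\vert ^{2}=1}\). If the particle is to be conserved, this sum must be retained as the quantum-mechanical system evolves in time. A unitary operator conserves the length of a vector, since \({\langle\psi\vert \hat{U}^{\dag}\hat{U}\vert \psi\rangle=\langle\psi\vert \psi\rangle}\); hence unitary operators describe changes that conserve the particle.</p>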
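<p>Length conservation can be illustrated the same way. In this sketch (NumPy assumed) the coefficients \({a_{n}}\) and the unitary matrix standing in for one step of time evolution are hypothetical random data; the sum of the \({\vert a_{n}\vert ^{2}}\) stays at 1.</p>

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6

# Hypothetical normalized expansion coefficients a_n
a = rng.normal(size=N) + 1j * rng.normal(size=N)
a /= np.linalg.norm(a)            # enforce sum |a_n|^2 = 1

# Hypothetical unitary evolution step, from a QR factorization
U, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
a_later = U @ a

# Both sums equal 1 up to rounding error
print(np.sum(np.abs(a) ** 2), np.sum(np.abs(a_later) ** 2))
```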
<p>####2.2. Hermitian operators</p>
<p>A Hermitian operator is equal to its own Hermitian adjoint</p>
\[\hat{M}^{\dag}=\hat{M}\]
<p>Such an operator is also called self-adjoint. In matrix terms, Hermiticity means \({M_{ij}=M_{ji}^{*}}\) for all \({i}\) and \({j}\). In particular, setting \({i=j}\) gives \({M_{ii}=M_{ii}^{*}}\), so the diagonal elements of a Hermitian operator must be real.</p>
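<p>As an illustration of the matrix condition, here is a small hypothetical helper, <code>is_hermitian</code>, assuming NumPy; it compares a matrix with its conjugate transpose within a tolerance. The example matrix is made up for illustration.</p>

```python
import numpy as np

def is_hermitian(M, tol=1e-12):
    """True if M is square and equals its conjugate transpose (M_ij == conj(M_ji)) within tol."""
    M = np.asarray(M)
    return M.ndim == 2 and M.shape[0] == M.shape[1] and np.allclose(M, M.conj().T, atol=tol)

M = np.array([[2.0, 1 - 1j],
              [1 + 1j, -3.0]])
print(is_hermitian(M))          # True
print(np.diag(M).imag)          # all zeros: the diagonal elements are real
```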
<p>To understand Hermiticity in the most general sense, consider \({\langle g\vert \hat{M}\vert f\rangle}\) for arbitrary \({\vert f\rangle}\) and \({\vert g\rangle}\) and some operator \({\hat{M}}\).</p>
<p>We examine</p>
\[(\langle g\vert \hat{M}\vert f\rangle)^{\dag}\]
<p>Since this is just a number, it is also true that</p>
\[(\langle g\vert \hat{M}\vert f\rangle)^{\dag}=(\langle g\vert \hat{M}\vert f\rangle)^{*}\]
<p>We can analyze \({(\langle g\vert \hat{M}\vert f\rangle)^{\dag}}\) using the rule \({(\hat{A}\hat{B})^{\dag}=\hat{B}^{\dag}\hat{A}^{\dag}}\) for Hermitian adjoints of products. So</p>
\[(\langle g\vert \hat{M}\vert f\rangle)^{\dag}=\left[(\langle g\vert )(\hat{M}\vert f\rangle)\right]^{\dag}=(\hat{M}\vert f\rangle)^{\dag}(\langle g\vert )^{\dag}=\langle f\vert \hat{M}^{\dag}\vert g\rangle\]
<p>Hence, if \({\hat{M}}\) is Hermitian, with therefore \({\hat{M}^{\dag}=\hat{M}}\), then</p>
\[(\langle g\vert \hat{M}\vert f\rangle)^{*}=\langle f\vert \hat{M}^{\dag}\vert g\rangle=\langle f\vert \hat{M}\vert g\rangle\]
<p>even if \({\vert f\rangle}\) and \({\vert g\rangle}\) are not orthogonal.</p>
<p>In integral form, for functions \({f(x)}\) and \({g(x)}\), the statement above can be written</p>
\[\int g^{*}(x)\hat{M}f(x)\,dx=\left[\int f^{*}(x)\hat{M}g(x)\,dx\right]^{*}\]
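<p>Taking the complex conjugate inside the integral on the right-hand side gives \({\left[\int f^{*}(x)\hat{M}g(x)\,dx\right]^{*}=\int f(x)\{\hat{M}g(x)\}^{*}\,dx}\), and a simple rearrangement leads to</p>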
\[\int g^{*}(x)\hat{M}f(x)\,dx=\int\{\hat{M}g(x)\}^{*}f(x)\,dx\]
<p>which is a common statement of Hermiticity in integral form.</p>
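<p>The bra-ket form of this statement, \({(\langle g\vert \hat{M}\vert f\rangle)^{*}=\langle f\vert \hat{M}\vert g\rangle}\) for arbitrary, non-orthogonal \({\vert f\rangle}\) and \({\vert g\rangle}\), can be checked with random vectors. In this NumPy sketch the Hermitian matrix is manufactured as \({\hat{X}+\hat{X}^{\dag}}\) from a hypothetical random \({\hat{X}}\); the last line also checks the general rule \({(\langle g\vert \hat{X}\vert f\rangle)^{*}=\langle f\vert \hat{X}^{\dag}\vert g\rangle}\) for the non-Hermitian \({\hat{X}}\).</p>

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5
X = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))  # hypothetical, not Hermitian
M = X + X.conj().T                # adding the adjoint makes M Hermitian by construction

# Arbitrary vectors; random vectors are (almost surely) not orthogonal
f = rng.normal(size=N) + 1j * rng.normal(size=N)
g = rng.normal(size=N) + 1j * rng.normal(size=N)

lhs = np.conj(g.conj() @ M @ f)   # (<g|M|f>)^*
rhs = f.conj() @ M @ g            # <f|M|g>
print(np.isclose(lhs, rhs))       # True

# General rule for any operator: (<g|X|f>)^* = <f|X^dagger|g>
print(np.isclose(np.conj(g.conj() @ X @ f), f.conj() @ X.conj().T @ g))  # True
```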
<p>Suppose \({\vert \psi_{n}\rangle}\) is a normalized eigenvector of the Hermitian operator \({\hat{M}}\) with eigenvalue \({\mu_{n}}\). Then by definition</p>
\[\hat{M}\vert \psi_{n}\rangle=\mu_{n}\vert \psi_{n}\rangle\]
<p>Therefore</p>
\[\langle\psi_{n}\vert \hat{M}\vert \psi_{n}\rangle=\mu_{n}\langle\psi_{n}\vert \psi_{n}\rangle=\mu_{n}\]
<p>But from the Hermiticity of \({\hat{M}}\) we know</p>
\[\langle\psi_{n}\vert \hat{M}\vert \psi_{n}\rangle=(\langle\psi_{n}\vert \hat{M}\vert \psi_{n}\rangle)^{*}=\mu_{n}^{*}\]
<p>And hence \({\mu_{n}}\) must be real.</p>
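<p>Numerically, a general-purpose eigenvalue solver applied to a Hermitian matrix returns eigenvalues whose imaginary parts are only rounding error. The sketch assumes NumPy; <code>np.linalg.eigvals</code> is used deliberately because, unlike <code>np.linalg.eigvalsh</code>, it does not assume the matrix is Hermitian. The matrix itself is hypothetical random data.</p>

```python
import numpy as np

rng = np.random.default_rng(4)
N = 6
X = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
M = X + X.conj().T                     # Hermitian by construction

# General (non-Hermitian) solver: it does not force the eigenvalues to be real,
# so any imaginary part it reports is rounding error.
mu = np.linalg.eigvals(M)
print(np.max(np.abs(mu.imag)))          # tiny, at rounding level
```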
<p>Now let us prove that eigenfunctions belonging to different eigenvalues are orthogonal:</p>
\[\begin{array}{rcl} 0&=&\langle\psi_{m}\vert \hat{M}\vert \psi_{n}\rangle-\langle\psi_{m}\vert \hat{M}\vert \psi_{n}\rangle \\ 0&=&(\langle\psi_{m}\vert \hat{M})\vert \psi_{n}\rangle-\langle\psi_{m}\vert (\hat{M}\vert \psi_{n}\rangle) \\ 0&=&(\hat{M}^{\dag}\vert \psi_{m}\rangle)^{\dag}\vert \psi_{n}\rangle-\langle\psi_{m}\vert (\hat{M}\vert \psi_{n}\rangle) \\ 0&=&(\hat{M}\vert \psi_{m}\rangle)^{\dag}\vert \psi_{n}\rangle-\langle\psi_{m}\vert (\hat{M}\vert \psi_{n}\rangle) \\ 0&=&(\mu_{m}\vert \psi_{m}\rangle)^{\dag}\vert \psi_{n}\rangle-\langle\psi_{m}\vert (\mu_{n}\vert \psi_{n}\rangle) \\ 0&=&\mu_{m}^{*}\langle\psi_{m}\vert \psi_{n}\rangle-\mu_{n}\langle\psi_{m}\vert \psi_{n}\rangle \\ 0&=&(\mu_{m}-\mu_{n})\langle\psi_{m}\vert \psi_{n}\rangle \end{array}\]
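<p>Here we used the fact that the eigenvalue \({\mu_{m}}\) is real, so \({(\mu_{m}\vert \psi_{m}\rangle)^{\dag}=\mu_{m}^{*}\langle\psi_{m}\vert =\mu_{m}\langle\psi_{m}\vert }\). Since \({\mu_{m}}\) and \({\mu_{n}}\) are different, we must have \({\langle\psi_{m}\vert \psi_{n}\rangle=0}\), i.e., the eigenfunctions are orthogonal, presuming we are working with non-zero functions.</p>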
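<p>Similarly, the eigenvectors returned by the general solver <code>np.linalg.eig</code> for a hypothetical random Hermitian matrix (whose eigenvalues are, almost surely, all distinct) turn out mutually orthogonal, even though the solver does not enforce this. A sketch assuming NumPy:</p>

```python
import numpy as np

rng = np.random.default_rng(5)
N = 6
X = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
M = X + X.conj().T                     # Hermitian; random, so eigenvalues are distinct

mu, V = np.linalg.eig(M)               # general solver: no orthogonality is imposed
V = V / np.linalg.norm(V, axis=0)      # make sure each eigenvector (column) has unit length

overlaps = V.conj().T @ V              # matrix of <psi_m|psi_n>
print(np.allclose(overlaps, np.eye(N)))  # True: distinct eigenvalues give orthogonal eigenvectors
```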
<p>It is quite possible and common in symmetric problems to have more than one eigenfunction associated with a given eigenvalue. This situation is known as degeneracy. It is provable that the number of such degenerate solutions for a given finite eigenvalue is itself finite.</p>
<p>####2.3. Matrix form of derivative operators</p>
<p>Returning to our original discussion of functions as vectors, we can postulate a form for the differential operator</p>
\[\frac{d}{dx}\equiv\left[\begin{array}{cccccc} & \ddots & & & & \\ \cdots & -\frac{1}{2\delta x} & 0 & \frac{1}{2\delta x} & 0 & \cdots \\ \cdots & 0 & -\frac{1}{2\delta x} & 0 & \frac{1}{2\delta x} & \cdots \\ & & & & \ddots & \end{array}\right]\]
<p>where we presume we can take the limits as \({\delta x\rightarrow 0}\).</p>
<p>If we multiply this matrix by the column vector whose elements are the values of the function, we obtain</p>
\[\left[\begin{array}{cccccc} & \ddots & & & & \\ \cdots & -\frac{1}{2\delta x} & 0 & \frac{1}{2\delta x} & 0 & \cdots \\ \cdots & 0 & -\frac{1}{2\delta x} & 0 & \frac{1}{2\delta x} & \cdots \\ & & & & \ddots & \end{array}\right]\left[\begin{array}{c}\vdots \\ f(x_{i}-\delta x) \\ f(x_{i})\\ f(x_{i}+\delta x) \\f(x_{i}+2\delta x) \\ \vdots\end{array}\right]=\left[\begin{array}{c}\vdots \\ \frac{f(x_{i}+\delta x)-f(x_{i}-\delta x)}{2\delta x} \\\frac{f(x_{i}+2\delta x)-f(x_{i})}{2\delta x}\\ \vdots\end{array}\right]=\left[\begin{array}{c}\vdots \\ \left.\frac{df}{dx}\right\vert _{x_{i}} \\ \left.\frac{df}{dx}\right\vert _{x_{i}+\delta x} \\ \vdots\end{array}\right]\]
<p>Note that this matrix is antisymmetric under reflection about the diagonal; since it is real, it is therefore not Hermitian. By similar arguments, however, \({d^{2}/dx^{2}}\) gives a real symmetric matrix and is Hermitian.</p>
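<p>The finite-difference picture can be tried directly. The sketch below assumes NumPy and builds the first-derivative matrix above together with the analogous second-derivative matrix \({[f(x+\delta x)-2f(x)+f(x-\delta x)]/\delta x^{2}}\); the helper name, the grid, and the choice to truncate the two end rows (treating the function as zero just outside the grid) are assumptions of this sketch. It applies both matrices to samples of \({\sin x}\) and then checks which one equals its own conjugate transpose.</p>

```python
import numpy as np

def difference_matrices(n, dx):
    """Return (D1, D2): central-difference first- and second-derivative matrices.

    The end rows are truncated, i.e. the function is implicitly taken to be
    zero just outside the grid (an assumption of this sketch).
    """
    off = np.ones(n - 1)
    # (f(x+dx) - f(x-dx)) / (2 dx): +1/(2dx) above the diagonal, -1/(2dx) below
    D1 = (np.diag(off, k=1) - np.diag(off, k=-1)) / (2 * dx)
    # (f(x+dx) - 2 f(x) + f(x-dx)) / dx^2
    D2 = (np.diag(off, k=1) - 2 * np.eye(n) + np.diag(off, k=-1)) / dx**2
    return D1, D2

x = np.linspace(0.0, 2 * np.pi, 401)
dx = x[1] - x[0]
D1, D2 = difference_matrices(len(x), dx)

# Away from the truncated end rows: D1 sin ~ cos and D2 sin ~ -sin
err1 = np.max(np.abs((D1 @ np.sin(x))[1:-1] - np.cos(x[1:-1])))
err2 = np.max(np.abs((D2 @ np.sin(x))[1:-1] + np.sin(x[1:-1])))
print(err1, err2)                     # both small, of order dx^2

print(np.allclose(D1, D1.conj().T))   # False: D1 is antisymmetric (D1.T == -D1)
print(np.allclose(D2, D2.conj().T))   # True: D2 is real and symmetric, hence Hermitian
```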
<p>We can formally “operate” on the function \({f(x)}\) by multiplying it by the function \({V(x)}\) to generate another function</p>
\[g(x)=V(x)f(x)\]
<p>Since \({V(x)}\) is performing the role of an operator, we can, if we wish, represent it as a diagonal matrix whose diagonal elements are the values of the function at each of the different points.</p>
<p>If \({V(x)}\) is real, then its matrix is Hermitian, as required for the Hamiltonian \({\hat{H}}\).</p>
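<p>A short sketch of the diagonal-matrix representation, assuming NumPy; the grid, the real potential \({V(x)=x^{2}}\) and the complex variant are hypothetical examples. Multiplying by the diagonal matrix reproduces pointwise multiplication, and only the real potential gives a Hermitian matrix.</p>

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 7)
V_real = np.diag(x**2)                  # hypothetical real potential V(x) = x^2 on the grid
V_complex = np.diag(x**2 + 0.1j * x)    # hypothetical potential with an imaginary part

g = V_real @ np.exp(-x)                 # same as pointwise multiplication V(x) f(x)
print(np.allclose(g, x**2 * np.exp(-x)))           # True
print(np.allclose(V_real, V_real.conj().T))        # True: real V gives a Hermitian matrix
print(np.allclose(V_complex, V_complex.conj().T))  # False
```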
<p>###3. Operators and Quantum Mechanics</p>
<p>####3.1. Hermitian operators in quantum mechanics</p>
<p>For Hermitian operators \({\hat{A}}\) and \({\hat{B}}\) representing physical variables, it is very important to know whether they commute, i.e., whether \({\hat{A}\hat{B}=\hat{B}\hat{A}}\). Remember that because these linear operators obey the same algebra as matrices, in general operators do not commute.</p>
<p>For quantum mechanics, we formally define an entity</p>
\[[\hat{A},\hat{B}]=\hat{A}\hat{B}-\hat{B}\hat{A}\]
<p>This entity is called the commutator.</p>
<p>An equivalent statement to saying \({\hat{A}\hat{B}=\hat{B}\hat{A}}\) is then \({[\hat{A},\hat{B}]=0}\). Strictly, this should be written</p>
\[[\hat{A},\hat{B}]=0\hat{I}\]
<p>where \({\hat{I}}\) is the identity operator, although it is usually omitted.</p>
<p>If the operators do not commute, then \({[\hat{A},\hat{B}]=0}\) does not hold, and in general we can choose to write</p>
\[[\hat{A},\hat{B}]=i\hat{C}\]
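<p>where \({\hat{C}}\) is sometimes referred to as the remainder of commutation or the commutation rest. When \({\hat{A}}\) and \({\hat{B}}\) are Hermitian, \({\hat{C}}\) is Hermitian too: taking the adjoint of \({[\hat{A},\hat{B}]=i\hat{C}}\) gives \({\hat{B}\hat{A}-\hat{A}\hat{B}=-i\hat{C}^{\dag}}\), while the left-hand side is just \({-[\hat{A},\hat{B}]=-i\hat{C}}\), so \({\hat{C}^{\dag}=\hat{C}}\).</p>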
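<p>To make the definition concrete, the sketch below computes commutators of the Pauli spin matrices, a standard example of Hermitian operators that do not commute, assuming NumPy; the helper <code>commutator</code> is a hypothetical name. It recovers \({\hat{C}}\) from \({[\hat{A},\hat{B}]=i\hat{C}}\) and checks that it is Hermitian.</p>

```python
import numpy as np

def commutator(A, B):
    """[A, B] = AB - BA."""
    return A @ B - B @ A

# Pauli matrices: Hermitian, and a standard example of non-commuting operators
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

C = commutator(sx, sy) / 1j                # from [A, B] = iC
print(np.allclose(C, 2 * sz))              # True: [sx, sy] = 2i sz
print(np.allclose(C, C.conj().T))          # True: C is Hermitian
print(np.allclose(commutator(sz, sz), 0))  # True: an operator commutes with itself
```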
<p>Operators that commute share the same set of eigenfunctions, and operators that share the same set of eigenfunctions commute.</p>
<p>Suppose that the operators \({\hat{A}}\) and \({\hat{B}}\) commute, and that \({\vert \psi_{i}\rangle}\) are the eigenfunctions of \({\hat{A}}\) with eigenvalues \({A_{i}}\). Then</p>
\[\hat{A}\hat{B}\vert \psi_{i}\rangle=\hat{B}\hat{A}\vert \psi_{i}\rangle=\hat{B}A_{i}\vert \psi_{i}\rangle=A_{i}\hat{B}\vert \psi_{i}\rangle\]
<p>So,</p>
\[\hat{A}[\hat{B}\vert \psi_{i}\rangle]=A_{i}[\hat{B}\vert \psi_{i}\rangle]\]
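<p>This means that the vector \({\hat{B}\vert \psi_{i}\rangle}\) is also an eigenvector of \({\hat{A}}\) with eigenvalue \({A_{i}}\). If this eigenvalue is non-degenerate, \({\hat{B}\vert \psi_{i}\rangle}\) must be proportional to \({\vert \psi_{i}\rangle}\), i.e., for some number \({B_{i}}\)</p>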
\[\hat{B}\vert \psi_{i}\rangle=B_{i}\vert \psi_{i}\rangle\]
<p>This kind of relation holds for all the eigenfunctions \({\vert \psi_{i}\rangle}\), so these eigenfunctions are also eigenfunctions of the operator \({\hat{B}}\), with associated eigenvalues \({B_{i}}\). Hence we have proved the first statement, that operators that commute share the same set of eigenfunctions. Note that the eigenvalues \({A_{i}}\) and \({B_{i}}\) are not in general equal to one another.</p>
<p>Now we consider the statement that operators that share the same set of eigenfunctions commute. Suppose that the Hermitian operators \({\hat{A}}\) and \({\hat{B}}\) share the same complete set \({\vert \psi_{i}\rangle}\) of eigenfunctions, with associated eigenvalues \({A_{i}}\) and \({B_{i}}\) respectively. Then</p>
\[\hat{A}\hat{B}\vert \psi_{i}\rangle=\hat{A}B_{i}\vert \psi_{i}\rangle=A_{i}B_{i}\vert \psi_{i}\rangle\]
<p>and similarly</p>
\[\hat{B}\hat{A}\vert \psi_{i}\rangle=\hat{B}A_{i}\vert \psi_{i}\rangle=B_{i}A_{i}\vert \psi_{i}\rangle\]
<p>Hence, for any function \({\vert f\rangle}\), which can always be expanded in this complete set of functions \({\vert \psi_{i}\rangle}\), i.e., \({\vert f\rangle=\sum_{i}c_{i}\vert \psi_{i}\rangle}\), we have</p>
\[\hat{A}\hat{B}\vert f\rangle=\sum_{i}c_{i}A_{i}B_{i}\vert \psi_{i}\rangle=\sum_{i}c_{i}B_{i}A_{i}\vert \psi_{i}\rangle=\hat{B}\hat{A}\vert f\rangle\]
<p>Since this holds for an arbitrary function, the operators commute, which proves the statement that operators that share the same set of eigenfunctions commute.</p>
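<p>Both directions of the argument can be illustrated with matrices. In this NumPy sketch, two hypothetical Hermitian matrices are built from the same orthonormal eigenvectors (the columns of a random unitary \({Q}\)) with different eigenvalues; they commute, and because the eigenvalues of \({\hat{A}}\) are non-degenerate, the eigenvectors found for \({\hat{A}}\) alone also diagonalize \({\hat{B}}\).</p>

```python
import numpy as np

rng = np.random.default_rng(6)
N = 4

# Shared orthonormal eigenvectors (columns of Q) and hypothetical real eigenvalues
Q, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
a = np.array([1.0, 2.0, 3.0, 4.0])      # non-degenerate eigenvalues A_i (ascending)
b = np.array([-1.0, 0.5, 0.5, 7.0])     # eigenvalues B_i, not equal to the A_i

A = Q @ np.diag(a) @ Q.conj().T
B = Q @ np.diag(b) @ Q.conj().T
print(np.allclose(A @ B, B @ A))        # True: shared eigenvectors, so they commute

# Conversely: eigenvectors computed from A alone (eigh sorts eigenvalues ascending)
_, W = np.linalg.eigh(A)
B_in_A_basis = W.conj().T @ B @ W
print(np.allclose(B_in_A_basis, np.diag(np.diag(B_in_A_basis))))  # True: B is diagonal there
```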
<p>####3.2. General form of the uncertainty principle</p>
<p>First, we need to set up the concepts of the mean and the variance of a measured quantity. We use \({\bar{A}}\) to denote the mean value of the quantity \({A}\) associated with the Hermitian operator \({\hat{A}}\). In bra-ket notation, when the system is in the (normalized) state \({\vert f\rangle}\), this mean is the expectation value</p>
\[\bar{A}\equiv\langle A\rangle=\langle f\vert \hat{A}\vert f\rangle\]
<p>Let us define a new operator \({\Delta\hat{A}}\) associated with the difference between the measured value of \({A}\) and its average value</p>
\[\Delta\hat{A}=\hat{A}-\bar{A}\]
<p>Strictly, we should write \({\Delta\hat{A}=\hat{A}-\bar{A}\hat{I}}\), but we take such an identity operator to be understood. Note that this operator is also Hermitian, since \({\bar{A}}\) is real.</p>
<p>Variance in statistics is the “mean square” deviation from the average. To examine the variance of the quantity \({A}\), we examine the expectation value of the operator \({(\Delta\hat{A})^{2}}\). We expand the arbitrary function \({\vert f\rangle}\) on the basis of the eigenfunctions \({\vert \psi_{i}\rangle}\) of \({\hat{A}}\):</p>
\[\vert f\rangle=\sum_{i}c_{i}\vert \psi_{i}\rangle\]
<p>We can formally evaluate the expectation value of \({(\Delta\hat{A})^{2}}\) when the system is in the state \({\vert f\rangle}\).</p>
\[\begin{array}{rcl} \langle(\Delta\hat{A})^{2}\rangle&=&\langle f\vert (\Delta\hat{A})^{2}\vert f\rangle=\left(\sum_{i}c_{i}^{*}\langle\psi_{i}\vert \right)(\hat{A}-\bar{A})^{2}\left(\sum_{j}c_{j}\vert \psi_{j}\rangle\right) \\ &=&\left(\sum_{i}c_{i}^{*}\langle\psi_{i}\vert \right)(\hat{A}-\bar{A})\left(\sum_{j}c_{j}(A_{j}-\bar{A})\vert \psi_{j}\rangle\right) \\ &=&\left(\sum_{i}c_{i}^{*}\langle\psi_{i}\vert \right)\left(\sum_{j}c_{j}(A_{j}-\bar{A})^{2}\vert \psi_{j}\rangle\right)=\sum_{i}\vert c_{i}\vert ^{2}(A_{i}-\bar{A})^{2} \end{array}\]
<p>The \({\vert c_{i}\vert ^{2}}\) are the probabilities that the system is found, on measurement, to be in the state \({\vert \psi_{i}\rangle}\), and \({(A_{i}-\bar{A})^{2}}\) is the squared deviation of the value of \({A}\) in that state from its average value. So, by definition,</p>
\[\overline{(\Delta A)^{2}}\equiv\langle(\Delta\hat{A})^{2}\rangle=\langle(\hat{A}-\bar{A})^{2}\rangle=\langle f\vert (\hat{A}-\bar{A})^{2}\vert f\rangle\]
<p>is the mean squared deviation for the quantity \({A}\) on repeatedly measuring the system prepared in state \({\vert f\rangle}\).</p>
<p>In statistical language, the quantity \({\overline{(\Delta A)^{2}}}\) is called the variance, and the square root of the variance which we can write as \({\Delta A=\sqrt{\overline{(\Delta A)^{2}}}}\) is the standard deviation. In statistics, the standard deviation gives a well-defined measure of the width of a distribution.</p>
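<p>The two expressions for the variance can be compared numerically. In the sketch below (NumPy assumed) the eigenvalues \({A_{i}}\), the eigenvectors and the normalized state \({\vert f\rangle}\) are hypothetical; it evaluates \({\langle f\vert (\Delta\hat{A})^{2}\vert f\rangle}\) as an operator expectation value and \({\sum_{i}\vert c_{i}\vert ^{2}(A_{i}-\bar{A})^{2}}\) as a probability-weighted average, and takes the square root to get \({\Delta A}\).</p>

```python
import numpy as np

rng = np.random.default_rng(7)
N = 5

# Hypothetical Hermitian A built from eigenvalues A_i and orthonormal eigenvectors |psi_i>
Q, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
A_vals = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
A = Q @ np.diag(A_vals) @ Q.conj().T

# Hypothetical normalized state |f> = sum_i c_i |psi_i>
c = rng.normal(size=N) + 1j * rng.normal(size=N)
c /= np.linalg.norm(c)
f = Q @ c

A_bar = np.real(f.conj() @ A @ f)                # <f|A|f>, real because A is Hermitian
dA = A - A_bar * np.eye(N)                       # Delta A = A - A_bar * I
var_operator = np.real(f.conj() @ dA @ dA @ f)   # <f|(Delta A)^2|f>

p = np.abs(c) ** 2                               # probabilities |c_i|^2
var_stats = np.sum(p * (A_vals - A_bar) ** 2)    # sum_i |c_i|^2 (A_i - A_bar)^2

print(np.isclose(var_operator, var_stats), np.sqrt(var_stats))  # True, then Delta A
```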
<p>We can also consider some other quantity \({B}\) associated with the Hermitian operator \({\hat{B}}\) and, with similar definitions</p>
\[\overline{(\Delta B)^{2}}\equiv\langle(\Delta\hat{B})^{2}\rangle=\langle f\vert (\hat{B}-\bar{B})^{2}\vert f\rangle\]
<p>So we have ways of calculating the uncertainty in the measurements of the quantities \({A}\) and \({B}\) when the system is in the state \({\vert f\rangle}\) to use in a general proof of the uncertainty principle.</p>
<p>Suppose two Hermitian operators \({\hat{A}}\) and \({\hat{B}}\) do not commute and have commutation rest \({\hat{C}}\) as defined above in \({[\hat{A},\hat{B}]=i\hat{C}}\). Consider, for some arbitrary real number \({\alpha}\), the number</p>
\[G(\alpha)=\langle(\alpha\Delta\hat{A}-i\Delta\hat{B})f\vert (\alpha\Delta\hat{A}-i\Delta\hat{B})f\rangle\geq 0\]
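<p>By \({\vert (\alpha\Delta\hat{A}-i\Delta\hat{B})f\rangle}\) we mean the vector \({(\alpha\Delta\hat{A}-i\Delta\hat{B})\vert f\rangle}\), written this way to emphasize that it is simply a vector, so its inner product with itself must be greater than or equal to zero. Using \({(\Delta\hat{A})^{\dag}=\Delta\hat{A}}\), \({(\Delta\hat{B})^{\dag}=\Delta\hat{B}}\) and \({[\Delta\hat{A},\Delta\hat{B}]=[\hat{A},\hat{B}]=i\hat{C}}\) (the numbers \({\bar{A}}\) and \({\bar{B}}\) commute with everything), we have</p>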
\[\begin{array}{rcl} G(\alpha)&=&\langle f\vert (\alpha\Delta\hat{A}-i\Delta\hat{B})^{\dag}(\alpha\Delta\hat{A}-i\Delta\hat{B})\vert f\rangle(\geq 0) \\ &=&\langle f\vert (\alpha\Delta\hat{A}^{\dag}+i\Delta\hat{B}^{\dag})(\alpha\Delta\hat{A}-i\Delta\hat{B})\vert f\rangle \\ &=&\langle f\vert (\alpha\Delta\hat{A}+i\Delta\hat{B})(\alpha\Delta\hat{A}-i\Delta\hat{B})\vert f\rangle \\ &=&\langle f\vert \alpha^{2}(\Delta\hat{A})^{2}+(\Delta\hat{B})^{2}-i\alpha(\Delta\hat{A}\Delta\hat{B}-\Delta\hat{B}\Delta\hat{A})\vert f\rangle \\ &=&\langle f\vert \alpha^{2}(\Delta\hat{A})^{2}+(\Delta\hat{B})^{2}-i\alpha[\Delta\hat{A},\Delta\hat{B}]\vert f\rangle \\ &=&\langle f\vert \alpha^{2}(\Delta\hat{A})^{2}+(\Delta\hat{B})^{2}+\alpha\hat{C}\vert f\rangle \\ &=&\alpha^{2}\overline{(\Delta A)^{2}}+\overline{(\Delta B)^{2}}+\alpha\bar{C} (\geq 0) \end{array}\]
<p>where \({\bar{C}\equiv\langle C\rangle=\langle f\vert \hat{C}\vert f\rangle}\).</p>
<p>Completing the square in \({\alpha}\) (a simple, though not obvious, rearrangement) gives</p>
\[G(\alpha)=\overline{(\Delta A)^{2}}\left[\alpha+\frac{\bar{C}}{2\overline{(\Delta A)^{2}}}\right]^{2}+\overline{(\Delta B)^{2}}-\frac{(\bar{C})^{2}}{4\overline{(\Delta A)^{2}}}\geq 0\]
<p>But the inequality above must hold for arbitrary \({\alpha}\), so in particular it holds for</p>
\[\alpha=-\frac{\bar{C}}{2\overline{(\Delta A)^{2}}}\]
<p>which sets the first term equal to zero, so</p>
\[\overline{(\Delta A)^{2}}\,\overline{(\Delta B)^{2}}\geq\frac{(\bar{C})^{2}}{4}\]
<p>So, for two operators \({\hat{A}}\) and \({\hat{B}}\) corresponding to measurable quantities \({A}\) and \({B}\), with \({[\hat{A},\hat{B}]=i\hat{C}}\), and for a state \({\vert f\rangle}\) with \({\bar{C}\equiv\langle C\rangle=\langle f\vert \hat{C}\vert f\rangle}\), we have the uncertainty principle</p>
\[\Delta A\Delta B\geq\frac{\vert \bar{C}\vert }{2}\]
<p>where \({\Delta A}\) and \({\Delta B}\) are the standard deviations of the values of \({A}\) and \({B}\) we would measure.</p>
<p>This result holds in general. Only if the operators \({\hat{A}}\) and \({\hat{B}}\) commute, or if they do not commute but we are in a state \({\vert f\rangle}\) for which \({\langle f\vert \hat{C}\vert f\rangle=0}\), is it possible for both \({A}\) and \({B}\) simultaneously to have exactly defined measured values.</p>
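<p>The inequality can be spot-checked numerically for finite matrices. This sketch assumes NumPy; the two random Hermitian matrices, the 1000 random normalized states and the helper names are hypothetical. For each state it computes \({\Delta A\,\Delta B-\vert \bar{C}\vert /2}\) with \({\hat{C}=-i[\hat{A},\hat{B}]}\) and records the smallest value, which should never be negative beyond rounding error.</p>

```python
import numpy as np

rng = np.random.default_rng(8)
N = 4

def random_hermitian(n, rng):
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return X + X.conj().T

def std_dev(Op, f):
    """Standard deviation sqrt(<f|(Op - <Op>)^2|f>) for a normalized state f."""
    mean = np.real(f.conj() @ Op @ f)
    d = Op - mean * np.eye(len(f))
    return np.sqrt(np.real(f.conj() @ d @ d @ f))

A = random_hermitian(N, rng)
B = random_hermitian(N, rng)
C = (A @ B - B @ A) / 1j          # [A, B] = iC, so C = -i[A, B] (Hermitian)

worst_gap = np.inf
for _ in range(1000):             # many hypothetical random normalized states
    f = rng.normal(size=N) + 1j * rng.normal(size=N)
    f /= np.linalg.norm(f)
    C_bar = np.real(f.conj() @ C @ f)
    gap = std_dev(A, f) * std_dev(B, f) - abs(C_bar) / 2
    worst_gap = min(worst_gap, gap)

print(worst_gap >= -1e-12)        # True: Delta A * Delta B >= |C_bar| / 2 in every trial
```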
<p>####3.3. Specific uncertainty principles</p>
<p>We now formally derive the position-momentum relation. Consider the commutator of \({\hat{p}_{x}}\) and \({x}\) (we treat the function \({x}\) as the operator for position). To be sure we are taking derivatives correctly, we have the commutator operate on an arbitrary function:</p>
\[\begin{array}{rcl} [\hat{p}_{x},x]\vert f\rangle&=&-i\hbar\left(\frac{d}{dx}x-x\frac{d}{dx}\right)\vert f\rangle=-i\hbar\left\{\frac{d}{dx}(x\vert f\rangle)-x\frac{d}{dx}\vert f\rangle\right\} \\ &=&-i\hbar\left\{\vert f\rangle+x\frac{d}{dx}\vert f\rangle-x\frac{d}{dx}\vert f\rangle\right\}=-i\hbar\vert f\rangle \end{array}\]
<p>Since \({\vert f\rangle}\) is arbitrary, we can write \({[\hat{p}_{x},x]=-i\hbar}\) and the commutation rest operator \({\hat{C}}\) is simply the number \({\hat{C}=-\hbar}\). And so, from \({\Delta A\Delta B\geq\vert \bar{C}\vert /2}\), we have</p>
\[\Delta p_{x}\Delta x\geq\frac{\hbar}{2}\]
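<p>As a numerical illustration (a sketch, not part of the derivation), the code below samples a Gaussian wavepacket on a grid in units with \({\hbar=1}\), applies \({\hat{p}_{x}=-i\hbar\,d/dx}\) through NumPy's central-difference <code>np.gradient</code>, and evaluates \({\Delta x\,\Delta p_{x}}\). The width, mean momentum and grid are hypothetical; a Gaussian is the case that attains the bound, so the product comes out close to \({\hbar/2}\).</p>

```python
import numpy as np

hbar = 1.0                       # units with hbar = 1 (assumption of this sketch)
sigma = 0.7                      # hypothetical width: |psi|^2 has standard deviation sigma
k0 = 2.0                         # hypothetical mean wavevector, so <p_x> = hbar * k0
x = np.linspace(-12.0, 12.0, 4001)
h = x[1] - x[0]                  # grid spacing

psi = np.exp(-x**2 / (4 * sigma**2) + 1j * k0 * x)
psi /= np.sqrt(np.sum(np.abs(psi)**2) * h)      # normalize on the grid

def expect(op_psi):
    """<psi|O|psi> as a Riemann sum, given the array O|psi> on the grid."""
    return np.real(np.sum(psi.conj() * op_psi) * h)

x_mean = expect(x * psi)
delta_x = np.sqrt(expect((x - x_mean)**2 * psi))

p_psi = -1j * hbar * np.gradient(psi, h)        # p_x |psi> = -i hbar d(psi)/dx
p2_psi = -1j * hbar * np.gradient(p_psi, h)     # p_x^2 |psi>
p_mean = expect(p_psi)
delta_p = np.sqrt(expect(p2_psi) - p_mean**2)

print(delta_x * delta_p)         # close to hbar / 2 = 0.5
```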
<p>The energy operator is the Hamiltonian \({\hat{H}}\) and from Schroedinger’s equation</p>
\[\hat{H}\vert \psi\rangle=i\hbar\frac{\partial}{\partial t}\vert\psi\rangle\]
<p>so we use \({\hat{H}=i\hbar\partial/\partial t}\).</p>
<p>If we take the time operator to be just \({t}\), then essentially the same algebra as for the position-momentum uncertainty principle gives</p>
\[[\hat{H},t]=i\hbar\left(\frac{\partial}{\partial t}t-t\frac{\partial}{\partial t}\right)=i\hbar\]
<p>so, similarly we have</p>
\[\Delta E\Delta t\geq\frac{\hbar}{2}\]
<p>We can relate this result mathematically to the frequency-time uncertainty principle that occurs in Fourier analysis. Noting that \({E=\hbar\omega}\) in quantum mechanics, so that \({\Delta E=\hbar\Delta\omega}\), we have</p>
\[\Delta\omega\Delta t\geq\frac{1}{2}\]
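<p>The Fourier-analysis version can be checked the same way. Assuming NumPy, the sketch samples a hypothetical Gaussian pulse \({f(t)=e^{-t^{2}/4\sigma_{t}^{2}}}\), takes its FFT, and computes the standard deviations of \({\vert f(t)\vert ^{2}}\) and \({\vert F(\omega)\vert ^{2}}\) treated as distributions; for a Gaussian the product is \({1/2}\), the minimum allowed.</p>

```python
import numpy as np

sigma_t = 1.3                                    # hypothetical pulse width parameter
t = np.linspace(-40.0, 40.0, 2**14, endpoint=False)
dt = t[1] - t[0]
f = np.exp(-t**2 / (4 * sigma_t**2))             # |f|^2 has standard deviation sigma_t

F = np.fft.fftshift(np.fft.fft(f))
omega = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(t.size, d=dt))  # angular frequency grid

def std_of(axis, amplitude):
    """Standard deviation of the distribution proportional to |amplitude|^2 along axis."""
    w = np.abs(amplitude) ** 2
    w = w / w.sum()
    mean = np.sum(w * axis)
    return np.sqrt(np.sum(w * (axis - mean) ** 2))

print(std_of(t, f) * std_of(omega, F))           # close to 1/2
```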
Fri, 01 Nov 2013 00:00:00 +0000
https://dgyblog.com/oldtimes/2013/11/01/quantum-for-scieng-6/