<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-4782099858070252235</id><updated>2011-09-14T08:27:56.999-07:00</updated><category term='simulation'/><category term='computer science'/><category term='visualization'/><category term='math'/><category term='statistics'/><category term='R MatLab'/><category term='bioinformatics'/><category term='Latex R'/><title type='text'>Tieming Ji  /  Stat+Plot</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://blog.tieming.org/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://blog.tieming.org/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>14</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-4782099858070252235.post-7898983171533627284</id><published>2011-03-15T10:21:00.000-07:00</published><updated>2011-03-15T10:30:37.423-07:00</updated><title type='text'>A little visualization practice of pharmaceutical companies world wide</title><content type='html'>I do not know if you have 
had questions similar to mine when you pick up your medicine at the pharmacy. I have questions such as "Where is my medicine developed?", "What alternatives do I have from a different pharmaceutical company?", "Is the USA better than other countries at developing medicine?", and "How do people in other countries deal with their illnesses?"&lt;br /&gt;&lt;br /&gt;I downloaded data from &lt;a href="http://en.wikipedia.org/wiki/List_of_pharmaceutical_companies"&gt;Wikipedia&lt;/a&gt;. The following visualization practice does not answer all of these questions in depth; we clearly need more data and more sophisticated statistical analysis for that. Still, visualizing even this small amount of data gives us a first look at the questions.&lt;br /&gt;&lt;br /&gt;I extracted the data for the 50 largest companies and sorted them by country. The first figure below is a histogram showing how many large pharmaceutical companies each country has. The USA has more than 20 companies in the top 50 list, followed by Japan with 10. In the second figure, I sum the revenue of the companies in each country (in millions of US dollars). Clearly, US-based companies (in sum) have more total revenue than the companies of any other country. Of course, currency differences complicate the comparison; for example, 1 US dollar was about 6.5 RMB at the time of writing. It might be fairer to compare the US only with European countries; even then, it looks like the US has more companies, more people, and more investment in health care research and development. 
&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-cKQSdnp1KNs/TX-aLvLGYMI/AAAAAAAAA_8/ytrNYAtq3Ss/s1600/fig1.gif" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="356" width="400" src="http://2.bp.blogspot.com/-cKQSdnp1KNs/TX-aLvLGYMI/AAAAAAAAA_8/ytrNYAtq3Ss/s400/fig1.gif" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-ZuBTe4Z_Gzo/TX-aSIbSVDI/AAAAAAAABAE/NBnuVyR5g_Y/s1600/fig2.gif" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="356" width="400" src="http://4.bp.blogspot.com/-ZuBTe4Z_Gzo/TX-aSIbSVDI/AAAAAAAABAE/NBnuVyR5g_Y/s400/fig2.gif" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4782099858070252235-7898983171533627284?l=blog.tieming.org' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.tieming.org/feeds/7898983171533627284/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4782099858070252235&amp;postID=7898983171533627284' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/7898983171533627284'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/7898983171533627284'/><link rel='alternate' type='text/html' href='http://blog.tieming.org/2011/03/little-visualization-practice-of.html' title='A little visualization practice of pharmaceutical companies world wide'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-cKQSdnp1KNs/TX-aLvLGYMI/AAAAAAAAA_8/ytrNYAtq3Ss/s72-c/fig1.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4782099858070252235.post-3548801733196770048</id><published>2011-01-16T12:10:00.000-08:00</published><updated>2011-01-16T12:11:31.051-08:00</updated><title type='text'>What People Complain about New York</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_qhXsEbrebC8/TTNQfd3n0RI/AAAAAAAAA_s/TQkC9oCtE6o/s1600/311call.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="300" width="450" src="http://2.bp.blogspot.com/_qhXsEbrebC8/TTNQfd3n0RI/AAAAAAAAA_s/TQkC9oCtE6o/s400/311call.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Beautiful &lt;a href="http://www.wired.com/magazine/2010/11/ff_311_new_york/all/1"&gt;visualization&lt;/a&gt; tells the story of New York.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4782099858070252235-3548801733196770048?l=blog.tieming.org' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.tieming.org/feeds/3548801733196770048/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4782099858070252235&amp;postID=3548801733196770048' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/3548801733196770048'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/3548801733196770048'/><link rel='alternate' type='text/html' 
href='http://blog.tieming.org/2011/01/what-people-complain-about-new-york.html' title='What People Complain about New York'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_qhXsEbrebC8/TTNQfd3n0RI/AAAAAAAAA_s/TQkC9oCtE6o/s72-c/311call.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4782099858070252235.post-8085905481873866198</id><published>2011-01-16T10:05:00.000-08:00</published><updated>2011-01-16T12:04:23.530-08:00</updated><title type='text'>Netflix Rental Patterns</title><content type='html'>&lt;center&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_qhXsEbrebC8/TTM8B-mzd2I/AAAAAAAAA_I/Sg-9Ns-JXq4/s1600/sf_netflix.png"&gt;&lt;img style="float:center; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 500px; height: 340px;" src="http://4.bp.blogspot.com/_qhXsEbrebC8/TTM8B-mzd2I/AAAAAAAAA_I/Sg-9Ns-JXq4/s400/sf_netflix.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5562855969415984994" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/center&gt;&lt;br /&gt;&lt;br /&gt;By Matthew Bloch, Amanda Cox, Jo Craven McGinty and Kevin Quealy/&lt;a href="http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html?ref=nyregion"&gt;The New York Times&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4782099858070252235-8085905481873866198?l=blog.tieming.org' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.tieming.org/feeds/8085905481873866198/comments/default' title='Post Comments'/><link 
rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4782099858070252235&amp;postID=8085905481873866198' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/8085905481873866198'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/8085905481873866198'/><link rel='alternate' type='text/html' href='http://blog.tieming.org/2011/01/netflix-rental-patterns.html' title='Netflix Rental Patterns'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_qhXsEbrebC8/TTM8B-mzd2I/AAAAAAAAA_I/Sg-9Ns-JXq4/s72-c/sf_netflix.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4782099858070252235.post-1513442714900643160</id><published>2010-12-17T09:03:00.000-08:00</published><updated>2011-01-16T11:05:39.181-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='visualization'/><title type='text'>FireFox Visualization Competition</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_qhXsEbrebC8/TQuYzWmACWI/AAAAAAAAA-E/b_1oW8Zc7fo/s1600/Tieming_Rplot1.jpg"&gt;&lt;img style="float:center; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 450px; height: 450px;" src="http://3.bp.blogspot.com/_qhXsEbrebC8/TQuYzWmACWI/AAAAAAAAA-E/b_1oW8Zc7fo/s400/Tieming_Rplot1.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5551698973669984610" /&gt;&lt;/a&gt; &lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://1.bp.blogspot.com/_qhXsEbrebC8/TQuY8aCG5jI/AAAAAAAAA-M/g0OTfrdbNqE/s1600/Tieming_Rplot2.jpg"&gt;&lt;img style="float:center; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 450px; height: 450px;" src="http://1.bp.blogspot.com/_qhXsEbrebC8/TQuY8aCG5jI/AAAAAAAAA-M/g0OTfrdbNqE/s400/Tieming_Rplot2.jpg" border="0" alt=""id="BLOGGER_PHOTO_ID_5551699129212003890" /&gt;&lt;/a&gt; &lt;br /&gt;&lt;/a&gt; &lt;br /&gt;The official competition website is &lt;a href="http://design-challenge.mozillalabs.com/open-data/OpenDataCompetition.php"&gt;here&lt;/a&gt;. In this competition, I analyzed user types and their web activity patterns. &lt;br /&gt;&lt;br /&gt;I categorize users into 3 major groups based on their main reason for using the web. The users who do coding work have the highest computer skill. The users who do not code but do other work or school work have intermediate computer skill. And the users who use the web only for entertainment (networking, socializing, etc.) have the lowest computer skill. I call these three groups coding type, non-coding-school type, and entertainment-only type users, respectively. &lt;br /&gt;&lt;br /&gt;In the first exploration, we examine how much time the different types of users spend on the web. The user types are derived from the survey table entry giving each user's main reason for using the web. After removing empty entries and free-text entries, we get a total of 3,788 complete user records with unique IDs, each with gender, age, time spent on the web, self-evaluated computer skill, main reason for using the web, and most frequently visited websites. The following analysis is based on these data. &lt;br /&gt;&lt;br /&gt;According to the definition of the three user types above, there are 1,360 coding type users, 1,605 non-coding-school type users, and 823 entertainment-only type users.  
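&lt;br /&gt;&lt;br /&gt;As a quick sanity check on these counts, here is a minimal Python sketch. It is not part of the original analysis, and the variable names are only illustrative. It checks that the three groups add up to the 3,788 complete records and computes each group's share of the total.&lt;br /&gt;

```python
# Group sizes as reported in the post for the three user types.
group_sizes = {
    "coding": 1360,
    "non-coding-school": 1605,
    "entertainment-only": 823,
}

# The three groups should add up to the 3,788 complete survey records.
total_users = sum(group_sizes.values())

# Share of each user type, as a percentage rounded to one decimal place.
group_shares = {
    name: round(100 * count / total_users, 1)
    for name, count in group_sizes.items()
}

for name, share in group_shares.items():
    print(f"{name}: {group_sizes[name]} users ({share}% of {total_users})")
```

&lt;br /&gt;Because the groups differ in size, the comparisons across user types are best read as percentages within each group rather than as raw counts. 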
&lt;br /&gt;&lt;br /&gt;Based on the survey table entry for the time users spend on the web each day, we plot the approximate distribution of time on the web for each user type. The data are displayed in Figure 1. The figure shows that entertainment-only users spend 2 to 6 hours online each day more often than the other two user types, while a larger percentage of coding users spend more than 6 hours online daily. The majority of non-coding-school type users spend around 2 to 8 hours.   &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Following the first exploration, we want to know how the different types of users spend their time online, what they do when they are online, and whether there is any difference among them.  &lt;br /&gt;&lt;br /&gt;We use the survey table entry for their most frequently visited websites. &lt;br /&gt;We count the number of users of each type for each of the 13 listed activities. &lt;br /&gt;We plot the data in Figure 2. We find that a substantially larger percentage (&gt;15%) of users in the coding group than in the entertainment-only group report that they frequently visit search pages, news pages, and their mailboxes. A moderately larger percentage (&gt;10%) of coding type users than entertainment-only type users report that they frequently visit forums, banking pages, and online word processing pages. And a slightly larger percentage (6%) of coding type users than entertainment-only type users report that they frequently visit shopping websites. For all of these activities, the percentages of non-coding-school type users fall between those of the other two types. There is not much difference among the three user types for the other activities, such as networking, downloads, video, adult pages, games, and gambling.  
&lt;br /&gt;&lt;br /&gt;This result suggests that coding type users not only use online resources more frequently for their work (search, forums, mail, etc.) but also visit websites for everyday life more frequently (news pages, banking, shopping, etc.). &lt;br /&gt;&lt;br /&gt;Another point is that the most popular and useful web applications are search applications, news pages, mail, and networking applications. &lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4782099858070252235-1513442714900643160?l=blog.tieming.org' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.tieming.org/feeds/1513442714900643160/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4782099858070252235&amp;postID=1513442714900643160' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/1513442714900643160'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/1513442714900643160'/><link rel='alternate' type='text/html' href='http://blog.tieming.org/2010/12/firefox-visualization-competition.html' title='FireFox Visualization Competition'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_qhXsEbrebC8/TQuYzWmACWI/AAAAAAAAA-E/b_1oW8Zc7fo/s72-c/Tieming_Rplot1.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4782099858070252235.post-6080797992455416332</id><published>2010-12-12T10:46:00.000-08:00</published><updated>2010-12-12T11:16:27.785-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='R MatLab'/><title type='text'>Optimization with constraints in R and MatLab</title><content type='html'>Often we need to compute the maximum (or minimum) of a function whose parameter(s) are restricted to a constrained interval. For example, we may want the maximum output under a limited budget, or the maximum production within a limited time.&lt;br /&gt;&lt;br /&gt;R has several functions for optimizing an objective function under constraints. In my research and projects, I often use optim(), optimize(), or nlminb(). &lt;br /&gt;&lt;br /&gt;optim() supports multi-dimensional optimization, with box constraints available through its "L-BFGS-B" method. optimize() covers the one-dimensional case only: it searches an interval for the optimum. optimize() combines golden section search with successive parabolic interpolation and does not require any additional information, such as a gradient function. In many cases, one-dimensional optimization is sufficient for our problems. nlminb() supports multi-dimensional optimization with box constraints using a Newton-type algorithm (see the PORT routines from AT&amp;T Bell Labs). Supplying the gradient (1st-order derivative) or the Hessian matrix (2nd-order derivative) is optional; when they are omitted, nlminb() approximates the derivatives numerically. The derivatives may be unstable at the edge of the box constraints, and in that case nlminb() may not give a sensible solution. Thus it is always a good idea to check the solution when the parameter values are extreme. 
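&lt;br /&gt;&lt;br /&gt;To make the one-dimensional case concrete, here is a minimal Python sketch of plain golden section search on an interval. It is not R's optimize(), which also uses parabolic interpolation, and it is not what I ran for my projects. The function name golden_section_minimize and the quadratic objective are hypothetical choices for illustration only.&lt;br /&gt;

```python
import math


def golden_section_minimize(f, lower, upper, tol=1e-8):
    """Minimize a unimodal function f on [lower, upper] by golden section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # inverse golden ratio, about 0.618
    a, b = lower, upper
    # Two interior points that split [a, b] in the golden ratio.
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fd > fc:
            # The minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def objective(x):
    """Hypothetical objective with its unconstrained minimum at x = 2."""
    return (x - 2.0) ** 2


# Interior solution: the minimum at x = 2 lies inside the box [0, 5].
interior_solution = golden_section_minimize(objective, 0.0, 5.0)

# Edge solution: on the box [3, 5] the constrained minimum sits at the lower bound.
edge_solution = golden_section_minimize(objective, 3.0, 5.0)

print(interior_solution, edge_solution)
```

&lt;br /&gt;The second call illustrates the edge case: when the unconstrained optimum falls outside the box, the search converges to a boundary point, which is exactly the situation where results from different optimizers are worth comparing.&lt;br /&gt;&lt;br /&gt;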
optimize() and nlminb() give results that agree to about 5 digits after the decimal point when the optimal parameter value is not at (or close to) the edge of the constraint interval.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;In MatLab, I use the fmincon() function for one-dimensional optimization with box constraints.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4782099858070252235-6080797992455416332?l=blog.tieming.org' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.tieming.org/feeds/6080797992455416332/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4782099858070252235&amp;postID=6080797992455416332' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/6080797992455416332'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/6080797992455416332'/><link rel='alternate' type='text/html' href='http://blog.tieming.org/2010/12/optimization-with-constraints-in-r-and.html' title='Optimization with constraints in R and MatLab'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4782099858070252235.post-4532262760257481678</id><published>2010-12-10T10:58:00.000-08:00</published><updated>2010-12-12T10:46:11.306-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Latex R'/><title type='text'>Sweave</title><content type='html'>&lt;span class="Apple-style-span"&gt;Copying and pasting a huge amount of data from &lt;/span&gt;&lt;span 
class="Apple-style-span"&gt;R&lt;/span&gt;&lt;span class="Apple-style-span"&gt; output into a &lt;/span&gt;&lt;span class="Apple-style-span"&gt;LaTeX&lt;/span&gt;&lt;span class="Apple-style-span"&gt; report is laborious. This motivated me to learn Sweave. Imagine that I have a table of simulation results with 5 rows and 10 columns, and that I run the simulation 3 times with different parameter values. Without &lt;/span&gt;&lt;span class="Apple-style-span" style="font-family: 'courier new'; "&gt;sweave&lt;/span&gt;&lt;span class="Apple-style-span"&gt;, I would need to copy and paste manually 150 times! Here is the link to the tutorial I found. I hope it is helpful for people who have the same need as I do.&lt;/span&gt;&lt;div&gt;&lt;span class="Apple-style-span"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"&gt;&lt;a href="http://www.cepe.ethz.ch/education/NPecoHS2010/Sartori-Sweave.pdf"&gt;http://www.cepe.ethz.ch/education/NPecoHS2010/Sartori-Sweave.pdf&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4782099858070252235-4532262760257481678?l=blog.tieming.org' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.tieming.org/feeds/4532262760257481678/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4782099858070252235&amp;postID=4532262760257481678' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/4532262760257481678'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/4532262760257481678'/><link rel='alternate' type='text/html' 
href='http://blog.tieming.org/2010/12/sweave.html' title='Sweave'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4782099858070252235.post-8986591793977747821</id><published>2009-07-23T09:37:00.000-07:00</published><updated>2010-12-10T21:48:10.556-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='statistics'/><title type='text'>Statistics on Tipping at Germany</title><content type='html'>&lt;div style="text-align: left;"&gt;I was&lt;span class="Apple-style-span"&gt; in a &lt;/span&gt;&lt;span class="Apple-style-span"&gt;ggplot2 &lt;/span&gt;&lt;span class="Apple-style-span"&gt;workshop. The following figures are examples from that workshop. In Figure 1, we explore the relationship between the total bill and the amount of tip that females (&lt;/span&gt;&lt;span class="Apple-style-span"&gt;F&lt;/span&gt;&lt;span class="Apple-style-span"&gt;) and males (&lt;/span&gt;&lt;span class="Apple-style-span"&gt;M&lt;/span&gt;&lt;span class="Apple-style-span"&gt;) &lt;/span&gt; give &lt;span class="Apple-style-span"&gt;when they drink more than one glass of alcohol (&lt;/span&gt;&lt;span class="Apple-style-span"&gt;Yes&lt;/span&gt;&lt;span class="Apple-style-span"&gt;) or not (&lt;/span&gt;&lt;span class="Apple-style-span"&gt;No&lt;/span&gt;&lt;span class="Apple-style-span"&gt;). The straight line is a first-order regression line, and the curved line is a &lt;/span&gt;&lt;span class="Apple-style-span"&gt;lowess&lt;/span&gt;&lt;span class="Apple-style-span"&gt; fit. One obvious observation from Figure 1 is that when people drink alcohol, there is more variation in their tips. Sometimes they tip up to around 20% of the total bill, and other times as little as 3%. 
Secondly, females do not spend as much money as males do in restaurants. The total bill for females is, on average, lower than that for males. There are more observations from males, which may indicate either that males pay for females more often or that males go out to eat more frequently. In addition, when people drink alcohol, their total bills are usually higher than those of people who do not drink. Maybe people who do not drink alcohol have more sense of saving (or not wasting) money. Finally, comparing the tipping rate in Germany with that in America, American people are more generous in tipping. We usually give 15% to 20% of the total bill as a tip, don't we?&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_qhXsEbrebC8/SmiaNAqJx4I/AAAAAAAAAs8/lBbSuKfWEq0/s1600-h/tipping.bmp"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 396px; height: 400px;" src="http://4.bp.blogspot.com/_qhXsEbrebC8/SmiaNAqJx4I/AAAAAAAAAs8/lBbSuKfWEq0/s400/tipping.bmp" alt="" id="BLOGGER_PHOTO_ID_5361704904690354050" border="0" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;In Figure 2, we explore tipping patterns by group size. The shaded area is the 95% confidence interval. 
It looks like groups of 5 and 6 exhibit dramatically more variation in tipping than other groups.&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;div style="text-align: justify;"&gt;&lt;span class="Apple-style-span"&gt;&lt;u&gt;&lt;br /&gt;&lt;/u&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;span class="Apple-style-span"&gt;&lt;u&gt;&lt;br /&gt;&lt;/u&gt;&lt;/span&gt;&lt;/div&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_qhXsEbrebC8/SmiaNAqJx4I/AAAAAAAAAs8/lBbSuKfWEq0/s1600-h/tipping.bmp"&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_qhXsEbrebC8/SmiaIhHWpaI/AAAAAAAAAs0/S25302ykr2U/s1600-h/tipping2.bmp"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 373px;" src="http://3.bp.blogspot.com/_qhXsEbrebC8/SmiaIhHWpaI/AAAAAAAAAs0/S25302ykr2U/s400/tipping2.bmp" alt="" id="BLOGGER_PHOTO_ID_5361704827503420834" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_qhXsEbrebC8/SmiZ_1LW56I/AAAAAAAAAss/G90oMtLWOOo/s1600-h/tipping3.bmp"&gt;&lt;br /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4782099858070252235-8986591793977747821?l=blog.tieming.org' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.tieming.org/feeds/8986591793977747821/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4782099858070252235&amp;postID=8986591793977747821' title='0 
Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/8986591793977747821'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/8986591793977747821'/><link rel='alternate' type='text/html' href='http://blog.tieming.org/2009/07/statistics-on-tipping-at-germany.html' title='Statistics on Tipping at Germany'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_qhXsEbrebC8/SmiaNAqJx4I/AAAAAAAAAs8/lBbSuKfWEq0/s72-c/tipping.bmp' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4782099858070252235.post-8321838535694695889</id><published>2008-10-16T15:54:00.000-07:00</published><updated>2008-10-16T16:47:35.071-07:00</updated><title type='text'>Are You Ready for Statistics</title><content type='html'>A famous saying goes, "The world does not lack beauty but lacks the eyes to discover the beauty". Perhaps both the eyes that find beauty and the mind that senses the random world need to be trained.&lt;br /&gt;&lt;br /&gt;Dr. Bradley Efron of Stanford University came to Iowa State three days ago for a biostatistics conference as well as the Lawrence Baker lectures at our university. He gave a talk titled "Learning the experiences from others", in which he provided several examples of empirical Bayes ideas applied to practical problems. 
His talk motivated me to look for more of his work, including this &lt;a href="http://www.amstat.org/publications/amsn/index.cfm?fuseaction=pres122004"&gt;article&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The article points out that people previously believed science should be exact, for example in how precisely time passes or how certain the speed of light is. However, years ago, science began to unveil another side of its face: the science of randomness, which is at play nearly everywhere. Statistical principles show up again and again, as in the Palo Alto rainfall example given in the first paragraph of the article.  Thus, the study of statistics should not be restricted to classrooms or textbook examples. It should be widely noticed and studied. People who study statistics should also keep an open mind.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4782099858070252235-8321838535694695889?l=blog.tieming.org' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.tieming.org/feeds/8321838535694695889/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4782099858070252235&amp;postID=8321838535694695889' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/8321838535694695889'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/8321838535694695889'/><link rel='alternate' type='text/html' href='http://blog.tieming.org/2008/10/are-you-ready-for-statistics.html' title='Are You Ready for Statistics'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4782099858070252235.post-8061565490448608300</id><published>2008-05-26T14:56:00.000-07:00</published><updated>2008-05-26T15:45:36.239-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='simulation'/><title type='text'>False Discovery Rate (FDR) Reliability Evaluation</title><content type='html'>In analyses involving multiple tests, we can control the false discovery rate (FDR) by several approaches, such as the Benjamini-Hochberg adjusted p value (BH adj.p value), the Storey-Tibshirani q value (q value), and the BH adj.p value and q value based on a moderated t-test after empirical Bayes adjustment.&lt;br /&gt;&lt;br /&gt;The question of interest is which method gives us better control of the FDR. That is, when we want to control the FDR at level alpha, which method gives more reliable control, such that the actual FDR does not deviate far from alpha?&lt;br /&gt;&lt;br /&gt;To study this problem, we simulated expression data for 14,118 genes under two treatments, where the first n genes (n &lt; 14,118) were designed to be truly differentially expressed. Then, we applied three methods for identifying differentially expressed genes: (1) q value based on the usual t-test, (2) BH adj.p value based on the usual t-test, and (3) q value based on the moderated t-test. The purpose is to examine the precision and accuracy with which these different approaches control the FDR.   One of the several results obtained looks like this: (100 simulations of expression data for 14,118 genes under 2 treatments, with the first 3,000 genes truly differentially expressed. 
Set alpha level at 0.05)&lt;br /&gt; &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_qhXsEbrebC8/SDs22wOH2AI/AAAAAAAAAJ0/EMBhYghiCG8/s1600-h/5.26.208.img.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 402px; height: 398px;" src="http://4.bp.blogspot.com/_qhXsEbrebC8/SDs22wOH2AI/AAAAAAAAAJ0/EMBhYghiCG8/s320/5.26.208.img.JPG" alt="" id="BLOGGER_PHOTO_ID_5204814108641712130" border="0" /&gt;&lt;/a&gt;Summary statistics are listed as follows:&lt;br /&gt;&lt;p class="MsoNormal"&gt;&lt;span style=""&gt;&lt;span style=""&gt; &lt;/span&gt;(1) q value based on usual t statistics&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p class="MsoNormal"&gt;&lt;span style=""&gt;mean.q      :&lt;span style=""&gt;&lt;/span&gt;&lt;span style=""&gt;&lt;/span&gt;0.04565834&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style=""&gt;var.q &lt;/span&gt;&lt;span style=""&gt;         :6.964762e-05 &lt;span style=""&gt; &lt;/span&gt;&lt;/span&gt;&lt;span style="" lang="DE"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="" lang="DE"&gt;(2) BH adj.p value based on usual t statistics&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style=""&gt;mean.bh.p:&lt;/span&gt;&lt;span style=""&gt;0.03763231&lt;/span&gt;&lt;span style=""&gt;                 &lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style=""&gt;var.bh.p:6.955762e-05&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style="" lang="DE"&gt;(3) q value based on moderated t statistics&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style=""&gt;mean.ebayes.q:&lt;/span&gt;&lt;span style=""&gt;0.04702102&lt;span style=""&gt;   &lt;/span&gt;&lt;/span&gt;&lt;span style=""&gt;       &lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoNormal"&gt;&lt;span style=""&gt;var.ebayes.q:3.8082&lt;/span&gt;&lt;span style="" 
lang="DE"&gt;9e-05&lt;/span&gt;&lt;/p&gt;The above statistics tell us that the q value based on the empirical Bayes moderated t test gives the best precision and accuracy. Similar tests can be run many times to compare and evaluate these FDR control approaches.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4782099858070252235-8061565490448608300?l=blog.tieming.org' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.tieming.org/feeds/8061565490448608300/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4782099858070252235&amp;postID=8061565490448608300' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/8061565490448608300'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/8061565490448608300'/><link rel='alternate' type='text/html' href='http://blog.tieming.org/2008/05/false-discovery-rate-fdr-reliability.html' title='False Discovery Rate (FDR) Reliability Evaluation'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_qhXsEbrebC8/SDs22wOH2AI/AAAAAAAAAJ0/EMBhYghiCG8/s72-c/5.26.208.img.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4782099858070252235.post-7983074905607641332</id><published>2008-04-20T14:27:00.000-07:00</published><updated>2008-05-26T14:50:21.452-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='statistics'/><category 
scheme='http://www.blogger.com/atom/ns#' term='bioinformatics'/><category scheme='http://www.blogger.com/atom/ns#' term='computer science'/><title type='text'>Occurrence Competing problem</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_qhXsEbrebC8/SDssngOH19I/AAAAAAAAAJU/qvUClK2ymEY/s1600-h/4.20.2008.img.JPG"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer; width: 322px; height: 169px;" src="http://3.bp.blogspot.com/_qhXsEbrebC8/SDssngOH19I/AAAAAAAAAJU/qvUClK2ymEY/s320/4.20.2008.img.JPG" alt="" id="BLOGGER_PHOTO_ID_5204802851532429266" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Problem: Suppose we flip a coin repeatedly. Given two patterns of the same length composed of "Head" and "Tail", compute the probability of seeing the first pattern before the second one.&lt;br /&gt;&lt;br /&gt;Solution: First-step analysis.&lt;br /&gt;&lt;br /&gt;Suppose the two patterns are HHTH and HHHT.&lt;br /&gt;&lt;br /&gt;The figure shows the graph of states and transition probabilities.&lt;br /&gt;&lt;br /&gt;Define P&lt;span style="font-size:78%;"&gt;H&lt;/span&gt;: the probability of seeing pattern HHTH before HHHT if we are currently at state H.&lt;br /&gt;Similarly, we can define P&lt;span style="font-size:78%;"&gt;T&lt;/span&gt;, P&lt;span style="font-size:78%;"&gt;HH&lt;/span&gt;, P&lt;span style="font-size:78%;"&gt;HHH&lt;/span&gt;, P&lt;span style="font-size:78%;"&gt;HHT&lt;/span&gt;, P&lt;span style="font-size:78%;"&gt;HHHT&lt;/span&gt;, P&lt;span style="font-size:78%;"&gt;HHTH&lt;/span&gt;.&lt;br /&gt;Let P(H) and P(T) be the probabilities of flipping an H or a T. 
For a fair coin, we have P(H) = P(T) = 0.5.&lt;br /&gt;&lt;br /&gt;Thus, we have the following system of equations:&lt;br /&gt;P&lt;span style="font-size:78%;"&gt;H&lt;/span&gt; = P(H)*P&lt;span style="font-size:78%;"&gt;HH&lt;/span&gt; + P(T)*P&lt;span style="font-size:78%;"&gt;T&lt;/span&gt; ---------(1)&lt;br /&gt;&lt;span style="font-size:85%;"&gt;(explanation: when you stand at state H, you move to state HH with probability P(H) and to state T with probability P(T))&lt;/span&gt;&lt;br /&gt;P&lt;span style="font-size:78%;"&gt;T&lt;/span&gt; = P(H)*P&lt;span style="font-size:78%;"&gt;H&lt;/span&gt; + P(T)*P&lt;span style="font-size:78%;"&gt;T&lt;/span&gt; ---------(2)&lt;br /&gt;P&lt;span style="font-size:78%;"&gt;HH&lt;/span&gt; = P(H)*P&lt;span style="font-size:78%;"&gt;HHH&lt;/span&gt; + P(T)*P&lt;span style="font-size:78%;"&gt;HHT&lt;/span&gt; ---------(3)&lt;br /&gt;P&lt;span style="font-size:78%;"&gt;HHH&lt;/span&gt; = P(H)*P&lt;span style="font-size:78%;"&gt;HHH&lt;/span&gt; + P(T)*P&lt;span style="font-size:78%;"&gt;HHHT&lt;/span&gt; ---------(4)&lt;br /&gt;P&lt;span style="font-size:78%;"&gt;HHT&lt;/span&gt; = P(H)*P&lt;span style="font-size:78%;"&gt;HHTH&lt;/span&gt; + P(T)*P&lt;span style="font-size:78%;"&gt;T&lt;/span&gt; ---------(5)&lt;br /&gt;P&lt;span style="font-size:78%;"&gt;HHHT&lt;/span&gt; = 0 ---------(6)&lt;br /&gt;P&lt;span style="font-size:78%;"&gt;HHTH&lt;/span&gt; = 1 ---------(7)&lt;br /&gt;&lt;br /&gt;After plugging equations (6) and (7) into the first five equations, you get five equations in five unknowns. Now you can solve the system of equations to get the values of these five unknowns. 
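&lt;br /&gt;&lt;br /&gt;As a minimal sketch of that last step, the Python snippet below solves equations (1)-(5), with (6) and (7) plugged in, using exact fractions. The function name pattern_race and the plain Gauss-Jordan elimination are assumptions made for illustration and are not part of the original solution. It also assumes P(H) is strictly between 0 and 1, so that the system has a unique solution.&lt;br /&gt;

```python
from fractions import Fraction


def pattern_race(p_heads=Fraction(1, 2)):
    """Probability of seeing HHTH before HHHT, by first-step analysis.

    Hypothetical helper: solves equations (1)-(5) with P_HHHT = 0 and
    P_HHTH = 1 already substituted, written as an augmented matrix.
    """
    p = Fraction(p_heads)  # P(H)
    q = 1 - p              # P(T)
    # Unknowns, in order: P_H, P_T, P_HH, P_HHH, P_HHT.
    # Each row is [coefficients..., right-hand side].
    rows = [
        [1, -q, -p, 0, 0, 0],     # (1) P_H   = p*P_HH  + q*P_T
        [-p, 1 - q, 0, 0, 0, 0],  # (2) P_T   = p*P_H   + q*P_T
        [0, 0, 1, -p, -q, 0],     # (3) P_HH  = p*P_HHH + q*P_HHT
        [0, 0, 0, 1 - p, 0, 0],   # (4) P_HHH = p*P_HHH + q*0
        [0, -q, 0, 0, 1, p],      # (5) P_HHT = p*1     + q*P_T
    ]
    rows = [[Fraction(v) for v in row] for row in rows]
    n = len(rows)
    # Gauss-Jordan elimination with exact arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    p_h, p_t = rows[0][n], rows[1][n]
    # Final step: condition on the first flip.
    return p * p_h + q * p_t


print(pattern_race())  # fair coin
```

&lt;br /&gt;For a fair coin this sketch gives 1/3 for both starting states H and T, so HHTH is seen before HHHT with probability 1/3.&lt;br /&gt;&lt;br /&gt;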
The solved values are the probabilities of seeing the first pattern before the second one, given the current state.&lt;br /&gt;&lt;br /&gt;Thus, conditioning on the first flip, the probability of seeing pattern HHTH before pattern HHHT is:&lt;br /&gt;Prob{HHTH occurs before HHHT} = P(H)*P&lt;span style="font-size:78%;"&gt;H&lt;/span&gt; + P(T)*P&lt;span style="font-size:78%;"&gt;T&lt;/span&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4782099858070252235-7983074905607641332?l=blog.tieming.org' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.tieming.org/feeds/7983074905607641332/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4782099858070252235&amp;postID=7983074905607641332' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/7983074905607641332'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/7983074905607641332'/><link rel='alternate' type='text/html' href='http://blog.tieming.org/2008/04/occurance-competing-problem.html' title='Occurrence Competing problem'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_qhXsEbrebC8/SDssngOH19I/AAAAAAAAAJU/qvUClK2ymEY/s72-c/4.20.2008.img.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4782099858070252235.post-2996822585991410934</id><published>2008-03-28T18:47:00.000-07:00</published><updated>2008-05-26T14:49:51.177-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='statistics'/><title type='text'>A Tutorial for Probability</title><content type='html'>http://www.math.uah.edu/stat/foundations/index.xhtml&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4782099858070252235-2996822585991410934?l=blog.tieming.org' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.tieming.org/feeds/2996822585991410934/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4782099858070252235&amp;postID=2996822585991410934' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/2996822585991410934'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/2996822585991410934'/><link rel='alternate' type='text/html' href='http://blog.tieming.org/2008/03/tutorial-for-probability.html' title='A Tutorial for Probability'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4782099858070252235.post-6537148648663128855</id><published>2008-03-18T16:22:00.000-07:00</published><updated>2008-05-26T14:49:31.082-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='bioinformatics'/><title type='text'>Nugget</title><content type='html'>Interdisciplinary study is not a way to avoid hard work and deep understanding of one specific field. 
On the contrary, people who want to become experts in an interdisciplinary field need to do a lot more work than others.&lt;br /&gt;&lt;br /&gt;I am currently a graduate student in the Bioinformatics and Computational Biology program. Every year our program recruits around 10 students from hundreds of applicants all over the world. After the first two years of training, whatever their college background, students are expected to reach graduate level in computer science, biology, and statistics, and to be able to discuss any of them with sufficient understanding.&lt;br /&gt;&lt;br /&gt;This is hard and challenging, and the need for it will perhaps grow in the future. Bioinformatics serves here only as an example.&lt;br /&gt;&lt;br /&gt;When I work with biologists on statistical analysis, I still need to sit down and ask questions about biology, even though I have taken several graduate-level biology courses. Sometimes I wonder: what if I did not know this much biology and computer science? If I were only a statistics person, would I be able to do as much as I can now? Would I be able to collaborate with people and communicate with them efficiently? Would I be able to read mathematical biology and computational biology papers with ease? I think the answer to all of these is no.&lt;br /&gt;&lt;br /&gt;In our program, not many professors are well rounded in all three fields. 
And it is very hard to find qualified TAs to grade homework, since the field is so new.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4782099858070252235-6537148648663128855?l=blog.tieming.org' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.tieming.org/feeds/6537148648663128855/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4782099858070252235&amp;postID=6537148648663128855' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/6537148648663128855'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/6537148648663128855'/><link rel='alternate' type='text/html' href='http://blog.tieming.org/2008/03/nugget.html' title='Nugget'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4782099858070252235.post-4086981898481164512</id><published>2008-02-23T14:58:00.000-08:00</published><updated>2008-05-26T14:48:57.069-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='bioinformatics'/><category scheme='http://www.blogger.com/atom/ns#' term='computer science'/><title type='text'>Shuffle Compositions of a Sequence to Remain Di- and Mono- letter Frequencies</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_qhXsEbrebC8/R8CyOqWrotI/AAAAAAAAAGs/2SaV2Wd8sbs/s1600-h/graph.bmp"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" 
src="http://4.bp.blogspot.com/_qhXsEbrebC8/R8CyOqWrotI/AAAAAAAAAGs/2SaV2Wd8sbs/s320/graph.bmp" alt="" id="BLOGGER_PHOTO_ID_5170328337178665682" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;There is a very interesting application of graphs to a bioinformatics problem raised in our bioinformatics class.&lt;br /&gt;&lt;br /&gt;Given a sequence over the alphabet A, T, G, C, how can we find all possible sequences that preserve both the di-nucleotide and the mono-nucleotide frequencies?&lt;br /&gt;&lt;br /&gt;Suppose we have the sequence CGTGAGC. Our aim is to preserve the counts of all 16 possible di-nucleotides and all 4 mono-nucleotides, as well as the length of the sequence.&lt;br /&gt;&lt;br /&gt;This can be viewed as a graph question if we convert the sequence to a graph in which each vertex is one of the letters A, T, G or C, and each occurrence of a di-nucleotide is a directed edge from its first letter to its second. This idea is displayed in the above figure.&lt;br /&gt;&lt;br /&gt;A sequence that keeps the di-letter frequencies corresponds to a walk that starts from one vertex and traverses every edge exactly once. If a vertex in the graph has an odd number of edges, that vertex must be the starting point or the ending point (as in Euler's seven bridges puzzle). Here, however, every vertex has an even number of edges, with as many incoming as outgoing edges. Starting from any vertex and traversing all edges therefore gives a sequence that keeps the di-nucleotide frequencies. However, the sequence also preserves the mono-letter frequencies if and only if it begins with C and ends with C. We can prove this by contradiction. Suppose we can find a sequence that satisfies the di-letter frequencies but does not begin with C. Then every C must appear in the middle of the sequence. In this example, CG must be in the middle. Another di-letter that ends with C must then come right before it in order to connect with CG. The only such di-letter is GC, so we have GC right before CG, forming GCG. 
In the original sequence, the beginning C and the ending C are counted as two separate letters, but in the new sequence they merge into the single C of GCG, so the mono-letter count of C drops by one. Thus, we keep the mono-nucleotide frequencies if and only if the new sequence begins and ends with the same letters as the original one.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4782099858070252235-4086981898481164512?l=blog.tieming.org' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.tieming.org/feeds/4086981898481164512/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4782099858070252235&amp;postID=4086981898481164512' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/4086981898481164512'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/4086981898481164512'/><link rel='alternate' type='text/html' href='http://blog.tieming.org/2008/02/using-graphs-in-finding-di-nucleotide.html' title='Shuffle Compositions of a Sequence to Remain Di- and Mono- letter Frequencies'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_qhXsEbrebC8/R8CyOqWrotI/AAAAAAAAAGs/2SaV2Wd8sbs/s72-c/graph.bmp' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4782099858070252235.post-8468327393439280704</id><published>2008-02-01T12:27:00.001-08:00</published><updated>2008-05-26T15:33:13.609-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='math'/><title type='text'>Interesting Combinatoric Puzzles</title><content type='html'>http://www.mathpages.com/home/icombina.htm&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4782099858070252235-8468327393439280704?l=blog.tieming.org' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.tieming.org/feeds/8468327393439280704/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4782099858070252235&amp;postID=8468327393439280704' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/8468327393439280704'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4782099858070252235/posts/default/8468327393439280704'/><link rel='alternate' type='text/html' href='http://blog.tieming.org/2008/02/interesting-combinatoric-puzzles.html' title='Interesting Combinatoric Puzzles'/><author><name>Tieming Ji</name><uri>http://www.blogger.com/profile/03978016933125530140</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
