This was presented to two statistics and data mining classes at the University of Alabama on March 14, 2013. The purpose was to provide a high-level overview of the open source tools and community, with an emphasis on the Python world. As I prepared, I was impressed by how much activity there is and how it is accelerating. Those who already participate in the Python community will know many, if not most, of these resources; for new people, there will be some useful links. Those who know Python will also recognize how much I left out, but this is just the intro, and I don't want to overwhelm anyone.
Python For Data Analysis. Author: Wes McKinney, Publisher: O’Reilly Media, Inc., Sebastopol, CA 95472, ISBN: 978-1-449-31979-3, copyright 2013, 472 pages, cover price: $39.99
No matter what your skill level, you need this book.
Once you get the basics of the Python language down, you need to lift your skills to the next level to do useful work. I wanted to put up a few web sites, which encouraged me to learn Django, a web framework written in Python. This leap left me with some major gaps in my conceptual understanding of many Python idioms, which I expect to fill in with time.
Another area where I spend time is doing various forms of data analysis. The Python tools and skills needed for this work are quite different from those for building web sites. I was very interested when I first saw the Pandas data analysis library, which recommended the use of IPython, another tool that I had not really begun to explore.
In short, the Pandas library and the IPython tool set make for a very powerful data manipulation and analysis environment. There is a fascinating confluence of activity in the Python world, with packages such as NumPy, SciPy and IPython, and now Pandas, statsmodels, scikit-learn and Numba, increasingly supporting the scientific and data analysis communities. Many of these tools emphasize matrix (array) operations, which initially put me off. Pandas makes them far more approachable because its structures closely resemble spreadsheets, which in turn resemble matrices once you wrap your head around the concepts.
Another major data manipulation capability introduced by Pandas, as explained in the text, is a set of SQL-like operators for array operations, enabling joins, summaries and other SQL-like operations on in-memory datasets.
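A minimal sketch of what those SQL-like operations look like in practice. The two small tables, their column names and their values are made up for illustration and are not taken from the book; the only library calls used are `DataFrame.merge` and `groupby(...).sum()`.

```python
import pandas as pd

# Hypothetical in-memory tables (invented data, for illustration only).
donations = pd.DataFrame({
    "donor_id": [1, 2, 1, 3],
    "amount": [50.0, 20.0, 30.0, 10.0],
})
donors = pd.DataFrame({
    "donor_id": [1, 2, 3],
    "state": ["AL", "GA", "AL"],
})

# Roughly: SELECT * FROM donations JOIN donors USING (donor_id)
joined = donations.merge(donors, on="donor_id", how="inner")

# Roughly: SELECT state, SUM(amount) FROM joined GROUP BY state
totals = joined.groupby("state")["amount"].sum()

print(totals)
```

The join and the summary each take one line, and everything stays in memory, which is the appeal for someone coming from spreadsheets or SQL.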
I bought an early access copy of Python for Data Analysis and have kept it up to date since, which is a great feature of O’Reilly early access publications.
The book covers basic prerequisite information on the following:
The book is excellent, taking one from the conceptual issues through to the execution of sophisticated analysis of data sets from a variety of sources. The problems are well documented and the code can be executed with available data (something I have yet to do). Examples include getting and using data from:
- Federal Election Commission
- Yahoo finance
Throughout the book, examples and code are presented with thorough explanations in a teacher-like style that goes well beyond simple code commentary. For me there was a constant stream of aha moments as I began to understand not just the syntax of the code but why it should be used to achieve the desired result, and how to use some of the less obvious Python and Pandas language elements to better effect. In short, it helped me become a more fluent coder.
Wes is truly a polymath who understands analysis, advanced math and statistics, and is an awesome coder of the Pandas package itself. To boot, he writes clearly, in a way that can be understood by mere mortals attempting to get up to speed on the tools and concepts he has built and assembled. I also find the appendix summary of the Python language to be compact and useful in its own right.
Can’t recommend the book highly enough.
The Signal And The Noise. Author: Nate Silver, Publisher: The Penguin Press, New York, ISBN: 978-1-59420-411-1, copyright 2012, 534 pages, cover price: $27.95
I’m no math whiz but I love analysis. This book has little math but much useful insight on how to view and analyze data and information.
Nate Silver became famous for his predictions about the most recent presidential election in his 538 blog and columns for the New York Times.
The book is highly readable, and you almost do not realize that you are in the hands of a master teacher. You just learn a ton of useful ways to think about analytical problems.
People looking for a more detailed explanation of how Nate built his election models will be disappointed. Conversely, if statistics, formulas, and detailed mathematical explanations scare or bore you, have no fear: those items are not present.
This places the book in a somewhat awkward spot: not technical, and yet analytical. Once you “get it”, it becomes quite enjoyable. As I told my wife when I got to the 8th chapter, I was getting to the “good part”, where Nate goes into what is perhaps the most technical part of the book: an explanation of Bayesian statistics and how it differs from the conventional frequentist statistics that most of us learned in school.
While, in truth, I was hoping for a “bit more” math and/or computer code to help me learn how to apply the techniques that are so well explored in the book, I was very satisfied with what I got. My frustration comes from my desire to translate theory and examples into code and personal utility on various projects I’m working on. So, in short, a personal issue.
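For readers with the same itch, here is a minimal sketch of the kind of Bayesian update the book describes in words. The function name and the probabilities are my own illustrative assumptions, not figures or code from the book; the arithmetic is just Bayes' theorem for a single yes/no hypothesis.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) from Bayes' theorem."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / denominator


# Illustrative numbers only: a 10% prior belief, and evidence that shows up
# 80% of the time when the hypothesis is true and 20% of the time when it is not.
first = bayes_update(prior=0.10, p_evidence_if_true=0.80, p_evidence_if_false=0.20)

# Seeing the same kind of evidence again: yesterday's posterior is today's prior.
second = bayes_update(prior=first, p_evidence_if_true=0.80, p_evidence_if_false=0.20)

print(round(first, 3), round(second, 3))  # 0.308 0.64
```

The point of the exercise is the second call: the belief is revised step by step as evidence arrives, rather than tested once against a fixed threshold.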
What is very well done is the exploration of a wide variety of “problem spaces”, such as baseball prediction, political prediction, the financial meltdown, Texas Hold'em betting strategy, predicting terrorist activity, climate change, and picking correct data vs noisy or biased data, and how to tell the difference. The examples are explained in sufficient detail that you learn a great deal about how to approach similar problems and how to evaluate similar data and conclusions. You will certainly view your investment process in a different and more informed manner.
I strongly recommend this book for anyone interested in day-to-day evaluation of life choices, which should be everyone. The book would also be a good college text for learning about “statistical thinking”.
Dave Girouard, the former head of Google's Apps business, says it very well in this piece on the GigaOM site, which I strongly recommend reading.
Insane excuses for not moving to the cloud.
- Insanity #1: These big outages mean we should keep things in house
- Insanity #2: I need somebody to talk to when a service interruption occurs
- Insanity #3: Cloud is OK for non-critical applications with non-sensitive data
See the full article Here.
And this follow-up post about legal risk makes good points as well Here.
And even more from Google on how they handle legal requests and warrants Here.
I would also add that I see many people just staying put because the default decision to do nothing seems so appealing. After all, who wants to leave Outlook for something new that they have to learn? Just remember how difficult it was to learn Outlook in the first place, not to mention the confusion and trauma that each new release presents to the end users.
The Google product has been hugely successful due to its ease of use and continuous, non-traumatic update cycle; the cost savings are simply a bonus. In addition, the Google products are built from the ground up to be completely mobile friendly. It is amazing how many individuals choose to use Gmail for personal use and feel they step back in time every time they have to use company-provided Exchange mail.
In short, not moving is simply falling further behind, and spending more than is needed for the comfort of not learning a bit of new technology.
Google's latest quarterly report included the following on Google Apps acceptance.
"Our enterprise business continued to grow at an impressive pace, gaining traction among some of the largest companies in the world. New customers include Nintendo, the Canadian Broadcast Company, Shaw Industries, POSCO, Randstad and Hyundai, to name a few. And after signing in May, the U.S. Department of the Interior moved more than 70,000 employees to the cloud during Q4 making it the largest federal agency to date using Google Apps."
Here is a brief introduction to Google Apps for business.
By: Tom Brander
I recently converted a 45 seat company from hosted Exchange to Google Apps. All was working well. Then.....
A few weeks later, when we deactivated Exchange, I began to get a few complaints that some people sending us e-mails were receiving bounce messages. A quick glance at the messages seemed to indicate that the senders were somehow still sending the messages to the old host. So I felt that the rejections were exactly as they should be, and I advised the senders to determine why their systems were not picking up our updated MX records, as the vast majority of mail was arriving fine.
That got me nowhere and the complaints escalated with more companies reporting the same issue. Finally, one of our affiliate law firms with the issue worked through it with Microsoft, although I could not believe the solution at first.
Apparently, our former host also subscribed to Forefront (from Microsoft), which provides some security and policy enforcement. It also nails down the IP address of the mail server for ALL users across Forefront. So, no matter what the MX record says, mail from any Forefront user will be routed by that IP address. This occurs even if the MX record is changed and the old Exchange server is removed.
The solution is to remove the domain from Forefront, which must be done by a request to the Forefront service from the Forefront/old Exchange client.
Now all mail from all external companies is flowing correctly to Google Apps.
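The behavior is easier to see as a toy model. This is only a sketch of the routing rule as it appeared from the outside, not Forefront's actual implementation; the function name, the `pinned_routes` table, the `example.com` domain and the IP address are all hypothetical.

```python
def next_hop(domain, mx_records, pinned_routes):
    """Pick where a sending service delivers mail for `domain`.

    A pinned route (a hypothetical stand-in for the service-wide IP
    mapping) wins over whatever the public MX record says.
    """
    if domain in pinned_routes:
        return pinned_routes[domain]
    return mx_records[domain]


# The MX record has already been switched to Google...
mx_records = {"example.com": "aspmx.l.google.com"}
# ...but the sending service still holds a stale entry for the old host.
pinned_routes = {"example.com": "203.0.113.10"}

before = next_hop("example.com", mx_records, pinned_routes)

# Removing the domain from the service is what finally lets the MX record apply.
del pinned_routes["example.com"]
after = next_hop("example.com", mx_records, pinned_routes)

print(before, after)
```

In this model, changing `mx_records` alone never affects `before`, which matches what we saw: only senders behind the same service bounced, and only removing the pinned entry fixed it.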
I have not found ANY mention of this issue on the web or in Google Apps documentation.
This impacted our receipt of mail from several banks, law firms and public authorities (all Forefront users).
By: Tom Brander
Google+: http://profiles.google.com/tombrander
This was presented to the Alabama Center For Real Estate Advisory Board on August 2, 2012.
By: Tom Brander
Updated as of 10/25/2012
The slide show below is the class reference material for the course, which serves as an overview of how to use the Internet to effectively market your company and yourself. There are numerous links with additional reference material. A lot of additional material is covered in the verbal presentation, but I think you will find some useful stuff here, no matter what your current level of expertise.
Many thanks to the fine people at the Alabama Center for Real Estate, particularly Grayson Glaze, the director, for sponsoring me to develop and present the course.
Market review material is here: