
Tuesday, May 3, 2011

My Review of Data Analysis with Open Source Tools

Originally submitted at O'Reilly

Turning raw data into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look...

Data Analysis w/ Professional Experience
By Eder Andres Avila from Paipa, Colombia on 5/3/2011

4 out of 5
Pros: Well-written, Accurate
Best Uses: Intermediate, Novice, Student
Describe Yourself: University Student
This book is about how to design a strategy for understanding the data an organization collects, using statistical, graphical, analytical and reporting methods together with open source tools. It explains the major concerns in extracting the information that the data holds about products, finances, processes and more. To that end, every information engineer should consider:

• The underlying properties of data
• The ways to represent the current status of the data
• The criteria to select relevant data and attributes
• The algorithms to analyze the selected data and attributes
• The ways to report the conclusions of the performed data analysis.
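As a rough illustration of the five considerations above, here is a minimal Python sketch using only the standard library. It is my own example, not code from the book: the sales figures, the function names and the "more than twice the median" selection rule are all assumptions made up for illustration.

```python
import statistics

# Hypothetical monthly sales figures, invented for this illustration.
monthly_sales = [120, 135, 128, 540, 131, 126, 138]


def describe(values):
    """Underlying properties of the data: size, center and spread."""
    return {
        "count": len(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def text_histogram(values, bin_width):
    """Current status of the data, shown as a crude text histogram."""
    bins = {}
    for v in values:
        start = (v // bin_width) * bin_width
        bins[start] = bins.get(start, 0) + 1
    return [
        f"{start:>5}-{start + bin_width - 1:<5} {'#' * n}"
        for start, n in sorted(bins.items())
    ]


def select_relevant(values, max_ratio=2.0):
    """Selection criterion (an assumption): keep points at or below
    max_ratio times the median, dropping extreme values."""
    limit = max_ratio * statistics.median(values)
    return [v for v in values if v <= limit]


def analyze(values):
    """Analysis step: average change between consecutive points."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return statistics.mean(diffs)


def report(values):
    """Reporting step: gather the conclusions into readable text."""
    selected = select_relevant(values)
    lines = [f"Summary: {describe(values)}"]
    lines += text_histogram(values, 100)
    lines.append(f"Kept {len(selected)} of {len(values)} points")
    lines.append(f"Average month-to-month change: {analyze(selected):.1f}")
    return "\n".join(lines)


print(report(monthly_sales))
```

Each function maps to one bullet, which is the point of the sketch: the plan comes first, and the choice of tool for each step can change later.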

The author, Philipp K. Janert, takes a designer's approach rather than an implementer's. That means you will gain useful suggestions and tips for proposing a data analysis plan, rather than instructions for building all or part of an information infrastructure with open source tools like Python, R, PostgreSQL and Weka.

For some developers, then, the lack of full programming constructs may be disappointing. However, I feel that Philipp K. Janert's main goal is to share his own professional experience in real-world enterprise analytical projects from a requirements perspective. Many reference and recipe books cover the aforementioned open source technologies in depth, so you can start building a data analysis subsystem from scratch; without this book, though, you may lack the enterprise's point of view, which is much more closely related to data architecture and data policies.

Although the implementer's approach is not fully covered, you will be able to understand how analytical demands can be satisfied using the programming languages Python and R in particular, given their speed of execution, numerical analysis capabilities and cross-platform support. Each chapter contains both Philipp K. Janert's professional experience and the core programming snippets that turn those concepts into a programming asset.

In conclusion, while this book will not guide you through developing a data analysis tool with all the specific programming details of Python and R, you will gain valuable professional insight for designing data analysis strategies, architectures and policies.

This review was written as part of the O'Reilly Bloggers Review Program.
