Graph Analysis and Visualization
Authors: Richard Brath and David Jonker
Reviewed by: Kay Ewbank
The title of this book may suggest heavy numerical analysis, but its subtitle, Discovering Business Opportunity in Linked Data, gives a clearer idea of what it covers – graphs and how they can be used to solve business problems. The graphs in question aren’t bar or line charts; they’re node-link diagrams that the authors use to present a structured representation of connected things and how they’re related.
The book opens with a good explanation of why graphs are useful and what the different elements of node-link graphs are. The authors then go on to look at different types of graphs and how you can use them to visualize data. The classic Fisher’s Iris data set is used to illustrate relationships and hierarchies. Other ideas, including communities, flows and spatial networks, are shown using other interesting data sets.
The next part of the book looks at process and tools: the steps involved in taking raw data and transforming it into something that can be viewed using a graph data set. There’s a chapter on collecting, cleaning and connecting the data before the authors move on to discussing stats and layout. The stats mentioned here are the basic graph statistics – the number of nodes and edges, density, number of components, etc.
Visual attributes – colors, line widths, labels and so on – are the topic of the next chapter. If this sounds a bit simple for a whole chapter, the authors point out that there’s some hard science behind getting it right. The next chapter looks at how you can turn a graph that looks like a plate of spaghetti into something more meaningful using filters and selection.
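
For readers who haven’t met the basic graph statistics mentioned above, here is a minimal sketch in Python of how they might be computed. It is not code from the book: the function name `graph_stats`, the `extra_nodes` parameter and the sample edge list are all made up for illustration, and it assumes a simple undirected graph with no self-loops.

```python
from collections import defaultdict


def graph_stats(edges, extra_nodes=()):
    """Basic statistics for a simple undirected graph given as (u, v) pairs.

    Illustrative sketch only: assumes no self-loops; duplicate edges are
    collapsed. `extra_nodes` lists isolated nodes that appear in no edge.
    """
    adjacency = defaultdict(set)
    for node in extra_nodes:
        adjacency[node]  # touching the key registers an isolated node
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    adjacency = dict(adjacency)  # freeze: no new keys from here on

    n = len(adjacency)
    # Each undirected edge appears in two neighbour sets.
    m = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    # Density: edges present divided by edges possible, n * (n - 1) / 2.
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    # Count connected components with an iterative depth-first search.
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)

    return {"nodes": n, "edges": m, "density": density, "components": components}


# Hypothetical sample: a triangle (a, b, c) plus a separate pair (d, e).
sample_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]
stats = graph_stats(sample_edges)
print(stats)
```

With this sample the graph has five nodes and four edges out of a possible ten, so the density is 0.4, and the triangle and the pair form two components. Graph libraries offer the same measures ready-made; the hand-rolled version is only meant to show how little is hiding behind the terms.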
By Chapter 7, Brath and Jonker move on to discussing the point-and-click graph tools you might use, including Excel, NodeXL, Gephi, Cytoscape, and yEd. A chapter titled ‘lightweight programming’ gives examples of how you can go further using Python and JavaScript, with sample code.
The next four chapters look at different aspects of visual analysis of graphs, starting with relationships, and the way multiple links between nodes can be used to find patterns such as anomalies in the data for fraud detection or cybersecurity. The use of hierarchies for decision trees, and how you can extract hierarchies from more complex graphs, is next on the agenda, followed by a good chapter on communities and how they can be used to give a high-level view of your data. The chapter on flow visualization uses route data from Napoleon’s 1812 campaign against Russia to show that route and traffic data, while not actually a graph, can still be used to describe the structure and state of a system. This section of the book closes with a chapter on spatial networks, where the nodes have physical characteristics that need to be preserved because the layout is part of the information – electric power networks or oil pipelines, for example.
The final section of the book is on advanced techniques, with chapters introducing analysis of big data using graph databases and Gremlin; dynamic graphs containing a time dimension, so the graph changes depending on when you view it; and graph design – how to make your graphs understandable by changing the design.
This is an interesting read, and the authors write clearly and understandably. This isn’t a book that teaches lots of advanced techniques; instead, it’s more about the philosophy of using graphs to analyze data, and as such it works well. It doesn’t have heavy-duty numerical analysis or statistics, nor does it go into depth on big data or programming.
I got the feeling the authors were trying to impart what their experience had taught them about why some graphs work to tell a story while others are just lines and boxes.
Last Updated (Saturday, 28 April 2018)