Sunday, September 16, 2012

Big Data, Yay!

A couple of weeks ago I attended a few presentations about "Big Data." Apparently, the huge amount of data being created and recorded thanks to recent technological advances is just sort of hanging out, waiting to be interpreted. Scholars who are more savvy with software and programming are trying to figure out ways to translate all of that information and present it to humanities and social science scholars. Admittedly, a lot of it was over my head and not all of it was of direct use, but Marc Smith from the Social Media Research Foundation presented some interesting tools.

NodeXL is one of the programs developed to help researchers gather and interpret data from social networks. Its Graph Gallery is made up of data sets that, if I understand correctly, are collected and submitted by users to make their information available to others. The gallery gives some great examples of how social networking can be examined quantitatively. Here's a graph of Twitter users whose tweets mentioned BGSU, along with the links between those users:


Now, obviously this is a little bit hard for some of us to interpret, so information about how to read the graph is also included:
The graph represents a network of 206 Twitter users whose recent tweets contained "bgsu". The network was obtained on Thursday, 06 September 2012 at 00:27 UTC. There is an edge for each follows relationship. There is an edge for each "replies-to" relationship in a tweet. There is an edge for each "mentions" relationship in a tweet. There is a self-loop edge for each tweet that is not a "replies-to" or "mentions". The tweets were made over the 8-hour, 34-minute period from Wednesday, 05 September 2012 at 15:46 UTC to Thursday, 06 September 2012 at 00:21 UTC...
The edge colors are based on relationship values. The edge widths are based on edge weight values. The edge opacities are based on edge weight values. The vertex sizes are based on followers values. The vertex opacities are based on followers values.
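Those rules boil down to a fairly simple recipe for turning tweets into a network. Here is a minimal Python sketch of that recipe as I understand it. It is not NodeXL's actual code: the `Tweet` record, `build_edges` and the "tweet" label for self-loops are all made-up names, and I'm assuming that an edge's weight is just the number of times the same relationship repeats between the same two users (which would be what drives the edge widths and opacities).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tweet:
    # Hypothetical, simplified tweet record (not a Twitter or NodeXL type)
    author: str
    replies_to: Optional[str] = None
    mentions: List[str] = field(default_factory=list)


def build_edges(tweets, follows):
    """Return a Counter mapping (source, target, relationship) to a weight,
    where the weight is how many times that relationship occurred."""
    edges = Counter()
    # One edge per "follows" relationship, given as (follower, followed) pairs
    for follower, followed in follows:
        edges[(follower, followed, "follows")] += 1
    for t in tweets:
        # One edge per "replies-to" relationship in a tweet
        if t.replies_to:
            edges[(t.author, t.replies_to, "replies-to")] += 1
        # One edge per "mentions" relationship in a tweet
        for name in t.mentions:
            edges[(t.author, name, "mentions")] += 1
        # A tweet that neither replies to nor mentions anyone is a self-loop
        if not t.replies_to and not t.mentions:
            edges[(t.author, t.author, "tweet")] += 1
    return edges


# Made-up example data
tweets = [
    Tweet("alice", mentions=["bob"]),
    Tweet("alice", mentions=["bob"]),
    Tweet("bob", replies_to="alice"),
    Tweet("carol"),
]
edges = build_edges(tweets, follows=[("carol", "alice")])
for (src, dst, rel), weight in edges.items():
    print(f"{src} -> {dst} [{rel}] weight={weight}")
```

Here alice mentions bob twice, so that single "mentions" edge gets a weight of 2 instead of showing up as two separate lines, while carol's plain tweet becomes a loop back to carol.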


There is also a list of the top users in terms of centrality and mentions. So how could this serve my own interests? Overall, I think that accounting for the connections between users, or for the circulation of imagery, might give me a better idea of what is going on below the surface. If I could understand the social network analytics, I might be able to make inferences about the function and use of imagery such as thinspiration. I'm not sure NodeXL is set up for Tumblr yet, but I am certain I can find some way to incorporate this into my independent study and thesis work.
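The "top users" lists could be approximated along these lines. The gallery description doesn't say which centrality measure it ranks by, so this Python sketch assumes the simplest one, degree centrality: the share of the other users that someone is directly connected to, ignoring direction and self-loops. It also counts how often each user is mentioned. The function name `top_users`, the (source, target, relationship) triples and the user names are all hypothetical, made up for illustration.

```python
from collections import Counter, defaultdict


def top_users(edges, k=3):
    """edges: a list of (source, target, relationship) triples.
    Returns the top k users by degree centrality and by times mentioned."""
    neighbors = defaultdict(set)
    mentioned = Counter()
    users = set()
    for src, dst, rel in edges:
        users.update((src, dst))
        if src != dst:  # self-loops don't connect a user to anyone else
            neighbors[src].add(dst)
            neighbors[dst].add(src)
        if rel == "mentions":
            mentioned[dst] += 1
    n = len(users)
    # Degree centrality: distinct neighbors divided by the n - 1 possible ones
    centrality = {u: len(neighbors[u]) / (n - 1) if n > 1 else 0.0 for u in users}
    # Highest score first; ties broken alphabetically so the order is stable
    by_centrality = sorted(centrality.items(), key=lambda item: (-item[1], item[0]))
    return by_centrality[:k], mentioned.most_common(k)


# Made-up example network
edges = [
    ("alice", "bob", "mentions"),
    ("carol", "bob", "mentions"),
    ("dave", "bob", "mentions"),
    ("bob", "alice", "replies-to"),
    ("carol", "alice", "follows"),
    ("erin", "erin", "tweet"),
]
by_centrality, most_mentioned = top_users(edges)
print(by_centrality)
print(most_mentioned)
```

In this toy network bob comes out on top on both lists, since three different people mention him. A real analysis might prefer betweenness or another measure, which would reward users who bridge otherwise separate groups rather than just those with many connections.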
