Today’s talk stems from a topic covered in my Digital Libraries module by Kalpana Shankar: the concept of “Big Data” and what it contains. Kalpana realised that it was something we could not handle, and she illustrated a model showing two subsets: 1) highly rated data, which is what we do in the module, and 2) open data, with access for all.
Then, a few days later, BBC’s Horizon series aired an episode called “Age of Big Data”. This amazing episode brought together all the threads within our Masters and really opened my eyes to a larger and more diverse set of skills.
This topic really got me thinking, so I dipped my toe into the stream of information that surrounds Big Data and found that Ireland is embracing this new idea, with many reports analysing the benefits. Worldwide, there is also an ongoing conversation about the topic.
The more I read about “Big Data”, the more I heard a small voice saying “Librarian, Librarian”. Along the route there were references to various professions, yet somehow librarians were not included. It felt like going back to a post from the start of term about how libraries and librarians are perceived. As Alan Barrett said, why do we call users “users”? They don’t use us; we give them the right information based on the skills we have. Who is using who?
Last week we had a class about digital libraries and how the concept bridges to digital repositories. Part of this was the topic of data sharing, which is the basis for this talk by Steve. The talk centres on a study that Steve and his colleagues are doing, which focuses on documents and documenting practices to better understand distributed scientific work.
Steve starts by asking: “SCIENCE”, what is it? He goes on to answer that it is economic, political, education, talent and knowledge work, and one way of looking at it, which I like, is as “one window into studying high end work”.
Next Steve asks who uses this information. Societies use science: the emigration form, for example, asks about disease. These are interesting ways of looking at the openness of one profession while linking it all back to our discipline. Steve notes that a scientific decision is a politically wise one!
So a new concept gets thrown into the mix: digital infrastructure! Steve notes that “everything we touch on is computation” and digitally enabled. He makes an amazing point about how this relates to us here in Ireland: the “virtual organisation” is different here and not really with us yet, because geographically we are small and can get to each other pretty quickly.
An example was given of how to “do good science”: a project was considered in which an MRI scan is done of every person in the world. For that you need a lot of MRI machines, and if it were possible, with 7 billion people in the world, that is BIG DATA.
Different terms are used for this, eScience or cyberinfrastructure, depending on the culture (America or Europe). Steve illustrates that this dialogue needs a lot of backup. He states that scientists need money to get them to talk to each other, and that science does not like to talk, so he proposes “lightweight organising” as a solution. You can check out the VOOS website, where they are trying to make science more effective.
Practice perspective: documenting practices. How, in the medical world, do they deal with documents, memos and reports? This is about the practice of scientists. Contextualise the way these people operate: how doctors, nurses and surgeons work on a day-to-day basis, and how they put together a set of tools. See what they do! These are really busy people!
The proposal here is a “see what they do” model. Steve asks the tough question: “How do you watch a virtual group?” All their documents are on a computer, and how they share or seek information is all done virtually. Embedded in this is the fact that social scientists are rarely studied.
What Steve found is very interesting. Most of the groups were one of two things: a pre-existing relationship, i.e. a friendship, or what Steve calls a “pedigree relationship”, such as with a lecturer or supervisor they had collaborated with on a project, gaining further skills on the back of the work done with them.
However, these groups or shared working teams only had a limited life span: collaborations lasted three years at most! Many times the project lost steam and was forgotten, or it was a funded project and there was no money left. Where is all this DATA? Who owns it? Who can reuse it?
The way similar people are sought out for these collaborations shows in how they are distributed: they were remarkably dispersed and rarely saw each other. However, they did use conferences and the like to find reasons to get together.
What do they do? Do they share literature that they found? NO! But they did write and collaborate on proposals. Even though they were working together, each had their own individual data. So where is the data? If it was all individual, with no data repositories, how would they find someone’s data in case of emergency?
Steve gives another slogan: “least effort to best share”. He shows how power and status come into play in these situations: a senior person will add parts to the shared work and it will be disjointed, but he knows the student further down the line will clean it up!
Another area of significant use they noted was the respondents’ digital infrastructure: they used email, always! Steve compares people’s digital infrastructure to a kitchen: “It’s like a kitchen: everyone has one, yet everyone’s is just different, the same items in the kitchen, just a different model!”
It was also mentioned that for America, CI (cyberinfrastructure) is a national priority. A COLLABORATION COOKBOOK! Questions need to be asked: who will own it, how will it be stored and how do you get access to it? Steve’s solution is to build more integrated tools and connect them together. You could OUTSOURCE! However, they have no concept of good data practice. Where is your data? Who could find it if you are no longer there?
Steve opened up a whole can of worms with this study, as it is so relevant to everything that is happening online at this moment. My computer has a lot of important data, my netbook has similar data plus a bit more from my undergrad, and my USB key is a whole other story. There is my data, but where is everything? Could my partner find my Capstone documents if I got sick, to give to my other group members so they can work on them? Take this blog, for example: where are all these posts going to go once I lose interest? Do I copy and paste them into a Word doc so I can keep them forever in another file, or print them out and keep them in a box?
All of the above needs to be addressed. Big Data is out there, and Kalpana is right: we don’t know how to handle it, as there are too many threads. We can’t connect them quickly enough as they grow too quickly, and we can’t get the whole world to stop creating data so we can “contain” it in order to get a “hold” on it.
So what do we do? Stay tuned, I am intrigued 🙂