Text Mining – THATCamp Wellington 2013
http://wellington2013.thatcamp.org

Very rough notes from the Digitising NZ books & Audience session
http://wellington2013.thatcamp.org/2013/11/28/very-rough-notes-from-the-digitising-nz-books-audience-session/
Thu, 28 Nov 2013 04:25:28 +0000

What follows are rough minutes from this session. Apologies if anything is wrong or misleading – I did not recognise half of it 🙂

In 1950-59 there were 4037 publications in NZ (a publication being a pamphlet of 5+ pages, or a novel). Where are they, how do we get hold of them, and can they be freed?

  • Publications NZ – has all of the things, in MARC. Federated to WorldCat. US copyright law restricts stuff since 1870.
  • Some stuff will need consultation with Iwi
  • Reclaiming New Zealand’s Digitised Heritage project – Kiwi Alex
  • Today’s bibliographies from libraries may not match online or physical storage. Stuff can be in the stack, lost, moved, destroyed, in storage, etc. It may be possible to use cross-catalogue or cross-media data to track books down (e.g. mentions in Papers Past).
  • Maybe leave out unpublished data
  • How do we get institutions to lend the books to digitise? NL (National Library) won’t lend out valuable material without conservator reporting.
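
One way the cross-catalogue idea above might be sketched: normalise titles so that differences in case, punctuation, and macrons do not block a match, then join two catalogue extracts on title plus year. Everything here is an assumption for illustration. The record fields (`id`, `title`, `year`), the function names, and the sample records are made up, and are not taken from Publications NZ, Papers Past, or any real catalogue.

```python
import re
import unicodedata


def normalise_title(title):
    """Lower-case a title, strip diacritics (e.g. macrons) and punctuation."""
    decomposed = unicodedata.normalize("NFKD", title)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Collapse every run of non-alphanumeric characters into a single space.
    return re.sub(r"[^a-z0-9]+", " ", without_marks.lower()).strip()


def match_catalogues(catalogue_a, catalogue_b):
    """Return (id_a, id_b) pairs whose normalised title and year both agree."""
    index = {}
    for record in catalogue_b:
        key = (normalise_title(record["title"]), record["year"])
        index.setdefault(key, []).append(record)

    matches = []
    for record in catalogue_a:
        key = (normalise_title(record["title"]), record["year"])
        for other in index.get(key, []):
            matches.append((record["id"], other["id"]))
    return matches


if __name__ == "__main__":
    # Placeholder records, invented for this sketch.
    library_list = [
        {"id": "A1", "title": "Example Title: A Pamphlet", "year": 1955},
    ]
    other_source = [
        {"id": "B7", "title": "example title a pamphlet", "year": 1955},
        {"id": "B8", "title": "Example title a pamphlet", "year": 1956},
    ]
    print(match_catalogues(library_list, other_source))
```

Exact matching on a normalised key is deliberately crude. Real catalogue data would probably need fuzzy matching and manual review of near misses.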

What format, and how do we make it text-searchable?

  • Agree on something like METS/ALTO and DC (Dublin Core), and federate with OAI-PMH and/or use DigitalNZ.
  • NL scanned 300dpi colour TIFF images per page, into PDF with page image + OCR.
  • e-books in EPUB, which is (more or less) zipped HTML. Kindle uses MobiPocket, another format based on Open eBook.
  • OPDS is a syndication format – like RSS but for e-books.
  • stats.govt.nz/ ← fully searchable open access (XML) yearbook data.
  • TEI XML can be huge and possibly redundant for many use cases, but can be used to embed contextual semantic data – www.tei-c.org/
  • Project Gutenberg offers many formats, but some are auto-generated from a master
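
As a rough sketch of the first suggestion in the list above (Dublin Core records federated over OAI-PMH), the snippet below builds a harvesting request URL and a minimal `oai_dc` record. `verb=ListRecords` and `metadataPrefix=oai_dc` are standard OAI-PMH request arguments, and the two namespace URIs are the published ones. The base URL `https://example.org/oai`, the function names, and the record values are placeholders invented for illustration, not a real endpoint or a real book.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def dublin_core_record(fields):
    """Serialise a dict of simple Dublin Core fields as an oai_dc element."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for name, value in fields.items():
        ET.SubElement(root, "{%s}%s" % (DC_NS, name)).text = value
    return ET.tostring(root, encoding="unicode")


def read_titles(record_xml):
    """Pull every dc:title value back out of a serialised record."""
    root = ET.fromstring(record_xml)
    return [el.text for el in root.findall("{%s}title" % DC_NS)]


if __name__ == "__main__":
    # Hypothetical endpoint and placeholder metadata.
    print(list_records_url("https://example.org/oai"))
    record = dublin_core_record({"title": "Placeholder title", "date": "1955"})
    print(record)
    print(read_titles(record))
```

Dublin Core only carries the descriptive metadata. The page images and OCR text would still need something like METS/ALTO alongside it.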

How do we make it available online?

  • Some data will be very specific and of little commercial or even research value.
  • Others will have high commercial value

Some have been digitised already.

  • Digitisation efforts are already under way at National Library.
  • Some stuff is in Google Books/HathiTrust, but sourced from the US (little comms with Nat Lib NZ); it can be tricky to get stuff from them
  • Some is public domain, some copyrighted

Do we want to cover periodicals?

  • Quite probably yes!
  • RILM are going about digitising.

Who will host it, who will “own” it and maintain it over time?

  • National Library seems the most sensible fit
  • Public/Private collaboration; should we delete data that is objected to by one or two stakeholders, or preserve but restrict public access?
    • Reliance on corporate law and upholding contracts
    • Could the private organisations be not-for-profit, or similarly chartered?
    • Retain public ownership

Audience

Where is the data for matching your message with the right audience and the platform they tend to use? For example, for 16-year-olds we need Facebook, but not so much for retirees?!

  • Media studies – in primary and secondary education, there are “bring your own device” initiatives which may have good data.
  • Sometimes lack of demand is because people don’t know it’s available and/or don’t know they’re looking for it
  • Local information could be sliced-and-diced by locality, person, and so on (semantic metadata) and be highly relevant to the punters.
What do you want to learn more about?
http://wellington2013.thatcamp.org/2013/09/12/learn/
Thu, 12 Sep 2013 23:36:45 +0000

Welcome to the first keen bean campers, who have shared why they’re looking forward to THATCamp Wellington 2013. Perhaps you share similar interests…

  • Data visualisation
  • Social networks
  • Text mining
  • Ontologies
  • Access
  • Crowdsourcing
  • Hacking/coding for newbies
  • Hear about new projects
  • Innovation in the humanities
  • New digital tools and technologies

What would you like to add to the mix? Share your ideas when you sign up here.

W13 is being held on Thursday 28 November, in the Railway Building on Pipitea Campus, Victoria University of Wellington (Bunny Street).

Thanks to the generosity of our major sponsor InternetNZ, and the help of Victoria University of Wellington and Wai-te-ata Press, we’re able to keep registration to a budget-friendly $25. This will be collected on the day, and invoices and receipts can be arranged then too.

We’ll be rolling up our sleeves for a full day, from around 8.30am to 5.30pm, with post-match drinks to follow. The schedule is what we make it, but the day will roughly break down like this.

Once you’ve signed up, start thinking about the kind of session you’re going to propose. To help you get started, check out these tips.

Questions? Ideas? Drop me a line any time at thatcampwgtn@gmail.com or @thatcampwgtn
