Our next meetup will be held on September 26. The event is free, but we ask everyone to register in advance here.
The list of speakers has grown to include András Benczúr, laboratory head of the Data Mining and Search Group at MTA SZTAKI.
The LAWA Project: Towards a Virtual Web Observatory
The LAWA project on Longitudinal Analytics of Web Archive data builds an Internet-based experimental testbed for large-scale data analytics. Its focus is on developing a sustainable infrastructure, scalable methods, and easily usable software tools for aggregating, querying, and analyzing heterogeneous data at Internet scale for a deep understanding of Internet content characteristics (size, distribution, form, structure, evolution, dynamics).
I will show how far this (overly) ambitious project has led us, and what main achievements and blockers we have identified. Some of the first (but really preliminary) demos are already up at http://vwo.lawa-project.eu:8080/. Some limitations of current systems for distributed data analysis, especially of Hadoop, are in part resolved. However, archival institutions still lack an easy-to-deploy, high-quality, and stable Web-scale search solution. We are now trying to join forces with the Stratosphere project (www.stratosphere.eu), and also to scale the SZTAKI plagiarism detection service (www.kopi.sztaki.hu) over the BonFIRE experimental cloud.
Short bio
Andras Benczur received his Ph.D. in applied mathematics at the Massachusetts Institute of Technology in 1997. Since then he has been a researcher at the Institute for Computer Science and Control of the Hungarian Academy of Sciences (MTA SZTAKI), where he has headed the Informatics Laboratory of 30 researchers since 2008. The lab participates in international research and national industry projects in information retrieval and business intelligence. Among other honors, his research on Web information retrieval received a Yahoo! Faculty Research Grant, he led the winning team of KDD Cup 2007, and he organized the ECML/PKDD 2010 Discovery Challenge on Web Quality.