Logging, Storing, Processing, Correlating
A scalable logging infrastructure for the enterprise (with all the bells and whistles)
Mario Schrön & Emre Bastuz

Log data - treasure and burden

A treasure ... right!
Log data is invaluable for
- Troubleshooting purposes
- Statistics and report creation
- Event detection (security events, outages, ...)

Definitely a pain
- In an enterprise environment the storage required for the logs is tremendous
- Processing the data is not easy due to the high volume
- Querying the data is even harder

Legacy approaches
- „Let's put it all in a relational database and access it with a PHP web interface“: been there, seen it, might or might not work
- „Let's put it all in flat files and process them with a Perl script (overnight)“: scales far better, but maintaining this programmatic approach is hard
- „Vendor XYZ has implemented a logging server for their product“: they all do, but the solutions are specific to a certain technology. Having to maintain logging systems for x vendors also does not scale

Wouldn't it be cool if we had a logging infrastructure with ...
- ... scalable storage
- ... scalable log processing capacity
- ... an interface for querying the data
- ... an interface for correlating the data
... and all of that usable with many device types and device vendors?

Open source to the rescue
There are some kick-ass technologies out there
- Flume: log collection
- Cassandra: data storage
- Solr: data indexing and searching
- OSSIM: normalization and correlation

Open source to the rescue
There are some kick-ass technologies out there
- Pig & PigLatin

What is „Flume“ anyway?
- Flume is an open source project, implementing a distributed logging system with no single point of failure
- For further details please see:

What is „Cassandra“ anyway?
- Cassandra is an open source project, implementing the concept of a distributed NoSQL database
- It has been donated by Facebook to the public
- It's extremely „cloudish“
- It's a multi-master, massively scalable implementation
- It's pretty state of the art
- It's cool
- For further details please see:

What is „Solr“ anyway?
- Solr is an open source project, implementing an enterprise-class search engine
- It's cool too
- For further details please see:

What is „OSSIM“ anyway?
- OSSIM is an open source project, implementing a security information and event management system, including normalization and correlation
- Includes support for log normalization for a wide range of technologies
- For further details please see:

What is „Pig“ anyway?
- Pig is an interface for executing queries against cluster storage
- PigLatin is a minimalistic programming language for specifying queries
- For further details please see:

Let's build this!

[Architecture diagram. Progress: Finished 0%. Labels: Replication, Replication, Master: Flume, Solr: Search, Internet]

Flume Architecture
Overview

Flume Architecture
Configuration

Cassandra Intro
What it's not

Cassandra Intro
What it's not
    select p.title, u.username
    from Post p, User u
    where p.AuthorID = u.ID and u.username = 'hans'

Cassandra Intro
Key-Value Data Model

A Column
    { // this is a column
      name: "emailAddress",
      value: "[email protected]",
      timestamp: 123456789
    }

A SuperColumn
    { // this is a SuperColumn
      name: "homeAddress",
      // with an infinite list of Columns
      value: {
        // note: the key is the name of the Column
        street: {name: "street", value: "1234 x street", timestamp: 123456789},
        city: {name: "city", value: "san francisco", timestamp: 123456789},
        zip: {name: "zip", value: "94107", timestamp: 123456789},
      }
    }
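The Column / SuperColumn nesting can also be written down as plain data structures. The following is a minimal Python sketch of the model only; the class names and the `add` helper are assumptions made for illustration and are not part of any Cassandra client API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    """Smallest unit of data: a name, a value and a write timestamp."""
    name: str
    value: str
    timestamp: int


@dataclass
class SuperColumn:
    """A named container whose value is a map of Columns."""
    name: str
    value: Dict[str, Column] = field(default_factory=dict)

    def add(self, column: Column) -> None:
        # Hypothetical helper: the map key is the name of the Column.
        self.value[column.name] = column


home = SuperColumn(name="homeAddress")
for col_name, col_value in [("street", "1234 x street"),
                            ("city", "san francisco"),
                            ("zip", "94107")]:
    home.add(Column(name=col_name, value=col_value, timestamp=123456789))

# Lookup is by key only: SuperColumn name, then Column name.
print(home.value["city"].value)
```

Note that there is no join and no ad-hoc WHERE clause here: data is reached by following keys, which is the trade-off the "What it's not" slide points at.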

Cassandra
Conclusion
- Relational databases are more complex but more flexible
- NoSQL is simpler and more scalable
- SQL vs. NoSQL = Flexibility vs. Scalability

Solr
Indexing and Searching

Normalization and OSSIM
- OSSIM uses a relational database for log storage
- Log data is split into different columns and saved in the DB
- The fields are all the same for different kinds of technologies

OSSIM DB Schema

mysql> use ossim;
Database changed
mysql> describe event;
+-----------------+------------------+------+-----+-------------------+-----------------------------+
| Field           | Type             | Null | Key | Default           | Extra                       |
+-----------------+------------------+------+-----+-------------------+-----------------------------+
| id              | bigint(20)       | NO   | PRI | NULL              |                             |
| timestamp       | timestamp        | NO   | MUL | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| tzone           | float            | NO   |     | 0                 |                             |
| sensor          | text             | NO   |     | NULL              |                             |
| interface       | text             | NO   |     | NULL              |                             |
| type            | int(11)          | NO   |     | NULL              |                             |
| ...
| protocol        | int(11)          | YES  |     | NULL              |                             |
| src_ip          | int(10) unsigned | YES  |     | NULL              |                             |
| dst_ip          | int(10) unsigned | YES  |     | NULL              |                             |
| src_port        | int(11)          | YES  |     | NULL              |                             |
| dst_port        | int(11)          | YES  |     | NULL              |                             |
| event_condition | int(11)          | YES  |     | NULL              |                             |
| value           | text             | YES  |     | NULL              |                             |
| ...
| filename        | text             | YES  |     | NULL              |                             |
| username        | text             | YES  |     | NULL              |                             |
| password        | text             | YES  |     | NULL              |                             |
| userdata1       | text             | YES  |     | NULL              |                             |
| userdata2       | text             | YES  |     | NULL              |                             |
| userdata3       | text             | YES  |     | NULL              |                             |
| userdata4       | text             | YES  |     | NULL              |                             |
| userdata5       | text             | YES  |     | NULL              |                             |
| userdata6       | text             | YES  |     | NULL              |                             |
| userdata7       | text             | YES  |     | NULL              |                             |
| userdata8       | text             | YES  |     | NULL              |                             |
| userdata9       | text             | YES  |     | NULL              |                             |
| ...
+-----------------+------------------+------+-----+-------------------+-----------------------------+
40 rows in set (0.00 sec)
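To make "split into columns" concrete, here is a minimal Python sketch of normalizing one log line into a few of the event fields. The log line format, the regular expression and the function names are invented for this illustration and are not an actual OSSIM plugin rule; only the target field names and the integer representation of IP addresses follow the schema above.

```python
import ipaddress
import re

# Hypothetical firewall log format, invented for this sketch.
FIREWALL_LINE = re.compile(
    r"(?P<sensor>\S+) DROP proto=(?P<protocol>\d+) "
    r"src=(?P<src_ip>[\d.]+):(?P<src_port>\d+) "
    r"dst=(?P<dst_ip>[\d.]+):(?P<dst_port>\d+)"
)


def ip_to_int(address: str) -> int:
    """Convert a dotted IPv4 address to the unsigned integer stored in the DB."""
    return int(ipaddress.IPv4Address(address))


def normalize(line: str) -> dict:
    """Map one raw log line onto technology-independent event fields."""
    match = FIREWALL_LINE.search(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    fields = match.groupdict()
    return {
        "sensor": fields["sensor"],
        "protocol": int(fields["protocol"]),
        "src_ip": ip_to_int(fields["src_ip"]),
        "dst_ip": ip_to_int(fields["dst_ip"]),
        "src_port": int(fields["src_port"]),
        "dst_port": int(fields["dst_port"]),
    }


event = normalize("fw01 DROP proto=6 src=192.168.0.1:51515 dst=10.0.0.5:22")
print(event)
```

A second technology would need its own pattern but would fill the same dictionary keys, which is what lets correlation rules work across vendors.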

OSSIM Plugin Config
- OSSIM uses the concept of plugins
- Each plugin has a config file and data source
- Many plugins with a data source „log“ are available:

# cd /etc/ossim/agent/plugins
# grep "source=log" * | wc -l
118
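The same count can be reproduced without the shell. This is a small Python sketch under stated assumptions: the search string is passed in as a parameter, only `*.cfg` files are scanned (the shell glob `*` would scan every file), and the demo directory with its three files and their contents is made up so the snippet runs anywhere.

```python
import tempfile
from pathlib import Path


def count_matching_lines(plugin_dir: Path, needle: str) -> int:
    """Count lines containing `needle` across all .cfg files (like grep | wc -l)."""
    total = 0
    for path in sorted(plugin_dir.glob("*.cfg")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            total += sum(1 for line in handle if needle in line)
    return total


# Hypothetical plugin directory, created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "apache.cfg").write_text("[config]\nsource=log\n")
    (demo_dir / "pf.cfg").write_text("[config]\nsource=log\n")
    (demo_dir / "other.cfg").write_text("[config]\nsource=database\n")
    demo_count = count_matching_lines(demo_dir, "source=log")

print(demo_count)
```

On a real installation the function would be pointed at the plugin directory shown above instead of the temporary one.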

OSSIM & Logs
Supported Technologies

# grep "source=log" *
apache.cfg:source=log
bluecoat.cfg:source=log
cisco-asa.cfg:source=log
cisco-ids.cfg:source=log
cisco-ips-syslog.cfg:source=log
cisco-nexus-nx-os.cfg:source=log
cisco-pix.cfg:source=log
cisco-router.cfg:source=log
cisco-vpn.cfg:source=log
f5.cfg:source=log
juniper-srx.cfg:source=log
juniper-vpn.cfg:source=log
nagios.cfg:source=log
netscreen-firewall.cfg:source=log
netscreen-igs.cfg:source=log
netscreen-manager.cfg:source=log
netscreen-nsm.cfg:source=log
pf.cfg:source=log
postfix.cfg:source=log
tarantella.cfg:source=log
tippingpoint.cfg:source=log
trendmicro.cfg:source=log
... and many many more

OSSIM
Mapping of Log to DB Schema

Pig
Example Std. Query

rows = LOAD 'cassandra://Keyspace1/FlumeData' USING CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
counted = foreach (group rows all) generate COUNT($1);
dump counted;
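For readers new to PigLatin, the query groups every row into a single bag and counts the bag's members. The Python sketch below mirrors that logic on a small in-memory list; the sample rows are made up, and nothing here talks to Cassandra or Pig.

```python
# Hypothetical rows in the shape declared by the LOAD statement:
# (key, columns), where columns is a bag of (name, value) tuples.
rows = [
    ("row1", [("host", "web01"), ("body", "GET /index.html")]),
    ("row2", [("host", "web02"), ("body", "GET /favicon.ico")]),
    ("row3", [("host", "fw01"), ("body", "DROP tcp 10.0.0.5:22")]),
]

# "group rows all" produces exactly one group, named "all",
# holding every input row in a single bag.
grouped = [("all", list(rows))]

# "generate COUNT($1)" counts the tuples in field 1 (the bag) of each group.
# Pig's COUNT skips tuples whose first field is null, hence the filter.
counted = [
    (sum(1 for row in bag if row[0] is not None),)
    for _group_name, bag in grouped
]

# "dump counted" prints the resulting relation.
for result in counted:
    print(result)
```

On a cluster, Pig compiles the same three statements into MapReduce jobs, so the count scales with the number of nodes rather than with the memory of one machine.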

Pig
UDF based Query


Thank you! :-)