Context
- Splunk : Splunk Light 6.2.6
- OS : OS X 10.10.3 (Yosemite)
Purpose
Properly index a multi-event JSON file with Splunk
Source JSON
My file was a single line of JSON; it is pretty-printed here only to make it easier to read.
```json
{
    "next_page": null,
    "comments": [
        {
            "created_at": 1419252704.528995,
            "replies": [],
            "is_deleted": 0,
            "id": 22,
            "user": {
                "city": "Denton",
                "id": 4616,
                "name": "username2."
            },
            "text": "Suggestions, dear?"
        },
        {
            "created_at": 1419254321.088771,
            "replies": [],
            "is_deleted": 0,
            "id": 23,
            "user": {
                "city": "Glendale",
                "id": 2200,
                "name": "username"
            },
            "text": "This is my text"
        }
    ]
}
```
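Before touching props.conf, it helps to pin down what "indexed properly" means for this file: one event per element of the comments array, without the next_page wrapper around it. Here is a minimal Python sketch of that target. It uses only the standard json module, and the variable names (SAMPLE, expected_events) are my own.

```python
import json

# The sample file from above, pretty-printed for readability
# (the real file is a single line).
SAMPLE = """
{
    "next_page": null,
    "comments": [
        {"created_at": 1419252704.528995, "replies": [], "is_deleted": 0, "id": 22,
         "user": {"city": "Denton", "id": 4616, "name": "username2."},
         "text": "Suggestions, dear?"},
        {"created_at": 1419254321.088771, "replies": [], "is_deleted": 0, "id": 23,
         "user": {"city": "Glendale", "id": 2200, "name": "username"},
         "text": "This is my text"}
    ]
}
"""

doc = json.loads(SAMPLE)

# Target: one event per element of "comments", with the wrapper dropped.
expected_events = [json.dumps(comment) for comment in doc["comments"]]

for event in expected_events:
    print(event)
```

The events Splunk produces with the configuration below can be compared against these.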
props.conf file
```
[json-comment]
pulldown_type = true
KV_MODE = json
description = For indexing JSON comments
category = Structured
TIME_PREFIX = \"created_at\"\:\s
TRUNCATE = 0
LINE_BREAKER = }(,)\s{
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
SEDCMD-remove_header = s/.*\"comments\":\s\[//g
TIME_FORMAT = %10s
disabled = false
SEDCMD-add_closing_bracket = s/\s$/ }/g
crcSalt = <SOURCE>
SEDCMD-correctly-close = s/\}\s\]\s\}/}/g
```
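One rough way to sanity-check this stanza outside Splunk is to replay LINE_BREAKER and the three SEDCMD expressions in Python on the sample. This is only a sketch built on several assumptions:

- It follows the documented LINE_BREAKER behaviour. The text of the first capture group is dropped, whatever precedes it ends one event, and whatever follows it starts the next.
- It applies the SEDCMDs in the order they appear in the stanza. For this sample, the order does not change the result.
- Python's re stands in for Splunk's regex engine.
- The single-line file keeps a space between tokens such as '} ] }'.

It also ignores every other setting. The helper names are hypothetical.

```python
import json
import re

# Hypothetical single-line version of the sample file.
RAW = (
    '{ "next_page": null, "comments": [ '
    '{ "created_at": 1419252704.528995, "replies": [], "is_deleted": 0, "id": 22, '
    '"user": { "city": "Denton", "id": 4616, "name": "username2." }, '
    '"text": "Suggestions, dear?" }, '
    '{ "created_at": 1419254321.088771, "replies": [], "is_deleted": 0, "id": 23, '
    '"user": { "city": "Glendale", "id": 2200, "name": "username" }, '
    '"text": "This is my text" } ] }'
)

# Python port of LINE_BREAKER = }(,)\s{
LINE_BREAKER = re.compile(r'\}(,)\s\{')

# Python ports of the SEDCMD expressions, in stanza order.
SEDCMDS = [
    (r'.*"comments":\s\[', ''),   # SEDCMD-remove_header
    (r'\s$', ' }'),               # SEDCMD-add_closing_bracket
    (r'\}\s\]\s\}', '}'),         # SEDCMD-correctly-close
]

def break_events(raw):
    """Split raw text at each LINE_BREAKER match, dropping the first capture group."""
    events, start = [], 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])  # text before ',' ends this event
        start = match.end(1)                      # text after ',' starts the next one
    events.append(raw[start:])
    return events

def apply_sedcmds(event):
    """Apply every SEDCMD substitution to one event."""
    for pattern, replacement in SEDCMDS:
        event = re.sub(pattern, replacement, event)
    return event

events = [apply_sedcmds(e) for e in break_events(RAW)]
for event in events:
    doc = json.loads(event)  # raises an error if the event is not valid JSON
    print(doc["id"], doc["text"])
```

If every event parses, the expressions at least produce well-formed JSON for this input. A real test still has to happen in Splunk itself.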
- TIME_PREFIX : Regular expression that tells Splunk where to find the timestamp of the event. In my case it reads: “a double quote, followed by ‘created_at’, followed by a double quote, a colon and finally a whitespace character”.
- TRUNCATE : Don’t truncate lines after reaching a length limit (by default Splunk truncates at 10000 bytes). I needed this because my whole file was a single JSON line.
- LINE_BREAKER : Regular expression that tells Splunk where to split the file into separate events. In the example above, that is between ‘}’ and ‘{’ in ‘}, {’, where one comment ends and the next one begins. Notice that the capture group only contains ‘,’. According to the props.conf documentation, the text matched by the first capture group is discarded: everything before it ends the previous event, and everything after it starts the next one.
- SHOULD_LINEMERGE : Don’t merge lines back together into multi-line events; LINE_BREAKER alone decides where each event begins and ends.
- SEDCMD-remove_header : SED expression that removes everything up to and including ‘"comments": [’, i.e. the JSON wrapper in front of the first comment.
- TIME_FORMAT : Tells Splunk which format to expect when parsing the value of ‘created_at’.
- SEDCMD-add_closing_bracket : Replaces a whitespace character at the very end of an event with ‘ }’, to add back a closing ‘}’ where an event ends without one.
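- crcSalt : Tells Splunk to add the full path of the file to the checksum it uses to decide whether a file has already been indexed, so uniqueness is established by filename as well as by content. This is safe for me because I have no log rotation in place; with rotation, a renamed file would be indexed again. Note that crcSalt is documented as an inputs.conf setting (in the monitor stanza), not a props.conf one.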
- SEDCMD-correctly-close : Since SEDCMD-remove_header cleaned up the beginning of the JSON, we also have to clean up its end. In plain English, it says: “Replace ‘} ] }’ with a single ‘}’”.
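TIME_PREFIX and TIME_FORMAT work together. Splunk looks for the prefix regex, then parses the text that follows it with the format. The Python sketch below mimics that for a single event. It is an approximation under stated assumptions:

- Python's re stands in for Splunk's regex engine.
- event_time is a hypothetical helper.
- %10s is read as "ten digits of Unix epoch seconds". That reading is my interpretation and is not confirmed here.

```python
import re
from datetime import datetime, timezone

# Python port of TIME_PREFIX = \"created_at\"\:\s from the stanza above.
TIME_PREFIX = re.compile(r'"created_at":\s')

def event_time(event):
    """Hypothetical helper: return the UTC timestamp that follows TIME_PREFIX, or None."""
    match = TIME_PREFIX.search(event)
    if match is None:
        return None
    # Assumption: TIME_FORMAT = %10s means "10 digits of Unix epoch seconds",
    # so the fractional part (e.g. .528995) is ignored here.
    digits = event[match.end():match.end() + 10]
    return datetime.fromtimestamp(int(digits), tz=timezone.utc)

# One of the events produced from the sample file.
event = '{ "created_at": 1419252704.528995, "replies": [], "is_deleted": 0, "id": 22 }'
print(event_time(event).isoformat())  # 2014-12-22T12:51:44+00:00
```

Comparing this with the _time Splunk shows for the same comment is a quick way to confirm that the prefix and format were picked up.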