# inferno

**Repository Path**: mirrors_mozilla/inferno

## Basic Information

- **Project Name**: inferno
- **Description**: INACTIVE - http://mzl.la/ghe-archive - A rule-based map-reduce scheduling framework
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-08-22
- **Last Updated**: 2026-03-21

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Inferno
=======

Inferno is:

1. A **query language** for large amounts of **structured text** (CSV, JSON, etc.).
2. A continuous and scheduled **map-reduce daemon** with an HTTP
interface that automatically launches map/reduce jobs to handle a
constant stream of incoming data.

Internally, Inferno uses [Disco](http://discoproject.org/) for launching
map-reduce jobs and operating on big data.

Inferno Query Language
----------------------

In its simplest form, you can think of Inferno as a query language for large
amounts of structured text, such as a CSV file or a file containing one valid
JSON object per line.  For example, consider the following list of people:

    {"first":"Homer", "last":"Simpson"}
    {"first":"Manjula", "last":"Nahasapeemapetilon"}
    {"first":"Herbert", "last":"Powell"}
    {"first":"Ruth", "last":"Powell"}
    {"first":"Bart", "last":"Simpson"}
    {"first":"Apu", "last":"Nahasapeemapetilon"}
    {"first":"Marge", "last":"Simpson"}
    {"first":"Janey", "last":"Powell"}
    {"first":"Maggie", "last":"Simpson"}
    {"first":"Sanjay", "last":"Nahasapeemapetilon"}
    {"first":"Lisa", "last":"Simpson"}
    {"first":"Maggie", "last":"Términos"}

If you had this same data in a database, you would just use SQL to query it.

    > SELECT last_name, COUNT(*) FROM users GROUP BY last_name;

    Nahasapeemapetilon, 3
    Powell, 3
    Simpson, 5
    Términos, 1

Or, if the data were small enough, you might just use command-line utilities
on a CSV copy of it (first name, last name):

    $ awk -F ',' '{print $2}' people.csv | sort | uniq -c

    3 Nahasapeemapetilon
    3 Powell
    5 Simpson
    1 Términos
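When the data fits on one machine, the same count is also a few lines of Python. This is a standalone sketch using only the standard library, not part of Inferno, and it reads the JSON records shown above directly instead of the CSV copy used by the awk pipeline.

```python
import json
from collections import Counter

# The sample records above, one JSON object per line.
PEOPLE = """\
{"first":"Homer", "last":"Simpson"}
{"first":"Manjula", "last":"Nahasapeemapetilon"}
{"first":"Herbert", "last":"Powell"}
{"first":"Ruth", "last":"Powell"}
{"first":"Bart", "last":"Simpson"}
{"first":"Apu", "last":"Nahasapeemapetilon"}
{"first":"Marge", "last":"Simpson"}
{"first":"Janey", "last":"Powell"}
{"first":"Maggie", "last":"Simpson"}
{"first":"Sanjay", "last":"Nahasapeemapetilon"}
{"first":"Lisa", "last":"Simpson"}
{"first":"Maggie", "last":"Términos"}
"""


def count_last_names(lines):
    """Count records per "last" value, ignoring blank lines."""
    return Counter(json.loads(line)["last"] for line in lines if line.strip())


counts = count_last_names(PEOPLE.splitlines())
# Print "count name" pairs sorted by last name, like the awk pipeline.
for last, n in sorted(counts.items()):
    print(n, last)
```

Everything here happens in memory in a single process, which is exactly what stops working at the data volumes described next.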

However, those methods do not necessarily scale when you are processing
terabytes of data per day.

Here's what a similar query looks like in Inferno.  Assuming that the input
data is stored in the Disco Distributed Filesystem (DDFS) under the
'example:chunk:users' tag, we create the following rule and put it in names.py:

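    from inferno.lib.rule import InfernoRule, chunk_json_keyset_stream

    def count(parts, params):
        # parts_preprocess step: tag each record with count=1 so the values can be summed per key
        parts['count'] = 1
        yield parts

    InfernoRule(
        name='last_names_json',
        source_tags=['example:chunk:users'],
        map_input_stream=chunk_json_keyset_stream,
        parts_preprocess=[count],
        key_parts=['last'],
        value_parts=['count'],
    )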

Then we query the data as follows:

    $ inferno -i names.last_names_json

    last,count
    Nahasapeemapetilon,3
    Powell,3
    Simpson,5
    Términos,1
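As a rough mental model of what the rule asks for, the sketch below runs the same steps in a single Python process: each JSON line is parsed, passed through the parts_preprocess functions, turned into a key built from key_parts and values taken from value_parts, and the values are summed per key. This is a toy illustration, not Inferno's implementation. The (parts, params) preprocess signature and the summing reduce are assumptions inferred from the rule and its output, and run_rule and to_csv are hypothetical helpers.

```python
import json
from collections import defaultdict

# A subset of the sample records above.
SAMPLE = [
    '{"first":"Homer", "last":"Simpson"}',
    '{"first":"Ruth", "last":"Powell"}',
    '{"first":"Bart", "last":"Simpson"}',
    '{"first":"Apu", "last":"Nahasapeemapetilon"}',
    '{"first":"Janey", "last":"Powell"}',
]


def count(parts, params):
    # Preprocess step: give every record a count of 1.
    parts["count"] = 1
    yield parts


def run_rule(lines, parts_preprocess, key_parts, value_parts):
    """Toy single-process model of a keyset rule (hypothetical helper).

    Map: parse each line, run it through every preprocess function
    (each may yield zero or more records), then build a key tuple from
    key_parts and read the values named in value_parts.
    Reduce: sum the values element-wise per key.
    """
    totals = defaultdict(lambda: [0] * len(value_parts))
    for line in lines:
        if not line.strip():
            continue
        records = [json.loads(line)]
        for func in parts_preprocess:
            # params is unused in this sketch, so None is passed.
            records = [out for rec in records for out in func(rec, None)]
        for rec in records:
            key = tuple(rec[k] for k in key_parts)
            acc = totals[key]
            for i, name in enumerate(value_parts):
                acc[i] += rec[name]
    return dict(totals)


def to_csv(totals, key_parts, value_parts):
    """Render the totals as CSV with a header row, sorted by key."""
    rows = [",".join(key_parts + value_parts)]
    for key in sorted(totals):
        rows.append(",".join(str(x) for x in list(key) + totals[key]))
    return "\n".join(rows)


totals = run_rule(SAMPLE, [count], key_parts=["last"], value_parts=["count"])
print(to_csv(totals, ["last"], ["count"]))
```

Fed all twelve sample records instead of this subset, the toy model produces the same totals as the inferno query shown above. Inferno's value is in running these phases as distributed Disco jobs over DDFS blobs rather than in one loop.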

Daemon Mode
-----------

You can also run Inferno in **daemon mode**. The Inferno daemon continuously
monitors the blobs in DDFS and launches new map/reduce jobs to process the
incoming blobs as soon as each rule's minimum blob count is met.
Here is the Inferno daemon in action. Notice that it skips the first
**automatic rule** because its minimum blob count was not met (8 blobs
required, 0 available). The next automatic rule's blob count was met, so the
Inferno daemon processes those blobs, persists the results to a data
warehouse, and archives the processed blobs.

    $ sudo start inferno
    2012-03-27 31664 [inferno.lib.daemon] Starting Inferno...
    ...
    2012-03-27 31694 [inferno.lib.job] Processing tags:['incoming:server01:chunk:task']
    2012-03-27 31694 [inferno.lib.job] Skipping job task_stats_daily: 8 blobs required, have only 0
    ...
    2012-03-27 31739 [inferno.lib.job] Processing tags:['incoming:server01:chunk:user']
    2012-03-27 31739 [inferno.lib.job] Started job user_stats@534:d6c58:d5dcb processing 1209 blobs
    2012-03-27 31739 [inferno.lib.job] Done waiting for job user_stats@534:d6c58:d5dcb
    2012-03-27 31739 [rules.core.database] user_stats@534:d6c58:d5dcb: Saving user_stats_daily data in /tmp/_defaultdESAa7
    2012-03-27 31739 [rules.core.database] user_stats@534:d6c58:d5dcb: Finished processing 240811902 lines in 5 keysets.
    2012-03-27 31739 [inferno.lib.archiver] Archived 1209 blobs to processed:server01:chunk:user_stats:2012-03-27
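The skip-or-launch decision visible in the log can be sketched as follows. This is a hypothetical model, not the daemon's code: AutoRule, its min_blobs field, and plan_jobs are illustrative names, the minimum for user_stats is a placeholder because the log does not show it, and launching, persisting, and archiving are left out.

```python
from dataclasses import dataclass


@dataclass
class AutoRule:
    """Hypothetical automatic rule: the tags it watches and the
    minimum number of blobs required before a job is launched."""
    name: str
    source_tags: list
    min_blobs: int


def plan_jobs(rules, blobs_by_tag):
    """Split rules into (launched, skipped) for one polling pass.

    blobs_by_tag maps each tag to the blobs currently under it.
    launched holds (rule name, blobs); skipped holds (rule name, have, need).
    """
    launched, skipped = [], []
    for rule in rules:
        blobs = [b for tag in rule.source_tags for b in blobs_by_tag.get(tag, [])]
        if len(blobs) < rule.min_blobs:
            skipped.append((rule.name, len(blobs), rule.min_blobs))
        else:
            launched.append((rule.name, blobs))
    return launched, skipped


rules = [
    AutoRule("task_stats_daily", ["incoming:server01:chunk:task"], min_blobs=8),
    # Placeholder minimum: the log does not show the real value for user_stats.
    AutoRule("user_stats", ["incoming:server01:chunk:user"], min_blobs=8),
]
# Tag contents mirroring the log: no task blobs, 1209 user blobs.
blobs_by_tag = {"incoming:server01:chunk:user": [f"blob{i}" for i in range(1209)]}

launched, skipped = plan_jobs(rules, blobs_by_tag)
for name, have, need in skipped:
    print(f"Skipping job {name}: {need} blobs required, have only {have}")
for name, batch in launched:
    print(f"Started job {name} processing {len(batch)} blobs")
```

Keeping the check per rule means a slow-filling tag only delays its own rule; every other rule whose threshold is met still runs on the same pass.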

Read More
-------------
[More about the daemon mode](doc/daemon.rst)

[On Inferno Keysets](doc/keyset.rst)

[Count Last Name Example](doc/counting.rst)

[Campaign Finance Example](doc/election.rst)

[Inferno Settings](doc/settings.rst)

Build Status: [Travis-CI](http://travis-ci.org/chango/inferno) ![Travis-CI](https://secure.travis-ci.org/chango/inferno.png)