stdin/stdout, Large Data, etc.
Posted: Wed May 23, 2012 1:17 pm
Hi Dave,
I have taken Rel for a spin, and I like it! I have been using the DBrowser to interact with a custom database specified with the -f switch.
Now I wish to plan ahead for use cases I encounter at work: I would like to "data mine" gigabytes, and potentially terabytes, of data. It looks like Berkeley DB provides the persistence, so I think gigabytes should be feasible.
Q1. Is there anything I should watch out for? For example, I am tempted to sandwich in -Xmx4g (making Java use 4 gigabytes of heap space).
Q2. I would like to have another program synthesize the input to Rel, such as gigabytes of INSERT statements. Then I would like to query with Tutorial D and parse the resulting output. Is there a mode of operation that uses a) pipes, i.e. stdin and stdout, and/or b) a network socket?
e.g.:
cat inserts.d | java -Xmx16g -jar Rel.jar -f/tmp/mydb > /dev/null
cat query.d | java -Xmx16g -jar Rel.jar -f/tmp/mydb > query.out
I can get this to work, but I wanted to ask in case there is danger ahead: known limitations and whatnot.
Anyway, those are my questions. I am not much concerned with compute time or efficiency; mostly I care about making it possible, and as easy as possible, to get correct results from big data, and then to move those results down the pipeline.
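To make the synthesis side of Q2 concrete, here is a rough sketch of the kind of generator I have in mind: a script that streams synthesized Tutorial D INSERT statements to stdout, so its output can be piped straight into Rel as in the commands above rather than materializing a multi-gigabyte file first. The relvar name "readings" and its attributes are made-up placeholders, not anything from an actual schema:

```python
import sys

def make_insert(i):
    # One Tutorial D INSERT per synthesized tuple; the relvar "readings"
    # and attributes "id"/"val" are hypothetical and would be adjusted
    # to match the real database.
    return f'INSERT readings RELATION {{TUPLE {{id {i}, val {i * 0.5}}}}};'

def emit(n, out=sys.stdout):
    # Stream n INSERT statements line by line; writing to stdout keeps
    # memory flat no matter how many statements are generated.
    for i in range(n):
        out.write(make_insert(i) + '\n')

emit(3)  # tiny demo; at work this count would be in the millions
```

which would then be used in place of the cat in the pipeline, e.g. python gen_inserts.py | java -Xmx16g -jar Rel.jar -f/tmp/mydb > /dev/null.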
Many compliments and thanks for this program!
Best regards,
Eric