Sendfil Service #512

asutula · 2021-03-03T00:46:13Z

This is going to be merged into #480 and integrated into the filrewardsd service code, but splitting this out into a new PR just for clarity.

Ok code all commented.

This PR includes:

Service for sending fil, listing transactions, as well as a summary api for reporting.
Persists all txn data in mongo
Utilizes lotus StateWaitMsg and a ticker to make sure all transaction records are updated with the latest data available from lotus.
Supports a wait param when sending fil and getting a txn record so the caller can be notified when the message is active on chain.

Lots of detail left in the code comments.

Signed-off-by: Aaron Sutula <hi@asutula.com>

api/sendfild/main.go

Signed-off-by: Aaron Sutula <hi@asutula.com>

textileben

The Docker sections look good to me.

You might want to have someone else review the Filecoin portion as I'm not familiar with how the network messaging works.

Signed-off-by: Aaron Sutula <hi@asutula.com>

api/sendfild/client/client.go

asutula · 2021-03-03T20:28:13Z

api/sendfild/main.go

+			"listenAddr": {
+				Key:      "listen_addr",
+				DefValue: "127.0.0.1:5000",
+			},
+			"lotusAddr": {
+				Key:      "lotus_addr",
+				DefValue: "127.0.0.1:7777",
+			},


Notice these are now just plain host:port, no multiaddr. This depends on an open Powergate PR.

asutula · 2021-03-03T20:30:04Z

api/sendfild/pb/sendfil.proto

+service SendFilService {
+  rpc SendFil(SendFilRequest) returns (SendFilResponse) {}
+  rpc Txn(TxnRequest) returns (TxnResponse) {}
+  rpc ListTxns(ListTxnsRequest) returns (ListTxnsResponse) {}
+  rpc Summary(SummaryRequest) returns (SummaryResponse) {}
+}


Here's the API. Pretty straight forward, lets you send fil, query for a specific transaction, list transactions, and get a summary. The summary is mostly intended for monitoring so we can keep an eye on things.

asutula · 2021-03-03T20:30:40Z

api/sendfild/pb/sendfil.proto

+  google.protobuf.Timestamp after = 1;
+  google.protobuf.Timestamp before = 2;


You can generate a summary for a range of time or if you don't provide any constraints, all time.

asutula · 2021-03-03T20:33:01Z

api/sendfild/pb/sendfil.proto

+  int64 count_txns = 1;
+  int64 count_pending = 2;
+  int64 count_active = 3;
+  int64 count_failed = 4;
+  int64 count_waiting = 5;
+  int64 count_from_addrs = 6;
+  int64 count_to_addrs = 7;
+  int64 total_nano_fil_sent = 8;
+  double avg_nano_fil_sent = 9;
+  int64 max_nano_fil_sent = 10;
+  int64 min_nano_fil_sent = 11;


We can discuss how these should all be interpreted, but the idea is to have an idea of what "normal" looks like and keep an eye out for any deviations from that. A monitoring system can generate a summary, say every 30 min for the last hour (or more frequently), and we'll get some rolling data which could be nice.

asutula · 2021-03-03T20:33:54Z

api/sendfild/pb/sendfil.proto

+
+message TxnRequest {
+  string message_cid = 1;
+  bool wait = 2;


Both SendFil and Txn take a wait parameter. If true, the response won't come until the transaction is active on chain or times out.

OK, so confirmed some comment above. Just a thought about the risky-ness of the API timing out or having some network error. I don't have the big picture yet, but maybe that's recoverable by querying later and discovering the sendfil pending/succeeding even if the client never got the initial API return (due to the timeout/network error)?

Yea exactly. We use a reasonable timeout while waiting for the message in the lotus api. If that timeout expires, the error returned here will say that is what happened. The client is free to make another request. The service will also retry any previously timed out waits.

asutula · 2021-03-03T20:34:31Z

api/sendfild/pb/sendfil.proto

+
+enum MessageState {
+  MESSAGE_STATE_UNSPECIFIED = 0;
+  MESSAGE_STATE_PENDING = 1;


The message hasn't been detected as active on chain.

asutula · 2021-03-03T20:34:40Z

api/sendfild/pb/sendfil.proto

+enum MessageState {
+  MESSAGE_STATE_UNSPECIFIED = 0;
+  MESSAGE_STATE_PENDING = 1;
+  MESSAGE_STATE_ACTIVE = 2;


Self explanatory.

asutula · 2021-03-03T20:35:01Z

api/sendfild/pb/sendfil.proto

+  MESSAGE_STATE_UNSPECIFIED = 0;
+  MESSAGE_STATE_PENDING = 1;
+  MESSAGE_STATE_ACTIVE = 2;
+  MESSAGE_STATE_FAILED = 3;


This is a final state that can't be recovered from. The message is permanently marked as failed.

The Txn.failure_msg will explain why.

asutula · 2021-03-03T20:35:59Z

api/sendfild/pb/sendfil.proto

+message SendFilRequest {
+  string from = 1;
+  string to = 2;
+  int64 amount_nano_fil = 3;


Note the use of nano fil everywhere in this api.

api/sendfild/service/service.go

asutula · 2021-03-03T21:35:26Z

api/sendfild/service/service.go

+		if err != nil {
+			ch <- waitResult{err: fmt.Errorf("creating lotus client: %v", err)}
+			return
+		}


Anywhere we come across a temporary or recoverable error, we send that error on the result chan and return. In that case, the txn will be waited again in the future.

asutula · 2021-03-03T21:37:30Z

api/sendfild/service/service.go

+		lastestMsgCid, err := tx.latestMsgCid()
+		if err != nil {
+			tx.MessageState = pb.MessageState_MESSAGE_STATE_FAILED
+			tx.FailureMsg = fmt.Sprintf("getting latest message cid from txn: %v", err)
+			tx.UpdatedAt = time.Now()
+			log.Warnf("failing txn with err: %s", tx.FailureMsg)
+			if err := s.updateTxn(ctx, tx); err != nil {
+				ch <- waitResult{err: err}
+				return
+			}
+			ch <- waitResult{txn: tx}
+			return
+		}
+		c, err := cid.Decode(lastestMsgCid.Cid)
+		if err != nil {
+			tx.MessageState = pb.MessageState_MESSAGE_STATE_FAILED
+			tx.FailureMsg = fmt.Sprintf("decoding latest message cid for txn: %v", err)
+			tx.UpdatedAt = time.Now()
+			log.Warnf("failing txn with err: %s", tx.FailureMsg)
+			if err := s.updateTxn(ctx, tx); err != nil {
+				ch <- waitResult{err: err}
+				return
+			}
+			ch <- waitResult{txn: tx}
+			return
+		}


If for some strange reason we can't make sense of the cid associated with the txn, that is a fatal/final error condition.

api/sendfild/service/service.go

asutula · 2021-03-03T21:53:05Z

api/sendfild/service/service_test.go

+	"google.golang.org/grpc/status"
+	"google.golang.org/grpc/test/bufconn"
+)
+


Tried to be very thorough for these tests. There are an unreasonable number of permutations of all the ListTxns options, so just tried to test the obvious ones there.

asutula · 2021-03-03T21:53:46Z

api/sendfild/service/service_test.go

+	os.Exit(exitVal)
+}
+
+func TestRestartWaiting(t *testing.T) {


This is a very important one. It tests that the waiting state is properly initialized upon service restart.

asutula · 2021-03-03T21:56:47Z

api/sendfild/service/service_test.go

+		more := true
+		var moreToken int64 = 0
+		for more {
+			req := &pb.ListTxnsRequest{CreatedAfter: txFirst.CreatedAt, CreatedBefore: txLast.CreatedAt, Limit: 3, Ascending: ascending}


This paging test makes sure that paging works properly when CreatedAfter and CreatedBefore constraints are also provided, both ascending and descending. This was tricky because paging also internally uses a created_at filter via the MoreToken... so it works!

Signed-off-by: Aaron Sutula <hi@asutula.com>

jsign

LGTM, but I left some comments in the review for some consideration. But maybe the plan is merging and continue in other PR maybe?

Some extra question: Are there some plans to include some sanity checks regarding big send fil amounts or similar? As to avoid errors ins getting clients sending too much FIL maybe by error or something out of control?

BTW, I left comments in order as I was thinking things. So if any other does the review, can follow some line of thought, even if some extra comment is discovered later.

jsign · 2021-03-03T21:06:51Z

api/sendfild/Dockerfile

+COPY --from=0 /usr/lib/*-linux-gnu*/libcrypto.so* /usr/lib/
+
+# listenAddrs
+EXPOSE 7006


Should this be 5000 considering new standards?

jsign · 2021-03-03T21:07:29Z

api/sendfild/Dockerfile.dev

+EXPOSE 10006
+EXPOSE 8010


Same about 1006?
Not sure about 8010 should be changed.

api/sendfild/client/client.go

jsign · 2021-03-03T21:14:42Z

api/sendfild/client/client.go

+
+func SendFilWait() SendFilOption {
+	return func(req *pb.SendFilRequest) {
+		req.Wait = true


Early in review so I'm not entirely sure, but I can imagine that Wait blocks until confirmation.

Open question: should the "Wait" thing be encouraged if there's risk that the API timeouts and somehow the "send fil" continues after the API call errored? Maybe safer to discourage this use? (assuming my guess now of what "wait" means)

Ah good point. I think it is normal and fine to want to wait here, and there is a chance that the underlying lotus call to StateMsgWait will timeout. This doesn't meant the transaction failed or timed out, just that our monitoring of it timed out. We need to make that clear to the caller. I think providing a proper error message in this case is the key.

api/sendfild/main.go

api/sendfild/service/service.go

jsign · 2021-03-04T14:40:10Z

api/sendfild/service/service.go

+	return ch
+}
+
+func (s *Service) Close() {


I see that Close() doesn't wait for any go-routine created by wait(..) finishing it's job. I think that can't lead to a bad thing since partial writes doesn't leave things in a bad state.

The only thing I can imagine is that s.server.Stop(..) might timeout since there're current API calls waiting for that go-routine signaling that the txn got on chain. So most probably it can be common that the <t.C below will fire and kill the API processings (which I guess if fine for other APIs?).

Other option is to signal here in Close() any running go-routine from wait to stop early, so we avoid that timeout.

api/sendfild/service/service.go

api/sendfild/service/service_test.go

Signed-off-by: Aaron Sutula <hi@asutula.com>

asutula · 2021-03-08T22:27:26Z

Ok @jsign I fixed the basic things based on your feedback. I'm going to merge this into asutula/fil-rewards-bookkeeping and follow up with another PR that addresses the more complex issues of:

removing more from list results
dealing with no nanosecond precision in mongo for moreToken
correctly using channels to notify all interested callers of updates to messages

asutula added 6 commits February 24, 2021 18:06

first work on sendfil service

5f66eda

Signed-off-by: Aaron Sutula <hi@asutula.com>

nanofil, list and summary apis

6585907

Signed-off-by: Aaron Sutula <hi@asutula.com>

full tests and some bug fixes found along the way

3c893d8

Signed-off-by: Aaron Sutula <hi@asutula.com>

make timeout and confidence configurable

15e82f5

Signed-off-by: Aaron Sutula <hi@asutula.com>

client code

1b0aea5

Signed-off-by: Aaron Sutula <hi@asutula.com>

executable and docker config and makefile updates

10cb87d

Signed-off-by: Aaron Sutula <hi@asutula.com>

asutula requested a review from textileben March 3, 2021 00:46

asutula self-assigned this Mar 3, 2021

textileben reviewed Mar 3, 2021

View reviewed changes

api/sendfild/main.go Outdated Show resolved Hide resolved

asutula added 2 commits March 3, 2021 09:52

fix config bug

20450fa

Signed-off-by: Aaron Sutula <hi@asutula.com>

more config fix

a3275c2

Signed-off-by: Aaron Sutula <hi@asutula.com>

textileben approved these changes Mar 3, 2021

View reviewed changes

query for txns to be monitored on start up and recurring

f48c2e8

Signed-off-by: Aaron Sutula <hi@asutula.com>