AirVantage asset data commands are sometimes very slow to reach target

nick · May 25, 2018, 5:39pm

Hello all,

I’ve recently been working on issuing commands to our hardware using AirVantage asset data commands and the AirVantage REST API to issue these commands. For the most part, it works well and responds quickly. However, roughly 1 in 10 requests seem to take up to 5 minutes to arrive on the hardware. Is this normal/expected behaviour? If so, where is the bottleneck?

Any help is greatly appreciated. Let me know if you any logs or debug info would be helpful and I can provide it.

Cheers!

nick · May 25, 2018, 6:05pm

Example operation that took ~2 minutes

nick · May 29, 2018, 9:24pm

Hey all,

I’m still seeing this, but it seems worse lately. It’s not uncommon for them to just never get delivered period now.

I’ve been sure to check that my target and operations are using the same path:

May 29 21:00:01 swi-mdm9x15 user.info Legato:  INFO | avcDaemon[1019]/avcDaemon T=main | avData.c le_avdata_AddResourceEventHandler() 2460 | Registering handler on /syncConfig/commands/takePhoto

drewwestrick · May 30, 2018, 12:03am

@nick

I have had similar issues with commands taking a long time to execute. What version of Legato are you currently running? I’ve seen much better AirVantage connection performance (upstream & downstream) using 18.04 with the mangOH legato patches applied.

nick · May 30, 2018, 12:50am

Hey @drewwestrick,

We’re running the same setup (18.04 and patches) and it still performs poorly. It seems today that none of our requests are getting delivered period.

drewwestrick · May 30, 2018, 1:46am

@nick

Yeah, I’ve definitely seen things be hit or miss with AV especially on the data receive side of things.

I’m assuming you can do a cm data connect and ping 8.8.8.8 with no problems?

Is it just on the data receive side of things or are you also seeing any delays / issues on pushing messages?

asyal · May 30, 2018, 4:14am

@rjacolin can you look into this?

asyal · May 30, 2018, 4:22am

@dclark75 can you look into this as well?

@nick can you share the log files?

nick · May 30, 2018, 9:02am

Hey @drewwestrick,

Units are connecting fine as we can see them pushing time series data to AirVantage.

Overall I really like the design and API of AirVantage, but reliability from the target has been marginal at best. Pushing data (primarily with time series) works well for the most part, but we have seen publishes begin to fail after being deployed in the field for a while. We have a watchdog in place to catch this (reboot fixes it), but it’s annoying and suboptimal that the device is required to reboot regularly.

Sending data to the target has been much more unpredictable. A few days ago I was receiving commands almost instantly, then delayed by 2 to 10 minutes, and then not at all. I’m not really sure how to explain this variance. We’ve been sure to keep Legato up to date along with those patches in the hopes that this improves, but it’s still proving pretty problematic.

I have been strongly considering prototyping a connection service implemented over MQTT (@dfrey already did some work on an MQTT client). AirVantage provides excellent abstraction that is powerful to developers, but abstraction is frustrating if it performs unpredictably.

nick · May 30, 2018, 9:03am

Which logs are of primary interest? Things look normal from the target when this problem occurs (connected and pushing data).

dclark75 · May 31, 2018, 2:05pm

To all,

A problem here may be related with the patch 18.04.0/wakaama/0001-Fix-processing-of-duplicate-message-IDs-p2.patch. Try removing this patch from the Legato 18.04 and rebuild. Before I went away on vacation the Legato team showed me there is a problem with this patch. FYI though that removing this patch will add back in a bug where multiple messages may make it to the device due to UDP communication not filtering duplicate messages. This work is in progress for 18.05

NhonChu · May 31, 2018, 5:45pm

Hi @nick,

Do pending commands get delivered to your target when:

you start a new connection to AirVantage?
you send data to AirVantage?

When there is no data exchange for a while (20s or more), the network might release the NAT. In this event, AirVantage can no longer send commands to your device (as external IP address of the device is no longer valid). IP will be refreshed on the next upstream data (new connection or data posting).

Regards
Nhon

nick · May 31, 2018, 9:02pm

Hey @NhonChu, it really does not seem to make a difference. I even tried sending a command while a unit was off, and then watched the logs while it connected to AirVantage and it never saw the command.

nick · May 31, 2018, 9:03pm

Hey @dclark75,

I reverted that patch. Did not really seem to make much of a difference.

NhonChu · June 1, 2018, 7:43am

Hi @nick,

Could you enable the log with debug option turned on (log level debug).
Then try sending a command while the device is not connected, and then capture logs while it connect to AirVantage

Thanks

nick · June 3, 2018, 6:12am

Hey @NhonChu,

Here are the logs for avcService after stopping it, setting log level debug, sending a command, and then starting avcService again.

gist.github.com

https://gist.github.com/nvandoorn/963fd3aa57266be288b79f43c8c722ad

avclogs.log

Jun  3 06:00:38 swi-mdm9x15 user.debug Legato:  DBUG | avcDaemon[1317]/avcDaemon T=main | osDebug.c lwm2m_printf() 84 | [omanager_ObjectsFree:1432] Wakaama RegisteredObjNb 0
Jun  3 06:00:38 swi-mdm9x15 user.debug Legato:  DBUG | avcDaemon[1317]/avcDaemon T=main | osDebug.c lwm2m_printf() 84 | [lwm2m_init:69] Entering
Jun  3 06:00:38 swi-mdm9x15 user.debug Legato:  DBUG | avcDaemon[1317]/avcDaemon T=main | osTime.c lwm2m_gettime() 31 | Device time: 1528005638
Jun  3 06:00:38 swi-mdm9x15 user.debug Legato:  DBUG | avcDaemon[1317]/avcDaemon T=main | osDebug.c lwm2m_printf() 84 | [lwm2mcore_Init:946] Init done -> context 0xb70310c8
Jun  3 06:00:38 swi-mdm9x15 user.debug Legato:  DBUG | avcDaemon[1317]/framework T=main | le_data_client.c DoConnectService() 363 | ======= Starting another client for 'avcDaemon.avcDaemon.le_data' service ========
Jun  3 06:00:38 swi-mdm9x15 user.debug Legato:  DBUG | avcDaemon[1317]/framework T=main | LE_FILENAME le_mem_ForceAlloc() 841 | Memory pool 'avcDaemon.AssetData client' overflowed. Expanded to 1 blocks.
Jun  3 06:00:38 swi-mdm9x15 user.debug Legato:  DBUG | avcDaemon[1317]/framework T=main | LE_FILENAME le_mem_ForceAlloc() 841 | Memory pool 'avcDaemon.AssetData Path' overflowed. Expanded to 1 blocks.
Jun  3 06:00:38 swi-mdm9x15 user.debug Legato:  DBUG | avcDaemon[1317]/framework T=main | LE_FILENAME le_mem_ForceAlloc() 841 | Memory pool 'avcDaemon.AssetData_t' overflowed. Expanded to 1 blocks.
Jun  3 06:00:38 swi-mdm9x15 user.debug Legato:  DBUG | avcDaemon[1317]/framework T=main | LE_FILENAME le_mem_ForceAlloc() 841 | Memory pool 'avcDaemon.AssetData Path' overflowed. Expanded to 2 blocks.
Jun  3 06:00:38 swi-mdm9x15 user.debug Legato:  DBUG | avcDaemon[1317]/framework T=main | LE_FILENAME le_mem_ForceAlloc() 841 | Memory pool 'avcDaemon.AssetData_t' overflowed. Expanded to 2 blocks.

This file has been truncated. show original

I did have a hard time grabbing these logs as my WP85 did not want to connect. I kept getting stuck with this:

root@swi-mdm9x15:~# cm radio
Power:                 OFF
Current Network Operato
Current RAT:           Not available
Status:                Not registered and not currently searching for new operator (LE_MRC_REG_NONE)
Signal:                No signal strength (0)
PS:                    Packet Switched Unknown state (LE_MRC_REG_UNKNOWN)

I tried invoking cm radio off, cm radio on, and cm data connect to get it online, but it didn’t seem to work. I got it working again after a few reboots. I’m starting to think their may be an issue in the modem service somewhere. I’ve been considering stripping out the wifiService dependency since we are not using wifi right now.

Cheers!

NhonChu · June 4, 2018, 11:22am

Hi @nick,
Thank you for capturing the log.

Is “/syncConfig/commands/takePhoto” the command you sent from AirVantage ?
According to the log, your device receives this command twice:

Jun 3 06:01:15 swi-mdm9x15 user.info Legato: INFO | avcDaemon[1317]/avcDaemon T=main | avData.c AvServerRequestHandler() 2393 | >>>>> Request Uri is: [/syncConfig/commands/takePhoto]
Jun 3 06:01:15 swi-mdm9x15 user.debug Legato: DBUG | avcDaemon[1317]/avcDaemon T=main | avData.c ProcessAvServerExecRequest() 2304 | >>>>> COAP_POST - Server executes a command on device

The second one is at 06:01:16.

Does the command remain in “in progress” on the portal ?

Thanks

nick · June 4, 2018, 4:57pm

Hey @NhonChu,

In this particular instance it did receive the command and AirVantage showed one as failed and one as completed. It definitely works sometimes, but to date, performance has been too unreliable for this to be something we can ship to customers.

This seems to really come down to the connection state of the target. If the target is freshly powered on, and freshly connected (with good cell signal), things seem to work OK. Units that have been deployed in the field, or units that have a lousy connection (~2 bars) seem to receive commands very late or not at all.

Prush · June 5, 2018, 8:04pm

Hi Nick,

Unless we get a log that points to an issue on the device, it is hard to answer. I would try though.

As you know the response to a command is sent asynchronously from the app. From the logs it is clear that the device received 2 commands and responded to both of them.

Jun 3 06:01:15 swi-mdm9x15 user.debug Legato: DBUG | avcDaemon[1317]/avcDaemon T=main | osDebug.c lwm2m_printf() 84 | [prv_async_response:832] total payload length = 0
Jun 3 06:01:16 swi-mdm9x15 user.debug Legato: DBUG | avcDaemon[1317]/avcDaemon T=main | osDebug.c lwm2m_printf() 84 | [prv_async_response:832] total payload length = 0

But I don’t know the response code sent from the app. You indicated that one operation failed and the other was successful. This could be due to the error code sent from the app using le_avdata_ReplyExecResult(). Also, it is important to note that if the app fails to respond, the next job on the server could be stuck waiting for the current operation to complete. Can you please track the response codes and timeline of app’s responses?

The original issue you reported is centered around performance and commands taking a long time to arrive. I don’t see a bottleneck in your use case. However, CoAP retransmission is slow. If you have a lossy network, it is possible that the NAT is released before all retransmission attempt is complete. At this point, the device can receive command only at next successful connection. How frequently do you initiate a connection to the server?

Time series or data push will work even when NAT is released as the device performs DTLS resume to refresh the connection.

Can you clarify on the following items to verify if the issue is related to poor network connection?

What is the signal strength when you see this issue?
The network operator and NAT_TIMEOUT (depends on operator)?
What is the ping time (to get a sense of latency of the network). On my device, I have the following compared to 15 ms on my Desktop.

root@swi-mdm9x28:~# ping bs.airvantage.net
PING bs.airvantage.net (52.24.215.63): 56 data bytes
64 bytes from 52.24.215.63: seq=0 ttl=233 time=49.216 ms
64 bytes from 52.24.215.63: seq=1 ttl=233 time=48.255 ms

Regards,
Prush

nick · June 5, 2018, 8:33pm

Hey @Prush,

The logs provided show the system functioning as intended. The failed response in that instance is expected (our app will decline a request if its already busy).

Our devices initiate a connection and AirVantage session on power up and thats it. The intent is for the device to stay online and connected at all times.

We’ve seen it happen at all signal strength levels, but it seems to happen more frequently when the signal goes below 2 bars.
We are using Sierra SIM cards, and they usually seem to connect to Rogers around here. How do I find out about NAT_TIMEOUT?

root@swi-mdm9x15:~# ping bs.airvantage.net
PING bs.airvantage.net (52.24.215.63): 56 data bytes
64 bytes from 52.24.215.63: seq=0 ttl=234 time=616.573 ms
64 bytes from 52.24.215.63: seq=1 ttl=234 time=575.522 ms
64 bytes from 52.24.215.63: seq=2 ttl=234 time=534.046 ms
64 bytes from 52.24.215.63: seq=3 ttl=234 time=494.674 ms
64 bytes from 52.24.215.63: seq=4 ttl=234 time=994.170 ms
64 bytes from 52.24.215.63: seq=5 ttl=234 time=713.535 ms
64 bytes from 52.24.215.63: seq=6 ttl=234 time=673.096 ms
64 bytes from 52.24.215.63: seq=7 ttl=234 time=511.857 ms

Thanks for the help @Prush. Let me know if you need any additional information. We are very motivated to resolve these issues.

Topic		Replies	Views
Ensuring reliable cell connection and retrying? mangOH Red	35	2169	January 22, 2018
MangOH red - Getting Started - Updates only on "Start" of redCloud mangOH Red	17	673	June 15, 2020
Problems in Selective data sending to Airvantage server WP85 Cloud & Connectivity	0	795	November 29, 2017
Check if SMS is enabled mangOH Red	20	2863	August 14, 2020
Airvantage Application update failes with code 500 from avcDaemon Services	5	1746	April 6, 2017

AirVantage asset data commands are sometimes very slow to reach target

Related topics